Re: [PATCH 1/2, x86] Add palignr support for AVX2.
On Wed, Oct 1, 2014 at 10:43 PM, Jakub Jelinek <ja...@redhat.com> wrote:
> For PR62128, IMHO the right fix is attached.  Note, this is again covered
> in the vshuf-*.c tests (test 22 in both vshuf-v32*.c and vshuf-v16*.c).
> With that attached patch, pr62128.c (aka test_22 in vshuf-v32qi.c) changes:
>
> -       vpshufb .LC0(%rip), %ymm0, %ymm1
> -       vpshufb .LC1(%rip), %ymm0, %ymm0
> -       vpermq  $78, %ymm1, %ymm1
> -       vpor    %ymm1, %ymm0, %ymm0
> +       vpermq  $78, %ymm0, %ymm1
> +       vpalignr        $1, %ymm0, %ymm1, %ymm0
>         ret
>
> 2014-10-01  Jakub Jelinek  <ja...@redhat.com>
>
>         PR target/62128
>         * config/i386/i386.c (expand_vec_perm_1): Try expand_vec_perm_palignr
>         if it expands to a single insn only.
>         (expand_vec_perm_palignr): Add SINGLE_INSN_ONLY_P argument.  If true,
>         fail unless in_order is true.  Add forward declaration.
>         (expand_vec_perm_vperm2f128): Fix up comment about which permutation
>         is useful for one_operand_p.
>         (ix86_expand_vec_perm_const_1): Adjust expand_vec_perm_palignr caller.
>
> Now bootstrapped/regtested on x86_64-linux and i686-linux (and additionally
> tested also with --target_board=unix/-mavx2), ok for trunk?

OK.

Thanks,
Uros.
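For reference, the permutation behind pr62128 / test_22 is a rotation of a
32-byte vector by one byte.  A minimal reproducer sketch (the type and
function names here are made up; it assumes a GCC built with this patch and
-mavx2, where it should compile to the vpermq/vpalignr pair shown above):

  /* Rotate the 32 bytes of X by one: element i takes byte (i + 1) % 32.  */
  typedef unsigned char V32 __attribute__ ((vector_size (32)));

  V32
  rotate_bytes (V32 x)
  {
    return __builtin_shuffle (x, (V32) { 1, 2, 3, 4, 5, 6, 7, 8,
                                         9, 10, 11, 12, 13, 14, 15, 16,
                                         17, 18, 19, 20, 21, 22, 23, 24,
                                         25, 26, 27, 28, 29, 30, 31, 0 });
  }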
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
Getting back to the initial patch, is it ok?
It fixes gcc.target/i386/pr52252-atom.c for the AVX2 make check.
x86 bootstrap is also ok.

2014-10-01  Evgeny Stupachenko  <evstu...@gmail.com>

        * config/i386/i386.c (expand_vec_perm_palignr): Extend to use AVX2
        PALIGNR instruction.
        * config/i386/i386.c (ix86_expand_vec_perm_const_1): Add palignr
        try for AVX2.

On Wed, Sep 17, 2014 at 9:26 PM, Evgeny Stupachenko <evstu...@gmail.com> wrote:
> The test in pr62128 is exactly TEST 22 from gcc.dg/torture/vshuf-v32qi.c.
> It will check if the pattern is correct or not.
> Resubmitting the patch looks good, as the current mail thread is already
> too complicated.
>
> On Wed, Sep 17, 2014 at 6:49 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
>> On Wed, Sep 17, 2014 at 6:01 AM, Evgeny Stupachenko <evstu...@gmail.com> wrote:
>>> It fixes gcc.target/i386/pr52252-atom.c in the core-avx2 make check,
>>> and pr62128.
>>
>> I suggest you resubmit the patch as a bug fix for pr62128 with testcases
>> from pr62128 as well as gcc.target/i386/pr52252-atom.c.
>>
>> --
>> H.J.

[attachment: palignr_avx2.patch]
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
On Wed, Oct 1, 2014 at 12:16 PM, Evgeny Stupachenko <evstu...@gmail.com> wrote:
> Getting back to the initial patch, is it ok?

IMO, we should start with Jakub's proposed patch [1].

[1] https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00010.html

Uros.

> It fixes gcc.target/i386/pr52252-atom.c for the AVX2 make check.
> x86 bootstrap is also ok.
>
> 2014-10-01  Evgeny Stupachenko  <evstu...@gmail.com>
>
>         * config/i386/i386.c (expand_vec_perm_palignr): Extend to use AVX2
>         PALIGNR instruction.
>         * config/i386/i386.c (ix86_expand_vec_perm_const_1): Add palignr
>         try for AVX2.
>
> On Wed, Sep 17, 2014 at 9:26 PM, Evgeny Stupachenko <evstu...@gmail.com> wrote:
>> The test in pr62128 is exactly TEST 22 from gcc.dg/torture/vshuf-v32qi.c.
>> It will check if the pattern is correct or not.
>> Resubmitting the patch looks good, as the current mail thread is already
>> too complicated.
>>
>> On Wed, Sep 17, 2014 at 6:49 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
>>> On Wed, Sep 17, 2014 at 6:01 AM, Evgeny Stupachenko <evstu...@gmail.com> wrote:
>>>> It fixes gcc.target/i386/pr52252-atom.c in the core-avx2 make check,
>>>> and pr62128.
>>>
>>> I suggest you resubmit the patch as a bug fix for pr62128 with testcases
>>> from pr62128 as well as gcc.target/i386/pr52252-atom.c.
>>>
>>> --
>>> H.J.
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
On Wed, Oct 01, 2014 at 12:28:51PM +0200, Uros Bizjak wrote:
> On Wed, Oct 1, 2014 at 12:16 PM, Evgeny Stupachenko <evstu...@gmail.com> wrote:
>> Getting back to the initial patch, is it ok?
>
> IMO, we should start with Jakub's proposed patch [1]
>
> [1] https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00010.html

That doesn't compile, will post a new version; I got interrupted when I
found that in

GCC_TEST_RUN_EXPENSIVE=1 make check-gcc RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c'

one test is miscompiled even with an unpatched compiler; debugging that now.

That said, my patch will not do anything about the case Mark mentioned, the
{ 1, 2, 3, ..., 31, 0 } permutation.  For that we can't do vpalignr followed
by vpshufb or similar, but need to do some permutation first and then
vpalignr on the result.  So it would need a new routine.  It is still a
2-insn permutation, not 6, and needs a different algorithm, so sharing the
same routine for that is undesirable.

        Jakub
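The two-insn shape described above can be sketched with AVX2 intrinsics (not
part of any patch in this thread; rotate1 is a made-up name).  The lane swap
supplies each 128-bit lane of the per-lane vpalignr with the byte it has to
pull across the lane boundary:

  #include <immintrin.h>

  /* { 1, 2, ..., 31, 0 }: swap the 128-bit lanes of X, then shift each
     lane pair right by one byte.  Lane 0 pulls byte 16 from the swapped
     copy, lane 1 pulls byte 0.  */
  static __m256i
  rotate1 (__m256i x)
  {
    __m256i swapped = _mm256_permute4x64_epi64 (x, 0x4e); /* vpermq $78 */
    return _mm256_alignr_epi8 (swapped, x, 1);            /* vpalignr $1 */
  }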
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
This is not a fix for the case { 1, 2, 3, ..., 31, 0 }.  This patch is an
extension of expand_vec_perm_palignr to the AVX2 case.

For the case { 1, 2, 3, ..., 31, 0 } we should use a separate
function/pattern.  I like the split, as it is similar to the already handled
SSE byte rotate { 1, 2, 3, ..., 15, 0 } (ssse3_palignr<mode>_perm) and the
AVX2 split *avx_vperm_broadcast_<mode>.

On Wed, Oct 1, 2014 at 2:35 PM, Jakub Jelinek <ja...@redhat.com> wrote:
> On Wed, Oct 01, 2014 at 12:28:51PM +0200, Uros Bizjak wrote:
>> On Wed, Oct 1, 2014 at 12:16 PM, Evgeny Stupachenko <evstu...@gmail.com> wrote:
>>> Getting back to the initial patch, is it ok?
>>
>> IMO, we should start with Jakub's proposed patch [1]
>>
>> [1] https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00010.html
>
> That doesn't compile, will post a new version; I got interrupted when I
> found that in GCC_TEST_RUN_EXPENSIVE=1 make check-gcc
> RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c'
> one test is miscompiled even with an unpatched compiler; debugging that now.
>
> That said, my patch will not do anything about the case Mark mentioned, the
> { 1, 2, 3, ..., 31, 0 } permutation.  For that we can't do vpalignr followed
> by vpshufb or similar, but need to do some permutation first and then
> vpalignr on the result.  So it would need a new routine.  It is still a
> 2-insn permutation, not 6, and needs a different algorithm, so sharing the
> same routine for that is undesirable.
>
>         Jakub
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
On Wed, Oct 01, 2014 at 12:35:14PM +0200, Jakub Jelinek wrote:
> On Wed, Oct 01, 2014 at 12:28:51PM +0200, Uros Bizjak wrote:
>> On Wed, Oct 1, 2014 at 12:16 PM, Evgeny Stupachenko <evstu...@gmail.com> wrote:
>>> Getting back to the initial patch, is it ok?
>>
>> IMO, we should start with Jakub's proposed patch [1]
>>
>> [1] https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00010.html
>
> That doesn't compile, will post a new version; I got interrupted when I
> found that in GCC_TEST_RUN_EXPENSIVE=1 make check-gcc
> RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c'
> one test is miscompiled even with an unpatched compiler; debugging that now.

Let's start with the bugfix.  The || doesn't make any sense, and we really
want to fill in 4 bits (0, 1, 4, 5) of the immediate, not just two, anyway.
valid_perm_using_mode_p (V2TImode, d) should already guarantee that it is
possible to permute it as V2TI, so all we care about are the values of
d->perm[0] and d->perm[nelt / 2], but we care not just which lane it is,
but also which operand (src1 or src2).

Tested with GCC_TEST_RUN_EXPENSIVE=1 make check-gcc
RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c'

Ok for trunk/4.9/4.8?

2014-10-01  Jakub Jelinek  <ja...@redhat.com>

        PR target/63428
        * config/i386/i386.c (expand_vec_perm_pshufb): Fix up rperm[0]
        argument to avx2_permv2ti.

        * gcc.dg/torture/vshuf-4.inc: Move test 122 from EXPTESTS
        to test 24 in TESTS.

--- gcc/config/i386/i386.c.jj   2014-10-01 11:22:09.0 +0200
+++ gcc/config/i386/i386.c      2014-10-01 13:00:30.835809031 +0200
@@ -42961,8 +42961,8 @@ expand_vec_perm_pshufb (struct expand_ve
       op0 = gen_lowpart (V4DImode, d->op0);
       op1 = gen_lowpart (V4DImode, d->op1);
       rperm[0]
-       = GEN_INT (((d->perm[0] & (nelt / 2)) ? 1 : 0)
-                  || ((d->perm[nelt / 2] & (nelt / 2)) ? 2 : 0));
+       = GEN_INT ((d->perm[0] / (nelt / 2))
+                  | ((d->perm[nelt / 2] / (nelt / 2)) * 16));
       emit_insn (gen_avx2_permv2ti (target, op0, op1, rperm[0]));
       if (target != d->target)
        emit_move_insn (d->target, gen_lowpart (d->vmode, target));
--- gcc/testsuite/gcc.dg/torture/vshuf-4.inc.jj 2012-03-20 08:51:25.0 +0100
+++ gcc/testsuite/gcc.dg/torture/vshuf-4.inc    2014-10-01 13:23:07.163090945 +0200
@@ -23,7 +23,8 @@ T (19, 3, 2, 1, 0) \
 T (20, 0, 4, 1, 5) \
 T (21, 2, 6, 3, 7) \
 T (22, 1, 2, 3, 0) \
-T (23, 2, 1, 0, 3)
+T (23, 2, 1, 0, 3) \
+T (24, 2, 5, 6, 3)
 #define EXPTESTS \
 T (116, 1, 2, 4, 3) \
 T (117, 7, 3, 3, 0) \
@@ -31,7 +32,6 @@ T (118, 5, 3, 2, 7) \
 T (119, 0, 3, 5, 6) \
 T (120, 0, 0, 1, 5) \
 T (121, 4, 6, 2, 1) \
-T (122, 2, 5, 6, 3) \
 T (123, 4, 6, 3, 2) \
 T (124, 4, 7, 5, 6) \
 T (125, 0, 4, 2, 4) \

        Jakub
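Restated outside RTL (a sketch; permv2ti_imm is a made-up helper, not GCC
code): vperm2i128 picks one of the four 128-bit lanes of the op0:op1 pair
for each half of the result, with the selector for the low half in immediate
bits 0-1 and for the high half in bits 4-5, so each selector is simply the
element index divided by the per-lane element count:

  /* perm0 and perm_mid stand for d->perm[0] and d->perm[nelt / 2];
     nelt / 2 is the number of elements per 128-bit lane.  */
  static unsigned
  permv2ti_imm (unsigned perm0, unsigned perm_mid, unsigned nelt)
  {
    return (perm0 / (nelt / 2)) | ((perm_mid / (nelt / 2)) * 16);
  }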
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
On Wed, Oct 1, 2014 at 1:38 PM, Jakub Jelinek <ja...@redhat.com> wrote:
>> That doesn't compile, will post a new version; I got interrupted when I
>> found that in GCC_TEST_RUN_EXPENSIVE=1 make check-gcc
>> RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c'
>> one test is miscompiled even with an unpatched compiler; debugging that now.
>
> Let's start with the bugfix.  The || doesn't make any sense, and we really
> want to fill in 4 bits (0, 1, 4, 5) of the immediate, not just two, anyway.
> valid_perm_using_mode_p (V2TImode, d) should already guarantee that it is
> possible to permute it as V2TI, so all we care about are the values of
> d->perm[0] and d->perm[nelt / 2], but we care not just which lane it is,
> but also which operand (src1 or src2).
>
> Tested with GCC_TEST_RUN_EXPENSIVE=1 make check-gcc
> RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c'
>
> Ok for trunk/4.9/4.8?
>
> 2014-10-01  Jakub Jelinek  <ja...@redhat.com>
>
>         PR target/63428
>         * config/i386/i386.c (expand_vec_perm_pshufb): Fix up rperm[0]
>         argument to avx2_permv2ti.
>
>         * gcc.dg/torture/vshuf-4.inc: Move test 122 from EXPTESTS
>         to test 24 in TESTS.

OK.

Thanks,
Uros.
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
On Wed, Oct 01, 2014 at 01:45:54PM +0200, Uros Bizjak wrote:
> OK.

Thanks.  Second step is a tiny optimization, for the simplified 122 (now 24)
vshuf-v4di.c testcase:

typedef unsigned long long V __attribute__ ((vector_size (32)));

V a, b, c, d;

int
main ()
{
  int i;
  for (i = 0; i < 4; ++i)
    {
      a[i] = i + 2;
      b[i] = 4 + i + 2;
    }
  asm volatile ("" : : : "memory");
  c = __builtin_shuffle (a, b, (V) { 2, 5, 6, 3 });
  d = __builtin_shuffle ((V) { 2, 3, 4, 5 }, (V) { 6, 7, 8, 9 },
                         (V) { 2, 5, 6, 3 });
  if (__builtin_memcmp (&c, &d, sizeof (c)))
    __builtin_abort ();
  return 0;
}

this patch allows better code to be generated:

-       vmovdqa b(%rip), %ymm0
+       vpermq  $238, a(%rip), %ymm1
        movl    $32, %edx
-       movl    $d, %esi
-       vmovdqa a(%rip), %ymm1
+       vmovdqa b(%rip), %ymm0
+       movl    $d, %esi
        movl    $c, %edi
-       vperm2i128      $17, %ymm0, %ymm1, %ymm1
        vpblendd        $195, %ymm1, %ymm0, %ymm0
        vmovdqa %ymm0, c(%rip)

That is because vperm2i128 $17 unnecessarily uses two operands when all the
data it grabs are from a single one.  So, by canonicalizing the permutation
we can emit vpermq $238 instead.  Perhaps more places might benefit from
extra canonicalize_perm calls (two spots already use that beyond the single
one on the expansion/testing entry point).

Tested again with GCC_TEST_RUN_EXPENSIVE=1 make check-gcc \
RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c'
on x86_64-linux.  Ok for trunk?

2014-10-01  Jakub Jelinek  <ja...@redhat.com>

        * config/i386/i386.c (expand_vec_perm_vperm2f128): Canonicalize
        dfirst permutation.

--- gcc/config/i386/i386.c.jj   2014-10-01 13:00:30.0 +0200
+++ gcc/config/i386/i386.c      2014-10-01 13:59:40.061956852 +0200
@@ -43905,15 +43905,16 @@ expand_vec_perm_vperm2f128 (struct expan
            dfirst.perm[i] = (i & (nelt2 - 1))
                             + ((perm >> (2 * (i >= nelt2))) & 3) * nelt2;

+         canonicalize_perm (&dfirst);
          ok = expand_vec_perm_1 (&dfirst);
          gcc_assert (ok);

          /* And dsecond is some single insn shuffle, taking
             d->op0 and result of vperm2f128 (if perm < 16) or d->op1
             and result of vperm2f128 (otherwise).  */
-         dsecond.op1 = dfirst.target;
          if (perm >= 16)
-           dsecond.op0 = dfirst.op1;
+           dsecond.op0 = dsecond.op1;
+         dsecond.op1 = dfirst.target;

          ok = expand_vec_perm_1 (&dsecond);
          gcc_assert (ok);

        Jakub
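The equivalence the canonicalization exploits can be checked with intrinsics
(a sketch; both function names are made up).  When a two-operand vperm2i128
only ever reads lanes of one source, a one-operand vpermq does the same job,
which is what canonicalize_perm lets expand_vec_perm_1 discover:

  #include <immintrin.h>

  /* Both functions broadcast the high 128-bit lane of A.  */
  static __m256i
  dup_hi_2op (__m256i a, __m256i b)
  {
    return _mm256_permute2x128_si256 (a, b, 0x11); /* vperm2i128 $17; B unused */
  }

  static __m256i
  dup_hi_1op (__m256i a)
  {
    return _mm256_permute4x64_epi64 (a, 0xee);     /* vpermq $238 */
  }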
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
On Wed, Oct 1, 2014 at 2:17 PM, Jakub Jelinek <ja...@redhat.com> wrote:
> On Wed, Oct 01, 2014 at 01:45:54PM +0200, Uros Bizjak wrote:
>> OK.
>
> Thanks.  Second step is a tiny optimization, for the simplified 122 (now 24)
> vshuf-v4di.c testcase:
>
> typedef unsigned long long V __attribute__ ((vector_size (32)));
>
> V a, b, c, d;
>
> int
> main ()
> {
>   int i;
>   for (i = 0; i < 4; ++i)
>     {
>       a[i] = i + 2;
>       b[i] = 4 + i + 2;
>     }
>   asm volatile ("" : : : "memory");
>   c = __builtin_shuffle (a, b, (V) { 2, 5, 6, 3 });
>   d = __builtin_shuffle ((V) { 2, 3, 4, 5 }, (V) { 6, 7, 8, 9 },
>                          (V) { 2, 5, 6, 3 });
>   if (__builtin_memcmp (&c, &d, sizeof (c)))
>     __builtin_abort ();
>   return 0;
> }
>
> this patch allows better code to be generated:
>
> -       vmovdqa b(%rip), %ymm0
> +       vpermq  $238, a(%rip), %ymm1
>         movl    $32, %edx
> -       movl    $d, %esi
> -       vmovdqa a(%rip), %ymm1
> +       vmovdqa b(%rip), %ymm0
> +       movl    $d, %esi
>         movl    $c, %edi
> -       vperm2i128      $17, %ymm0, %ymm1, %ymm1
>         vpblendd        $195, %ymm1, %ymm0, %ymm0
>         vmovdqa %ymm0, c(%rip)
>
> That is because vperm2i128 $17 unnecessarily uses two operands when all the
> data it grabs are from a single one.  So, by canonicalizing the permutation
> we can emit vpermq $238 instead.  Perhaps more places might benefit from
> extra canonicalize_perm calls (two spots already use that beyond the single
> one on the expansion/testing entry point).
>
> Tested again with GCC_TEST_RUN_EXPENSIVE=1 make check-gcc \
> RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c'
> on x86_64-linux.  Ok for trunk?
>
> 2014-10-01  Jakub Jelinek  <ja...@redhat.com>
>
>         * config/i386/i386.c (expand_vec_perm_vperm2f128): Canonicalize
>         dfirst permutation.

OK.

Thanks,
Uros.
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
On Wed, Oct 01, 2014 at 02:25:01PM +0200, Uros Bizjak wrote:
> OK.

And now the expand_vec_perm_palignr improvement, tested with
GCC_TEST_RUN_EXPENSIVE=1 make check-gcc \
RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c'

E.g.

typedef unsigned long long V __attribute__ ((vector_size (32)));

extern void abort (void);

V a, b, c, d;

void
test_14 (void)
{
  V mask = { 6, 1, 3, 4 };
  int i;
  c = __builtin_shuffle (a, mask);
  d = __builtin_shuffle (a, b, mask);
}

(distilled from test 15 in vshuf-v4di.c) results in:

-       vmovdqa a(%rip), %ymm0
-       vpermq  $54, %ymm0, %ymm1
-       vpshufb .LC1(%rip), %ymm0, %ymm0
-       vmovdqa %ymm1, c(%rip)
-       vmovdqa b(%rip), %ymm1
-       vpshufb .LC0(%rip), %ymm1, %ymm1
-       vpermq  $78, %ymm1, %ymm1
-       vpor    %ymm1, %ymm0, %ymm0
+       vmovdqa a(%rip), %ymm1
+       vpermq  $54, %ymm1, %ymm0
+       vmovdqa %ymm0, c(%rip)
+       vmovdqa b(%rip), %ymm0
+       vpalignr        $8, %ymm1, %ymm0, %ymm0
+       vpermq  $99, %ymm0, %ymm0
        vmovdqa %ymm0, d(%rip)
        vzeroupper
        ret

change (and two fewer .rodata constants).  Ok for trunk?

2014-10-01  Jakub Jelinek  <ja...@redhat.com>

        * config/i386/i386.c (expand_vec_perm_palignr): Handle
        256-bit vectors for TARGET_AVX2.

--- gcc/config/i386/i386.c.jj   2014-10-01 14:24:16.483138899 +0200
+++ gcc/config/i386/i386.c      2014-10-01 14:27:53.577222011 +0200
@@ -43297,44 +43297,75 @@ expand_vec_perm_palignr (struct expand_v
   rtx shift, target;
   struct expand_vec_perm_d dcopy;

-  /* Even with AVX, palignr only operates on 128-bit vectors.  */
-  if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
+  /* Even with AVX, palignr only operates on 128-bit vectors,
+     in AVX2 palignr operates on both 128-bit lanes.  */
+  if ((!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
+      && (!TARGET_AVX2 || GET_MODE_SIZE (d->vmode) != 32))
     return false;

-  min = nelt, max = 0;
+  min = 2 * nelt, max = 0;
   for (i = 0; i < nelt; ++i)
     {
       unsigned e = d->perm[i];
+      if (GET_MODE_SIZE (d->vmode) == 32)
+       e = (e & ((nelt / 2) - 1)) | ((e & nelt) >> 1);
       if (e < min)
        min = e;
       if (e > max)
        max = e;
     }
-  if (min == 0 || max - min >= nelt)
+  if (min == 0
+      || max - min >= (GET_MODE_SIZE (d->vmode) == 32 ? nelt / 2 : nelt))
     return false;

   /* Given that we have SSSE3, we know we'll be able to implement the
-     single operand permutation after the palignr with pshufb.  */
-  if (d->testing_p)
+     single operand permutation after the palignr with pshufb for
+     128-bit vectors.  */
+  if (d->testing_p && GET_MODE_SIZE (d->vmode) == 16)
     return true;

   dcopy = *d;
-  shift = GEN_INT (min * GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode)));
-  target = gen_reg_rtx (TImode);
-  emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, d->op1),
-                                 gen_lowpart (TImode, d->op0), shift));
-
-  dcopy.op0 = dcopy.op1 = gen_lowpart (d->vmode, target);
-  dcopy.one_operand_p = true;

   in_order = true;
   for (i = 0; i < nelt; ++i)
     {
-      unsigned e = dcopy.perm[i] - min;
+      unsigned e = dcopy.perm[i];
+      if (GET_MODE_SIZE (d->vmode) == 32
+         && e >= nelt
+         && (e & (nelt / 2 - 1)) < min)
+       e = e - min - (nelt / 2);
+      else
+       e = e - min;
       if (e != i)
        in_order = false;
       dcopy.perm[i] = e;
     }
+  dcopy.one_operand_p = true;
+
+  /* For AVX2, test whether we can permute the result in one instruction.  */
+  if (d->testing_p)
+    {
+      if (in_order)
+       return true;
+      dcopy.op1 = dcopy.op0;
+      return expand_vec_perm_1 (&dcopy);
+    }
+
+  shift = GEN_INT (min * GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode)));
+  if (GET_MODE_SIZE (d->vmode) == 16)
+    {
+      target = gen_reg_rtx (TImode);
+      emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, d->op1),
+                                     gen_lowpart (TImode, d->op0), shift));
+    }
+  else
+    {
+      target = gen_reg_rtx (V2TImode);
+      emit_insn (gen_avx2_palignrv2ti (target,
+                                      gen_lowpart (V2TImode, d->op1),
+                                      gen_lowpart (V2TImode, d->op0),
+                                      shift));
+    }
+
+  dcopy.op0 = dcopy.op1 = gen_lowpart (d->vmode, target);

   /* Test for the degenerate case where the alignment by itself
      produces the desired permutation.  */
@@ -43345,7 +43376,7 @@ expand_vec_perm_palignr (struct expand_v
     }

   ok = expand_vec_perm_1 (&dcopy);
-  gcc_assert (ok);
+  gcc_assert (ok || GET_MODE_SIZE (d->vmode) == 32);

   return ok;
 }

        Jakub
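The subtle part of the 256-bit path is the index folding before the min/max
window test.  Pulled out as a plain function (a sketch; fold_lane_index is a
made-up name), it maps an element index in [0, 2 * nelt) to its offset within
a 128-bit lane plus an operand bit, since the AVX2 vpalignr shifts both lanes
by the same amount and cannot move data across them:

  /* Keep the in-lane offset (e & (nelt / 2 - 1)) and fold the operand bit
     (e & nelt) down by one position, so op1 indices start at nelt / 2.  */
  static unsigned
  fold_lane_index (unsigned e, unsigned nelt)
  {
    return (e & ((nelt / 2) - 1)) | ((e & nelt) >> 1);
  }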
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
On Wed, Oct 1, 2014 at 2:56 PM, Jakub Jelinek <ja...@redhat.com> wrote:
> And now the expand_vec_perm_palignr improvement, tested with
> GCC_TEST_RUN_EXPENSIVE=1 make check-gcc \
> RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c'
>
> E.g.
>
> typedef unsigned long long V __attribute__ ((vector_size (32)));
> [...]
>
> (distilled from test 15 in vshuf-v4di.c) results in:
> [...]
>
> change (and two fewer .rodata constants).  Ok for trunk?
>
> 2014-10-01  Jakub Jelinek  <ja...@redhat.com>
>
>         * config/i386/i386.c (expand_vec_perm_palignr): Handle
>         256-bit vectors for TARGET_AVX2.

Please mention PR 62128 and include the testcase from the PR.
Also, please add a version of gcc.target/i386/pr52252-atom.c, compiled
with -mavx2 (perhaps named pr52252-avx2.c).

OK with a small adjustment below.

Thanks,
Uros.

> --- gcc/config/i386/i386.c.jj   2014-10-01 14:24:16.483138899 +0200
> +++ gcc/config/i386/i386.c      2014-10-01 14:27:53.577222011 +0200
> @@ -43297,44 +43297,75 @@ expand_vec_perm_palignr (struct expand_v
>    rtx shift, target;
>    struct expand_vec_perm_d dcopy;
>
> -  /* Even with AVX, palignr only operates on 128-bit vectors.  */
> -  if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
> +  /* Even with AVX, palignr only operates on 128-bit vectors,
> +     in AVX2 palignr operates on both 128-bit lanes.  */
> +  if ((!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
> +      && (!TARGET_AVX2 || GET_MODE_SIZE (d->vmode) != 32))

Please simplify the above condition ...

>      return false;
>
> [... rest of the patch as posted ...]
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
On Wed, Oct 01, 2014 at 03:09:59PM +0200, Uros Bizjak wrote:
>> 2014-10-01  Jakub Jelinek  <ja...@redhat.com>
>>
>>         * config/i386/i386.c (expand_vec_perm_palignr): Handle
>>         256-bit vectors for TARGET_AVX2.
>
> Please mention PR 62128 and include the testcase from the PR.
> Also, please add a version of gcc.target/i386/pr52252-atom.c, compiled
> with -mavx2 (perhaps named pr52252-avx2.c)

This patch doesn't fix PR62128, and it is already tested (even without
GCC_RUN_EXPENSIVE_TESTS=1) in the vshuf*.c torture tests (several of them).
If you want coverage not just for the default flags, I'd say we should (say
for -O2 only) just add gcc.target/i386/{avx,avx2,avx512}-vshuf-*.c tests
that would #include ../../gcc.dg/torture/vshuf-*.c and be compiled/run with
-mavx/-mavx2/-mavx512 options instead of the default ones.

For PR62128, IMHO the right fix is attached.  Note, this is again covered
in the vshuf-*.c tests (test 22 in both vshuf-v32*.c and vshuf-v16*.c).
With that attached patch, pr62128.c (aka test_22 in vshuf-v32qi.c) changes:

-       vpshufb .LC0(%rip), %ymm0, %ymm1
-       vpshufb .LC1(%rip), %ymm0, %ymm0
-       vpermq  $78, %ymm1, %ymm1
-       vpor    %ymm1, %ymm0, %ymm0
+       vpermq  $78, %ymm0, %ymm1
+       vpalignr        $1, %ymm0, %ymm1, %ymm0
        ret

>> -  /* Even with AVX, palignr only operates on 128-bit vectors.  */
>> -  if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
>> +  /* Even with AVX, palignr only operates on 128-bit vectors,
>> +     in AVX2 palignr operates on both 128-bit lanes.  */
>> +  if ((!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
>> +      && (!TARGET_AVX2 || GET_MODE_SIZE (d->vmode) != 32))
>
> Please simplify the above condition ...

How?  I don't see how that can be simplified, it can be transformed into

  if (!((TARGET_SSSE3 && GET_MODE_SIZE (d->vmode) == 16)
        || (TARGET_AVX2 && GET_MODE_SIZE (d->vmode) == 32)))

but I don't find that simpler.

        Jakub

2014-10-01  Jakub Jelinek  <ja...@redhat.com>

        PR target/62128
        * config/i386/i386.c (expand_vec_perm_1): Try expand_vec_perm_palignr
        if it expands to a single insn only.
        (expand_vec_perm_palignr): Add SINGLE_INSN_ONLY_P argument.  If true,
        fail unless in_order is true.  Add forward declaration.
        (expand_vec_perm_vperm2f128): Fix up comment about which permutation
        is useful for one_operand_p.
        (ix86_expand_vec_perm_const_1): Adjust expand_vec_perm_palignr caller.

--- gcc/config/i386/i386.c.jj   2014-10-01 15:42:28.0 +0200
+++ gcc/config/i386/i386.c      2014-10-01 15:50:06.234079887 +0200
@@ -39636,6 +39636,7 @@ struct expand_vec_perm_d
 static bool canonicalize_perm (struct expand_vec_perm_d *d);
 static bool expand_vec_perm_1 (struct expand_vec_perm_d *d);
 static bool expand_vec_perm_broadcast_1 (struct expand_vec_perm_d *d);
+static bool expand_vec_perm_palignr (struct expand_vec_perm_d *d, bool);

 /* Get a vector mode of the same size as the original
    but with elements twice as wide.  This is only guaranteed
    to apply to integral vectors.  */
@@ -43225,6 +43226,10 @@ expand_vec_perm_1 (struct expand_vec_per
   if (expand_vec_perm_pshufb (d))
     return true;

+  /* Try the AVX2 vpalignr instruction.  */
+  if (expand_vec_perm_palignr (d, true))
+    return true;
+
   /* Try the AVX512F vpermi2 instructions.  */
   rtx vec[64];
   enum machine_mode mode = d->vmode;
@@ -43286,10 +43291,11 @@ expand_vec_perm_pshuflw_pshufhw (struct
    the permutation using the SSSE3 palignr instruction.  This succeeds
    when all of the elements in PERM fit within one vector and we merely
    need to shift them down so that a single vector permutation has a
-   chance to succeed.  */
+   chance to succeed.  If SINGLE_INSN_ONLY_P, succeed if only
+   the vpalignr instruction itself can perform the requested permutation.  */

 static bool
-expand_vec_perm_palignr (struct expand_vec_perm_d *d)
+expand_vec_perm_palignr (struct expand_vec_perm_d *d, bool single_insn_only_p)
 {
   unsigned i, nelt = d->nelt;
   unsigned min, max;
@@ -43320,8 +43326,9 @@ expand_vec_perm_palignr (struct expand_v

   /* Given that we have SSSE3, we know we'll be able to implement the
      single operand permutation after the palignr with pshufb for
-     128-bit vectors.  */
-  if (d->testing_p && GET_MODE_SIZE (d->vmode) == 16)
+     128-bit vectors.  If SINGLE_INSN_ONLY_P, in_order has to be computed
+     first.  */
+  if (d->testing_p && GET_MODE_SIZE (d->vmode) == 16 && !single_insn_only_p)
     return true;

   dcopy = *d;
@@ -43342,6 +43349,9 @@ expand_vec_perm_palignr (struct expand_v
     }
   dcopy.one_operand_p = true;

+  if (single_insn_only_p && !in_order)
+    return false;
+
   /* For AVX2, test whether we can permute the result in one instruction.  */
   if (d->testing_p)
     {
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
On Wed, Oct 1, 2014 at 4:12 PM, Jakub Jelinek <ja...@redhat.com> wrote:
> On Wed, Oct 01, 2014 at 03:09:59PM +0200, Uros Bizjak wrote:
>>> 2014-10-01  Jakub Jelinek  <ja...@redhat.com>
>>>
>>>         * config/i386/i386.c (expand_vec_perm_palignr): Handle
>>>         256-bit vectors for TARGET_AVX2.
>>
>> Please mention PR 62128 and include the testcase from the PR.
>> Also, please add a version of gcc.target/i386/pr52252-atom.c, compiled
>> with -mavx2 (perhaps named pr52252-avx2.c)
>
> This patch doesn't fix PR62128, and it is already tested (even without
> GCC_RUN_EXPENSIVE_TESTS=1) in the vshuf*.c torture tests (several of them).

Ah, OK then.

> If you want coverage not just for the default flags, I'd say we should (say
> for -O2 only) just add gcc.target/i386/{avx,avx2,avx512}-vshuf-*.c tests
> that would #include ../../gcc.dg/torture/vshuf-*.c and be compiled/run with
> -mavx/-mavx2/-mavx512 options instead of the default ones.

No, the above should be good for now.  The failure was triggered by the
target that defaults to -mavx2, and for avx512f we can run this suite in the
way you suggest.

> For PR62128, IMHO the right fix is attached.  Note, this is again covered
> in the vshuf-*.c tests (test 22 in both vshuf-v32*.c and vshuf-v16*.c).
> With that attached patch, pr62128.c (aka test_22 in vshuf-v32qi.c) changes:
>
> -       vpshufb .LC0(%rip), %ymm0, %ymm1
> -       vpshufb .LC1(%rip), %ymm0, %ymm0
> -       vpermq  $78, %ymm1, %ymm1
> -       vpor    %ymm1, %ymm0, %ymm0
> +       vpermq  $78, %ymm0, %ymm1
> +       vpalignr        $1, %ymm0, %ymm1, %ymm0
>         ret
>
>>> -  /* Even with AVX, palignr only operates on 128-bit vectors.  */
>>> -  if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
>>> +  /* Even with AVX, palignr only operates on 128-bit vectors,
>>> +     in AVX2 palignr operates on both 128-bit lanes.  */
>>> +  if ((!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
>>> +      && (!TARGET_AVX2 || GET_MODE_SIZE (d->vmode) != 32))
>>
>> Please simplify the above condition ...
>
> How?  I don't see how that can be simplified, it can be transformed into
>
>   if (!((TARGET_SSSE3 && GET_MODE_SIZE (d->vmode) == 16)
>         || (TARGET_AVX2 && GET_MODE_SIZE (d->vmode) == 32)))
>
> but I don't find that simpler.

I was thinking of the above, but you are correct, the change doesn't bring
us anything.  So, the patch is OK as it was.

Thanks,
Uros.
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
On Wed, Oct 1, 2014 at 2:56 PM, Jakub Jelinek <ja...@redhat.com> wrote:
> On Wed, Oct 01, 2014 at 02:25:01PM +0200, Uros Bizjak wrote:
>> OK.
>
> And now the expand_vec_perm_palignr improvement, tested with
> GCC_TEST_RUN_EXPENSIVE=1 make check-gcc \
> RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c'
>
> E.g.
>
> typedef unsigned long long V __attribute__ ((vector_size (32)));
>
> extern void abort (void);
>
> V a, b, c, d;
>
> void
> test_14 (void)
> {
>   V mask = { 6, 1, 3, 4 };
>   int i;
>   c = __builtin_shuffle (a, mask);
>   d = __builtin_shuffle (a, b, mask);
> }
>
> (distilled from test 15 in vshuf-v4di.c) results in:
>
> -       vmovdqa a(%rip), %ymm0
> -       vpermq  $54, %ymm0, %ymm1
> -       vpshufb .LC1(%rip), %ymm0, %ymm0
> -       vmovdqa %ymm1, c(%rip)
> -       vmovdqa b(%rip), %ymm1
> -       vpshufb .LC0(%rip), %ymm1, %ymm1
> -       vpermq  $78, %ymm1, %ymm1
> -       vpor    %ymm1, %ymm0, %ymm0
> +       vmovdqa a(%rip), %ymm1
> +       vpermq  $54, %ymm1, %ymm0
> +       vmovdqa %ymm0, c(%rip)
> +       vmovdqa b(%rip), %ymm0
> +       vpalignr        $8, %ymm1, %ymm0, %ymm0
> +       vpermq  $99, %ymm0, %ymm0
>         vmovdqa %ymm0, d(%rip)
>         vzeroupper
>         ret
>
> change (and two fewer .rodata constants).

On a related note, I would like to point out that gcc.target/i386/pr61403.c
also fails to generate the blend insn with -mavx2.  The new insn sequence
includes lots of new vpshufb insns with memory access.

Uros.
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
On Wed, Oct 01, 2014 at 04:12:15PM +0200, Jakub Jelinek wrote:
> For PR62128, IMHO the right fix is attached.  Note, this is again covered
> in the vshuf-*.c tests (test 22 in both vshuf-v32*.c and vshuf-v16*.c).
> With that attached patch, pr62128.c (aka test_22 in vshuf-v32qi.c) changes:
>
> -       vpshufb .LC0(%rip), %ymm0, %ymm1
> -       vpshufb .LC1(%rip), %ymm0, %ymm0
> -       vpermq  $78, %ymm1, %ymm1
> -       vpor    %ymm1, %ymm0, %ymm0
> +       vpermq  $78, %ymm0, %ymm1
> +       vpalignr        $1, %ymm0, %ymm1, %ymm0
>         ret

2014-10-01  Jakub Jelinek  <ja...@redhat.com>

        PR target/62128
        * config/i386/i386.c (expand_vec_perm_1): Try expand_vec_perm_palignr
        if it expands to a single insn only.
        (expand_vec_perm_palignr): Add SINGLE_INSN_ONLY_P argument.  If true,
        fail unless in_order is true.  Add forward declaration.
        (expand_vec_perm_vperm2f128): Fix up comment about which permutation
        is useful for one_operand_p.
        (ix86_expand_vec_perm_const_1): Adjust expand_vec_perm_palignr caller.

Now bootstrapped/regtested on x86_64-linux and i686-linux (and additionally
tested also with --target_board=unix/-mavx2), ok for trunk?

--- gcc/config/i386/i386.c.jj   2014-10-01 15:42:28.0 +0200
+++ gcc/config/i386/i386.c      2014-10-01 15:50:06.234079887 +0200
@@ -39636,6 +39636,7 @@ struct expand_vec_perm_d
 static bool canonicalize_perm (struct expand_vec_perm_d *d);
 static bool expand_vec_perm_1 (struct expand_vec_perm_d *d);
 static bool expand_vec_perm_broadcast_1 (struct expand_vec_perm_d *d);
+static bool expand_vec_perm_palignr (struct expand_vec_perm_d *d, bool);

 /* Get a vector mode of the same size as the original
    but with elements twice as wide.  This is only guaranteed
    to apply to integral vectors.  */
@@ -43225,6 +43226,10 @@ expand_vec_perm_1 (struct expand_vec_per
   if (expand_vec_perm_pshufb (d))
     return true;

+  /* Try the AVX2 vpalignr instruction.  */
+  if (expand_vec_perm_palignr (d, true))
+    return true;
+
   /* Try the AVX512F vpermi2 instructions.  */
   rtx vec[64];
   enum machine_mode mode = d->vmode;
@@ -43286,10 +43291,11 @@ expand_vec_perm_pshuflw_pshufhw (struct
    the permutation using the SSSE3 palignr instruction.  This succeeds
    when all of the elements in PERM fit within one vector and we merely
    need to shift them down so that a single vector permutation has a
-   chance to succeed.  */
+   chance to succeed.  If SINGLE_INSN_ONLY_P, succeed if only
+   the vpalignr instruction itself can perform the requested permutation.  */

 static bool
-expand_vec_perm_palignr (struct expand_vec_perm_d *d)
+expand_vec_perm_palignr (struct expand_vec_perm_d *d, bool single_insn_only_p)
 {
   unsigned i, nelt = d->nelt;
   unsigned min, max;
@@ -43320,8 +43326,9 @@ expand_vec_perm_palignr (struct expand_v

   /* Given that we have SSSE3, we know we'll be able to implement the
      single operand permutation after the palignr with pshufb for
-     128-bit vectors.  */
-  if (d->testing_p && GET_MODE_SIZE (d->vmode) == 16)
+     128-bit vectors.  If SINGLE_INSN_ONLY_P, in_order has to be computed
+     first.  */
+  if (d->testing_p && GET_MODE_SIZE (d->vmode) == 16 && !single_insn_only_p)
     return true;

   dcopy = *d;
@@ -43342,6 +43349,9 @@ expand_vec_perm_palignr (struct expand_v
     }
   dcopy.one_operand_p = true;

+  if (single_insn_only_p && !in_order)
+    return false;
+
   /* For AVX2, test whether we can permute the result in one instruction.  */
   if (d->testing_p)
     {
@@ -43922,7 +43932,8 @@ expand_vec_perm_vperm2f128 (struct expan
          return true;
        }

-      /* For one operand, the only useful vperm2f128 permutation is 0x10.  */
+      /* For one operand, the only useful vperm2f128 permutation is 0x01
+        aka lanes swap.  */
       if (d->one_operand_p)
        return false;
     }
@@ -44811,7 +44822,7 @@ ix86_expand_vec_perm_const_1 (struct exp
       if (expand_vec_perm_pshuflw_pshufhw (d))
        return true;

-      if (expand_vec_perm_palignr (d))
+      if (expand_vec_perm_palignr (d, false))
        return true;

       if (expand_vec_perm_interleave2 (d))

        Jakub
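The fixed comment is easy to verify with intrinsics (a sketch; swap_lanes is
a made-up name): selector 0x01 puts the high lane of the source into the low
half of the result and the low lane into the high half -- a lane swap --
whereas the old 0x10 value merely reproduces a single operand unchanged:

  #include <immintrin.h>

  /* vperm2i128 $0x01 with both sources equal swaps the 128-bit lanes.  */
  static __m256i
  swap_lanes (__m256i a)
  {
    return _mm256_permute2x128_si256 (a, a, 0x01);
  }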
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
It fixes gcc.target/i386/pr52252-atom.c in the core-avx2 make check, and
pr62128.

On Tue, Sep 16, 2014 at 6:51 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
> On Tue, Sep 16, 2014 at 6:15 AM, Evgeny Stupachenko <evstu...@gmail.com> wrote:
>> PING 2
>>
>> On Mon, Sep 8, 2014 at 2:03 PM, Evgeny Stupachenko <evstu...@gmail.com> wrote:
>>> PING
>>>
>>> On Wed, Aug 27, 2014 at 7:50 PM, Evgeny Stupachenko <evstu...@gmail.com> wrote:
>>>> The rotate insn appeared right after expand.  I've done it similar to
>>>> the define_insn_and_split *avx_vperm_broadcast_<mode>.  I don't see
>>>> any potential losses on splitting that after reload.
>>>>
>>>> On Tue, Aug 26, 2014 at 8:29 PM, Richard Henderson <r...@redhat.com> wrote:
>>>>> On 08/26/2014 05:59 AM, Evgeny Stupachenko wrote:
>>>>>> +(define_insn_and_split "avx2_rotate<mode>_perm"
>>>>>> [...]
>>>>>
>>>>> Why are you waiting until after reload to expand this?
>>>>>
>>>>> It's only the vec_select parallel that determines which direction the
>>>>> palignr should be done.  This seems like something you could do during
>>>>> permutation expansion.
>
> Assuming your change is triggered today without any additional changes,
> you should include some testcases.  For now, it doesn't show if it does
> anything useful.
>
> --
> H.J.
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
On Wed, Sep 17, 2014 at 6:01 AM, Evgeny Stupachenko <evstu...@gmail.com> wrote:
> It fixes gcc.target/i386/pr52252-atom.c in the core-avx2 make check, and
> pr62128.

I suggest you resubmit the patch as a bug fix for pr62128 with testcases
from pr62128 as well as gcc.target/i386/pr52252-atom.c.

--
H.J.
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
The test in pr62128 is exactly TEST 22 from gcc.dg/torture/vshuf-v32qi.c.
It will check if the pattern is correct or not.
Resubmitting the patch looks good, as the current mail thread is already
too complicated.

On Wed, Sep 17, 2014 at 6:49 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
> On Wed, Sep 17, 2014 at 6:01 AM, Evgeny Stupachenko <evstu...@gmail.com> wrote:
>> It fixes gcc.target/i386/pr52252-atom.c in the core-avx2 make check, and
>> pr62128.
>
> I suggest you resubmit the patch as a bug fix for pr62128 with testcases
> from pr62128 as well as gcc.target/i386/pr52252-atom.c.
>
> --
> H.J.
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
PING 2

On Mon, Sep 8, 2014 at 2:03 PM, Evgeny Stupachenko <evstu...@gmail.com> wrote:
> PING
>
> On Wed, Aug 27, 2014 at 7:50 PM, Evgeny Stupachenko <evstu...@gmail.com> wrote:
>> The rotate insn appeared right after expand.  I've done it similar to
>> the define_insn_and_split *avx_vperm_broadcast_<mode>.  I don't see
>> any potential losses on splitting that after reload.
>>
>> On Tue, Aug 26, 2014 at 8:29 PM, Richard Henderson <r...@redhat.com> wrote:
>>> On 08/26/2014 05:59 AM, Evgeny Stupachenko wrote:
>>>> +(define_insn_and_split "avx2_rotate<mode>_perm"
>>>> +  [(set (match_operand:V_256 0 "register_operand" "=x")
>>>> +       (vec_select:V_256
>>>> +         (match_operand:V_256 1 "register_operand" "x")
>>>> +         (match_parallel 2 "palignr_operand"
>>>> +           [(match_operand 3 "const_int_operand" "n")])))]
>>>> +  "TARGET_AVX2"
>>>> +  "#"
>>>> +  "&& reload_completed"
>>>> +  [(const_int 0)]
>>>
>>> Why are you waiting until after reload to expand this?
>>>
>>> It's only the vec_select parallel that determines which direction the
>>> palignr should be done.  This seems like something you could do during
>>> permutation expansion.
>>>
>>> r~
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
On Tue, Sep 16, 2014 at 6:15 AM, Evgeny Stupachenko <evstu...@gmail.com> wrote:
> PING 2
>
> On Mon, Sep 8, 2014 at 2:03 PM, Evgeny Stupachenko <evstu...@gmail.com> wrote:
>> PING
>>
>> On Wed, Aug 27, 2014 at 7:50 PM, Evgeny Stupachenko <evstu...@gmail.com> wrote:
>>> The rotate insn appeared right after expand.  I've done it similar to
>>> the define_insn_and_split *avx_vperm_broadcast_<mode>.  I don't see
>>> any potential losses on splitting that after reload.
>>>
>>> On Tue, Aug 26, 2014 at 8:29 PM, Richard Henderson <r...@redhat.com> wrote:
>>>> On 08/26/2014 05:59 AM, Evgeny Stupachenko wrote:
>>>>> +(define_insn_and_split "avx2_rotate<mode>_perm"
>>>>> [...]
>>>>
>>>> Why are you waiting until after reload to expand this?
>>>>
>>>> It's only the vec_select parallel that determines which direction the
>>>> palignr should be done.  This seems like something you could do during
>>>> permutation expansion.

Assuming your change is triggered today without any additional changes,
you should include some testcases.  For now, it doesn't show if it does
anything useful.

--
H.J.
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
PING

On Wed, Aug 27, 2014 at 7:50 PM, Evgeny Stupachenko <evstu...@gmail.com> wrote:
> The rotate insn appeared right after expand.  I've done it similar to
> the define_insn_and_split *avx_vperm_broadcast_<mode>.  I don't see
> any potential losses on splitting that after reload.
>
> On Tue, Aug 26, 2014 at 8:29 PM, Richard Henderson <r...@redhat.com> wrote:
>> On 08/26/2014 05:59 AM, Evgeny Stupachenko wrote:
>>> +(define_insn_and_split "avx2_rotate<mode>_perm"
>>> +  [(set (match_operand:V_256 0 "register_operand" "=x")
>>> +       (vec_select:V_256
>>> +         (match_operand:V_256 1 "register_operand" "x")
>>> +         (match_parallel 2 "palignr_operand"
>>> +           [(match_operand 3 "const_int_operand" "n")])))]
>>> +  "TARGET_AVX2"
>>> +  "#"
>>> +  "&& reload_completed"
>>> +  [(const_int 0)]
>>
>> Why are you waiting until after reload to expand this?
>>
>> It's only the vec_select parallel that determines which direction the
>> palignr should be done.  This seems like something you could do during
>> permutation expansion.
>>
>> r~
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
The rotate insn appeared right after expand.  I've done it similar to the
define_insn_and_split *avx_vperm_broadcast_<mode>.  I don't see any
potential losses on splitting that after reload.

On Tue, Aug 26, 2014 at 8:29 PM, Richard Henderson <r...@redhat.com> wrote:
> On 08/26/2014 05:59 AM, Evgeny Stupachenko wrote:
>> +(define_insn_and_split "avx2_rotate<mode>_perm"
>> +  [(set (match_operand:V_256 0 "register_operand" "=x")
>> +       (vec_select:V_256
>> +         (match_operand:V_256 1 "register_operand" "x")
>> +         (match_parallel 2 "palignr_operand"
>> +           [(match_operand 3 "const_int_operand" "n")])))]
>> +  "TARGET_AVX2"
>> +  "#"
>> +  "&& reload_completed"
>> +  [(const_int 0)]
>
> Why are you waiting until after reload to expand this?
>
> It's only the vec_select parallel that determines which direction the
> palignr should be done.  This seems like something you could do during
> permutation expansion.
>
> r~
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
That is covered by a separate part of the patch (make check and bootstrap
passed: 2 new passes for core-avx2).  Is it ok?

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index d6155cf..68ee65a 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -81,6 +81,7 @@
   ;; For AVX2 support
   UNSPEC_VPERMVAR
   UNSPEC_VPERMTI
+  UNSPEC_VPALIGNRDI
   UNSPEC_GATHER
   UNSPEC_VSIBADDR
@@ -14167,6 +14168,19 @@
    (set_attr "prefix" "vex")
    (set_attr "mode" "OI")])

+(define_insn "avx2_palignrv4di"
+  [(set (match_operand:V4DI 0 "register_operand" "=x")
+       (unspec:V4DI
+         [(match_operand:V4DI 1 "register_operand" "x")
+          (match_operand:V4DI 2 "nonimmediate_operand" "xm")
+          (match_operand:SI 3 "const_0_to_255_operand" "n")]
+         UNSPEC_VPALIGNRDI))]
+  "TARGET_AVX2"
+  "vpalignr\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "sselog")
+   (set_attr "prefix" "vex")
+   (set_attr "mode" "OI")])
+
 (define_insn "avx2_vec_dupv4df"
   [(set (match_operand:V4DF 0 "register_operand" "=x")
        (vec_duplicate:V4DF
@@ -14658,6 +14672,49 @@
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "orig,vex")])

+(define_insn_and_split "avx2_rotate<mode>_perm"
+  [(set (match_operand:V_256 0 "register_operand" "=x")
+       (vec_select:V_256
+         (match_operand:V_256 1 "register_operand" "x")
+         (match_parallel 2 "palignr_operand"
+           [(match_operand 3 "const_int_operand" "n")])))]
+  "TARGET_AVX2"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  enum machine_mode imode = GET_MODE_INNER (<MODE>mode);
+  int shift = INTVAL (operands[3]) * GET_MODE_SIZE (imode);
+  rtx op0 = gen_rtx_REG (V4DImode, REGNO (operands[0]));
+  rtx op1 = gen_rtx_REG (V4DImode, REGNO (operands[1]));
+
+  if (shift == 0)
+    emit_move_insn (operands[0], operands[1]);
+  else
+    {
+      emit_insn (gen_avx2_permv2ti (op0,
+                                   op1,
+                                   op1,
+                                   GEN_INT (33)));
+      if (shift < 16)
+       emit_insn (gen_avx2_palignrv4di (op0,
+                                        op0,
+                                        op1,
+                                        GEN_INT (shift)));
+      else if (shift > 16)
+       emit_insn (gen_avx2_palignrv4di (op0,
+                                        op1,
+                                        op0,
+                                        GEN_INT (shift - 16)));
+    }
+  DONE;
+}
+  [(set_attr "type" "sseishft")
+   (set_attr "prefix_extra" "1")
+   (set_attr "length_immediate" "1")
+   (set_attr "prefix" "vex")])
+
 (define_expand "avx_vinsertf128<mode>"
   [(match_operand:V_256 0 "register_operand")
    (match_operand:V_256 1 "register_operand")

The test case covering this is gcc.target/i386/pr52252-atom.c.  It will pass
for -march=core-avx2 when the patch is committed.

On Thu, Aug 14, 2014 at 6:55 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
> On Thu, Aug 14, 2014 at 1:08 AM, Evgeny Stupachenko <evstu...@gmail.com> wrote:
>> Ping.
>>
>> On Thu, Jul 10, 2014 at 7:29 PM, Evgeny Stupachenko <evstu...@gmail.com> wrote:
>>> On Mon, Jul 7, 2014 at 6:40 PM, Richard Henderson <r...@redhat.com> wrote:
>>>> insn_num might as well be bool avx2, since it's only ever set to two
>>>> values.
>>> [...]
>>>> ... down here you'll simply ICE from the gcc_assert.
>
> Can you modify your patch to fix
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62128
> with a testcase?
>
> --
> H.J.
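The split's strategy can be sketched with intrinsics for two concrete shifts
(not part of the patch; the function names are made up, and the palignr
immediate must be a compile-time constant in real code): one vperm2i128 lane
swap, then one per-lane vpalignr whose operand order depends on which side of
the 16-byte boundary the shift falls:

  #include <immintrin.h>

  /* Rotate X by 3 bytes (shift < 16): each lane tops up from the swapped
     copy.  Result element i is source byte (i + 3) % 32.  */
  static __m256i
  rotate_by_3 (__m256i x)
  {
    __m256i tmp = _mm256_permute2x128_si256 (x, x, 0x01); /* lane swap */
    return _mm256_alignr_epi8 (tmp, x, 3);
  }

  /* Rotate X by 19 bytes (shift > 16): the operands trade places and the
     immediate drops by 16, matching GEN_INT (shift - 16) in the split.  */
  static __m256i
  rotate_by_19 (__m256i x)
  {
    __m256i tmp = _mm256_permute2x128_si256 (x, x, 0x01);
    return _mm256_alignr_epi8 (x, tmp, 3);
  }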
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
On 08/26/2014 05:59 AM, Evgeny Stupachenko wrote:
> +(define_insn_and_split "avx2_rotate<mode>_perm"
> +  [(set (match_operand:V_256 0 "register_operand" "=x")
> +       (vec_select:V_256
> +         (match_operand:V_256 1 "register_operand" "x")
> +         (match_parallel 2 "palignr_operand"
> +           [(match_operand 3 "const_int_operand" "n")])))]
> +  "TARGET_AVX2"
> +  "#"
> +  "&& reload_completed"
> +  [(const_int 0)]

Why are you waiting until after reload to expand this?

It's only the vec_select parallel that determines which direction the
palignr should be done.  This seems like something you could do during
permutation expansion.

r~
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
Ping.

On Thu, Jul 10, 2014 at 7:29 PM, Evgeny Stupachenko <evstu...@gmail.com> wrote:
> On Mon, Jul 7, 2014 at 6:40 PM, Richard Henderson <r...@redhat.com> wrote:
>> On 07/03/2014 02:53 AM, Evgeny Stupachenko wrote:
>>> -expand_vec_perm_palignr (struct expand_vec_perm_d *d)
>>> +expand_vec_perm_palignr (struct expand_vec_perm_d *d, int insn_num)
>>
>> insn_num might as well be bool avx2, since it's only ever set to two values.
>
> Agree.  However:
>   - after the alignment, the one-operand permutation could be just a move
>     and take 2 instructions for AVX2 as well;
>   - for AVX2 there could be other cases when the scheme takes 4 or 5
>     instructions;
>   - we can leave it for a potential avx512 extension.
>
> [...]
>
>> And you'd better check it up here because...
>
> Correct.  The following should resolve the issue:
>
>   /* For AVX2 we need more than 2 instructions when the alignment
>      by itself does not produce the desired permutation.  */
>   if (TARGET_AVX2 && insn_num <= 2)
>     return false;
>
> [...]
>
>> ... down here you'll simply ICE from the gcc_assert.
>>
>> r~
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
On Thu, Aug 14, 2014 at 1:08 AM, Evgeny Stupachenko <evstu...@gmail.com> wrote:
> Ping.
>
> On Thu, Jul 10, 2014 at 7:29 PM, Evgeny Stupachenko <evstu...@gmail.com> wrote:
>> On Mon, Jul 7, 2014 at 6:40 PM, Richard Henderson <r...@redhat.com> wrote:
>>> insn_num might as well be bool avx2, since it's only ever set to two
>>> values.
>>
>> Agree.  [...]
>>
>>> And you'd better check it up here because...
>>
>> Correct.  The following should resolve the issue: [...]
>>
>>> ... down here you'll simply ICE from the gcc_assert.

Can you modify your patch to fix
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62128
with a testcase?

--
H.J.
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
On Mon, Jul 7, 2014 at 6:40 PM, Richard Henderson <r...@redhat.com> wrote:
> On 07/03/2014 02:53 AM, Evgeny Stupachenko wrote:
>> -expand_vec_perm_palignr (struct expand_vec_perm_d *d)
>> +expand_vec_perm_palignr (struct expand_vec_perm_d *d, int insn_num)
>
> insn_num might as well be bool avx2, since it's only ever set to two values.

Agree.  However:
  - after the alignment, the one-operand permutation could be just a move
    and take 2 instructions for AVX2 as well;
  - for AVX2 there could be other cases when the scheme takes 4 or 5
    instructions;
  - we can leave it for a potential avx512 extension.

>> -  /* Even with AVX, palignr only operates on 128-bit vectors.  */
>> -  if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
>> +  /* SSSE3 is required to apply PALIGNR on 16 byte operands.  */
>> +  if (GET_MODE_SIZE (d->vmode) == 16)
>> +    {
>> +      if (!TARGET_SSSE3)
>> +       return false;
>> +    }
>> +  /* AVX2 is required to apply PALIGNR on 32 byte operands.  */
>> +  else if (GET_MODE_SIZE (d->vmode) == 32)
>> +    {
>> +      if (!TARGET_AVX2)
>> +       return false;
>> +    }
>> +  /* Other sizes are not supported.  */
>> +  else
>>      return false;
>
> And you'd better check it up here because...

Correct.  The following should resolve the issue:

  /* For AVX2 we need more than 2 instructions when the alignment
     by itself does not produce the desired permutation.  */
  if (TARGET_AVX2 && insn_num <= 2)
    return false;

>> +  /* For SSSE3 we need 1 instruction for palignr plus 1 for one
>> +     operand permutation.  */
>> +  if (insn_num == 2)
>> +    {
>> +      ok = expand_vec_perm_1 (&dcopy);
>> +      gcc_assert (ok);
>> +    }
>> +  /* For AVX2 we need 2 instructions for the shift: vpalignr and
>> +     vperm plus 4 instructions for one operand permutation.  */
>> +  else if (insn_num == 6)
>> +    {
>> +      ok = expand_vec_perm_vpshufb2_vpermq (&dcopy);
>> +      gcc_assert (ok);
>> +    }
>> +  else
>> +    ok = false;
>>
>>    return ok;
>
> ... down here you'll simply ICE from the gcc_assert.
>
> r~
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
On 07/03/2014 02:53 AM, Evgeny Stupachenko wrote:
> -expand_vec_perm_palignr (struct expand_vec_perm_d *d)
> +expand_vec_perm_palignr (struct expand_vec_perm_d *d, int insn_num)

insn_num might as well be bool avx2, since it's only ever set to two values.

> -  /* Even with AVX, palignr only operates on 128-bit vectors.  */
> -  if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
> +  /* SSSE3 is required to apply PALIGNR on 16 byte operands.  */
> +  if (GET_MODE_SIZE (d->vmode) == 16)
> +    {
> +      if (!TARGET_SSSE3)
> +       return false;
> +    }
> +  /* AVX2 is required to apply PALIGNR on 32 byte operands.  */
> +  else if (GET_MODE_SIZE (d->vmode) == 32)
> +    {
> +      if (!TARGET_AVX2)
> +       return false;
> +    }
> +  /* Other sizes are not supported.  */
> +  else
>      return false;

And you'd better check it up here because...

> +  /* For SSSE3 we need 1 instruction for palignr plus 1 for one
> +     operand permutation.  */
> +  if (insn_num == 2)
> +    {
> +      ok = expand_vec_perm_1 (&dcopy);
> +      gcc_assert (ok);
> +    }
> +  /* For AVX2 we need 2 instructions for the shift: vpalignr and
> +     vperm plus 4 instructions for one operand permutation.  */
> +  else if (insn_num == 6)
> +    {
> +      ok = expand_vec_perm_vpshufb2_vpermq (&dcopy);
> +      gcc_assert (ok);
> +    }
> +  else
> +    ok = false;
>
>    return ok;

... down here you'll simply ICE from the gcc_assert.

r~
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
On Thu, Jul 3, 2014 at 11:53 AM, Evgeny Stupachenko <evstu...@gmail.com> wrote:
> The expand_vec_perm_palignr is similar for the SSSE3 and AVX2 cases, but
> AVX2 requires more instructions to complete the scheme.  The patch below
> adds AVX2 support for six instructions, leaving SSSE3 with two.
>
> Is it ok?

The ChangeLog entry is missing.

Uros.
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
2014-07-04  Evgeny Stupachenko  <evstu...@gmail.com>

        * config/i386/i386.c (expand_vec_perm_palignr): Extend to use AVX2
        PALIGNR instruction.
        * config/i386/i386.c (ix86_expand_vec_perm_const_1): Add palignr
        try for AVX2.

On Fri, Jul 4, 2014 at 3:22 PM, Uros Bizjak <ubiz...@gmail.com> wrote:
> On Thu, Jul 3, 2014 at 11:53 AM, Evgeny Stupachenko <evstu...@gmail.com> wrote:
>> The expand_vec_perm_palignr is similar for the SSSE3 and AVX2 cases, but
>> AVX2 requires more instructions to complete the scheme.  The patch below
>> adds AVX2 support for six instructions, leaving SSSE3 with two.
>>
>> Is it ok?
>
> The ChangeLog entry is missing.
>
> Uros.
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
The expand_vec_perm_palignr is similar for SSSE3 and AVX2 cases, but AVX2 requires more instructions to complete the scheme. The patch below adds AVX2 support for six instructions, leaving SSSE3 for two. Is it ok? diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 2cffcef..70fc832 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -43130,23 +43130,38 @@ expand_vec_perm_pshuflw_pshufhw (struct expand_vec_perm_d *d) return true; } +static bool +expand_vec_perm_vpshufb2_vpermq (struct expand_vec_perm_d *d); + /* A subroutine of ix86_expand_vec_perm_builtin_1. Try to simplify - the permutation using the SSSE3 palignr instruction. This succeeds + the permutation using the SSSE3/AVX2 palignr instruction. This succeeds when all of the elements in PERM fit within one vector and we merely need to shift them down so that a single vector permutation has a chance to succeed. */ static bool -expand_vec_perm_palignr (struct expand_vec_perm_d *d) +expand_vec_perm_palignr (struct expand_vec_perm_d *d, int insn_num) { unsigned i, nelt = d-nelt; unsigned min, max; bool in_order, ok; - rtx shift, target; + rtx shift, shift1, target, tmp; struct expand_vec_perm_d dcopy; - /* Even with AVX, palignr only operates on 128-bit vectors. */ - if (!TARGET_SSSE3 || GET_MODE_SIZE (d-vmode) != 16) + /* SSSE3 is required to apply PALIGNR on 16 bytes operands. */ + if (GET_MODE_SIZE (d-vmode) == 16) +{ + if (!TARGET_SSSE3) + return false; +} + /* AVX2 is required to apply PALIGNR on 32 bytes operands. */ + else if (GET_MODE_SIZE (d-vmode) == 32) +{ + if (!TARGET_AVX2) + return false; +} + /* Other sizes are not supported. */ + else return false; min = nelt, max = 0; @@ -43168,9 +43183,34 @@ expand_vec_perm_palignr (struct expand_vec_perm_d *d) dcopy = *d; shift = GEN_INT (min * GET_MODE_BITSIZE (GET_MODE_INNER (d-vmode))); - target = gen_reg_rtx (TImode); - emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, d-op1), - gen_lowpart (TImode, d-op0), shift)); + shift1 = GEN_INT ((min - nelt / 2) + * GET_MODE_BITSIZE (GET_MODE_INNER (d-vmode))); + + if (GET_MODE_SIZE (d-vmode) != 32) +{ + target = gen_reg_rtx (TImode); + emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, d-op1), + gen_lowpart (TImode, d-op0), shift)); +} + else +{ + target = gen_reg_rtx (V2TImode); + tmp = gen_reg_rtx (V4DImode); + emit_insn (gen_avx2_permv2ti (tmp, + gen_lowpart (V4DImode, d-op0), + gen_lowpart (V4DImode, d-op1), + GEN_INT (33))); + if (min nelt / 2) +emit_insn (gen_avx2_palignrv2ti (target, +gen_lowpart (V2TImode, tmp), +gen_lowpart (V2TImode, d-op0), +shift)); + else + emit_insn (gen_avx2_palignrv2ti (target, +gen_lowpart (V2TImode, d-op1), +gen_lowpart (V2TImode, tmp), +shift1)); +} dcopy.op0 = dcopy.op1 = gen_lowpart (d-vmode, target); dcopy.one_operand_p = true; @@ -43192,9 +43232,22 @@ expand_vec_perm_palignr (struct expand_vec_perm_d *d) return true; } - ok = expand_vec_perm_1 (dcopy); - gcc_assert (ok); - + /* For SSSE3 we need 1 instruction for palignr plus 1 for one + operand permutaoin. */ + if (insn_num == 2) +{ + ok = expand_vec_perm_1 (dcopy); + gcc_assert (ok); +} + /* For AVX2 we need 2 instructions for the shift: vpalignr and + vperm plus 4 instructions for one operand permutation. 
*/ + else if (insn_num == 6) + { + ok = expand_vec_perm_vpshufb2_vpermq (dcopy); + gcc_assert (ok); + } + else + ok = false; return ok; } @@ -44627,7 +44680,7 @@ ix86_expand_vec_perm_const_1 (struct expand_vec_perm_d *d) if (expand_vec_perm_pshuflw_pshufhw (d)) return true; - if (expand_vec_perm_palignr (d)) + if (expand_vec_perm_palignr (d, 2)) return true; if (expand_vec_perm_interleave2 (d)) @@ -44680,6 +44733,10 @@ ix86_expand_vec_perm_const_1 (struct expand_vec_perm_d *d) if (expand_vec_perm_even_odd (d)) return true; + /* Try sequences of six instructions. */ + if (expand_vec_perm_palignr (d, 6)) + return true; + /* Even longer sequences. */ if (expand_vec_perm_vpshufb4_vpermq2 (d)) return true; On Mon, May 19, 2014 at 7:32 PM, Richard Henderson r...@redhat.com wrote: On 05/05/2014 09:49 AM, Evgeny Stupachenko wrote: @@ -42946,6 +42948,10 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d) if (expand_vec_perm_pshufb (d)) return true; + /* Try the AVX2 vpshufb. */ + if (expand_vec_perm_vpshufb2_vpermq (d)) + return true;
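To make the min < nelt / 2 arm above concrete, here is a hedged intrinsics sketch for V32QI with min fixed at 3 (the helper name and the constant are illustrative; the real code derives the shift from the permutation):

#include <immintrin.h>

/* Shift the 64-byte concatenation op1:op0 down by 3 bytes, mirroring
   the gen_avx2_permv2ti + gen_avx2_palignrv2ti pair in the patch.  */
static __m256i
shift_concat_down_3 (__m256i op0, __m256i op1)
{
  /* tmp = { op0.hi, op1.lo }, i.e. bytes 16..47 of op1:op0; the 0x21
     selector corresponds to GEN_INT (33) in the patch.  */
  __m256i tmp = _mm256_permute2x128_si256 (op0, op1, 0x21);
  /* Per-lane alignr: low lane yields bytes 3..18, high lane bytes
     19..34, so the result is elements 3..34 in order.  */
  return _mm256_alignr_epi8 (tmp, op0, 3);
}

After this shift, dcopy's permutation only needs elements from a single vector, which is what expand_vec_perm_1 or expand_vec_perm_vpshufb2_vpermq then finishes within the stated instruction budget.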
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
Ping. On Mon, May 5, 2014 at 8:49 PM, Evgeny Stupachenko evstu...@gmail.com wrote: Is the following patch ok? It passes bootstrap and make check. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 88142a8..91f6f21 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -42807,6 +42807,8 @@ expand_vec_perm_pshufb (struct expand_vec_perm_d *d) return true; } +static bool expand_vec_perm_vpshufb2_vpermq (struct expand_vec_perm_d *d); + /* A subroutine of ix86_expand_vec_perm_builtin_1. Try to instantiate D in a single instruction. */ @@ -42946,6 +42948,10 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d) if (expand_vec_perm_pshufb (d)) return true; + /* Try the AVX2 vpshufb. */ + if (expand_vec_perm_vpshufb2_vpermq (d)) + return true; + /* Try the AVX512F vpermi2 instructions. */ rtx vec[64]; enum machine_mode mode = d->vmode; @@ -43004,7 +43010,7 @@ expand_vec_perm_pshuflw_pshufhw (struct expand_vec_perm_d *d) } /* A subroutine of ix86_expand_vec_perm_builtin_1. Try to simplify - the permutation using the SSSE3 palignr instruction. This succeeds + the permutation using the SSSE3/AVX2 palignr instruction. This succeeds when all of the elements in PERM fit within one vector and we merely need to shift them down so that a single vector permutation has a chance to succeed. */ @@ -43015,14 +43021,26 @@ expand_vec_perm_palignr (struct expand_vec_perm_d *d) unsigned i, nelt = d->nelt; unsigned min, max; bool in_order, ok; - rtx shift, target; + rtx shift, shift1, target, tmp; struct expand_vec_perm_d dcopy; - /* Even with AVX, palignr only operates on 128-bit vectors. */ - if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16) + /* SSSE3 is required to apply PALIGNR on 16-byte operands. */ + if (GET_MODE_SIZE (d->vmode) == 16) + { + if (!TARGET_SSSE3) + return false; + } + /* AVX2 is required to apply PALIGNR on 32-byte operands. */ + else if (GET_MODE_SIZE (d->vmode) == 32) + { + if (!TARGET_AVX2) + return false; + } + /* Other sizes are not supported. */ + else return false; - min = nelt, max = 0; + min = 2 * nelt, max = 0; for (i = 0; i < nelt; ++i) { unsigned e = d->perm[i]; @@ -43041,9 +43059,35 @@ expand_vec_perm_palignr (struct expand_vec_perm_d *d) dcopy = *d; shift = GEN_INT (min * GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode))); - target = gen_reg_rtx (TImode); - emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, d->op1), - gen_lowpart (TImode, d->op0), shift)); + shift1 = GEN_INT ((min - nelt / 2) * + GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode))); + + if (GET_MODE_SIZE (d->vmode) != 32) + { + target = gen_reg_rtx (TImode); + emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, d->op1), + gen_lowpart (TImode, d->op0), shift)); + } + else + { + target = gen_reg_rtx (V2TImode); + tmp = gen_reg_rtx (V4DImode); + emit_insn (gen_avx2_permv2ti (tmp, + gen_lowpart (V4DImode, d->op0), + gen_lowpart (V4DImode, d->op1), + GEN_INT (33))); + if (min < nelt / 2) + emit_insn (gen_avx2_palignrv2ti (target, + gen_lowpart (V2TImode, tmp), + gen_lowpart (V2TImode, d->op0), + shift)); + else + emit_insn (gen_avx2_palignrv2ti (target, + gen_lowpart (V2TImode, d->op1), + gen_lowpart (V2TImode, tmp), + shift1)); + } + dcopy.op0 = dcopy.op1 = gen_lowpart (d->vmode, target); dcopy.one_operand_p = true; On Tue, Apr 29, 2014 at 1:03 AM, Richard Henderson r...@redhat.com wrote: On 04/28/2014 01:43 PM, Evgeny Stupachenko wrote: Agree on checks: /* PALIGNR of 2 128-bit registers takes only 1 instruction. Requires SSSE3.
*/ if (GET_MODE_SIZE (d->vmode) == 16) { if (!TARGET_SSSE3) return false; } /* PALIGNR of 2 256-bit registers on AVX2 costs only 2 instructions: PERM and PALIGNR. It is more profitable than 2 PSHUFB and PERM. */ else if (GET_MODE_SIZE (d->vmode) == 32) { if (!TARGET_AVX2) return false; } else return false; Thanks, much better. r~
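expand_vec_perm_vpshufb2_vpermq, referenced throughout this thread, is the four-instruction one-operand finisher (two vpshufb, one vpermq, one vpor). A hedged sketch of that scheme follows; the selector constants are assumed to be precomputed from the permutation (slots with the high bit set make vpshufb write zero), and the helper name is invented for illustration:

#include <immintrin.h>

/* One-operand V32QI byte permute: SEL_SAME picks bytes that already sit
   in the right 128-bit lane, SEL_OTHER picks bytes that must cross
   lanes; vpermq with 0x4e swaps the halves of the cross-lane partial
   result, and vpor merges the two halves.  Not the GCC routine itself.  */
static __m256i
permute_v32qi (__m256i x, __m256i sel_same, __m256i sel_other)
{
  __m256i same = _mm256_shuffle_epi8 (x, sel_same);
  __m256i other = _mm256_shuffle_epi8 (x, sel_other);
  other = _mm256_permute4x64_epi64 (other, 0x4e);
  return _mm256_or_si256 (same, other);
}

These four instructions on top of the two-instruction PERM + PALIGNR shift are where the six-instruction budget in the later revision of this patch comes from.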
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
On 05/05/2014 09:49 AM, Evgeny Stupachenko wrote: @@ -42946,6 +42948,10 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d) if (expand_vec_perm_pshufb (d)) return true; + /* Try the AVX2 vpshufb. */ + if (expand_vec_perm_vpshufb2_vpermq (d)) + return true; Why is this here? It doesn't expand to 1 insn, which is what expand_vec_perm_1 is intended to check. It's already called from ix86_expand_vec_perm_const_1. If things need to be shuffled around in that function, then that's the right place to do so. It's also clearly unrelated to palignr, so has no business at all within this patch. - min = nelt, max = 0; + min = 2 * nelt, max = 0; This change to min is wrong for this patch. It probably belonged in your 2/2 patch. + shift1 = GEN_INT ((min - nelt / 2) *
+ GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode)));
Coding convention sez
shift1 = GEN_INT ((min - nelt / 2)
* GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode)));
r~
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
Is the following patch ok? It passes bootstrap and make check. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 88142a8..91f6f21 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -42807,6 +42807,8 @@ expand_vec_perm_pshufb (struct expand_vec_perm_d *d) return true; } +static bool expand_vec_perm_vpshufb2_vpermq (struct expand_vec_perm_d *d); + /* A subroutine of ix86_expand_vec_perm_builtin_1. Try to instantiate D in a single instruction. */ @@ -42946,6 +42948,10 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d) if (expand_vec_perm_pshufb (d)) return true; + /* Try the AVX2 vpshufb. */ + if (expand_vec_perm_vpshufb2_vpermq (d)) + return true; + /* Try the AVX512F vpermi2 instructions. */ rtx vec[64]; enum machine_mode mode = d->vmode; @@ -43004,7 +43010,7 @@ expand_vec_perm_pshuflw_pshufhw (struct expand_vec_perm_d *d) } /* A subroutine of ix86_expand_vec_perm_builtin_1. Try to simplify - the permutation using the SSSE3 palignr instruction. This succeeds + the permutation using the SSSE3/AVX2 palignr instruction. This succeeds when all of the elements in PERM fit within one vector and we merely need to shift them down so that a single vector permutation has a chance to succeed. */ @@ -43015,14 +43021,26 @@ expand_vec_perm_palignr (struct expand_vec_perm_d *d) unsigned i, nelt = d->nelt; unsigned min, max; bool in_order, ok; - rtx shift, target; + rtx shift, shift1, target, tmp; struct expand_vec_perm_d dcopy; - /* Even with AVX, palignr only operates on 128-bit vectors. */ - if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16) + /* SSSE3 is required to apply PALIGNR on 16-byte operands. */ + if (GET_MODE_SIZE (d->vmode) == 16) + { + if (!TARGET_SSSE3) + return false; + } + /* AVX2 is required to apply PALIGNR on 32-byte operands. */ + else if (GET_MODE_SIZE (d->vmode) == 32) + { + if (!TARGET_AVX2) + return false; + } + /* Other sizes are not supported. */ + else return false; - min = nelt, max = 0; + min = 2 * nelt, max = 0; for (i = 0; i < nelt; ++i) { unsigned e = d->perm[i]; @@ -43041,9 +43059,35 @@ expand_vec_perm_palignr (struct expand_vec_perm_d *d) dcopy = *d; shift = GEN_INT (min * GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode))); - target = gen_reg_rtx (TImode); - emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, d->op1), - gen_lowpart (TImode, d->op0), shift)); + shift1 = GEN_INT ((min - nelt / 2) * + GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode))); + + if (GET_MODE_SIZE (d->vmode) != 32) + { + target = gen_reg_rtx (TImode); + emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, d->op1), + gen_lowpart (TImode, d->op0), shift)); + } + else + { + target = gen_reg_rtx (V2TImode); + tmp = gen_reg_rtx (V4DImode); + emit_insn (gen_avx2_permv2ti (tmp, + gen_lowpart (V4DImode, d->op0), + gen_lowpart (V4DImode, d->op1), + GEN_INT (33))); + if (min < nelt / 2) + emit_insn (gen_avx2_palignrv2ti (target, + gen_lowpart (V2TImode, tmp), + gen_lowpart (V2TImode, d->op0), + shift)); + else + emit_insn (gen_avx2_palignrv2ti (target, + gen_lowpart (V2TImode, d->op1), + gen_lowpart (V2TImode, tmp), + shift1)); + } + dcopy.op0 = dcopy.op1 = gen_lowpart (d->vmode, target); dcopy.one_operand_p = true; On Tue, Apr 29, 2014 at 1:03 AM, Richard Henderson r...@redhat.com wrote: On 04/28/2014 01:43 PM, Evgeny Stupachenko wrote: Agree on checks: /* PALIGNR of 2 128-bit registers takes only 1 instruction. Requires SSSE3. */ if (GET_MODE_SIZE (d->vmode) == 16) { if (!TARGET_SSSE3) return false; } /* PALIGNR of 2 256-bit registers on AVX2 costs only 2 instructions: PERM and PALIGNR.
It is more profitable than 2 PSHUFB and PERM. */ else if (GET_MODE_SIZE (d->vmode) == 32) { if (!TARGET_AVX2) return false; } else return false; Thanks, much better. r~
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
On Mon, Apr 28, 2014 at 9:48 AM, Evgeny Stupachenko evstu...@gmail.com wrote: Hi, The patch enables use of palignr with perm instead of 2 pshufb, an or, and a perm on AVX2 for some cases. Bootstrapped and passes make check on x86. Is it ok? 2014-04-28 Evgeny Stupachenko evstu...@gmail.com * config/i386/i386.c (expand_vec_perm_1): Try AVX2 vpshufb. * config/i386/i386.c (expand_vec_perm_palignr): Extend to use the AVX2 PALIGNR instruction. Can you add testcases to verify that AVX2 vpshufb and palignr are properly generated? diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 88142a8..ae80477 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -42807,6 +42807,8 @@ expand_vec_perm_pshufb (struct expand_vec_perm_d *d) return true; } +static bool expand_vec_perm_vpshufb2_vpermq (struct expand_vec_perm_d *d); + /* A subroutine of ix86_expand_vec_perm_builtin_1. Try to instantiate D in a single instruction. */ @@ -42946,6 +42948,10 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d) if (expand_vec_perm_pshufb (d)) return true; + /* Try the AVX2 vpshufb. */ + if (expand_vec_perm_vpshufb2_vpermq (d)) + return true; + /* Try the AVX512F vpermi2 instructions. */ rtx vec[64]; enum machine_mode mode = d->vmode; @@ -43004,7 +43010,7 @@ expand_vec_perm_pshuflw_pshufhw (struct expand_vec_perm_d *d) } /* A subroutine of ix86_expand_vec_perm_builtin_1. Try to simplify - the permutation using the SSSE3 palignr instruction. This succeeds + the permutation using the SSSE3/AVX2 palignr instruction. This succeeds when all of the elements in PERM fit within one vector and we merely need to shift them down so that a single vector permutation has a chance to succeed. */ @@ -43015,14 +43021,20 @@ expand_vec_perm_palignr (struct expand_vec_perm_d *d) unsigned i, nelt = d->nelt; unsigned min, max; bool in_order, ok; - rtx shift, target; + rtx shift, shift1, target, tmp; struct expand_vec_perm_d dcopy; - /* Even with AVX, palignr only operates on 128-bit vectors. */ - if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16) + /* PALIGNR of 2 256-bit registers on AVX2 costs only 2 instructions: + PERM and PALIGNR. It is more profitable than 2 PSHUFB and PERM. + PALIGNR of 2 128-bit registers takes only 1 instruction. */ + if (!TARGET_SSSE3 || (GET_MODE_SIZE (d->vmode) != 16 ^^^ This should be only on the next line. + && GET_MODE_SIZE (d->vmode) != 32)) + return false; + /* Only AVX2 or higher supports PALIGNR on 256-bit registers. */ + if (!TARGET_AVX2 && (GET_MODE_SIZE (d->vmode) == 32)) ^ No need for '(' here. Or you can use if ((!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16) && (!TARGET_AVX2 || GET_MODE_SIZE (d->vmode) != 32)) return false; - min = nelt, max = 0; + min = 2 * nelt, max = 0; ^^ Will this change 128-bit vector code?
for (i = 0; i < nelt; ++i) { unsigned e = d->perm[i]; @@ -43041,9 +43053,34 @@ expand_vec_perm_palignr (struct expand_vec_perm_d *d) dcopy = *d; shift = GEN_INT (min * GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode))); - target = gen_reg_rtx (TImode); - emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, d->op1), - gen_lowpart (TImode, d->op0), shift)); + shift1 = GEN_INT ((min - nelt / 2) * + GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode))); + + if (GET_MODE_SIZE (d->vmode) != 32) + { + target = gen_reg_rtx (TImode); + emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, d->op1), + gen_lowpart (TImode, d->op0), shift)); + } + else + { + target = gen_reg_rtx (V2TImode); + tmp = gen_reg_rtx (V4DImode); + emit_insn (gen_avx2_permv2ti (tmp, + gen_lowpart (V4DImode, d->op0), + gen_lowpart (V4DImode, d->op1), + GEN_INT (33))); + if (min < nelt / 2) + emit_insn (gen_avx2_palignrv2ti (target, + gen_lowpart (V2TImode, tmp), + gen_lowpart (V2TImode, d->op0), + shift)); + else + emit_insn (gen_avx2_palignrv2ti (target, + gen_lowpart (V2TImode, d->op1), + gen_lowpart (V2TImode, tmp), + shift1)); + } dcopy.op0 = dcopy.op1 = gen_lowpart (d->vmode, target); dcopy.one_operand_p = true; Evgeny -- H.J.
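A compile test along the lines H.J. requests could look like the sketch below (file name, mask, and scan pattern are assumptions, not part of the submitted patch); the mask asks for elements 1..32 of the op1:op0 concatenation, which the patched expander can emit as one vperm2i128 plus one vpalignr:

/* { dg-do compile } */
/* { dg-options "-O2 -mavx2" } */

typedef char v32qi __attribute__ ((vector_size (32)));

v32qi
shift_concat_by_1 (v32qi a, v32qi b)
{
  return __builtin_shuffle (a, b,
			    (v32qi) { 1, 2, 3, 4, 5, 6, 7, 8,
				      9, 10, 11, 12, 13, 14, 15, 16,
				      17, 18, 19, 20, 21, 22, 23, 24,
				      25, 26, 27, 28, 29, 30, 31, 32 });
}

/* { dg-final { scan-assembler "vpalignr" } } */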
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
On 04/28/2014 09:48 AM, Evgeny Stupachenko wrote: - /* Even with AVX, palignr only operates on 128-bit vectors. */ - if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16) + /* PALIGNR of 2 256-bit registers on AVX2 costs only 2 instructions: + PERM and PALIGNR. It is more profitable than 2 PSHUFB and PERM. + PALIGNR of 2 128-bit registers takes only 1 instruction. */ + if (!TARGET_SSSE3 || (GET_MODE_SIZE (d->vmode) != 16 + && GET_MODE_SIZE (d->vmode) != 32)) + return false; + /* Only AVX2 or higher supports PALIGNR on 256-bit registers. */ + if (!TARGET_AVX2 && (GET_MODE_SIZE (d->vmode) == 32)) return false; This is confusingly written. How about if (GET_MODE_SIZE (d->vmode) == 16) { if (!TARGET_SSSE3) return false; } else if (GET_MODE_SIZE (d->vmode) == 32) { if (!TARGET_AVX2) return false; } else return false; With the comments added into the right places. r~
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
On Mon, Apr 28, 2014 at 9:08 PM, H.J. Lu hjl.to...@gmail.com wrote: On Mon, Apr 28, 2014 at 9:48 AM, Evgeny Stupachenko evstu...@gmail.com wrote: Hi, The patch enables use of palignr with perm instead of 2 pshufb, an or, and a perm on AVX2 for some cases. Bootstrapped and passes make check on x86. Is it ok? 2014-04-28 Evgeny Stupachenko evstu...@gmail.com * config/i386/i386.c (expand_vec_perm_1): Try AVX2 vpshufb. * config/i386/i386.c (expand_vec_perm_palignr): Extend to use the AVX2 PALIGNR instruction. Can you add testcases to verify that AVX2 vpshufb and palignr are properly generated? One of the next patches will have a test case. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 88142a8..ae80477 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -42807,6 +42807,8 @@ expand_vec_perm_pshufb (struct expand_vec_perm_d *d) return true; } +static bool expand_vec_perm_vpshufb2_vpermq (struct expand_vec_perm_d *d); + /* A subroutine of ix86_expand_vec_perm_builtin_1. Try to instantiate D in a single instruction. */ @@ -42946,6 +42948,10 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d) if (expand_vec_perm_pshufb (d)) return true; + /* Try the AVX2 vpshufb. */ + if (expand_vec_perm_vpshufb2_vpermq (d)) + return true; + /* Try the AVX512F vpermi2 instructions. */ rtx vec[64]; enum machine_mode mode = d->vmode; @@ -43004,7 +43010,7 @@ expand_vec_perm_pshuflw_pshufhw (struct expand_vec_perm_d *d) } /* A subroutine of ix86_expand_vec_perm_builtin_1. Try to simplify - the permutation using the SSSE3 palignr instruction. This succeeds + the permutation using the SSSE3/AVX2 palignr instruction. This succeeds when all of the elements in PERM fit within one vector and we merely need to shift them down so that a single vector permutation has a chance to succeed. */ @@ -43015,14 +43021,20 @@ expand_vec_perm_palignr (struct expand_vec_perm_d *d) unsigned i, nelt = d->nelt; unsigned min, max; bool in_order, ok; - rtx shift, target; + rtx shift, shift1, target, tmp; struct expand_vec_perm_d dcopy; - /* Even with AVX, palignr only operates on 128-bit vectors. */ - if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16) + /* PALIGNR of 2 256-bit registers on AVX2 costs only 2 instructions: + PERM and PALIGNR. It is more profitable than 2 PSHUFB and PERM. + PALIGNR of 2 128-bit registers takes only 1 instruction. */ + if (!TARGET_SSSE3 || (GET_MODE_SIZE (d->vmode) != 16 ^^^ This should be only on the next line. + && GET_MODE_SIZE (d->vmode) != 32)) + return false; + /* Only AVX2 or higher supports PALIGNR on 256-bit registers. */ + if (!TARGET_AVX2 && (GET_MODE_SIZE (d->vmode) == 32)) ^ No need for '(' here. Or you can use if ((!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16) && (!TARGET_AVX2 || GET_MODE_SIZE (d->vmode) != 32)) return false; - min = nelt, max = 0; + min = 2 * nelt, max = 0; ^^ Will this change 128-bit vector code? No. The only case where all elements in the permutation constant are equal to or greater than nelt will be canonicalized to the one-operand case with all elements less than nelt. So the change actually affects nothing, but it is still more accurate.
for (i = 0; i < nelt; ++i) { unsigned e = d->perm[i]; @@ -43041,9 +43053,34 @@ expand_vec_perm_palignr (struct expand_vec_perm_d *d) dcopy = *d; shift = GEN_INT (min * GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode))); - target = gen_reg_rtx (TImode); - emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, d->op1), - gen_lowpart (TImode, d->op0), shift)); + shift1 = GEN_INT ((min - nelt / 2) * + GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode))); + + if (GET_MODE_SIZE (d->vmode) != 32) + { + target = gen_reg_rtx (TImode); + emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, d->op1), + gen_lowpart (TImode, d->op0), shift)); + } + else + { + target = gen_reg_rtx (V2TImode); + tmp = gen_reg_rtx (V4DImode); + emit_insn (gen_avx2_permv2ti (tmp, + gen_lowpart (V4DImode, d->op0), + gen_lowpart (V4DImode, d->op1), + GEN_INT (33))); + if (min < nelt / 2) + emit_insn (gen_avx2_palignrv2ti (target, + gen_lowpart (V2TImode, tmp), + gen_lowpart (V2TImode, d->op0), + shift)); + else + emit_insn (gen_avx2_palignrv2ti (target, + gen_lowpart (V2TImode, d->op1), + gen_lowpart (V2TImode, tmp), + shift1));
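Evgeny's canonicalization point can be seen directly with __builtin_shuffle (a hedged standalone example, not from the patch):

typedef char v16qi __attribute__ ((vector_size (16)));

/* Every index is >= nelt (16), i.e. selects only from B, so the
   middle end canonicalizes this two-operand permutation to the
   one-operand form with all indices reduced below nelt; min can
   therefore never legitimately remain at 2 * nelt.  */
v16qi
second_operand_only (v16qi a, v16qi b)
{
  return __builtin_shuffle (a, b,
			    (v16qi) { 16, 17, 18, 19, 20, 21, 22, 23,
				      24, 25, 26, 27, 28, 29, 30, 31 });
}

Here the result is simply b, and the backend expander only ever sees indices 0..15.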
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
Agree on checks: /* PALIGNR of 2 128-bit registers takes only 1 instruction. Requires SSSE3. */ if (GET_MODE_SIZE (d->vmode) == 16) { if (!TARGET_SSSE3) return false; } /* PALIGNR of 2 256-bit registers on AVX2 costs only 2 instructions: PERM and PALIGNR. It is more profitable than 2 PSHUFB and PERM. */ else if (GET_MODE_SIZE (d->vmode) == 32) { if (!TARGET_AVX2) return false; } else return false; On Mon, Apr 28, 2014 at 9:32 PM, Richard Henderson r...@redhat.com wrote: On 04/28/2014 09:48 AM, Evgeny Stupachenko wrote: - /* Even with AVX, palignr only operates on 128-bit vectors. */ - if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16) + /* PALIGNR of 2 256-bit registers on AVX2 costs only 2 instructions: + PERM and PALIGNR. It is more profitable than 2 PSHUFB and PERM. + PALIGNR of 2 128-bit registers takes only 1 instruction. */ + if (!TARGET_SSSE3 || (GET_MODE_SIZE (d->vmode) != 16 + && GET_MODE_SIZE (d->vmode) != 32)) + return false; + /* Only AVX2 or higher supports PALIGNR on 256-bit registers. */ + if (!TARGET_AVX2 && (GET_MODE_SIZE (d->vmode) == 32)) return false; This is confusingly written. How about if (GET_MODE_SIZE (d->vmode) == 16) { if (!TARGET_SSSE3) return false; } else if (GET_MODE_SIZE (d->vmode) == 32) { if (!TARGET_AVX2) return false; } else return false; With the comments added into the right places. r~
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
On 04/28/2014 01:43 PM, Evgeny Stupachenko wrote: Agree on checks: /* PALIGNR of 2 128-bit registers takes only 1 instruction. Requires SSSE3. */ if (GET_MODE_SIZE (d->vmode) == 16) { if (!TARGET_SSSE3) return false; } /* PALIGNR of 2 256-bit registers on AVX2 costs only 2 instructions: PERM and PALIGNR. It is more profitable than 2 PSHUFB and PERM. */ else if (GET_MODE_SIZE (d->vmode) == 32) { if (!TARGET_AVX2) return false; } else return false; Thanks, much better. r~