[Bug target/88547] missed optimization for vector comparisons

2019-01-30 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88547

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #8 from Jakub Jelinek  ---
Fixed.

[Bug target/88547] missed optimization for vector comparisons

2018-12-21 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88547

--- Comment #7 from Jakub Jelinek  ---
Author: jakub
Date: Fri Dec 21 10:37:11 2018
New Revision: 267322

URL: https://gcc.gnu.org/viewcvs?rev=267322&root=gcc&view=rev
Log:
PR target/88547
* config/i386/i386.c (ix86_expand_int_sse_cmp): Optimize
x > y ? 0 : -1 into min (x, y) == x ? -1 : 0.

* gcc.target/i386/pr88547-1.c: Expect only 2 knotb and 2 knotw
insns instead of 4, check for vpminud, vpminuq and no vpsubd or
vpsubq.
* gcc.target/i386/sse2-pr88547-1.c: New test.
* gcc.target/i386/sse2-pr88547-2.c: New test.
* gcc.target/i386/sse4_1-pr88547-1.c: New test.
* gcc.target/i386/sse4_1-pr88547-2.c: New test.
* gcc.target/i386/avx2-pr88547-1.c: New test.
* gcc.target/i386/avx2-pr88547-2.c: New test.
* gcc.target/i386/avx512f-pr88547-2.c: New test.
* gcc.target/i386/avx512vl-pr88547-1.c: New test.
* gcc.target/i386/avx512vl-pr88547-2.c: New test.
* gcc.target/i386/avx512vl-pr88547-3.c: New test.
* gcc.target/i386/avx512f_cond_move.c (y): Change from unsigned int
array to int array.

Added:
trunk/gcc/testsuite/gcc.target/i386/avx2-pr88547-1.c
trunk/gcc/testsuite/gcc.target/i386/avx2-pr88547-2.c
trunk/gcc/testsuite/gcc.target/i386/avx512f-pr88547-2.c
trunk/gcc/testsuite/gcc.target/i386/avx512vl-pr88547-1.c
trunk/gcc/testsuite/gcc.target/i386/avx512vl-pr88547-2.c
trunk/gcc/testsuite/gcc.target/i386/avx512vl-pr88547-3.c
trunk/gcc/testsuite/gcc.target/i386/sse2-pr88547-1.c
trunk/gcc/testsuite/gcc.target/i386/sse2-pr88547-2.c
trunk/gcc/testsuite/gcc.target/i386/sse4_1-pr88547-1.c
trunk/gcc/testsuite/gcc.target/i386/sse4_1-pr88547-2.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/i386.c
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.target/i386/avx512f_cond_move.c
trunk/gcc/testsuite/gcc.target/i386/pr88547-1.c
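
The rewrite named in the log above can be sanity-checked with scalar code. The following is a minimal sketch, not code from the patch: it models a single unsigned 8-bit lane and compares the two forms over every input pair. All function names here (min_u8, le_mask_direct, le_mask_via_min, count_le_rewrite_mismatches, check_le_rewrite) are made up for illustration.

```c
#include <assert.h>

/* Hypothetical helper: unsigned minimum of one 8-bit lane.  */
unsigned char min_u8(unsigned char x, unsigned char y)
{
  return x < y ? x : y;
}

/* Lane mask in the source form: x > y ? 0 : -1.  */
int le_mask_direct(unsigned char x, unsigned char y)
{
  return x > y ? 0 : -1;
}

/* Lane mask in the rewritten form: min (x, y) == x ? -1 : 0.  */
int le_mask_via_min(unsigned char x, unsigned char y)
{
  return min_u8(x, y) == x ? -1 : 0;
}

/* Count the input pairs on which the two forms disagree.  */
int count_le_rewrite_mismatches(void)
{
  int mismatches = 0;
  for (int x = 0; x < 256; x++)
    for (int y = 0; y < 256; y++)
      if (le_mask_direct((unsigned char) x, (unsigned char) y)
          != le_mask_via_min((unsigned char) x, (unsigned char) y))
        mismatches++;
  return mismatches;
}

/* Aborts if any pair disagrees.  */
void check_le_rewrite(void)
{
  assert(count_le_rewrite_mismatches() == 0);
}
```

Calling check_le_rewrite() from a test driver should return without aborting. The appeal of the second form is that an equality compare against the minimum produces the wanted mask directly, so no separate inversion of a greater-than result is needed.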

[Bug target/88547] missed optimization for vector comparisons

2018-12-20 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88547

--- Comment #6 from Jakub Jelinek  ---
Created attachment 45274
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45274&action=edit
gcc9-pr88547.patch

Untested patch for the rest.  Richard, is that what you had in mind?

[Bug target/88547] missed optimization for vector comparisons

2018-12-19 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88547

--- Comment #5 from Jakub Jelinek  ---
Author: jakub
Date: Thu Dec 20 07:58:02 2018
New Revision: 267293

URL: https://gcc.gnu.org/viewcvs?rev=267293&root=gcc&view=rev
Log:
PR target/88547
* config/i386/i386.c (ix86_expand_sse_movcc): For maskcmp, try to
emit vpmovm2? instruction perhaps after knot?.  Reorganize code
so that it doesn't have to test !maskcmp in almost every conditional.

* gcc.target/i386/pr88547-1.c: New test.

Added:
trunk/gcc/testsuite/gcc.target/i386/pr88547-1.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/i386.c
trunk/gcc/testsuite/ChangeLog

[Bug target/88547] missed optimization for vector comparisons

2018-12-19 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88547

--- Comment #4 from Jakub Jelinek  ---
Created attachment 45264
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45264&action=edit
gcc9-pr88547-1.patch

Untested patch to improve the avx512* sse_movcc.

[Bug target/88547] missed optimization for vector comparisons

2018-12-19 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88547

--- Comment #3 from Jakub Jelinek  ---
For 64-byte vectors, we emit
        vpcmpgtb        %zmm1, %zmm0, %k1
        vpxor           %xmm1, %xmm1, %xmm1
        vpternlogd      $0xFF, %zmm0, %zmm0, %zmm0
        vmovdqu8        %zmm1, %zmm0{%k1}
for f1, perhaps it would be better to emit:
        vpcmpgtb        %zmm1, %zmm0, %k1
        knotq           %k1, %k1
        vpmovm2b        %k1, %zmm0
?
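
To make the comparison between the two sequences concrete, here is a scalar model of what each one computes for f1 (x <= y on signed bytes). This is only a sketch under the assumption that the instructions behave as their usual descriptions say (compare into a mask, merge-masked byte move, mask inversion, mask-to-vector expansion); the function names are made up and nothing here comes from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LANES 64  /* byte lanes in a 64-byte vector */

/* Models the compare into a mask register: bit i is set when
   x[i] > y[i] (signed), i.e. the inverse of f1's x <= y.  */
uint64_t cmpgt_mask(const int8_t *x, const int8_t *y)
{
  uint64_t k = 0;
  for (int i = 0; i < LANES; i++)
    if (x[i] > y[i])
      k |= (uint64_t) 1 << i;
  return k;
}

/* Models the emitted sequence: set every byte to all ones, then
   move zero into the lanes selected by the mask.  */
void le_via_masked_move(const int8_t *x, const int8_t *y, int8_t *out)
{
  uint64_t k = cmpgt_mask(x, y);
  memset(out, 0xFF, LANES);
  for (int i = 0; i < LANES; i++)
    if ((k >> i) & 1)
      out[i] = 0;
}

/* Models the suggested sequence: invert the mask, then expand each
   mask bit into an all-ones or all-zeros byte.  */
void le_via_mask_expand(const int8_t *x, const int8_t *y, int8_t *out)
{
  uint64_t k = ~cmpgt_mask(x, y);
  for (int i = 0; i < LANES; i++)
    out[i] = ((k >> i) & 1) ? -1 : 0;
}

/* Returns 1 when both models give the same 64 bytes for a fixed,
   arbitrary input pattern that covers <, == and > lanes.  */
int sequences_agree(void)
{
  int8_t x[LANES], y[LANES], a[LANES], b[LANES];
  for (int i = 0; i < LANES; i++) {
    x[i] = (int8_t) (i * 3 - 90);
    y[i] = (int8_t) (60 - i * 2);
  }
  le_via_masked_move(x, y, a);
  le_via_mask_expand(x, y, b);
  return memcmp(a, b, LANES) == 0;
}

/* Aborts if the two models disagree.  */
void check_sequences(void)
{
  assert(sequences_agree());
}
```

The model says nothing about cost; it only shows that the three-instruction form yields the same lanes as the four-instruction one.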

[Bug target/88547] missed optimization for vector comparisons

2018-12-19 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88547

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek  ---
More complete testcase:
typedef signed char v16qi __attribute__((vector_size(16)));
typedef unsigned char v16uqi __attribute__((vector_size(16)));
typedef short v8hi __attribute__((vector_size(16)));
typedef unsigned short v8uhi __attribute__((vector_size(16)));
typedef int v4si __attribute__((vector_size(16)));
typedef unsigned v4usi __attribute__((vector_size(16)));
typedef long long v2di __attribute__((vector_size(16)));
typedef unsigned long long v2udi __attribute__((vector_size(16)));

v16qi
f1 (v16qi x, v16qi y)
{
  return x <= y;
}

v16qi
f1a (v16qi x, v16qi y)
{
  return x < y;
}

v16uqi
f2 (v16uqi x, v16uqi y)
{
  return x <= y;
}

v16qi
f3 (v16qi x, v16qi y)
{
  return x >= y;
}

v16uqi
f4 (v16uqi x, v16uqi y)
{
  return x >= y;
}

v8hi
f5 (v8hi x, v8hi y)
{
  return x <= y;
}

v8uhi
f6 (v8uhi x, v8uhi y)
{
  return x <= y;
}

v8hi
f7 (v8hi x, v8hi y)
{
  return x >= y;
}

v8uhi
f8 (v8uhi x, v8uhi y)
{
  return x >= y;
}

v4si
f9 (v4si x, v4si y)
{
  return x <= y;
}

v4usi
f10 (v4usi x, v4usi y)
{
  return x <= y;
}

v4si
f11 (v4si x, v4si y)
{
  return x >= y;
}

v4usi
f12 (v4usi x, v4usi y)
{
  return x >= y;
}

v2di
f13 (v2di x, v2di y)
{
  return x <= y;
}

v2udi
f14 (v2udi x, v2udi y)
{
  return x <= y;
}

v2di
f15 (v2di x, v2di y)
{
  return x >= y;
}

v2udi
f16 (v2udi x, v2udi y)
{
  return x >= y;
}

Plus of course we need 32-byte and 64-byte vector variants, and to test with
-msse4.1 (the first one to have pmin{s,u}b), -mavx, -mavx2, -mavx512*.

I think it could be done in ix86_expand_int_sse_cmp or in ix86_expand_int_vcond,
perhaps only for the cases where one of the vcond operands is all ones and the
other one is zero.  Note that, depending on which one is which, the negation
is 2 instructions (though only if we don't hoist the constant load, e.g. before
a loop), and that for TARGET_SSE4_1 we can use the minimum or maximum.
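
As an illustration of the minimum/maximum idea, the sketch below expresses x <= y and x >= y on 16 signed bytes both directly and through a lane-wise min or max followed by an equality compare. It assumes a compiler with the GNU vector extensions (GCC or Clang); lane_min and lane_max are hypothetical stand-ins for the hardware min/max operations, and the other names are likewise invented for this example.

```c
#include <assert.h>

typedef signed char v16qi __attribute__((vector_size(16)));

/* Hypothetical stand-in for a signed byte minimum instruction.  */
v16qi lane_min(v16qi x, v16qi y)
{
  v16qi r = x;
  for (int i = 0; i < 16; i++)
    if (y[i] < x[i])
      r[i] = y[i];
  return r;
}

/* Hypothetical stand-in for a signed byte maximum instruction.  */
v16qi lane_max(v16qi x, v16qi y)
{
  v16qi r = x;
  for (int i = 0; i < 16; i++)
    if (y[i] > x[i])
      r[i] = y[i];
  return r;
}

/* Direct forms, as in the testcase.  */
v16qi le_direct(v16qi x, v16qi y) { return (v16qi) (x <= y); }
v16qi ge_direct(v16qi x, v16qi y) { return (v16qi) (x >= y); }

/* Forms built from min/max plus an equality compare.  */
v16qi le_via_min(v16qi x, v16qi y) { return (v16qi) (lane_min(x, y) == x); }
v16qi ge_via_max(v16qi x, v16qi y) { return (v16qi) (lane_max(x, y) == x); }

/* Returns 1 when every lane of a equals the same lane of b.  */
int same_lanes(v16qi a, v16qi b)
{
  for (int i = 0; i < 16; i++)
    if (a[i] != b[i])
      return 0;
  return 1;
}

/* Returns 1 when the direct and min/max forms agree on a sample
   input that includes equal lanes and the extreme values.  */
int min_max_forms_agree(void)
{
  v16qi x = { -128, -5, -1, 0,  1, 5,  127, 42, -42,  7, 7, 0, -128, 127,  3, -3 };
  v16qi y = {  127, -5,  0, 0, -1, 6, -128, 42,  42, -7, 8, 1, -128, 127, -3,  3 };
  return same_lanes(le_direct(x, y), le_via_min(x, y))
         && same_lanes(ge_direct(x, y), ge_via_max(x, y));
}

/* Aborts if the forms disagree on the sample input.  */
void check_min_max_forms(void)
{
  assert(min_max_forms_agree());
}
```

The sample input is not exhaustive; it is only meant to show the shape of the transformation for the signed byte case.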

[Bug target/88547] missed optimization for vector comparisons

2018-12-19 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88547

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Target|                            |x86_64-*-*, i?86-*-*
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2018-12-19
            Version|unknown                     |9.0
     Ever confirmed|0                           |1
           Severity|normal                      |enhancement

--- Comment #1 from Richard Biener  ---
Nice.  Patch?