[Bug target/96906] Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand

2021-01-12 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96906

Jakub Jelinek  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #10 from Jakub Jelinek  ---
.

[Bug target/96906] Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand

2020-12-02 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96906

--- Comment #9 from Hongtao.liu  ---
Fixed in GCC11.

[Bug target/96906] Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand

2020-12-02 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96906

--- Comment #8 from CVS Commits  ---
The master branch has been updated by hongtao Liu :

https://gcc.gnu.org/g:70310982492071f98eacdac0747521769b0f0328

commit r11-5697-g70310982492071f98eacdac0747521769b0f0328
Author: liuhongt 
Date:   Mon Nov 30 13:27:16 2020 +0800

Optimize vpsubusw compared to 0 into vpcmpleuw or vpcmpnleuw [PR96906]

For signed comparisons, it handles cases that are eq or neq to 0.
For unsigned comparisons, it additionaly handles cases that are le or
gt to 0(equivilent to eq or neq to 0). Transform case eq to leu,
case neq to gtu.

.i.e. for -mavx512bw -mavx512vl transform eq case code from

vpsubusw%xmm1, %xmm0, %xmm0
vpxor   %xmm1, %xmm1, %xmm1
vpcmpeqw  %xmm1, %xmm0, %k0
to
vpcmpleuw   %xmm1, %xmm0, %k0

.i.e. for -mavx512bw -mavx512vl transform neq case code from

vpsubusw%xmm1, %xmm0, %xmm0
vpxor   %xmm1, %xmm1, %xmm1
vpcmpneqw  %xmm1, %xmm0, %k0
to
vpcmpnleuw   %xmm1, %xmm0, %k0

gcc/ChangeLog
PR target/96906
* config/i386/sse.md
(_ucmp3): Add a new
define_split after this insn.

gcc/testsuite/ChangeLog

* gcc.target/i386/avx512bw-pr96906-1.c: New test.
* gcc.target/i386/pr96906-1.c: Add -mno-avx512f.

[Bug target/96906] Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand

2020-11-26 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96906

--- Comment #7 from Hongtao.liu  ---
(In reply to Jakub Jelinek from comment #6)
> Implemented now for non-AVX512*.  Hongtao, do you think you could have a
> look at the avx512{bw,vl}/avx512bw splitter(s)?

Yes, i'll do it. Thanks for the patch.

[Bug target/96906] Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand

2020-11-26 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96906

--- Comment #6 from Jakub Jelinek  ---
Implemented now for non-AVX512*.  Hongtao, do you think you could have a look
at the avx512{bw,vl}/avx512bw splitter(s)?

[Bug target/96906] Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand

2020-11-25 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96906

--- Comment #5 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:32b0abb24b8702ec9954448739682ace6fa5ccf5

commit r11-5398-g32b0abb24b8702ec9954448739682ace6fa5ccf5
Author: Jakub Jelinek 
Date:   Thu Nov 26 08:44:15 2020 +0100

i386: Optimize psubusw compared to 0 into pminuw compared to op0 [PR96906]

The following patch renames VI12_AVX2 iterator to VI12_AVX2_AVX512BW
for consistency with some other iterators, as I need VI12_AVX2 without
AVX512BW for this change.
The real meat is a combiner split which combine
can use to optimize psubusw compared to 0 into pminuw compared to op0
(and similarly for psubusb compared to 0 into pminub compared to op0).
According to Agner Fog's tables, psubus[bw] and pminu[bw] timings
are the same, but the advantage of pminu[bw] is that the comparison
doesn't need a zero operand, so e.g. for -msse4.1 it causes changes like
-   psubusw %xmm1, %xmm0
-   pxor%xmm1, %xmm1
+   pminuw  %xmm0, %xmm1
pcmpeqw %xmm1, %xmm0
and similarly for avx2:
-   vpsubusb%ymm1, %ymm0, %ymm0
-   vpxor   %xmm1, %xmm1, %xmm1
-   vpcmpeqb%ymm1, %ymm0, %ymm0
+   vpminub %ymm1, %ymm0, %ymm1
+   vpcmpeqb%ymm0, %ymm1, %ymm0

I haven't done the AVX512{BW,VL} define_split(s), they'll need
to match the UNSPEC_PCMP which are used for avx512 comparisons.

2020-11-26  Jakub Jelinek  

PR target/96906
* config/i386/sse.md (VI12_AVX2): Remove V64QI/V32HI modes.
(VI12_AVX2_AVX512BW): New mode iterator.
(_3,
uavg3_ceil, _uavg3): Use
VI12_AVX2_AVX512BW iterator instead of VI12_AVX2.
(*_3): Likewise.
(*_uavg3): Likewise.
(*_3): Add a new
define_split after this insn.

* gcc.target/i386/pr96906-1.c: New test.

[Bug target/96906] Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand

2020-11-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96906

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek  ---
Created attachment 49621
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49621&action=edit
gcc11-pr96906.patch

Looking over Agner Fog's table, pminus[bw] and psubus[bw] seems to have the
same timing.  This untested patch does the optimization in the combiner for
SSE2/SSE4.1/AVX2, but handling AVX512BW and AVX512BW+AVX512VL will need further
define_insn_and_split patterns I don't have cycles for right now (match the
unspec comparisons in there).

[Bug target/96906] Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand

2020-09-02 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96906

--- Comment #3 from Hongtao.liu  ---
(gdb) f 6 
#6  0x00f04a33 in expand_vect_cond_optab_fn (stmt=0x7fffea1193f0,
optab=vcond_optab) at ../../../gcc/gnu-toolchain/master/gcc/internal-fn.c:2612
2612  expand_insn (icode, 6, ops);
(gdb) p icode
$22 = CODE_FOR_vcondv8hiv8hi

Shouldn't CODE_FOR_vconduv8hiv8hi be used? then sign info could be handled by
extra param in ix86_expand_int_vcond.

[Bug target/96906] Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand

2020-09-02 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96906

--- Comment #2 from Hongtao.liu  ---
(In reply to Hongtao.liu from comment #1)
> For vector compare to integer mask, gcc use UNSPEC, that's why it's not
> handled by general part, maybe peephole2 should be added for this.

Or in ix86_expand_int_vcond

[Bug target/96906] Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand

2020-09-02 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96906

Hongtao.liu  changed:

   What|Removed |Added

 CC||crazylht at gmail dot com

--- Comment #1 from Hongtao.liu  ---
For vector compare to integer mask, gcc use UNSPEC, that's why it's not handled
by general part, maybe peephole2 should be added for this.