[Bug target/104371] [x86] Failure to use optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest

2022-07-28 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104371

H.J. Lu  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED
   Target Milestone|--- |13.0

--- Comment #9 from H.J. Lu  ---
Fixed for GCC 13.

2022-05-12 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104371

--- Comment #8 from Haochen Jiang  ---
Fixed for GCC 13.

2022-05-12 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104371

--- Comment #7 from CVS Commits  ---
The master branch has been updated by Hongyu Wang :

https://gcc.gnu.org/g:3c9364f29e7e47eb9de33f2d8843d5b00284ceca

commit r13-338-g3c9364f29e7e47eb9de33f2d8843d5b00284ceca
Author: Haochen Jiang 
Date:   Tue Feb 8 10:51:26 2022 +0800

i386: Add combine splitter to transform pxor/pcmpeqb/pmovmskb/cmp 0xFFFF to
ptest.

gcc/ChangeLog:

PR target/104371
* config/i386/sse.md (vi1avx2const): New define_mode_attr.
(pxor/pcmpeqb/pmovmskb/cmp 0xFFFF to ptest splitter):
New define_split pattern.

gcc/testsuite/ChangeLog:

PR target/104371
* gcc.target/i386/pr104371-1.c: New test.
* gcc.target/i386/pr104371-2.c: Ditto.

2022-03-31 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104371

Haochen Jiang  changed:

   What|Removed |Added

 CC||haochen.jiang at intel dot com

--- Comment #6 from Haochen Jiang  ---
Created attachment 52723
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52723&action=edit
This patch aims to optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest

I fixed that through this patch. Regtested on x86_64-pc-linux-gnu.

The patch is currently on hold until Stage 1 of GCC 13.

If this is OK, could you help me mark this bug as blocking PR105073? Thanks.

2022-02-06 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104371

--- Comment #5 from Hongtao.liu  ---
(In reply to Richard Biener from comment #1)
>   <bb 2> [local count: 1073741824]:
>   _2 = VIEW_CONVERT_EXPR<__v16qi>(x_3(D));
>   _6 = _2 == { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
>   _7 = VIEW_CONVERT_EXPR<__v16qi>(_6);
>   _4 = __builtin_ia32_pmovmskb128 (_7);
>   _5 = _4 == 65535;
>   return _5;
> 
> so likely one reason is the builtin and later UNSPEC for the movemask
> operation.
> 

Under AVX512BW & AVX512VL we can fold __builtin_ia32_pmovmskb128 to 

vector(16)  temp_1 = _7 < { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0 }
temp_2 = VIEW_CONVERT_EXPR temp_1; - 
_4 = zero_extend temp_2;

but I'm not sure whether VIEW_CONVERT_EXPR can be used between a vector type
and an integer type.
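
To illustrate the fold described above: pmovmskb collects the sign bit of each byte, so it behaves like a per-lane "< 0" comparison whose 16 boolean results are packed into an integer and zero-extended. The following is only a minimal C sketch of that equivalence, assuming an x86 target with SSE2. The names movemask_model, movemask_matches_model and check_movemask_model are hypothetical helpers for illustration and are not part of GCC.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128, _mm_movemask_epi8 */
#include <stdint.h>

/* Scalar model: bit i of the result is set when byte i is negative,
   i.e. the "_7 < { 0, ... }" comparison packed into a 16-bit mask.  */
static int movemask_model(const int8_t bytes[16])
{
    unsigned mask = 0;
    for (int i = 0; i < 16; i++)
        if (bytes[i] < 0)
            mask |= 1u << i;
    return (int)mask;  /* upper bits stay zero, like the zero_extend */
}

/* Returns 1 when the scalar model agrees with the real instruction.  */
static int movemask_matches_model(const int8_t bytes[16])
{
    __m128i v = _mm_loadu_si128((const __m128i *)bytes);
    return _mm_movemask_epi8(v) == movemask_model(bytes);
}

/* Spot-check the model on a couple of inputs.  */
void check_movemask_model(void)
{
    int8_t zeros[16] = {0};
    int8_t mixed[16] = {-1, 0, 5, -128};   /* remaining bytes are 0 */
    assert(movemask_matches_model(zeros));
    assert(movemask_matches_model(mixed));
    assert(movemask_model(mixed) == 0x9);  /* bits 0 and 3 are set */
}
```

Whether GIMPLE allows the vector-to-integer VIEW_CONVERT_EXPR is the open question raised above; the sketch only shows that the arithmetic matches.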

2022-02-06 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104371

--- Comment #4 from Hongtao.liu  ---
Failed to match this instruction:
(set (reg:CCZ 17 flags)
(compare:CCZ (unspec:SI [
(eq:V16QI (subreg:V16QI (reg:V2DI 94) 0)
(const_vector:V16QI [
(const_int 0 [0]) repeated x16
]))
] UNSPEC_MOVMSK)
(const_int 65535 [0xffff])))

This can be optimized to ptest as long as only the zero flag (CCZ) of the
comparison is used.
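
As a rough source-level illustration of why the two forms agree: the movemask comparison yields 0xFFFF exactly when every byte of x is zero, and ptest with both operands equal to x sets ZF exactly when x & x is zero. The sketch below is not the RTL splitter itself. It assumes GCC or Clang (for the target attribute) and a CPU with SSE4.1; the function names are hypothetical.

```c
#include <assert.h>
#include <immintrin.h>
#include <stdbool.h>

/* The form reported in this PR: pxor + pcmpeqb + pmovmskb + cmp 0xFFFF.  */
static bool is_zero_movemask(__m128i x)
{
    return _mm_movemask_epi8(_mm_cmpeq_epi8(x, _mm_setzero_si128())) == 0xFFFF;
}

/* SSE4.1 form: _mm_testz_si128 (ptest) returns 1 when (x & x) == 0.  */
__attribute__((target("sse4.1")))
static bool is_zero_ptest(__m128i x)
{
    return _mm_testz_si128(x, x) != 0;
}

/* Spot-check that both forms agree; assumes the CPU supports SSE4.1.  */
__attribute__((target("sse4.1")))
void check_is_zero(void)
{
    __m128i zero = _mm_setzero_si128();
    __m128i one_byte = _mm_set_epi32(0, 0, 0x00010000, 0);
    assert(is_zero_movemask(zero) && is_zero_ptest(zero));
    assert(!is_zero_movemask(one_byte) && !is_zero_ptest(one_byte));
}
```

Only the equal/not-equal outcome is compared here, which mirrors the condition that nothing but the zero flag is consumed.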

2022-02-06 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104371

--- Comment #3 from Hongtao.liu  ---
Similar for

#include <immintrin.h>
#include <stdbool.h>

bool is_zero256(__m256i x)
{
    return _mm256_movemask_epi8(_mm256_cmpeq_epi8(x, _mm256_setzero_si256()))
           == 0xFFFFFFFF;
}
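
For comparison, a 256-bit ptest form can be written with _mm256_testz_si256. This is only a sketch of the equivalence being asked for, not compiler output. It assumes GCC or Clang (for the target attribute and __builtin_cpu_supports); the function names are hypothetical, and the check is skipped when the CPU lacks AVX2.

```c
#include <assert.h>
#include <immintrin.h>
#include <stdbool.h>

/* Movemask form: all 32 mask bits set means every byte of x is zero.
   The intrinsic returns int, so "all bits set" compares equal to -1.  */
__attribute__((target("avx2")))
static bool is_zero256_movemask(__m256i x)
{
    return _mm256_movemask_epi8(
               _mm256_cmpeq_epi8(x, _mm256_setzero_si256())) == -1;
}

/* vptest form: _mm256_testz_si256 returns 1 when (x & x) == 0.  */
__attribute__((target("avx2")))
static bool is_zero256_ptest(__m256i x)
{
    return _mm256_testz_si256(x, x) != 0;
}

/* Returns 1 when both forms agree on a zero and a non-zero input.  */
__attribute__((target("avx2")))
static int forms_agree_avx2(void)
{
    __m256i zero = _mm256_setzero_si256();
    __m256i nonzero = _mm256_set_epi32(1, 0, 0, 0, 0, 0, 0, 0);
    return is_zero256_movemask(zero) && is_zero256_ptest(zero)
        && !is_zero256_movemask(nonzero) && !is_zero256_ptest(nonzero);
}

/* Returns 1 on agreement, or when AVX2 is unavailable (check skipped).  */
int check_is_zero256(void)
{
    if (!__builtin_cpu_supports("avx2"))
        return 1;
    return forms_agree_avx2();
}
```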

2022-02-04 Thread gabravier at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104371

--- Comment #2 from Gabriel Ravier  ---
Although I agree the pattern doesn't seem that useful at first, I've seen it
crop up in several places, such as:

- in pixman: https://github.com/servo/pixman/blob/master/pixman/pixman-sse2.c
on line 181
- in a SIMD Mandelbrot implementation:
https://github.com/huonw/mandel-simd/blob/master/mandel_sse2.c on line 47
- in this article:
http://0x80.pl/notesen/2021-02-02-all-bytes-in-reg-are-equal.html
- in boost::uuid (although this one will detect if compiling on a platform with
SSE4.1):
https://github.com/boostorg/uuid/blob/develop/include/boost/uuid/detail/uuid_x86.ipp
- in this other article:
https://mischasan.wordpress.com/2011/11/09/the-generic-sse2-loop/
- in a research paper's accompanying github repo:
https://github.com/GameTechDev/MaskedOcclusionCulling/blob/master/MaskedOcclusionCulling.cpp
on line 333
- in ClickHouse:
https://clickhouse.com/codebrowser/html_report/ClickHouse/src/Common/memcmpSmall.h.html
on line 241

And this is just what I found in a few minutes, so I would personally think
there are many more occurrences of that pattern.

2022-02-04 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104371

--- Comment #1 from Richard Biener  ---
  <bb 2> [local count: 1073741824]:
  _2 = VIEW_CONVERT_EXPR<__v16qi>(x_3(D));
  _6 = _2 == { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
  _7 = VIEW_CONVERT_EXPR<__v16qi>(_6);
  _4 = __builtin_ia32_pmovmskb128 (_7);
  _5 = _4 == 65535;
  return _5;

so likely one reason is the builtin and later UNSPEC for the movemask
operation.

combine does try the following though

Trying 8, 11, 13 -> 14:
8: r92:V16QI=r89:V16QI==r96:V2DI#0
  REG_DEAD r96:V2DI
  REG_DEAD r89:V16QI
   11: r88:SI=unspec[r92:V16QI] 44
  REG_DEAD r92:V16QI
   13: flags:CCZ=cmp(r88:SI,0xffff)
  REG_DEAD r88:SI
   14: r95:QI=flags:CCZ==0
  REG_DEAD flags:CCZ
Failed to match this instruction:
(set (reg:QI 95)
(eq:QI (unspec:SI [
(eq:V16QI (reg:V16QI 89)
(subreg:V16QI (reg:V2DI 96) 0))
] UNSPEC_MOVMSK)
(const_int 65535 [0xffff])))

of course I have my doubts the pattern is a useful one to optimize.

2022-02-03 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104371

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement