[Bug target/104371] [x86] Failure to use optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104371

H.J. Lu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED
   Target Milestone|---                         |13.0

--- Comment #9 from H.J. Lu ---
Fixed for GCC 13.
[Bug target/104371] [x86] Failure to use optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104371

--- Comment #8 from Haochen Jiang ---
Fixed for GCC 13.
[Bug target/104371] [x86] Failure to use optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104371

--- Comment #7 from CVS Commits ---
The master branch has been updated by Hongyu Wang:

https://gcc.gnu.org/g:3c9364f29e7e47eb9de33f2d8843d5b00284ceca

commit r13-338-g3c9364f29e7e47eb9de33f2d8843d5b00284ceca
Author: Haochen Jiang
Date:   Tue Feb 8 10:51:26 2022 +0800

    i386: Add combine splitter to transform pxor/pcmpeqb/pmovmskb/cmp 0xFFFF to ptest.

    gcc/ChangeLog:

            PR target/104371
            * config/i386/sse.md (vi1avx2const): New define_mode_attr.
            (pxor/pcmpeqb/pmovmskb/cmp 0xFFFF to ptest splitter): New
            define_split pattern.

    gcc/testsuite/ChangeLog:

            PR target/104371
            * gcc.target/i386/pr104371-1.c: New test.
            * gcc.target/i386/pr104371-2.c: Ditto.
[Bug target/104371] [x86] Failure to use optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104371

Haochen Jiang changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |haochen.jiang at intel dot com

--- Comment #6 from Haochen Jiang ---
Created attachment 52723
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52723&action=edit
This patch aims to optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest

I fixed this with the attached patch. Regtested on x86_64-pc-linux-gnu.
It is currently on hold for Stage 1 of GCC 13.

If this is OK, could you help me mark this bug as blocking PR105073? Thanks.
[Bug target/104371] [x86] Failure to use optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104371

--- Comment #5 from Hongtao.liu ---
(In reply to Richard Biener from comment #1)
> [local count: 1073741824]:
>   _2 = VIEW_CONVERT_EXPR<__v16qi>(x_3(D));
>   _6 = _2 == { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
>   _7 = VIEW_CONVERT_EXPR(_6);
>   _4 = __builtin_ia32_pmovmskb128 (_7);
>   _5 = _4 == 65535;
>   return _5;
>
> so likely one reason is the builtin and later UNSPEC for the movemask
> operation.
>

Under AVX512BW & AVX512VL we can fold __builtin_ia32_pmovmskb128 to

  vector(16) temp_1 = _7 < { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }
  temp_2 = VIEW_CONVERT_EXPR temp_1;
- _4 = zero_extend temp_2;

but I'm not sure if VIEW_CONVERT_EXPR can be used between vector and integer
types.
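The fold proposed in comment #5 relies on pmovmskb collecting the most significant bit of each byte, which is the same as asking whether each byte, read as signed, is less than zero. A minimal scalar sketch of that equivalence follows. It is plain C rather than GIMPLE or intrinsics, and the function names `movmsk_msb` and `movmsk_lt_zero` are hypothetical, chosen only for illustration.

```c
#include <stdint.h>

/* Scalar model of pmovmskb on a 16-byte vector:
   bit i of the result is the most significant bit of byte i. */
unsigned movmsk_msb(const int8_t v[16])
{
    unsigned mask = 0;
    for (int i = 0; i < 16; i++)
        mask |= (unsigned)((uint8_t)v[i] >> 7) << i;
    return mask;
}

/* Scalar model of the folded form: a per-lane signed "< 0" compare
   yields one boolean per lane, the 16 booleans are packed into a
   16-bit integer, and that integer is zero-extended to the result. */
unsigned movmsk_lt_zero(const int8_t v[16])
{
    uint16_t packed = 0;
    for (int i = 0; i < 16; i++)
        if (v[i] < 0)
            packed |= (uint16_t)(1u << i);
    return (unsigned)packed; /* models the zero_extend step */
}
```

Both functions return the same value for any input; whether GIMPLE allows the vector-to-integer VIEW_CONVERT_EXPR that the fold needs is the open question raised in the comment, and this sketch does not address it.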
[Bug target/104371] [x86] Failure to use optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104371

--- Comment #4 from Hongtao.liu ---
Failed to match this instruction:
(set (reg:CCZ 17 flags)
    (compare:CCZ (unspec:SI [
                (eq:V16QI (subreg:V16QI (reg:V2DI 94) 0)
                    (const_vector:V16QI [
                            (const_int 0 [0]) repeated x16
                        ]))
            ] UNSPEC_MOVMSK)
        (const_int 65535 [0xffff])))

This can be optimized to ptest as long as only CCZ (the zero flag) is used by
the consumer of the comparison.
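To make the claim in comment #4 concrete, here is a minimal scalar sketch of why the two sequences agree on the zero flag. It is not the code from the patch: `vec128` is a hypothetical stand-in for an SSE register, and the helper names are invented for illustration. The movemask of a compare-with-zero equals 0xFFFF exactly when every byte is zero, which is the same condition under which `ptest x, x` sets ZF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar stand-in for a 128-bit SSE register. */
typedef struct { uint8_t b[16]; } vec128;

/* Models pcmpeqb against zero followed by pmovmskb:
   bit i of the mask is set when byte i of x is zero. */
unsigned zero_byte_mask(vec128 x)
{
    unsigned mask = 0;
    for (int i = 0; i < 16; i++)
        if (x.b[i] == 0)
            mask |= 1u << i;
    return mask;
}

/* The original pattern: all 16 mask bits must be set. */
bool is_zero_movmsk(vec128 x)
{
    return zero_byte_mask(x) == 0xFFFF;
}

/* Models the zero flag of "ptest x, x": ZF is set when
   the bitwise AND of the two operands has no bit set. */
bool is_zero_ptest(vec128 x)
{
    uint8_t acc = 0;
    for (int i = 0; i < 16; i++)
        acc |= (uint8_t)(x.b[i] & x.b[i]);
    return acc == 0;
}

/* Checks that both forms agree on the all-zero vector and on
   every vector that has exactly one non-zero byte. */
void self_check(void)
{
    vec128 v;
    memset(&v, 0, sizeof v);
    assert(is_zero_movmsk(v) && is_zero_ptest(v));
    for (int i = 0; i < 16; i++) {
        v.b[i] = 0x80;
        assert(!is_zero_movmsk(v) && !is_zero_ptest(v));
        v.b[i] = 0;
    }
}
```

The model only covers the equality-with-0xFFFF case. The mask value itself is not produced by ptest, which is why the rewrite is valid only when nothing but the zero flag is consumed.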
[Bug target/104371] [x86] Failure to use optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104371

--- Comment #3 from Hongtao.liu ---
Similarly for the 256-bit case:

#include <immintrin.h>

bool is_zero256(__m256i x)
{
    return _mm256_movemask_epi8(_mm256_cmpeq_epi8(x, _mm256_setzero_si256())) == 0xFFFFFFFF;
}
[Bug target/104371] [x86] Failure to use optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104371

--- Comment #2 from Gabriel Ravier ---
Although I agree the pattern doesn't seem that useful at first, I've seen it
crop up in several places, such as:

- in pixman:
  https://github.com/servo/pixman/blob/master/pixman/pixman-sse2.c on line 181
- in a SIMD Mandelbrot implementation:
  https://github.com/huonw/mandel-simd/blob/master/mandel_sse2.c on line 47
- in this article:
  http://0x80.pl/notesen/2021-02-02-all-bytes-in-reg-are-equal.html
- in boost::uuid (although this one will detect if compiling on a platform
  with SSE4.1):
  https://github.com/boostorg/uuid/blob/develop/include/boost/uuid/detail/uuid_x86.ipp
- in this other article:
  https://mischasan.wordpress.com/2011/11/09/the-generic-sse2-loop/
- in a research paper's accompanying GitHub repo:
  https://github.com/GameTechDev/MaskedOcclusionCulling/blob/master/MaskedOcclusionCulling.cpp
  on line 333
- in ClickHouse:
  https://clickhouse.com/codebrowser/html_report/ClickHouse/src/Common/memcmpSmall.h.html
  on line 241

And this is just what I found in a few minutes, so I would personally think
there are many more occurrences of that pattern.
[Bug target/104371] [x86] Failure to use optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104371

--- Comment #1 from Richard Biener ---
[local count: 1073741824]:
  _2 = VIEW_CONVERT_EXPR<__v16qi>(x_3(D));
  _6 = _2 == { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
  _7 = VIEW_CONVERT_EXPR(_6);
  _4 = __builtin_ia32_pmovmskb128 (_7);
  _5 = _4 == 65535;
  return _5;

so likely one reason is the builtin and later UNSPEC for the movemask
operation. combine does try the following though

Trying 8, 11, 13 -> 14:
    8: r92:V16QI=r89:V16QI==r96:V2DI#0
      REG_DEAD r96:V2DI
      REG_DEAD r89:V16QI
   11: r88:SI=unspec[r92:V16QI] 44
      REG_DEAD r92:V16QI
   13: flags:CCZ=cmp(r88:SI,0xffff)
      REG_DEAD r88:SI
   14: r95:QI=flags:CCZ==0
      REG_DEAD flags:CCZ
Failed to match this instruction:
(set (reg:QI 95)
    (eq:QI (unspec:SI [
                (eq:V16QI (reg:V16QI 89)
                    (subreg:V16QI (reg:V2DI 96) 0))
            ] UNSPEC_MOVMSK)
        (const_int 65535 [0xffff])))

of course I have my doubts the pattern is a useful one to optimize.
[Bug target/104371] [x86] Failure to use optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104371

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement