[Bug target/109973] [13/14 Regression] Wrong code for AVX2 since 13.1 by combining VPAND and VPTEST since r13-2006-ga56c1641e9d25e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109973 --- Comment #10 from CVS Commits --- The master branch has been updated by Roger Sayle : https://gcc.gnu.org/g:5322f009e8f7d1c7a1c9aab7cb4c90c433398fdd commit r14-2030-g5322f009e8f7d1c7a1c9aab7cb4c90c433398fdd Author: Roger Sayle Date: Thu Jun 22 07:43:07 2023 +0100 i386: Convert ptestz of pandn into ptestc. This patch is the next installment in a set of backend patches around improvements to ptest/vptest. A previous patch optimized the sequence t=pand(x,y); ptestz(t,t) into the equivalent ptestz(x,y), using the property that ZF is set to (X) == 0. This patch performs a similar transformation, converting t=pandn(x,y); ptestz(t,t) into the (almost) equivalent ptestc(y,x), using the property that the CF flags is set to (~X) == 0. The tricky bit is that this sets the CF flag instead of the ZF flag, so we can only perform this transformation when we can also convert the flags consumer, as well as the producer. For the test case: int foo (__m128i x, __m128i y) { __m128i a = x & ~y; return __builtin_ia32_ptestz128 (a, a); } With -O2 -msse4.1 we previously generated: foo:pandn %xmm0, %xmm1 xorl%eax, %eax ptest %xmm1, %xmm1 sete%al ret with this patch we now generate: foo:xorl%eax, %eax ptest %xmm0, %xmm1 setc%al ret At the same time, this patch also provides alternative fixes for PR target/109973 and PR target/110118, by recognizing that ptestc(x,x) always sets the carry flag (X&~X is always zero). This is achieved both by recognizing the special case in ix86_expand_sse_ptest and with a splitter to convert an eligible ptest into an stc. 2023-06-22 Roger Sayle Uros Bizjak gcc/ChangeLog * config/i386/i386-expand.cc (ix86_expand_sse_ptest): Recognize expansion of ptestc with equal operands as producing const1_rtx. * config/i386/i386.cc (ix86_rtx_costs): Provide accurate cost estimates of UNSPEC_PTEST, where the ptest performs the PAND or PAND of its operands. * config/i386/sse.md (define_split): Transform CCCmode UNSPEC_PTEST of reg_equal_p operands into an x86_stc instruction. (define_split): Split pandn/ptestz/set{n?}e into ptestc/set{n?}c. (define_split): Similar to above for strict_low_part destinations. (define_split): Split pandn/ptestz/j{n?}e into ptestc/j{n?}c. gcc/testsuite/ChangeLog * gcc.target/i386/avx-vptest-4.c: New test case. * gcc.target/i386/avx-vptest-5.c: Likewise. * gcc.target/i386/avx-vptest-6.c: Likewise. * gcc.target/i386/pr109973-1.c: Update test case. * gcc.target/i386/pr109973-2.c: Likewise. * gcc.target/i386/sse4_1-ptest-4.c: New test case. * gcc.target/i386/sse4_1-ptest-5.c: Likewise. * gcc.target/i386/sse4_1-ptest-6.c: Likewise.
[Bug target/109973] [13/14 Regression] Wrong code for AVX2 since 13.1 by combining VPAND and VPTEST since r13-2006-ga56c1641e9d25e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109973 --- Comment #9 from CVS Commits --- The master branch has been updated by Roger Sayle : https://gcc.gnu.org/g:8ab9fb6b8e05cf9acca7bd8282979ede53524cf1 commit r14-1528-g8ab9fb6b8e05cf9acca7bd8282979ede53524cf1 Author: Roger Sayle Date: Sun Jun 4 11:59:32 2023 +0100 PR target/110083: Fix-up REG_EQUAL notes on COMPARE in STV. This patch fixes PR target/110083, an ICE-on-valid regression exposed by my recent PTEST improvements (to address PR target/109973). The latent bug (admittedly mine) is that the scalar-to-vector (STV) pass doesn't update or delete REG_EQUAL notes attached to COMPARE instructions. As a result the operands of COMPARE would be mismatched, with the register transformed to V1TImode, but the immediate operand left as const_wide_int, which is valid for TImode but not V1TImode. This remained latent when the STV conversion converted the mode of the COMPARE to CCmode, with later passes recognizing the REG_EQUAL note is obviously invalid as the modes didn't match, but now that we (correctly) preserve the CCZmode on COMPARE, the mismatched operand modes trigger a sanity checking ICE downstream. Fixed by updating (or deleting) any REG_EQUAL notes in convert_compare. Before: (expr_list:REG_EQUAL (compare:CCZ (reg:V1TI 119 [ ivin.29_38 ]) (const_wide_int 0x8000)) After: (expr_list:REG_EQUAL (compare:CCZ (reg:V1TI 119 [ ivin.29_38 ]) (const_vector:V1TI [ (const_wide_int 0x8000) ])) 2023-06-04 Roger Sayle gcc/ChangeLog PR target/110083 * config/i386/i386-features.cc (scalar_chain::convert_compare): Update or delete REG_EQUAL notes, converting CONST_INT and CONST_WIDE_INT immediate operands to a suitable CONST_VECTOR. gcc/testsuite/ChangeLog PR target/110083 * gcc.target/i386/pr110083.c: New test case.
[Bug target/109973] [13/14 Regression] Wrong code for AVX2 since 13.1 by combining VPAND and VPTEST since r13-2006-ga56c1641e9d25e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109973 --- Comment #8 from Benji Smith --- Yes, just pulled latest trunk and confirmed that the issue no longer repros. Thanks for the fix!
[Bug target/109973] [13/14 Regression] Wrong code for AVX2 since 13.1 by combining VPAND and VPTEST since r13-2006-ga56c1641e9d25e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109973 --- Comment #7 from CVS Commits --- The master branch has been updated by Roger Sayle : https://gcc.gnu.org/g:3635e8c67e13e3da7e1e23a617dd9952218e93e0 commit r14-1466-g3635e8c67e13e3da7e1e23a617dd9952218e93e0 Author: Roger Sayle Date: Thu Jun 1 15:10:09 2023 +0100 PR target/109973: CCZmode and CCCmode variants of [v]ptest on x86. This is my proposed minimal fix for PR target/109973 (hopefully suitable for backporting) that follows Jakub Jelinek's suggestion that we introduce CCZmode and CCCmode variants of ptest and vptest, so that the i386 backend treats [v]ptest instructions similarly to testl instructions; using different CCmodes to indicate which condition flags are desired, and then relying on the RTL cmpelim pass to eliminate redundant tests. This conveniently matches Intel's intrinsics, that provide different functions for retrieving different flags, _mm_testz_si128 tests the Z flag, _mm_testc_si128 tests the carry flag. Currently we use the same instruction (pattern) for both, and unfortunately the *ptest_and optimization is only valid when the ptest/vptest instruction is used to set/test the Z flag. The downside, as predicted by Jakub, is that GCC's cmpelim pass is currently COMPARE-centric and not able to merge the ptests from expressions such as _mm256_testc_si256 (a, b) + _mm256_testz_si256 (a, b), which is a known issue, PR target/80040. 2023-06-01 Roger Sayle Uros Bizjak gcc/ChangeLog PR target/109973 * config/i386/i386-builtin.def (__builtin_ia32_ptestz128): Use new CODE_for_sse4_1_ptestzv2di. (__builtin_ia32_ptestc128): Use new CODE_for_sse4_1_ptestcv2di. (__builtin_ia32_ptestz256): Use new CODE_for_avx_ptestzv4di. (__builtin_ia32_ptestc256): Use new CODE_for_avx_ptestcv4di. * config/i386/i386-expand.cc (ix86_expand_branch): Use CCZmode when expanding UNSPEC_PTEST to compare against zero. * config/i386/i386-features.cc (scalar_chain::convert_compare): Likewise generate CCZmode UNSPEC_PTESTs when converting comparisons. (general_scalar_chain::convert_insn): Use CCZmode for COMPARE result. (timode_scalar_chain::convert_insn): Use CCZmode for COMPARE result. * config/i386/i386-protos.h (ix86_match_ptest_ccmode): Prototype. * config/i386/i386.cc (ix86_match_ptest_ccmode): New predicate to check for suitable matching modes for the UNSPEC_PTEST pattern. * config/i386/sse.md (define_split): When splitting UNSPEC_MOVMSK to UNSPEC_PTEST, preserve the FLAG_REG mode as CCZ. (*_ptest): Add asterisk to hide define_insn. Remove ":CC" mode of FLAGS_REG, instead use ix86_match_ptest_ccmode. (_ptestz): New define_expand to specify CCZ. (_ptestc): New define_expand to specify CCC. (_ptest): A define_expand using CC to preserve the current behavior. (*ptest_and): Specify CCZ to only perform this optimization when only the Z flag is required. gcc/testsuite/ChangeLog PR target/109973 * gcc.target/i386/pr109973-1.c: New test case. * gcc.target/i386/pr109973-2.c: Likewise.
[Bug target/109973] [13/14 Regression] Wrong code for AVX2 since 13.1 by combining VPAND and VPTEST since r13-2006-ga56c1641e9d25e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109973 --- Comment #6 from CVS Commits --- The master branch has been updated by Roger Sayle : https://gcc.gnu.org/g:69185294f322dd53d4e1592115014c5488302e2e commit r14-1405-g69185294f322dd53d4e1592115014c5488302e2e Author: Roger Sayle Date: Tue May 30 14:40:50 2023 +0100 PR target/107172: Avoid "unusual" MODE_CC comparisons in simplify-rtx.cc I believe that a better (or supplementary) fix to PR target/107172 is to avoid producing incorrect (but valid) RTL in simplify_const_relational_operation when presented with questionable (obviously invalid) expressions, such as those produced during combine. Just as with the "first do no harm" clause with the Hippocratic Oath, simplify-rtx (probably) shouldn't unintentionally transform invalid RTL expressions, into incorrect (non-equivalent) but valid RTL that may be inappropriately recognized by recog. In this specific case, many GCC backends represent their flags register via MODE_CC, whose representation is intentionally "opaque" to the middle-end. The only use of MODE_CC comprehensible to the middle-end's RTL optimizers is relational comparisons between the result of a COMPARE rtx (op0) and zero (op1). Any other uses of MODE_CC should be left alone, and some might argue indicate representational issues in the backend. In practice, CPUs occasionally have numerous instructions that affect the flags register(s) other than comparisons [AVR's setc, powerpc's mtcrf, x86's clc, stc and cmc and x86_64's ptest that sets C and Z flags in non-obvious ways, c.f. PR target/109973]. Currently care has to be taken, wrapping these in UNSPEC, to avoid combine inappropriately merging flags setters with flags consumers (such as conditional jumps). It's safer to teach simplify_const_relational_operation not to modify expressions that it doesn't understand/recognize. 2023-05-30 Roger Sayle gcc/ChangeLog PR target/107172 * simplify-rtx.cc (simplify_const_relational_operation): Return early if we have a MODE_CC comparison that isn't a COMPARE against const0_rtx.
[Bug target/109973] [13/14 Regression] Wrong code for AVX2 since 13.1 by combining VPAND and VPTEST since r13-2006-ga56c1641e9d25e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109973 Roger Sayle changed: What|Removed |Added Status|NEW |ASSIGNED See Also||https://gcc.gnu.org/bugzill ||a/show_bug.cgi?id=80040 Assignee|unassigned at gcc dot gnu.org |roger at nextmovesoftware dot com --- Comment #5 from Roger Sayle --- Many thanks to Benji for reporting this issue. I've proposed a solution at https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619973.html (following Jakub's suggestions in comment #3).
[Bug target/109973] [13/14 Regression] Wrong code for AVX2 since 13.1 by combining VPAND and VPTEST since r13-2006-ga56c1641e9d25e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109973 --- Comment #4 from Benji Smith --- Just as follow-up, I also tested the same code with _mm_and_si128/_mm_testc_si128 on SSE4.1, and the same issue repros (via `gcc -O1 -msse4.1`): #include int do_stuff(__m128i Y0, __m128i Y1, __m128i X2) { __m128i And01 = _mm_and_si128(Y0, Y1); int TestResult = _mm_testc_si128(And01, And01); return TestResult; }
[Bug target/109973] [13/14 Regression] Wrong code for AVX2 since 13.1 by combining VPAND and VPTEST since r13-2006-ga56c1641e9d25e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109973 Jakub Jelinek changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #3 from Jakub Jelinek --- Guess the optimization is perfectly valid when it is just the ZF flag that is tested, i.e. in bar: #include int foo (__m256i x, __m256i y) { __m256i a = _mm256_and_si256 (x, y); return _mm256_testc_si256 (a, a); } int bar (__m256i x, __m256i y) { __m256i a = _mm256_and_si256 (x, y); return _mm256_testz_si256 (a, a); } _mm256_testc_si256 (a, a) is dumb (always returns non-zero because a & ~a is 0), perhaps we could fold it in gimple folding to 1. Still I'm afraid at RTL we can't rely on that folding. One option could be to use CCZmode instead of CCmode for the _mm*_testz* cases and perform this optimization solely for CCZmode and not for CCmode that would be used for _mm*_testc*. It has a disadvantage that we'd likely not be able to merge _mm256_testc_si256 (a, b) + _mm256_testz_si256 (a, b) (or vice versa).
[Bug target/109973] [13/14 Regression] Wrong code for AVX2 since 13.1 by combining VPAND and VPTEST since r13-2006-ga56c1641e9d25e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109973 Richard Biener changed: What|Removed |Added Last reconfirmed||2023-05-26 Priority|P3 |P2 Status|UNCONFIRMED |NEW Ever confirmed|0 |1 Known to fail||13.1.0 Known to work||12.3.0
[Bug target/109973] [13/14 Regression] Wrong code for AVX2 since 13.1 by combining VPAND and VPTEST
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109973 Andrew Pinski changed: What|Removed |Added Target Milestone|--- |13.2 Summary|Wrong code for AVX2 since |[13/14 Regression] Wrong |13.1 by combining VPAND and |code for AVX2 since 13.1 by |VPTEST |combining VPAND and VPTEST Keywords||wrong-code --- Comment #2 from Andrew Pinski --- The patch which introduced the failure is all in the x86_64 backend so it is target issue. r13-2006-ga56c1641e9d25e