[Bug target/88473] AVX512: constant folding on mask does not remove unnecessary instructions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88473

--- Comment #9 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #8)
> (In reply to Andrew Pinski from comment #7)
> > The UNSPEC_MASKOP ones are still there.
> >
> > PR 93885 is the same issue.
>
> void test(void* data, void* data2)
> {
>   __m128i v = _mm_load_si128((__m128i const*)data);
>   __mmask8 m = _mm_testn_epi16_mask(v, v);
>   m = m | 0x0f;
>   m = m | 0xf0;
>   v = _mm_maskz_add_epi16(m, v, v);
>   _mm_store_si128((__m128i*)data2, v);
> }
>
> should be OK.
>
> Currently we rely on the RA to choose whether to use a mask register or a
> GPR for bitwise operations. That means that if we removed UNSPEC_MASKOP,
> _kor_mask8 would only generate a GPR orb. To keep the correspondence
> between intrinsic and instruction, UNSPEC_MASKOP is necessary; if the user
> wants GCC to optimize bitwise operations, it is recommended to use bitwise
> operators instead of the intrinsics.

The same applies to bug 88476.
--- Comment #8 from Hongtao.liu ---
(In reply to Andrew Pinski from comment #7)
> The UNSPEC_MASKOP ones are still there.
>
> PR 93885 is the same issue.

void test(void* data, void* data2)
{
  __m128i v = _mm_load_si128((__m128i const*)data);
  __mmask8 m = _mm_testn_epi16_mask(v, v);
  m = m | 0x0f;
  m = m | 0xf0;
  v = _mm_maskz_add_epi16(m, v, v);
  _mm_store_si128((__m128i*)data2, v);
}

should be OK.

Currently we rely on the RA to choose whether to use a mask register or a
GPR for bitwise operations. That means that if we removed UNSPEC_MASKOP,
_kor_mask8 would only generate a GPR orb. To keep the correspondence between
intrinsic and instruction, UNSPEC_MASKOP is necessary; if the user wants GCC
to optimize bitwise operations, it is recommended to use bitwise operators
instead of the intrinsics.
Andrew Pinski changed:

           What    |Removed |Added
----------------------------------
           Blocks  |        |93885

--- Comment #7 from Andrew Pinski ---
The UNSPEC_MASKOP ones are still there.

PR 93885 is the same issue.

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93885
[Bug 93885] Spurious instruction kshiftlw issued
Hongtao.liu changed:

           What    |Removed |Added
----------------------------------
           CC      |        |crazylht at gmail dot com

--- Comment #6 from Hongtao.liu ---
Fixed in GCC 11 by
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=388cb292a94f98a276548cd6ce01285cf36d17df
--- Comment #5 from Jakub Jelinek ---
The rationale for doing it the way it is currently done:
https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02612.html
Jakub Jelinek changed:

           What    |Removed |Added
----------------------------------
           CC      |        |kyukhin at gcc dot gnu.org,
                   |        |uros at gcc dot gnu.org,
                   |        |vmakarov at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek ---
The *s on the =k, km *mov{di,si}_internal patterns (which I've copied to the
*zero_extend?i?i2 patterns) were introduced in
https://gcc.gnu.org/ml/gcc-patches/2014-08/msg01113.html
but there wasn't any discussion of why that was introduced. Was that out of
fear that the register allocator would start using the mask registers for
cases like
  memory1 = memory2 | memory3;
instead of the GPRs?

I'd say it would be useful to slightly disparage transfers from GPRs to mask
registers and back, and perhaps also slightly disparage mask stores into
memory if needed, to prevent using mask registers for the logical ops or
shifts with only memory arguments, and to keep the rest of the alternative
constraints (like =k, km) without any modifiers. And if that works, change
the mask intrinsics to be normal arithmetic instead of special builtins with
UNSPECs at the RTL level. Thoughts on that?
Richard Biener changed:

           What              |Removed     |Added
------------------------------------------------
           Keywords          |            |missed-optimization
           Target            |            |x86_64-*-*, i?86-*-*
           Status            |UNCONFIRMED |NEW
           Last reconfirmed  |            |2018-12-14
           Ever confirmed    |0           |1

--- Comment #3 from Richard Biener ---
Confirmed.
--- Comment #2 from Daniel Fruzynski ---
I was playing with Compiler Explorer to see how compilers optimize various
pieces of code. I found that the next clang version (currently trunk) will
be able to analyze expressions which span vectors, masks, and GPRs. I logged
bug 88476 to request something similar in gcc; please take a look. I think
an approach like clang's would be more beneficial.

In the past I also thought about a template-based library which would wrap
vector operations. One of its unique concepts was to create a separate type
to hold a vector of bool values, and another one for integer masks. With
lazy instantiation this should lead to faster resulting code. I have not
tried to write it yet, but overall the approach looks promising to me. With
it, cases like the one in this bug can appear as a side effect of inlining.
Jakub Jelinek changed:

           What    |Removed |Added
----------------------------------
           CC      |        |jakub at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek ---
Well, if you want this constant folded, why are you using the _k* intrinsics
at all rather than just normal arithmetic on the __mmask8 etc. types? The
intrinsics are handled in GCC in a way that forces them to be exactly those
instructions. We could surely implement them just by doing normal arithmetic
inside the headers, but then it would be more likely that normal GPR
arithmetic would be used for them rather than the mask logic etc.
operations.