[Bug target/110921] Relax _tzcnt_u32 support x86, all x86 arch support for this instrunction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110921

--- Comment #11 from Hongtao.liu ---
(In reply to 罗勇刚(Yonggang Luo) from comment #10)
> (In reply to Hongtao.liu from comment #9)
> > > Without the `-mbmi` option, GCC cannot compile this, while all three
> > > other compilers can.
> >
> > As long as it keeps the semantics (respects zero input), I think this is
> > acceptable.
>
> Yep, it's acceptable, but consistency with Clang/MSVC/ICL would be better.
> That would make cross-platform code easier. Besides, GCC also works for
> WIN32, which requires GCC to be consistent with MSVC.

Sorry for the confusion. I meant that generating code like

f(int, int):                            # @f(int, int)
        test    edi, edi
        je      .LBB0_2
        rep bsf eax, edi
        ret
.LBB0_2:
        mov     eax, 32
        ret

without -mbmi is acceptable, as long as it respects zero input.
[Bug target/110921] Relax _tzcnt_u32 support x86, all x86 arch support for this instrunction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110921

--- Comment #10 from 罗勇刚(Yonggang Luo) ---
(In reply to Hongtao.liu from comment #9)
> > Without the `-mbmi` option, GCC cannot compile this, while all three
> > other compilers can.
>
> As long as it keeps the semantics (respects zero input), I think this is
> acceptable.

Yep, it's acceptable, but consistency with Clang/MSVC/ICL would be better.
That would make cross-platform code easier. Besides, GCC also works for
WIN32, which requires GCC to be consistent with MSVC.
[Bug target/110921] Relax _tzcnt_u32 support x86, all x86 arch support for this instrunction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110921

--- Comment #9 from Hongtao.liu ---
> There is a redundant xor instruction,

There's a false dependence issue on some specific processors, which the xor works around:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62011

> Without the `-mbmi` option, GCC cannot compile this, while all three
> other compilers can.

As long as it keeps the semantics (respects zero input), I think this is acceptable.
[Bug target/110921] Relax _tzcnt_u32 support x86, all x86 arch support for this instrunction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110921

--- Comment #8 from 罗勇刚(Yonggang Luo) ---
(In reply to Hongtao.liu from comment #7)
> No, I think what clang does is correct,

Thanks. Yep, according to https://github.com/llvm/llvm-project/issues/64477 I think Clang did it well.

GCC also needs to handle the following code properly:

```c
#ifdef _MSC_VER
#include <intrin.h>    /* header name lost in the archive; <intrin.h> assumed */
__forceinline void unreachable() {__assume(0);}
#else
#include <x86intrin.h> /* header name lost in the archive; <x86intrin.h> assumed */
inline __attribute__((always_inline)) void unreachable() {
#if defined(__INTEL_COMPILER)
    __assume(0);
#else
    __builtin_unreachable();
#endif
}
#endif

int f(int a) {
    if (a == 0) {
        unreachable();
    }
    return _tzcnt_u32 (a);
}
```

According to https://godbolt.org/z/T9axzaPqj, GCC with the `-O2 -mbmi -m32` options compiles this to

```asm
f(int):
        xor     eax, eax
        tzcnt   eax, DWORD PTR [esp+4]
        ret
```

There is a redundant xor instruction. Without the `-mbmi` option, GCC cannot compile this, while all three other compilers can.

This issue still makes sense; GCC can fix it in Clang's way.
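
The idea in this comment, telling the compiler that a zero input never reaches the trailing-zero count, can be sketched without any BMI intrinsic. This is only an illustration, assuming GCC or Clang (`__builtin_ctz` and `__builtin_unreachable` are their builtins); `ctz_nonzero` is a hypothetical name that does not appear in the bug report, and whether the optimizer really drops the zero branch depends on the compiler version and flags.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the bug report): count trailing zeros of a
 * value the caller promises is non-zero. */
static inline uint32_t ctz_nonzero(uint32_t x)
{
    assert(x != 0);               /* catches contract violations in debug builds */
    if (x == 0)
        __builtin_unreachable();  /* optimizer may assume x != 0 from here on */

    /* The zero guard keeps TZCNT semantics in the source; given the hint
     * above, the compiler is free to remove it. */
    return x ? (uint32_t)__builtin_ctz(x) : 32u;
}
```

Calling `ctz_nonzero(8)` yields 3; passing 0 violates the stated contract and is undefined behavior when assertions are disabled.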
[Bug target/110921] Relax _tzcnt_u32 support x86, all x86 arch support for this instrunction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110921

--- Comment #7 from Hongtao.liu ---
(In reply to 罗勇刚(Yonggang Luo) from comment #6)
> MSVC also added; clang seems to have an optimization issue, but MSVC doesn't
> have that

No, I think what Clang does is correct:

f(int, int):                            # @f(int, int)
        test    edi, edi                <--- when source operand is zero.
        je      .LBB0_2
        rep bsf eax, edi
        ret
.LBB0_2:
        mov     eax, 32
        ret

The key difference between the TZCNT and BSF instructions is that TZCNT returns the operand size when the source operand is zero, whereas for BSF the content of the destination operand is undefined when the source operand is zero.

https://godbolt.org/z/s74dfdWP4
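
The difference described above can be written down as a small reference model in plain C. This is a sketch for illustration only: `tzcnt32_model` and `bsf32_model` are hypothetical names, and the BSF model represents "destination undefined" by leaving the output untouched and returning 0, which is just one way to express it.

```c
#include <stdint.h>

/* Hypothetical model of TZCNT on a 32-bit operand. */
static unsigned tzcnt32_model(uint32_t x)
{
    unsigned n = 0;
    if (x == 0)
        return 32;              /* zero source: result is the operand size */
    while ((x & 1u) == 0) {     /* count zero bits below the lowest set bit */
        x >>= 1;
        ++n;
    }
    return n;
}

/* Hypothetical model of BSF: returns 1 and stores the bit index when x is
 * non-zero; returns 0 and leaves *index alone when x is zero, standing in
 * for the undefined destination. */
static int bsf32_model(uint32_t x, unsigned *index)
{
    if (x == 0)
        return 0;               /* caller must not rely on *index here */
    *index = tzcnt32_model(x);
    return 1;
}
```

For every non-zero input the two models agree, which is why a single zero test in front of `bsf` is enough to reproduce the TZCNT result.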
[Bug target/110921] Relax _tzcnt_u32 support x86, all x86 arch support for this instrunction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110921 --- Comment #6 from 罗勇刚(Yonggang Luo) --- MSVC also added, clang seems have optimization issue, but MSVC doesn't have that https://godbolt.org/#z:OYLghAFBqd5QCxAYwPYBMCmBRdBLAF1QCcAaPECAMzwBtMA7AQwFtMQByARg9KtQYEAysib0QXACx8BBAKoBnTAAUAHpwAMvAFYTStJg1DIApACYAQuYukl9ZATwDKjdAGFUtAK4sGIMwDMpK4AMngMmAByPgBGmMQg0gAOqAqETgwe3r7%2BQSlpjgJhEdEscQnSdpgOGUIETMQEWT5%2BgbaY9oUMdQ0ExVGx8Ym29Y3NOW0Ko33hA2VDkgCUtqhexMjsHOYBeFRYVADUAPoAskJuRwBq2ABKJhoAgtvhyN5YByYBbuEExOEAdAhPth7k9Ah0lKDngxXl53p83KoABwANh%2BfwYgOBUPBDHwVChjx%2BByoEGJTFIB2JMUWUIA7FZHgdmSzmcRMAQ1gxjgQAF7IQRHLwBMwsiBMWkBRlPOkAEQ4y1onAArLw/BwtKRUJw3NZrAcFKt1pgPoEeKQCJoFcsANYSFH/DQBZUogJ0lEATkkkjpHpRkiCSo4kjVVq1nF4ChAGgtVuWcFgSDQLCSdHi5EoydT9ASADdkEkkkdc1wPUcDARMFMjqp/aQsLm8BtLnhMAB3ADySUYnHNNFoleIUYgMTDMXCDQAnr3eOPmMRJx2YtpqpbuLxk2xBB2GLRpxreFgYl5gG4xLQo%2Bv65gWIZgOID/W8Oyarmq2HMKpql5KzPyIIOjDWg8BiYgpw8LAw1%2BPAWBnZYqAMYAFBbdsux7K9%2BEEEQxHYKQZEERQVHUR9dDMfQ7xQPVLH0ECo0gZZUCSLpLwAWg7MxeFQN9iD%2BLA6IgZYqhqZwIFccY/C4YI8X6Upyj0fJ0gEcT5NSRSGBkwYEkkoSuh6MZPBaPQdNqaYNPmLSRl6ZTtNM2ZZKGLhBKNDYJEVFVQ0fbUOAOWtJAOFgFHzA4Sw9f4KyrAgDggXBCBIU0Akc3g1y0RZbQkSR/mVDQzC9JENC4FENDyuknX0TgQ1IdVNS8yNo1jA94xgRAUFQFM0zICgICzdqUAMIwjl%2BLwGBta9G2bVtO27dU%2BzoQdh1HR85ynP8loXJcVwcP9N0YAgdz3MMjxPM9aAvP8sFvIwH01fAX0cN9L01T9v1/K8fkAx9gNA8CME2TVoNg9d4MQ5CJrQ6beEw4RRHEPDIcItQw10SS%2BuMKibE%2B/iGKYjJWI7AJOO43j33o9pOgyFw8WsqT0DMuTJIUroqYZjJaYc0nVxMqyDJybSOg5gQ9JmEpNKM6YqamXpWa0py1hcxyyo4VVKrDLyfNRFj/QOV47yiwbhsWKKYqIYh4sS%2BqUrSswHTMLgNDpJEzEkKRlQK91pCDCqqs4iNbDq5KFVIBNmu6nMMy61rsyGJtkDMbLSxjBsm0wFDJvQmaB3ieaxwnBcVpzxdl1XLbWq3Xbd33a7MGPU9z0vc1zrvK7D2fDn7o/L9kB/TZzTeoNNU%2BsCFwg36kr%2BAGeCBpgkJTsG/0h7CYekOGlARkiQEDCjTEsawaJiTGtWxgRcY4rVCbwPj4EEvnhL8UTKe5iTqallSCgyJnVK6Z/ebJgWxYfozr66VssLcyosubZEfhLRoX8ZbGlcgrJWXtwzeWRCiDWflo4HFjjbf4oUNCG3wMbU2iwkpxjSlwJE/xUTKh9MqP0eUzD2yRArT2KsfZRhjP7Rq8AIBIFWAQJIP4w4h3iJEVgmxUHoIOMAZAyBdbECGjaEhwRCEkHPnoee0NcJL1kPDYimpSKkDbGBJIcEEEeWqpwDsP5BGRVQIcXy/lApyJCmFJglYphRQ8G
1HM8UzDKK4WlOk/w6QUMkEiD0jtspmCRG6MiHsLHew4LVThcZA5NV4S1Hx6ZOoiISMALgIpRpJxnlNOes1M6UAWpqVa%2B5zS1PWkXK821tzlwOlXI6tczo3kbiPJ8t08Bt0fE9TuL0e4AT7rwAe31IKPn%2BmYhCU8QaoTKRhWQC9tH4XkCvfROhhgo0otvaiGNL4H2YpwNiooWJlBYHgAm8Qib72MiJMS/9JKhDsiLemH835vNIMzIonzQHf35t0P%2BECAE/zBZLIFdNLL6QhTZGFIC6awLlm5RWiTkFqzQZrGRcjCn/FFNFVRJtthm0CaQO0BV/gojpfShlDL3blSxTVX2qSGoYpPkgtllLuJpGcJIIAA%3D

The demo code is:

    #ifdef _MSC_VER
    #include <intrin.h>     /* header name lost in the archive; <intrin.h> assumed */
    #else
    #include <x86intrin.h>  /* header name lost in the archive; <x86intrin.h> assumed */
    #endif

    int f(int a, int b)
    {
        return _tzcnt_u32 (a);
    }
[Bug target/110921] Relax _tzcnt_u32 support x86, all x86 arch support for this instrunction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110921

--- Comment #5 from Hongtao.liu ---
Maybe the source code can be changed to

    int f(int a, int b)
    {
    #ifdef __BMI__
        return _tzcnt_u32 (a);
    #else
        return _bit_scan_forward (a);
    #endif
    }

But it looks like Clang/MSVC don't support _bit_scan_forward, which should be a bug on their side since it's in the intrinsics guide.
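
A variant of this `__BMI__` dispatch that also keeps the zero-input result can be sketched as below. It is not the fix adopted for this bug, only an illustration: `portable_tzcnt32` is a hypothetical name, and the fallback assumes a GCC-compatible compiler because it uses `__builtin_ctz`, whose result is undefined for 0 and therefore needs the explicit guard.

```c
#include <stdint.h>
#ifdef __BMI__
#include <x86intrin.h>   /* provides _tzcnt_u32 when BMI is enabled */
#endif

/* Hypothetical wrapper (not from the bug report): same result with or
 * without -mbmi, including 32 for a zero input. */
static inline uint32_t portable_tzcnt32(uint32_t x)
{
#ifdef __BMI__
    return _tzcnt_u32(x);                         /* hardware TZCNT */
#else
    return x ? (uint32_t)__builtin_ctz(x) : 32u;  /* guard the zero case */
#endif
}
```

Unlike a bare `_bit_scan_forward`, the fallback branch never hands a zero operand to an operation whose result is undefined for it.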
[Bug target/110921] Relax _tzcnt_u32 support x86, all x86 arch support for this instrunction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110921

--- Comment #4 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #3)
> But there's a difference between TZCNT and BSF.
>
> The key difference between the TZCNT and BSF instructions is that TZCNT
> returns the operand size when the source operand is zero, whereas for BSF
> the content of the destination operand is undefined in that case.
>
> Clang looks correct since it also handles the zero case; ICC seems wrong, it
> just generates
> https://godbolt.org/z/WvrsTrjWr

MSVC seems wrong, too.
[Bug target/110921] Relax _tzcnt_u32 support x86, all x86 arch support for this instrunction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110921

Hongtao.liu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |crazylht at gmail dot com

--- Comment #3 from Hongtao.liu ---
But there's a difference between TZCNT and BSF.

The key difference between the TZCNT and BSF instructions is that TZCNT returns the operand size when the source operand is zero, whereas for BSF the content of the destination operand is undefined in that case.

Clang looks correct since it also handles the zero case; ICC seems wrong, it just generates
https://godbolt.org/z/WvrsTrjWr
[Bug target/110921] Relax _tzcnt_u32 support x86, all x86 arch support for this instrunction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110921

--- Comment #2 from Andrew Pinski ---
Testcase:

    #include "x86intrin.h"

    int f(int a, int b)
    {
        return _tzcnt_u32 (a);
    }
[Bug target/110921] Relax _tzcnt_u32 support x86, all x86 arch support for this instrunction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110921

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-08-06
             Target|x86_64-*-*,x86-*-*          |x86_64-*-*, i?86-*-*
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from Andrew Pinski ---
(In reply to 罗勇刚(Yonggang Luo) from comment #0)
> This is for alignment with Clang and MSVC.
>
> Also, _mm_tzcnt_32 and _mm_tzcnt_64 are added for consistency with Clang
> and MSVC.

ICC acts in a similar way to Clang and MSVC, too.

Confirmed.