[Bug target/110921] Relax _tzcnt_u32 support x86, all x86 arch support for this instruction

2023-08-07 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110921

--- Comment #11 from Hongtao.liu  ---
(In reply to 罗勇刚(Yonggang Luo) from comment #10)
> (In reply to Hongtao.liu from comment #9)
> 
> > > Without the `-mbmi` option, GCC cannot compile this, while the other
> > > three compilers can.
> > 
> > As long as it keeps the semantics (respects zero input), I think this is
> > acceptable.
> 
> Yep, it's acceptable, but consistency with Clang/MSVC/ICL would be better.
> That would make cross-platform code easier. Besides, GCC also targets
> WIN32, which needs GCC to be consistent with MSVC.

Sorry for the confusion. I meant that generating code like

f(int, int): # @f(int, int)
        test    edi, edi
        je      .LBB0_2
        rep bsf eax, edi
        ret
.LBB0_2:
        mov     eax, 32
        ret

without -mbmi is acceptable as long as it respects zero input.

2023-08-07 Thread luoyonggang at gmail dot com via Gcc-bugs

--- Comment #10 from 罗勇刚(Yonggang Luo)  ---
(In reply to Hongtao.liu from comment #9)

> > Without the `-mbmi` option, GCC cannot compile this, while the other
> > three compilers can.
> 
> As long as it keeps the semantics (respects zero input), I think this is
> acceptable.

Yep, it's acceptable, but consistency with Clang/MSVC/ICL would be better.
That would make cross-platform code easier. Besides, GCC also targets
WIN32, which needs GCC to be consistent with MSVC.

2023-08-07 Thread crazylht at gmail dot com via Gcc-bugs

--- Comment #9 from Hongtao.liu  ---

> There is a redundant xor instruction,
The xor is not redundant: it breaks a false dependency that exists on some
specific processors.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62011

> Without the `-mbmi` option, GCC cannot compile this, while the other three
> compilers can.

As long as it keeps the semantics (respects zero input), I think this is acceptable.

2023-08-07 Thread luoyonggang at gmail dot com via Gcc-bugs

--- Comment #8 from 罗勇刚(Yonggang Luo)  ---
(In reply to Hongtao.liu from comment #7)
> No, I think what clang does is correct,

Thanks. Yep, according to https://github.com/llvm/llvm-project/issues/64477,
I think clang handles it well.

GCC also needs to handle the following code properly:

```c
#ifdef _MSC_VER
#include <intrin.h>
__forceinline void
unreachable() {__assume(0);}
#else
#include <x86intrin.h>
inline __attribute__((always_inline)) void
unreachable() {
#if defined(__INTEL_COMPILER)
    __assume(0);
#else
    __builtin_unreachable();
#endif
}
#endif

int f(int a)
{
    if (a == 0) {
        unreachable();
    }
    return _tzcnt_u32(a);
}
```
According to https://godbolt.org/z/T9axzaPqj, GCC with the `-O2 -mbmi -m32`
options compiles this to
```asm
f(int):
        xor     eax, eax
        tzcnt   eax, DWORD PTR [esp+4]
        ret
```

There is a redundant xor instruction.
Without the `-mbmi` option, GCC cannot compile this, while the other three
compilers can.
This issue still makes sense; GCC can fix it the way Clang does.

2023-08-06 Thread crazylht at gmail dot com via Gcc-bugs

--- Comment #7 from Hongtao.liu  ---
(In reply to 罗勇刚(Yonggang Luo) from comment #6)
> MSVC is added as well. clang seems to have an optimization issue, but MSVC
> doesn't have it.
No, I think what clang does is correct:

f(int, int): # @f(int, int)
        test    edi, edi        --- handles the case where the source operand is zero.
        je      .LBB0_2
        rep bsf eax, edi
        ret
.LBB0_2:
        mov     eax, 32
        ret


The key difference between the TZCNT and BSF instructions is that TZCNT
provides the operand size as output when the source operand is zero, while for
BSF the content of the destination operand is undefined when the source operand
is zero.

https://godbolt.org/z/s74dfdWP4
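
The zero-input behaviour described above can be sketched in plain C. This is only a reference model of the result TZCNT produces, not how any compiler implements the intrinsic; the function name `tzcnt32_reference` is made up for illustration.

```c
#include <stdint.h>

/* Hypothetical reference model of the 32-bit TZCNT result.
   Returns the number of trailing zero bits in x, and the operand
   size (32) when x is zero, which is the case where BSF leaves its
   destination undefined. */
static unsigned tzcnt32_reference(uint32_t x)
{
    unsigned count = 0;

    if (x == 0)
        return 32;              /* TZCNT: operand size for zero input */

    while ((x & 1u) == 0) {     /* shift until the lowest set bit reaches bit 0 */
        x >>= 1;
        count++;
    }
    return count;
}
```

The explicit `x == 0` check plays the same role as the `test`/`je` pair in the clang output quoted above.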

2023-08-06 Thread luoyonggang at gmail dot com via Gcc-bugs

--- Comment #6 from 罗勇刚(Yonggang Luo)  ---
MSVC is added as well. clang seems to have an optimization issue, but MSVC
doesn't have it.


https://godbolt.org/#z:OYLghAFBqd5QCxAYwPYBMCmBRdBLAF1QCcAaPECAMzwBtMA7AQwFtMQByARg9KtQYEAysib0QXACx8BBAKoBnTAAUAHpwAMvAFYTStJg1DIApACYAQuYukl9ZATwDKjdAGFUtAK4sGIMwDMpK4AMngMmAByPgBGmMQg0gAOqAqETgwe3r7%2BQSlpjgJhEdEscQnSdpgOGUIETMQEWT5%2BgbaY9oUMdQ0ExVGx8Ym29Y3NOW0Ko33hA2VDkgCUtqhexMjsHOYBeFRYVADUAPoAskJuRwBq2ABKJhoAgtvhyN5YByYBbuEExOEAdAhPth7k9Ah0lKDngxXl53p83KoABwANh%2BfwYgOBUPBDHwVChjx%2BByoEGJTFIB2JMUWUIA7FZHgdmSzmcRMAQ1gxjgQAF7IQRHLwBMwsiBMWkBRlPOkAEQ4y1onAArLw/BwtKRUJw3NZrAcFKt1pgPoEeKQCJoFcsANYSFH/DQBZUogJ0lEATkkkjpHpRkiCSo4kjVVq1nF4ChAGgtVuWcFgSDQLCSdHi5EoydT9ASADdkEkkkdc1wPUcDARMFMjqp/aQsLm8BtLnhMAB3ADySUYnHNNFoleIUYgMTDMXCDQAnr3eOPmMRJx2YtpqpbuLxk2xBB2GLRpxreFgYl5gG4xLQo%2Bv65gWIZgOID/W8Oyarmq2HMKpql5KzPyIIOjDWg8BiYgpw8LAw1%2BPAWBnZYqAMYAFBbdsux7K9%2BEEEQxHYKQZEERQVHUR9dDMfQ7xQPVLH0ECo0gZZUCSLpLwAWg7MxeFQN9iD%2BLA6IgZYqhqZwIFccY/C4YI8X6Upyj0fJ0gEcT5NSRSGBkwYEkkoSuh6MZPBaPQdNqaYNPmLSRl6ZTtNM2ZZKGLhBKNDYJEVFVQ0fbUOAOWtJAOFgFHzA4Sw9f4KyrAgDggXBCBIU0Akc3g1y0RZbQkSR/mVDQzC9JENC4FENDyuknX0TgQ1IdVNS8yNo1jA94xgRAUFQFM0zICgICzdqUAMIwjl%2BLwGBta9G2bVtO27dU%2BzoQdh1HR85ynP8loXJcVwcP9N0YAgdz3MMjxPM9aAvP8sFvIwH01fAX0cN9L01T9v1/K8fkAx9gNA8CME2TVoNg9d4MQ5CJrQ6beEw4RRHEPDIcItQw10SS%2BuMKibE%2B/iGKYjJWI7AJOO43j33o9pOgyFw8WsqT0DMuTJIUroqYZjJaYc0nVxMqyDJybSOg5gQ9JmEpNKM6YqamXpWa0py1hcxyyo4VVKrDLyfNRFj/QOV47yiwbhsWKKYqIYh4sS%2BqUrSswHTMLgNDpJEzEkKRlQK91pCDCqqs4iNbDq5KFVIBNmu6nMMy61rsyGJtkDMbLSxjBsm0wFDJvQmaB3ieaxwnBcVpzxdl1XLbWq3Xbd33a7MGPU9z0vc1zrvK7D2fDn7o/L9kB/TZzTeoNNU%2BsCFwg36kr%2BAGeCBpgkJTsG/0h7CYekOGlARkiQEDCjTEsawaJiTGtWxgRcY4rVCbwPj4EEvnhL8UTKe5iTqallSCgyJnVK6Z/ebJgWxYfozr66VssLcyosubZEfhLRoX8ZbGlcgrJWXtwzeWRCiDWflo4HFjjbf4oUNCG3wMbU2iwkpxjSlwJE/xUTKh9MqP0eUzD2yRArT2KsfZRhjP7Rq8AIBIFWAQJIP4w4h3iJEVgmxUHoIOMAZAyBdbECGjaEhwRCEkHPnoee0NcJL1kPDYimpSKkDbGBJIcEEEeWqpwDsP5BGRVQIcXy/lApyJCmFJglYphRQ8G1HM8UzDKK4WlOk/w6QUMkEiD0jtspmCRG6MiHsLHew4LVThcZA5NV4S1Hx6ZOoiISMALgIpRpJxnlNOes1M6UAWpqVa%2B5zS1PWkXK821tzlwOlXI6tczo3kbiPJ8t08Bt0fE9TuL0e4AT7rwAe31IKPn%2BmYhCU8QaoTKRhWQC9t
H4XkCvfROhhgo0otvaiGNL4H2YpwNiooWJlBYHgAm8Qib72MiJMS/9JKhDsiLemH835vNIMzIonzQHf35t0P%2BECAE/zBZLIFdNLL6QhTZGFIC6awLlm5RWiTkFqzQZrGRcjCn/FFNFVRJtthm0CaQO0BV/gojpfShlDL3blSxTVX2qSGoYpPkgtllLuJpGcJIIAA%3D

The demo code is:

#ifdef _MSC_VER
#include <intrin.h>
#else
#include <x86intrin.h>
#endif

int f(int a, int b)
{
    return _tzcnt_u32(a);
}

2023-08-06 Thread crazylht at gmail dot com via Gcc-bugs

--- Comment #5 from Hongtao.liu  ---
Maybe the source code can be changed to
int f(int a, int b)
{
#ifdef __BMI__
    return _tzcnt_u32(a);
#else
    return _bit_scan_forward(a);
#endif
}

But it looks like clang/MSVC don't support _bit_scan_forward, which should be
a bug for them since it's in the intrinsics guide.
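
For illustration, here is one way the non-BMI branch could keep TZCNT's result for a zero input. This is a sketch under assumptions, not a proposed fix: it assumes GCC or Clang (for `__builtin_ctz`, which is undefined for a zero argument), and the wrapper name `tzcnt32_compat` is hypothetical.

```c
#include <stdint.h>
#if defined(__BMI__)
#include <x86intrin.h>   /* provides _tzcnt_u32 when BMI is enabled */
#endif

/* Hypothetical wrapper; assumes GCC or Clang for __builtin_ctz. */
static inline uint32_t tzcnt32_compat(uint32_t a)
{
#if defined(__BMI__)
    return _tzcnt_u32(a);
#else
    /* __builtin_ctz is undefined for 0, so handle that case explicitly
       and return the operand size, as TZCNT would. */
    return a ? (uint32_t)__builtin_ctz(a) : 32u;
#endif
}
```

Both branches then agree on every input, including zero, so callers do not need to know whether `-mbmi` was used.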

2023-08-06 Thread crazylht at gmail dot com via Gcc-bugs

--- Comment #4 from Hongtao.liu  ---
(In reply to Hongtao.liu from comment #3)
> But there's a difference between TZCNT and BSF.
> 
> The key difference between the TZCNT and BSF instructions is that TZCNT
> provides the operand size as output when the source operand is zero, while
> for BSF the content of the destination operand is undefined in that case.
> 
> Clang looks correct since it also handles the zero case. ICC seems wrong, it just
> generates 
> https://godbolt.org/z/WvrsTrjWr

MSVC seems wrong as well.

2023-08-06 Thread crazylht at gmail dot com via Gcc-bugs

Hongtao.liu  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |crazylht at gmail dot com

--- Comment #3 from Hongtao.liu  ---
But there's a difference between TZCNT and BSF.

The key difference between the TZCNT and BSF instructions is that TZCNT
provides the operand size as output when the source operand is zero, while
for BSF the content of the destination operand is undefined in that case.

Clang looks correct since it also handles the zero case. ICC seems wrong, it just
generates 
https://godbolt.org/z/WvrsTrjWr

2023-08-06 Thread pinskia at gcc dot gnu.org via Gcc-bugs

--- Comment #2 from Andrew Pinski  ---
Testcase:
#include "x86intrin.h"

int f(int a, int b)
{
    return _tzcnt_u32(a);
}

2023-08-06 Thread pinskia at gcc dot gnu.org via Gcc-bugs

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-08-06
             Target|x86_64-*-*,x86-*-*          |x86_64-*-*, i?86-*-*
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from Andrew Pinski  ---
(In reply to 罗勇刚(Yonggang Luo) from comment #0)
> This is for alignment with Clang and MSVC.
> 
> Also, _mm_tzcnt_32 and _mm_tzcnt_64 are added for consistency with Clang
> and MSVC.

ICC behaves the same way as clang and MSVC do.

Confirmed.