[Bug target/110921] Relax _tzcnt_u32 support x86, all x86 arch support for this instrunction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110921

--- Comment #10 from 罗勇刚(Yonggang Luo) ---
(In reply to Hongtao.liu from comment #9)
> > Without `-mbmi` option, gcc can not compile and all other three compiler
> > can compile.
>
> As long as it keeps semantics(respect zero input), I think this is
> acceptable.

Yes, it's acceptable, but consistency with Clang/MSVC/ICL would be better. It would make cross-platform code easier to write. Besides, GCC also targets Win32, which requires GCC to be consistent with MSVC.
[Bug target/110921] Relax _tzcnt_u32 support x86, all x86 arch support for this instrunction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110921

--- Comment #8 from 罗勇刚(Yonggang Luo) ---
(In reply to Hongtao.liu from comment #7)
> No, I think what clang does is correct,

Thanks. Yes, according to https://github.com/llvm/llvm-project/issues/64477, I think Clang handles this well.

GCC also needs to handle the following code properly:

```c
#ifdef _MSC_VER
#include <intrin.h>
__forceinline void unreachable() { __assume(0); }
#else
#include <x86intrin.h>
inline __attribute__((always_inline)) void unreachable()
{
#if defined(__INTEL_COMPILER)
    __assume(0);
#else
    __builtin_unreachable();
#endif
}
#endif

int f(int a)
{
    if (a == 0) {
        unreachable();
    }
    return _tzcnt_u32 (a);
}
```

According to https://godbolt.org/z/T9axzaPqj, gcc with the `-O2 -mbmi -m32` options compiles this to:

```asm
f(int):
        xor     eax, eax
        tzcnt   eax, DWORD PTR [esp+4]
        ret
```

There is a redundant xor instruction. Without the `-mbmi` option, gcc cannot compile this code at all, while the other three compilers can.

This issue still makes sense; gcc can fix it the way Clang does.
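
Until GCC relaxes the requirement, one possible workaround is to enable BMI for a single function instead of the whole translation unit. The following is only a sketch under assumptions: `tz_bmi`, `tz_generic`, and `tz32` are hypothetical names, and the approach relies on GCC's `target` function attribute and `__builtin_cpu_supports`, which Clang also accepts.

```c
#include <x86intrin.h>

/* Hypothetical helper: BMI is enabled for this function only, so the
   file itself can be compiled without -mbmi. */
static __attribute__((target("bmi"))) unsigned tz_bmi(unsigned x)
{
    return _tzcnt_u32(x);
}

/* Hypothetical fallback for CPUs without BMI. It returns 32 for a zero
   input, which is what TZCNT returns for a 32-bit operand. */
static unsigned tz_generic(unsigned x)
{
    return x ? (unsigned)__builtin_ctz(x) : 32u;
}

/* Hypothetical dispatcher: picks the implementation at run time. */
unsigned tz32(unsigned x)
{
    return __builtin_cpu_supports("bmi") ? tz_bmi(x) : tz_generic(x);
}
```

This keeps the zero-input semantics on every CPU, at the cost of a runtime check that the relaxed intrinsic requested in this bug would not need.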
[Bug target/110921] Relax _tzcnt_u32 support x86, all x86 arch support for this instrunction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110921 --- Comment #6 from 罗勇刚(Yonggang Luo) --- MSVC also added, clang seems have optimization issue, but MSVC doesn't have that https://godbolt.org/#z:OYLghAFBqd5QCxAYwPYBMCmBRdBLAF1QCcAaPECAMzwBtMA7AQwFtMQByARg9KtQYEAysib0QXACx8BBAKoBnTAAUAHpwAMvAFYTStJg1DIApACYAQuYukl9ZATwDKjdAGFUtAK4sGIMwDMpK4AMngMmAByPgBGmMQg0gAOqAqETgwe3r7%2BQSlpjgJhEdEscQnSdpgOGUIETMQEWT5%2BgbaY9oUMdQ0ExVGx8Ym29Y3NOW0Ko33hA2VDkgCUtqhexMjsHOYBeFRYVADUAPoAskJuRwBq2ABKJhoAgtvhyN5YByYBbuEExOEAdAhPth7k9Ah0lKDngxXl53p83KoABwANh%2BfwYgOBUPBDHwVChjx%2BByoEGJTFIB2JMUWUIA7FZHgdmSzmcRMAQ1gxjgQAF7IQRHLwBMwsiBMWkBRlPOkAEQ4y1onAArLw/BwtKRUJw3NZrAcFKt1pgPoEeKQCJoFcsANYSFH/DQBZUogJ0lEATkkkjpHpRkiCSo4kjVVq1nF4ChAGgtVuWcFgSDQLCSdHi5EoydT9ASADdkEkkkdc1wPUcDARMFMjqp/aQsLm8BtLnhMAB3ADySUYnHNNFoleIUYgMTDMXCDQAnr3eOPmMRJx2YtpqpbuLxk2xBB2GLRpxreFgYl5gG4xLQo%2Bv65gWIZgOID/W8Oyarmq2HMKpql5KzPyIIOjDWg8BiYgpw8LAw1%2BPAWBnZYqAMYAFBbdsux7K9%2BEEEQxHYKQZEERQVHUR9dDMfQ7xQPVLH0ECo0gZZUCSLpLwAWg7MxeFQN9iD%2BLA6IgZYqhqZwIFccY/C4YI8X6Upyj0fJ0gEcT5NSRSGBkwYEkkoSuh6MZPBaPQdNqaYNPmLSRl6ZTtNM2ZZKGLhBKNDYJEVFVQ0fbUOAOWtJAOFgFHzA4Sw9f4KyrAgDggXBCBIU0Akc3g1y0RZbQkSR/mVDQzC9JENC4FENDyuknX0TgQ1IdVNS8yNo1jA94xgRAUFQFM0zICgICzdqUAMIwjl%2BLwGBta9G2bVtO27dU%2BzoQdh1HR85ynP8loXJcVwcP9N0YAgdz3MMjxPM9aAvP8sFvIwH01fAX0cN9L01T9v1/K8fkAx9gNA8CME2TVoNg9d4MQ5CJrQ6beEw4RRHEPDIcItQw10SS%2BuMKibE%2B/iGKYjJWI7AJOO43j33o9pOgyFw8WsqT0DMuTJIUroqYZjJaYc0nVxMqyDJybSOg5gQ9JmEpNKM6YqamXpWa0py1hcxyyo4VVKrDLyfNRFj/QOV47yiwbhsWKKYqIYh4sS%2BqUrSswHTMLgNDpJEzEkKRlQK91pCDCqqs4iNbDq5KFVIBNmu6nMMy61rsyGJtkDMbLSxjBsm0wFDJvQmaB3ieaxwnBcVpzxdl1XLbWq3Xbd33a7MGPU9z0vc1zrvK7D2fDn7o/L9kB/TZzTeoNNU%2BsCFwg36kr%2BAGeCBpgkJTsG/0h7CYekOGlARkiQEDCjTEsawaJiTGtWxgRcY4rVCbwPj4EEvnhL8UTKe5iTqallSCgyJnVK6Z/ebJgWxYfozr66VssLcyosubZEfhLRoX8ZbGlcgrJWXtwzeWRCiDWflo4HFjjbf4oUNCG3wMbU2iwkpxjSlwJE/xUTKh9MqP0eUzD2yRArT2KsfZRhjP7Rq8AIBIFWAQJIP4w4h3iJEVgmxUHoIOMAZAyBdbECGjaEhwRCEkHPnoee0NcJL1kPDYimpSKkDbGBJIcEEEeWqpwDsP5BGRVQIcXy/lApyJCmFJglYphRQ8G
1HM8UzDKK4WlOk/w6QUMkEiD0jtspmCRG6MiHsLHew4LVThcZA5NV4S1Hx6ZOoiISMALgIpRpJxnlNOes1M6UAWpqVa%2B5zS1PWkXK821tzlwOlXI6tczo3kbiPJ8t08Bt0fE9TuL0e4AT7rwAe31IKPn%2BmYhCU8QaoTKRhWQC9tH4XkCvfROhhgo0otvaiGNL4H2YpwNiooWJlBYHgAm8Qib72MiJMS/9JKhDsiLemH835vNIMzIonzQHf35t0P%2BECAE/zBZLIFdNLL6QhTZGFIC6awLlm5RWiTkFqzQZrGRcjCn/FFNFVRJtthm0CaQO0BV/gojpfShlDL3blSxTVX2qSGoYpPkgtllLuJpGcJIIAA%3D

The demo code is:

#ifdef _MSC_VER
#include <intrin.h>
#else
#include <x86intrin.h>
#endif

int f(int a, int b)
{
    return _tzcnt_u32 (a);
}
[Bug target/110921] New: Relax _tzcnt_u32 support x86, all x86 arch support for this instrunction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110921

            Bug ID: 110921
           Summary: Relax _tzcnt_u32 support x86, all x86 arch support for this instrunction
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: luoyonggang at gmail dot com
  Target Milestone: ---

Created attachment 55695
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55695&action=edit
A brute-force patch that fixes this issue, just for demonstration

The compile error:

In file included from C:/CI-Tools/msys64/mingw64/lib/gcc/x86_64-w64-mingw32/13.1.0/include/x86gprintrin.h:41,
                 from C:/CI-Tools/msys64/mingw64/lib/gcc/x86_64-w64-mingw32/13.1.0/include/x86intrin.h:27,
                 from C:/CI-Tools/msys64/mingw64/include/intrin.h:69,
                 from ../../src/amd/addrlib/src/core/addrcommon.h:33,
                 from ../../src/amd/addrlib/src/core/addrobject.h:21,
                 from ../../src/amd/addrlib/src/core/addrlib.h:21,
                 from ../../src/amd/addrlib/src/core/addrlib2.h:20,
                 from ../../src/amd/addrlib/src/gfx10/gfx10addrlib.h:19,
                 from ../../src/amd/addrlib/src/gfx10/gfx10addrlib.cpp:16:
C:/CI-Tools/msys64/mingw64/lib/gcc/x86_64-w64-mingw32/13.1.0/include/bmiintrin.h: In function 'unsigned int Addr::BitScanForward(unsigned int)':
C:/CI-Tools/msys64/mingw64/lib/gcc/x86_64-w64-mingw32/13.1.0/include/bmiintrin.h:116:1: error: inlining failed in call to 'always_inline' 'unsigned int _tzcnt_u32(unsigned int)': target specific option mismatch
  116 | _tzcnt_u32 (unsigned int __X)
      | ^~
../../src/amd/addrlib/src/core/addrcommon.h:343:23: note: called from here
  343 |         out = ::_tzcnt_u32(mask);
      |               ^~
[113/] Compiling C++ object src/amd/compiler/libaco.a.p/aco_insert_NOPs.cpp.obj

This code compiles with MSVC and Clang.

Reason: Since the TZCNT instruction behaves as BSF on non-BMI targets, there is code that expects to use it as a potentially faster version of BSF. This change is for alignment with Clang and MSVC.

Also, _mm_tzcnt_32 and _mm_tzcnt_64 are added for consistency with Clang and MSVC.
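
For context on the zero-input concern: TZCNT is encoded as a REP-prefixed BSF, so a CPU without BMI executes it as plain BSF, and BSF leaves its destination undefined when the source is zero, whereas TZCNT returns the operand width. Code that must build without `-mbmi` today could use a wrapper along these lines. This is a minimal sketch, and `ctz32_portable` is a hypothetical name, not an intrinsic provided by any of the compilers discussed here.

```c
#ifdef _MSC_VER
#include <intrin.h>
#endif

/* Hypothetical wrapper: trailing-zero count with TZCNT's zero-input
   result, without requiring BMI at compile time. */
static inline unsigned ctz32_portable(unsigned x)
{
    if (x == 0)
        return 32u;                     /* TZCNT result for a zero input */
#ifdef _MSC_VER
    {
        unsigned long idx;
        _BitScanForward(&idx, x);       /* BSF: only valid for non-zero x */
        return (unsigned)idx;
    }
#else
    return (unsigned)__builtin_ctz(x);  /* undefined for 0, hence the guard */
#endif
}
```

The explicit zero check is the price of not assuming BMI; callers that already guarantee a non-zero argument do not need it.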
[Bug target/108191] Add support to usage of *intrin.h without -mavx512f -mavx512cd
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108191

--- Comment #6 from 罗勇刚(Yonggang Luo) ---
Is the following valid usage? It compiles properly.

```
// compile args: -fPIC -O2 -D__SSE3__=1 -D__SSSE3__=1 -D__SSE4_1__=1 -D__SSE4_2__=1 -D__SSE4A__=1 -D__POPCNT__=1 -D__XSAVE__=1 -D__CRC32__=1 -D__AVX__=1 -D__AVX2__=1 -D__FP_FAST_FMAF32=1 -D__FP_FAST_FMAF64=1 -D__FP_FAST_FMAF=1 -D__FP_FAST_FMAF32x=1 -D__AVX512F__=1 -D__AVX512CD__=1
#include
#pragma GCC push_options
#pragma GCC target("avx512f")
#pragma GCC target("avx512cd")
#pragma GCC target("sse4a")
#if defined(_MSC_VER)
#include <intrin.h>
#else
#include <x86intrin.h>
#endif
#pragma GCC pop_options

#pragma GCC push_options
#pragma GCC target("avx512f")
#pragma GCC target("avx512cd")
#pragma GCC target("sse4a")
void util_fadd_512(float *a, float *b, float *c)
{
    /* c = a + b */
    __m512 av = _mm512_load_ps(a);
    __m512 bv = _mm512_load_ps(b);
    __m512 cv = _mm512_add_ps(av, bv);
    _mm512_store_ps(c, cv);
}

static inline int util_iround(float f)
{
    __m128 m = _mm_set_ss(f);
    return _mm_cvtss_i32(m);
}
#pragma GCC pop_options

int util_iround_outside(int x, float y)
{
    return x + util_iround(y);
}

float util_fadd(float a, float b)
{
    return a + b;
}
```
[Bug target/108191] Add support to usage of *intrin.h without -mavx512f -mavx512cd
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108191

--- Comment #4 from 罗勇刚(Yonggang Luo) ---
(In reply to Richard Biener from comment #3)
> I suppose the issue will be that __attribute__((target)) isn't supported by
> MSVC? But indeed this isn't something we are going to support. Note
> another way is to put the functions into different translation units.

Supporting this in gcc is enough; there is no need to care about MSVC. MSVC accepts these intrinsics without any attribute, and we can use a macro to handle the difference.
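
The macro idea could look roughly like the following. This is a sketch only: `UTIL_TARGET`, `util_max4`, `util_has_sse41`, and `util_max4_dispatch` are hypothetical names chosen for illustration, and SSE4.1 stands in for whichever extension is actually needed.

```c
#if defined(_MSC_VER)
#include <intrin.h>
#define UTIL_TARGET(isa)            /* MSVC needs no per-function attribute */
#else
#include <immintrin.h>
#define UTIL_TARGET(isa) __attribute__((target(isa)))
#endif

/* Hypothetical helper: element-wise max of four ints using SSE4.1. */
UTIL_TARGET("sse4.1")
static void util_max4(const int *a, const int *b, int *out)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_max_epi32(va, vb));
}

/* Hypothetical runtime check for SSE4.1 (CPUID leaf 1, ECX bit 19). */
static int util_has_sse41(void)
{
#if defined(_MSC_VER)
    int info[4];
    __cpuid(info, 1);
    return (info[2] >> 19) & 1;
#else
    return __builtin_cpu_supports("sse4.1");
#endif
}

/* Hypothetical dispatcher with a plain C fallback. */
void util_max4_dispatch(const int *a, const int *b, int *out)
{
    int i;
    if (util_has_sse41()) {
        util_max4(a, b, out);
        return;
    }
    for (i = 0; i < 4; i++)
        out[i] = a[i] > b[i] ? a[i] : b[i];
}
```

With this shape the same source builds for an SSE2 baseline on both compilers, and only the annotated function is allowed to use the wider instruction set.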
[Bug target/108191] Add support to usage of *intrin.h without -mavx512f -mavx512cd
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108191

--- Comment #2 from 罗勇刚(Yonggang Luo) ---
(In reply to Jakub Jelinek from comment #1)
> You are lying to the compiler, don't. In GCC you can #include
> with SSE2 only and later in say __attribute__((target ("avx512cd")))
> function use avx512f/avx512cd intrinsics, no need to do the what you show
> above.

Can you be more specific and show me the code? Thanks :)
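
One reading of the suggestion in comment #1, as a sketch rather than code taken from the thread: include the intrinsics header with the default (SSE2) options, mark only the function that needs AVX-512 with the `target` attribute, and guard the call with a runtime CPU check. The names `add16_avx512` and `add16` are hypothetical, and the use of `__builtin_cpu_supports` for dispatch is an assumption, not something comment #1 prescribes.

```c
#include <immintrin.h>

/* Hypothetical helper: only this function is compiled for AVX-512F;
   no -mavx512f and no -D__AVX512F__ on the command line. */
static __attribute__((target("avx512f")))
void add16_avx512(const float *a, const float *b, float *c)
{
    __m512 av = _mm512_loadu_ps(a);
    __m512 bv = _mm512_loadu_ps(b);
    _mm512_storeu_ps(c, _mm512_add_ps(av, bv));   /* c = a + b */
}

/* Hypothetical dispatcher: baseline code plus a runtime CPU check. */
void add16(const float *a, const float *b, float *c)
{
    int i;
    if (__builtin_cpu_supports("avx512f")) {
        add16_avx512(a, b, c);
        return;
    }
    for (i = 0; i < 16; i++)
        c[i] = a[i] + b[i];
}
```

The unaligned load and store intrinsics are used so that callers do not have to provide 64-byte aligned buffers.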
[Bug c/108191] New: Add support to usage of *intrin.h without -mavx512f -mavx512cd
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108191

            Bug ID: 108191
           Summary: Add support to usage of *intrin.h without -mavx512f -mavx512cd
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: luoyonggang at gmail dot com
  Target Milestone: ---

This is to make the following command work:

```
gcc -fPIC -O2 -D__SSE3__=1 -D__SSSE3__=1 \
    -D__SSE4_1__=1 -D__SSE4_2__=1 -D__SSE4A__=1 \
    -D__POPCNT__=1 -D__XSAVE__=1 -D__CRC32__=1 \
    -D__AVX__=1 -D__AVX2__=1 \
    -D__FP_FAST_FMAF32=1 \
    -D__FP_FAST_FMAF64=1 \
    -D__FP_FAST_FMAF=1 \
    -D__FP_FAST_FMAF32x=1 \
    -D__AVX512F__=1 -D__AVX512CD__=1 test.c
```

That is, code is generated for SSE2 only, while we can still #include the *intrin.h headers and select the code paths with runtime flags. MSVC can already do this. If gcc supported it as well, we could reduce our use of inline assembly, which MSVC (x64) does not support, and so reduce code complexity.

The content of test.c is:

```
#if defined(_MSC_VER)
#include <intrin.h>
#else
#include <x86intrin.h>
#endif
#include

static inline int util_iround(float f)
{
    __m128 m = _mm_set_ss(f);
    return _mm_cvtss_i32(m);
}

int util_iround_outside(int x, float y)
{
    return x + util_iround(y);
}
```

The compile error is something like:

```
In file included from C:/CI-Tools/msys64/mingw64/lib/gcc/x86_64-w64-mingw32/12.2.0/include/immintrin.h:35,
                 from C:/CI-Tools/msys64/mingw64/lib/gcc/x86_64-w64-mingw32/12.2.0/include/x86intrin.h:32,
                 from test.c:4:
C:/CI-Tools/msys64/mingw64/lib/gcc/x86_64-w64-mingw32/12.2.0/include/pmmintrin.h: In function '_mm_addsub_ps':
C:/CI-Tools/msys64/mingw64/lib/gcc/x86_64-w64-mingw32/12.2.0/include/pmmintrin.h:53:3: error: cannot convert a value of type 'int' to vector type '__vector(4) float' which has different size
   53 |   return (__m128) __builtin_ia32_addsubps ((__v4sf)__X, (__v4sf)__Y);
      |   ^~
C:/CI-Tools/msys64/mingw64/lib/gcc/x86_64-w64-mingw32/12.2.0/include/pmmintrin.h: In function '_mm_hadd_ps':
C:/CI-Tools/msys64/mingw64/lib/gcc/x86_64-w64-mingw32/12.2.0/include/pmmintrin.h:59:3: error: cannot convert a value of type 'int' to vector type '__vector(4) float' which has different size
   59 |   return (__m128) __builtin_ia32_haddps ((__v4sf)__X, (__v4sf)__Y);
      |   ^~
C:/CI-Tools/msys64/mingw64/lib/gcc/x86_64-w64-mingw32/12.2.0/include/pmmintrin.h: In function '_mm_hsub_ps':
C:/CI-Tools/msys64/mingw64/lib/gcc/x86_64-w64-mingw32/12.2.0/include/pmmintrin.h:65:3: error: cannot convert a value of type 'int' to vector type '__vector(4) float' which has different size
   65 |   return (__m128) __builtin_ia32_hsubps ((__v4sf)__X, (__v4sf)__Y);
```