[Bug middle-end/114319] htobe64-like function is not optimized on 32-bit x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114319

--- Comment #8 from Pali Rohár ---
Thanks for the quick response and the fix for this issue.
[Bug middle-end/114319] htobe64-like function is not optimized on 32-bit x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114319

Jakub Jelinek changed:

           What    |Removed                     |Added
         Resolution|---                         |FIXED
                 CC|                            |jakub at gcc dot gnu.org
             Status|NEW                         |RESOLVED

--- Comment #7 from Jakub Jelinek ---
Fixed for GCC 14.
[Bug middle-end/114319] htobe64-like function is not optimized on 32-bit x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114319

--- Comment #6 from GCC Commits ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:74bca21db31e3f4ab6543b56c3f26b4dfe586fef

commit r14-9453-g74bca21db31e3f4ab6543b56c3f26b4dfe586fef
Author: Jakub Jelinek
Date:   Wed Mar 13 15:34:59 2024 +0100

    store-merging: Match bswap64 on 32-bit targets with bswapsi2 [PR114319]

    gimple-ssa-store-merging.cc tests bswap_optab in 3 different places,
    in 2 of them it has special exception for double-word bswap using pair
    of word-mode bswap optabs, but in the last one it doesn't.

    The following patch changes even the last spot.
    We don't handle 128-bit bswaps in the passes at all, because currently
    we just use uint64_t to represent the byte reshuffling (we'd need to use
    offset_int or something like that instead) and we don't have
    __builtin_bswap128 nor type-generic __builtin_bswap, so there is nothing
    for 64-bit targets there.

    2024-03-13  Jakub Jelinek

            PR middle-end/114319
            * gimple-ssa-store-merging.cc
            (imm_store_chain_info::try_coalesce_bswap): For 32-bit targets
            allow matching __builtin_bswap64 if there is bswapsi2 optab.

            * gcc.target/i386/pr114319.c: New test.
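To illustrate the "double-word bswap using pair of word-mode bswap optabs" idea from the commit message, here is a minimal sketch of how a 64-bit byte swap decomposes into two 32-bit swaps with the halves exchanged. The helper name `bswap64_via_two_bswap32` is made up for this example and is not part of the GCC sources; only the `__builtin_bswap32` builtin is assumed.

```c
#include <stdint.h>

/* Illustrative helper (not GCC source): byte-swap a 64-bit value using
   only 32-bit swaps, as a 32-bit target with a bswapsi2 pattern can. */
uint64_t bswap64_via_two_bswap32(uint64_t x)
{
    uint32_t lo = (uint32_t)x;          /* low word of the input  */
    uint32_t hi = (uint32_t)(x >> 32);  /* high word of the input */

    /* The swapped low word becomes the new high word and vice versa. */
    return ((uint64_t)__builtin_bswap32(lo) << 32) | __builtin_bswap32(hi);
}
```

The result should agree with `__builtin_bswap64` for every input, which is why matching `__builtin_bswap64` is reasonable on a target that only has the 32-bit swap.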
[Bug middle-end/114319] htobe64-like function is not optimized on 32-bit x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114319

--- Comment #5 from Richard Biener ---
Coalescing successful!
Merged into 1 stores
32 bit bswap implementation found at: _37

looks like we are only merging one store. Note we cannot recognize bswap to memory; this is a known issue. So for the bswap64 we need to merge to a 64bit store, which we never do on a 32bit platform. We could with SSE, but apparently we don't try with the bswap trick at least. The bswap trick also doesn't seem to consider the split 64bit bswap.

Oddly enough we also fail to merge the other store (maybe missing a val >> 32 pre-shift "trick").

Possibly could be shown to be a similar issue with a 128bit bswap on x86_64, which we could emulate with two 64bit bswaps.
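The emulation mentioned in the last sentence can be sketched as follows. This is only an illustration of the arithmetic, not something GCC does today: the `struct u128` type and the function name are assumptions introduced for the example, and a two-word struct is used instead of a native 128-bit type to keep the sketch portable.

```c
#include <stdint.h>

/* Hypothetical 128-bit value split into two 64-bit halves. */
struct u128 {
    uint64_t lo;  /* least significant 64 bits */
    uint64_t hi;  /* most significant 64 bits  */
};

/* Illustrative only: reverse all 16 bytes by swapping each half with
   __builtin_bswap64 and exchanging the two halves. */
struct u128 bswap128_via_two_bswap64(struct u128 x)
{
    struct u128 r;
    r.lo = __builtin_bswap64(x.hi);
    r.hi = __builtin_bswap64(x.lo);
    return r;
}
```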
[Bug middle-end/114319] htobe64-like function is not optimized on 32-bit x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114319

--- Comment #4 from Andrew Pinski ---
(In reply to Pali Rohár from comment #3)
> --with-arch-32=i686

This basically causes SSE to be disabled for 32bit by default ...

With the default options to configure GCC, -m32 for x86_64 still enables sse ...
[Bug middle-end/114319] htobe64-like function is not optimized on 32-bit x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114319

--- Comment #3 from Pali Rohár ---
For details, here is the compiler which produces the mentioned code:

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/12/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 12.2.0-14' --with-bugurl=file:///usr/share/doc/gcc-12/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-12 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-12-bTRWOB/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-bTRWOB/gcc-12-12.2.0/debian/tmp-gcn/usr --enable-offload-defaulted --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.2.0 (Debian 12.2.0-14)

I guess that with these configure options you should be able to compile gcc which produces the mentioned code.
[Bug middle-end/114319] htobe64-like function is not optimized on 32-bit x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114319

Andrew Pinski changed:

           What    |Removed                     |Added
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2024-03-12
             Target|x86                         |ILP32
             Blocks|                            |94094
     Ever confirmed|0                           |1

--- Comment #2 from Andrew Pinski ---
Confirmed.

I see the trunk even without -mno-sse does not produce the 2 bswaps. Looks like the store-merging pass is not recognizing bswap<<32 for some reason.

Also I thought there was a dup somewhere ...

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94094
[Bug 94094] [meta-bug] store-merging and/or bswap load/store-merging missed optimizations
[Bug middle-end/114319] htobe64-like function is not optimized on 32-bit x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114319

Andrew Pinski changed:

           What    |Removed                     |Added
      Known to fail|                            |11.4.0

--- Comment #1 from Andrew Pinski ---
> But compiling it for 32-bit x86 via "gcc -m32 -O2" produces not so optimized
> code:

I get that code generation for GCC 11.4.0 and before.

For GCC 12.1.0 and above I get:
```
        movl    8(%esp), %ecx
        bswap   %ecx
        movl    %ecx, %eax
        movl    4(%esp), %ecx
        bswap   %ecx
        movl    %ecx, %edx
        movl    12(%esp), %ecx
        movl    %eax, (%ecx)
        movl    %edx, 4(%ecx)
        ret
```

Which just has a few extra moves.

But adding -mno-sse, GCC 12 produces worse code.
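The original test case is not quoted in this thread. As a rough illustration only, a byte-wise big-endian store of the following shape is the kind of htobe64-like pattern under discussion. The name `store_be64` and its signature are assumptions, chosen to match the argument layout visible in the assembly above (a 64-bit value followed by a destination pointer).

```c
#include <stdint.h>

/* Hypothetical reproducer: write a 64-bit value to memory in big-endian
   byte order, one byte at a time.  The hope is that "gcc -m32 -O2"
   reduces this to two bswap instructions and two 32-bit stores. */
void store_be64(uint64_t val, unsigned char *out)
{
    out[0] = (unsigned char)(val >> 56);
    out[1] = (unsigned char)(val >> 48);
    out[2] = (unsigned char)(val >> 40);
    out[3] = (unsigned char)(val >> 32);
    out[4] = (unsigned char)(val >> 24);
    out[5] = (unsigned char)(val >> 16);
    out[6] = (unsigned char)(val >> 8);
    out[7] = (unsigned char)val;
}
```

Comparing the output of `gcc -m32 -O2 -S` with and without `-mno-sse` on such a function is one way to look for the differences described in this comment.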
[Bug middle-end/114319] htobe64-like function is not optimized on 32-bit x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114319

Andrew Pinski changed:

           What    |Removed                     |Added
          Component|target                      |middle-end
           Severity|normal                      |enhancement
           Keywords|                            |missed-optimization