[Bug middle-end/114319] htobe64-like function is not optimized on 32-bit x86

2024-03-13 Thread pali at kernel dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114319

--- Comment #8 from Pali Rohár  ---
Thanks for the quick response and for fixing this issue.

2024-03-13 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114319

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
                 CC|                            |jakub at gcc dot gnu.org
             Status|NEW                         |RESOLVED

--- Comment #7 from Jakub Jelinek  ---
Fixed for GCC 14.

2024-03-13 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114319

--- Comment #6 from GCC Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:74bca21db31e3f4ab6543b56c3f26b4dfe586fef

commit r14-9453-g74bca21db31e3f4ab6543b56c3f26b4dfe586fef
Author: Jakub Jelinek 
Date:   Wed Mar 13 15:34:59 2024 +0100

store-merging: Match bswap64 on 32-bit targets with bswapsi2 [PR114319]

gimple-ssa-store-merging.cc tests bswap_optab in 3 different places.
In 2 of them it has a special exception for a double-word bswap using a
pair of word-mode bswap optabs, but in the last one it doesn't.

The following patch changes the last spot as well.
We don't handle 128-bit bswaps in the passes at all, because currently we
just use uint64_t to represent the byte reshuffling (we'd need to use
offset_int or something like that instead) and we don't have
__builtin_bswap128 nor type-generic __builtin_bswap, so there is nothing
for 64-bit targets there.

2024-03-13  Jakub Jelinek  

PR middle-end/114319
* gimple-ssa-store-merging.cc
(imm_store_chain_info::try_coalesce_bswap): For 32-bit targets
allow matching __builtin_bswap64 if there is bswapsi2 optab.

* gcc.target/i386/pr114319.c: New test.

2024-03-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114319

--- Comment #5 from Richard Biener  ---
```
Coalescing successful!
Merged into 1 stores
32 bit bswap implementation found at: _37
```

Looks like we are only merging one store.  Note that we cannot recognize
a bswap to memory; this is a known issue.  So for the bswap64 we need to
merge to a 64-bit store, which we never do on a 32-bit platform.  We
could with SSE, but apparently we don't try with the bswap trick,
at least.  The bswap trick also doesn't seem to consider the split
64-bit bswap.  Oddly enough, we also fail to merge the other store
(maybe missing a val >> 32 pre-shift "trick").

Possibly the same issue could be shown with a 128-bit bswap on x86_64,
which we could emulate with two 64-bit bswaps.

2024-03-12 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114319

--- Comment #4 from Andrew Pinski  ---
(In reply to Pali Rohár from comment #3)
> --with-arch-32=i686

This basically causes SSE to be disabled for 32-bit by default.
With the default configure options, -m32 on x86_64 still enables SSE.

2024-03-12 Thread pali at kernel dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114319

--- Comment #3 from Pali Rohár  ---
For details, here is the compiler which produces the mentioned code:

```
$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/12/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 12.2.0-14'
--with-bugurl=file:///usr/share/doc/gcc-12/README.Bugs
--enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr
--with-gcc-major-version-only --program-suffix=-12
--program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug
--enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new
--enable-gnu-unique-object --disable-vtable-verify --enable-plugin
--enable-default-pie --with-system-zlib --enable-libphobos-checking=release
--with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch
--disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64
--with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic
--enable-offload-targets=nvptx-none=/build/gcc-12-bTRWOB/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-bTRWOB/gcc-12-12.2.0/debian/tmp-gcn/usr
--enable-offload-defaulted --without-cuda-driver --enable-checking=release
--build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.2.0 (Debian 12.2.0-14)
```

I guess that with these configure options you should be able to build a GCC
that produces the mentioned code.

2024-03-12 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114319

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2024-03-12
             Target|x86                         |ILP32
             Blocks|                            |94094
     Ever confirmed|0                           |1

--- Comment #2 from Andrew Pinski  ---
Confirmed. I see that trunk, even without -mno-sse, does not produce the two bswaps.

Looks like the store-merging pass is not recognizing bswap<<32 for some reason.

Also I thought there was a dup somewhere ...


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94094
[Bug 94094] [meta-bug] store-merging and/or bswap load/store-merging missed optimizations

2024-03-12 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114319

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to fail|                            |11.4.0

--- Comment #1 from Andrew Pinski  ---
> But compiling it for 32-bit x86 via "gcc -m32 -O2" produces not so optimized code:


I get that code generation for GCC 11.4.0 and before.

For GCC 12.1.0 and above I get:
```
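movl    8(%esp), %ecx     # high 32 bits of the 64-bit argument
bswap   %ecx
movl    %ecx, %eax
movl    4(%esp), %ecx     # low 32 bits of the 64-bit argument
bswap   %ecx
movl    %ecx, %edx
movl    12(%esp), %ecx    # next stack argument: the destination pointer
movl    %eax, (%ecx)      # byte-swapped high word is stored first
movl    %edx, 4(%ecx)     # byte-swapped low word follows
ret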
```

That has just a few extra moves.

But with -mno-sse added, GCC 12 produces worse code.

2024-03-12 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114319

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|target                      |middle-end
           Severity|normal                      |enhancement
           Keywords|                            |missed-optimization