[Bug target/80837] [7/8 regression] x86 accessing a member of a 16-byte atomic object generates terrible code: splitting/merging the bytes

2018-02-28 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80837

Jakub Jelinek  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #9 from Jakub Jelinek  ---
Actually managed to reproduce today using godbolt's -E output.
This is something that got fixed with r247104 on the trunk and backported
to 7.2 as r248724.  So I believe this is fixed.  The choice to emit
__atomic_load_16 calls is intentional, see the overlong thread about it from
the last week or so on gcc at gcc.gnu.org mailing list.

*** This bug has been marked as a duplicate of bug 80293 ***

[Bug target/80837] [7/8 regression] x86 accessing a member of a 16-byte atomic object generates terrible code: splitting/merging the bytes

2018-01-25 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80837

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|7.3 |7.4

--- Comment #8 from Richard Biener  ---
GCC 7.3 is being released, adjusting target milestone.

[Bug target/80837] [7/8 regression] x86 accessing a member of a 16-byte atomic object generates terrible code: splitting/merging the bytes

2017-12-20 Thread matt at godbolt dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80837

Matt Godbolt  changed:

   What|Removed |Added

 CC||matt at godbolt dot org

--- Comment #7 from Matt Godbolt  ---
The pedigree of the GCC 7.1 is as Peter says. The compiler isn't bootstrapped,
and uses an in-tree binutils, should that be important.

Other pertinent info from gcc -v

Configured with: ../gcc-7.1.0/configure --prefix /root/staging
--build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
--disable-multilib --disable-bootstrap --disable-multiarch --with-arch-32=i586
--enable-clocale=gnu --enable-languages=c,c++,go,fortran --enable-ld=yes
--enable-gold=yes --enable-libstdcxx-debug --enable-libstdcxx-time=yes
--enable-linker-build-id --enable-lto --enable-plugins --enable-threads=posix
--with-pkgversion=GCC-Explorer-Build
Thread model: posix
gcc version 7.1.0 (GCC-Explorer-Build) 
GNU C++11 (GCC-Explorer-Build) version 7.1.0 (x86_64-linux-gnu)
compiled by GNU C version 5.4.0 20160609, GMP version 6.1.0, MPFR
version 3.1.4, MPC version 1.0.3, isl version isl-0.16.1-GMP
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072


The binaries used on CE are here:
https://s3.amazonaws.com/compiler-explorer/opt/gcc-7.1.0.tar.xz

[Bug target/80837] [7/8 regression] x86 accessing a member of a 16-byte atomic object generates terrible code: splitting/merging the bytes

2017-12-14 Thread peter at cordes dot ca
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80837

--- Comment #6 from Peter Cordes  ---
(In reply to Jakub Jelinek from comment #4)
> But have just tried gcc 7.1.0 release and can't reproduce even there.

Matt says the Compiler Explorer backend uses upstream release tarballs like
`URL=ftp://ftp.gnu.org/gnu/gcc/gcc-${VERSION}/${TARBALL}`.  (where TARBALL is
`gcc-${VERSION}.tar.xz` for recent gcc where .xz is available).

The compiler config used is
https://github.com/mattgodbolt/compiler-explorer-image/blob/master/gcc/build/build.sh#L78:

CONFIG=""
CONFIG+=" --build=x86_64-linux-gnu"
CONFIG+=" --host=x86_64-linux-gnu"
CONFIG+=" --target=x86_64-linux-gnu"
CONFIG+=" --disable-bootstrap"
CONFIG+=" --enable-multiarch"
CONFIG+=" --with-abi=m64"
CONFIG+=" --with-multilib-list=m32,m64,mx32"
CONFIG+=" --enable-multilib"
CONFIG+=" --enable-clocale=gnu"
CONFIG+=" --enable-languages=c,c++,fortran" # used to have go, but is
incompatible with m32/mx32
CONFIG+=" --enable-ld=yes"
CONFIG+=" --enable-gold=yes"
CONFIG+=" --enable-libstdcxx-debug"
CONFIG+=" --enable-libstdcxx-time=yes"
CONFIG+=" --enable-linker-build-id" 
CONFIG+=" --enable-lto"
CONFIG+=" --enable-plugins"
CONFIG+=" --enable-threads=posix"
CONFIG+=" --with-pkgversion=GCC-Explorer-Build"
BINUTILS_VERSION=2.29.1


Does that help figure out how to build a gcc7.1.0 that can repro this?

[Bug target/80837] [7/8 regression] x86 accessing a member of a 16-byte atomic object generates terrible code: splitting/merging the bytes

2017-12-13 Thread peter at cordes dot ca
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80837

--- Comment #5 from Peter Cordes  ---
(In reply to Jakub Jelinek from comment #4)
> Can't reproduce.  It is true that we now emit the __atomic_load_16 call, but
> that was intentional change

Yup.

>, and it can't be easily tail call, because the
> tailcall pass doesn't understand that the low 8 bytes of the 16 byte
> structure are returned the same as the whole structure

Ok that's disappointing, but hopefully is very rare after inlining.

> But I certainly can't reproduce any significant value masking etc., tried
> r235002 (+- gcc 6 branchpoint), r247000 (+- gcc 7 branchpoint) as well as
> current trunk.
> Unless it is something that has been broken on the 7 branch and later fixed.
> 
> But have just tried gcc 7.1.0 release and can't reproduce even there.

I can't repro it locally with gcc7.1.1 either.  This is the version info from
-fverbose-asm on the godbolt.org link (which does still repro it)

# GNU C++11 (GCC-Explorer-Build) version 7.1.0 (x86_64-linux-gnu)
#   compiled by GNU C version 5.4.0 20160609, GMP version 6.1.0, MPFR
version 3.1.4, MPC version 1.0.3, isl version isl-0.16.1-GMP

It's not present in the gcc7.2 build on Godbolt.org either.

I asked Matt Godbolt what exact version the compiler explorer site is using for
the gcc7.1.0 dropdown
(https://github.com/mattgodbolt/compiler-explorer/issues/684).  Hopefully he
can help us track down a gcc SVN revision to repro it, or confirm that it was a
misconfigured or buggy gcc or something.  Just to rule out the possibility of a
now-dormant bug lurking somewhere.

[Bug target/80837] [7/8 regression] x86 accessing a member of a 16-byte atomic object generates terrible code: splitting/merging the bytes

2017-12-12 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80837

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek  ---
Can't reproduce.  It is true that we now emit the __atomic_load_16 call, but
that was intentional change, and it can't be easily tail call, because the
tailcall pass doesn't understand that the low 8 bytes of the 16 byte structure
are returned the same as the whole structure; I think that is pretty obscure
detail of the ABI, so I think it isn't worth teaching the tailcall pass about
such arch specific complexities.
But I certainly can't reproduce any significant value masking etc., tried
r235002 (+- gcc 6 branchpoint), r247000 (+- gcc 7 branchpoint) as well as
current trunk.
Unless it is something that has been broken on the 7 branch and later fixed.

But have just tried gcc 7.1.0 release and can't reproduce even there.

[Bug target/80837] [7/8 regression] x86 accessing a member of a 16-byte atomic object generates terrible code: splitting/merging the bytes

2017-08-20 Thread peter at cordes dot ca
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80837

--- Comment #3 from Peter Cordes  ---
Seems to be fixed in gcc7.2.0: https://godbolt.org/g/jRwtZN

gcc7.2 is fine with -m32, -mx32, and -m64, but x32 is the most compact.  -m64
just calls __atomic_load_16


gcc7.2 -O3 -mx32 output:
follow_nounion(std::atomic*):
movq(%edi), %rax
movl%eax, %eax
ret

vs.

gcc7.1 -O3 -mx32
follow_nounion(std::atomic*):
movq(%edi), %rcx
xorl%edx, %edx
movzbl  %ch, %eax
movb%cl, %dl
movq%rcx, %rsi
movb%al, %dh
andl$16711680, %esi
andl$4278190080, %ecx
movzwl  %dx, %eax
orq %rsi, %rax
orq %rcx, %rax
ret

---


gcc7.2 -O3 -m64 just forwards its arg to __atomic_load_16 and then returns:

follow_nounion(std::atomic*):
subq$8, %rsp
movl$2, %esi
call__atomic_load_16
addq$8, %rsp
ret

It unfortunately doesn't optimize the tail-call to

movl$2, %esi
jmp __atomic_load_16

presumably because it hasn't realized early enough that it takes zero
instructions to extract the 8-byte low half of the 16-byte __atomic_load_16
return value.

[Bug target/80837] [7/8 regression] x86 accessing a member of a 16-byte atomic object generates terrible code: splitting/merging the bytes

2017-08-16 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80837

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|7.2 |7.3

[Bug target/80837] [7/8 regression] x86 accessing a member of a 16-byte atomic object generates terrible code: splitting/merging the bytes

2017-08-14 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80837

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|7.2 |7.3

--- Comment #3 from Richard Biener  ---
GCC 7.2 is being released, adjusting target milestone.

--- Comment #4 from Richard Biener  ---
GCC 7.2 is being released, adjusting target milestone.

[Bug target/80837] [7/8 regression] x86 accessing a member of a 16-byte atomic object generates terrible code: splitting/merging the bytes

2017-08-14 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80837

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|7.2 |7.3

--- Comment #3 from Richard Biener  ---
GCC 7.2 is being released, adjusting target milestone.

--- Comment #4 from Richard Biener  ---
GCC 7.2 is being released, adjusting target milestone.

[Bug target/80837] [7/8 regression] x86 accessing a member of a 16-byte atomic object generates terrible code: splitting/merging the bytes

2017-05-23 Thread peter at cordes dot ca
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80837

--- Comment #2 from Peter Cordes  ---
(In reply to Richard Biener from comment #1)
> GCC 8 generates a __atomic_load_16 call for me while GCC 6 does
> 
> lock cmpxchg16b (%rdi)

That's expected.  See https://gcc.gnu.org/ml/gcc-patches/2017-01/msg02344.html
for the reason.  (Apparently the plan is for gcc7 to call libatomic, to make it
possible to change the implementation sooner in the future if that's deemed
appropriate, even in ways that breaks compat with code that inlines lock
cmpxchg16b.)

The bug here is that gcc7.1.0 goes berserk after the library call, but gcc8
doesn't.  I haven't tested on any other gcc7 versions.

[Bug target/80837] [7/8 regression] x86 accessing a member of a 16-byte atomic object generates terrible code: splitting/merging the bytes

2017-05-23 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80837

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P2
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2017-05-23
   Target Milestone|--- |7.2
Summary|[7.1.0 regression] x86  |[7/8 regression] x86
   |accessing a member of a |accessing a member of a
   |16-byte atomic object   |16-byte atomic object
   |generates terrible code:|generates terrible code:
   |splitting/merging the bytes |splitting/merging the bytes
 Ever confirmed|0   |1

--- Comment #1 from Richard Biener  ---
GCC 8 generates a __atomic_load_16 call for me while GCC 6 does

lock cmpxchg16b (%rdi)