[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2023-02-15 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688 --- Comment #33 from Segher Boessenkool --- Yes, exactly. It was the X server I think? I try to forget such horrors :-)

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2023-02-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688 --- Comment #32 from Andrew Pinski --- (In reply to Segher Boessenkool from comment #31) > Yes, there was a user who incorrectly used memcpy on non-memory memory. >From what I remember (it was also reported about aarch64 at one point too), one

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2023-02-15 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688 --- Comment #31 from Segher Boessenkool --- Yes, there was a user who incorrectly used memcpy on non-memory memory. This is not valid, and never has been.

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2023-02-15 Thread fw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688 --- Comment #30 from Florian Weimer --- (In reply to Segher Boessenkool from comment #29) > (In reply to Florian Weimer from comment #28) > > Maybe this belongs in the ABI manual? For example, the POWER ABI says that > > memcpy needs to work on

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2023-02-15 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688 --- Comment #29 from Segher Boessenkool --- (In reply to Florian Weimer from comment #28) > Maybe this belongs in the ABI manual? For example, the POWER ABI says that > memcpy needs to work on device memory. Huh?! Where do you see this? The

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2022-11-29 Thread fw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688 --- Comment #28 from Florian Weimer --- (In reply to Peter Cordes from comment #27) > (In reply to Alexander Monakov from comment #26) > > Sure, the right course of action seems to be to simply document that atomic > > types and built-ins are

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2022-11-28 Thread peter at cordes dot ca via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688 --- Comment #27 from Peter Cordes --- (In reply to Alexander Monakov from comment #26) > Sure, the right course of action seems to be to simply document that atomic > types and built-ins are meant to be used on "common" (writeback) memory

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2022-11-28 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688 --- Comment #26 from Alexander Monakov --- Sure, the right course of action seems to be to simply document that atomic types and built-ins are meant to be used on "common" (writeback) memory, and no guarantees can be given otherwise, because it

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2022-11-28 Thread peter at cordes dot ca via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688 --- Comment #25 from Peter Cordes --- (In reply to Alexander Monakov from comment #24) > > I think it's possible to get UC/WC mappings via a graphics/compute API (e.g. > OpenGL, Vulkan, OpenCL, CUDA) on any OS if you get a mapping to device >

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2022-11-28 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688 --- Comment #24 from Alexander Monakov --- (In reply to Peter Cordes from comment #23) > But at least on Linux, I don't think there's a way for user-space to even > ask for a page of WT or WP memory (or UC or WC). Only WB memory is easily >

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2022-11-28 Thread peter at cordes dot ca via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688 Peter Cordes changed: What|Removed |Added CC||peter at cordes dot ca --- Comment #23

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2022-11-23 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688 --- Comment #22 from Xi Ruoyao --- (In reply to Jakub Jelinek from comment #21) > What about loads? That is even more important than the stores. While > atomic store can be worst case done through cmpxchg16b, even when it is > slower, we

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2022-11-23 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688 --- Comment #21 from Jakub Jelinek --- What about loads? That is even more important than the stores. While atomic store can be worst case done through cmpxchg16b, even when it is slower, we can't use cmpxchg16b on atomic load because we

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2022-11-23 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688 --- Comment #20 from Xi Ruoyao --- >From Mayshao (Zhaoxin engineer): "On Zhaoxin CPUs with AVX, the VMOVDQA instruction is atomic if the accessed memory is Write Back, but it's not guaranteed for other memory types." Is it allowed to use

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2022-11-21 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688 --- Comment #19 from CVS Commits --- The releases/gcc-11 branch has been updated by Jakub Jelinek : https://gcc.gnu.org/g:60880f3afc82f55b834643e449883dd5b6ad057a commit r11-10385-g60880f3afc82f55b834643e449883dd5b6ad057a Author: Jakub

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2022-11-20 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688 --- Comment #18 from CVS Commits --- The releases/gcc-12 branch has been updated by Jakub Jelinek : https://gcc.gnu.org/g:86dea99d8525bf49d51636332d6be440e51b931a commit r12-8920-g86dea99d8525bf49d51636332d6be440e51b931a Author: Jakub Jelinek

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2022-11-14 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688 --- Comment #17 from Jakub Jelinek --- Fixed for AMD on the library side too. We need a statement from Zhaoxin and VIA for their CPUs.

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2022-11-14 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688 --- Comment #16 from CVS Commits --- The master branch has been updated by Jakub Jelinek : https://gcc.gnu.org/g:4a7a846687e076eae58ad3ea959245b2bf7fdc07 commit r13-4048-g4a7a846687e076eae58ad3ea959245b2bf7fdc07 Author: Jakub Jelinek Date:

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2022-11-14 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688 --- Comment #15 from Alexander Monakov --- Ah, there will be an mfence after the vmovdqa when necessary for an atomic store, thanks (I missed that because the testcase doesn't scan for mfence).

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2022-11-14 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688 --- Comment #14 from Jakub Jelinek --- For ordering guarantees I assume (already since the r12-7689 change) that VMOVDQA behaves the same as MOVL/MOVQ. This PR was about whether there is a quarantee that VMOVDQA will be an atomic load or store

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2022-11-14 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2022-11-13 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688 --- Comment #12 from Jakub Jelinek --- I've posted the patches (so far only lightly tested): https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606021.html https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606022.html It is still

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2022-11-13 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688 Xi Ruoyao changed: What|Removed |Added Summary|gcc and libatomic can use |gcc and libatomic can use