https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93005
--- Comment #7 from Joel Holdsworth ---
> Did you test it with big-endian?
Good question. It seems to do the right thing in both cases:
https://godbolt.org/z/7rDzAm
--- Comment #5 from Joel Holdsworth ---
I found that if I make modified versions of the intrinsics in arm_neon.h,
designed more along the lines of the x86_64 SSE intrinsics (which are defined
with a simple pointer dereference), then gcc does the
--- Comment #4 from Joel Holdsworth ---
Results for clang and MSVC are similar:
clang trunk:

foo(__simd128_int32_t):
        push    {r11, lr}
        mov     r11, sp
        sub     sp, sp, #24
        bfc     sp, #0, #4
        mov     r0,
--- Comment #3 from Joel Holdsworth ---
Interesting. Comparing the implementation of _mm_store_si128 to vst1q_s32:

emmintrin.h:

extern __inline void __attribute__((__gnu_inline__, __always_inline__,
__artificial__))
_mm_store_si128 (__m128i *__P, __m128i __B)
{
  *__P = __B;
}
--- Comment #2 from Joel Holdsworth ---
Are you saying that if the GIMPLE were defined for the intrinsics, then the
optimizer would eliminate them automatically? Or is there more to it?
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: joel at airwebreathe dot org.uk
Target Milestone: ---
On x86_64 SSE, gcc is able to eliminate redundant load/store operations to the
stack, but on ARM, gcc seems unable to do the same.