[Bug target/93005] Redundant NEON loads/stores from stack are not eliminated

2020-01-06 joel at airwebreathe dot org.uk
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93005

--- Comment #7 from Joel Holdsworth ---
> Did you test it with big-endian?

Good question. It seems to do the right thing in both cases: https://godbolt.org/z/7rDzAm

[Bug target/93005] Redundant NEON loads/stores from stack are not eliminated

2020-01-06 joel at airwebreathe dot org.uk
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93005

--- Comment #5 from Joel Holdsworth ---
I found that if I make modified versions of the intrinsics in arm_neon.h that are designed more along the lines of the x86_64 SSE intrinsics, defined with a simple pointer dereference, then gcc does the
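Comment #5 does not show the modified intrinsics, so the following is only a sketch of the idea under stated assumptions. It uses GCC/Clang generic vector extensions instead of arm_neon.h so that it compiles on any target, and `v4si`, `v4si_u`, `my_vld1q_s32`, `my_vst1q_s32` and `bump_lane0` are hypothetical names. The load and store are plain pointer dereferences through an unaligned, `may_alias` vector type, loosely modeled on how the x86 headers define their unaligned SSE loads and stores, rather than calls to NEON builtins.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for int32x4_t, built on the GCC/Clang generic
   vector extension so the sketch compiles on any target. */
typedef int32_t v4si __attribute__((vector_size(16)));

/* Unaligned (4-byte) and may_alias variant, in the spirit of x86's
   __m128i_u: a plain dereference of it may read or write any int32_t array. */
typedef int32_t v4si_u __attribute__((vector_size(16), aligned(4), may_alias));

static_assert(sizeof(v4si) == 16, "a 128-bit vector is 16 bytes");

/* Hypothetical replacement for vld1q_s32: a plain pointer dereference
   instead of a call to a NEON builtin. */
static inline v4si my_vld1q_s32(const int32_t *p)
{
    return *(const v4si_u *)p;
}

/* Hypothetical replacement for vst1q_s32, written the same way. */
static inline void my_vst1q_s32(int32_t *p, v4si v)
{
    *(v4si_u *)p = v;
}

/* Spill a vector to a stack array, bump lane 0, and reload it. Whether the
   compiler removes the round trip through tmp[] is exactly what the bug
   report is about; inspect the generated assembly to find out. */
v4si bump_lane0(v4si x)
{
    int32_t tmp[4];
    my_vst1q_s32(tmp, x);
    tmp[0] += 1;
    return my_vld1q_s32(tmp);
}
```

Comparing the assembly of `bump_lane0` at -O2 with a version written using the real `vst1q_s32`/`vld1q_s32` on an ARM target is one way to check whether the stack array survives; the sketch itself makes no claim about the result.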

[Bug target/93005] Redundant NEON loads/stores from stack are not eliminated

2020-01-06 joel at airwebreathe dot org.uk
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93005

--- Comment #4 from Joel Holdsworth ---
Results for clang and MSVC are similar:

clang trunk:
foo(__simd128_int32_t):
        push    {r11, lr}
        mov     r11, sp
        sub     sp, sp, #24
        bfc     sp, #0, #4
        mov     r0,

[Bug target/93005] Redundant NEON loads/stores from stack are not eliminated

2020-01-03 joel at airwebreathe dot org.uk
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93005

--- Comment #3 from Joel Holdsworth ---
Interesting. Comparing the implementation of _mm_store_si128 to vst1q_s32:

emmintrin.h:
extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_store_si128 (__m128i

[Bug target/93005] Redundant NEON loads/stores from stack are not eliminated

2020-01-02 joel at airwebreathe dot org.uk
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93005

--- Comment #2 from Joel Holdsworth ---
Are you saying that if the GIMPLE were defined for the intrinsics, then the optimizer would eliminate them automatically? Or is there more to it?

[Bug c++/93005] New: Redundant NEON loads/stores from stack are not eliminated

2019-12-19 joel at airwebreathe dot org.uk

Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: joel at airwebreathe dot org.uk
Target Milestone: ---

On x86_64 SSE, gcc is able to eliminate redundant load/store operations to the stack, but on ARM, gcc seems unable to do the same