https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114323
Bug ID: 114323
Summary: [14 Regression] MVE vector load intrinsic miscompiled since r14-5622-g4d7647edfd7d98
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: acoplan at gcc dot gnu.org
Target Milestone: ---

The following testcase:

    #include <arm_mve.h>
    uint32x4_t foo (void) {
      uint32x4_t V0 = vld1q_u32(((const uint32_t[4]){1, 2, 3, 4}));
      return V0;
    }

is miscompiled with -O2 -march=armv8.1-m.main+mve -mfloat-abi=hard on
arm-none-eabi. Since r14-5622-g4d7647edfd7d985fbefe13de03c8bc2e3a74fc61 we
generate:

    foo:
            sub     sp, sp, #16
            vldrw.32        q0, [sp]
            add     sp, sp, #16
            bx      lr

i.e. we do a vector load from uninitialized stack memory. GCC 13 used to give:

    foo:
            sub     sp, sp, #16
            mov     ip, sp
            ldr     r3, .L4
            ldm     r3, {r0, r1, r2, r3}
            stm     ip, {r0, r1, r2, r3}
            vldrw.32        q0, [ip]
            add     sp, sp, #16
            bx      lr
            .align  2
    .L4:
            .word   .LANCHOR0
            .size   foo, .-foo
            .section        .rodata
            .align  2
            .set    .LANCHOR0,. + 0
            .word   1
            .word   2
            .word   3
            .word   4

which, while not optimal, is at least correct.

Here is a full executable testcase for the testsuite:

    #include <arm_mve.h>
    __attribute__((noipa))
    uint32x4_t foo (void) {
      uint32x4_t V0 = vld1q_u32(((const uint32_t[4]){1, 2, 3, 4}));
      return V0;
    }

    int main(void)
    {
      uint32_t buf[4];
      vst1q_u32 (buf, foo());

      for (int i = 0; i < 4; i++)
        if (buf[i] != i+1)
          __builtin_abort ();
    }
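
For anyone without an MVE toolchain to hand, the semantics the testcase relies on can be sketched in host-portable C. This is only an analogue and does not reproduce the bug: `vec4`, `load4`, `foo_portable` and `lanes_ok` are hypothetical names that are not part of arm_mve.h, and `memcpy` stands in for `vld1q_u32`. It shows that the compound literal is an automatic object whose initializing stores must happen before the load reads it, which is the step missing from the bad code above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for uint32x4_t so the sketch builds on any host. */
typedef struct { uint32_t lane[4]; } vec4;

/* Hypothetical stand-in for vld1q_u32: read four 32-bit lanes from memory. */
static vec4 load4(const uint32_t *p)
{
    vec4 v;
    memcpy(v.lane, p, sizeof v.lane);
    return v;
}

/* Same shape as foo() in the report: the compound literal has automatic
   storage duration, so it must be initialized before load4 reads it. */
static vec4 foo_portable(void)
{
    return load4((const uint32_t[4]){1, 2, 3, 4});
}

/* Returns 1 when every lane holds the value given in the literal, else 0. */
static int lanes_ok(void)
{
    vec4 v = foo_portable();
    for (int i = 0; i < 4; i++)
        if (v.lane[i] != (uint32_t)(i + 1))
            return 0;
    return 1;
}
```

A caller can check `lanes_ok()` in the same way the executable testcase checks `buf`, aborting when it returns 0.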