https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84278
Bug ID: 84278
Summary: claims initv4sfv2sf is available but inits through stack
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---
Target: x86_64-*-*, i?86-*-*

float A[1024];
float B[1024];

void foo(int s)
{
  for (int i = 0; i < 128; i++)
    {
      B[i*2+0] = A[i*s+0];
      B[i*2+1] = A[i*s+1];
    }
}

the vectorizer generates { v2sf, v2sf } for the strided load because the
backend tells it it can efficiently initialize such vector. But we expand
to the following which doesn't look at all like using the special init path.

(insn 14 13 16 (set (reg:V4SF 97)
        (const_vector:V4SF [
                (const_double:SF 0.0 [0x0.0p+0])
                (const_double:SF 0.0 [0x0.0p+0])
                (const_double:SF 0.0 [0x0.0p+0])
                (const_double:SF 0.0 [0x0.0p+0])
            ])) "t.c":8 -1
     (nil))

(insn 16 14 17 (set (reg:DI 99)
        (subreg:DI (reg:V4SF 97) 0)) "t.c":8 -1
     (nil))

(insn 17 16 18 (parallel [
            (set (reg:DI 100)
                (and:DI (reg:DI 99)
                    (const_int 0 [0])))
            (clobber (reg:CC 17 flags))
        ]) "t.c":8 -1
     (nil))

(insn 18 17 19 (parallel [
            (set (reg:DI 101)
                (ior:DI (reg:DI 100)
                    (mem:DI (reg/f:DI 88 [ _5 ]) [1 MEM[base: _5, offset: 0B]+0 S8 A32])))
            (clobber (reg:CC 17 flags))
        ]) "t.c":8 -1
     (nil))

(insn 19 18 21 (set (subreg:DI (reg:V4SF 97) 0)
        (reg:DI 101)) "t.c":8 -1
     (nil))

(insn 21 19 22 (set (reg:DI 103)
        (subreg:DI (reg:V4SF 97) 8)) "t.c":8 -1
     (nil))

(insn 22 21 23 (parallel [
            (set (reg:DI 104)
                (and:DI (reg:DI 103)
                    (const_int 0 [0])))
            (clobber (reg:CC 17 flags))
        ]) "t.c":8 -1
     (nil))

(insn 23 22 24 (parallel [
            (set (reg:DI 105)
                (ior:DI (reg:DI 104)
                    (mem:DI (plus:DI (reg/f:DI 88 [ _5 ])
                            (reg:DI 93 [ _20 ])) [1 MEM[base: _5, index: _20, offset: 0B]+0 S8 A32])))
            (clobber (reg:CC 17 flags))
        ]) "t.c":8 -1
     (nil))

(insn 24 23 25 (set (subreg:DI (reg:V4SF 97) 8)
        (reg:DI 105)) "t.c":8 -1
     (nil))

The issue seems to be the constructor element vector types have BLKmode as
seen by store_constructor.
The mismatch between what the vectorizer checks and what expansion gets
arises because TYPE_MODE ends up calling targetm.vector_mode_supported_p,
while the vectorizer only checks that
mode_for_vector (elmode, group_size).exists (&vmode) succeeds.
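
For reference, here is a hand-written sketch of what the { v2sf, v2sf }
construction amounts to at the source level: two 8-byte strided loads
placed in the low and high halves of one 16-byte vector, followed by a
single contiguous store. It uses the GNU C vector_size extension. The
names v4sf, foo_scalar and foo_vec are made up for this illustration and
are not taken from GCC, and the assert restricts the stride to values
that keep all accesses inside A. This only shows the intended data
movement; it makes no claim about the code GCC emits for it.

```c
#include <assert.h>
#include <string.h>

/* GNU C vector extension: four floats in 16 bytes (illustrative typedef). */
typedef float v4sf __attribute__ ((vector_size (16)));

float A[1024];
float B[1024];

/* Scalar reference: the loop from the report. */
void foo_scalar (int s)
{
  for (int i = 0; i < 128; i++)
    {
      B[i*2+0] = A[i*s+0];
      B[i*2+1] = A[i*s+1];
    }
}

/* Hand-vectorized form: each v4sf is built from two 2-float groups,
   one per half, which is what a { v2sf, v2sf } constructor describes.  */
void foo_vec (int s)
{
  /* Assumption for this sketch: the stride keeps every load in bounds.  */
  assert (s >= 0 && 127 * s + 1 < 1024);

  for (int i = 0; i < 128; i += 2)
    {
      v4sf v;
      memcpy (&v, &A[i*s], 8);                    /* low half:  A[i*s .. i*s+1] */
      memcpy ((char *) &v + 8, &A[(i+1)*s], 8);   /* high half: next group      */
      memcpy (&B[i*2], &v, sizeof v);             /* one 16-byte store          */
    }
}
```

Calling foo_scalar and foo_vec with the same in-range stride should leave
identical contents in B[0..255].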