GCC generates suboptimal code for the vdupq_n_XXX intrinsic family when the argument is 0. The code generated is:

    mov     r0, #0
    vdup.32 q8, r0

instead of the faster:

    veor.32 q8, q8, q8

Note that GCC does use xorps on x86[_64] for SSE when using _mm_setzero_ps() or _mm_set1_ps(0).

-- 
           Summary: GCC produces suboptimal ARM NEON code for zero vector assignment
           Product: gcc
           Version: 4.4.3
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: liranuna at gmail dot com
 GCC build triplet: x86_64-linux-gnu
  GCC host triplet: x86_64-linux-gnu
GCC target triplet: arm-linux-gnueabi

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43724
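
A minimal test case for the zero-vector assignment described above might look like the sketch below. It is not part of the original report: the helper names (store_zero_dup, store_zero_eor, all_lanes_zero) are hypothetical, and the choice of the u32 variant of vdupq_n_XXX is an assumption. The scalar branch exists only so the file also builds on a non-NEON host.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper: writes a zero vector to out[0..3].
 * On NEON targets it uses vdupq_n_u32(0), one member of the
 * vdupq_n_XXX family named in the report. */
void store_zero_dup(uint32_t out[4])
{
#if HAVE_NEON
    vst1q_u32(out, vdupq_n_u32(0));
#else
    for (int i = 0; i < 4; ++i)
        out[i] = 0;              /* scalar stand-in on non-NEON hosts */
#endif
}

/* Hypothetical helper: produces zero by XORing a vector with itself,
 * the source-level analogue of the veor form suggested in the report. */
void store_zero_eor(uint32_t out[4])
{
#if HAVE_NEON
    uint32x4_t v = vld1q_u32(out);
    vst1q_u32(out, veorq_u32(v, v));
#else
    for (int i = 0; i < 4; ++i)
        out[i] ^= out[i];        /* scalar stand-in on non-NEON hosts */
#endif
}

/* Returns 1 when all four lanes are zero, 0 otherwise. */
int all_lanes_zero(const uint32_t v[4])
{
    return (v[0] | v[1] | v[2] | v[3]) == 0;
}
```

To inspect the generated assembly, something along the lines of `arm-linux-gnueabi-gcc -O2 -mfpu=neon -mfloat-abi=softfp -S test.c` should work, assuming a cross compiler matching the target triplet above is installed under that name. The instructions actually emitted will depend on the compiler version.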