https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40771
--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
AArch64 vectorization looks decent too:
```
        dup     v31.8h, w0
        adrp    x2, .LC0
        adrp    x0, .LC1
        adrp    x1, .LANCHOR0
        ldr     q30, [x2, #:lo12:.LC0]
        ldr     q29, [x0, #:lo12:.LC1]
        add     v30.8h, v31.8h, v30.8h
        add     v29.8h, v31.8h, v29.8h
        uzp2    v29.16b, v30.16b, v29.16b
        str     q29, [x1, #:lo12:.LANCHOR0]
```

The only improvement that can be made there is with SVE: those `ldr` loads
could be `index` instructions instead, but that is PR 113328.
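
For readers less familiar with the NEON sequence, here is a rough scalar C model of what it computes. It is only a sketch: the contents of `.LC0` and `.LC1` are not shown in the comment, so the lane-index values below (0..7 and 8..15) are an assumption, and the names `lc0`, `lc1` and `model_sequence` are hypothetical. It also assumes little-endian lane layout, where the odd bytes selected by `uzp2` on `.16b` are the high bytes of each 16-bit lane.

```c
#include <stdint.h>

/* ASSUMED constants: the comment does not show .LC0/.LC1. */
static const uint16_t lc0[8] = {0, 1, 2, 3, 4, 5, 6, 7};
static const uint16_t lc1[8] = {8, 9, 10, 11, 12, 13, 14, 15};

/* Scalar model of the dup / add / add / uzp2 / str sequence. */
static void model_sequence(uint16_t w0, uint8_t out[16])
{
    for (int i = 0; i < 8; i++) {
        /* dup + add: broadcast w0 and add the per-lane constant (mod 2^16). */
        uint16_t a = (uint16_t)(w0 + lc0[i]);
        uint16_t b = (uint16_t)(w0 + lc1[i]);

        /* uzp2 on .16b keeps the odd bytes of both inputs, i.e. the high
           byte of each 16-bit lane on a little-endian layout. */
        out[i]     = (uint8_t)(a >> 8);
        out[8 + i] = (uint8_t)(b >> 8);
    }
}
```

Under those assumptions, the two `ldr` instructions only fetch linear index vectors, which is why an SVE `index` instruction could generate them in registers instead.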