Something I noticed in AdvSIMD dumps while testing changes common with SVE2.

If we're going to load a zero into a vector register to clear the high
bits of the SVE register, we might as well use that zero to store the
8 bytes at the top of the AdvSIMD register as well.

Output assembly goes from e.g.

  00:   48 c7 85 08 10 00 00 00    movq    $0x0,0x1008(%rbp)
        00 00 00
  0b:   c5 f9 ef c0                vpxor   %xmm0,%xmm0,%xmm0
  0f:   c5 fe 7f 85 10 10 00 00    vmovdqu %ymm0,0x1010(%rbp)
  17:   c5 fa 7f 85 30 10 00 00    vmovdqu %xmm0,0x1030(%rbp)

to

  00:   c5 f9 ef c0                vpxor   %xmm0,%xmm0,%xmm0
  04:   c5 f9 d6 85 08 10 00 00    vmovq   %xmm0,0x1008(%rbp)
  0c:   c5 fe 7f 85 10 10 00 00    vmovdqu %ymm0,0x1010(%rbp)
  14:   c5 fa 7f 85 30 10 00 00    vmovdqu %xmm0,0x1030(%rbp)

This saves 3 bytes now (0x1f down to 0x1c), and more once we do better
at loading constants into registers, where the vpxor can be shared
between instructions.

The target/arm patches are aided by the tcg patch, but are not
dependent on it.


r~


Richard Henderson (3):
  tcg: Improve vector tail clearing
  target/arm: Use tcg_gen_gvec_mov for clear_vec_high
  target/arm: Use clear_vec_high more effectively

 target/arm/translate-a64.c | 69 ++++++++++++++++++-------------
 tcg/tcg-op-gvec.c          | 82 +++++++++++++++++++++++++++++---------
 2 files changed, 101 insertions(+), 50 deletions(-)

-- 
2.20.1
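
For illustration only, here is a minimal sketch of how a tail that is a multiple of 8 bytes could be split into stores from a single zeroed vector register, in the order seen in the second dump above. It is not the code in tcg/tcg-op-gvec.c: `struct zero_store` and `plan_tail_clear` are hypothetical names, and the sketch assumes the host has 8-, 16- and 32-byte vector stores available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One store of `bytes` zero bytes at `offset` from the env pointer. */
struct zero_store {
    uint32_t offset;
    uint32_t bytes;     /* 8, 16 or 32 */
};

/*
 * Hypothetical helper, not QEMU code: plan the stores that clear
 * `size` bytes starting at `offset`, all sourced from one zeroed
 * vector register.  `size` must be a multiple of 8 and `out` must
 * have room for every store.  Returns the number of stores planned.
 */
static size_t plan_tail_clear(uint32_t offset, uint32_t size,
                              struct zero_store *out, size_t max)
{
    size_t n = 0;

    assert(size % 8 == 0);

    /* Odd multiple of 8: peel off one 8-byte store (the vmovq case). */
    if (size % 16 == 8) {
        assert(n < max);
        out[n++] = (struct zero_store){ offset, 8 };
        offset += 8;
        size -= 8;
    }

    /* Cover as much as possible with 32-byte stores. */
    while (size >= 32) {
        assert(n < max);
        out[n++] = (struct zero_store){ offset, 32 };
        offset += 32;
        size -= 32;
    }

    /* At most one 16-byte store remains. */
    if (size == 16) {
        assert(n < max);
        out[n++] = (struct zero_store){ offset, 16 };
    }

    return n;
}
```

Calling `plan_tail_clear(0x1008, 56, stores, 4)` plans an 8-byte store at 0x1008, a 32-byte store at 0x1010 and a 16-byte store at 0x1030, which matches the vmovq/vmovdqu sequence in the second dump.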