On 4/19/19 1:07 PM, Alex Bennée wrote:
>
> Richard Henderson <richard.hender...@linaro.org> writes:
>
>> This is a case where we generate more than 64k code for a mere 231
>> guest instructions.
>
> I would like to know more! Are these unrolled vector ops or something else?

Yes.  E.g.

    ld4 { v0.16b - v3.16b }, [x0]

will generate 64 guest byte loads.  Given the size of the code generated for
each guest memory operation, we should probably change this to use 64-bit
loads and dole out the bytes manually.

Even for linux-user, with direct host memory ops this converts to 1k code.

r~
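
For illustration only, here is a rough sketch in plain C of what "64-bit loads and dole out the bytes manually" could look like for the ld4 case above. This is not QEMU's TCG code: the function name ld4_16b_sketch and the v[4][16] register layout are hypothetical, and it assumes a little-endian host so that byte k of each loaded word is the byte at memory offset k. The point is that the 64 interleaved bytes are fetched with eight memory operations instead of 64, and the de-interleaving becomes shifts and stores.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not a QEMU function): de-interleave 64 bytes at
 * 'mem' into four 16-byte vectors, as ld4 { v0.16b - v3.16b } does.
 * The byte at memory offset idx belongs to register idx % 4, element idx / 4.
 * Assumes a little-endian host.
 */
static void ld4_16b_sketch(const uint8_t *mem, uint8_t v[4][16])
{
    for (int w = 0; w < 8; w++) {
        uint64_t word;

        /* One 64-bit load instead of eight byte loads. */
        memcpy(&word, mem + 8 * w, sizeof(word));

        /* Dole out the eight bytes of this word to their registers. */
        for (int k = 0; k < 8; k++) {
            int idx = 8 * w + k;    /* byte offset within the 64 bytes */
            v[idx % 4][idx / 4] = (uint8_t)(word >> (8 * k));
        }
    }
}
```

In a translator the inner loop would presumably be unrolled into extract/deposit ops at translation time, so only the eight loads would pay the per-memory-operation cost mentioned above.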