On 29/04/2025 9:16 am, Jan Beulich wrote: > Multicall compat translation and hypercall continuation handling can > also be shrunk to the processing of just (up to) 5 arguments. > > Take the opportunity to > - make exceeding the limit noisy in hypercall_create_continuation(), > - use speculation-safe array access in hypercall_create_continuation(), > - avoid a Misra C:2012 Rule 19.1 violation in xlat_multicall_entry(), > - further tidy xlat_multicall_entry() and __trace_multicall_call() > style-wise. > > Amends: 2f531c122e95 ("x86: limit number of hypercall parameters to 5") > Signed-off-by: Jan Beulich <jbeul...@suse.com>
Acked-by: Andrew Cooper <andrew.coop...@citrix.com> > --- > hypercall_xlat_continuation() uses BUG() when encountering too large an > argument count in release builds, but I think that's too harsh. Hence in > hypercall_create_continuation() I'm re-using the existing error path. > Interestingly the multicall part of hypercall_xlat_continuation() has no > check at all which would cover release builds. This has never been the best of code. All we can do is improve it moving forward. > > With gcc14 code size grows according to my observation, due to the loops > in xlat_multicall_entry() and __trace_multicall_call() both being > unrolled now. With GCC-12, it gets marginally smaller. The loops are unrolled and the instruction scheduling is rather odd. mov 0x208(%r15),%rax mov 0x228(%r15),%rdx mov %eax,0x1c(%rsp) mov 0x210(%r15),%rax mov %edx,0x2c(%rsp) mov %eax,0x20(%rsp) mov 0x218(%r15),%rax mov 0x1c(%rsp),%rsi mov %edx,0x210(%r15) mov %eax,0x24(%rsp) mov 0x220(%r15),%rax mov %rsi,0x200(%r15) mov %eax,0x28(%rsp) mov 0x24(%rsp),%rsi mov %rsi,0x208(%r15) This is the interleaved copy onto the stack, and back off. Weirdly, we're doing 8 byte loads and 4 byte stores to copy on to the stack. It's not wrong per say, but it's also not necessary. Then, for the copy-off, there is an overlapping store: mov %edx,0x210(%r15) and mov %rsi,0x208(%r15) not to mention this: mov %eax,0x28(%rsp) mov 0x24(%rsp),%rsi using an overlapping store and wider load for the copy-off. Anyway - it looks to function correctly, but I don't exactly feel as if the optimiser has done a great job. ~Andrew