On 29/04/2025 9:16 am, Jan Beulich wrote:
> Multicall compat translation and hypercall continuation handling can
> also be shrunk to the processing of just (up to) 5 arguments.
>
> Take the opportunity to
> - make exceeding the limit noisy in hypercall_create_continuation(),
> - use speculation-safe array access in hypercall_create_continuation(),
> - avoid a Misra C:2012 Rule 19.1 violation in xlat_multicall_entry(),
> - further tidy xlat_multicall_entry() and __trace_multicall_call()
>   style-wise.
>
> Amends: 2f531c122e95 ("x86: limit number of hypercall parameters to 5")
> Signed-off-by: Jan Beulich <jbeul...@suse.com>

Acked-by: Andrew Cooper <andrew.coop...@citrix.com>

> ---
> hypercall_xlat_continuation() uses BUG() when encountering too large an
> argument count in release builds, but I think that's too harsh. Hence in
> hypercall_create_continuation() I'm re-using the existing error path.
> Interestingly the multicall part of hypercall_xlat_continuation() has no
> check at all which would cover release builds.

This has never been the best of code.  All we can do is improve it
moving forward.

>
> With gcc14 code size grows according to my observation, due to the loops
> in xlat_multicall_entry() and __trace_multicall_call() both being
> unrolled now.

With GCC-12, it gets marginally smaller.  The loops are unrolled and the
instruction scheduling is rather odd.

mov    0x208(%r15),%rax
mov    0x228(%r15),%rdx
mov    %eax,0x1c(%rsp)
mov    0x210(%r15),%rax
mov    %edx,0x2c(%rsp)
mov    %eax,0x20(%rsp)
mov    0x218(%r15),%rax
mov    0x1c(%rsp),%rsi
mov    %edx,0x210(%r15)
mov    %eax,0x24(%rsp)
mov    0x220(%r15),%rax
mov    %rsi,0x200(%r15)
mov    %eax,0x28(%rsp)
mov    0x24(%rsp),%rsi
mov    %rsi,0x208(%r15)

This is the interleaved copy onto the stack, and back off.

Weirdly, we're doing 8 byte loads and 4 byte stores to copy on to the
stack.  It's not wrong per say, but it's also not necessary.

Then, for the copy-off, there is an overlapping store:

mov    %edx,0x210(%r15)

and

mov    %rsi,0x208(%r15)

not to mention this:

mov    %eax,0x28(%rsp)
mov    0x24(%rsp),%rsi

using an overlapping store and wider load for the copy-off.

Anyway - it looks to function correctly, but I don't exactly feel as if
the optimiser has done a great job.

~Andrew

Reply via email to