Hi Andrew,
On 11/11/2021 17:57, Andrew Cooper wrote:
> There are exactly 3 callers of sort() in the hypervisor.
>
> Both arm callers pass in NULL for the swap function. While this might seem
> like an attractive option at first, it causes generic_swap() to be used, which
> forces a byte-wise copy. Provide real swap functions which the compiler can
> optimise sensibly.
>
> Furthermore, use of function pointers in tight loops like that can be very bad
> for performance. Implement sort() as extern inline, so the optimiser can
> judge whether to inline things or not.
>
> On x86, the diffstat shows how much of a better job the compiler can do when
> it is able to see the cmp/swap implementations.
For completeness, here is the Arm bloat-o-meter output:
add/remove: 0/5 grow/shrink: 2/0 up/down: 928/-660 (268)
Function                                 old     new   delta
boot_fdt_info                            640    1132    +492
register_mmio_handler                    292     728    +436
u32_swap                                  20       -     -20
generic_swap                              40       -     -40
cmp_mmio_handler                          44       -     -44
cmp_memory_node                           44       -     -44
sort                                     512       -    -512
Total: Before=966915, After=967183, chg +0.03%
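To make the trade-off in the numbers above concrete, here is a minimal sketch of the pattern the commit message describes: a sort routine whose body is visible at the call site, called with a typed swap helper instead of a byte-wise fallback. All names (sort_sketch, cmp_u32, swap_u32) are hypothetical and this is not the code from the patch. It uses static inline rather than extern inline so that it stays self-contained, and the heapsort is only an assumption about what such a routine might look like.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical typed swap: one load/store pair per element, no byte loop. */
static inline void swap_u32(void *a, void *b, size_t size)
{
    uint32_t *x = a, *y = b, t = *x;

    (void)size; /* element size is known statically here */
    *x = *y;
    *y = t;
}

/* Hypothetical comparison: negative, zero or positive, like memcmp(). */
static inline int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;

    return (x > y) - (x < y);
}

/* Restore the max-heap property below byte offset r, heap ends at 'end'. */
static inline void sift_down(char *b, size_t r, size_t end, size_t size,
                             int (*cmp)(const void *, const void *),
                             void (*swap)(void *, void *, size_t))
{
    size_t c;

    for ( ; r * 2 + size < end; r = c )
    {
        c = r * 2 + size;                       /* left child */
        if ( c + size < end && cmp(b + c, b + c + size) < 0 )
            c += size;                          /* right child is larger */
        if ( cmp(b + r, b + c) >= 0 )
            break;
        swap(b + r, b + c, size);
    }
}

/* Heapsort sketch; being inline, cmp/swap can be resolved per call site. */
static inline void sort_sketch(void *base, size_t num, size_t size,
                               int (*cmp)(const void *, const void *),
                               void (*swap)(void *, void *, size_t))
{
    char *b = base;
    size_t i, n = num * size;

    if ( num < 2 )
        return;

    /* Build the heap, starting from the last parent node. */
    for ( i = (num / 2) * size; i > 0; )
    {
        i -= size;
        sift_down(b, i, n, size, cmp, swap);
    }

    /* Repeatedly move the maximum to the end and shrink the heap. */
    for ( i = n - size; i > 0; i -= size )
    {
        swap(b, b + i, size);
        sift_down(b, 0, i, size, cmp, swap);
    }
}
```

A call such as sort_sketch(v, nr, sizeof(v[0]), cmp_u32, swap_u32) passes both helpers as compile-time constants, so the optimiser is free to inline them into the loop. That would be consistent with the callers growing while the standalone sort() and its helpers disappear from the table.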
Cheers,
--
Julien Grall