On 11.11.2021 18:57, Andrew Cooper wrote:
> There are exactly 3 callers of sort() in the hypervisor.
> 
> Both arm callers pass in NULL for the swap function.  While this might seem
> like an attractive option at first, it causes generic_swap() to be used which
> forced a byte-wise copy.  Provide real swap functions which the compiler can
> optimise sensibly.
> 
> Furthermore, use of function pointers in tight loops like that can be very bad
> for performance.  Implement sort() as extern inline, so the optimiser can
> judge whether to inline things or not.
> 
> On x86, the diffstat shows how much of a better job the compiler can do when
> it is able to see the cmp/swap implementations.
> 
>   $ ../scripts/bloat-o-meter xen-syms-before xen-syms-after
>   add/remove: 0/5 grow/shrink: 1/1 up/down: 505/-735 (-230)
>   Function                                     old     new   delta
>   sort_exception_table                          31     536    +505
>   u32_swap                                       9       -      -9
>   generic_swap                                  34       -     -34
>   cmp_ex                                        36       -     -36
>   swap_ex                                       41       -     -41
>   sort_exception_tables                         90      38     -52
>   sort                                         563       -    -563
> 
> Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com>

Technically
Reviewed-by: Jan Beulich <jbeul...@suse.com>

Yet again without the intention of overriding Julien's concerns in any
way. To address one of them, how about retaining generic_swap() (as an
inline function), ...

> --- a/xen/include/xen/sort.h
> +++ b/xen/include/xen/sort.h
> @@ -3,8 +3,61 @@
>  
>  #include <xen/types.h>
>  
> +/*
> + * sort - sort an array of elements
> + * @base: pointer to data to sort
> + * @num: number of elements
> + * @size: size of each element
> + * @cmp: pointer to comparison function
> + * @swap: pointer to swap function or NULL
> + *
> + * This function does a heapsort on the given array. You may provide a
> + * swap function optimized to your element type.
> + *
> + * Sorting time is O(n log n) both on average and worst-case. While
> + * qsort is about 20% faster on average, it suffers from exploitable
> + * O(n*n) worst-case behavior and extra memory requirements that make
> + * it less suitable for kernel use.
> + */
> +#ifndef SORT_IMPLEMENTATION
> +extern gnu_inline
> +#endif
>  void sort(void *base, size_t num, size_t size,
>            int (*cmp)(const void *, const void *),
> -          void (*swap)(void *, void *, size_t));
> +          void (*swap)(void *, void *, size_t))
> +{
> +    /* pre-scale counters for performance */
> +    size_t i = (num / 2) * size, n = num * size, c, r;
> +
> +    /* heapify */
> +    while ( i > 0 )
> +    {
> +        for ( r = i -= size; r * 2 + size < n; r = c )
> +        {
> +            c = r * 2 + size;
> +            if ( (c < n - size) && (cmp(base + c, base + c + size) < 0) )
> +                c += size;
> +            if ( cmp(base + r, base + c) >= 0 )
> +                break;
> +            swap(base + r, base + c, size);

... doing

            if ( swap )
                swap(base + r, base + c, size);
            else
                generic_swap(base + r, base + c, size);

here and below. The compiler would then still be able to eliminate the
indirect calls (as well as the added conditional), I think.

Jan


Reply via email to