On 11.11.2021 18:57, Andrew Cooper wrote:
> Retpolines are expensive, and all these do are select between the sync and
> nosync helpers.  Pass a boolean instead, and use direct calls everywhere.
> 
> Pause/unpause operations on behalf of dom0 are not fastpaths, so avoid
> exposing the __domain_pause_by_systemcontroller() internal.
> 
> This actually compiles smaller than before:
> 
>   $ ../scripts/bloat-o-meter xen-syms-before xen-syms-after
>   add/remove: 3/1 grow/shrink: 0/5 up/down: 250/-273 (-23)
>   Function                                     old     new   delta
>   _domain_pause                                  -     115    +115
>   domain_pause_by_systemcontroller               -      69     +69
>   domain_pause_by_systemcontroller_nosync        -      66     +66
>   domain_kill                                  426     398     -28
>   domain_resume                                246     214     -32
>   domain_pause_except_self                     189     141     -48
>   domain_pause                                  59      10     -49
>   domain_pause_nosync                           59       7     -52
>   __domain_pause_by_systemcontroller            64       -     -64
> 
> despite GCC's best efforts.  The new _domain_pause_by_systemcontroller()
> really should not be inlined, considering that the difference is only the
> setup of the sync boolean to pass to _domain_pause(), and there are plenty of
> registers to spare.
> 
> Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com>

Reviewed-by: Jan Beulich <jbeul...@suse.com>
albeit without meaning to override Julien's concerns in any way.

Also a question:

> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -1234,15 +1234,18 @@ int vcpu_unpause_by_systemcontroller(struct vcpu *v)
>      return 0;
>  }
>  
> -static void do_domain_pause(struct domain *d,
> -                            void (*sleep_fn)(struct vcpu *v))
> +static void _domain_pause(struct domain *d, bool sync /* or nosync */)
>  {
>      struct vcpu *v;
>  
>      atomic_inc(&d->pause_count);
>  
> -    for_each_vcpu( d, v )
> -        sleep_fn(v);
> +    if ( sync )
> +        for_each_vcpu ( d, v )
> +            vcpu_sleep_sync(v);
> +    else
> +        for_each_vcpu ( d, v )
> +            vcpu_sleep_nosync(v);

Is this really better (for whichever reason) than 

    for_each_vcpu ( d, v )
    {
        if ( sync )
            vcpu_sleep_sync(v);
        else
            vcpu_sleep_nosync(v);
    }

?

Jan


Reply via email to