On Mon, Jun 27, 2022 at 10:13:50AM +0800, Guo Hui wrote:
> The instructions assigned to the vcpu_is_preempted function parameter
> in the X86 architecture physical machine are redundant instructions,
> causing the multi-core performance of Unixbench to drop by about 4% to 5%.
> The C function is as follows:
> static bool vcpu_is_preempted(long vcpu);
> 
> The parameter 'vcpu' in the function osq_lock
> that calls the function vcpu_is_preempted is assigned as follows:
> 
> The C code is in the function node_cpu:
> cpu = node->cpu - 1;
> 
> The instructions corresponding to the C code are:
> mov 0x14(%rax),%edi
> sub $0x1,%edi
> 
> The above instructions are unnecessary
> in the X86 Native operating environment,
> causing high cache-misses and degrading performance.

The above basically says that argument setup is not patched out and
causes significant pain due to a cache-miss.

> Signed-off-by: Guo Hui <[email protected]>
> ---
>  arch/x86/kernel/paravirt-spinlocks.c | 4 ++++
>  kernel/locking/osq_lock.c            | 9 ++++++++-
>  2 files changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/paravirt-spinlocks.c 
> b/arch/x86/kernel/paravirt-spinlocks.c
> index 9e1ea99ad..7a55f8407 100644
> --- a/arch/x86/kernel/paravirt-spinlocks.c
> +++ b/arch/x86/kernel/paravirt-spinlocks.c
> @@ -33,6 +33,8 @@ bool pv_is_native_vcpu_is_preempted(void)
>               __raw_callee_save___native_vcpu_is_preempted;
>  }
>  
> +DECLARE_STATIC_KEY_FALSE(preemted_key);
> +
>  void __init paravirt_set_cap(void)
>  {
>       if (!pv_is_native_spin_unlock())
> @@ -40,4 +42,6 @@ void __init paravirt_set_cap(void)
>  
>       if (!pv_is_native_vcpu_is_preempted())
>               setup_force_cpu_cap(X86_FEATURE_VCPUPREEMPT);
> +     else
> +             static_branch_enable(&preemted_key);
>  }

At least for x86 it makes sense to have the static_key default the other
way around. That is, enable it along with vcpu_is_preempted().

> diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> index d5610ad52..a8798e701 100644
> --- a/kernel/locking/osq_lock.c
> +++ b/kernel/locking/osq_lock.c
> @@ -22,9 +22,16 @@ static inline int encode_cpu(int cpu_nr)
>       return cpu_nr + 1;
>  }
>  
> +DEFINE_STATIC_KEY_FALSE(preemted_key);
> +
>  static inline int node_cpu(struct optimistic_spin_node *node)
>  {
> -     return node->cpu - 1;
> +     int cpu = 0;
> +
> +     if (!static_branch_unlikely(&preemted_key))
> +             cpu = node->cpu - 1;
> +
> +     return cpu;
>  }

Would not something like:

static inline bool
vcpu_is_preempted_node(struct optimistic_spin_node *node)
{
        if (!static_branch_unlikely(&vcpu_has_preemption))
                return false;

        return vcpu_is_preempted(node_cpu(node->prev));
}

And then use that like:

        if (smp_cond_load_relaxed(&node->locked, VAL || need_resched() ||
                                  vcpu_is_preempted_node(node)))

Not generate better code still?
_______________________________________________
Virtualization mailing list
[email protected]
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Reply via email to