tlb: Flush remote and local TLBs concurrently

Nadav Amit via Virtualization Wed, 03 Jul 2019 10:02:45 -0700

> On Jul 3, 2019, at 7:04 AM, Juergen Gross <[email protected]> wrote:
> 
> On 03.07.19 01:51, Nadav Amit wrote:
>> To improve TLB shootdown performance, flush the remote and local TLBs
>> concurrently. Introduce flush_tlb_multi() that does so. Introduce
>> paravirtual versions of flush_tlb_multi() for KVM, Xen and Hyper-V (Xen
>> and Hyper-V are only compile-tested).
>> While the updated smp infrastructure is capable of running a function on
>> a single local core, it is not optimized for this case. The multiple
>> function calls and the indirect branch introduce some overhead, and
>> might make local TLB flushes slower than they were before the recent
>> changes.
>> Before calling the SMP infrastructure, check if only a local TLB flush
>> is needed to restore the lost performance in this common case. This
>> requires checking mm_cpumask() one more time, but unless this mask is
>> updated very frequently, this should not impact performance negatively.
>> Cc: "K. Y. Srinivasan" <[email protected]>
>> Cc: Haiyang Zhang <[email protected]>
>> Cc: Stephen Hemminger <[email protected]>
>> Cc: Sasha Levin <[email protected]>
>> Cc: Thomas Gleixner <[email protected]>
>> Cc: Ingo Molnar <[email protected]>
>> Cc: Borislav Petkov <[email protected]>
>> Cc: [email protected]
>> Cc: Juergen Gross <[email protected]>
>> Cc: Paolo Bonzini <[email protected]>
>> Cc: Dave Hansen <[email protected]>
>> Cc: Andy Lutomirski <[email protected]>
>> Cc: Peter Zijlstra <[email protected]>
>> Cc: Boris Ostrovsky <[email protected]>
>> Cc: [email protected]
>> Cc: [email protected]
>> Cc: [email protected]
>> Cc: [email protected]
>> Cc: [email protected]
>> Signed-off-by: Nadav Amit <[email protected]>
>> ---
>>  arch/x86/hyperv/mmu.c                 | 13 +++---
>>  arch/x86/include/asm/paravirt.h       |  6 +--
>>  arch/x86/include/asm/paravirt_types.h |  4 +-
>>  arch/x86/include/asm/tlbflush.h       |  9 ++--
>>  arch/x86/include/asm/trace/hyperv.h   |  2 +-
>>  arch/x86/kernel/kvm.c                 | 11 +++--
>>  arch/x86/kernel/paravirt.c            |  2 +-
>>  arch/x86/mm/tlb.c                     | 65 ++++++++++++++++++++-------
>>  arch/x86/xen/mmu_pv.c                 | 20 ++++++---
>>  include/trace/events/xen.h            |  2 +-
>>  10 files changed, 91 insertions(+), 43 deletions(-)
> 
> ...
> 
>> diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
>> index beb44e22afdf..19e481e6e904 100644
>> --- a/arch/x86/xen/mmu_pv.c
>> +++ b/arch/x86/xen/mmu_pv.c
>> @@ -1355,8 +1355,8 @@ static void xen_flush_tlb_one_user(unsigned long addr)
>>      preempt_enable();
>>  }
>> 
>> -static void xen_flush_tlb_others(const struct cpumask *cpus,
>> -                             const struct flush_tlb_info *info)
>> +static void xen_flush_tlb_multi(const struct cpumask *cpus,
>> +                            const struct flush_tlb_info *info)
>>  {
>>      struct {
>>              struct mmuext_op op;
>> @@ -1366,7 +1366,7 @@ static void xen_flush_tlb_others(const struct cpumask *cpus,
>>      const size_t mc_entry_size = sizeof(args->op) +
>>              sizeof(args->mask[0]) * BITS_TO_LONGS(num_possible_cpus());
>> 
>> -    trace_xen_mmu_flush_tlb_others(cpus, info->mm, info->start, info->end);
>> +    trace_xen_mmu_flush_tlb_multi(cpus, info->mm, info->start, info->end);
>>      if (cpumask_empty(cpus))
>>              return;         /* nothing to do */
>> @@ -1375,9 +1375,17 @@ static void xen_flush_tlb_others(const struct cpumask *cpus,
>>      args = mcs.args;
>>      args->op.arg2.vcpumask = to_cpumask(args->mask);
>> 
>> -    /* Remove us, and any offline CPUS. */
>> +    /* Flush locally if needed and remove us */
>> +    if (cpumask_test_cpu(smp_processor_id(), to_cpumask(args->mask))) {
>> +            local_irq_disable();
>> +            flush_tlb_func_local(info);
> 
> I think this isn't the correct function for PV guests.
> 
> In fact it should be much easier: just don't clear the current CPU from
> the mask; that's all that's needed. The hypervisor is fine with the
> current CPU being in the mask and will do the right thing.


Thanks. I will do so in v3. Unfortunately, I don’t think the Hyper-V people
would want to do the same, since it would induce VM-exits on TLB flushes.
But if they do, I will be able to avoid exposing flush_tlb_func_local().
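
To illustrate the suggestion above, here is a minimal userspace sketch of the
mask handling, not the actual kernel change. It models a cpumask as a 64-bit
word, and every name in it (cpumask_model_t, hypercall_mask_others(),
hypercall_mask_multi(), needs_separate_local_flush()) is hypothetical. It
assumes, as stated in the review, that the hypervisor flushes the calling
vCPU when that vCPU is left in the mask.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model: one bit per CPU, at most 64 CPUs. */
typedef uint64_t cpumask_model_t;

static cpumask_model_t cpu_bit(unsigned int cpu)
{
	assert(cpu < 64);
	return (cpumask_model_t)1 << cpu;
}

/*
 * Old "others" behaviour: drop offline CPUs and the calling CPU from the
 * mask handed to the hypervisor. The caller must flush its own TLB itself.
 */
static cpumask_model_t hypercall_mask_others(cpumask_model_t requested,
					     cpumask_model_t online,
					     unsigned int self)
{
	return requested & online & ~cpu_bit(self);
}

/*
 * Suggested "multi" behaviour: drop only offline CPUs and keep the calling
 * CPU in the mask (assumption: the hypervisor then flushes it as well).
 */
static cpumask_model_t hypercall_mask_multi(cpumask_model_t requested,
					    cpumask_model_t online)
{
	return requested & online;
}

/* Nonzero if the caller was asked to flush itself but the mask omits it. */
static int needs_separate_local_flush(cpumask_model_t requested,
				      cpumask_model_t hypercall_mask,
				      unsigned int self)
{
	return (requested & cpu_bit(self)) && !(hypercall_mask & cpu_bit(self));
}
```

With the calling CPU kept in the mask, needs_separate_local_flush() returns
zero, which is why the Xen path would not need flush_tlb_func_local() at all.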


Re: [PATCH v2 4/9] x86/mm/tlb: Flush remote and local TLBs concurrently