Re: [tip:x86/platform] x86/hyper-v: Use hypercall for remote TLB flush
Boris Ostrovsky writes:

> On 08/16/2017 12:42 PM, Vitaly Kuznetsov wrote:
>> Vitaly Kuznetsov writes:
>>
>>> In case we decide to go HAVE_RCU_TABLE_FREE for all PARAVIRT-enabled
>>> kernels (as it seems to be the easiest/fastest way to fix Xen PV) - what
>>> do you think about the required testing? Any suggestion for a
>>> specifically crafted micro benchmark in addition to standard
>>> ebizzy/kernbench/...?
>>
>> In the meantime I tested HAVE_RCU_TABLE_FREE with kernbench (enablement
>> patch I used is attached; I know that it breaks other architectures) on
>> bare metal with PARAVIRT enabled in config. The results are:
>>
>> ...
>>
>> As you can see, there's no notable difference. I'll think of a
>> microbenchmark though.
>
> FWIW, I was about to send a very similar patch (but with only Xen-PV
> enabling RCU-based free by default) and saw similar results with
> kernbench, both Xen PV and baremetal.

Thanks for the confirmation, I'd go with enabling it for PARAVIRT as we
will need it for Hyper-V too.

>> #if CONFIG_PGTABLE_LEVELS > 4
>> void ___p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d)
>> {
>> 	paravirt_release_p4d(__pa(p4d) >> PAGE_SHIFT);
>> +#ifdef CONFIG_HAVE_RCU_TABLE_FREE
>> +	tlb_remove_table(tlb, virt_to_page(p4d));
>> +#else
>> 	tlb_remove_page(tlb, virt_to_page(p4d));
>> +#endif
>
> This can probably be factored out.
>
>> }
>> #endif /* CONFIG_PGTABLE_LEVELS > 4 */
>> #endif /* CONFIG_PGTABLE_LEVELS > 3 */
>> diff --git a/mm/memory.c b/mm/memory.c
>> index e158f7ac6730..18d6671b6ae2 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -329,6 +329,11 @@ bool __tlb_remove_page_size(struct mmu_gather *tlb, struct page *page, int page_
>>  * See the comment near struct mmu_table_batch.
>>  */
>>
>> +static void __tlb_remove_table(void *table)
>> +{
>> +	free_page_and_swap_cache(table);
>> +}
>> +
>
> This needs to be a per-arch routine (e.g. see arch/arm64/include/asm/tlb.h).

Yea, this was a quick-and-dirty x86-only patch.

--
Vitaly
Re: [tip:x86/platform] x86/hyper-v: Use hypercall for remote TLB flush
On 08/16/2017 12:42 PM, Vitaly Kuznetsov wrote: > Vitaly Kuznetsov writes: > >> Peter Zijlstra writes: >> >>> On Fri, Aug 11, 2017 at 09:16:29AM -0700, Linus Torvalds wrote: On Fri, Aug 11, 2017 at 2:03 AM, Peter Zijlstra wrote: > I'm sure we talked about using HAVE_RCU_TABLE_FREE for x86 (and yes that > would make it work again), but this was some years ago and I cannot > readily find those emails. I think the only time we really talked about HAVE_RCU_TABLE_FREE for x86 (at least that I was cc'd on) was not because of RCU freeing, but because we just wanted to use the generic page table lookup code on x86 *despite* not using RCU freeing. And we just ended up renaming HAVE_GENERIC_RCU_GUP as HAVE_GENERIC_GUP. There was only passing mention of maybe making x86 use RCU, but the discussion was really about why the IF flag meant that x86 didn't need to, iirc. I don't recall us ever discussing *really* making x86 use RCU. >>> Google finds me this: >>> >>> https://lwn.net/Articles/500188/ >>> >>> Which includes: >>> >>> http://www.mail-archive.com/kvm@vger.kernel.org/msg72918.html >>> >>> which does as was suggested here, selects HAVE_RCU_TABLE_FREE for >>> PARAVIRT_TLB_FLUSH. >>> >>> But yes, this is very much virt specific nonsense, native would never >>> need this. >> In case we decide to go HAVE_RCU_TABLE_FREE for all PARAVIRT-enabled >> kernels (as it seems to be the easiest/fastest way to fix Xen PV) - what >> do you think about the required testing? Any suggestion for a >> specifically crafted micro benchmark in addition to standard >> ebizzy/kernbench/...? > In the meantime I tested HAVE_RCU_TABLE_FREE with kernbench (enablement > patch I used is attached; I know that it breaks other architectures) on > bare metal with PARAVIRT enabled in config. 
> The results are:
>
> 6-CPU host:
>
> Average Half load -j 3 Run (std deviation):
>          CURRENT                              HAVE_RCU_TABLE_FREE
>          =======                              ===================
> Elapsed Time     400.498 (0.179679)   Elapsed Time     399.909 (0.162853)
> User Time        1098.72 (0.278536)   User Time        1097.59 (0.283894)
> System Time      100.301 (0.201629)   System Time      99.736 (0.196254)
> Percent CPU      299 (0)              Percent CPU      299 (0)
> Context Switches 5774.1 (69.2121)     Context Switches 5744.4 (79.4162)
> Sleeps           87621.2 (78.1093)    Sleeps           87586.1 (99.7079)
>
> Average Optimal load -j 24 Run (std deviation):
>          CURRENT                              HAVE_RCU_TABLE_FREE
>          =======                              ===================
> Elapsed Time     219.03 (0.652534)    Elapsed Time     218.959 (0.598674)
> User Time        1119.51 (21.3284)    User Time        1118.81 (21.7793)
> System Time      100.499 (0.389308)   System Time      99.8335 (0.251423)
> Percent CPU      432.5 (136.974)      Percent CPU      432.45 (136.922)
> Context Switches 81827.4 (78029.5)    Context Switches 81818.5 (78051)
> Sleeps           97124.8 (9822.4)     Sleeps           97207.9 (9955.04)
>
> 16-CPU host:
>
> Average Half load -j 8 Run (std deviation):
>          CURRENT                              HAVE_RCU_TABLE_FREE
>          =======                              ===================
> Elapsed Time     213.538 (3.7891)     Elapsed Time     212.5 (3.10939)
> User Time        1306.4 (1.83399)     User Time        1307.65 (1.01364)
> System Time      194.59 (0.864378)    System Time      195.478 (0.794588)
> Percent CPU      702.6 (13.5388)      Percent CPU      707 (11.1131)
> Context Switches 21189.2 (1199.4)     Context Switches 21288.2 (552.388)
> Sleeps           89390.2 (482.325)    Sleeps           89677 (277.06)
>
> Average Optimal load -j 64 Run (std deviation):
>          CURRENT                              HAVE_RCU_TABLE_FREE
>          =======                              ===================
> Elapsed Time     137.866 (0.787928)   Elapsed Time     138.438 (0.218792)
> User Time        1488.92 (192.399)    User Time        1489.92 (192.135)
> System Time      234.981 (42.5806)    System Time      236.09 (42.8138)
> Percent CPU      1057.1 (373.826)     Percent CPU      1057.1 (369.114)
> Context Switches 187514 (175324)      Context Switches 187358 (175060)
> Sleeps           112633 (24535.5)     Sleeps           111743 (23297.6)
>
> As you can see, there's no notable difference. I'll think of a
> microbenchmark though.
FWIW, I was about to send a very similar patch (but with only Xen-PV
enabling RCU-based free by default) and saw similar results with
kernbench, both Xen PV and baremetal.

>> Additionally, I see another option for us: enable 'rcu table free' on
>> boot (e.g. by taking tlb_remove_table to pv_ops and doing boot-time
>> patching for it) so bare metal and other hypervisors are not affected
>> by the change.
>
> It seems there's no need for that and we can keep things simple...
>
> -- Vitaly
>
> 0001-x86-enable-RCU-based-table-free-w
Re: [tip:x86/platform] x86/hyper-v: Use hypercall for remote TLB flush
Vitaly Kuznetsov writes: > Peter Zijlstra writes: > >> On Fri, Aug 11, 2017 at 09:16:29AM -0700, Linus Torvalds wrote: >>> On Fri, Aug 11, 2017 at 2:03 AM, Peter Zijlstra >>> wrote: >>> > >>> > I'm sure we talked about using HAVE_RCU_TABLE_FREE for x86 (and yes that >>> > would make it work again), but this was some years ago and I cannot >>> > readily find those emails. >>> >>> I think the only time we really talked about HAVE_RCU_TABLE_FREE for >>> x86 (at least that I was cc'd on) was not because of RCU freeing, but >>> because we just wanted to use the generic page table lookup code on >>> x86 *despite* not using RCU freeing. >>> >>> And we just ended up renaming HAVE_GENERIC_RCU_GUP as HAVE_GENERIC_GUP. >>> >>> There was only passing mention of maybe making x86 use RCU, but the >>> discussion was really about why the IF flag meant that x86 didn't need >>> to, iirc. >>> >>> I don't recall us ever discussing *really* making x86 use RCU. >> >> Google finds me this: >> >> https://lwn.net/Articles/500188/ >> >> Which includes: >> >> http://www.mail-archive.com/kvm@vger.kernel.org/msg72918.html >> >> which does as was suggested here, selects HAVE_RCU_TABLE_FREE for >> PARAVIRT_TLB_FLUSH. >> >> But yes, this is very much virt specific nonsense, native would never >> need this. > > In case we decide to go HAVE_RCU_TABLE_FREE for all PARAVIRT-enabled > kernels (as it seems to be the easiest/fastest way to fix Xen PV) - what > do you think about the required testing? Any suggestion for a > specifically crafted micro benchmark in addition to standard > ebizzy/kernbench/...? In the meantime I tested HAVE_RCU_TABLE_FREE with kernbench (enablement patch I used is attached; I know that it breaks other architectures) on bare metal with PARAVIRT enabled in config. 
The results are:

6-CPU host:

Average Half load -j 3 Run (std deviation):
         CURRENT                              HAVE_RCU_TABLE_FREE
         =======                              ===================
Elapsed Time     400.498 (0.179679)   Elapsed Time     399.909 (0.162853)
User Time        1098.72 (0.278536)   User Time        1097.59 (0.283894)
System Time      100.301 (0.201629)   System Time      99.736 (0.196254)
Percent CPU      299 (0)              Percent CPU      299 (0)
Context Switches 5774.1 (69.2121)     Context Switches 5744.4 (79.4162)
Sleeps           87621.2 (78.1093)    Sleeps           87586.1 (99.7079)

Average Optimal load -j 24 Run (std deviation):
         CURRENT                              HAVE_RCU_TABLE_FREE
         =======                              ===================
Elapsed Time     219.03 (0.652534)    Elapsed Time     218.959 (0.598674)
User Time        1119.51 (21.3284)    User Time        1118.81 (21.7793)
System Time      100.499 (0.389308)   System Time      99.8335 (0.251423)
Percent CPU      432.5 (136.974)      Percent CPU      432.45 (136.922)
Context Switches 81827.4 (78029.5)    Context Switches 81818.5 (78051)
Sleeps           97124.8 (9822.4)     Sleeps           97207.9 (9955.04)

16-CPU host:

Average Half load -j 8 Run (std deviation):
         CURRENT                              HAVE_RCU_TABLE_FREE
         =======                              ===================
Elapsed Time     213.538 (3.7891)     Elapsed Time     212.5 (3.10939)
User Time        1306.4 (1.83399)     User Time        1307.65 (1.01364)
System Time      194.59 (0.864378)    System Time      195.478 (0.794588)
Percent CPU      702.6 (13.5388)      Percent CPU      707 (11.1131)
Context Switches 21189.2 (1199.4)     Context Switches 21288.2 (552.388)
Sleeps           89390.2 (482.325)    Sleeps           89677 (277.06)

Average Optimal load -j 64 Run (std deviation):
         CURRENT                              HAVE_RCU_TABLE_FREE
         =======                              ===================
Elapsed Time     137.866 (0.787928)   Elapsed Time     138.438 (0.218792)
User Time        1488.92 (192.399)    User Time        1489.92 (192.135)
System Time      234.981 (42.5806)    System Time      236.09 (42.8138)
Percent CPU      1057.1 (373.826)     Percent CPU      1057.1 (369.114)
Context Switches 187514 (175324)      Context Switches 187358 (175060)
Sleeps           112633 (24535.5)     Sleeps           111743 (23297.6)

As you can see, there's no notable difference. I'll think of a
microbenchmark though.

> Additionally, I see another option for us: enable 'rcu table free' on
> boot (e.g. by taking tlb_remove_table to pv_ops and doing boot-time
> patching for it) so bare metal and other hypervisors are not affected
> by the change.

It seems there's no need for that and we can keep things simple...

--
Vitaly

>From daf5117706920aebe793d1239fccac2edd4d680c Mon Sep 17 00:00:00 2001
From: Vitaly Kuznetsov
Date: Mon, 14 Aug 2017 16:05:05 +0200
Subject: [PATCH] x86: enable RCU based table free when PARAVIRT

Signed-off-by: Vitaly Kuznetsov
---
 arch/x86/Kconfig      |  1 +
 arch/x86/mm/pgtable.c | 16 ++++++++++++++++
 mm/memory.c           |  5 +++++
 3 files changed, 22 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig in
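[ The attachment is cut off above. Going by the diffstat, the Kconfig part
of such an enablement would plausibly be a single conditional select; this
is a hedged sketch of the likely shape, not the truncated hunk itself: ]

```
config X86
	...
	select HAVE_RCU_TABLE_FREE		if PARAVIRT
```

With that selected, the generic table-batching code guarded by
CONFIG_HAVE_RCU_TABLE_FREE in mm/memory.c is compiled in, and the x86
pgtable free paths can call tlb_remove_table() instead of
tlb_remove_page().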
Re: [Xen-devel] [tip:x86/platform] x86/hyper-v: Use hypercall for remote TLB flush
On Fri, 11 Aug 2017 14:07:14 +0200 Peter Zijlstra wrote:

> It goes like:
>
> 	CPU0			CPU1
>
> 	unhook page		cli
> 				traverse page tables
> 	TLB invalidate --->
> 				sti
> 				TLB invalidate
> 			<--	complete

I guess the important part here is the above "complete". CPU0 doesn't
proceed until it receives it. Thus it does act like cli ~ rcu_read_lock(),
sti ~ rcu_read_unlock(), and "TLB invalidate" is equivalent to
synchronize_rcu().

[ this response is for clarification for the casual observer of this
thread ;-) ]

-- Steve

> 	free page
>
> So the CPU1 page-table walker gets an existence guarantee of the
> page-tables by clearing IF.
Re: [tip:x86/platform] x86/hyper-v: Use hypercall for remote TLB flush
Peter Zijlstra writes: > On Fri, Aug 11, 2017 at 09:16:29AM -0700, Linus Torvalds wrote: >> On Fri, Aug 11, 2017 at 2:03 AM, Peter Zijlstra wrote: >> > >> > I'm sure we talked about using HAVE_RCU_TABLE_FREE for x86 (and yes that >> > would make it work again), but this was some years ago and I cannot >> > readily find those emails. >> >> I think the only time we really talked about HAVE_RCU_TABLE_FREE for >> x86 (at least that I was cc'd on) was not because of RCU freeing, but >> because we just wanted to use the generic page table lookup code on >> x86 *despite* not using RCU freeing. >> >> And we just ended up renaming HAVE_GENERIC_RCU_GUP as HAVE_GENERIC_GUP. >> >> There was only passing mention of maybe making x86 use RCU, but the >> discussion was really about why the IF flag meant that x86 didn't need >> to, iirc. >> >> I don't recall us ever discussing *really* making x86 use RCU. > > Google finds me this: > > https://lwn.net/Articles/500188/ > > Which includes: > > http://www.mail-archive.com/kvm@vger.kernel.org/msg72918.html > > which does as was suggested here, selects HAVE_RCU_TABLE_FREE for > PARAVIRT_TLB_FLUSH. > > But yes, this is very much virt specific nonsense, native would never > need this. In case we decide to go HAVE_RCU_TABLE_FREE for all PARAVIRT-enabled kernels (as it seems to be the easiest/fastest way to fix Xen PV) - what do you think about the required testing? Any suggestion for a specifically crafted micro benchmark in addition to standard ebizzy/kernbench/...? Additionally, I see another option for us: enable 'rcu table free' on boot (e.g. by taking tlb_remove_table to pv_ops and doing boot-time patching for it) so bare metal and other hypervisors are not affected by the change. -- Vitaly
Re: [tip:x86/platform] x86/hyper-v: Use hypercall for remote TLB flush
On Fri, Aug 11, 2017 at 09:16:29AM -0700, Linus Torvalds wrote: > On Fri, Aug 11, 2017 at 2:03 AM, Peter Zijlstra wrote: > > > > I'm sure we talked about using HAVE_RCU_TABLE_FREE for x86 (and yes that > > would make it work again), but this was some years ago and I cannot > > readily find those emails. > > I think the only time we really talked about HAVE_RCU_TABLE_FREE for > x86 (at least that I was cc'd on) was not because of RCU freeing, but > because we just wanted to use the generic page table lookup code on > x86 *despite* not using RCU freeing. > > And we just ended up renaming HAVE_GENERIC_RCU_GUP as HAVE_GENERIC_GUP. > > There was only passing mention of maybe making x86 use RCU, but the > discussion was really about why the IF flag meant that x86 didn't need > to, iirc. > > I don't recall us ever discussing *really* making x86 use RCU. Google finds me this: https://lwn.net/Articles/500188/ Which includes: http://www.mail-archive.com/kvm@vger.kernel.org/msg72918.html which does as was suggested here, selects HAVE_RCU_TABLE_FREE for PARAVIRT_TLB_FLUSH. But yes, this is very much virt specific nonsense, native would never need this.
Re: [tip:x86/platform] x86/hyper-v: Use hypercall for remote TLB flush
On Fri, Aug 11, 2017 at 2:03 AM, Peter Zijlstra wrote: > > I'm sure we talked about using HAVE_RCU_TABLE_FREE for x86 (and yes that > would make it work again), but this was some years ago and I cannot > readily find those emails. I think the only time we really talked about HAVE_RCU_TABLE_FREE for x86 (at least that I was cc'd on) was not because of RCU freeing, but because we just wanted to use the generic page table lookup code on x86 *despite* not using RCU freeing. And we just ended up renaming HAVE_GENERIC_RCU_GUP as HAVE_GENERIC_GUP. There was only passing mention of maybe making x86 use RCU, but the discussion was really about why the IF flag meant that x86 didn't need to, iirc. I don't recall us ever discussing *really* making x86 use RCU. Linus
Re: [tip:x86/platform] x86/hyper-v: Use hypercall for remote TLB flush
On Fri, Aug 11, 2017 at 03:07:29PM +0200, Juergen Gross wrote: > On 11/08/17 14:54, Peter Zijlstra wrote: > > On Fri, Aug 11, 2017 at 02:46:41PM +0200, Juergen Gross wrote: > >> Aah, okay. Now I understand the problem. The TLB isn't the issue but the > >> IPI is serving two purposes here: TLB flushing (which is allowed to > >> happen at any time) and serialization regarding access to critical pages > >> (which seems to be broken in the Xen case as you suggest). > > > > Indeed, and now hyper-v as well. > > Is it possible to distinguish between non-critical calls of > flush_tlb_others() (which should be the majority IMHO) and critical ones > regarding above problem? I guess the only problem is the case when a > page table can be freed because its last valid entry is gone, right? > > We might want to add a serialization flag to indicate flushing _and_ > serialization via IPI should be performed. Possible, but not trivial. Esp things like transparent huge pages, which swizzles PMDs around makes things tricky. The by far easiest solution is to switch over to HAVE_RCU_TABLE_FREE when either Xen or Hyper-V is doing this. Ideally it would not have a significant performance hit (needs testing) and we can simply always do this when PARAVIRT, or otherwise we need to get creative with static_keys or something.
Re: [tip:x86/platform] x86/hyper-v: Use hypercall for remote TLB flush
On 11/08/17 14:54, Peter Zijlstra wrote: > On Fri, Aug 11, 2017 at 02:46:41PM +0200, Juergen Gross wrote: >> Aah, okay. Now I understand the problem. The TLB isn't the issue but the >> IPI is serving two purposes here: TLB flushing (which is allowed to >> happen at any time) and serialization regarding access to critical pages >> (which seems to be broken in the Xen case as you suggest). > > Indeed, and now hyper-v as well. Is it possible to distinguish between non-critical calls of flush_tlb_others() (which should be the majority IMHO) and critical ones regarding above problem? I guess the only problem is the case when a page table can be freed because its last valid entry is gone, right? We might want to add a serialization flag to indicate flushing _and_ serialization via IPI should be performed. Juergen
Re: [tip:x86/platform] x86/hyper-v: Use hypercall for remote TLB flush
On Fri, Aug 11, 2017 at 02:46:41PM +0200, Juergen Gross wrote: > Aah, okay. Now I understand the problem. The TLB isn't the issue but the > IPI is serving two purposes here: TLB flushing (which is allowed to > happen at any time) and serialization regarding access to critical pages > (which seems to be broken in the Xen case as you suggest). Indeed, and now hyper-v as well.
Re: [tip:x86/platform] x86/hyper-v: Use hypercall for remote TLB flush
On 11/08/17 14:35, Peter Zijlstra wrote: > On Fri, Aug 11, 2017 at 02:22:25PM +0200, Juergen Gross wrote: >> Wait - the TLB can be cleared at any time, as Andrew was pointing out. >> No cpu can rely on an address being accessible just because IF is being >> cleared. All that matters is the existing and valid page table entry. >> >> So clearing IF on a cpu isn't meant to secure the TLB from being >> cleared, but just to avoid interrupts (as the name of the flag is >> suggesting). > > Yes, but by holding off the TLB invalidate IPI, we hold off the freeing > of the concurrently unhooked page-table. > >> In the Xen case the hypervisor does the following: >> >> - it checks whether any of the vcpus specified in the cpumask of the >> flush request is running on any physical cpu >> - if any running vcpu is found an IPI will be sent to the physical cpu >> and the hypervisor will do the TLB flush there > > And this will preempt a vcpu which could have IF cleared, right? > >> - any vcpu addressed by the flush and not running will be flagged to >> flush its TLB when being scheduled the next time >> >> This ensures no TLB entry to be flushed can be used after return of >> xen_flush_tlb_others(). > > But that is not a sufficient guarantee. We need the IF to hold off the > TLB invalidate and thereby hold off the freeing of our page-table pages. Aah, okay. Now I understand the problem. The TLB isn't the issue but the IPI is serving two purposes here: TLB flushing (which is allowed to happen at any time) and serialization regarding access to critical pages (which seems to be broken in the Xen case as you suggest). Juergen >
Re: [tip:x86/platform] x86/hyper-v: Use hypercall for remote TLB flush
On Fri, Aug 11, 2017 at 02:22:25PM +0200, Juergen Gross wrote: > Wait - the TLB can be cleared at any time, as Andrew was pointing out. > No cpu can rely on an address being accessible just because IF is being > cleared. All that matters is the existing and valid page table entry. > > So clearing IF on a cpu isn't meant to secure the TLB from being > cleared, but just to avoid interrupts (as the name of the flag is > suggesting). Yes, but by holding off the TLB invalidate IPI, we hold off the freeing of the concurrently unhooked page-table. > In the Xen case the hypervisor does the following: > > - it checks whether any of the vcpus specified in the cpumask of the > flush request is running on any physical cpu > - if any running vcpu is found an IPI will be sent to the physical cpu > and the hypervisor will do the TLB flush there And this will preempt a vcpu which could have IF cleared, right? > - any vcpu addressed by the flush and not running will be flagged to > flush its TLB when being scheduled the next time > > This ensures no TLB entry to be flushed can be used after return of > xen_flush_tlb_others(). But that is not a sufficient guarantee. We need the IF to hold off the TLB invalidate and thereby hold off the freeing of our page-table pages.
Re: [tip:x86/platform] x86/hyper-v: Use hypercall for remote TLB flush
On 11/08/17 12:56, Peter Zijlstra wrote: > On Fri, Aug 11, 2017 at 11:23:10AM +0200, Vitaly Kuznetsov wrote: >> Peter Zijlstra writes: >> >>> On Thu, Aug 10, 2017 at 07:08:22PM +, Jork Loeser wrote: >>> >>>>>> Subject: Re: [tip:x86/platform] x86/hyper-v: Use hypercall for remote >>>>>> TLB flush >>>> >>>>>> Hold on.. if we don't IPI for TLB invalidation. What serializes our >>>>>> software page table walkers like fast_gup() ? >>>>> >>>>> Hypervisor may implement this functionality via an IPI. >>>>> >>>>> K. Y >>>> >>>> HvFlushVirtualAddressList() states: >>>> This call guarantees that by the time control returns back to the >>>> caller, the observable effects of all flushes on the specified virtual >>>> processors have occurred. >>>> >>>> HvFlushVirtualAddressListEx() refers to HvFlushVirtualAddressList() as >>>> adding sparse target VP lists. >>>> >>>> Is this enough of a guarantee, or do you see other races? >>> >>> That's nowhere near enough. We need the remote CPU to have completed any >>> guest IF section that was in progress at the time of the call. >>> >>> So if a host IPI can interrupt a guest while the guest has IF cleared, >>> and we then process the host IPI -- clear the TLBs -- before resuming the >>> guest, which still has IF cleared, we've got a problem. >>> >>> Because at that point, our software page-table walker, that relies on IF >>> being clear to guarantee the page-tables exist, because it holds off the >>> TLB invalidate and thereby the freeing of the pages, gets its pages >>> ripped out from under it. >> >> Oh, I see your concern. Hyper-V, however, is not the first x86 >> hypervisor trying to avoid IPIs on remote TLB flush, Xen does this >> too. Briefly looking at xen_flush_tlb_others() I don't see anything >> special, do we know how serialization is achieved there? > > No idea on how Xen works, I always just hope it goes away :-) But lets > ask some Xen folks. Wait - the TLB can be cleared at any time, as Andrew was pointing out. 
No cpu can rely on an address being accessible just because IF is being
cleared. All that matters is the existing and valid page table entry.

So clearing IF on a cpu isn't meant to secure the TLB from being
cleared, but just to avoid interrupts (as the name of the flag is
suggesting).

In the Xen case the hypervisor does the following:

- it checks whether any of the vcpus specified in the cpumask of the
  flush request is running on any physical cpu
- if any running vcpu is found an IPI will be sent to the physical cpu
  and the hypervisor will do the TLB flush there
- any vcpu addressed by the flush and not running will be flagged to
  flush its TLB when being scheduled the next time

This ensures no TLB entry to be flushed can be used after return of
xen_flush_tlb_others().

Juergen
Re: [Xen-devel] [tip:x86/platform] x86/hyper-v: Use hypercall for remote TLB flush
On Fri, Aug 11, 2017 at 12:05:45PM +0100, Andrew Cooper wrote:
> >> Oh, I see your concern. Hyper-V, however, is not the first x86
> >> hypervisor trying to avoid IPIs on remote TLB flush, Xen does this
> >> too. Briefly looking at xen_flush_tlb_others() I don't see anything
> >> special, do we know how serialization is achieved there?
> > No idea on how Xen works, I always just hope it goes away :-) But lets
> > ask some Xen folks.
>
> How is the software pagewalker relying on IF being clear safe at all (on
> native, let alone under virtualisation)? Hardware has no architectural
> requirement to keep entries in the TLB.

No, but it _can_, therefore when we unhook pages we _must_ invalidate.

It goes like:

	CPU0			CPU1

	unhook page		cli
				traverse page tables
	TLB invalidate --->
				sti
				TLB invalidate
			<--	complete
	free page

So the CPU1 page-table walker gets an existence guarantee of the
page-tables by clearing IF.

> In the virtualisation case, at any point the vcpu can be scheduled on a
> different pcpu even during a critical region like that, so the TLB
> really can empty itself under your feet.

Not the point.
Re: [tip:x86/platform] x86/hyper-v: Use hypercall for remote TLB flush
On Fri, Aug 11, 2017 at 11:03:36AM +0200, Peter Zijlstra wrote: > On Fri, Aug 11, 2017 at 01:15:18AM +, Jork Loeser wrote: > > > > > HvFlushVirtualAddressList() states: > > > > This call guarantees that by the time control returns back to the > > > > caller, the observable effects of all flushes on the specified virtual > > > > processors have occurred. > > > > > > > > HvFlushVirtualAddressListEx() refers to HvFlushVirtualAddressList() as > > > > adding > > > > sparse target VP lists. > > > > > > > > Is this enough of a guarantee, or do you see other races? > > > > > > That's nowhere near enough. We need the remote CPU to have completed any > > > guest IF section that was in progress at the time of the call. > > > > > > So if a host IPI can interrupt a guest while the guest has IF cleared, > > > and we then > > > process the host IPI -- clear the TLBs -- before resuming the guest, > > > which still has > > > IF cleared, we've got a problem. > > > > > > Because at that point, our software page-table walker, that relies on IF > > > being > > > clear to guarantee the page-tables exist, because it holds off the TLB > > > invalidate > > > and thereby the freeing of the pages, gets its pages ripped out from > > > under it. > > > > I see, IF is used as a locking mechanism for the pages. Would > > CONFIG_HAVE_RCU_TABLE_FREE be an option for x86? There are caveats > > (statically enabled, RCU for page-free), yet if the resulting perf is > > still a gain it would be worthwhile for Hyper-V targeted kernels. > > I'm sure we talked about using HAVE_RCU_TABLE_FREE for x86 (and yes that > would make it work again), but this was some years ago and I cannot > readily find those emails. > > Kirill would you have any opinions? I guess we can try this. The main question is what would be performance implications of such move. -- Kirill A. Shutemov
Re: [Xen-devel] [tip:x86/platform] x86/hyper-v: Use hypercall for remote TLB flush
On 11/08/17 11:56, Peter Zijlstra wrote: > On Fri, Aug 11, 2017 at 11:23:10AM +0200, Vitaly Kuznetsov wrote: >> Peter Zijlstra writes: >> >>> On Thu, Aug 10, 2017 at 07:08:22PM +, Jork Loeser wrote: >>> >>>>>> Subject: Re: [tip:x86/platform] x86/hyper-v: Use hypercall for remote >>>>>> TLB flush >>>>>> Hold on.. if we don't IPI for TLB invalidation. What serializes our >>>>>> software page table walkers like fast_gup() ? >>>>> Hypervisor may implement this functionality via an IPI. >>>>> >>>>> K. Y >>>> HvFlushVirtualAddressList() states: >>>> This call guarantees that by the time control returns back to the >>>> caller, the observable effects of all flushes on the specified virtual >>>> processors have occurred. >>>> >>>> HvFlushVirtualAddressListEx() refers to HvFlushVirtualAddressList() as >>>> adding sparse target VP lists. >>>> >>>> Is this enough of a guarantee, or do you see other races? >>> That's nowhere near enough. We need the remote CPU to have completed any >>> guest IF section that was in progress at the time of the call. >>> >>> So if a host IPI can interrupt a guest while the guest has IF cleared, >>> and we then process the host IPI -- clear the TLBs -- before resuming the >>> guest, which still has IF cleared, we've got a problem. >>> >>> Because at that point, our software page-table walker, that relies on IF >>> being clear to guarantee the page-tables exist, because it holds off the >>> TLB invalidate and thereby the freeing of the pages, gets its pages >>> ripped out from under it. >> Oh, I see your concern. Hyper-V, however, is not the first x86 >> hypervisor trying to avoid IPIs on remote TLB flush, Xen does this >> too. Briefly looking at xen_flush_tlb_others() I don't see anything >> special, do we know how serialization is achieved there? > No idea on how Xen works, I always just hope it goes away :-) But lets > ask some Xen folks. 
How is the software pagewalker relying on IF being clear safe at all (on native, let alone under virtualisation)? Hardware has no architectural requirement to keep entries in the TLB. In the virtualisation case, at any point the vcpu can be scheduled on a different pcpu even during a critical region like that, so the TLB really can empty itself under your feet. ~Andrew
Re: [tip:x86/platform] x86/hyper-v: Use hypercall for remote TLB flush
On Fri, Aug 11, 2017 at 11:23:10AM +0200, Vitaly Kuznetsov wrote: > Peter Zijlstra writes: > > > On Thu, Aug 10, 2017 at 07:08:22PM +, Jork Loeser wrote: > > > >> > > Subject: Re: [tip:x86/platform] x86/hyper-v: Use hypercall for remote > >> > > TLB flush > >> > >> > > Hold on.. if we don't IPI for TLB invalidation. What serializes our > >> > > software page table walkers like fast_gup() ? > >> > > >> > Hypervisor may implement this functionality via an IPI. > >> > > >> > K. Y > >> > >> HvFlushVirtualAddressList() states: > >> This call guarantees that by the time control returns back to the > >> caller, the observable effects of all flushes on the specified virtual > >> processors have occurred. > >> > >> HvFlushVirtualAddressListEx() refers to HvFlushVirtualAddressList() as > >> adding sparse target VP lists. > >> > >> Is this enough of a guarantee, or do you see other races? > > > > That's nowhere near enough. We need the remote CPU to have completed any > > guest IF section that was in progress at the time of the call. > > > > So if a host IPI can interrupt a guest while the guest has IF cleared, > > and we then process the host IPI -- clear the TLBs -- before resuming the > > guest, which still has IF cleared, we've got a problem. > > > > Because at that point, our software page-table walker, that relies on IF > > being clear to guarantee the page-tables exist, because it holds off the > > TLB invalidate and thereby the freeing of the pages, gets its pages > > ripped out from under it. > > Oh, I see your concern. Hyper-V, however, is not the first x86 > hypervisor trying to avoid IPIs on remote TLB flush, Xen does this > too. Briefly looking at xen_flush_tlb_others() I don't see anything > special, do we know how serialization is achieved there? No idea on how Xen works, I always just hope it goes away :-) But lets ask some Xen folks.
Re: [tip:x86/platform] x86/hyper-v: Use hypercall for remote TLB flush
Peter Zijlstra writes: > On Thu, Aug 10, 2017 at 07:08:22PM +, Jork Loeser wrote: > >> > > Subject: Re: [tip:x86/platform] x86/hyper-v: Use hypercall for remote >> > > TLB flush >> >> > > Hold on.. if we don't IPI for TLB invalidation. What serializes our >> > > software page table walkers like fast_gup() ? >> > >> > Hypervisor may implement this functionality via an IPI. >> > >> > K. Y >> >> HvFlushVirtualAddressList() states: >> This call guarantees that by the time control returns back to the >> caller, the observable effects of all flushes on the specified virtual >> processors have occurred. >> >> HvFlushVirtualAddressListEx() refers to HvFlushVirtualAddressList() as >> adding sparse target VP lists. >> >> Is this enough of a guarantee, or do you see other races? > > That's nowhere near enough. We need the remote CPU to have completed any > guest IF section that was in progress at the time of the call. > > So if a host IPI can interrupt a guest while the guest has IF cleared, > and we then process the host IPI -- clear the TLBs -- before resuming the > guest, which still has IF cleared, we've got a problem. > > Because at that point, our software page-table walker, that relies on IF > being clear to guarantee the page-tables exist, because it holds off the > TLB invalidate and thereby the freeing of the pages, gets its pages > ripped out from under it. Oh, I see your concern. Hyper-V, however, is not the first x86 hypervisor trying to avoid IPIs on remote TLB flush, Xen does this too. Briefly looking at xen_flush_tlb_others() I don't see anything special, do we know how serialization is achieved there? -- Vitaly
Re: [tip:x86/platform] x86/hyper-v: Use hypercall for remote TLB flush
On Fri, Aug 11, 2017 at 01:15:18AM +, Jork Loeser wrote:
> > > HvFlushVirtualAddressList() states:
> > > This call guarantees that by the time control returns back to the
> > > caller, the observable effects of all flushes on the specified virtual
> > > processors have occurred.
> > >
> > > HvFlushVirtualAddressListEx() refers to HvFlushVirtualAddressList() as
> > > adding sparse target VP lists.
> > >
> > > Is this enough of a guarantee, or do you see other races?
> >
> > That's nowhere near enough. We need the remote CPU to have completed any
> > guest IF section that was in progress at the time of the call.
> >
> > So if a host IPI can interrupt a guest while the guest has IF cleared, and
> > we then process the host IPI -- clear the TLBs -- before resuming the
> > guest, which still has IF cleared, we've got a problem.
> >
> > Because at that point, our software page-table walker, that relies on IF
> > being clear to guarantee the page-tables exist, because it holds off the
> > TLB invalidate and thereby the freeing of the pages, gets its pages
> > ripped out from under it.
>
> I see, IF is used as a locking mechanism for the pages. Would
> CONFIG_HAVE_RCU_TABLE_FREE be an option for x86? There are caveats
> (statically enabled, RCU for page-free), yet if the resulting perf is
> still a gain it would be worthwhile for Hyper-V targeted kernels.

I'm sure we talked about using HAVE_RCU_TABLE_FREE for x86 (and yes that
would make it work again), but this was some years ago and I cannot
readily find those emails.

Kirill, would you have any opinions?
RE: [tip:x86/platform] x86/hyper-v: Use hypercall for remote TLB flush
> -----Original Message-----
> From: Peter Zijlstra [mailto:pet...@infradead.org]
> Sent: Thursday, August 10, 2017 12:28
> Subject: Re: [tip:x86/platform] x86/hyper-v: Use hypercall for remote TLB
> flush
>
> > > > Hold on.. if we don't IPI for TLB invalidation. What serializes
> > > > our software page table walkers like fast_gup() ?
> > >
> > > Hypervisor may implement this functionality via an IPI.
> > >
> > > K. Y
> >
> > HvFlushVirtualAddressList() states:
> > This call guarantees that by the time control returns back to the
> > caller, the observable effects of all flushes on the specified virtual
> > processors have occurred.
> >
> > HvFlushVirtualAddressListEx() refers to HvFlushVirtualAddressList() as
> > adding sparse target VP lists.
> >
> > Is this enough of a guarantee, or do you see other races?
>
> That's nowhere near enough. We need the remote CPU to have completed any
> guest IF section that was in progress at the time of the call.
>
> So if a host IPI can interrupt a guest while the guest has IF cleared, and
> we then process the host IPI -- clear the TLBs -- before resuming the
> guest, which still has IF cleared, we've got a problem.
>
> Because at that point, our software page-table walker, that relies on IF
> being clear to guarantee the page-tables exist, because it holds off the
> TLB invalidate and thereby the freeing of the pages, gets its pages
> ripped out from under it.

I see, IF is used as a locking mechanism for the pages. Would
CONFIG_HAVE_RCU_TABLE_FREE be an option for x86? There are caveats
(statically enabled, RCU for page-free), yet if the resulting perf is
still a gain it would be worthwhile for Hyper-V targeted kernels.

Regards,
Jork
Re: [tip:x86/platform] x86/hyper-v: Use hypercall for remote TLB flush
On Thu, Aug 10, 2017 at 07:08:22PM +, Jork Loeser wrote:
> > > Subject: Re: [tip:x86/platform] x86/hyper-v: Use hypercall for remote TLB
> > > flush
>
> > > Hold on.. if we don't IPI for TLB invalidation. What serializes our
> > > software page table walkers like fast_gup() ?
> >
> > Hypervisor may implement this functionality via an IPI.
> >
> > K. Y
>
> HvFlushVirtualAddressList() states:
> This call guarantees that by the time control returns back to the
> caller, the observable effects of all flushes on the specified virtual
> processors have occurred.
>
> HvFlushVirtualAddressListEx() refers to HvFlushVirtualAddressList() as adding
> sparse target VP lists.
>
> Is this enough of a guarantee, or do you see other races?

That's nowhere near enough. We need the remote CPU to have completed any
guest IF section that was in progress at the time of the call.

So if a host IPI can interrupt a guest while the guest has IF cleared, and
we then process the host IPI -- clear the TLBs -- before resuming the
guest, which still has IF cleared, we've got a problem.

Because at that point, our software page-table walker, that relies on IF
being clear to guarantee the page-tables exist, because it holds off the
TLB invalidate and thereby the freeing of the pages, gets its pages
ripped out from under it.
RE: [tip:x86/platform] x86/hyper-v: Use hypercall for remote TLB flush
> -----Original Message-----
> From: KY Srinivasan
>
> > -----Original Message-----
> > From: Peter Zijlstra [mailto:pet...@infradead.org]
> > Sent: Thursday, August 10, 2017 11:57 AM
> > Subject: Re: [tip:x86/platform] x86/hyper-v: Use hypercall for remote
> > TLB flush
>
> > Hold on.. if we don't IPI for TLB invalidation. What serializes our
> > software page table walkers like fast_gup() ?
>
> Hypervisor may implement this functionality via an IPI.
>
> K. Y

HvFlushVirtualAddressList() states:
This call guarantees that by the time control returns back to the
caller, the observable effects of all flushes on the specified virtual
processors have occurred.

HvFlushVirtualAddressListEx() refers to HvFlushVirtualAddressList() as
adding sparse target VP lists.

Is this enough of a guarantee, or do you see other races?

Regards,
Jork
RE: [tip:x86/platform] x86/hyper-v: Use hypercall for remote TLB flush
> -----Original Message-----
> From: Peter Zijlstra [mailto:pet...@infradead.org]
> Sent: Thursday, August 10, 2017 11:57 AM
> Subject: Re: [tip:x86/platform] x86/hyper-v: Use hypercall for remote TLB
> flush
>
> On Thu, Aug 10, 2017 at 11:21:49AM -0700, tip-bot for Vitaly Kuznetsov
> wrote:
> > Commit-ID:  2ffd9e33ce4af4e8cfa3e17bf493defe8474e2eb
> > Gitweb:     http://git.kernel.org/tip/2ffd9e33ce4af4e8cfa3e17bf493defe8474e2eb
> > Author:     Vitaly Kuznetsov
> > AuthorDate: Wed, 2 Aug 2017 18:09:19 +0200
> > Committer:  Ingo Molnar
> > CommitDate: Thu, 10 Aug 2017 20:16:44 +0200
> >
> > x86/hyper-v: Use hypercall for remote TLB flush
> >
> > Hyper-V host can suggest us to use hypercall for doing remote TLB flush,
> > this is supposed to work faster than IPIs.
> >
> > Implementation details: to do HvFlushVirtualAddress{Space,List} hypercalls
> > we need to put the input somewhere in memory and we don't really want to
> > have memory allocation on each call so we pre-allocate per cpu memory
> > areas on boot.
> >
> > pv_ops patching is happening very early so we need to separate
> > hyperv_setup_mmu_ops() and hyper_alloc_mmu().
> >
> > It is possible and easy to implement local TLB flushing too and there is
> > even a hint for that. However, I don't see a room for optimization on the
> > host side as both hypercall and native tlb flush will result in vmexit. The
> > hint is also not set on modern Hyper-V versions.
>
> Hold on.. if we don't IPI for TLB invalidation. What serializes our
> software page table walkers like fast_gup() ?

Hypervisor may implement this functionality via an IPI.

K. Y
Re: [tip:x86/platform] x86/hyper-v: Use hypercall for remote TLB flush
On Thu, Aug 10, 2017 at 11:21:49AM -0700, tip-bot for Vitaly Kuznetsov wrote:
> Commit-ID:  2ffd9e33ce4af4e8cfa3e17bf493defe8474e2eb
> Gitweb:     http://git.kernel.org/tip/2ffd9e33ce4af4e8cfa3e17bf493defe8474e2eb
> Author:     Vitaly Kuznetsov
> AuthorDate: Wed, 2 Aug 2017 18:09:19 +0200
> Committer:  Ingo Molnar
> CommitDate: Thu, 10 Aug 2017 20:16:44 +0200
>
> x86/hyper-v: Use hypercall for remote TLB flush
>
> Hyper-V host can suggest us to use hypercall for doing remote TLB flush,
> this is supposed to work faster than IPIs.
>
> Implementation details: to do HvFlushVirtualAddress{Space,List} hypercalls
> we need to put the input somewhere in memory and we don't really want to
> have memory allocation on each call so we pre-allocate per cpu memory areas
> on boot.
>
> pv_ops patching is happening very early so we need to separate
> hyperv_setup_mmu_ops() and hyper_alloc_mmu().
>
> It is possible and easy to implement local TLB flushing too and there is
> even a hint for that. However, I don't see a room for optimization on the
> host side as both hypercall and native tlb flush will result in vmexit. The
> hint is also not set on modern Hyper-V versions.

Hold on.. if we don't IPI for TLB invalidation. What serializes our
software page table walkers like fast_gup() ?
[tip:x86/platform] x86/hyper-v: Use hypercall for remote TLB flush
Commit-ID:  2ffd9e33ce4af4e8cfa3e17bf493defe8474e2eb
Gitweb:     http://git.kernel.org/tip/2ffd9e33ce4af4e8cfa3e17bf493defe8474e2eb
Author:     Vitaly Kuznetsov
AuthorDate: Wed, 2 Aug 2017 18:09:19 +0200
Committer:  Ingo Molnar
CommitDate: Thu, 10 Aug 2017 20:16:44 +0200

x86/hyper-v: Use hypercall for remote TLB flush

Hyper-V host can suggest us to use hypercall for doing remote TLB flush,
this is supposed to work faster than IPIs.

Implementation details: to do HvFlushVirtualAddress{Space,List} hypercalls
we need to put the input somewhere in memory and we don't really want to
have memory allocation on each call so we pre-allocate per cpu memory areas
on boot.

pv_ops patching is happening very early so we need to separate
hyperv_setup_mmu_ops() and hyper_alloc_mmu().

It is possible and easy to implement local TLB flushing too and there is
even a hint for that. However, I don't see a room for optimization on the
host side as both hypercall and native tlb flush will result in vmexit. The
hint is also not set on modern Hyper-V versions.

Signed-off-by: Vitaly Kuznetsov
Reviewed-by: Andy Shevchenko
Reviewed-by: Stephen Hemminger
Cc: Andy Lutomirski
Cc: Haiyang Zhang
Cc: Jork Loeser
Cc: K. Y. Srinivasan
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Simon Xiao
Cc: Steven Rostedt
Cc: Thomas Gleixner
Cc: de...@linuxdriverproject.org
Link: http://lkml.kernel.org/r/20170802160921.21791-8-vkuzn...@redhat.com
Signed-off-by: Ingo Molnar
---
 arch/x86/hyperv/Makefile           | 2 +-
 arch/x86/hyperv/hv_init.c          | 2 +
 arch/x86/hyperv/mmu.c              | 138 +
 arch/x86/include/asm/mshyperv.h    | 3 +
 arch/x86/include/uapi/asm/hyperv.h | 7 ++
 arch/x86/kernel/cpu/mshyperv.c     | 1 +
 drivers/hv/Kconfig                 | 1 +
 7 files changed, 153 insertions(+), 1 deletion(-)

diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
index 171ae09..367a820 100644
--- a/arch/x86/hyperv/Makefile
+++ b/arch/x86/hyperv/Makefile
@@ -1 +1 @@
-obj-y := hv_init.o
+obj-y := hv_init.o mmu.o
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index e93b9a0..1a8eb55 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -140,6 +140,8 @@ void hyperv_init(void)
 	hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);
 	wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
 
+	hyper_alloc_mmu();
+
 	/*
 	 * Register Hyper-V specific clocksource.
 	 */
diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c
new file mode 100644
index 000..9419a20
--- /dev/null
+++ b/arch/x86/hyperv/mmu.c
@@ -0,0 +1,138 @@
+#define pr_fmt(fmt) "Hyper-V: " fmt
+
+#include
+#include
+#include
+#include
+
+#include
+#include
+#include
+#include
+
+/* HvFlushVirtualAddressSpace, HvFlushVirtualAddressList hypercalls */
+struct hv_flush_pcpu {
+	u64 address_space;
+	u64 flags;
+	u64 processor_mask;
+	u64 gva_list[];
+};
+
+/* Each gva in gva_list encodes up to 4096 pages to flush */
+#define HV_TLB_FLUSH_UNIT (4096 * PAGE_SIZE)
+
+static struct hv_flush_pcpu __percpu *pcpu_flush;
+
+/*
+ * Fills in gva_list starting from offset. Returns the number of items added.
+ */
+static inline int fill_gva_list(u64 gva_list[], int offset,
+				unsigned long start, unsigned long end)
+{
+	int gva_n = offset;
+	unsigned long cur = start, diff;
+
+	do {
+		diff = end > cur ? end - cur : 0;
+
+		gva_list[gva_n] = cur & PAGE_MASK;
+		/*
+		 * Lower 12 bits encode the number of additional
+		 * pages to flush (in addition to the 'cur' page).
+		 */
+		if (diff >= HV_TLB_FLUSH_UNIT)
+			gva_list[gva_n] |= ~PAGE_MASK;
+		else if (diff)
+			gva_list[gva_n] |= (diff - 1) >> PAGE_SHIFT;
+
+		cur += HV_TLB_FLUSH_UNIT;
+		gva_n++;
+
+	} while (cur < end);
+
+	return gva_n - offset;
+}
+
+static void hyperv_flush_tlb_others(const struct cpumask *cpus,
+				    const struct flush_tlb_info *info)
+{
+	int cpu, vcpu, gva_n, max_gvas;
+	struct hv_flush_pcpu *flush;
+	u64 status = U64_MAX;
+	unsigned long flags;
+
+	if (!pcpu_flush || !hv_hypercall_pg)
+		goto do_native;
+
+	if (cpumask_empty(cpus))
+		return;
+
+	local_irq_save(flags);
+
+	flush = this_cpu_ptr(pcpu_flush);
+
+	if (info->mm) {
+		flush->address_space = virt_to_phys(info->mm->pgd);
+		flush->flags = 0;
+	} else {
+		flush->address_space = 0;
+		flush->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
+	}
+
+	flush->processor_mask = 0;
+	if (cpumask_equal(cpus, cpu_present_mask)) {
+		flush->f
[tip:x86/platform] x86/hyper-v: Use hypercall for remote TLB flush
Commit-ID:  88b46342eb037d35decda4d651cfee5216f4f822
Gitweb:     http://git.kernel.org/tip/88b46342eb037d35decda4d651cfee5216f4f822
Author:     Vitaly Kuznetsov
AuthorDate: Wed, 2 Aug 2017 18:09:19 +0200
Committer:  Ingo Molnar
CommitDate: Thu, 10 Aug 2017 16:50:23 +0200

x86/hyper-v: Use hypercall for remote TLB flush

Hyper-V host can suggest us to use hypercall for doing remote TLB flush,
this is supposed to work faster than IPIs.

Implementation details: to do HvFlushVirtualAddress{Space,List} hypercalls
we need to put the input somewhere in memory and we don't really want to
have memory allocation on each call so we pre-allocate per cpu memory areas
on boot.

pv_ops patching is happening very early so we need to separate
hyperv_setup_mmu_ops() and hyper_alloc_mmu().

It is possible and easy to implement local TLB flushing too and there is
even a hint for that. However, I don't see a room for optimization on the
host side as both hypercall and native tlb flush will result in vmexit. The
hint is also not set on modern Hyper-V versions.

Signed-off-by: Vitaly Kuznetsov
Reviewed-by: Andy Shevchenko
Reviewed-by: Stephen Hemminger
Cc: Andy Lutomirski
Cc: Haiyang Zhang
Cc: Jork Loeser
Cc: K. Y. Srinivasan
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Simon Xiao
Cc: Steven Rostedt
Cc: Thomas Gleixner
Cc: de...@linuxdriverproject.org
Link: http://lkml.kernel.org/r/20170802160921.21791-8-vkuzn...@redhat.com
Signed-off-by: Ingo Molnar
---
 arch/x86/hyperv/Makefile           | 2 +-
 arch/x86/hyperv/hv_init.c          | 2 +
 arch/x86/hyperv/mmu.c              | 138 +
 arch/x86/include/asm/mshyperv.h    | 3 +
 arch/x86/include/uapi/asm/hyperv.h | 7 ++
 arch/x86/kernel/cpu/mshyperv.c     | 1 +
 6 files changed, 152 insertions(+), 1 deletion(-)

diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
index 171ae09..367a820 100644
--- a/arch/x86/hyperv/Makefile
+++ b/arch/x86/hyperv/Makefile
@@ -1 +1 @@
-obj-y := hv_init.o
+obj-y := hv_init.o mmu.o
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index e93b9a0..1a8eb55 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -140,6 +140,8 @@ void hyperv_init(void)
 	hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);
 	wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
 
+	hyper_alloc_mmu();
+
 	/*
 	 * Register Hyper-V specific clocksource.
 	 */
diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c
new file mode 100644
index 000..9419a20
--- /dev/null
+++ b/arch/x86/hyperv/mmu.c
@@ -0,0 +1,138 @@
+#define pr_fmt(fmt) "Hyper-V: " fmt
+
+#include
+#include
+#include
+#include
+
+#include
+#include
+#include
+#include
+
+/* HvFlushVirtualAddressSpace, HvFlushVirtualAddressList hypercalls */
+struct hv_flush_pcpu {
+	u64 address_space;
+	u64 flags;
+	u64 processor_mask;
+	u64 gva_list[];
+};
+
+/* Each gva in gva_list encodes up to 4096 pages to flush */
+#define HV_TLB_FLUSH_UNIT (4096 * PAGE_SIZE)
+
+static struct hv_flush_pcpu __percpu *pcpu_flush;
+
+/*
+ * Fills in gva_list starting from offset. Returns the number of items added.
+ */
+static inline int fill_gva_list(u64 gva_list[], int offset,
+				unsigned long start, unsigned long end)
+{
+	int gva_n = offset;
+	unsigned long cur = start, diff;
+
+	do {
+		diff = end > cur ? end - cur : 0;
+
+		gva_list[gva_n] = cur & PAGE_MASK;
+		/*
+		 * Lower 12 bits encode the number of additional
+		 * pages to flush (in addition to the 'cur' page).
+		 */
+		if (diff >= HV_TLB_FLUSH_UNIT)
+			gva_list[gva_n] |= ~PAGE_MASK;
+		else if (diff)
+			gva_list[gva_n] |= (diff - 1) >> PAGE_SHIFT;
+
+		cur += HV_TLB_FLUSH_UNIT;
+		gva_n++;
+
+	} while (cur < end);
+
+	return gva_n - offset;
+}
+
+static void hyperv_flush_tlb_others(const struct cpumask *cpus,
+				    const struct flush_tlb_info *info)
+{
+	int cpu, vcpu, gva_n, max_gvas;
+	struct hv_flush_pcpu *flush;
+	u64 status = U64_MAX;
+	unsigned long flags;
+
+	if (!pcpu_flush || !hv_hypercall_pg)
+		goto do_native;
+
+	if (cpumask_empty(cpus))
+		return;
+
+	local_irq_save(flags);
+
+	flush = this_cpu_ptr(pcpu_flush);
+
+	if (info->mm) {
+		flush->address_space = virt_to_phys(info->mm->pgd);
+		flush->flags = 0;
+	} else {
+		flush->address_space = 0;
+		flush->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
+	}
+
+	flush->processor_mask = 0;
+	if (cpumask_equal(cpus, cpu_present_mask)) {
+		flush->flags |= HV_FLUSH_ALL_PROCESSORS;
+	} e