Re: [Xen-devel] [tip:x86/platform] x86/hyper-v: Use hypercall for remote TLB flush
On Fri, 11 Aug 2017 14:07:14 +0200 Peter Zijlstra wrote:

> It goes like:
>
>         CPU0                    CPU1
>
>         unhook page             cli
>                                 traverse page tables
>         TLB invalidate --->
>                                 sti
>                                 TLB invalidate
>         <-- complete

I guess the important part here is the above "complete". CPU0 doesn't
proceed until it receives it. Thus it does act like cli ~ rcu_read_lock(),
sti ~ rcu_read_unlock(), and "TLB invalidate" is equivalent to
synchronize_rcu().

[ this response is for clarification for the casual observer of this
  thread ;-) ]

-- Steve

>         free page
>
> So the CPU1 page-table walker gets an existence guarantee of the
> page-tables by clearing IF.
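In RCU terms, the mapping Steve describes comes out roughly as the
following minimal sketch (illustrative only: the real x86 code uses IF
and the TLB-invalidate IPI, not these RCU calls):

#include <linux/rcupdate.h>
#include <linux/mm.h>

/* CPU1 side: the software page-table walker, e.g. fast_gup() */
static void walker(void)
{
	rcu_read_lock();	/* plays the role of cli */
	/* traverse page tables; they cannot be freed while we're here */
	rcu_read_unlock();	/* plays the role of sti */
}

/* CPU0 side: unhooking and freeing a page-table page */
static void unhook_and_free(struct page *pt_page)
{
	/* "unhook page": the entry pointing at pt_page is already cleared */
	synchronize_rcu();	/* plays the role of "TLB invalidate ... complete" */
	__free_page(pt_page);	/* safe: no walker can still see the page */
}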
Re: [Xen-devel] [tip:x86/platform] x86/hyper-v: Use hypercall for remote TLB flush
On Fri, Aug 11, 2017 at 03:07:29PM +0200, Juergen Gross wrote:
> On 11/08/17 14:54, Peter Zijlstra wrote:
> > On Fri, Aug 11, 2017 at 02:46:41PM +0200, Juergen Gross wrote:
> >> Aah, okay. Now I understand the problem. The TLB isn't the issue but the
> >> IPI is serving two purposes here: TLB flushing (which is allowed to
> >> happen at any time) and serialization regarding access to critical pages
> >> (which seems to be broken in the Xen case as you suggest).
> >
> > Indeed, and now hyper-v as well.
>
> Is it possible to distinguish between non-critical calls of
> flush_tlb_others() (which should be the majority IMHO) and critical ones
> regarding the above problem? I guess the only problem is the case when a
> page table can be freed because its last valid entry is gone, right?
>
> We might want to add a serialization flag to indicate that flushing _and_
> serialization via IPI should be performed.

Possible, but not trivial. Especially things like transparent huge pages,
which swizzle PMDs around, make things tricky.

By far the easiest solution is to switch over to HAVE_RCU_TABLE_FREE when
either Xen or Hyper-V is doing this. Ideally it would not have a
significant performance hit (needs testing), and we can simply always do
this when PARAVIRT; otherwise we need to get creative with static_keys or
something.
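A rough sketch of the idea behind HAVE_RCU_TABLE_FREE (simplified for
illustration; the real kernel batches tables in an mmu_gather via
tlb_remove_table() rather than freeing one page at a time):

#include <linux/rcupdate.h>
#include <linux/mm.h>

static void pt_free_rcu(struct rcu_head *head)
{
	struct page *page = container_of(head, struct page, rcu_head);

	/* grace period has elapsed: no IRQs-off walker can still see it */
	__free_page(page);
}

/*
 * Instead of freeing a just-unhooked page-table page right after the
 * (possibly hypercall-based) TLB flush, defer the free. Software
 * walkers run with IRQs disabled, which call_rcu_sched() treats as a
 * read-side critical section, so the free cannot happen under them --
 * and no IPI is required for correctness.
 */
static void defer_pt_free(struct page *pt_page)
{
	call_rcu_sched(&pt_page->rcu_head, pt_free_rcu);
}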
Re: [Xen-devel] [tip:x86/platform] x86/hyper-v: Use hypercall for remote TLB flush
On 11/08/17 14:54, Peter Zijlstra wrote:
> On Fri, Aug 11, 2017 at 02:46:41PM +0200, Juergen Gross wrote:
>> Aah, okay. Now I understand the problem. The TLB isn't the issue but the
>> IPI is serving two purposes here: TLB flushing (which is allowed to
>> happen at any time) and serialization regarding access to critical pages
>> (which seems to be broken in the Xen case as you suggest).
>
> Indeed, and now hyper-v as well.

Is it possible to distinguish between non-critical calls of
flush_tlb_others() (which should be the majority IMHO) and critical ones
regarding the above problem? I guess the only problem is the case when a
page table can be freed because its last valid entry is gone, right?

We might want to add a serialization flag to indicate that flushing _and_
serialization via IPI should be performed.

Juergen
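To make the suggestion concrete, one hypothetical shape for such a flag
(the `serialize` field and the hypercall helper are invented for
illustration; the real 4.13 flush_tlb_others() does take a cpumask plus
a struct flush_tlb_info):

struct flush_tlb_info {
	struct mm_struct	*mm;
	unsigned long		start;
	unsigned long		end;
	bool			serialize;	/* hypothetical flag: a page
						 * table is about to be freed */
};

static void xen_flush_tlb_others(const struct cpumask *cpus,
				 const struct flush_tlb_info *info)
{
	if (info->serialize) {
		/*
		 * Critical call: a real IPI must interrupt every target
		 * vCPU so that IF-disabled walkers are known to be done.
		 */
		native_flush_tlb_others(cpus, info);
		return;
	}

	/* Non-critical call: the flush-via-hypercall fast path is fine. */
	xen_do_flush_hypercall(cpus, info);	/* hypothetical helper */
}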
Re: [Xen-devel] [tip:x86/platform] x86/hyper-v: Use hypercall for remote TLB flush
On Fri, Aug 11, 2017 at 02:46:41PM +0200, Juergen Gross wrote:
> Aah, okay. Now I understand the problem. The TLB isn't the issue but the
> IPI is serving two purposes here: TLB flushing (which is allowed to
> happen at any time) and serialization regarding access to critical pages
> (which seems to be broken in the Xen case as you suggest).

Indeed, and now hyper-v as well.
Re: [Xen-devel] [tip:x86/platform] x86/hyper-v: Use hypercall for remote TLB flush
On 11/08/17 14:35, Peter Zijlstra wrote:
> On Fri, Aug 11, 2017 at 02:22:25PM +0200, Juergen Gross wrote:
>> Wait - the TLB can be cleared at any time, as Andrew was pointing out.
>> No cpu can rely on an address being accessible just because IF is being
>> cleared. All that matters is the existing and valid page table entry.
>>
>> So clearing IF on a cpu isn't meant to secure the TLB from being
>> cleared, but just to avoid interrupts (as the name of the flag is
>> suggesting).
>
> Yes, but by holding off the TLB invalidate IPI, we hold off the freeing
> of the concurrently unhooked page-table.
>
>> In the Xen case the hypervisor does the following:
>>
>> - it checks whether any of the vcpus specified in the cpumask of the
>>   flush request is running on any physical cpu
>> - if any running vcpu is found an IPI will be sent to the physical cpu
>>   and the hypervisor will do the TLB flush there
>
> And this will preempt a vcpu which could have IF cleared, right?
>
>> - any vcpu addressed by the flush and not running will be flagged to
>>   flush its TLB when being scheduled the next time
>>
>> This ensures no TLB entry to be flushed can be used after return of
>> xen_flush_tlb_others().
>
> But that is not a sufficient guarantee. We need the IF to hold off the
> TLB invalidate and thereby hold off the freeing of our page-table pages.

Aah, okay. Now I understand the problem. The TLB isn't the issue but the
IPI is serving two purposes here: TLB flushing (which is allowed to
happen at any time) and serialization regarding access to critical pages
(which seems to be broken in the Xen case as you suggest).

Juergen
Re: [Xen-devel] [tip:x86/platform] x86/hyper-v: Use hypercall for remote TLB flush
On Fri, Aug 11, 2017 at 02:22:25PM +0200, Juergen Gross wrote:
> Wait - the TLB can be cleared at any time, as Andrew was pointing out.
> No cpu can rely on an address being accessible just because IF is being
> cleared. All that matters is the existing and valid page table entry.
>
> So clearing IF on a cpu isn't meant to secure the TLB from being
> cleared, but just to avoid interrupts (as the name of the flag is
> suggesting).

Yes, but by holding off the TLB invalidate IPI, we hold off the freeing
of the concurrently unhooked page-table.

> In the Xen case the hypervisor does the following:
>
> - it checks whether any of the vcpus specified in the cpumask of the
>   flush request is running on any physical cpu
> - if any running vcpu is found an IPI will be sent to the physical cpu
>   and the hypervisor will do the TLB flush there

And this will preempt a vcpu which could have IF cleared, right?

> - any vcpu addressed by the flush and not running will be flagged to
>   flush its TLB when being scheduled the next time
>
> This ensures no TLB entry to be flushed can be used after return of
> xen_flush_tlb_others().

But that is not a sufficient guarantee. We need the IF to hold off the
TLB invalidate and thereby hold off the freeing of our page-table pages.
Re: [Xen-devel] [tip:x86/platform] x86/hyper-v: Use hypercall for remote TLB flush
On 11/08/17 12:56, Peter Zijlstra wrote:
> On Fri, Aug 11, 2017 at 11:23:10AM +0200, Vitaly Kuznetsov wrote:
>> Peter Zijlstra writes:
>>
>>> On Thu, Aug 10, 2017 at 07:08:22PM +0000, Jork Loeser wrote:
>>>
>>>>>> Subject: Re: [tip:x86/platform] x86/hyper-v: Use hypercall for remote
>>>>>> TLB flush
>>>>>> Hold on.. if we don't IPI for TLB invalidation. What serializes our
>>>>>> software page table walkers like fast_gup() ?
>>>>> Hypervisor may implement this functionality via an IPI.
>>>>>
>>>>> K. Y
>>>> HvFlushVirtualAddressList() states:
>>>> This call guarantees that by the time control returns back to the
>>>> caller, the observable effects of all flushes on the specified virtual
>>>> processors have occurred.
>>>>
>>>> HvFlushVirtualAddressListEx() refers to HvFlushVirtualAddressList() as
>>>> adding sparse target VP lists.
>>>>
>>>> Is this enough of a guarantee, or do you see other races?
>>> That's nowhere near enough. We need the remote CPU to have completed any
>>> guest IF section that was in progress at the time of the call.
>>>
>>> So if a host IPI can interrupt a guest while the guest has IF cleared,
>>> and we then process the host IPI -- clear the TLBs -- before resuming the
>>> guest, which still has IF cleared, we've got a problem.
>>>
>>> Because at that point, our software page-table walker, that relies on IF
>>> being clear to guarantee the page-tables exist, because it holds off the
>>> TLB invalidate and thereby the freeing of the pages, gets its pages
>>> ripped out from under it.
>> Oh, I see your concern. Hyper-V, however, is not the first x86
>> hypervisor trying to avoid IPIs on remote TLB flush, Xen does this
>> too. Briefly looking at xen_flush_tlb_others() I don't see anything
>> special, do we know how serialization is achieved there?
> No idea on how Xen works, I always just hope it goes away :-) But lets
> ask some Xen folks.

Wait - the TLB can be cleared at any time, as Andrew was pointing out.
No cpu can rely on an address being accessible just because IF is being
cleared. All that matters is the existing and valid page table entry.

So clearing IF on a cpu isn't meant to secure the TLB from being
cleared, but just to avoid interrupts (as the name of the flag is
suggesting).

In the Xen case the hypervisor does the following:

- it checks whether any of the vcpus specified in the cpumask of the
  flush request is running on any physical cpu
- if any running vcpu is found an IPI will be sent to the physical cpu
  and the hypervisor will do the TLB flush there
- any vcpu addressed by the flush and not running will be flagged to
  flush its TLB when being scheduled the next time

This ensures no TLB entry to be flushed can be used after return of
xen_flush_tlb_others().

Juergen
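In pseudocode, the hypervisor-side behaviour Juergen describes is
roughly the following (not actual Xen source; all names are invented
for illustration):

/* Pseudocode of the hypervisor side of a guest flush request. */
static void flush_tlb_for_vcpus(const vcpumask_t *vcpus)
{
	int v;

	for_each_vcpu(v, vcpus) {
		if (vcpu_running_somewhere(v))
			/* IPI the physical cpu; the hypervisor flushes there */
			send_flush_ipi(pcpu_of(v));
		else
			/* flush lazily when the vcpu is next scheduled */
			mark_tlb_flush_pending(v);
	}

	/* wait until all IPI'd physical cpus have completed their flush */
	wait_for_flush_acks();
}

Note the property Peter objects to: a descheduled vCPU is merely
flagged, and even for a running vCPU the IPI interrupts the physical
CPU without waiting for the guest's IF-cleared section to finish.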
Re: [Xen-devel] [tip:x86/platform] x86/hyper-v: Use hypercall for remote TLB flush
On Fri, Aug 11, 2017 at 12:05:45PM +0100, Andrew Cooper wrote:
> >> Oh, I see your concern. Hyper-V, however, is not the first x86
> >> hypervisor trying to avoid IPIs on remote TLB flush, Xen does this
> >> too. Briefly looking at xen_flush_tlb_others() I don't see anything
> >> special, do we know how serialization is achieved there?
> >
> > No idea on how Xen works, I always just hope it goes away :-) But lets
> > ask some Xen folks.
>
> How is the software pagewalker relying on IF being clear safe at all (on
> native, let alone under virtualisation)? Hardware has no architectural
> requirement to keep entries in the TLB.

No, but it _can_, therefore when we unhook pages we _must_ invalidate.

It goes like:

        CPU0                    CPU1

        unhook page             cli
                                traverse page tables
        TLB invalidate --->
                                sti
                                TLB invalidate
        <-- complete
        free page

So the CPU1 page-table walker gets an existence guarantee of the
page-tables by clearing IF.

> In the virtualisation case, at any point the vcpu can be scheduled on a
> different pcpu even during a critical region like that, so the TLB
> really can empty itself under your feet.

Not the point.
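For reference, the CPU1 side of the diagram corresponds to a pattern
like this stripped-down sketch of a fast-GUP-style walk (illustrative
only, not the actual x86 __get_user_pages_fast() source):

#include <linux/mm.h>
#include <linux/irqflags.h>

/*
 * With interrupts disabled, the TLB-invalidate IPI from CPU0 cannot be
 * delivered here, so CPU0 cannot reach "free page" while we are
 * mid-walk: the page-table pages are guaranteed to stay allocated.
 */
static int probe_one_addr(struct mm_struct *mm, unsigned long addr)
{
	unsigned long flags;
	pgd_t *pgd;
	int present = 0;

	local_irq_save(flags);		/* the "cli" in the diagram */

	pgd = pgd_offset(mm, addr);
	if (!pgd_none(*pgd)) {
		/* ... descend through the pud/pmd/pte levels here ... */
		present = 1;
	}

	local_irq_restore(flags);	/* the "sti" in the diagram */

	return present;
}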
Re: [Xen-devel] [tip:x86/platform] x86/hyper-v: Use hypercall for remote TLB flush
On 11/08/17 11:56, Peter Zijlstra wrote:
> On Fri, Aug 11, 2017 at 11:23:10AM +0200, Vitaly Kuznetsov wrote:
>> Peter Zijlstra writes:
>>
>>> On Thu, Aug 10, 2017 at 07:08:22PM +0000, Jork Loeser wrote:
>>>
>>>>>> Subject: Re: [tip:x86/platform] x86/hyper-v: Use hypercall for remote
>>>>>> TLB flush
>>>>>> Hold on.. if we don't IPI for TLB invalidation. What serializes our
>>>>>> software page table walkers like fast_gup() ?
>>>>> Hypervisor may implement this functionality via an IPI.
>>>>>
>>>>> K. Y
>>>> HvFlushVirtualAddressList() states:
>>>> This call guarantees that by the time control returns back to the
>>>> caller, the observable effects of all flushes on the specified virtual
>>>> processors have occurred.
>>>>
>>>> HvFlushVirtualAddressListEx() refers to HvFlushVirtualAddressList() as
>>>> adding sparse target VP lists.
>>>>
>>>> Is this enough of a guarantee, or do you see other races?
>>> That's nowhere near enough. We need the remote CPU to have completed any
>>> guest IF section that was in progress at the time of the call.
>>>
>>> So if a host IPI can interrupt a guest while the guest has IF cleared,
>>> and we then process the host IPI -- clear the TLBs -- before resuming the
>>> guest, which still has IF cleared, we've got a problem.
>>>
>>> Because at that point, our software page-table walker, that relies on IF
>>> being clear to guarantee the page-tables exist, because it holds off the
>>> TLB invalidate and thereby the freeing of the pages, gets its pages
>>> ripped out from under it.
>> Oh, I see your concern. Hyper-V, however, is not the first x86
>> hypervisor trying to avoid IPIs on remote TLB flush, Xen does this
>> too. Briefly looking at xen_flush_tlb_others() I don't see anything
>> special, do we know how serialization is achieved there?
> No idea on how Xen works, I always just hope it goes away :-) But lets
> ask some Xen folks.

How is the software pagewalker relying on IF being clear safe at all (on
native, let alone under virtualisation)? Hardware has no architectural
requirement to keep entries in the TLB.

In the virtualisation case, at any point the vcpu can be scheduled on a
different pcpu even during a critical region like that, so the TLB
really can empty itself under your feet.

~Andrew
Re: [Xen-devel] [tip:x86/platform] x86/hyper-v: Use hypercall for remote TLB flush
On Fri, Aug 11, 2017 at 11:23:10AM +0200, Vitaly Kuznetsov wrote:
> Peter Zijlstra writes:
>
> > On Thu, Aug 10, 2017 at 07:08:22PM +0000, Jork Loeser wrote:
> >
> >> > > Subject: Re: [tip:x86/platform] x86/hyper-v: Use hypercall for remote
> >> > > TLB flush
> >>
> >> > > Hold on.. if we don't IPI for TLB invalidation. What serializes our
> >> > > software page table walkers like fast_gup() ?
> >> >
> >> > Hypervisor may implement this functionality via an IPI.
> >> >
> >> > K. Y
> >>
> >> HvFlushVirtualAddressList() states:
> >> This call guarantees that by the time control returns back to the
> >> caller, the observable effects of all flushes on the specified virtual
> >> processors have occurred.
> >>
> >> HvFlushVirtualAddressListEx() refers to HvFlushVirtualAddressList() as
> >> adding sparse target VP lists.
> >>
> >> Is this enough of a guarantee, or do you see other races?
> >
> > That's nowhere near enough. We need the remote CPU to have completed any
> > guest IF section that was in progress at the time of the call.
> >
> > So if a host IPI can interrupt a guest while the guest has IF cleared,
> > and we then process the host IPI -- clear the TLBs -- before resuming the
> > guest, which still has IF cleared, we've got a problem.
> >
> > Because at that point, our software page-table walker, that relies on IF
> > being clear to guarantee the page-tables exist, because it holds off the
> > TLB invalidate and thereby the freeing of the pages, gets its pages
> > ripped out from under it.
>
> Oh, I see your concern. Hyper-V, however, is not the first x86
> hypervisor trying to avoid IPIs on remote TLB flush, Xen does this
> too. Briefly looking at xen_flush_tlb_others() I don't see anything
> special, do we know how serialization is achieved there?

No idea on how Xen works, I always just hope it goes away :-) But lets
ask some Xen folks.