Re: [PATCH 04/12] KVM: MMU: log dirty page after marking spte writable

2013-11-20 Thread Xiao Guangrong

On Nov 21, 2013, at 3:47 AM, Marcelo Tosatti  wrote:

> On Wed, Nov 20, 2013 at 10:20:09PM +0800, Xiao Guangrong wrote:
>>> But what guarantee does userspace require, from GET_DIRTY_LOG, while
>>> vcpus are executing?
>> 
>> Aha. Calling GET_DIRTY_LOG once is useless, since new dirty pages can
>> be generated while GET_DIRTY_LOG is returning. If userspace wants the
>> exact set of dirty pages, the vcpus should be stopped.
>> 
>>> 
>>> With fast page fault:
>>> 
>>> if (cmpxchg64(sptep, spte, spte | PT_WRITABLE_MASK) == spte)
>>> /* The window in here... */
>>> mark_page_dirty(vcpu->kvm, gfn);
>>> 
>>> And the $SUBJECT set_spte reordering, the rule becomes
>>> 
>>> A call to GET_DIRTY_LOG guarantees to return correct information about 
>>> dirty pages before invocation of the previous GET_DIRTY_LOG call.
>>> 
>>> (see example 1: the next GET_DIRTY_LOG will return the dirty information
>>> there).
>>> 
>> 
>> It seems not.
>> 
>> The first GET_DIRTY_LOG can happen before the fast page fault, and the
>> second GET_DIRTY_LOG can happen in the window between cmpxchg() and
>> mark_page_dirty(); for the second one, the information is still
>> "incorrect".
> 
> It's correct for the previous GET_DIRTY_LOG call.

Oh, yes.

> 
>>> The rule for sptes, that is, because kvm_write_guest does not match
>>> the documentation at all.
>> 
>> You mean the case of “kvm_write_guest” is valid (I do not know why it is)?
>> Or anything else?
>> 
>>> 
>>> So before example 1 and this patch, the rule (well for sptes at least) was
>>> 
>>> "Given a memory slot, return a bitmap containing any pages dirtied
>>> since the last call to this ioctl.  Bit 0 is the first page in the
>>> memory slot.  Ensure the entire structure is cleared to avoid padding
>>> issues."
>>> 
>>> Can you explain why it is OK to relax this rule?
>> 
>> It's because:
>> 1) it doesn't break current use cases, i.e. live migration and FB-flushing.
>> 2) the current code, like kvm_write_guest, has already broken the
>>    documentation (the guest page has been written but is missed in the
>>    dirty bitmap).
>> 3) it's needless to implement an exact get-dirty-pages, since the dirty
>>    pages cannot be obtained exactly without stopping the vcpus.
>> 
>> So I think we should document this case instead. No?
> 
> Let's figure out the requirements, then. I don't understand why
> FB-flushing is safe (think kvm-autotest: one pixel off and the entire
> test fails).

I have not read the FB-flushing code; I guess the reason it works is:
FB-flushing does a periodic get-dirty-page and flush. After the guest
writes the page, the page will be flushed on the next GET_DIRTY_LOG.

> 
> Before fast page fault: pages are either write-protected or the
> corresponding dirty bitmap bit is set. Any write faults to dirty-logged
> sptes while GET_DIRTY_LOG is executing in the protected section are
> allowed to instantiate a writeable spte after GET_DIRTY_LOG is finished.
> 
> After fast page fault: pages can be writeable with the dirty bitmap bit
> not set. Therefore data can be dirty before GET_DIRTY_LOG executes and
> still fail to be returned in the bitmap.

That's right. The current GET_DIRTY_LOG fails to report the dirty page,
but the next GET_DIRTY_LOG will report it properly, since the current
GET_DIRTY_LOG needs to flush all TLBs, which waits for the fast page
fault to finish.

I do not think it's a big problem, since a single GET_DIRTY_LOG is
useless, as I explained in the previous mail.

> 
> Since this patchset does not introduce a change in behaviour (fast pf
> did), there is no reason to avoid merging it.

Yes, thank you, Marcelo! :)

> 
> BTW, since GET_DIRTY_LOG is not required to report write faults
> concurrent with GET_DIRTY_LOG, it is not necessary to rewalk the spte
> list, is it?

You mean the "rewalk" we introduced in pte_list_walk_lockless() in this
patchset? I think the rewalk is needed because it is triggered by meeting
an unexpected nulls, which means we are walking on an unexpected rmap.
If we did not rewalk, some writable sptes would be missed.




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 04/12] KVM: MMU: log dirty page after marking spte writable

2013-11-20 Thread Marcelo Tosatti
On Wed, Nov 20, 2013 at 10:20:09PM +0800, Xiao Guangrong wrote:
> > But what guarantee does userspace require, from GET_DIRTY_LOG, while
> > vcpus are executing?
> 
> Aha. Calling GET_DIRTY_LOG once is useless, since new dirty pages can
> be generated while GET_DIRTY_LOG is returning. If userspace wants the
> exact set of dirty pages, the vcpus should be stopped.
> 
> > 
> > With fast page fault:
> > 
> >  if (cmpxchg64(sptep, spte, spte | PT_WRITABLE_MASK) == spte)
> >  /* The window in here... */
> >  mark_page_dirty(vcpu->kvm, gfn);
> > 
> > And the $SUBJECT set_spte reordering, the rule becomes
> > 
> > A call to GET_DIRTY_LOG guarantees to return correct information about 
> > dirty pages before invocation of the previous GET_DIRTY_LOG call.
> > 
> > (see example 1: the next GET_DIRTY_LOG will return the dirty information
> > there).
> > 
> 
> It seems not.
> 
> The first GET_DIRTY_LOG can happen before the fast page fault, and the
> second GET_DIRTY_LOG can happen in the window between cmpxchg() and
> mark_page_dirty(); for the second one, the information is still
> "incorrect".

It's correct for the previous GET_DIRTY_LOG call.

> > The rule for sptes, that is, because kvm_write_guest does not match
> > the documentation at all.
> 
> You mean the case of “kvm_write_guest” is valid (I do not know why it is)?
> Or anything else?
> 
> > 
> > So before example 1 and this patch, the rule (well for sptes at least) was
> > 
> > "Given a memory slot, return a bitmap containing any pages dirtied
> > since the last call to this ioctl.  Bit 0 is the first page in the
> > memory slot.  Ensure the entire structure is cleared to avoid padding
> > issues."
> > 
> > Can you explain why it is OK to relax this rule?
> 
> It's because:
> 1) it doesn't break current use cases, i.e. live migration and FB-flushing.
> 2) the current code, like kvm_write_guest, has already broken the
>    documentation (the guest page has been written but is missed in the
>    dirty bitmap).
> 3) it's needless to implement an exact get-dirty-pages, since the dirty
>    pages cannot be obtained exactly without stopping the vcpus.
> 
> So I think we should document this case instead. No?

Let's figure out the requirements, then. I don't understand why
FB-flushing is safe (think kvm-autotest: one pixel off and the entire
test fails).

Before fast page fault: pages are either write-protected or the
corresponding dirty bitmap bit is set. Any write faults to dirty-logged
sptes while GET_DIRTY_LOG is executing in the protected section are
allowed to instantiate a writeable spte after GET_DIRTY_LOG is finished.

After fast page fault: pages can be writeable with the dirty bitmap bit
not set. Therefore data can be dirty before GET_DIRTY_LOG executes and
still fail to be returned in the bitmap.

Since this patchset does not introduce a change in behaviour (fast pf
did), there is no reason to avoid merging it.

BTW, since GET_DIRTY_LOG is not required to report write faults
concurrent with GET_DIRTY_LOG, it is not necessary to rewalk the spte
list, is it?





Re: [PATCH 04/12] KVM: MMU: log dirty page after marking spte writable

2013-11-20 Thread Xiao Guangrong

On Nov 20, 2013, at 8:29 AM, Marcelo Tosatti  wrote:

> On Wed, Aug 07, 2013 at 12:06:49PM +0800, Xiao Guangrong wrote:
>> On 08/07/2013 09:48 AM, Marcelo Tosatti wrote:
>>> On Tue, Jul 30, 2013 at 09:02:02PM +0800, Xiao Guangrong wrote:
>>>> Make sure we can see the writable spte before the dirty bitmap is
>>>> visible.
>>>> 
>>>> We do this because kvm_vm_ioctl_get_dirty_log() write-protects the
>>>> spte based on the dirty bitmap, so we should ensure the writable spte
>>>> can be found in rmap before the dirty bitmap is visible. Otherwise, we
>>>> could clear the dirty bitmap and fail to write-protect the page.
>>>> 
>>>> Signed-off-by: Xiao Guangrong 
>>>> ---
>>>> arch/x86/kvm/mmu.c | 6 +++---
>>>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>> 
>>> Can you explain why this is safe, with regard to the rule 
>>> at edde99ce05290e50 ?
>> 
>> BTW, this log fixed this case:
>> 
>>    VCPU 0                          KVM migration control
>> 
>>                                    write-protects all pages
>> #PF happens, then the page
>> becomes writable; set dirty
>> bit on the bitmap
>> 
>>                                    swap the bitmap; current bitmap is empty
>> 
>> write the page (no dirty log)
>> 
>>                                    stop the guest and push
>>                                    the remaining dirty pages
>> Stopped
>>                                    See the current bitmap is empty, which
>>                                    means no page is dirty.
>>> 
>>> "The rule is that all pages are either dirty in the current bitmap,
>>> or write-protected, which is violated here."
>> 
>> Actually, this rule is not completely true; there is a third case:
>> the window between writing the guest page and setting the dirty bitmap
>> is valid. In that window, the page is writable but not dirty-logged.
>> 
>> This case is based on the fact that at the final step of live migration,
>> kvm should stop the guest and push the remaining dirty pages to the
>> destination.
>> 
>> There are some examples in the current code:
>> example 1, in fast_pf_fix_direct_spte():
>>  if (cmpxchg64(sptep, spte, spte | PT_WRITABLE_MASK) == spte)
>>  /* The window in here... */
>>  mark_page_dirty(vcpu->kvm, gfn);
>> 
>> example 2, in kvm_write_guest_page():
>>  r = __copy_to_user((void __user *)addr + offset, data, len);
>>  if (r)
>>  return -EFAULT;
>>  /*
>>   * The window is here: the page is dirty but not logged in
>>   * the bitmap.
>>   */
>>  mark_page_dirty(kvm, gfn);
>>  return 0;
> 

Hi Marcelo,

> Why is this valid ? That is, the obviously correct rule is
> 
> "that all pages are either dirty in the current bitmap,
> or write-protected, which is violated here."
> 
> With the window above, GET_DIRTY_LOG can be called 100 times while the 
> page is dirty, but the corresponding bit not set in the dirty bitmap.
> 
> It violates the documentation:
> 
> /* for KVM_GET_DIRTY_LOG */
> struct kvm_dirty_log {
>   __u32 slot;
>   __u32 padding;
>   union {
>   void __user *dirty_bitmap; /* one bit per page */
>   __u64 padding;
>   };
> };
> 
> Given a memory slot, return a bitmap containing any pages dirtied
> since the last call to this ioctl.  Bit 0 is the first page in the
> memory slot.  Ensure the entire structure is cleared to avoid padding
> issues.
> 
> The point about migration is that GET_DIRTY_LOG is strictly correct
> because it stops vcpus.
> 
> But what guarantee does userspace require, from GET_DIRTY_LOG, while vcpus are
> executing? 

Aha. Calling GET_DIRTY_LOG once is useless, since new dirty pages can be
generated while GET_DIRTY_LOG is returning. If userspace wants the exact
set of dirty pages, the vcpus should be stopped.

> 
> With fast page fault:
> 
>  if (cmpxchg64(sptep, spte, spte | PT_WRITABLE_MASK) == spte)
>  /* The window in here... */
>  mark_page_dirty(vcpu->kvm, gfn);
> 
> And the $SUBJECT set_spte reordering, the rule becomes
> 
> A call to GET_DIRTY_LOG guarantees to return correct information about 
> dirty pages before invocation of the previous GET_DIRTY_LOG call.
> 
> (see example 1: the next GET_DIRTY_LOG will return the dirty information
> there).
> 

It seems not.

The first GET_DIRTY_LOG can happen before the fast page fault, and the
second GET_DIRTY_LOG can happen in the window between cmpxchg() and
mark_page_dirty(); for the second one, the information is still "incorrect".

> The rule for sptes, that is, because kvm_write_guest does not match the
> documentation at all.

You mean the case of “kvm_write_guest” is valid (I do not know why it is)?
Or anything else?

> 
> So before example 1 and this patch, the rule (well for sptes at least) was
> 
> "Given a memory slot, return a bitmap containing any pages dirtied
> since the last call to this ioctl.  Bit 0 is the first page in the
> memory slot.  Ensure the entire structure is cleared to avoid padding
> issues."
> 
> Can you explain why it is OK to relax this rule?

It's because:
1) it doesn't break current use cases, i.e. live migration and FB-flushing.
2) the current code, like kvm_write_guest, has already broken the
   documentation (the guest page has been written but is missed in the
   dirty bitmap).
3) it's needless to implement an exact get-dirty-pages, since the dirty
   pages cannot be obtained exactly without stopping the vcpus.

So I think we should document this case instead. No?

Re: [PATCH 04/12] KVM: MMU: log dirty page after marking spte writable

2013-11-19 Thread Marcelo Tosatti
On Tue, Nov 19, 2013 at 10:29:20PM -0200, Marcelo Tosatti wrote:
> A call to GET_DIRTY_LOG guarantees to return correct information about 
> dirty pages before invocation of the previous GET_DIRTY_LOG call.

> Can you explain why it is OK to relax this rule?

That is, this might be OK, but better understand why it is.



Re: [PATCH 04/12] KVM: MMU: log dirty page after marking spte writable

2013-11-19 Thread Marcelo Tosatti
On Wed, Aug 07, 2013 at 12:06:49PM +0800, Xiao Guangrong wrote:
> On 08/07/2013 09:48 AM, Marcelo Tosatti wrote:
> > On Tue, Jul 30, 2013 at 09:02:02PM +0800, Xiao Guangrong wrote:
> >> Make sure we can see the writable spte before the dirty bitmap is
> >> visible.
> >>
> >> We do this because kvm_vm_ioctl_get_dirty_log() write-protects the
> >> spte based on the dirty bitmap, so we should ensure the writable spte
> >> can be found in rmap before the dirty bitmap is visible. Otherwise, we
> >> could clear the dirty bitmap and fail to write-protect the page.
> >>
> >> Signed-off-by: Xiao Guangrong 
> >> ---
> >>  arch/x86/kvm/mmu.c | 6 +++---
> >>  1 file changed, 3 insertions(+), 3 deletions(-)
> > 
> > Can you explain why this is safe, with regard to the rule 
> > at edde99ce05290e50 ?
> 
> BTW, this log fixed this case:
> 
>  VCPU 0                          KVM migration control
> 
>                                  write-protects all pages
> #PF happens, then the page
> becomes writable; set dirty
> bit on the bitmap
> 
>                                  swap the bitmap; current bitmap is empty
> 
> write the page (no dirty log)
> 
>                                  stop the guest and push
>                                  the remaining dirty pages
> Stopped
>                                  See the current bitmap is empty, which
>                                  means no page is dirty.
> > 
> > "The rule is that all pages are either dirty in the current bitmap,
> > or write-protected, which is violated here."
> 
> Actually, this rule is not completely true; there is a third case:
> the window between writing the guest page and setting the dirty bitmap
> is valid. In that window, the page is writable but not dirty-logged.
> 
> This case is based on the fact that at the final step of live migration,
> kvm should stop the guest and push the remaining dirty pages to the
> destination.
> 
> There are some examples in the current code:
> example 1, in fast_pf_fix_direct_spte():
>   if (cmpxchg64(sptep, spte, spte | PT_WRITABLE_MASK) == spte)
>   /* The window in here... */
>   mark_page_dirty(vcpu->kvm, gfn);
> 
> example 2, in kvm_write_guest_page():
>   r = __copy_to_user((void __user *)addr + offset, data, len);
>   if (r)
>   return -EFAULT;
>   /*
>    * The window is here: the page is dirty but not logged in
>    * the bitmap.
>    */
>   mark_page_dirty(kvm, gfn);
>   return 0;

Why is this valid? That is, the obviously correct rule is

"that all pages are either dirty in the current bitmap,
or write-protected, which is violated here."

With the window above, GET_DIRTY_LOG can be called 100 times while the 
page is dirty, but the corresponding bit not set in the dirty bitmap.

It violates the documentation:

/* for KVM_GET_DIRTY_LOG */
struct kvm_dirty_log {
__u32 slot;
__u32 padding;
union {
void __user *dirty_bitmap; /* one bit per page */
__u64 padding;
};
};

Given a memory slot, return a bitmap containing any pages dirtied
since the last call to this ioctl.  Bit 0 is the first page in the
memory slot.  Ensure the entire structure is cleared to avoid padding
issues.

The point about migration is that GET_DIRTY_LOG is strictly correct
because it stops vcpus.

But what guarantee does userspace require, from GET_DIRTY_LOG, while vcpus are
executing? 

With fast page fault:

   if (cmpxchg64(sptep, spte, spte | PT_WRITABLE_MASK) == spte)
   /* The window in here... */
   mark_page_dirty(vcpu->kvm, gfn);
 
And the $SUBJECT set_spte reordering, the rule becomes

A call to GET_DIRTY_LOG guarantees to return correct information about 
dirty pages before invocation of the previous GET_DIRTY_LOG call.

(see example 1: the next GET_DIRTY_LOG will return the dirty information
there).

The rule for sptes, that is, because kvm_write_guest does not match the
documentation at all.

So before example 1 and this patch, the rule (well for sptes at least) was

"Given a memory slot, return a bitmap containing any pages dirtied
since the last call to this ioctl.  Bit 0 is the first page in the
memory slot.  Ensure the entire structure is cleared to avoid padding
issues."

Can you explain why it is OK to relax this rule?

Thanks

(reviewing the nulls desc patch...).


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 04/12] KVM: MMU: log dirty page after marking spte writable

2013-11-19 Thread Marcelo Tosatti
On Wed, Aug 07, 2013 at 12:06:49PM +0800, Xiao Guangrong wrote:
 On 08/07/2013 09:48 AM, Marcelo Tosatti wrote:
  On Tue, Jul 30, 2013 at 09:02:02PM +0800, Xiao Guangrong wrote:
  Make sure we can see the writable spte before the dirt bitmap is visible
 
  We do this is for kvm_vm_ioctl_get_dirty_log() write-protects the spte 
  based
  on the dirty bitmap, we should ensure the writable spte can be found in 
  rmap
  before the dirty bitmap is visible. Otherwise, we cleared the dirty bitmap 
  and
  failed to write-protect the page
 
  Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
  ---
   arch/x86/kvm/mmu.c | 6 +++---
   1 file changed, 3 insertions(+), 3 deletions(-)
  
  Can you explain why this is safe, with regard to the rule 
  at edde99ce05290e50 ?
 
 BTW, this log fixed this case:
 
  VCPU 0KVM migration control
 
write-protects all pages
 #Pf happen then the page
 become writable, set dirty
 bit on the bitmap
 
   swap the bitmap, current bitmap is empty
 
 write the page (no dirty log)
 
   stop the guest and push
   the remaining dirty pages
 Stopped
   See current bitmap is empty that means
   no page is dirty.
  
  The rule is that all pages are either dirty in the current bitmap,
  or write-protected, which is violated here.
 
 Actually, this rule is not complete true, there's the 3th case:
 the window between write guest page and set dirty bitmap is valid.
 In that window, page is write-free and not dirty logged.
 
 This case is based on the fact that at the final step of live migration,
 kvm should stop the guest and push the remaining dirty pages to the
 destination.
 
 They're some examples in the current code:
 example 1, in fast_pf_fix_direct_spte():
   if (cmpxchg64(sptep, spte, spte | PT_WRITABLE_MASK) == spte)
   /* The window in here... */
   mark_page_dirty(vcpu-kvm, gfn);
 
 example 2, in kvm_write_guest_page():
   r = __copy_to_user((void __user *)addr + offset, data, len);
   if (r)
   return -EFAULT;
   /*
* The window is here, the page is dirty but not logged in
  * The bitmap.
*/
   mark_page_dirty(kvm, gfn);
   return 0;

Why is this valid? That is, the obviously correct rule is that all pages
are either dirty in the current bitmap, or write-protected, and it is
violated here.

With the window above, GET_DIRTY_LOG can be called 100 times while the 
page is dirty, but the corresponding bit not set in the dirty bitmap.

It violates the documentation:

/* for KVM_GET_DIRTY_LOG */
struct kvm_dirty_log {
__u32 slot;
__u32 padding;
union {
void __user *dirty_bitmap; /* one bit per page */
__u64 padding;
};
};

Given a memory slot, return a bitmap containing any pages dirtied
since the last call to this ioctl.  Bit 0 is the first page in the
memory slot.  Ensure the entire structure is cleared to avoid padding
issues.

The point about migration, is that GET_DIRTY_LOG is strictly correct
because it stops vcpus.

But what guarantee does userspace require, from GET_DIRTY_LOG, while vcpus are
executing? 

With fast page fault:

   if (cmpxchg64(sptep, spte, spte | PT_WRITABLE_MASK) == spte)
   /* The window in here... */
    mark_page_dirty(vcpu->kvm, gfn);
 
And the $SUBJECT set_spte reordering, the rule becomes

A call to GET_DIRTY_LOG guarantees to return correct information about 
dirty pages before invocation of the previous GET_DIRTY_LOG call.

(see example 1: the next GET_DIRTY_LOG will return the dirty information
there).

That is the rule for sptes, at least, because kvm_write_guest does not match
the documentation at all.

So before example 1 and this patch, the rule (well for sptes at least) was

Given a memory slot, return a bitmap containing any pages dirtied
since the last call to this ioctl.  Bit 0 is the first page in the
memory slot.  Ensure the entire structure is cleared to avoid padding
issues.

Can you explain why it is OK to relax this rule?

Thanks

(reviewing the nulls desc patch...).


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 04/12] KVM: MMU: log dirty page after marking spte writable

2013-11-19 Thread Marcelo Tosatti
On Tue, Nov 19, 2013 at 10:29:20PM -0200, Marcelo Tosatti wrote:
 A call to GET_DIRTY_LOG guarantees to return correct information about 
 dirty pages before invocation of the previous GET_DIRTY_LOG call.

 Can you explain why it is OK to relax this rule?

That is, this might be OK, but better understand why it is.



Re: [PATCH 04/12] KVM: MMU: log dirty page after marking spte writable

2013-08-08 Thread Xiao Guangrong
[ Post again after adjusting the format since the mail list rejected to deliver 
my previous one. ]

On Aug 8, 2013, at 11:06 PM, Marcelo Tosatti  wrote:

> On Wed, Aug 07, 2013 at 12:06:49PM +0800, Xiao Guangrong wrote:
>> On 08/07/2013 09:48 AM, Marcelo Tosatti wrote:
>>> On Tue, Jul 30, 2013 at 09:02:02PM +0800, Xiao Guangrong wrote:
 Make sure we can see the writable spte before the dirty bitmap is visible
 
 We do this because kvm_vm_ioctl_get_dirty_log() write-protects the spte
 based on the dirty bitmap, so we should ensure the writable spte can be
 found in rmap before the dirty bitmap is visible. Otherwise, we clear the
 dirty bitmap but fail to write-protect the page
 
 Signed-off-by: Xiao Guangrong 
 ---
 arch/x86/kvm/mmu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
>>> 
>>> Can you explain why this is safe, with regard to the rule 
>>> at edde99ce05290e50 ?
>> 
>> BTW, this log fixed this case:
>> 
>> VCPU 0KVM migration control
>> 
>>   write-protects all pages
>> #Pf happen then the page
>> become writable, set dirty
>> bit on the bitmap
>> 
>>  swap the bitmap, current bitmap is empty
>> 
>> write the page (no dirty log)
>> 
>>  stop the guest and push
>>  the remaining dirty pages
>> Stopped
>>  See current bitmap is empty that means
>>  no page is dirty.
>>> 
>>> "The rule is that all pages are either dirty in the current bitmap,
>>> or write-protected, which is violated here."
>> 
>> Actually, this rule is not complete true, there's the 3th case:
>> the window between write guest page and set dirty bitmap is valid.
>> In that window, page is write-free and not dirty logged.
>> 
>> This case is based on the fact that at the final step of live migration,
>> kvm should stop the guest and push the remaining dirty pages to the
>> destination.
>> 
>> They're some examples in the current code:
>> example 1, in fast_pf_fix_direct_spte():
>>  if (cmpxchg64(sptep, spte, spte | PT_WRITABLE_MASK) == spte)
>>  /* The window in here... */
>>  mark_page_dirty(vcpu->kvm, gfn);
>> 
>> example 2, in kvm_write_guest_page():
>>  r = __copy_to_user((void __user *)addr + offset, data, len);
>>  if (r)
>>  return -EFAULT;
>>  /*
>>   * The window is here, the page is dirty but not logged in
>>   * the bitmap.
>>   */
>>  mark_page_dirty(kvm, gfn);
>>  return 0;
>> 
>>> 
>>> Overall, please document what is the required order of operations for
>>> both set_spte and get_dirty_log and why this order is safe (right on top
>>> of the code).
>> 
>> Okay.
>> 
>> The order we required here is, we should 1) set spte to writable __before__
>> set the dirty bitmap and 2) add the spte into rmap __before__ set the dirty
>> bitmap.
>> 
>> The point 1) is the same as fast_pf_fix_direct_spte(), which i explained 
>> above.
>> The point 1) and 2) can ensure we can find the spte on rmap and see the spte 
>> is
>> writable when we do lockless write-protection, otherwise these cases will 
>> happen
>> 
>> VCPU 0   kvm ioctl doing get-dirty-pages
>> 
>> mark_page_dirty(gfn) which
>> set the gfn on the dirty maps
>>  mask = xchg(dirty_bitmap, 0)
>> 
>>  walk all gfns which set on "mask" and
>>  locklessly write-protect the gfn,
>>  then walk rmap, see no spte on that rmap
>>  
>> 
>> add the spte into rmap
>> 
>> !! Then the page can be freely written but not recorded in the dirty
>> bitmap.
>> 
>> Or:
>> 
>> VCPU 0   kvm ioctl doing get-dirty-pages
>> 
>> mark_page_dirty(gfn) which
>> set the gfn on the dirty maps
>> 
>> add spte into rmap
>>  mask = xchg(dirty_bitmap, 0)
>> 
>>  walk all gfns which set on "mask" and
>>  locklessly write-protect the gfn,
>>  then walk rmap, see the spte is in the rmap
>>  but it is readonly or nonpresent.
>>  
>> Mark spte writable
>> 
>> !! Then the page can be freely written but not recorded in the dirty
>> bitmap.
>> 
>> Hopefully, i have clarified it, if you have any questions, please let me 
>> know.
> 
> Yes, partially. Please on top of the explanation above, have something along
> the lines of
> 
> "The flush IPI assumes that a thread switch happens in this order"
> comment at arch/x86/mm/tlb.c
> 
> "With get_user_pages_fast, we walk down the pagetables without taking"
> comment at arch/x86/mm/gup.c

Marcelo, thanks for your suggestion, I will improve both the changelog and
the comments in the next version.

Re: [PATCH 04/12] KVM: MMU: log dirty page after marking spte writable

2013-08-08 Thread Marcelo Tosatti
On Wed, Aug 07, 2013 at 12:06:49PM +0800, Xiao Guangrong wrote:
> On 08/07/2013 09:48 AM, Marcelo Tosatti wrote:
> > On Tue, Jul 30, 2013 at 09:02:02PM +0800, Xiao Guangrong wrote:
> >> Make sure we can see the writable spte before the dirty bitmap is visible
> >>
> >> We do this because kvm_vm_ioctl_get_dirty_log() write-protects the spte
> >> based on the dirty bitmap, so we should ensure the writable spte can be
> >> found in rmap before the dirty bitmap is visible. Otherwise, we clear the
> >> dirty bitmap but fail to write-protect the page
> >>
> >> Signed-off-by: Xiao Guangrong 
> >> ---
> >>  arch/x86/kvm/mmu.c | 6 +++---
> >>  1 file changed, 3 insertions(+), 3 deletions(-)
> > 
> > Can you explain why this is safe, with regard to the rule 
> > at edde99ce05290e50 ?
> 
> BTW, this log fixed this case:
> 
>  VCPU 0KVM migration control
> 
>write-protects all pages
> #Pf happen then the page
> become writable, set dirty
> bit on the bitmap
> 
>   swap the bitmap, current bitmap is empty
> 
> write the page (no dirty log)
> 
>   stop the guest and push
>   the remaining dirty pages
> Stopped
>   See current bitmap is empty that means
>   no page is dirty.
> > 
> > "The rule is that all pages are either dirty in the current bitmap,
> > or write-protected, which is violated here."
> 
> Actually, this rule is not complete true, there's the 3th case:
> the window between write guest page and set dirty bitmap is valid.
> In that window, page is write-free and not dirty logged.
> 
> This case is based on the fact that at the final step of live migration,
> kvm should stop the guest and push the remaining dirty pages to the
> destination.
> 
> They're some examples in the current code:
> example 1, in fast_pf_fix_direct_spte():
>   if (cmpxchg64(sptep, spte, spte | PT_WRITABLE_MASK) == spte)
>   /* The window in here... */
>   mark_page_dirty(vcpu->kvm, gfn);
> 
> example 2, in kvm_write_guest_page():
>   r = __copy_to_user((void __user *)addr + offset, data, len);
>   if (r)
>   return -EFAULT;
>   /*
>    * The window is here, the page is dirty but not logged in
>    * the bitmap.
>    */
>   mark_page_dirty(kvm, gfn);
>   return 0;
> 
> > 
> > Overall, please document what is the required order of operations for
> > both set_spte and get_dirty_log and why this order is safe (right on top
> > of the code).
> 
> Okay.
> 
> The order we required here is, we should 1) set spte to writable __before__
> set the dirty bitmap and 2) add the spte into rmap __before__ set the dirty
> bitmap.
> 
> The point 1) is the same as fast_pf_fix_direct_spte(), which i explained 
> above.
> The point 1) and 2) can ensure we can find the spte on rmap and see the spte 
> is
> writable when we do lockless write-protection, otherwise these cases will 
> happen
> 
> VCPU 0kvm ioctl doing get-dirty-pages
> 
> mark_page_dirty(gfn) which
> set the gfn on the dirty maps
>   mask = xchg(dirty_bitmap, 0)
> 
>   walk all gfns which set on "mask" and
>   locklessly write-protect the gfn,
>   then walk rmap, see no spte on that rmap
>   
> 
> add the spte into rmap
> 
> !! Then the page can be freely written but not recorded in the dirty bitmap.
> 
> Or:
> 
> VCPU 0kvm ioctl doing get-dirty-pages
> 
> mark_page_dirty(gfn) which
> set the gfn on the dirty maps
> 
> add spte into rmap
>   mask = xchg(dirty_bitmap, 0)
> 
>   walk all gfns which set on "mask" and
>   locklessly write-protect the gfn,
>   then walk rmap, see the spte is in the rmap
>   but it is readonly or nonpresent.
>   
> Mark spte writable
> 
> !! Then the page can be freely written but not recorded in the dirty bitmap.
> 
> Hopefully, i have clarified it, if you have any questions, please let me know.

Yes, partially. Please on top of the explanation above, have something along
the lines of

"The flush IPI assumes that a thread switch happens in this order"
comment at arch/x86/mm/tlb.c

"With get_user_pages_fast, we walk down the pagetables without taking"
comment at arch/x86/mm/gup.c


What about the relation between read-only spte and respective TLB flush?
That is:

vcpu0   vcpu1

lockless write protect
mark spte read-only
either some mmu-lockless path or not
write protect page:
find read-only page
assumes tlb is flushed

Re: [PATCH 04/12] KVM: MMU: log dirty page after marking spte writable

2013-08-06 Thread Xiao Guangrong
On 08/07/2013 09:48 AM, Marcelo Tosatti wrote:
> On Tue, Jul 30, 2013 at 09:02:02PM +0800, Xiao Guangrong wrote:
>> Make sure we can see the writable spte before the dirty bitmap is visible
>>
>> We do this because kvm_vm_ioctl_get_dirty_log() write-protects the spte
>> based on the dirty bitmap, so we should ensure the writable spte can be
>> found in rmap before the dirty bitmap is visible. Otherwise, we clear the
>> dirty bitmap but fail to write-protect the page
>>
>> Signed-off-by: Xiao Guangrong 
>> ---
>>  arch/x86/kvm/mmu.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> Can you explain why this is safe, with regard to the rule 
> at edde99ce05290e50 ?

BTW, this log fixed this case:

 VCPU 0KVM migration control

   write-protects all pages
#Pf happen then the page
become writable, set dirty
bit on the bitmap

  swap the bitmap, current bitmap is empty

write the page (no dirty log)

  stop the guest and push
  the remaining dirty pages
Stopped
  See current bitmap is empty that means
  no page is dirty.
> 
> "The rule is that all pages are either dirty in the current bitmap,
> or write-protected, which is violated here."

Actually, this rule is not completely true; there is a third case:
the window between writing the guest page and setting the dirty bitmap is
valid. In that window, the page is writable but not dirty-logged.

This case is based on the fact that at the final step of live migration,
kvm should stop the guest and push the remaining dirty pages to the
destination.

There are some examples in the current code:
example 1, in fast_pf_fix_direct_spte():
if (cmpxchg64(sptep, spte, spte | PT_WRITABLE_MASK) == spte)
/* The window in here... */
mark_page_dirty(vcpu->kvm, gfn);

example 2, in kvm_write_guest_page():
r = __copy_to_user((void __user *)addr + offset, data, len);
if (r)
return -EFAULT;
/*
 * The window is here, the page is dirty but not logged in
 * The bitmap.
 */
mark_page_dirty(kvm, gfn);
return 0;

> 
> Overall, please document what is the required order of operations for
> both set_spte and get_dirty_log and why this order is safe (right on top
> of the code).

Okay.

The order we require here is: we should 1) set the spte to writable __before__
setting the dirty bitmap, and 2) add the spte into the rmap __before__ setting
the dirty bitmap.

Point 1) is the same as in fast_pf_fix_direct_spte(), which I explained above.
Points 1) and 2) ensure we can find the spte in the rmap and see that the spte
is writable when we do the lockless write-protection; otherwise these cases
will happen:

VCPU 0  kvm ioctl doing get-dirty-pages

mark_page_dirty(gfn) which
set the gfn on the dirty maps
  mask = xchg(dirty_bitmap, 0)

  walk all gfns which set on "mask" and
  locklessly write-protect the gfn,
  then walk rmap, see no spte on that rmap


add the spte into rmap

!! Then the page can be freely written but not recorded in the dirty bitmap.

Or:

VCPU 0  kvm ioctl doing get-dirty-pages

mark_page_dirty(gfn) which
set the gfn on the dirty maps

add spte into rmap
  mask = xchg(dirty_bitmap, 0)

  walk all gfns which set on "mask" and
  locklessly write-protect the gfn,
  then walk rmap, see the spte is in the rmap
  but it is readonly or nonpresent.

Mark spte writable

!! Then the page can be freely written but not recorded in the dirty bitmap.

Hopefully I have clarified it; if you have any questions, please let me know.

> 
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 35d4b50..0fe56ad 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -2486,12 +2486,12 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 
>> *sptep,
>>  }
>>  }
>>  
>> -if (pte_access & ACC_WRITE_MASK)
>> -mark_page_dirty(vcpu->kvm, gfn);
>> -
>>  set_pte:
>>  if (mmu_spte_update(sptep, spte))
>>  kvm_flush_remote_tlbs(vcpu->kvm);
> 
> Here, there is a modified guest page without dirty log bit set (think
> another vcpu writing to the page via this spte).

This is okay since we call mark_page_dirty(vcpu->kvm, gfn) after that; this
is the same case as fast_pf_fix_direct_spte(), and I have explained above
why it is safe.




Re: [PATCH 04/12] KVM: MMU: log dirty page after marking spte writable

2013-08-06 Thread Marcelo Tosatti
On Tue, Jul 30, 2013 at 09:02:02PM +0800, Xiao Guangrong wrote:
> Make sure we can see the writable spte before the dirty bitmap is visible
> 
> We do this because kvm_vm_ioctl_get_dirty_log() write-protects the spte
> based on the dirty bitmap, so we should ensure the writable spte can be
> found in rmap before the dirty bitmap is visible. Otherwise, we clear the
> dirty bitmap but fail to write-protect the page
> 
> Signed-off-by: Xiao Guangrong 
> ---
>  arch/x86/kvm/mmu.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)

Can you explain why this is safe, with regard to the rule 
at edde99ce05290e50 ?

"The rule is that all pages are either dirty in the current bitmap,
or write-protected, which is violated here."

Overall, please document what is the required order of operations for
both set_spte and get_dirty_log and why this order is safe (right on top
of the code).

> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 35d4b50..0fe56ad 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -2486,12 +2486,12 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>   }
>   }
>  
> - if (pte_access & ACC_WRITE_MASK)
> - mark_page_dirty(vcpu->kvm, gfn);
> -
>  set_pte:
>   if (mmu_spte_update(sptep, spte))
>   kvm_flush_remote_tlbs(vcpu->kvm);

Here, there is a modified guest page without dirty log bit set (think
another vcpu writing to the page via this spte).

> +
> + if (pte_access & ACC_WRITE_MASK)
> + mark_page_dirty(vcpu->kvm, gfn);
>  done:
>   return ret;
>  }
> -- 
> 1.8.1.4


Re: [PATCH 04/12] KVM: MMU: log dirty page after marking spte writable

2013-08-06 Thread Xiao Guangrong
On 08/07/2013 09:48 AM, Marcelo Tosatti wrote:
 On Tue, Jul 30, 2013 at 09:02:02PM +0800, Xiao Guangrong wrote:
 Make sure we can see the writable spte before the dirt bitmap is visible

 We do this is for kvm_vm_ioctl_get_dirty_log() write-protects the spte based
 on the dirty bitmap, we should ensure the writable spte can be found in rmap
 before the dirty bitmap is visible. Otherwise, we cleared the dirty bitmap 
 and
 failed to write-protect the page

 Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
 ---
  arch/x86/kvm/mmu.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)
 
 Can you explain why this is safe, with regard to the rule 
 at edde99ce05290e50 ?

BTW, this log fixed this case:

 VCPU 0KVM migration control

   write-protects all pages
#Pf happen then the page
become writable, set dirty
bit on the bitmap

  swap the bitmap, current bitmap is empty

write the page (no dirty log)

  stop the guest and push
  the remaining dirty pages
Stopped
  See current bitmap is empty that means
  no page is dirty.
 
 The rule is that all pages are either dirty in the current bitmap,
 or write-protected, which is violated here.

Actually, this rule is not complete true, there's the 3th case:
the window between write guest page and set dirty bitmap is valid.
In that window, page is write-free and not dirty logged.

This case is based on the fact that at the final step of live migration,
kvm should stop the guest and push the remaining dirty pages to the
destination.

They're some examples in the current code:
example 1, in fast_pf_fix_direct_spte():
if (cmpxchg64(sptep, spte, spte | PT_WRITABLE_MASK) == spte)
/* The window in here... */
mark_page_dirty(vcpu-kvm, gfn);

example 2, in kvm_write_guest_page():
r = __copy_to_user((void __user *)addr + offset, data, len);
if (r)
return -EFAULT;
/*
 * The window is here, the page is dirty but not logged in
 * The bitmap.
 */
mark_page_dirty(kvm, gfn);
return 0;

 
 Overall, please document what is the required order of operations for
 both set_spte and get_dirty_log and why this order is safe (right on top
 of the code).

Okay.

The order we required here is, we should 1) set spte to writable __before__
set the dirty bitmap and 2) add the spte into rmap __before__ set the dirty
bitmap.

The point 1) is the same as fast_pf_fix_direct_spte(), which i explained above.
The point 1) and 2) can ensure we can find the spte on rmap and see the spte is
writable when we do lockless write-protection, otherwise these cases will happen

VCPU 0  kvm ioctl doing get-dirty-pages

mark_page_dirty(gfn) which
set the gfn on the dirty maps
  mask = xchg(dirty_bitmap, 0)

  walk all gfns which set on mask and
  locklessly write-protect the gfn,
  then walk rmap, see no spte on that rmap


add the spte into rmap

!! Then the page can be freely wrote but not recorded in the dirty bitmap.

Or:

VCPU 0  kvm ioctl doing get-dirty-pages

mark_page_dirty(gfn) which
set the gfn on the dirty maps

add spte into rmap
  mask = xchg(dirty_bitmap, 0)

  walk all gfns which set on mask and
  locklessly write-protect the gfn,
  then walk rmap, see spte is on the ramp
  but it is readonly or nonpresent.

Mark spte writable

!! Then the page can be freely wrote but not recorded in the dirty bitmap.

Hopefully, i have clarified it, if you have any questions, please let me know.

 
 diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
 index 35d4b50..0fe56ad 100644
 --- a/arch/x86/kvm/mmu.c
 +++ b/arch/x86/kvm/mmu.c
 @@ -2486,12 +2486,12 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 
 *sptep,
  }
  }
  
 -if (pte_access  ACC_WRITE_MASK)
 -mark_page_dirty(vcpu-kvm, gfn);
 -
  set_pte:
  if (mmu_spte_update(sptep, spte))
  kvm_flush_remote_tlbs(vcpu-kvm);
 
 Here, there is a modified guest page without dirty log bit set (think
 another vcpu writing to the page via this spte).

This is okay since we call mark_page_dirty(vcpu->kvm, gfn) after that; it is
the same case as fast_pf_fix_direct_spte(), and I have explained above why it
is safe.



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 04/12] KVM: MMU: log dirty page after marking spte writable

2013-07-31 Thread Xiao Guangrong
On 07/30/2013 09:26 PM, Paolo Bonzini wrote:
> Il 30/07/2013 15:02, Xiao Guangrong ha scritto:
>> Make sure we can see the writable spte before the dirty bitmap is visible
>>
>> We do this because kvm_vm_ioctl_get_dirty_log() write-protects the spte based
>> on the dirty bitmap, so we should ensure the writable spte can be found in the
>> rmap before the dirty bitmap is visible. Otherwise, we could clear the dirty
>> bitmap but fail to write-protect the page
>>
>> Signed-off-by: Xiao Guangrong 
>> ---
>>  arch/x86/kvm/mmu.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 35d4b50..0fe56ad 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -2486,12 +2486,12 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 
>> *sptep,
>>  }
>>  }
>>  
>> -if (pte_access & ACC_WRITE_MASK)
>> -mark_page_dirty(vcpu->kvm, gfn);
>> -
>>  set_pte:
>>  if (mmu_spte_update(sptep, spte))
>>  kvm_flush_remote_tlbs(vcpu->kvm);
>> +
>> +if (pte_access & ACC_WRITE_MASK)
>> +mark_page_dirty(vcpu->kvm, gfn);
>>  done:
>>  return ret;
>>  }
>>
> 
> What about this comment above:
> 
> /*
>  * Optimization: for pte sync, if spte was writable the hash
>  * lookup is unnecessary (and expensive). Write protection
>  * is responsibility of mmu_get_page / kvm_sync_page.

This comment means that no sync shadow page is created while the spte is still
writable, because adding a sync page requires write-protecting all sptes that
point to the page. So we can keep the spte writable.

I think it is better to check the SPTE_MMU_WRITEABLE bit instead of
PT_WRITABLE_MASK, since the latter bit can be cleared by dirty logging; that
can be a separate patch, I think.

>  * Same reasoning can be applied to dirty page accounting.

This comment means that if the spte is writable, the corresponding bit in the
dirty bitmap should already have been set.

Thanks to your reminder, I think this comment should be dropped; we now need to
call mark_page_dirty() whenever the spte is updated to writable. Otherwise this
can happen:

VCPU 0                             VCPU 1
Clear dirty bit on the bitmap
                                   Read the spte; it is writable
Write the spte
                                   Update the spte, keep it writable,
                                   and do not call mark_page_dirty()
Flush tlb

Then VCPU 1 can continue to write the page but fails to set the bit in the
bitmap.

>  */
> if (!can_unsync && is_writable_pte(*sptep))
> goto set_pte;
> 
> if (mmu_need_write_protect(vcpu, gfn, can_unsync)) {
> 
> 
> ?
> 
> Should it be changed to
> 
> if (!can_unsync && is_writable_pte(*sptep))
> pte_access &= ~ACC_WRITE_MASK; /* do not mark dirty */

Yes, this can avoid the issue above.

But there is only a small window between syncing the spte and locklessly
write-protecting it (since the sptep is already writable); I think we'd better
keep the spte writable to speed up the normal case. :)



Re: [PATCH 04/12] KVM: MMU: log dirty page after marking spte writable

2013-07-30 Thread Paolo Bonzini
Il 30/07/2013 15:02, Xiao Guangrong ha scritto:
> Make sure we can see the writable spte before the dirty bitmap is visible
> 
> We do this because kvm_vm_ioctl_get_dirty_log() write-protects the spte based
> on the dirty bitmap, so we should ensure the writable spte can be found in the
> rmap before the dirty bitmap is visible. Otherwise, we could clear the dirty
> bitmap but fail to write-protect the page
> 
> Signed-off-by: Xiao Guangrong 
> ---
>  arch/x86/kvm/mmu.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 35d4b50..0fe56ad 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -2486,12 +2486,12 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>   }
>   }
>  
> - if (pte_access & ACC_WRITE_MASK)
> - mark_page_dirty(vcpu->kvm, gfn);
> -
>  set_pte:
>   if (mmu_spte_update(sptep, spte))
>   kvm_flush_remote_tlbs(vcpu->kvm);
> +
> + if (pte_access & ACC_WRITE_MASK)
> + mark_page_dirty(vcpu->kvm, gfn);
>  done:
>   return ret;
>  }
> 

What about this comment above:

/*
 * Optimization: for pte sync, if spte was writable the hash
 * lookup is unnecessary (and expensive). Write protection
 * is responsibility of mmu_get_page / kvm_sync_page.
 * Same reasoning can be applied to dirty page accounting.
 */
if (!can_unsync && is_writable_pte(*sptep))
goto set_pte;

if (mmu_need_write_protect(vcpu, gfn, can_unsync)) {


?

Should it be changed to

if (!can_unsync && is_writable_pte(*sptep))
pte_access &= ~ACC_WRITE_MASK; /* do not mark dirty */

else if (mmu_need_write_protect(vcpu, gfn, can_unsync)) {

?

Thanks,

Paolo


[PATCH 04/12] KVM: MMU: log dirty page after marking spte writable

2013-07-30 Thread Xiao Guangrong
Make sure we can see the writable spte before the dirty bitmap is visible

We do this because kvm_vm_ioctl_get_dirty_log() write-protects the spte based
on the dirty bitmap, so we should ensure the writable spte can be found in the
rmap before the dirty bitmap is visible. Otherwise, we could clear the dirty
bitmap but fail to write-protect the page

Signed-off-by: Xiao Guangrong 
---
 arch/x86/kvm/mmu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 35d4b50..0fe56ad 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2486,12 +2486,12 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
}
}
 
-   if (pte_access & ACC_WRITE_MASK)
-   mark_page_dirty(vcpu->kvm, gfn);
-
 set_pte:
if (mmu_spte_update(sptep, spte))
kvm_flush_remote_tlbs(vcpu->kvm);
+
+   if (pte_access & ACC_WRITE_MASK)
+   mark_page_dirty(vcpu->kvm, gfn);
 done:
return ret;
 }
-- 
1.8.1.4


