Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-20 Thread Paolo Bonzini
Il 20/08/2014 03:03, David Matlack ha scritto:
 On Tue, Aug 19, 2014 at 5:29 PM, Xiao Guangrong
 xiaoguangr...@linux.vnet.ibm.com wrote:
 On 08/19/2014 05:03 PM, Paolo Bonzini wrote:
 Il 19/08/2014 10:50, Xiao Guangrong ha scritto:
 Okay, what confused me it that it seems that the single line patch
 is ok to you. :)

 No, it was late and I was confused. :)

 Now, do we really need to care the case 2? like David said:
 Sorry I didn't explain myself very well: Since we can get a single wrong
 mmio exit no matter what, it has to be handled in userspace. So my point
 was, it doesn't really help to fix that one very specific way that it can
 happen, because it can just happen in other ways. (E.g. update memslots
 occurs after is_noslot_pfn() and before mmio exit).

 What's your idea?

 I think if you always treat the low bit as zero in mmio sptes, you can
 do that without losing a bit of the generation.

 What's you did is avoiding cache a invalid generation number into spte, but
 actually if we can figure it out when we check mmio access, it's ok. Like 
 the
 updated patch i posted should fix it, that way avoids doubly increase the 
 number.

 Yes.

 Okay, if you're interested increasing the number doubly, there is the 
 simpler
 one:

 This wastes a bit in the mmio spte though.  My idea is to increase the
 memslots generation twice, but drop the low bit in the mmio spte.

 Yeah, really smart idea. :)

 Paolo/David, would you mind making a patch for this (+ the comments in 
 David's
 patch)?
 
 Paolo, since it was your idea would you like to write it? I don't mind either
 way.

Sure, I'll post the patch for review.

Paolo

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-19 Thread Paolo Bonzini
Il 19/08/2014 05:50, Xiao Guangrong ha scritto:
 
 Note in the step *, my approach detects the invalid generation-number which
 will invalidate the mmio spte properly .

You are right, in fact my mail included another part: Another 
alternative could be to use the low bit to mark an in-progress change, 
*and skip the caching if the low bit is set*.

I think if you always treat the low bit as zero in mmio sptes, you can 
do that without losing a bit of the generation.

Something like this (untested/uncompiled):

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 931467881da7..3a56d377c6d7 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -199,16 +199,20 @@ void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask)
 EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_mask);
 
 /*
- * spte bits of bit 3 ~ bit 11 are used as low 9 bits of generation number,
- * the bits of bits 52 ~ bit 61 are used as high 10 bits of generation
- * number.
+ * the low bit of the generation number is always presumed to be zero.
+ * This disables mmio caching during memslot updates.  The concept is
+ * similar to a seqcount but instead of retrying the access we just punt
+ * and ignore the cache.
+ *
+ * spte bits 3-11 are used as bits 1-9 of the generation number,
+ * the bits 52-61 are used as bits 10-19 of the generation number.
  */
-#define MMIO_SPTE_GEN_LOW_SHIFT3
+#define MMIO_SPTE_GEN_LOW_SHIFT2
 #define MMIO_SPTE_GEN_HIGH_SHIFT   52
 
-#define MMIO_GEN_SHIFT 19
-#define MMIO_GEN_LOW_SHIFT 9
-#define MMIO_GEN_LOW_MASK  ((1  MMIO_GEN_LOW_SHIFT) - 1)
+#define MMIO_GEN_SHIFT 20
+#define MMIO_GEN_LOW_SHIFT 10
+#define MMIO_GEN_LOW_MASK  ((1  MMIO_GEN_LOW_SHIFT) - 2)
 #define MMIO_GEN_MASK  ((1  MMIO_GEN_SHIFT) - 1)
 #define MMIO_MAX_GEN   ((1  MMIO_GEN_SHIFT) - 1)
 
@@ -236,12 +240,7 @@ static unsigned int get_mmio_spte_generation(u64 spte)
 
 static unsigned int kvm_current_mmio_generation(struct kvm *kvm)
 {
-   /*
-* Init kvm generation close to MMIO_MAX_GEN to easily test the
-* code of handling generation number wrap-around.
-*/
-   return (kvm_memslots(kvm)-generation +
- MMIO_MAX_GEN - 150)  MMIO_GEN_MASK;
+   return kvm_memslots(kvm)-generation  MMIO_GEN_MASK;
 }
 
 static void mark_mmio_spte(struct kvm *kvm, u64 *sptep, u64 gfn,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a69a623938b8..c7e2800313b8 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -474,6 +476,13 @@ static struct kvm *kvm_create_vm(unsigned long type)
kvm-memslots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
if (!kvm-memslots)
goto out_err_no_srcu;
+
+   /*
+* Init kvm generation close to MMIO_MAX_GEN to easily test the
+* code of handling generation number wrap-around.
+*/
+   kvm-memslots-generation = -150;
+
kvm_init_memslots_id(kvm);
if (init_srcu_struct(kvm-srcu))
goto out_err_no_srcu;
@@ -725,6 +732,8 @@ static struct kvm_memslots *install_new_memslots(struct kvm 
*kvm,
synchronize_srcu_expedited(kvm-srcu);
 
kvm_arch_memslots_updated(kvm);
+   slots-generation++;
+   WARN_ON(slots-generation  1);
 
return old_memslots;
 }

(modulo the changes to always set the generation in install_new_memslots, which
I'm eliding for clarity).

Moving the initialization to kvm_create_vm ensures that the low bit is untouched
between install_new_memslots and kvm_current_mmio_generation.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-19 Thread Xiao Guangrong
On 08/19/2014 04:28 PM, Paolo Bonzini wrote:
 Il 19/08/2014 05:50, Xiao Guangrong ha scritto:

 Note in the step *, my approach detects the invalid generation-number which
 will invalidate the mmio spte properly .
 
 You are right, in fact my mail included another part: Another 
 alternative could be to use the low bit to mark an in-progress change, 
 *and skip the caching if the low bit is set*.

Okay, what confused me it that it seems that the single line patch
is ok to you. :)

Now, do we really need to care the case 2? like David said:
Sorry I didn't explain myself very well: Since we can get a single wrong
mmio exit no matter what, it has to be handled in userspace. So my point
was, it doesn't really help to fix that one very specific way that it can
happen, because it can just happen in other ways. (E.g. update memslots
occurs after is_noslot_pfn() and before mmio exit).

What's your idea?

 
 I think if you always treat the low bit as zero in mmio sptes, you can 
 do that without losing a bit of the generation.

What's you did is avoiding cache a invalid generation number into spte, but
actually if we can figure it out when we check mmio access, it's ok. Like the
updated patch i posted should fix it, that way avoids doubly increase the 
number.

Okay, if you're interested increasing the number doubly, there is the simpler
one:

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9314678..bf49170 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -236,6 +236,9 @@ static unsigned int get_mmio_spte_generation(u64 spte)

 static unsigned int kvm_current_mmio_generation(struct kvm *kvm)
 {
+   /* The initialized generation number should be even. */
+   BUILD_BUG_ON((MMIO_MAX_GEN - 150)  0x1);
+
/*
 * Init kvm generation close to MMIO_MAX_GEN to easily test the
 * code of handling generation number wrap-around.
@@ -292,6 +295,14 @@ static bool check_mmio_spte(struct kvm *kvm, u64 spte)
kvm_gen = kvm_current_mmio_generation(kvm);
spte_gen = get_mmio_spte_generation(spte);

+   /*
+* Aha, we cached a being-updated generation number or
+* generation number is currnetly being updated, let do the
+* real check for mmio access.
+*/
+   if ((kvm_gen | spte_gen)  0x1)
+   return false;
+
trace_check_mmio_spte(spte, kvm_gen, spte_gen);
return likely(kvm_gen == spte_gen);
 }
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 33712fb..5c3f7b7 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -725,7 +725,7 @@ static struct kvm_memslots *install_new_memslots(struct kvm 
*kvm,
update_memslots(slots, new, kvm-memslots-generation);
rcu_assign_pointer(kvm-memslots, slots);
synchronize_srcu_expedited(kvm-srcu);
-
+   kvm-memslots-generation++;
kvm_arch_memslots_updated(kvm);

return old_memslots;

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-19 Thread Paolo Bonzini
Il 19/08/2014 10:50, Xiao Guangrong ha scritto:
 Okay, what confused me it that it seems that the single line patch
 is ok to you. :)

No, it was late and I was confused. :)

 Now, do we really need to care the case 2? like David said:
 Sorry I didn't explain myself very well: Since we can get a single wrong
 mmio exit no matter what, it has to be handled in userspace. So my point
 was, it doesn't really help to fix that one very specific way that it can
 happen, because it can just happen in other ways. (E.g. update memslots
 occurs after is_noslot_pfn() and before mmio exit).
 
 What's your idea?
 
  I think if you always treat the low bit as zero in mmio sptes, you can 
  do that without losing a bit of the generation.
 
 What's you did is avoiding cache a invalid generation number into spte, but
 actually if we can figure it out when we check mmio access, it's ok. Like the
 updated patch i posted should fix it, that way avoids doubly increase the 
 number.

Yes.

 Okay, if you're interested increasing the number doubly, there is the simpler
 one:

This wastes a bit in the mmio spte though.  My idea is to increase the
memslots generation twice, but drop the low bit in the mmio spte.

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-19 Thread Xiao Guangrong
On 08/19/2014 05:03 PM, Paolo Bonzini wrote:
 Il 19/08/2014 10:50, Xiao Guangrong ha scritto:
 Okay, what confused me it that it seems that the single line patch
 is ok to you. :)
 
 No, it was late and I was confused. :)
 
 Now, do we really need to care the case 2? like David said:
 Sorry I didn't explain myself very well: Since we can get a single wrong
 mmio exit no matter what, it has to be handled in userspace. So my point
 was, it doesn't really help to fix that one very specific way that it can
 happen, because it can just happen in other ways. (E.g. update memslots
 occurs after is_noslot_pfn() and before mmio exit).

 What's your idea?

 I think if you always treat the low bit as zero in mmio sptes, you can 
 do that without losing a bit of the generation.

 What's you did is avoiding cache a invalid generation number into spte, but
 actually if we can figure it out when we check mmio access, it's ok. Like the
 updated patch i posted should fix it, that way avoids doubly increase the 
 number.
 
 Yes.
 
 Okay, if you're interested increasing the number doubly, there is the simpler
 one:
 
 This wastes a bit in the mmio spte though.  My idea is to increase the
 memslots generation twice, but drop the low bit in the mmio spte.

Yeah, really smart idea. :)

Paolo/David, would you mind making a patch for this (+ the comments in David's
patch)?

Please feel free to add my:
Reviewed-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-19 Thread David Matlack
On Tue, Aug 19, 2014 at 5:29 PM, Xiao Guangrong
xiaoguangr...@linux.vnet.ibm.com wrote:
 On 08/19/2014 05:03 PM, Paolo Bonzini wrote:
 Il 19/08/2014 10:50, Xiao Guangrong ha scritto:
 Okay, what confused me it that it seems that the single line patch
 is ok to you. :)

 No, it was late and I was confused. :)

 Now, do we really need to care the case 2? like David said:
 Sorry I didn't explain myself very well: Since we can get a single wrong
 mmio exit no matter what, it has to be handled in userspace. So my point
 was, it doesn't really help to fix that one very specific way that it can
 happen, because it can just happen in other ways. (E.g. update memslots
 occurs after is_noslot_pfn() and before mmio exit).

 What's your idea?

 I think if you always treat the low bit as zero in mmio sptes, you can
 do that without losing a bit of the generation.

 What's you did is avoiding cache a invalid generation number into spte, but
 actually if we can figure it out when we check mmio access, it's ok. Like 
 the
 updated patch i posted should fix it, that way avoids doubly increase the 
 number.

 Yes.

 Okay, if you're interested increasing the number doubly, there is the 
 simpler
 one:

 This wastes a bit in the mmio spte though.  My idea is to increase the
 memslots generation twice, but drop the low bit in the mmio spte.

 Yeah, really smart idea. :)

 Paolo/David, would you mind making a patch for this (+ the comments in David's
 patch)?

Paolo, since it was your idea would you like to write it? I don't mind either
way.


 Please feel free to add my:
 Reviewed-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-18 Thread Paolo Bonzini
Il 14/08/2014 09:01, Xiao Guangrong ha scritto:
 - update_memslots(slots, new, kvm-memslots-generation);
 + /* ensure generation number is always increased. */
 + slots-generation = old_memslots-generation;
 + update_memslots(slots, new);
   rcu_assign_pointer(kvm-memslots, slots);
   synchronize_srcu_expedited(kvm-srcu);
 + slots-generation++;

I don't trust my brain enough to review this patch.

kvm_current_mmio_generation seems like a very bad (race-prone) API.  One
patch I trust myself reviewing would change a bunch of functions in
kvm_main.c to take a memslots struct.  This would make it easy to
respect the hard and fast rule of not dereferencing the same pointer
twice.  But it would be a tedious change.

Another alternative could be to use the low bit to mark an in-progress
change, and skip the caching if the low bit is set.  Similar to a
seqcount (except if read_seqcount_retry fails, we just punt and not
retry anything), you could use it even though the memory barriers
provided by write_seqcount_begin/end are not too useful in this case.

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-18 Thread Xiao Guangrong

Hi Paolo,

Thank you to review the patch!

On Aug 18, 2014, at 9:57 PM, Paolo Bonzini pbonz...@redhat.com wrote:

 Il 14/08/2014 09:01, Xiao Guangrong ha scritto:
 -update_memslots(slots, new, kvm-memslots-generation);
 +/* ensure generation number is always increased. */
 +slots-generation = old_memslots-generation;
 +update_memslots(slots, new);
  rcu_assign_pointer(kvm-memslots, slots);
  synchronize_srcu_expedited(kvm-srcu);
 +slots-generation++;
 
 I don't trust my brain enough to review this patch.

Sorry to make you confused. I should expain it more clearly.

What this patch tried to fix is:  kvm will generate wrong mmio-exit forever
if no luck event cleans mmio spte. (eg. if no memory pressure or  no
context-sync on that spte.)

Note, it is hard to do precise sync between kvm_vm_ioctl_set_memory_region
and mmio-exit - that means userspace is able to get mmio-exit even if
kvm_vm_ioctl_set_memory_region have finished, for example, kvm identifies
a mmio access before issuing the ioctl and injects mmio-exit to userspace after
ioctl return. So checking if mmio-exit is a real mmio access in userspace is
needed anyway.

 kvm_current_mmio_generation seems like a very bad (race-prone) API.  One
 patch I trust myself reviewing would change a bunch of functions in
 kvm_main.c to take a memslots struct.  This would make it easy to
 respect the hard and fast rule of not dereferencing the same pointer
 twice.  But it would be a tedious change.

kvm_set_memory_region is the only place updating memslot and
kvm_current_mmio_generation accesses memslot by rcu-dereference,
i do not know why other places need to take into account.

I think this patch is auditable, page-fault is always called by holding
srcu-lock so that a page fault can’t go across synchronize_srcu_expedited.
Only these cases can happen:

1)  page fault occurs before synchronize_srcu_expedited.
In this case, vcpu will generate mmio-exit for the memslot being registered
by the ioctl. That’s ok since the ioctl have not finished.

2) page fault occurs after synchronize_srcu_expedited and during
   increasing generation-number.
   In this case, userspace may get wrong mmio-exit (that happen if handing
   page-fault is slower that the ioctl), that’s ok too since userspace need do
  the check anyway like i said above.

3) page fault occurs after generation-number update
   that’s definitely correct. :)

 Another alternative could be to use the low bit to mark an in-progress
 change, and skip the caching if the low bit is set.  Similar to a
 seqcount (except if read_seqcount_retry fails, we just punt and not
 retry anything), you could use it even though the memory barriers
 provided by write_seqcount_begin/end are not too useful in this case.

I do not know how the bit works, page fault will cache the memslot before
the bit set and cache the generation-number after the bit set.

Maybe i missed your idea, could you please detail it?

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-18 Thread David Matlack
On Mon, Aug 18, 2014 at 9:35 AM, Xiao Guangrong
xiaoguangrong.e...@gmail.com wrote:

 Hi Paolo,

 Thank you to review the patch!

 On Aug 18, 2014, at 9:57 PM, Paolo Bonzini pbonz...@redhat.com wrote:

 Il 14/08/2014 09:01, Xiao Guangrong ha scritto:
 -update_memslots(slots, new, kvm-memslots-generation);
 +/* ensure generation number is always increased. */
 +slots-generation = old_memslots-generation;
 +update_memslots(slots, new);
  rcu_assign_pointer(kvm-memslots, slots);
  synchronize_srcu_expedited(kvm-srcu);
 +slots-generation++;

 I don't trust my brain enough to review this patch.

Xiao, I thought about your approach a lot and I can't think of a bad race
that isn't already possible due to the fact that kvm allows memslot
mutation to race with vm exits. That being said, it's hard to reason about
all the other clients of memslots and what weirdness (or badness) will
be caused by updating generation after srcu_synch. I like Paolo's two
approaches because they fix the bug without any side-effects.

 Sorry to make you confused. I should expain it more clearly.

 What this patch tried to fix is:  kvm will generate wrong mmio-exit forever
 if no luck event cleans mmio spte. (eg. if no memory pressure or  no
 context-sync on that spte.)

 Note, it is hard to do precise sync between kvm_vm_ioctl_set_memory_region
 and mmio-exit - that means userspace is able to get mmio-exit even if
 kvm_vm_ioctl_set_memory_region have finished, for example, kvm identifies
 a mmio access before issuing the ioctl and injects mmio-exit to userspace 
 after
 ioctl return. So checking if mmio-exit is a real mmio access in userspace is
 needed anyway.

 kvm_current_mmio_generation seems like a very bad (race-prone) API.  One
 patch I trust myself reviewing would change a bunch of functions in
 kvm_main.c to take a memslots struct.  This would make it easy to
 respect the hard and fast rule of not dereferencing the same pointer
 twice.  But it would be a tedious change.

 kvm_set_memory_region is the only place updating memslot and
 kvm_current_mmio_generation accesses memslot by rcu-dereference,
 i do not know why other places need to take into account.

If you rcu_dereference() more than once, you can't trust previous
decisions based on rcu_dereference()'s. If the mmio cache code only
did one rcu_dereference() per vm exit, this bug would be gone.

 I think this patch is auditable, page-fault is always called by holding
 srcu-lock so that a page fault can’t go across synchronize_srcu_expedited.
 Only these cases can happen:

 1)  page fault occurs before synchronize_srcu_expedited.
 In this case, vcpu will generate mmio-exit for the memslot being 
 registered
 by the ioctl. That’s ok since the ioctl have not finished.

 2) page fault occurs after synchronize_srcu_expedited and during
increasing generation-number.
In this case, userspace may get wrong mmio-exit (that happen if handing
page-fault is slower that the ioctl), that’s ok too since userspace need do
   the check anyway like i said above.

 3) page fault occurs after generation-number update
that’s definitely correct. :)

 Another alternative could be to use the low bit to mark an in-progress
 change, and skip the caching if the low bit is set.  Similar to a
 seqcount (except if read_seqcount_retry fails, we just punt and not
 retry anything), you could use it even though the memory barriers
 provided by write_seqcount_begin/end are not too useful in this case.

I like this approach best. It would have the least code changes and provide
the same guarantees.

 I do not know how the bit works, page fault will cache the memslot before
 the bit set and cache the generation-number after the bit set.

 Maybe i missed your idea, could you please detail it?

In vcpu_cache_mmio_info() if generation is odd, just don't do the caching
because memslots were changed while we were running and we just assume the
worst case.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-18 Thread Paolo Bonzini
Il 18/08/2014 18:35, Xiao Guangrong ha scritto:
 
 Hi Paolo,
 
 Thank you to review the patch!
 
 On Aug 18, 2014, at 9:57 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 
 Il 14/08/2014 09:01, Xiao Guangrong ha scritto:
 -   update_memslots(slots, new, kvm-memslots-generation);
 +   /* ensure generation number is always increased. */
 +   slots-generation = old_memslots-generation;
 +   update_memslots(slots, new);
 rcu_assign_pointer(kvm-memslots, slots);
 synchronize_srcu_expedited(kvm-srcu);
 +   slots-generation++;

 I don't trust my brain enough to review this patch.
 
 Sorry to make you confused. I should expain it more clearly.

Don't worry, it's not your fault. :)

 kvm_current_mmio_generation seems like a very bad (race-prone) API.  One
 patch I trust myself reviewing would change a bunch of functions in
 kvm_main.c to take a memslots struct.  This would make it easy to
 respect the hard and fast rule of not dereferencing the same pointer
 twice.  But it would be a tedious change.
 
 kvm_set_memory_region is the only place updating memslot and
 kvm_current_mmio_generation accesses memslot by rcu-dereference,
 i do not know why other places need to take into account.

The race occurs because gfn_to_pfn_many_atomic or some other function
has already used kvm_memslots().  Calling kvm_memslots() twice is the
root cause the bug.

 I think this patch is auditable, page-fault is always called by holding
 srcu-lock so that a page fault can’t go across synchronize_srcu_expedited.
 Only these cases can happen:
 
 1)  page fault occurs before synchronize_srcu_expedited.
 In this case, vcpu will generate mmio-exit for the memslot being 
 registered
 by the ioctl. That’s ok since the ioctl have not finished.
 
 2) page fault occurs after synchronize_srcu_expedited and during
increasing generation-number.
In this case, userspace may get wrong mmio-exit (that happen if handing
page-fault is slower that the ioctl), that’s ok too since userspace need do
   the check anyway like i said above.
 
 3) page fault occurs after generation-number update
that’s definitely correct. :)
 
 Another alternative could be to use the low bit to mark an in-progress
 change, and skip the caching if the low bit is set.  Similar to a
 seqcount (except if read_seqcount_retry fails, we just punt and not
 retry anything), you could use it even though the memory barriers
 provided by write_seqcount_begin/end are not too useful in this case.
 
 I do not know how the bit works, page fault will cache the memslot before
 the bit set and cache the generation-number after the bit set.
 
 Maybe i missed your idea, could you please detail it?

Something like this:

-   update_memslots(slots, new, kvm-memslots-generation);
+   /* ensure generation number is always increased. */
+   slots-generation = old_memslots-generation + 1;
+   update_memslots(slots, new);
rcu_assign_pointer(kvm-memslots, slots);
synchronize_srcu_expedited(kvm-srcu);
+   slots-generation++;

Then case 1 and 2 will just have a cache miss.

The low bit is really just because each slot update does 2 generation
increases.

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-18 Thread Xiao Guangrong

On Aug 19, 2014, at 2:47 AM, Paolo Bonzini pbonz...@redhat.com wrote:

 
 I think this patch is auditable, page-fault is always called by holding
 srcu-lock so that a page fault can’t go across synchronize_srcu_expedited.
 Only these cases can happen:
 
 1)  page fault occurs before synchronize_srcu_expedited.
In this case, vcpu will generate mmio-exit for the memslot being 
 registered
by the ioctl. That’s ok since the ioctl have not finished.
 
 2) page fault occurs after synchronize_srcu_expedited and during
   increasing generation-number.
   In this case, userspace may get wrong mmio-exit (that happen if handing
   page-fault is slower that the ioctl), that’s ok too since userspace need do
  the check anyway like i said above.
 
 3) page fault occurs after generation-number update
   that’s definitely correct. :)
 
 Another alternative could be to use the low bit to mark an in-progress
 change, and skip the caching if the low bit is set.  Similar to a
 seqcount (except if read_seqcount_retry fails, we just punt and not
 retry anything), you could use it even though the memory barriers
 provided by write_seqcount_begin/end are not too useful in this case.
 
 I do not know how the bit works, page fault will cache the memslot before
 the bit set and cache the generation-number after the bit set.
 
 Maybe i missed your idea, could you please detail it?
 
 Something like this:
 
 - update_memslots(slots, new, kvm-memslots-generation);
 + /* ensure generation number is always increased. */
 + slots-generation = old_memslots-generation + 1;
 + update_memslots(slots, new);
   rcu_assign_pointer(kvm-memslots, slots);
   synchronize_srcu_expedited(kvm-srcu);
 + slots-generation++;
 
 Then case 1 and 2 will just have a cache miss.

So, case 2 is what you concerned? :) I still think it is ok but i do not have
strong opinion on that. How about simplify it like this:

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9314678..9fabf6a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -234,16 +234,22 @@ static unsigned int get_mmio_spte_generation(u64 spte)
return gen;
 }
 
-static unsigned int kvm_current_mmio_generation(struct kvm *kvm)
+static unsigned int __kvm_current_mmio_generation(struct kvm_memslots *slots)
 {
+
/*
 * Init kvm generation close to MMIO_MAX_GEN to easily test the
 * code of handling generation number wrap-around.
 */
-   return (kvm_memslots(kvm)-generation +
+   return (slots-generation +
  MMIO_MAX_GEN - 150)  MMIO_GEN_MASK;
 }
 
+static unsigned int kvm_current_mmio_generation(struct kvm *kvm)
+{
+   return __kvm_current_mmio_generation(kvm_memslots(kvm));
+}
+
 static void mark_mmio_spte(struct kvm *kvm, u64 *sptep, u64 gfn,
   unsigned access)
 {
@@ -287,9 +293,15 @@ static bool set_mmio_spte(struct kvm *kvm, u64 *sptep, 
gfn_t gfn,
 
 static bool check_mmio_spte(struct kvm *kvm, u64 spte)
 {
+   struct kvm_memslots *slots = kvm_memslots(kvm);
unsigned int kvm_gen, spte_gen;
 
-   kvm_gen = kvm_current_mmio_generation(kvm);
+   if (slots-updated)
+   return false;
+
+   smp_rmb();
+   
+   kvm_gen = __kvm_current_mmio_generation(slots);
spte_gen = get_mmio_spte_generation(spte);
 
trace_check_mmio_spte(spte, kvm_gen, spte_gen);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4b6c01b..1d4e78f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -96,7 +96,7 @@ static void hardware_disable_all(void);
 
 static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
 static void update_memslots(struct kvm_memslots *slots,
-   struct kvm_memory_slot *new, u64 last_generation);
+   struct kvm_memory_slot *new);
 
 static void kvm_release_pfn_dirty(pfn_t pfn);
 static void mark_page_dirty_in_slot(struct kvm *kvm,
@@ -685,8 +685,7 @@ static void sort_memslots(struct kvm_memslots *slots)
 }
 
 static void update_memslots(struct kvm_memslots *slots,
-   struct kvm_memory_slot *new,
-   u64 last_generation)
+   struct kvm_memory_slot *new)
 {
if (new) {
int id = new-id;
@@ -697,8 +696,6 @@ static void update_memslots(struct kvm_memslots *slots,
if (new-npages != npages)
sort_memslots(slots);
}
-
-   slots-generation = last_generation + 1;
 }
 
 static int check_memory_region_flags(struct kvm_userspace_memory_region *mem)
@@ -720,10 +717,17 @@ static struct kvm_memslots *install_new_memslots(struct 
kvm *kvm,
 {
struct kvm_memslots *old_memslots = kvm-memslots;
 
-   update_memslots(slots, new, kvm-memslots-generation);
+   /* ensure generation number is always increased. */
+   slots-updated = true;
+   slots-generation = old_memslots-generation;
+   

Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-18 Thread David Matlack
On Mon, Aug 18, 2014 at 12:56 PM, Xiao Guangrong
xiaoguangrong.e...@gmail.com wrote:
 @@ -287,9 +293,15 @@ static bool set_mmio_spte(struct kvm *kvm, u64 *sptep, 
 gfn_t gfn,

  static bool check_mmio_spte(struct kvm *kvm, u64 spte)
  {
 +   struct kvm_memslots *slots = kvm_memslots(kvm);
 unsigned int kvm_gen, spte_gen;

 -   kvm_gen = kvm_current_mmio_generation(kvm);
 +   if (slots-updated)
 +   return false;
 +
 +   smp_rmb();
 +
 +   kvm_gen = __kvm_current_mmio_generation(slots);
 spte_gen = get_mmio_spte_generation(spte);


What does this fix? Case 2 can still happen. (Case 2 is unavoidable unless we
block during memslot updates, which I don't think we should :).

 trace_check_mmio_spte(spte, kvm_gen, spte_gen);
 diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
 index 4b6c01b..1d4e78f 100644
 --- a/virt/kvm/kvm_main.c
 +++ b/virt/kvm/kvm_main.c
 @@ -96,7 +96,7 @@ static void hardware_disable_all(void);

  static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
  static void update_memslots(struct kvm_memslots *slots,
 -   struct kvm_memory_slot *new, u64 last_generation);
 +   struct kvm_memory_slot *new);

  static void kvm_release_pfn_dirty(pfn_t pfn);
  static void mark_page_dirty_in_slot(struct kvm *kvm,
 @@ -685,8 +685,7 @@ static void sort_memslots(struct kvm_memslots *slots)
  }

  static void update_memslots(struct kvm_memslots *slots,
 -   struct kvm_memory_slot *new,
 -   u64 last_generation)
 +   struct kvm_memory_slot *new)
  {
 if (new) {
 int id = new-id;
 @@ -697,8 +696,6 @@ static void update_memslots(struct kvm_memslots *slots,
 if (new-npages != npages)
 sort_memslots(slots);
 }
 -
 -   slots-generation = last_generation + 1;
  }

  static int check_memory_region_flags(struct kvm_userspace_memory_region *mem)
 @@ -720,10 +717,17 @@ static struct kvm_memslots *install_new_memslots(struct 
 kvm *kvm,
  {
 struct kvm_memslots *old_memslots = kvm-memslots;

 -   update_memslots(slots, new, kvm-memslots-generation);
 +   /* ensure generation number is always increased. */
 +   slots-updated = true;
 +   slots-generation = old_memslots-generation;
 +   update_memslots(slots, new);
 rcu_assign_pointer(kvm-memslots, slots);
 synchronize_srcu_expedited(kvm-srcu);

 +   slots-generation++;
 +   smp_wmb();
 +   slots-updated = false;
 +
 kvm_arch_memslots_updated(kvm);

 return old_memslots;


This is effectively the same as the first approach.

I just realized how simple Paolo's idea is. I think it can be a one line
patch (without comments):

[...]
update_memslots(slots, new, kvm-memslots-generation);
rcu_assign_pointer(kvm-memslots, slots);
synchronize_srcu_expedited(kvm-srcu);
+   slots-generation++;

kvm_arch_memslots_updated(kvm);
[...]
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-18 Thread Paolo Bonzini
Il 18/08/2014 23:15, David Matlack ha scritto:
 I just realized how simple Paolo's idea is. I think it can be a one line
 patch (without comments):
 
 [...]
 update_memslots(slots, new, kvm-memslots-generation);
 rcu_assign_pointer(kvm-memslots, slots);
 synchronize_srcu_expedited(kvm-srcu);
 +   slots-generation++;
 
 kvm_arch_memslots_updated(kvm);
 [...]

Yeah, you're right.  I think at this point it makes sense to put all
generation handling in install_new_memslots, but with proper comments
the above can do as well.

Would you like to send it?  Patch 2 still applies on top.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-18 Thread David Matlack
On Mon, Aug 18, 2014 at 2:24 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 Il 18/08/2014 23:15, David Matlack ha scritto:
 I just realized how simple Paolo's idea is. I think it can be a one line
 patch (without comments):

 [...]
 update_memslots(slots, new, kvm-memslots-generation);
 rcu_assign_pointer(kvm-memslots, slots);
 synchronize_srcu_expedited(kvm-srcu);
 +   slots-generation++;

 kvm_arch_memslots_updated(kvm);
 [...]

 Yeah, you're right.  I think at this point it makes sense to put all
 generation handling in install_new_memslots, but with proper comments
 the above can do as well.

 Would you like to send it?  Patch 2 still applies on top.

Sure, I like doing everything in install_new_memslots() as well so I'll fix
that.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-18 Thread Xiao Guangrong
On 08/19/2014 05:15 AM, David Matlack wrote:
 On Mon, Aug 18, 2014 at 12:56 PM, Xiao Guangrong
 xiaoguangrong.e...@gmail.com wrote:
 @@ -287,9 +293,15 @@ static bool set_mmio_spte(struct kvm *kvm, u64 *sptep, 
 gfn_t gfn,

  static bool check_mmio_spte(struct kvm *kvm, u64 spte)
  {
 +   struct kvm_memslots *slots = kvm_memslots(kvm);
 unsigned int kvm_gen, spte_gen;

 -   kvm_gen = kvm_current_mmio_generation(kvm);
 +   if (slots-updated)
 +   return false;
 +
 +   smp_rmb();
 +
 +   kvm_gen = __kvm_current_mmio_generation(slots);
 spte_gen = get_mmio_spte_generation(spte);

 
 What does this fix? Case 2 can still happen. (Case 2 is unavoidable unless we
 block during memslot updates, which I don't think we should :).

This exactly fixes case 2, slots-updated just acts as the low bit
but avoid generation number wrap-around and trick handling of the number.
More details please see below.

 
 trace_check_mmio_spte(spte, kvm_gen, spte_gen);
 diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
 index 4b6c01b..1d4e78f 100644
 --- a/virt/kvm/kvm_main.c
 +++ b/virt/kvm/kvm_main.c
 @@ -96,7 +96,7 @@ static void hardware_disable_all(void);

  static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
  static void update_memslots(struct kvm_memslots *slots,
 -   struct kvm_memory_slot *new, u64 
 last_generation);
 +   struct kvm_memory_slot *new);

  static void kvm_release_pfn_dirty(pfn_t pfn);
  static void mark_page_dirty_in_slot(struct kvm *kvm,
 @@ -685,8 +685,7 @@ static void sort_memslots(struct kvm_memslots *slots)
  }

  static void update_memslots(struct kvm_memslots *slots,
 -   struct kvm_memory_slot *new,
 -   u64 last_generation)
 +   struct kvm_memory_slot *new)
  {
 if (new) {
 int id = new-id;
 @@ -697,8 +696,6 @@ static void update_memslots(struct kvm_memslots *slots,
 if (new-npages != npages)
 sort_memslots(slots);
 }
 -
 -   slots-generation = last_generation + 1;
  }

  static int check_memory_region_flags(struct kvm_userspace_memory_region 
 *mem)
 @@ -720,10 +717,17 @@ static struct kvm_memslots 
 *install_new_memslots(struct kvm *kvm,
  {
 struct kvm_memslots *old_memslots = kvm-memslots;

 -   update_memslots(slots, new, kvm-memslots-generation);
 +   /* ensure generation number is always increased. */
 +   slots-updated = true;
 +   slots-generation = old_memslots-generation;
 +   update_memslots(slots, new);
 rcu_assign_pointer(kvm-memslots, slots);
 synchronize_srcu_expedited(kvm-srcu);

 +   slots-generation++;
 +   smp_wmb();
 +   slots-updated = false;
 +
 kvm_arch_memslots_updated(kvm);

 return old_memslots;

 
 This is effectively the same as the first approach.
 
 I just realized how simple Paolo's idea is. I think it can be a one line
 patch (without comments):
 
 [...]
 update_memslots(slots, new, kvm-memslots-generation);
 rcu_assign_pointer(kvm-memslots, slots);
 synchronize_srcu_expedited(kvm-srcu);
 +   slots-generation++;
 
 kvm_arch_memslots_updated(kvm);
 [...]

Really? Unfortunately no. :)

See this scenario:

CPU 0  CPU 1
ioctl registering a new memslot which
contains GPA:
   page-fault handler:
 see it'a mmio access on GPA;

 assign the new memslots with generation number increased
 cache the generation-number into spte;
 fix the access and comeback to guest;
SRCU-sync
 page-fault again and check the spte is a valid 
mmio-spte(*)
generation-number++;
return to userspace;
 do mmio-emulation and inject mmio-exit;

!!! userspace receives a unexpected mmio-exit, that is case 2 i exactly
said in the last mail.


Note in the step *, my approach detects the invalid generation-number which
will invalidate the mmio spte properly .

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-18 Thread David Matlack
On Mon, Aug 18, 2014 at 8:50 PM, Xiao Guangrong
xiaoguangr...@linux.vnet.ibm.com wrote:
 On 08/19/2014 05:15 AM, David Matlack wrote:
 On Mon, Aug 18, 2014 at 12:56 PM, Xiao Guangrong
 xiaoguangrong.e...@gmail.com wrote:
 @@ -287,9 +293,15 @@ static bool set_mmio_spte(struct kvm *kvm, u64 *sptep, 
 gfn_t gfn,

  static bool check_mmio_spte(struct kvm *kvm, u64 spte)
  {
 +   struct kvm_memslots *slots = kvm_memslots(kvm);
 unsigned int kvm_gen, spte_gen;

 -   kvm_gen = kvm_current_mmio_generation(kvm);
 +   if (slots-updated)
 +   return false;
 +
 +   smp_rmb();
 +
 +   kvm_gen = __kvm_current_mmio_generation(slots);
 spte_gen = get_mmio_spte_generation(spte);


 What does this fix? Case 2 can still happen. (Case 2 is unavoidable unless we
 block during memslot updates, which I don't think we should :).

 This exactly fixes case 2, slots-updated just acts as the low bit
 but avoid generation number wrap-around and trick handling of the number.
 More details please see below.


 trace_check_mmio_spte(spte, kvm_gen, spte_gen);
 diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
 index 4b6c01b..1d4e78f 100644
 --- a/virt/kvm/kvm_main.c
 +++ b/virt/kvm/kvm_main.c
 @@ -96,7 +96,7 @@ static void hardware_disable_all(void);

  static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
  static void update_memslots(struct kvm_memslots *slots,
 -   struct kvm_memory_slot *new, u64 
 last_generation);
 +   struct kvm_memory_slot *new);

  static void kvm_release_pfn_dirty(pfn_t pfn);
  static void mark_page_dirty_in_slot(struct kvm *kvm,
 @@ -685,8 +685,7 @@ static void sort_memslots(struct kvm_memslots *slots)
  }

  static void update_memslots(struct kvm_memslots *slots,
 -   struct kvm_memory_slot *new,
 -   u64 last_generation)
 +   struct kvm_memory_slot *new)
  {
 if (new) {
 int id = new-id;
 @@ -697,8 +696,6 @@ static void update_memslots(struct kvm_memslots *slots,
 if (new-npages != npages)
 sort_memslots(slots);
 }
 -
 -   slots-generation = last_generation + 1;
  }

  static int check_memory_region_flags(struct kvm_userspace_memory_region 
 *mem)
 @@ -720,10 +717,17 @@ static struct kvm_memslots 
 *install_new_memslots(struct kvm *kvm,
  {
 struct kvm_memslots *old_memslots = kvm-memslots;

 -   update_memslots(slots, new, kvm-memslots-generation);
 +   /* ensure generation number is always increased. */
 +   slots-updated = true;
 +   slots-generation = old_memslots-generation;
 +   update_memslots(slots, new);
 rcu_assign_pointer(kvm-memslots, slots);
 synchronize_srcu_expedited(kvm-srcu);

 +   slots-generation++;
 +   smp_wmb();
 +   slots-updated = false;
 +
 kvm_arch_memslots_updated(kvm);

 return old_memslots;


 This is effectively the same as the first approach.

 I just realized how simple Paolo's idea is. I think it can be a one line
 patch (without comments):

 [...]
 update_memslots(slots, new, kvm-memslots-generation);
 rcu_assign_pointer(kvm-memslots, slots);
 synchronize_srcu_expedited(kvm-srcu);
 +   slots-generation++;

 kvm_arch_memslots_updated(kvm);
 [...]

 Really? Unfortunately no. :)

 See this scenario:

 CPU 0  CPU 1
 ioctl registering a new memslot which
 contains GPA:
page-fault handler:
  see it'a mmio access on GPA;

  assign the new memslots with generation number increased
  cache the generation-number into spte;
  fix the access and comeback to guest;
 SRCU-sync
  page-fault again and check the spte is a valid 
 mmio-spte(*)
 generation-number++;
 return to userspace;
  do mmio-emulation and inject mmio-exit;

 !!! userspace receives a unexpected mmio-exit, that is case 2 i exactly
 said in the last mail.


 Note in the step *, my approach detects the invalid generation-number which
 will invalidate the mmio spte properly .

Sorry I didn't explain myself very well: Since we can get a single wrong
mmio exit no matter what, it has to be handled in userspace. So my point
was, it doesn't really help to fix that one very specific way that it can
happen, because it can just happen in other ways. (E.g. update memslots
occurs after is_noslot_pfn() and before mmio exit).

But it looks like you basically said the same thing earlier, so I think
we're on the same page.

The single line patch I suggested was only intended to fix the forever
incorrectly exit mmio.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  

Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-18 Thread Xiao Guangrong
On 08/19/2014 12:31 PM, David Matlack wrote:
 On Mon, Aug 18, 2014 at 8:50 PM, Xiao Guangrong
 xiaoguangr...@linux.vnet.ibm.com wrote:
 On 08/19/2014 05:15 AM, David Matlack wrote:
 On Mon, Aug 18, 2014 at 12:56 PM, Xiao Guangrong
 xiaoguangrong.e...@gmail.com wrote:
 @@ -287,9 +293,15 @@ static bool set_mmio_spte(struct kvm *kvm, u64 
 *sptep, gfn_t gfn,

  static bool check_mmio_spte(struct kvm *kvm, u64 spte)
  {
 +   struct kvm_memslots *slots = kvm_memslots(kvm);
 unsigned int kvm_gen, spte_gen;

 -   kvm_gen = kvm_current_mmio_generation(kvm);
 +   if (slots-updated)
 +   return false;
 +
 +   smp_rmb();
 +
 +   kvm_gen = __kvm_current_mmio_generation(slots);
 spte_gen = get_mmio_spte_generation(spte);


 What does this fix? Case 2 can still happen. (Case 2 is unavoidable unless 
 we
 block during memslot updates, which I don't think we should :).

 This exactly fixes case 2, slots-updated just acts as the low bit
 but avoid generation number wrap-around and trick handling of the number.
 More details please see below.


 trace_check_mmio_spte(spte, kvm_gen, spte_gen);
 diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
 index 4b6c01b..1d4e78f 100644
 --- a/virt/kvm/kvm_main.c
 +++ b/virt/kvm/kvm_main.c
 @@ -96,7 +96,7 @@ static void hardware_disable_all(void);

  static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
  static void update_memslots(struct kvm_memslots *slots,
 -   struct kvm_memory_slot *new, u64 
 last_generation);
 +   struct kvm_memory_slot *new);

  static void kvm_release_pfn_dirty(pfn_t pfn);
  static void mark_page_dirty_in_slot(struct kvm *kvm,
 @@ -685,8 +685,7 @@ static void sort_memslots(struct kvm_memslots *slots)
  }

  static void update_memslots(struct kvm_memslots *slots,
 -   struct kvm_memory_slot *new,
 -   u64 last_generation)
 +   struct kvm_memory_slot *new)
  {
 if (new) {
 int id = new-id;
 @@ -697,8 +696,6 @@ static void update_memslots(struct kvm_memslots *slots,
 if (new-npages != npages)
 sort_memslots(slots);
 }
 -
 -   slots-generation = last_generation + 1;
  }

  static int check_memory_region_flags(struct kvm_userspace_memory_region 
 *mem)
 @@ -720,10 +717,17 @@ static struct kvm_memslots 
 *install_new_memslots(struct kvm *kvm,
  {
 struct kvm_memslots *old_memslots = kvm-memslots;

 -   update_memslots(slots, new, kvm-memslots-generation);
 +   /* ensure generation number is always increased. */
 +   slots-updated = true;
 +   slots-generation = old_memslots-generation;
 +   update_memslots(slots, new);
 rcu_assign_pointer(kvm-memslots, slots);
 synchronize_srcu_expedited(kvm-srcu);

 +   slots-generation++;
 +   smp_wmb();
 +   slots-updated = false;
 +
 kvm_arch_memslots_updated(kvm);

 return old_memslots;


 This is effectively the same as the first approach.

 I just realized how simple Paolo's idea is. I think it can be a one line
 patch (without comments):

 [...]
 update_memslots(slots, new, kvm-memslots-generation);
 rcu_assign_pointer(kvm-memslots, slots);
 synchronize_srcu_expedited(kvm-srcu);
 +   slots-generation++;

 kvm_arch_memslots_updated(kvm);
 [...]

 Really? Unfortunately no. :)

 See this scenario:

 CPU 0  CPU 1
 ioctl registering a new memslot which
 contains GPA:
page-fault handler:
  see it'a mmio access on GPA;

  assign the new memslots with generation number increased
  cache the generation-number into spte;
  fix the access and comeback to guest;
 SRCU-sync
  page-fault again and check the spte is a valid 
 mmio-spte(*)
 generation-number++;
 return to userspace;
  do mmio-emulation and inject mmio-exit;

 !!! userspace receives a unexpected mmio-exit, that is case 2 i exactly
 said in the last mail.


 Note in the step *, my approach detects the invalid generation-number which
 will invalidate the mmio spte properly .
 
 Sorry I didn't explain myself very well: Since we can get a single wrong
 mmio exit no matter what, it has to be handled in userspace. So my point
 was, it doesn't really help to fix that one very specific way that it can
 happen, because it can just happen in other ways. (E.g. update memslots
 occurs after is_noslot_pfn() and before mmio exit).
 
 But it looks like you basically said the same thing earlier, so I think
 we're on the same page.
 

Yes, that is what i try to explain in previous mails. :(

 The single line patch I suggested was only intended to fix the forever
 incorrectly exit mmio.

My patch also fixes this case and that does not 

Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-18 Thread David Matlack
On Mon, Aug 18, 2014 at 9:41 PM, Xiao Guangrong
xiaoguangr...@linux.vnet.ibm.com wrote:
 On 08/19/2014 12:31 PM, David Matlack wrote:
 But it looks like you basically said the same thing earlier, so I think
 we're on the same page.


 Yes, that is what i try to explain in previous mails. :(

I'm glad we understand each other now! Sorry again for my confusion.

 The single line patch I suggested was only intended to fix the forever
 incorrectly exit mmio.

 My patch also fixes this case and that does not doubly increase the
 number. I think this is the better one.

I prefer doubly increasing the generation for this reason: the updated boolean
requires extra code on the client-side to check if there's an update in
progress. And that makes it easy to get wrong. In fact, your patch
forgot to check the updated bit in mark_mmio_spte(). Doubly increasing the
generation requires no client-side code to work.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-18 Thread Xiao Guangrong
On 08/19/2014 01:00 PM, David Matlack wrote:
 On Mon, Aug 18, 2014 at 9:41 PM, Xiao Guangrong
 xiaoguangr...@linux.vnet.ibm.com wrote:
 On 08/19/2014 12:31 PM, David Matlack wrote:
 But it looks like you basically said the same thing earlier, so I think
 we're on the same page.


 Yes, that is what i try to explain in previous mails. :(
 
 I'm glad we understand each other now! Sorry again for my confusion.

Yup, me too. :)

 
 The single line patch I suggested was only intended to fix the forever
 incorrectly exit mmio.

 My patch also fixes this case and that does not doubly increase the
 number. I think this is the better one.
 
 I prefer doubly increasing the generation for this reason: the updated boolean
 requires extra code on the client-side to check if there's an update in
 progress. And that makes it easy to get wrong. In fact, your patch
 forgot to check the updated bit in mark_mmio_spte(). Doubly increasing the
 generation requires no client-side code to work.

No, the updated patch is used to fix case 2 which i draw the scenario in
the last mail. I mean the original patch in this patchset which just
increase the number after srcu-sync.

Then could you tell me that your approach can do but my original patch can not?

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-18 Thread David Matlack
On Mon, Aug 18, 2014 at 10:19 PM, Xiao Guangrong
xiaoguangr...@linux.vnet.ibm.com wrote:
 On 08/19/2014 01:00 PM, David Matlack wrote:
 On Mon, Aug 18, 2014 at 9:41 PM, Xiao Guangrong
 xiaoguangr...@linux.vnet.ibm.com wrote:
 On 08/19/2014 12:31 PM, David Matlack wrote:
 The single line patch I suggested was only intended to fix the forever
 incorrectly exit mmio.

 My patch also fixes this case and that does not doubly increase the
 number. I think this is the better one.

 I prefer doubly increasing the generation for this reason: the updated 
 boolean
 requires extra code on the client-side to check if there's an update in
 progress. And that makes it easy to get wrong. In fact, your patch
 forgot to check the updated bit in mark_mmio_spte(). Doubly increasing the
 generation requires no client-side code to work.

 No, the updated patch is used to fix case 2 which i draw the scenario in
 the last mail. I mean the original patch in this patchset which just
 increase the number after srcu-sync.

 Then could you tell me that your approach can do but my original patch can 
 not?

It avoids publishing new memslots with an old generation number attached to
them (even if it only lasts for a short period of time). Do you have a reason
why you don't want to doubly increase the generation?
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-18 Thread Xiao Guangrong
On 08/19/2014 01:40 PM, David Matlack wrote:
 On Mon, Aug 18, 2014 at 10:19 PM, Xiao Guangrong
 xiaoguangr...@linux.vnet.ibm.com wrote:
 On 08/19/2014 01:00 PM, David Matlack wrote:
 On Mon, Aug 18, 2014 at 9:41 PM, Xiao Guangrong
 xiaoguangr...@linux.vnet.ibm.com wrote:
 On 08/19/2014 12:31 PM, David Matlack wrote:
 The single line patch I suggested was only intended to fix the forever
 incorrectly exit mmio.

 My patch also fixes this case and that does not doubly increase the
 number. I think this is the better one.

 I prefer doubly increasing the generation for this reason: the updated 
 boolean
 requires extra code on the client-side to check if there's an update in
 progress. And that makes it easy to get wrong. In fact, your patch
 forgot to check the updated bit in mark_mmio_spte(). Doubly increasing the
 generation requires no client-side code to work.

 No, the updated patch is used to fix case 2 which i draw the scenario in
 the last mail. I mean the original patch in this patchset which just
 increase the number after srcu-sync.

 Then could you tell me that your approach can do but my original patch can 
 not?
 
 It avoids publishing new memslots with an old generation number attached to
 them (even if it only lasts for a short period of time). 

I can not see the problem if that happen, could you please draw the scenario?

 Do you have a reason
 why you don't want to doubly increase the generation?

That more easily causes the number wrap-around.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-14 Thread Xiao Guangrong

Sorry, the title is not clear enough.

This is the v2 which fixes the issue pointed out by David:
 the generation number actually decreases.

Please review.

On 08/14/2014 03:01 PM, Xiao Guangrong wrote:
 We may cache the current mmio generation number and stale memslot info
 into spte, like this scenario:
 
CPU 0  CPU 1
 page fault:add a new memslot
 read memslot and detecting its a mmio access
update memslots
update generation number
 read generation number
 cache the gpa and current gen number into spte
 
 So, if guest accesses the gpa later, it will generate a incorrect
 mmio exit
 
 This patch fixes it by updating the generation number after
 synchronize_srcu_expedited() that makes sure the generation
 number updated only if memslots update is finished
 
 Cc: sta...@vger.kernel.org
 Cc: David Matlack dmatl...@google.com
 Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
 ---
  virt/kvm/kvm_main.c | 12 ++--
  1 file changed, 6 insertions(+), 6 deletions(-)
 
 diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
 index 33712fb..bb40df3 100644
 --- a/virt/kvm/kvm_main.c
 +++ b/virt/kvm/kvm_main.c
 @@ -96,7 +96,7 @@ static void hardware_disable_all(void);
 
  static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
  static void update_memslots(struct kvm_memslots *slots,
 - struct kvm_memory_slot *new, u64 last_generation);
 + struct kvm_memory_slot *new);
 
  static void kvm_release_pfn_dirty(pfn_t pfn);
  static void mark_page_dirty_in_slot(struct kvm *kvm,
 @@ -687,8 +687,7 @@ static void sort_memslots(struct kvm_memslots *slots)
  }
 
  static void update_memslots(struct kvm_memslots *slots,
 - struct kvm_memory_slot *new,
 - u64 last_generation)
 + struct kvm_memory_slot *new)
  {
   if (new) {
   int id = new-id;
 @@ -699,8 +698,6 @@ static void update_memslots(struct kvm_memslots *slots,
   if (new-npages != npages)
   sort_memslots(slots);
   }
 -
 - slots-generation = last_generation + 1;
  }
 
  static int check_memory_region_flags(struct kvm_userspace_memory_region *mem)
 @@ -722,9 +719,12 @@ static struct kvm_memslots *install_new_memslots(struct 
 kvm *kvm,
  {
   struct kvm_memslots *old_memslots = kvm-memslots;
 
 - update_memslots(slots, new, kvm-memslots-generation);
 + /* ensure generation number is always increased. */
 + slots-generation = old_memslots-generation;
 + update_memslots(slots, new);
   rcu_assign_pointer(kvm-memslots, slots);
   synchronize_srcu_expedited(kvm-srcu);
 + slots-generation++;
 
   kvm_arch_memslots_updated(kvm);
 

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-13 Thread Xiao Guangrong
On 08/13/2014 05:18 AM, David Matlack wrote:
 On Mon, Aug 11, 2014 at 10:02 PM, Xiao Guangrong
 xiaoguangr...@linux.vnet.ibm.com wrote:
 @@ -722,9 +719,10 @@ static struct kvm_memslots *install_new_memslots(struct 
 kvm *kvm,
  {
 struct kvm_memslots *old_memslots = kvm-memslots;

 
 I think you want
 
   slots-generation = old_memslots-generation;
 
 here.
 
 On the KVM_MR_DELETE path, install_new_memslots is called twice so this
 patch introduces a short window of time where the generation number
 actually decreases.

Yes, indeed. Thank you for pointing it out, will update.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-12 Thread David Matlack
On Mon, Aug 11, 2014 at 10:02 PM, Xiao Guangrong
xiaoguangr...@linux.vnet.ibm.com wrote:
 @@ -722,9 +719,10 @@ static struct kvm_memslots *install_new_memslots(struct 
 kvm *kvm,
  {
 struct kvm_memslots *old_memslots = kvm-memslots;


I think you want

  slots-generation = old_memslots-generation;

here.

On the KVM_MR_DELETE path, install_new_memslots is called twice so this
patch introduces a short window of time where the generation number
actually decreases.

 -   update_memslots(slots, new, kvm-memslots-generation);
 +   update_memslots(slots, new);
 rcu_assign_pointer(kvm-memslots, slots);
 synchronize_srcu_expedited(kvm-srcu);
 +   slots-generation++;

 kvm_arch_memslots_updated(kvm);

 --
 1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html