Re: [PATCH] KVM: arm/arm64: Close VMID generation race

2018-04-16 Thread Marc Zyngier
On 16/04/18 11:05, Shannon Zhao wrote:
> 
> 
> On 2018/4/11 9:30, Shannon Zhao wrote:
>>
>> On 2018/4/10 23:37, Marc Zyngier wrote:

[...]

>>> I don't mind either way. If you can be bothered to write a proper commit
>>> log for this, I'll take it. What I'd really want is Shannon to indicate
>>> whether or not this solves the issue he was seeing.

>> I'll test Marc's patch. This will take about 3 days since it's not 100%
>> reproducible.
> Hi Marc,
> 
> I've run the test for about 4 days. The issue doesn't appear.
> So Tested-by: Shannon Zhao 

Thanks Shannon, much appreciated. I'll send the fix upstream towards the
end of the week.

Cheers,

M.
-- 
Jazz is not dead. It just smells funny...
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] KVM: arm/arm64: Close VMID generation race

2018-04-16 Thread Shannon Zhao


On 2018/4/11 9:30, Shannon Zhao wrote:
> 
> On 2018/4/10 23:37, Marc Zyngier wrote:
>> On 10/04/18 16:24, Mark Rutland wrote:
>>> On Tue, Apr 10, 2018 at 05:05:40PM +0200, Christoffer Dall wrote:
>>>> On Tue, Apr 10, 2018 at 11:51:19AM +0100, Mark Rutland wrote:
>>>>> I think we also need to update kvm->arch.vttbr before updating
>>>>> kvm->arch.vmid_gen, otherwise another CPU can come in, see that the
>>>>> vmid_gen is up-to-date, jump to hyp, and program a stale VTTBR (with the
>>>>> old VMID).
>>>>>
>>>>> With the smp_wmb() and update of kvm->arch.vmid_gen moved to the end of
>>>>> the critical section, I think that works, modulo using READ_ONCE() and
>>>>> WRITE_ONCE() to ensure single-copy-atomicity of the fields we access
>>>>> locklessly.
>>>>
>>>> Indeed, you're right.  I would look something like this, then:
>>>>
>>>> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
>>>> index 2e43f9d42bd5..6cb08995e7ff 100644
>>>> --- a/virt/kvm/arm/arm.c
>>>> +++ b/virt/kvm/arm/arm.c
>>>> @@ -450,7 +450,9 @@ void force_vm_exit(const cpumask_t *mask)
>>>>   */
>>>>  static bool need_new_vmid_gen(struct kvm *kvm)
>>>>  {
>>>> -   return unlikely(kvm->arch.vmid_gen != atomic64_read(&kvm_vmid_gen));
>>>> +   u64 current_vmid_gen = atomic64_read(&kvm_vmid_gen);
>>>> +   smp_rmb(); /* Orders read of kvm_vmid_gen and kvm->arch.vmid */
>>>> +   return unlikely(READ_ONCE(kvm->arch.vmid_gen) != current_vmid_gen);
>>>>  }
>>>>  
>>>>  /**
>>>> @@ -500,7 +502,6 @@ static void update_vttbr(struct kvm *kvm)
>>>> kvm_call_hyp(__kvm_flush_vm_context);
>>>> }
>>>>  
>>>> -   kvm->arch.vmid_gen = atomic64_read(&kvm_vmid_gen);
>>>> kvm->arch.vmid = kvm_next_vmid;
>>>> kvm_next_vmid++;
>>>> kvm_next_vmid &= (1 << kvm_vmid_bits) - 1;
>>>> @@ -509,7 +510,10 @@ static void update_vttbr(struct kvm *kvm)
>>>> pgd_phys = virt_to_phys(kvm->arch.pgd);
>>>> BUG_ON(pgd_phys & ~VTTBR_BADDR_MASK);
>>>> vmid = ((u64)(kvm->arch.vmid) << VTTBR_VMID_SHIFT) & VTTBR_VMID_MASK(kvm_vmid_bits);
>>>> -   kvm->arch.vttbr = pgd_phys | vmid;
>>>> +   WRITE_ONCE(kvm->arch.vttbr, pgd_phys | vmid);
>>>> +
>>>> +   smp_wmb(); /* Ensure vttbr update is observed before vmid_gen update */
>>>> +   kvm->arch.vmid_gen = atomic64_read(&kvm_vmid_gen);
>>>>  
>>>> spin_unlock(&kvm_vmid_lock);
>>>>  }
>>>
>>> I think that's right, yes.
>>>
>>> We could replace the smp_{r,w}mb() barriers with an acquire of the
>>> kvm_vmid_gen and a release of kvm->arch.vmid_gen, but if we're really
>>> trying to optimize things there are larger algorithmic changes necessary
>>> anyhow.
>>>
>>>> It's probably easier to convince ourselves about the correctness of
>>>> Marc's code using a rwlock instead, though.  Thoughts?
>>>
>>> I believe that Marc's preference was the rwlock; I have no preference
>>> either way.
>>
>> I don't mind either way. If you can be bothered to write a proper commit
>> log for this, I'll take it. What I'd really want is Shannon to indicate
>> whether or not this solves the issue he was seeing.
>>
> I'll test Marc's patch. This will take about 3 days since it's not 100%
> reproducible.
Hi Marc,

I've run the test for about 4 days. The issue doesn't appear.
So Tested-by: Shannon Zhao 

Thanks,
-- 
Shannon

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] KVM: arm/arm64: Close VMID generation race

2018-04-10 Thread Shannon Zhao


On 2018/4/10 23:37, Marc Zyngier wrote:
> On 10/04/18 16:24, Mark Rutland wrote:
>> On Tue, Apr 10, 2018 at 05:05:40PM +0200, Christoffer Dall wrote:
>>> On Tue, Apr 10, 2018 at 11:51:19AM +0100, Mark Rutland wrote:
>>>> I think we also need to update kvm->arch.vttbr before updating
>>>> kvm->arch.vmid_gen, otherwise another CPU can come in, see that the
>>>> vmid_gen is up-to-date, jump to hyp, and program a stale VTTBR (with the
>>>> old VMID).
>>>>
>>>> With the smp_wmb() and update of kvm->arch.vmid_gen moved to the end of
>>>> the critical section, I think that works, modulo using READ_ONCE() and
>>>> WRITE_ONCE() to ensure single-copy-atomicity of the fields we access
>>>> locklessly.
>>>
>>> Indeed, you're right.  I would look something like this, then:
>>>
>>> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
>>> index 2e43f9d42bd5..6cb08995e7ff 100644
>>> --- a/virt/kvm/arm/arm.c
>>> +++ b/virt/kvm/arm/arm.c
>>> @@ -450,7 +450,9 @@ void force_vm_exit(const cpumask_t *mask)
>>>   */
>>>  static bool need_new_vmid_gen(struct kvm *kvm)
>>>  {
>>> -   return unlikely(kvm->arch.vmid_gen != atomic64_read(&kvm_vmid_gen));
>>> +   u64 current_vmid_gen = atomic64_read(&kvm_vmid_gen);
>>> +   smp_rmb(); /* Orders read of kvm_vmid_gen and kvm->arch.vmid */
>>> +   return unlikely(READ_ONCE(kvm->arch.vmid_gen) != current_vmid_gen);
>>>  }
>>>  
>>>  /**
>>> @@ -500,7 +502,6 @@ static void update_vttbr(struct kvm *kvm)
>>> kvm_call_hyp(__kvm_flush_vm_context);
>>> }
>>>  
>>> -   kvm->arch.vmid_gen = atomic64_read(&kvm_vmid_gen);
>>> kvm->arch.vmid = kvm_next_vmid;
>>> kvm_next_vmid++;
>>> kvm_next_vmid &= (1 << kvm_vmid_bits) - 1;
>>> @@ -509,7 +510,10 @@ static void update_vttbr(struct kvm *kvm)
>>> pgd_phys = virt_to_phys(kvm->arch.pgd);
>>> BUG_ON(pgd_phys & ~VTTBR_BADDR_MASK);
>>> vmid = ((u64)(kvm->arch.vmid) << VTTBR_VMID_SHIFT) & VTTBR_VMID_MASK(kvm_vmid_bits);
>>> -   kvm->arch.vttbr = pgd_phys | vmid;
>>> +   WRITE_ONCE(kvm->arch.vttbr, pgd_phys | vmid);
>>> +
>>> +   smp_wmb(); /* Ensure vttbr update is observed before vmid_gen update */
>>> +   kvm->arch.vmid_gen = atomic64_read(&kvm_vmid_gen);
>>>  
>>> spin_unlock(&kvm_vmid_lock);
>>>  }
>>
>> I think that's right, yes.
>>
>> We could replace the smp_{r,w}mb() barriers with an acquire of the
>> kvm_vmid_gen and a release of kvm->arch.vmid_gen, but if we're really
>> trying to optimize things there are larger algorithmic changes necessary
>> anyhow.
>>
>>> It's probably easier to convince ourselves about the correctness of
>>> Marc's code using a rwlock instead, though.  Thoughts?
>>
>> I believe that Marc's preference was the rwlock; I have no preference
>> either way.
> 
> I don't mind either way. If you can be bothered to write a proper commit
> log for this, I'll take it. What I'd really want is Shannon to indicate
> whether or not this solves the issue he was seeing.
> 
I'll test Marc's patch. This will take about 3 days since it's not 100%
reproducible.

Thanks,
-- 
Shannon

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] KVM: arm/arm64: Close VMID generation race

2018-04-10 Thread Christoffer Dall
On Tue, Apr 10, 2018 at 04:37:12PM +0100, Marc Zyngier wrote:
> On 10/04/18 16:24, Mark Rutland wrote:
> > On Tue, Apr 10, 2018 at 05:05:40PM +0200, Christoffer Dall wrote:
> >> On Tue, Apr 10, 2018 at 11:51:19AM +0100, Mark Rutland wrote:
> >>> I think we also need to update kvm->arch.vttbr before updating
> >>> kvm->arch.vmid_gen, otherwise another CPU can come in, see that the
> >>> vmid_gen is up-to-date, jump to hyp, and program a stale VTTBR (with the
> >>> old VMID).
> >>>
> >>> With the smp_wmb() and update of kvm->arch.vmid_gen moved to the end of
> >>> the critical section, I think that works, modulo using READ_ONCE() and
> >>> WRITE_ONCE() to ensure single-copy-atomicity of the fields we access
> >>> locklessly.
> >>
> >> Indeed, you're right.  I would look something like this, then:
> >>
> >> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> >> index 2e43f9d42bd5..6cb08995e7ff 100644
> >> --- a/virt/kvm/arm/arm.c
> >> +++ b/virt/kvm/arm/arm.c
> >> @@ -450,7 +450,9 @@ void force_vm_exit(const cpumask_t *mask)
> >>   */
> >>  static bool need_new_vmid_gen(struct kvm *kvm)
> >>  {
> >> -  return unlikely(kvm->arch.vmid_gen != atomic64_read(&kvm_vmid_gen));
> >> +  u64 current_vmid_gen = atomic64_read(&kvm_vmid_gen);
> >> +  smp_rmb(); /* Orders read of kvm_vmid_gen and kvm->arch.vmid */
> >> +  return unlikely(READ_ONCE(kvm->arch.vmid_gen) != current_vmid_gen);
> >>  }
> >>  
> >>  /**
> >> @@ -500,7 +502,6 @@ static void update_vttbr(struct kvm *kvm)
> >>kvm_call_hyp(__kvm_flush_vm_context);
> >>}
> >>  
> >> -  kvm->arch.vmid_gen = atomic64_read(&kvm_vmid_gen);
> >>kvm->arch.vmid = kvm_next_vmid;
> >>kvm_next_vmid++;
> >>kvm_next_vmid &= (1 << kvm_vmid_bits) - 1;
> >> @@ -509,7 +510,10 @@ static void update_vttbr(struct kvm *kvm)
> >>pgd_phys = virt_to_phys(kvm->arch.pgd);
> >>BUG_ON(pgd_phys & ~VTTBR_BADDR_MASK);
> >> vmid = ((u64)(kvm->arch.vmid) << VTTBR_VMID_SHIFT) & VTTBR_VMID_MASK(kvm_vmid_bits);
> >> -  kvm->arch.vttbr = pgd_phys | vmid;
> >> +  WRITE_ONCE(kvm->arch.vttbr, pgd_phys | vmid);
> >> +
> >> +  smp_wmb(); /* Ensure vttbr update is observed before vmid_gen update */
> >> +  kvm->arch.vmid_gen = atomic64_read(&kvm_vmid_gen);
> >>  
> >> spin_unlock(&kvm_vmid_lock);
> >>  }
> > 
> > I think that's right, yes.
> > 
> > We could replace the smp_{r,w}mb() barriers with an acquire of the
> > kvm_vmid_gen and a release of kvm->arch.vmid_gen, but if we're really
> > trying to optimize things there are larger algorithmic changes necessary
> > anyhow.
> > 
> >> It's probably easier to convince ourselves about the correctness of
> >> Marc's code using a rwlock instead, though.  Thoughts?
> > 
> > I believe that Marc's preference was the rwlock; I have no preference
> > either way.
> 
> I don't mind either way. If you can be bothered to write a proper commit
> log for this, I'll take it. 

You've already done the work, and your patch is easier to read, so let's
just go ahead with that.

I was just curious to which degree my original implementation was
broken; was I trying to achieve something impossible or was I just
writing buggy code.  Seems the latter.  Oh well.

> What I'd really want is Shannon to indicate
> whether or not this solves the issue he was seeing.
> 

Agreed, would like to see that too.

Thanks (and sorry for being noisy),
-Christoffer
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] KVM: arm/arm64: Close VMID generation race

2018-04-10 Thread Marc Zyngier
On 10/04/18 16:24, Mark Rutland wrote:
> On Tue, Apr 10, 2018 at 05:05:40PM +0200, Christoffer Dall wrote:
>> On Tue, Apr 10, 2018 at 11:51:19AM +0100, Mark Rutland wrote:
>>> I think we also need to update kvm->arch.vttbr before updating
>>> kvm->arch.vmid_gen, otherwise another CPU can come in, see that the
>>> vmid_gen is up-to-date, jump to hyp, and program a stale VTTBR (with the
>>> old VMID).
>>>
>>> With the smp_wmb() and update of kvm->arch.vmid_gen moved to the end of
>>> the critical section, I think that works, modulo using READ_ONCE() and
>>> WRITE_ONCE() to ensure single-copy-atomicity of the fields we access
>>> locklessly.
>>
>> Indeed, you're right.  I would look something like this, then:
>>
>> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
>> index 2e43f9d42bd5..6cb08995e7ff 100644
>> --- a/virt/kvm/arm/arm.c
>> +++ b/virt/kvm/arm/arm.c
>> @@ -450,7 +450,9 @@ void force_vm_exit(const cpumask_t *mask)
>>   */
>>  static bool need_new_vmid_gen(struct kvm *kvm)
>>  {
>> -return unlikely(kvm->arch.vmid_gen != atomic64_read(&kvm_vmid_gen));
>> +u64 current_vmid_gen = atomic64_read(&kvm_vmid_gen);
>> +smp_rmb(); /* Orders read of kvm_vmid_gen and kvm->arch.vmid */
>> +return unlikely(READ_ONCE(kvm->arch.vmid_gen) != current_vmid_gen);
>>  }
>>  
>>  /**
>> @@ -500,7 +502,6 @@ static void update_vttbr(struct kvm *kvm)
>>  kvm_call_hyp(__kvm_flush_vm_context);
>>  }
>>  
>> -kvm->arch.vmid_gen = atomic64_read(&kvm_vmid_gen);
>>  kvm->arch.vmid = kvm_next_vmid;
>>  kvm_next_vmid++;
>>  kvm_next_vmid &= (1 << kvm_vmid_bits) - 1;
>> @@ -509,7 +510,10 @@ static void update_vttbr(struct kvm *kvm)
>>  pgd_phys = virt_to_phys(kvm->arch.pgd);
>>  BUG_ON(pgd_phys & ~VTTBR_BADDR_MASK);
>> vmid = ((u64)(kvm->arch.vmid) << VTTBR_VMID_SHIFT) & VTTBR_VMID_MASK(kvm_vmid_bits);
>> -kvm->arch.vttbr = pgd_phys | vmid;
>> +WRITE_ONCE(kvm->arch.vttbr, pgd_phys | vmid);
>> +
>> +smp_wmb(); /* Ensure vttbr update is observed before vmid_gen update */
>> +kvm->arch.vmid_gen = atomic64_read(&kvm_vmid_gen);
>>  
>>  spin_unlock(&kvm_vmid_lock);
>>  }
> 
> I think that's right, yes.
> 
> We could replace the smp_{r,w}mb() barriers with an acquire of the
> kvm_vmid_gen and a release of kvm->arch.vmid_gen, but if we're really
> trying to optimize things there are larger algorithmic changes necessary
> anyhow.
> 
>> It's probably easier to convince ourselves about the correctness of
>> Marc's code using a rwlock instead, though.  Thoughts?
> 
> I believe that Marc's preference was the rwlock; I have no preference
> either way.

I don't mind either way. If you can be bothered to write a proper commit
log for this, I'll take it. What I'd really want is Shannon to indicate
whether or not this solves the issue he was seeing.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] KVM: arm/arm64: Close VMID generation race

2018-04-10 Thread Christoffer Dall
On Tue, Apr 10, 2018 at 04:24:20PM +0100, Mark Rutland wrote:
> On Tue, Apr 10, 2018 at 05:05:40PM +0200, Christoffer Dall wrote:
> > On Tue, Apr 10, 2018 at 11:51:19AM +0100, Mark Rutland wrote:
> > > I think we also need to update kvm->arch.vttbr before updating
> > > kvm->arch.vmid_gen, otherwise another CPU can come in, see that the
> > > vmid_gen is up-to-date, jump to hyp, and program a stale VTTBR (with the
> > > old VMID).
> > > 
> > > With the smp_wmb() and update of kvm->arch.vmid_gen moved to the end of
> > > the critical section, I think that works, modulo using READ_ONCE() and
> > > WRITE_ONCE() to ensure single-copy-atomicity of the fields we access
> > > locklessly.
> > 
> > Indeed, you're right.  I would look something like this, then:
> > 
> > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> > index 2e43f9d42bd5..6cb08995e7ff 100644
> > --- a/virt/kvm/arm/arm.c
> > +++ b/virt/kvm/arm/arm.c
> > @@ -450,7 +450,9 @@ void force_vm_exit(const cpumask_t *mask)
> >   */
> >  static bool need_new_vmid_gen(struct kvm *kvm)
> >  {
> > -   return unlikely(kvm->arch.vmid_gen != atomic64_read(&kvm_vmid_gen));
> > +   u64 current_vmid_gen = atomic64_read(&kvm_vmid_gen);
> > +   smp_rmb(); /* Orders read of kvm_vmid_gen and kvm->arch.vmid */
> > +   return unlikely(READ_ONCE(kvm->arch.vmid_gen) != current_vmid_gen);
> >  }
> >  
> >  /**
> > @@ -500,7 +502,6 @@ static void update_vttbr(struct kvm *kvm)
> > kvm_call_hyp(__kvm_flush_vm_context);
> > }
> >  
> > -   kvm->arch.vmid_gen = atomic64_read(&kvm_vmid_gen);
> > kvm->arch.vmid = kvm_next_vmid;
> > kvm_next_vmid++;
> > kvm_next_vmid &= (1 << kvm_vmid_bits) - 1;
> > @@ -509,7 +510,10 @@ static void update_vttbr(struct kvm *kvm)
> > pgd_phys = virt_to_phys(kvm->arch.pgd);
> > BUG_ON(pgd_phys & ~VTTBR_BADDR_MASK);
> > vmid = ((u64)(kvm->arch.vmid) << VTTBR_VMID_SHIFT) & VTTBR_VMID_MASK(kvm_vmid_bits);
> > -   kvm->arch.vttbr = pgd_phys | vmid;
> > +   WRITE_ONCE(kvm->arch.vttbr, pgd_phys | vmid);
> > +
> > +   smp_wmb(); /* Ensure vttbr update is observed before vmid_gen update */
> > +   kvm->arch.vmid_gen = atomic64_read(&kvm_vmid_gen);
> >  
> > spin_unlock(&kvm_vmid_lock);
> >  }
> 
> I think that's right, yes.
> 
> We could replace the smp_{r,w}mb() barriers with an acquire of the
> kvm_vmid_gen and a release of kvm->arch.vmid_gen, but if we're really
> trying to optimize things there are larger algorithmic changes necessary
> anyhow.
> 
> > It's probably easier to convince ourselves about the correctness of
> > Marc's code using a rwlock instead, though.  Thoughts?
> 
> I believe that Marc's preference was the rwlock; I have no preference
> either way.
> 

I'm fine with both approaches as well, but it was educational for me to
see if this could be done in the lockless way as well.  Thanks for
having a look at that!

-Christoffer
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] KVM: arm/arm64: Close VMID generation race

2018-04-10 Thread Mark Rutland
On Tue, Apr 10, 2018 at 05:05:40PM +0200, Christoffer Dall wrote:
> On Tue, Apr 10, 2018 at 11:51:19AM +0100, Mark Rutland wrote:
> > I think we also need to update kvm->arch.vttbr before updating
> > kvm->arch.vmid_gen, otherwise another CPU can come in, see that the
> > vmid_gen is up-to-date, jump to hyp, and program a stale VTTBR (with the
> > old VMID).
> > 
> > With the smp_wmb() and update of kvm->arch.vmid_gen moved to the end of
> > the critical section, I think that works, modulo using READ_ONCE() and
> > WRITE_ONCE() to ensure single-copy-atomicity of the fields we access
> > locklessly.
> 
> Indeed, you're right.  I would look something like this, then:
> 
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index 2e43f9d42bd5..6cb08995e7ff 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -450,7 +450,9 @@ void force_vm_exit(const cpumask_t *mask)
>   */
>  static bool need_new_vmid_gen(struct kvm *kvm)
>  {
> - return unlikely(kvm->arch.vmid_gen != atomic64_read(&kvm_vmid_gen));
> + u64 current_vmid_gen = atomic64_read(&kvm_vmid_gen);
> + smp_rmb(); /* Orders read of kvm_vmid_gen and kvm->arch.vmid */
> + return unlikely(READ_ONCE(kvm->arch.vmid_gen) != current_vmid_gen);
>  }
>  
>  /**
> @@ -500,7 +502,6 @@ static void update_vttbr(struct kvm *kvm)
>   kvm_call_hyp(__kvm_flush_vm_context);
>   }
>  
> - kvm->arch.vmid_gen = atomic64_read(&kvm_vmid_gen);
>   kvm->arch.vmid = kvm_next_vmid;
>   kvm_next_vmid++;
>   kvm_next_vmid &= (1 << kvm_vmid_bits) - 1;
> @@ -509,7 +510,10 @@ static void update_vttbr(struct kvm *kvm)
>   pgd_phys = virt_to_phys(kvm->arch.pgd);
>   BUG_ON(pgd_phys & ~VTTBR_BADDR_MASK);
> vmid = ((u64)(kvm->arch.vmid) << VTTBR_VMID_SHIFT) & VTTBR_VMID_MASK(kvm_vmid_bits);
> - kvm->arch.vttbr = pgd_phys | vmid;
> + WRITE_ONCE(kvm->arch.vttbr, pgd_phys | vmid);
> +
> + smp_wmb(); /* Ensure vttbr update is observed before vmid_gen update */
> + kvm->arch.vmid_gen = atomic64_read(&kvm_vmid_gen);
>  
>   spin_unlock(&kvm_vmid_lock);
>  }

I think that's right, yes.

We could replace the smp_{r,w}mb() barriers with an acquire of the
kvm_vmid_gen and a release of kvm->arch.vmid_gen, but if we're really
trying to optimize things there are larger algorithmic changes necessary
anyhow.
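To illustrate (an editorial sketch only, not code posted in this thread), the
acquire/release variant would fold the barriers into the accesses themselves;
it assumes the stock atomic64_read_acquire()/smp_store_release() helpers and
keeps the field names used above:

static bool need_new_vmid_gen(struct kvm *kvm)
{
	/* Acquire: order this read before the later kvm->arch.* reads */
	u64 current_vmid_gen = atomic64_read_acquire(&kvm_vmid_gen);

	return unlikely(READ_ONCE(kvm->arch.vmid_gen) != current_vmid_gen);
}

and, at the tail of update_vttbr()'s critical section:

	WRITE_ONCE(kvm->arch.vttbr, pgd_phys | vmid);
	/* Release: make the vttbr write visible before the new generation */
	smp_store_release(&kvm->arch.vmid_gen, atomic64_read(&kvm_vmid_gen));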

> It's probably easier to convince ourselves about the correctness of
> Marc's code using a rwlock instead, though.  Thoughts?

I believe that Marc's preference was the rwlock; I have no preference
either way.

Thanks,
Mark.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] KVM: arm/arm64: Close VMID generation race

2018-04-10 Thread Christoffer Dall
On Tue, Apr 10, 2018 at 11:51:19AM +0100, Mark Rutland wrote:
> On Mon, Apr 09, 2018 at 10:51:39PM +0200, Christoffer Dall wrote:
> > On Mon, Apr 09, 2018 at 06:07:06PM +0100, Marc Zyngier wrote:
> > > Before entering the guest, we check whether our VMID is still
> > > part of the current generation. In order to avoid taking a lock,
> > > we start with checking that the generation is still current, and
> > > only if not current do we take the lock, recheck, and update the
> > > generation and VMID.
> > > 
> > > This leaves open a small race: A vcpu can bump up the global
> > > generation number as well as the VM's, but has not updated
> > > the VMID itself yet.
> > > 
> > > At that point another vcpu from the same VM comes in, checks
> > > the generation (and finds it not needing anything), and jumps
> > > into the guest. At this point, we end-up with two vcpus belonging
> > > to the same VM running with two different VMIDs. Eventually, the
> > > VMID used by the second vcpu will get reassigned, and things will
> > > really go wrong...
> > > 
> > > A simple solution would be to drop this initial check, and always take
> > > the lock. This is likely to cause performance issues. A middle ground
> > > is to convert the spinlock to a rwlock, and only take the read lock
> > > on the fast path. If the check fails at that point, drop it and
> > > acquire the write lock, rechecking the condition.
> > > 
> > > This ensures that the above scenario doesn't occur.
> > > 
> > > Reported-by: Mark Rutland 
> > > Signed-off-by: Marc Zyngier 
> > > ---
> > > I haven't seen any reply from Shannon, so reposting this to
> > > a slightly wider audience for feedback.
> > > 
> > >  virt/kvm/arm/arm.c | 15 ++-
> > >  1 file changed, 10 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> > > index dba629c5f8ac..a4c1b76240df 100644
> > > --- a/virt/kvm/arm/arm.c
> > > +++ b/virt/kvm/arm/arm.c
> > > @@ -63,7 +63,7 @@ static DEFINE_PER_CPU(struct kvm_vcpu *, kvm_arm_running_vcpu);
> > >  static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
> > >  static u32 kvm_next_vmid;
> > >  static unsigned int kvm_vmid_bits __read_mostly;
> > > -static DEFINE_SPINLOCK(kvm_vmid_lock);
> > > +static DEFINE_RWLOCK(kvm_vmid_lock);
> > >  
> > >  static bool vgic_present;
> > >  
> > > @@ -473,11 +473,16 @@ static void update_vttbr(struct kvm *kvm)
> > >  {
> > >   phys_addr_t pgd_phys;
> > >   u64 vmid;
> > > + bool new_gen;
> > >  
> > > - if (!need_new_vmid_gen(kvm))
> > > + read_lock(&kvm_vmid_lock);
> > > + new_gen = need_new_vmid_gen(kvm);
> > > + read_unlock(&kvm_vmid_lock);
> > > +
> > > + if (!new_gen)
> > >   return;
> > >  
> > > - spin_lock(&kvm_vmid_lock);
> > > + write_lock(&kvm_vmid_lock);
> > >  
> > >   /*
> > >* We need to re-check the vmid_gen here to ensure that if another vcpu
> > > @@ -485,7 +490,7 @@ static void update_vttbr(struct kvm *kvm)
> > >* use the same vmid.
> > >*/
> > >   if (!need_new_vmid_gen(kvm)) {
> > > - spin_unlock(&kvm_vmid_lock);
> > > + write_unlock(&kvm_vmid_lock);
> > >   return;
> > >   }
> > >  
> > > @@ -519,7 +524,7 @@ static void update_vttbr(struct kvm *kvm)
> > >   vmid = ((u64)(kvm->arch.vmid) << VTTBR_VMID_SHIFT) & VTTBR_VMID_MASK(kvm_vmid_bits);
> > >   kvm->arch.vttbr = kvm_phys_to_vttbr(pgd_phys) | vmid;
> > >  
> > > - spin_unlock(&kvm_vmid_lock);
> > > + write_unlock(&kvm_vmid_lock);
> > >  }
> > >  
> > >  static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
> > > -- 
> > > 2.14.2
> > > 
> > 
> > The above looks correct to me.  I am wondering if something like the
> > following would also work, which may be slightly more efficient,
> > although I doubt the difference can be measured:
> > 

[...]

> 
> I think we also need to update kvm->arch.vttbr before updating
> kvm->arch.vmid_gen, otherwise another CPU can come in, see that the
> vmid_gen is up-to-date, jump to hyp, and program a stale VTTBR (with the
> old VMID).
> 
> With the smp_wmb() and update of kvm->arch.vmid_gen moved to the end of
> the critical section, I think that works, modulo using READ_ONCE() and
> WRITE_ONCE() to ensure single-copy-atomicity of the fields we access
> locklessly.

Indeed, you're right.  I would look something like this, then:

diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 2e43f9d42bd5..6cb08995e7ff 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -450,7 +450,9 @@ void force_vm_exit(const cpumask_t *mask)
  */
 static bool need_new_vmid_gen(struct kvm *kvm)
 {
-   return unlikely(kvm->arch.vmid_gen != atomic64_read(&kvm_vmid_gen));
+   u64 current_vmid_gen = atomic64_read(&kvm_vmid_gen);
+   smp_rmb(); /* Orders read of kvm_vmid_gen and kvm->arch.vmid */
+   return unlikely(READ_ONCE(kvm->arch.vmid_gen) != current_vmid_gen);
 }
 
 /**
@@ -500,7 +502,6 @@ static void update_vttbr(struct kvm *kvm)

Re: [PATCH] KVM: arm/arm64: Close VMID generation race

2018-04-10 Thread Mark Rutland
On Mon, Apr 09, 2018 at 10:51:39PM +0200, Christoffer Dall wrote:
> On Mon, Apr 09, 2018 at 06:07:06PM +0100, Marc Zyngier wrote:
> > Before entering the guest, we check whether our VMID is still
> > part of the current generation. In order to avoid taking a lock,
> > we start with checking that the generation is still current, and
> > only if not current do we take the lock, recheck, and update the
> > generation and VMID.
> > 
> > This leaves open a small race: A vcpu can bump up the global
> > generation number as well as the VM's, but has not updated
> > the VMID itself yet.
> > 
> > At that point another vcpu from the same VM comes in, checks
> > the generation (and finds it not needing anything), and jumps
> > into the guest. At this point, we end-up with two vcpus belonging
> > to the same VM running with two different VMIDs. Eventually, the
> > VMID used by the second vcpu will get reassigned, and things will
> > really go wrong...
> > 
> > A simple solution would be to drop this initial check, and always take
> > the lock. This is likely to cause performance issues. A middle ground
> > is to convert the spinlock to a rwlock, and only take the read lock
> > on the fast path. If the check fails at that point, drop it and
> > acquire the write lock, rechecking the condition.
> > 
> > This ensures that the above scenario doesn't occur.
> > 
> > Reported-by: Mark Rutland 
> > Signed-off-by: Marc Zyngier 
> > ---
> > I haven't seen any reply from Shannon, so reposting this to
> > a slightly wider audience for feedback.
> > 
> >  virt/kvm/arm/arm.c | 15 ++-
> >  1 file changed, 10 insertions(+), 5 deletions(-)
> > 
> > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> > index dba629c5f8ac..a4c1b76240df 100644
> > --- a/virt/kvm/arm/arm.c
> > +++ b/virt/kvm/arm/arm.c
> > @@ -63,7 +63,7 @@ static DEFINE_PER_CPU(struct kvm_vcpu *, kvm_arm_running_vcpu);
> >  static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
> >  static u32 kvm_next_vmid;
> >  static unsigned int kvm_vmid_bits __read_mostly;
> > -static DEFINE_SPINLOCK(kvm_vmid_lock);
> > +static DEFINE_RWLOCK(kvm_vmid_lock);
> >  
> >  static bool vgic_present;
> >  
> > @@ -473,11 +473,16 @@ static void update_vttbr(struct kvm *kvm)
> >  {
> > phys_addr_t pgd_phys;
> > u64 vmid;
> > +   bool new_gen;
> >  
> > -   if (!need_new_vmid_gen(kvm))
> > +   read_lock(&kvm_vmid_lock);
> > +   new_gen = need_new_vmid_gen(kvm);
> > +   read_unlock(&kvm_vmid_lock);
> > +
> > +   if (!new_gen)
> > return;
> >  
> > -   spin_lock(&kvm_vmid_lock);
> > +   write_lock(&kvm_vmid_lock);
> >  
> > /*
> >  * We need to re-check the vmid_gen here to ensure that if another vcpu
> > @@ -485,7 +490,7 @@ static void update_vttbr(struct kvm *kvm)
> >  * use the same vmid.
> >  */
> > if (!need_new_vmid_gen(kvm)) {
> > -   spin_unlock(&kvm_vmid_lock);
> > +   write_unlock(&kvm_vmid_lock);
> > return;
> > }
> >  
> > @@ -519,7 +524,7 @@ static void update_vttbr(struct kvm *kvm)
> > vmid = ((u64)(kvm->arch.vmid) << VTTBR_VMID_SHIFT) & VTTBR_VMID_MASK(kvm_vmid_bits);
> > kvm->arch.vttbr = kvm_phys_to_vttbr(pgd_phys) | vmid;
> >  
> > -   spin_unlock(&kvm_vmid_lock);
> > +   write_unlock(&kvm_vmid_lock);
> >  }
> >  
> >  static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
> > -- 
> > 2.14.2
> > 
> 
> The above looks correct to me.  I am wondering if something like the
> following would also work, which may be slightly more efficient,
> although I doubt the difference can be measured:
> 
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index dba629c5f8ac..7ac869bcad21 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -458,7 +458,9 @@ void force_vm_exit(const cpumask_t *mask)
>   */
>  static bool need_new_vmid_gen(struct kvm *kvm)
>  {
> - return unlikely(kvm->arch.vmid_gen != atomic64_read(&kvm_vmid_gen));
> + u64 current_vmid_gen = atomic64_read(&kvm_vmid_gen);
> + smp_rmb(); /* Orders read of kvm_vmid_gen and kvm->arch.vmid */
> + return unlikely(kvm->arch.vmid_gen != current_vmid_gen);
>  }
>  
>  /**
> @@ -508,10 +510,11 @@ static void update_vttbr(struct kvm *kvm)
>   kvm_call_hyp(__kvm_flush_vm_context);
>   }
>  
> - kvm->arch.vmid_gen = atomic64_read(&kvm_vmid_gen);
>   kvm->arch.vmid = kvm_next_vmid;
>   kvm_next_vmid++;
>   kvm_next_vmid &= (1 << kvm_vmid_bits) - 1;
> + smp_wmb();
> + kvm->arch.vmid_gen = atomic64_read(&kvm_vmid_gen);
>  
>   /* update vttbr to be used with the new vmid */
>   pgd_phys = virt_to_phys(kvm->arch.pgd);
> 

I think we also need to update kvm->arch.vttbr before updating
kvm->arch.vmid_gen, otherwise another CPU can come in, see that the
vmid_gen is up-to-date, jump to hyp, and program a stale VTTBR (with the
old VMID).

With the smp_wmb() and update of kvm->arch.vmid_gen moved to the end of
the critical section, I think that works, modulo using READ_ONCE() and
WRITE_ONCE() to ensure single-copy-atomicity of the fields we access
locklessly.
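Concretely (editorial sketch only; Christoffer's follow-up diff in this thread
is the actual proposal), the tail of update_vttbr()'s critical section would
then end up looking something like:

	/* Publish the new VTTBR first, with single-copy atomicity */
	WRITE_ONCE(kvm->arch.vttbr, pgd_phys | vmid);

	/* Order the vttbr write before exposing the new generation */
	smp_wmb();
	kvm->arch.vmid_gen = atomic64_read(&kvm_vmid_gen);

	spin_unlock(&kvm_vmid_lock);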

Re: [PATCH] KVM: arm/arm64: Close VMID generation race

2018-04-09 Thread Christoffer Dall
On Mon, Apr 09, 2018 at 06:07:06PM +0100, Marc Zyngier wrote:
> Before entering the guest, we check whether our VMID is still
> part of the current generation. In order to avoid taking a lock,
> we start with checking that the generation is still current, and
> only if not current do we take the lock, recheck, and update the
> generation and VMID.
> 
> This leaves open a small race: A vcpu can bump up the global
> generation number as well as the VM's, but has not updated
> the VMID itself yet.
> 
> At that point another vcpu from the same VM comes in, checks
> the generation (and finds it not needing anything), and jumps
> into the guest. At this point, we end-up with two vcpus belonging
> to the same VM running with two different VMIDs. Eventually, the
> VMID used by the second vcpu will get reassigned, and things will
> really go wrong...
> 
> A simple solution would be to drop this initial check, and always take
> the lock. This is likely to cause performance issues. A middle ground
> is to convert the spinlock to a rwlock, and only take the read lock
> on the fast path. If the check fails at that point, drop it and
> acquire the write lock, rechecking the condition.
> 
> This ensures that the above scenario doesn't occur.
> 
> Reported-by: Mark Rutland 
> Signed-off-by: Marc Zyngier 
> ---
> I haven't seen any reply from Shannon, so reposting this to
> a slightly wider audience for feedback.
> 
>  virt/kvm/arm/arm.c | 15 ++-
>  1 file changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index dba629c5f8ac..a4c1b76240df 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -63,7 +63,7 @@ static DEFINE_PER_CPU(struct kvm_vcpu *, kvm_arm_running_vcpu);
>  static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
>  static u32 kvm_next_vmid;
>  static unsigned int kvm_vmid_bits __read_mostly;
> -static DEFINE_SPINLOCK(kvm_vmid_lock);
> +static DEFINE_RWLOCK(kvm_vmid_lock);
>  
>  static bool vgic_present;
>  
> @@ -473,11 +473,16 @@ static void update_vttbr(struct kvm *kvm)
>  {
>   phys_addr_t pgd_phys;
>   u64 vmid;
> + bool new_gen;
>  
> - if (!need_new_vmid_gen(kvm))
> + read_lock(&kvm_vmid_lock);
> + new_gen = need_new_vmid_gen(kvm);
> + read_unlock(&kvm_vmid_lock);
> +
> + if (!new_gen)
>   return;
>  
> - spin_lock(&kvm_vmid_lock);
> + write_lock(&kvm_vmid_lock);
>  
>   /*
>* We need to re-check the vmid_gen here to ensure that if another vcpu
> @@ -485,7 +490,7 @@ static void update_vttbr(struct kvm *kvm)
>* use the same vmid.
>*/
>   if (!need_new_vmid_gen(kvm)) {
> - spin_unlock(&kvm_vmid_lock);
> + write_unlock(&kvm_vmid_lock);
>   return;
>   }
>  
> @@ -519,7 +524,7 @@ static void update_vttbr(struct kvm *kvm)
> vmid = ((u64)(kvm->arch.vmid) << VTTBR_VMID_SHIFT) & VTTBR_VMID_MASK(kvm_vmid_bits);
>   kvm->arch.vttbr = kvm_phys_to_vttbr(pgd_phys) | vmid;
>  
> - spin_unlock(&kvm_vmid_lock);
> + write_unlock(&kvm_vmid_lock);
>  }
>  
>  static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
> -- 
> 2.14.2
> 

The above looks correct to me.  I am wondering if something like the
following would also work, which may be slightly more efficient,
although I doubt the difference can be measured:

diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index dba629c5f8ac..7ac869bcad21 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -458,7 +458,9 @@ void force_vm_exit(const cpumask_t *mask)
  */
 static bool need_new_vmid_gen(struct kvm *kvm)
 {
-   return unlikely(kvm->arch.vmid_gen != atomic64_read(&kvm_vmid_gen));
+   u64 current_vmid_gen = atomic64_read(&kvm_vmid_gen);
+   smp_rmb(); /* Orders read of kvm_vmid_gen and kvm->arch.vmid */
+   return unlikely(kvm->arch.vmid_gen != current_vmid_gen);
 }
 
 /**
@@ -508,10 +510,11 @@ static void update_vttbr(struct kvm *kvm)
kvm_call_hyp(__kvm_flush_vm_context);
}
 
-   kvm->arch.vmid_gen = atomic64_read(&kvm_vmid_gen);
kvm->arch.vmid = kvm_next_vmid;
kvm_next_vmid++;
kvm_next_vmid &= (1 << kvm_vmid_bits) - 1;
+   smp_wmb();
+   kvm->arch.vmid_gen = atomic64_read(&kvm_vmid_gen);
 
/* update vttbr to be used with the new vmid */
pgd_phys = virt_to_phys(kvm->arch.pgd);


Thanks,
-Christoffer
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH] KVM: arm/arm64: Close VMID generation race

2018-04-09 Thread Marc Zyngier
Before entering the guest, we check whether our VMID is still
part of the current generation. In order to avoid taking a lock,
we start with checking that the generation is still current, and
only if not current do we take the lock, recheck, and update the
generation and VMID.

This leaves open a small race: A vcpu can bump up the global
generation number as well as the VM's, but has not updated
the VMID itself yet.

At that point another vcpu from the same VM comes in, checks
the generation (and finds it not needing anything), and jumps
into the guest. At this point, we end-up with two vcpus belonging
to the same VM running with two different VMIDs. Eventually, the
VMID used by the second vcpu will get reassigned, and things will
really go wrong...

A simple solution would be to drop this initial check, and always take
the lock. This is likely to cause performance issues. A middle ground
is to convert the spinlock to a rwlock, and only take the read lock
on the fast path. If the check fails at that point, drop it and
acquire the write lock, rechecking the condition.

This ensures that the above scenario doesn't occur.

Reported-by: Mark Rutland 
Signed-off-by: Marc Zyngier 
---
I haven't seen any reply from Shannon, so reposting this to
a slightly wider audience for feedback.

 virt/kvm/arm/arm.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index dba629c5f8ac..a4c1b76240df 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -63,7 +63,7 @@ static DEFINE_PER_CPU(struct kvm_vcpu *, kvm_arm_running_vcpu);
 static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
 static u32 kvm_next_vmid;
 static unsigned int kvm_vmid_bits __read_mostly;
-static DEFINE_SPINLOCK(kvm_vmid_lock);
+static DEFINE_RWLOCK(kvm_vmid_lock);
 
 static bool vgic_present;
 
@@ -473,11 +473,16 @@ static void update_vttbr(struct kvm *kvm)
 {
phys_addr_t pgd_phys;
u64 vmid;
+   bool new_gen;
 
-   if (!need_new_vmid_gen(kvm))
+   read_lock(&kvm_vmid_lock);
+   new_gen = need_new_vmid_gen(kvm);
+   read_unlock(&kvm_vmid_lock);
+
+   if (!new_gen)
return;
 
-   spin_lock(&kvm_vmid_lock);
+   write_lock(&kvm_vmid_lock);
 
/*
 * We need to re-check the vmid_gen here to ensure that if another vcpu
@@ -485,7 +490,7 @@ static void update_vttbr(struct kvm *kvm)
 * use the same vmid.
 */
if (!need_new_vmid_gen(kvm)) {
-   spin_unlock(&kvm_vmid_lock);
+   write_unlock(&kvm_vmid_lock);
return;
}
 
@@ -519,7 +524,7 @@ static void update_vttbr(struct kvm *kvm)
vmid = ((u64)(kvm->arch.vmid) << VTTBR_VMID_SHIFT) & VTTBR_VMID_MASK(kvm_vmid_bits);
kvm->arch.vttbr = kvm_phys_to_vttbr(pgd_phys) | vmid;
 
-   spin_unlock(&kvm_vmid_lock);
+   write_unlock(&kvm_vmid_lock);
 }
 
 static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
-- 
2.14.2

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm