Re: [PATCH] xen: xen-pciback: Reset MSI-X state when exposing a device

2019-09-13 Thread Chao Gao
On Fri, Sep 13, 2019 at 10:02:24AM +, Spassov, Stanislav wrote:
>On Thu, Dec 13, 2018 at 07:54, Chao Gao wrote:
>>On Thu, Dec 13, 2018 at 12:54:52AM -0700, Jan Beulich wrote:
>>>>>> On 13.12.18 at 04:46,  wrote:
>>>> On Wed, Dec 12, 2018 at 08:21:39AM -0700, Jan Beulich wrote:
>>>>>>>> On 12.12.18 at 16:18,  wrote:
>>>>>> On Wed, Dec 12, 2018 at 01:51:01AM -0700, Jan Beulich wrote:
>>>>>>>>>> On 12.12.18 at 08:06,  wrote:
>>>>>>>> On Wed, Dec 05, 2018 at 09:01:33AM -0500, Boris Ostrovsky wrote:
>>>>>>>>>On 12/5/18 4:32 AM, Roger Pau Monné wrote:
>>>>>>>>>> On Wed, Dec 05, 2018 at 10:19:17AM +0800, Chao Gao wrote:
>>>>>>>>>>> I find some pass-thru devices don't work any more across guest reboot.
>>>>>>>>>>> Assigning it to another guest also meets the same issue. And the only
>>>>>>>>>>> way to make it work again is un-binding and binding it to pciback.
>>>>>>>>>>> Someone reported this issue one year ago [1]. More detail also can be
>>>>>>>>>>> found in [2].
>>>>>>>>>>>
>>>>>>>>>>> The root-cause is Xen's internal MSI-X state isn't reset properly
>>>>>>>>>>> during reboot or re-assignment. In the above case, Xen set maskall bit
>>>>>>>>>>> to mask all MSI interrupts after it detected a potential security
>>>>>>>>>>> issue. Even after device reset, Xen didn't reset its internal maskall
>>>>>>>>>>> bit. As a result, maskall bit would be set again in next write to
>>>>>>>>>>> MSI-X message control register.
>>>>>>>>>>>
>>>>>>>>>>> Given that PHYSDEVOPS_prepare_msix() also triggers Xen resetting MSI-X
>>>>>>>>>>> internal state of a device, we employ it to fix this issue rather than
>>>>>>>>>>> introducing another dedicated sub-hypercall.
>>>>>>>>>>>
>>>>>>>>>>> Note that PHYSDEVOPS_release_msix() will fail if the mapping between
>>>>>>>>>>> the device's msix and pirq has been created. This limitation prevents
>>>>>>>>>>> us calling this function when detaching a device from a guest during
>>>>>>>>>>> guest shutdown. Thus it is called right before calling
>>>>>>>>>>> PHYSDEVOPS_prepare_msix().
>>>>>>>>>> s/PHYSDEVOPS/PHYSDEVOP/ (no final S). And then I would also drop the
>>>>>>>>>> () at the end of the hypercall name since it's not a function.
>>>>>>>>>>
>>>>>>>>>> I'm also wondering why the release can't be done when the device is
>>>>>>>>>> detached from the guest (or the guest has been shut down). This makes
>>>>>>>>>> me worry about the raciness of the attach/detach procedure: if there's
>>>>>>>>>> a state where pciback assumes the device has been detached from the
>>>>>>>>>> guest, but there are still pirqs bound, an attempt to attach to
>>>>>>>>>> another guest in such state will fail.
>>>>>>>>>
>>>>>>>>>I wonder whether this additional reset functionality could be done out
>>>>>>>>>of xen_pcibk_xenbus_remove(). We first do a (best effort) device reset
>>>>>>>>>and then do the extra things that are not properly done there.
>>>>>>>> 
>>>>>>>> No. It cannot be done in xen_pcibk_xenbus_remove() without modifying
>>>>>>>> the handler of PHYSDEVOP_release_msix. To do a successful Xen internal
>>>>>>>> MSI-X state reset, PHYSDEVOP_{release, prepare}_msix should be finished
>>>>>>>> without error. But ATM, xen expe

Re: [RESEND PATCH V8 05/11] KVM: Create architecture independent ROE skeleton

2019-01-21 Thread Chao Gao
On Mon, Jan 21, 2019 at 01:39:34AM +0200, Ahmed Abd El Mawgood wrote:
>This patch introduces a hypercall that can assist against subset of kernel
>rootkits, it works by place readonly protection in shadow PTE. The end
>result protection is also kept in a bitmap for each kvm_memory_slot and is
>used as reference when updating SPTEs. The whole goal is to protect the
>guest kernel static data from modification if attacker is running from
>guest ring 0, for this reason there is no hypercall to revert effect of
>Memory ROE hypercall. This patch doesn't implement integrity check on guest
>TLB so obvious attack on the current implementation will involve guest
>virtual address -> guest physical address remapping, but there are plans to
>fix that.

Hello Ahmed,

I don't quite understand the attack. Do you mean that even if a guest
page is protected by ROE, an attacker can remap its virtual address to
another, unprotected guest page by editing the guest page table?

Thanks
Chao


Re: [PATCH] xen: xen-pciback: Reset MSI-X state when exposing a device

2018-12-13 Thread Chao Gao
On Thu, Dec 13, 2018 at 12:54:52AM -0700, Jan Beulich wrote:
>>>> On 13.12.18 at 04:46,  wrote:
>> On Wed, Dec 12, 2018 at 08:21:39AM -0700, Jan Beulich wrote:
>>>>>> On 12.12.18 at 16:18,  wrote:
>>>> On Wed, Dec 12, 2018 at 01:51:01AM -0700, Jan Beulich wrote:
>>>>>>>> On 12.12.18 at 08:06,  wrote:
>>>>>> On Wed, Dec 05, 2018 at 09:01:33AM -0500, Boris Ostrovsky wrote:
>>>>>>>On 12/5/18 4:32 AM, Roger Pau Monné wrote:
>>>>>>>> On Wed, Dec 05, 2018 at 10:19:17AM +0800, Chao Gao wrote:
>>>>>>>>> I find some pass-thru devices don't work any more across guest reboot.
>>>>>>>>> Assigning it to another guest also meets the same issue. And the only
>>>>>>>>> way to make it work again is un-binding and binding it to pciback.
>>>>>>>>> Someone reported this issue one year ago [1]. More detail also can be
>>>>>>>>> found in [2].
>>>>>>>>>
>>>>>>>>> The root-cause is Xen's internal MSI-X state isn't reset properly
>>>>>>>>> during reboot or re-assignment. In the above case, Xen set maskall bit
>>>>>>>>> to mask all MSI interrupts after it detected a potential security
>>>>>>>>> issue. Even after device reset, Xen didn't reset its internal maskall
>>>>>>>>> bit. As a result, maskall bit would be set again in next write to
>>>>>>>>> MSI-X message control register.
>>>>>>>>>
>>>>>>>>> Given that PHYSDEVOPS_prepare_msix() also triggers Xen resetting MSI-X
>>>>>>>>> internal state of a device, we employ it to fix this issue rather than
>>>>>>>>> introducing another dedicated sub-hypercall.
>>>>>>>>>
>>>>>>>>> Note that PHYSDEVOPS_release_msix() will fail if the mapping between
>>>>>>>>> the device's msix and pirq has been created. This limitation prevents
>>>>>>>>> us calling this function when detaching a device from a guest during
>>>>>>>>> guest shutdown. Thus it is called right before calling
>>>>>>>>> PHYSDEVOPS_prepare_msix().
>>>>>>>> s/PHYSDEVOPS/PHYSDEVOP/ (no final S). And then I would also drop the
>>>>>>>> () at the end of the hypercall name since it's not a function.
>>>>>>>>
>>>>>>>> I'm also wondering why the release can't be done when the device is
>>>>>>>> detached from the guest (or the guest has been shut down). This makes
>>>>>>>> me worry about the raciness of the attach/detach procedure: if there's
>>>>>>>> a state where pciback assumes the device has been detached from the
>>>>>>>> guest, but there are still pirqs bound, an attempt to attach to
>>>>>>>> another guest in such state will fail.
>>>>>>>
>>>>>>>I wonder whether this additional reset functionality could be done out
>>>>>>>of xen_pcibk_xenbus_remove(). We first do a (best effort) device reset
>>>>>>>and then do the extra things that are not properly done there.
>>>>>> 
>>>>>> No. It cannot be done in xen_pcibk_xenbus_remove() without modifying
>>>>>> the handler of PHYSDEVOP_release_msix. To do a successful Xen internal
>>>>>> MSI-X state reset, PHYSDEVOP_{release, prepare}_msix should be finished
>>>>>> without error. But ATM, xen expects that no msi is bound to pirq when
>>>>>> doing PHYSDEVOP_release_msix. Otherwise it fails with error code -EBUSY.
>>>>>> However, the expectation isn't guaranteed in xen_pcibk_xenbus_remove().
>>>>>> In some cases, if qemu fails to unmap MSIs, MSIs are unmapped by Xen
>>>>>> at last minute, which happens after device reset in 
>>>>>> xen_pcibk_xenbus_remove().
>>>>>
>>>>>But that may need taking care of: I don't think it is a good idea to have
>>>>>anything left from the prior owning domain when the device gets reset.
>>>>>I.e. left over IRQ bindings should perhaps be forcibly cleared before
>>>>>invoking the reset;
>>>> 
>>>> Agree. How about pciback to track the established IRQ bindings? Then
>>>> 

Re: [PATCH] xen: xen-pciback: Reset MSI-X state when exposing a device

2018-12-12 Thread Chao Gao
On Wed, Dec 12, 2018 at 08:21:39AM -0700, Jan Beulich wrote:
>>>> On 12.12.18 at 16:18,  wrote:
>> On Wed, Dec 12, 2018 at 01:51:01AM -0700, Jan Beulich wrote:
>>>>>> On 12.12.18 at 08:06,  wrote:
>>>> On Wed, Dec 05, 2018 at 09:01:33AM -0500, Boris Ostrovsky wrote:
>>>>>On 12/5/18 4:32 AM, Roger Pau Monné wrote:
>>>>>> On Wed, Dec 05, 2018 at 10:19:17AM +0800, Chao Gao wrote:
>>>>>>> I find some pass-thru devices don't work any more across guest reboot.
>>>>>>> Assigning it to another guest also meets the same issue. And the only
>>>>>>> way to make it work again is un-binding and binding it to pciback.
>>>>>>> Someone reported this issue one year ago [1]. More detail also can be
>>>>>>> found in [2].
>>>>>>>
>>>>>>> The root-cause is Xen's internal MSI-X state isn't reset properly
>>>>>>> during reboot or re-assignment. In the above case, Xen set maskall bit
>>>>>>> to mask all MSI interrupts after it detected a potential security
>>>>>>> issue. Even after device reset, Xen didn't reset its internal maskall
>>>>>>> bit. As a result, maskall bit would be set again in next write to
>>>>>>> MSI-X message control register.
>>>>>>>
>>>>>>> Given that PHYSDEVOPS_prepare_msix() also triggers Xen resetting MSI-X
>>>>>>> internal state of a device, we employ it to fix this issue rather than
>>>>>>> introducing another dedicated sub-hypercall.
>>>>>>>
>>>>>>> Note that PHYSDEVOPS_release_msix() will fail if the mapping between
>>>>>>> the device's msix and pirq has been created. This limitation prevents
>>>>>>> us calling this function when detaching a device from a guest during
>>>>>>> guest shutdown. Thus it is called right before calling
>>>>>>> PHYSDEVOPS_prepare_msix().
>>>>>> s/PHYSDEVOPS/PHYSDEVOP/ (no final S). And then I would also drop the
>>>>>> () at the end of the hypercall name since it's not a function.
>>>>>>
>>>>>> I'm also wondering why the release can't be done when the device is
>>>>>> detached from the guest (or the guest has been shut down). This makes
>>>>>> me worry about the raciness of the attach/detach procedure: if there's
>>>>>> a state where pciback assumes the device has been detached from the
>>>>>> guest, but there are still pirqs bound, an attempt to attach to
>>>>>> another guest in such state will fail.
>>>>>
>>>>>I wonder whether this additional reset functionality could be done out
>>>>>of xen_pcibk_xenbus_remove(). We first do a (best effort) device reset
>>>>>and then do the extra things that are not properly done there.
>>>> 
>>>> No. It cannot be done in xen_pcibk_xenbus_remove() without modifying
>>>> the handler of PHYSDEVOP_release_msix. To do a successful Xen internal
>>>> MSI-X state reset, PHYSDEVOP_{release, prepare}_msix should be finished
>>>> without error. But ATM, xen expects that no msi is bound to pirq when
>>>> doing PHYSDEVOP_release_msix. Otherwise it fails with error code -EBUSY.
>>>> However, the expectation isn't guaranteed in xen_pcibk_xenbus_remove().
>>>> In some cases, if qemu fails to unmap MSIs, MSIs are unmapped by Xen
>>>> at last minute, which happens after device reset in 
>>>> xen_pcibk_xenbus_remove().
>>>
>>>But that may need taking care of: I don't think it is a good idea to have
>>>anything left from the prior owning domain when the device gets reset.
>>>I.e. left over IRQ bindings should perhaps be forcibly cleared before
>>>invoking the reset;
>> 
>> Agree. How about pciback to track the established IRQ bindings? Then
>> pciback can clear irq binding before invoking the reset.
>
>How would pciback even know of those mappings, when it's qemu
>who establishes (and manages) them?

I meant to expose some interfaces from pciback, so that pciback serves
as a proxy for the IRQ (un)binding APIs.

>
>>>in fact I'd expect this to happen in the course of
>>>domain destruction, and I'd expect the device reset to come after the
>>>domain was cleaned up. Perhaps simply an ordering issue in the tool
>>>stack?
>> 
>> I don't think reversing the sequences of device reset and domain
>> destruction would be simple. Furthermore, during device hot-unplug,
>> device reset is done when the owner is alive. So if we use domain
>> destruction to enforce all irq binding cleared, in theory, it won't be
>> applicable to hot-unplug case (if qemu's hot-unplug logic is
>> compromised).
>
>Even in the hot-unplug case the tool stack could issue unbind
>requests, behind the back of the possibly compromised qemu,
>once neither the guest nor qemu have access to the device
>anymore.

But currently, the tool stack doesn't know about the remaining IRQ bindings.
If the tool stack could maintain the IRQ binding information of a pass-thru
device (stored in Xenstore?), we could come up with a clean solution
without modifying the Linux kernel or Xen.

Thanks
Chao


Re: [PATCH] xen: xen-pciback: Reset MSI-X state when exposing a device

2018-12-12 Thread Chao Gao
On Wed, Dec 12, 2018 at 01:51:01AM -0700, Jan Beulich wrote:
>>>> On 12.12.18 at 08:06,  wrote:
>> On Wed, Dec 05, 2018 at 09:01:33AM -0500, Boris Ostrovsky wrote:
>>>On 12/5/18 4:32 AM, Roger Pau Monné wrote:
>>>> On Wed, Dec 05, 2018 at 10:19:17AM +0800, Chao Gao wrote:
>>>>> I find some pass-thru devices don't work any more across guest reboot.
>>>>> Assigning it to another guest also meets the same issue. And the only
>>>>> way to make it work again is un-binding and binding it to pciback.
>>>>> Someone reported this issue one year ago [1]. More detail also can be
>>>>> found in [2].
>>>>>
>>>>> The root-cause is Xen's internal MSI-X state isn't reset properly
>>>>> during reboot or re-assignment. In the above case, Xen set maskall bit
>>>>> to mask all MSI interrupts after it detected a potential security
>>>>> issue. Even after device reset, Xen didn't reset its internal maskall
>>>>> bit. As a result, maskall bit would be set again in next write to
>>>>> MSI-X message control register.
>>>>>
>>>>> Given that PHYSDEVOPS_prepare_msix() also triggers Xen resetting MSI-X
>>>>> internal state of a device, we employ it to fix this issue rather than
>>>>> introducing another dedicated sub-hypercall.
>>>>>
>>>>> Note that PHYSDEVOPS_release_msix() will fail if the mapping between
>>>>> the device's msix and pirq has been created. This limitation prevents
>>>>> us calling this function when detaching a device from a guest during
>>>>> guest shutdown. Thus it is called right before calling
>>>>> PHYSDEVOPS_prepare_msix().
>>>> s/PHYSDEVOPS/PHYSDEVOP/ (no final S). And then I would also drop the
>>>> () at the end of the hypercall name since it's not a function.
>>>>
>>>> I'm also wondering why the release can't be done when the device is
>>>> detached from the guest (or the guest has been shut down). This makes
>>>> me worry about the raciness of the attach/detach procedure: if there's
>>>> a state where pciback assumes the device has been detached from the
>>>> guest, but there are still pirqs bound, an attempt to attach to
>>>> another guest in such state will fail.
>>>
>>>I wonder whether this additional reset functionality could be done out
>>>of xen_pcibk_xenbus_remove(). We first do a (best effort) device reset
>>>and then do the extra things that are not properly done there.
>> 
>> No. It cannot be done in xen_pcibk_xenbus_remove() without modifying
>> the handler of PHYSDEVOP_release_msix. To do a successful Xen internal
>> MSI-X state reset, PHYSDEVOP_{release, prepare}_msix should be finished
>> without error. But ATM, xen expects that no msi is bound to pirq when
>> doing PHYSDEVOP_release_msix. Otherwise it fails with error code -EBUSY.
>> However, the expectation isn't guaranteed in xen_pcibk_xenbus_remove().
>> In some cases, if qemu fails to unmap MSIs, MSIs are unmapped by Xen
>> at last minute, which happens after device reset in 
>> xen_pcibk_xenbus_remove().
>
>But that may need taking care of: I don't think it is a good idea to have
>anything left from the prior owning domain when the device gets reset.
>I.e. left over IRQ bindings should perhaps be forcibly cleared before
>invoking the reset;

Agreed. How about having pciback track the established IRQ bindings? Then
pciback can clear the IRQ bindings before invoking the reset.

>in fact I'd expect this to happen in the course of
>domain destruction, and I'd expect the device reset to come after the
>domain was cleaned up. Perhaps simply an ordering issue in the tool
>stack?

I don't think reversing the sequence of device reset and domain
destruction would be simple. Furthermore, during device hot-unplug,
device reset is done while the owner is still alive. So if we rely on
domain destruction to enforce that all IRQ bindings are cleared, in
theory, it won't be applicable to the hot-unplug case (if qemu's
hot-unplug logic is compromised).

Thanks
Chao


Re: [PATCH] xen: xen-pciback: Reset MSI-X state when exposing a device

2018-12-11 Thread Chao Gao
On Wed, Dec 05, 2018 at 09:01:33AM -0500, Boris Ostrovsky wrote:
>On 12/5/18 4:32 AM, Roger Pau Monné wrote:
>> On Wed, Dec 05, 2018 at 10:19:17AM +0800, Chao Gao wrote:
>>> I find some pass-thru devices don't work any more across guest reboot.
>>> Assigning it to another guest also meets the same issue. And the only
>>> way to make it work again is un-binding and binding it to pciback.
>>> Someone reported this issue one year ago [1]. More detail also can be
>>> found in [2].
>>>
>>> The root-cause is Xen's internal MSI-X state isn't reset properly
>>> during reboot or re-assignment. In the above case, Xen set maskall bit
>>> to mask all MSI interrupts after it detected a potential security
>>> issue. Even after device reset, Xen didn't reset its internal maskall
>>> bit. As a result, maskall bit would be set again in next write to
>>> MSI-X message control register.
>>>
>>> Given that PHYSDEVOPS_prepare_msix() also triggers Xen resetting MSI-X
>>> internal state of a device, we employ it to fix this issue rather than
>>> introducing another dedicated sub-hypercall.
>>>
>>> Note that PHYSDEVOPS_release_msix() will fail if the mapping between
>>> the device's msix and pirq has been created. This limitation prevents
>>> us calling this function when detaching a device from a guest during
>>> guest shutdown. Thus it is called right before calling
>>> PHYSDEVOPS_prepare_msix().
>> s/PHYSDEVOPS/PHYSDEVOP/ (no final S). And then I would also drop the
>> () at the end of the hypercall name since it's not a function.
>>
>> I'm also wondering why the release can't be done when the device is
>> detached from the guest (or the guest has been shut down). This makes
>> me worry about the raciness of the attach/detach procedure: if there's
>> a state where pciback assumes the device has been detached from the
>> guest, but there are still pirqs bound, an attempt to attach to
>> another guest in such state will fail.
>
>I wonder whether this additional reset functionality could be done out
>of xen_pcibk_xenbus_remove(). We first do a (best effort) device reset
>and then do the extra things that are not properly done there.

No. It cannot be done in xen_pcibk_xenbus_remove() without modifying
the handler of PHYSDEVOP_release_msix. To do a successful Xen-internal
MSI-X state reset, PHYSDEVOP_{release, prepare}_msix should be finished
without error. But ATM, Xen expects that no MSI is bound to a pirq when
doing PHYSDEVOP_release_msix; otherwise it fails with error code -EBUSY.
However, this expectation isn't guaranteed in xen_pcibk_xenbus_remove().
In some cases, if qemu fails to unmap MSIs, MSIs are unmapped by Xen at
the last minute, which happens after the device reset in
xen_pcibk_xenbus_remove().

Thanks
Chao


[PATCH v3] KVM: VMX: Don't halt vcpu when L1 is injecting events to L2

2018-02-10 Thread Chao Gao
Although L2 is in halt state, it will be in the active state after
VM entry if the VM entry is vectoring according to SDM 26.6.2 Activity
State. Halting the vcpu here means the event won't be injected to L2
and this decision isn't reported to L1. Thus L0 drops an event that
should be injected to L2.

Cc: Liran Alon <liran.a...@oracle.com>
Signed-off-by: Chao Gao <chao@intel.com>
---
changes in v3:
 - s/VECTORING_INFO_VALID_MASK/INTR_INFO_VALID_MASK
 - Changes of subject, comments.

Changes in v2:
 - Remove VID stuff. Only handle event injection in this patch.

Signed-off-by: Chao Gao <chao@intel.com>
---
 arch/x86/kvm/vmx.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bb5b488..cf7c2fc 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10985,7 +10985,12 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
if (ret)
return ret;
 
-   if (vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT)
+   /*
+* If we're entering a halted L2 vcpu and the L2 vcpu won't be woken
+* by event injection, halt vcpu.
+*/
+   if ((vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT) &&
+   !(vmcs12->vm_entry_intr_info_field & INTR_INFO_VALID_MASK))
return kvm_vcpu_halt(vcpu);
 
vmx->nested.nested_run_pending = 1;
-- 
1.9.1



[PATCH v2] x86/kvm/vmx: Don't halt vcpu when L1 is injecting events to L2

2018-02-08 Thread Chao Gao
Although L2 is in halt state, it will be in the active state after
VM entry if the VM entry is vectoring according to SDM 26.6.2 Activity
State. Halting the vcpu here means the event won't be injected to L2
and this decision isn't reported to L1. Thus L0 drops an event that
should be injected to L2.

Cc: Liran Alon <liran.a...@oracle.com>
Signed-off-by: Chao Gao <chao@intel.com>
---
Changes in v2:
 - Remove VID stuff. Only handle event injection in this patch.
---
 arch/x86/kvm/vmx.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bb5b488..42f39d9 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10985,7 +10985,12 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
if (ret)
return ret;
 
-   if (vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT)
+   /*
+* If we're entering a halted L2 vcpu and the L2 vcpu won't be woken
+* by event injection, halt vcpu for optimization.
+*/
+   if ((vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT) &&
+   !(vmcs12->vm_entry_intr_info_field & VECTORING_INFO_VALID_MASK))
return kvm_vcpu_halt(vcpu);
 
vmx->nested.nested_run_pending = 1;
-- 
1.9.1



Re: [PATCH] x86/kvm/vmx: Don't halt vcpu when L1 is injecting events to L2

2018-02-08 Thread Chao Gao
On Thu, Feb 08, 2018 at 04:09:36AM -0800, Liran Alon wrote:
>
>- pbonz...@redhat.com wrote:
>
>> On 08/02/2018 06:13, Chao Gao wrote:
>> > Although L2 is in halt state, it will be in the active state after
>> > VM entry if the VM entry is vectoring. Halting the vcpu here means
>> > the event won't be injected to L2 and this decision isn't reported
>> > to L1. Thus L0 drops an event that should be injected to L2.
>> > 
>> > Because virtual interrupt delivery may wake L2 vcpu, if VID is
>> enabled,
>> > do the same thing -- don't halt L2.
>> 
>> This second part seems wrong to me, or at least overly general. 
>> Perhaps
>> you mean if RVI>0?
>> 
>> Thanks,
>> 
>> Paolo
>
>I would first recommend to split this commit.
>The first commit should handle only the case of vectoring VM entry.
>It should also specify in commit message it is based on Intel SDM 26.6.2 
>Activity State:
>("If the VM entry is vectoring, the logical processor is in the active state 
>after VM entry.")
>That part in code seems correct to me.
>
>The second commit seems wrong to me as-well.
>(I would also mention here it is based on Intel SDM 26.6.5
>Interrupt-Window Exiting and Virtual-Interrupt Delivery:
>"These events wake the logical processor if it just entered the HLT state 
>because of a VM entry")
>
>Paolo, I think that your suggestion is not sufficient as well.
>Consider the case that APIC's TPR blocks interrupt specified in RVI.
>
>I think that we should just remove the check for VID from commit,
>and instead fix another bug which I describe below.
>
>If lapic_in_kernel()==false, then APICv is not active and VID is not exposed 
>to L1
>(In current KVM code. Jim already criticize that this is wrong.).
>
>Otherwise, kvm_vcpu_halt() will change mp_state to KVM_MP_STATE_HALTED.
>Eventually, vcpu_run() will call vcpu_block() which will reach 
>kvm_vcpu_has_events().
>That function is responsible for checking if there is any pending interrupts.
>Including, pending interrupts as a result of VID enabled and RVI>0

The difference between checking VID and RVI here and in vcpu_run() is
that "nested_run_pending" is set for the former. Would it cause any
problem if we don't set it?

Thanks
Chao

>(While also taking into account the APIC's TPR).
>The logic that checks for pending interrupts is kvm_cpu_has_interrupt()
>which eventually reach apic_has_interrupt_for_ppr().
>If APICv is enabled, apic_has_interrupt_for_ppr() will call 
>vmx_sync_pir_to_irr()
>which calls vmx_hwapic_irr_update().
>
>However, max_irr returned to apic_has_interrupt_for_ppr() does not consider 
>the interrupt
>pending in RVI. Which I think is the real bug to fix here.
>In the non-nested case, RVI can never be larger than max_irr because that is 
>how L0 KVM manages RVI.
>However, in the nested case, L1 can set RVI in VMCS arbitrary
>(we just copy GUEST_INTR_STATUS from vmcs01 into vmcs02).
>
>A possible patch to fix this is to change vmx_hwapic_irr_update() such that
>if is_guest_mode(vcpu)==true, we should return max(max_irr, rvi) and return
>that value into apic_has_interrupt_for_ppr().
>Need to verify that it doesn't break other flows but I think it makes sense.
>What do you think?
>
>Regards,
>-Liran
>
>> 
>> > Signed-off-by: Chao Gao <chao@intel.com>
>> > ---
>> >  arch/x86/kvm/vmx.c | 10 --
>> >  1 file changed, 8 insertions(+), 2 deletions(-)
>> > 
>> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> > index bb5b488..e1fe4e4 100644
>> > --- a/arch/x86/kvm/vmx.c
>> > +++ b/arch/x86/kvm/vmx.c
>> > @@ -10985,8 +10985,14 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
>> >if (ret)
>> >return ret;
>> >  
>> > -  if (vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT)
>> > -  return kvm_vcpu_halt(vcpu);
>> > +  if (vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT) {
>> > +  u32 intr_info = vmcs_read32(VM_ENTRY_INTR_INFO_FIELD);
>> > +  u32 exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
>> > +
>> > +  if (!(intr_info & VECTORING_INFO_VALID_MASK) &&
>> > +  !(exec_control & SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY))
>> > +  return kvm_vcpu_halt(vcpu);
>> > +  }
>> 
>> >vmx->nested.nested_run_pending = 1;
>> >  
>> >


Re: [PATCH] x86/kvm/vmx: Don't halt vcpu when L1 is injecting events to L2

2018-02-08 Thread Chao Gao
On Thu, Feb 08, 2018 at 04:09:36AM -0800, Liran Alon wrote:
>
>- pbonz...@redhat.com wrote:
>
>> On 08/02/2018 06:13, Chao Gao wrote:
>> > Although L2 is in halt state, it will be in the active state after
>> > VM entry if the VM entry is vectoring. Halting the vcpu here means
>> > the event won't be injected to L2 and this decision isn't reported
>> > to L1. Thus L0 drops an event that should be injected to L2.
>> > 
>> > Because virtual interrupt delivery may wake L2 vcpu, if VID is
>> enabled,
>> > do the same thing -- don't halt L2.
>> 
>> This second part seems wrong to me, or at least overly general. 
>> Perhaps
>> you mean if RVI>0?
>> 
>> Thanks,
>> 
>> Paolo
>
>I would first recommend to split this commit.
>The first commit should handle only the case of vectoring VM entry.
>It should also specify in commit message it is based on Intel SDM 26.6.2 
>Activity State:
>("If the VM entry is vectoring, the logical processor is in the active state 
>after VM entry.")
>That part in code seems correct to me.
>
>The second commit seems wrong to me as-well.
>(I would also mention here it is based on Intel SDM 26.6.5
>Interrupt-Window Exiting and Virtual-Interrupt Delivery:
>"These events wake the logical processor if it just entered the HLT state 
>because of a VM entry")
>
>Paolo, I think that your suggestion is not sufficient as well.
>Consider the case that APIC's TPR blocks interrupt specified in RVI.
>
>I think that we should just remove the check for VID from commit,
>and instead fix another bug which I describe below.
>
>If lapic_in_kernel()==false, then APICv is not active and VID is not exposed 
>to L1
>(In current KVM code. Jim already criticize that this is wrong.).
>
>Otherwise, kvm_vcpu_halt() will change mp_state to KVM_MP_STATE_HALTED.
>Eventually, vcpu_run() will call vcpu_block() which will reach 
>kvm_vcpu_has_events().
>That function is responsible for checking if there is any pending interrupts.
>Including, pending interrupts as a result of VID enabled and RVI>0

The difference between checking VID and RVI here and in vcpu_run() is
"nested_run_pending" is set for the former. Would it cause any problem
if we don't set it?

Thanks
Chao

>(While also taking into account the APIC's TPR).
>The logic that checks for pending interrupts is kvm_cpu_has_interrupt(),
>which eventually reaches apic_has_interrupt_for_ppr().
>If APICv is enabled, apic_has_interrupt_for_ppr() will call
>vmx_sync_pir_to_irr(),
>which calls vmx_hwapic_irr_update().
>
>However, the max_irr returned to apic_has_interrupt_for_ppr() does not
>consider the interrupt
>pending in RVI, which I think is the real bug to fix here.
>In the non-nested case, RVI can never be larger than max_irr because that is
>how L0 KVM manages RVI.
>However, in the nested case, L1 can set RVI in the VMCS arbitrarily
>(we just copy GUEST_INTR_STATUS from vmcs01 into vmcs02).
>
>A possible patch to fix this is to change vmx_hwapic_irr_update() such that,
>if is_guest_mode(vcpu)==true, it returns max(max_irr, rvi), and that value is
>then passed back to apic_has_interrupt_for_ppr().
>I need to verify that it doesn't break other flows, but I think it makes sense.
>What do you think?
>
>Regards,
>-Liran
>
>> 
>> > Signed-off-by: Chao Gao 
>> > ---
>> >  arch/x86/kvm/vmx.c | 10 --
>> >  1 file changed, 8 insertions(+), 2 deletions(-)
>> > 
>> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> > index bb5b488..e1fe4e4 100644
>> > --- a/arch/x86/kvm/vmx.c
>> > +++ b/arch/x86/kvm/vmx.c
>> > @@ -10985,8 +10985,14 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
>> >if (ret)
>> >return ret;
>> >  
>> > -  if (vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT)
>> > -  return kvm_vcpu_halt(vcpu);
>> > +  if (vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT) {
>> > +  u32 intr_info = vmcs_read32(VM_ENTRY_INTR_INFO_FIELD);
>> > +  u32 exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
>> > +
>> > +  if (!(intr_info & VECTORING_INFO_VALID_MASK) &&
>> > +  !(exec_control & SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY))
>> > +  return kvm_vcpu_halt(vcpu);
>> > +  }
>> 
>> >vmx->nested.nested_run_pending = 1;
>> >  
>> >
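[Editorial note] Liran's proposed change above can be sketched as a standalone model. The names and call shape here are illustrative only; the real vmx_hwapic_irr_update() in arch/x86/kvm/vmx.c operates on a struct kvm_vcpu and the VMCS:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified, standalone model of the proposed fix (hypothetical helper,
 * not the real KVM function). Idea: when running an L2 guest
 * (is_guest_mode), the RVI that L1 wrote into vmcs12 may exceed the
 * max_irr tracked by L0, so the effective "highest pending vector"
 * reported back to apic_has_interrupt_for_ppr() must be
 * max(max_irr, rvi).
 */
int effective_max_irr(bool is_guest_mode, int max_irr, int rvi)
{
	if (is_guest_mode && rvi > max_irr)
		return rvi;
	return max_irr;
}
```

In the non-nested case (is_guest_mode == false) this degenerates to returning max_irr unchanged, matching the invariant Liran notes: L0 never lets RVI exceed max_irr for L1 itself.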


Re: [PATCH] x86/kvm/vmx: Don't halt vcpu when L1 is injecting events to L2

2018-02-08 Thread Chao Gao
On Thu, Feb 08, 2018 at 11:29:49AM +0100, Paolo Bonzini wrote:
>On 08/02/2018 06:13, Chao Gao wrote:
>> Although L2 is in halt state, it will be in the active state after
>> VM entry if the VM entry is vectoring. Halting the vcpu here means
>> the event won't be injected to L2 and this decision isn't reported
>> to L1. Thus L0 drops an event that should be injected to L2.
>> 
>> Because virtual interrupt delivery may wake L2 vcpu, if VID is enabled,
>> do the same thing -- don't halt L2.
>
>This second part seems wrong to me, or at least overly general.  Perhaps
>you mean if RVI>0?

Yes. It is a little too general.

How about this patch:
-- 8< --
Subject: [PATCH] x86/kvm/vmx: Don't halt vcpu when L1 is injecting events to
 L2

Although L2 is in halt state, it will be in the active state after
VM entry if the VM entry is vectoring. Halting the vcpu here means
the event won't be injected to L2 and this decision isn't reported
to L1. Thus L0 drops an event that should be injected to L2.

Because virtual interrupt delivery may wake L2 vcpu, if VID is enabled
and RVI > 0, do the same thing -- don't halt L2.

Signed-off-by: Chao Gao <chao@intel.com>
---
 arch/x86/kvm/vmx.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bb5b488..fa889c8 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10985,7 +10985,13 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
if (ret)
return ret;

-   if (vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT)
+   /*
+* If we're entering a halted L2 vcpu and the L2 vcpu won't be woken
+* by event injection or VID, halt vcpu for optimization.
+*/
+   if ((vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT) &&
+   !(vmcs12->vm_entry_intr_info_field & VECTORING_INFO_VALID_MASK) &&
+   !(nested_cpu_has_vid(vmcs12) && (vmcs12->guest_intr_status & 0xff)))
return kvm_vcpu_halt(vcpu);

vmx->nested.nested_run_pending = 1;
--
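[Editorial note] The predicate in the patch above can be modeled as a self-contained sketch. The struct and field names below are simplified stand-ins for vmcs12; per the SDM, RVI is the low byte of the 16-bit guest interrupt status (SVI is the high byte), and bit 31 of the VM-entry interruption-information field is the valid bit:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GUEST_ACTIVITY_HLT        1u
#define VECTORING_INFO_VALID_MASK (1u << 31)

/* Simplified stand-in for the fields of vmcs12 the check reads. */
struct vmcs12_model {
	uint32_t guest_activity_state;
	uint32_t vm_entry_intr_info_field;
	uint16_t guest_intr_status;   /* RVI = bits 7:0, SVI = bits 15:8 */
	bool     vid_enabled;         /* stands in for nested_cpu_has_vid() */
};

/*
 * L0 may halt the vcpu only if L2 is halted AND nothing at VM entry --
 * neither event injection (vectoring entry) nor a pending virtual
 * interrupt (VID with RVI > 0) -- would immediately wake it.
 */
bool should_halt_l2(const struct vmcs12_model *v)
{
	uint8_t rvi = v->guest_intr_status & 0xff;

	return v->guest_activity_state == GUEST_ACTIVITY_HLT &&
	       !(v->vm_entry_intr_info_field & VECTORING_INFO_VALID_MASK) &&
	       !(v->vid_enabled && rvi > 0);
}
```

Note that a nonzero SVI alone (e.g. guest_intr_status == 0x3100) does not block the halt; only the RVI byte matters for pending-interrupt wakeup.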

>
>Thanks,
>
>Paolo
>
>> Signed-off-by: Chao Gao <chao@intel.com>
>> ---
>>  arch/x86/kvm/vmx.c | 10 --
>>  1 file changed, 8 insertions(+), 2 deletions(-)
>> 
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index bb5b488..e1fe4e4 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -10985,8 +10985,14 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
>>  if (ret)
>>  return ret;
>>  
>> -if (vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT)
>> -return kvm_vcpu_halt(vcpu);
>> +if (vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT) {
>> +u32 intr_info = vmcs_read32(VM_ENTRY_INTR_INFO_FIELD);
>> +u32 exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
>> +
>> +if (!(intr_info & VECTORING_INFO_VALID_MASK) &&
>> +!(exec_control & SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY))
>> +return kvm_vcpu_halt(vcpu);
>> +}
>
>>  vmx->nested.nested_run_pending = 1;
>>  
>> 
>


[PATCH] x86/kvm/vmx: Don't halt vcpu when L1 is injecting events to L2

2018-02-07 Thread Chao Gao
Although L2 is in halt state, it will be in the active state after
VM entry if the VM entry is vectoring. Halting the vcpu here means
the event won't be injected to L2 and this decision isn't reported
to L1. Thus L0 drops an event that should be injected to L2.

Because virtual interrupt delivery may wake L2 vcpu, if VID is enabled,
do the same thing -- don't halt L2.

Signed-off-by: Chao Gao <chao@intel.com>
---
 arch/x86/kvm/vmx.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bb5b488..e1fe4e4 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10985,8 +10985,14 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
if (ret)
return ret;
 
-   if (vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT)
-   return kvm_vcpu_halt(vcpu);
+   if (vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT) {
+   u32 intr_info = vmcs_read32(VM_ENTRY_INTR_INFO_FIELD);
+   u32 exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
+
+   if (!(intr_info & VECTORING_INFO_VALID_MASK) &&
+   !(exec_control & SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY))
+   return kvm_vcpu_halt(vcpu);
+   }
 
vmx->nested.nested_run_pending = 1;
 
-- 
1.9.1



[PATCH] sched: correct a typo

2016-10-20 Thread Chao Gao
correct a typo.

Signed-off-by: Chao Gao <chao@intel.com>
---
 kernel/sched/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 94732d1..02f8f9a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6976,7 +6976,7 @@ struct sched_domain *build_sched_domain(struct sched_domain_topology_level *tl,
 
if (!cpumask_subset(sched_domain_span(child),
sched_domain_span(sd))) {
-   pr_err("BUG: arch topology borken\n");
+   pr_err("BUG: arch topology broken\n");
 #ifdef CONFIG_SCHED_DEBUG
			pr_err(" the %s domain not a subset of the %s domain\n",
					child->name, sd->name);
-- 
1.8.3.1



Re: [PATCH] x86/efi: Add necessary checks before iterating over efi.memmap

2016-09-19 Thread Chao Gao
Sorry for bothering you. There is a regression since commit 78ce248f: when
booting Xen in UEFI mode, dom0 crashes and Xen reboots constantly.
This patch tries to fix it; please take a look.

On Tue, Sep 13, 2016 at 11:28:15AM +0800, Chao Gao wrote:
>Commit 78ce248f (efi: Iterate over efi.memmap in for_each_efi_memory_desc())
>replaced the old loop with for_each_efi_memory_desc(), which hits a #PF
>when efi.memmap is not initialized.
>
>During boot, Xen sets EFI_PARAVIRT in xen_efi_init() before setup_arch()
>is called. As a result, efi_memmap_init() does not initialize the structures
>related to efi.memmap. However, functions such as
>efi_find_mirror(), efi_print_memmap() and efi_free_boot_services() access
>efi.memmap without the necessary checks, so the kernel and Xen crash in this
>case. After adding these checks, Xen and the kernel boot up normally.
>
>Signed-off-by: Chao Gao <chao@intel.com>
>---
> arch/x86/platform/efi/efi.c| 6 ++
> arch/x86/platform/efi/quirks.c | 3 +++
> 2 files changed, 9 insertions(+)
>
>diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
>index 1fbb408..68966dc 100644
>--- a/arch/x86/platform/efi/efi.c
>+++ b/arch/x86/platform/efi/efi.c
>@@ -102,6 +102,9 @@ void __init efi_find_mirror(void)
>   efi_memory_desc_t *md;
>   u64 mirror_size = 0, total_size = 0;
> 
>+  if (efi_enabled(EFI_PARAVIRT))
>+  return;
>+
>   for_each_efi_memory_desc(md) {
>   unsigned long long start = md->phys_addr;
>   unsigned long long size = md->num_pages << EFI_PAGE_SHIFT;
>@@ -207,6 +210,9 @@ void __init efi_print_memmap(void)
>   efi_memory_desc_t *md;
>   int i = 0;
> 
>+  if (efi_enabled(EFI_PARAVIRT))
>+  return;
>+
>   for_each_efi_memory_desc(md) {
>   char buf[64];
> 
>diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
>index 89d1146..4fa1e4d 100644
>--- a/arch/x86/platform/efi/quirks.c
>+++ b/arch/x86/platform/efi/quirks.c
>@@ -251,6 +251,9 @@ void __init efi_free_boot_services(void)
> {
>   efi_memory_desc_t *md;
> 
>+  if (efi_enabled(EFI_PARAVIRT))
>+  return;
>+
>   for_each_efi_memory_desc(md) {
>   unsigned long long start = md->phys_addr;
>   unsigned long long size = md->num_pages << EFI_PAGE_SHIFT;
>-- 
>1.8.3.1
>
>--
>To unsubscribe from this list: send the line "unsubscribe linux-efi" in
>the body of a message to majord...@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
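
[Editorial note] The early-return guard this patch adds can be illustrated with a simplified standalone model. efi_enabled(), EFI_PARAVIRT and the efi.memmap iterator are real kernel interfaces, but the types and the walk function below are stand-ins, not the real implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { EFI_PARAVIRT_FLAG = 1 << 0 };  /* models efi_enabled(EFI_PARAVIRT) */

/* Stand-in for the efi state: under Xen/EFI_PARAVIRT the memmap was
 * never initialized, so the map pointer stays NULL. */
struct efi_model {
	unsigned int flags;
	int *map;       /* NULL when efi_memmap_init() never ran */
	size_t nr_map;
};

/* Mirrors efi_find_mirror()/efi_print_memmap() after the patch: bail out
 * early under EFI_PARAVIRT instead of dereferencing an uninitialized map. */
bool walk_memmap(const struct efi_model *efi, long *sum)
{
	*sum = 0;
	if (efi->flags & EFI_PARAVIRT_FLAG)
		return false;            /* guarded: nothing to iterate */
	for (size_t i = 0; i < efi->nr_map; i++)
		*sum += efi->map[i];     /* would fault if map were NULL */
	return true;
}
```

Without the guard, the paravirt case would walk a NULL map, which is the #PF the commit message describes.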

