pt(svm, GP_VECTOR);
As mentioned in the review for the other patch, I would add a flag that
would enable the workaround for the errata, and I would force it disabled
if X86_FEATURE_SVME_ADDR_CHK is set in CPUID, somewhere early in
kvm initialization.
And finally that new flag can be used here to enable the #GP interception
in the above code.
>
> return 0;
Best regards,
Maxim Levitsky
return ctxt->modrm;
> +}
> +
> static bool is_vmware_backdoor_opcode(struct x86_emulate_ctxt *ctxt)
> {
> switch (ctxt->opcode_len) {
> @@ -7305,6 +7327,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
> gpa_t cr2_or_gpa,
> struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt;
> bool writeback = true;
> bool write_fault_to_spt;
> + int vminstr;
>
> if (unlikely(!kvm_x86_ops.can_emulate_instruction(vcpu, insn,
> insn_len)))
> return 1;
> @@ -7367,10 +7390,14 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
> gpa_t cr2_or_gpa,
> }
> }
>
> - if ((emulation_type & EMULTYPE_VMWARE_GP) &&
> - !is_vmware_backdoor_opcode(ctxt)) {
> - kvm_queue_exception_e(vcpu, GP_VECTOR, 0);
> - return 1;
> + if (emulation_type & EMULTYPE_PARAVIRT_GP) {
> + vminstr = is_vm_instr_opcode(ctxt);
As I said above, I would add a flag indicating whether the workaround for the
errata is in use, and use it here.
> + if (!vminstr && !is_vmware_backdoor_opcode(ctxt)) {
> + kvm_queue_exception_e(vcpu, GP_VECTOR, 0);
> + return 1;
> + }
> + if (vminstr)
> + return vminstr;
> }
>
> /*
Best regards,
Maxim Levitsky
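The dispatch in the hunk above can be modelled in plain C. This is an illustrative stand-in, not the kernel's actual helpers; the opcode constants and function names are made up, and queueing the #GP is represented by a sentinel return value:

```c
#include <assert.h>

enum { OP_NONE = 0, OP_VMRUN = 1, OP_VMLOAD = 2, OP_VMWARE_BACKDOOR = 3 };

/* Hypothetical stand-ins for is_vm_instr_opcode()/is_vmware_backdoor_opcode() */
static int is_vm_instr(int op)  { return (op == OP_VMRUN || op == OP_VMLOAD) ? op : 0; }
static int is_vmware_bd(int op) { return op == OP_VMWARE_BACKDOOR; }

/* Returns: >0 — an SVM instruction, handled by its own path (the hunk
 * returns vminstr); -1 — neither case, so a #GP is queued (the real code
 * calls kvm_queue_exception_e() and returns 1); 0 — VMware backdoor
 * opcode, fall through to normal emulation. */
static int paravirt_gp_dispatch(int op)
{
    int vminstr = is_vm_instr(op);

    if (!vminstr && !is_vmware_bd(op))
        return -1;
    if (vminstr)
        return vminstr;
    return 0;
}
```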
nit test pass,
even if the test was run in a VM (with an unpatched kernel).
This together with setting that X86_FEATURE_SVME_ADDR_CHK bit for
the guest will allow us to hide that errata completely from the guest,
which is a very good thing.
(for example for guests that we can't modify)
Best regards,
RAM region moved on
its own (likely due to the fact that I moved some pcie cards around recently).
Best regards,
Maxim Levitsky
/svm.c
> > @@ -311,7 +311,7 @@ int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
> > svm->vmcb->save.efer = efer | EFER_SVME;
> > vmcb_mark_dirty(svm->vmcb, VMCB_CR);
> > /* Enable GP interception for SVM instructions if needed */
> > - if (efer &
t;
> > > extern bool itlb_multihit_kvm_mitigation;
> > > @@ -5675,6 +5676,12 @@ void kvm_mmu_slot_set_dirty(struct kvm *kvm,
> > > }
> > > EXPORT_SYMBOL_GPL(kvm_mmu_slot_set_dirty);
> > >
> > > +bool kvm_is_host_reserved_region(u64 gpa)
> > > +{
> > > +return e820__mapped_raw_any(gpa-1, gpa+1, E820_TYPE_RESERVED);
> > > +}
> >
> > While e820__mapped_any()'s doc says '.. checks if any part of the
> > range is mapped ..' it seems to me that the real check is
> > [start, end) so we should use 'gpa' instead of 'gpa-1', no?
>
> Why do you need to check GPA at all?
>
To reduce the scope of the workaround.
The errata only happens when you use one of the SVM instructions
in the guest with an EAX that happens to fall inside one
of the host's reserved memory regions (for example SMM).
So it is not expected for an SVM instruction with EAX that is a valid host
physical address to get a #GP due to this errata.
Best regards,
Maxim Levitsky
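The `gpa` vs `gpa-1` question above comes down to the half-open interval semantics of the range check. A small model makes it visible — this is an illustrative userspace sketch of `e820__mapped_raw_any()` against a single hypothetical `E820_TYPE_RESERVED` range (the addresses are made up):

```c
#include <assert.h>
#include <stdint.h>

/* One hypothetical reserved range, modelling an E820_TYPE_RESERVED entry. */
static const uint64_t res_start = 0x80000000ULL, res_end = 0x80100000ULL;

/* Models the documented check: true if any part of [start, end)
 * overlaps the reserved range. Note the half-open interval. */
static int mapped_any(uint64_t start, uint64_t end)
{
    return start < res_end && end > res_start;
}
```

With half-open semantics, testing the single address `gpa` is `mapped_any(gpa, gpa+1)`; widening to `mapped_any(gpa-1, gpa+1)` also reports a hit when only the byte just below `gpa` is reserved, which is the off-by-one being questioned.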
>
From a cursory look this looks all right, and I will review
and test this either today or tomorrow.
Thank you very much for doing the right fix for this bug.
Best regards,
Maxim Levitsky
> ---
> arch/x86/include/asm/kvm_host.h | 8 +-
> arch/x86/kvm/mmu.h | 1 +
> arc
s a separate vmcs
for the guest, which has its own msr bitmap, so in theory this shouldn't be needed,
but it won't hurt.
I'll test indeed if canceling the KVM_REQ_GET_NESTED_STATE_PAGES on VMX
makes any difference on VMX in regard to nested migration crashes I am seeing.
Best regards,
Maxim Levitsky
On Thu, 2021-01-07 at 14:40 +0200, Maxim Levitsky wrote:
> Align start and end on page boundaries before calling
> invalidate_inode_pages2_range.
>
> This might allow us to miss a collision if the write and the discard were done
> to the same page and do overlap but it is still
Align start and end on page boundaries before calling
invalidate_inode_pages2_range.
This might allow us to miss a collision if the write and the discard were done
to the same page and do overlap but it is still better than returning -EBUSY
if those writes didn't overlap.
Signed-off-by: Maxim
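The alignment this patch describes can be sketched as follows. This is illustrative userspace C, not the kernel change itself (in the kernel, page-granular APIs such as `invalidate_inode_pages2_range()` take page indices, so the conversion is done with shifts), but the rounding is the same:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096ULL

/* Widen a byte range outward to page boundaries: round the start down
 * and the end up, so the aligned range covers every byte of the input. */
static void page_align_range(uint64_t start, uint64_t end,
                             uint64_t *pg_start, uint64_t *pg_end)
{
    *pg_start = start & ~(PAGE_SIZE - 1);                  /* round down */
    *pg_end   = (end + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);  /* round up   */
}
```

Rounding outward is what creates the trade-off described in the commit message: two non-overlapping sub-page writes now map to the same page range, so a collision on that page can be missed, but spurious -EBUSY results for disjoint writes go away.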
On Thu, 2021-01-07 at 04:38 +0200, Maxim Levitsky wrote:
> On Wed, 2021-01-06 at 10:17 -0800, Sean Christopherson wrote:
> > On Wed, Jan 06, 2021, Maxim Levitsky wrote:
> > > If migration happens while L2 entry with an injected event to L2 is
> > > pending,
> >
This should prevent bad things from happening if the user calls the
KVM_SET_NESTED_STATE twice.
Signed-off-by: Maxim Levitsky
---
arch/x86/kvm/svm/nested.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index cc3130ab612e5
We overwrite most of the vmcb fields while doing so, so we must
mark it as dirty.
Signed-off-by: Maxim Levitsky
---
arch/x86/kvm/svm/nested.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index e91d40c8d8c91..c340fbad88566 100644
cases in the future.
Fixes: a7d5c7ce41ac1 ("KVM: nSVM: delay MSR permission processing to first
nested VM run")
Signed-off-by: Maxim Levitsky
---
arch/x86/kvm/svm/nested.c | 6 ++
1 file changed, 6 insertions(+)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nest
The code to store it on migration exists, but no code was restoring it.
One of the side effects of fixing this is that L1->L2 injected events
are no longer lost when migration happens with nested run pending.
Signed-off-by: Maxim Levitsky
---
arch/x86/kvm/svm/nested.c | 4
1 f
iew feedback on V1 of this
series, which is fully incorporated in this series.
Best regards,
Maxim Levitsky
Maxim Levitsky (4):
KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES on nested vmexit
KVM: nSVM: correctly restore nested_run_pending on migration
KVM: nSVM: always leave the nes
On Wed, 2021-01-06 at 10:17 -0800, Sean Christopherson wrote:
> On Wed, Jan 06, 2021, Maxim Levitsky wrote:
> > If migration happens while L2 entry with an injected event to L2 is pending,
> > we weren't including the event in the migration state and it would be
> > lo
On Wed, 2021-01-06 at 09:39 -0800, Sean Christopherson wrote:
> On Wed, Jan 06, 2021, Maxim Levitsky wrote:
> > This should prevent bad things from happening if the user calls the
> > KVM_SET_NESTED_STATE twice.
>
> This doesn't exactly inspire confidence, nor does
On Wed, 2021-01-06 at 09:27 -0800, Sean Christopherson wrote:
> On Wed, Jan 06, 2021, Maxim Levitsky wrote:
> > The code to store it on the migration exists, but no code was restoring it.
> >
> > Signed-off-by: Maxim Levitsky
> > ---
> > arch/x86/kvm/svm/nested.
by running an IO-intensive task in L2,
and repeatedly migrating the L1.
Suggested-by: Paolo Bonzini
Signed-off-by: Maxim Levitsky
---
arch/x86/kvm/vmx/nested.c | 12 ++--
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index
vm has no notion of them, this is TBD to
be fixed.
This was lightly tested with my nested migration test, which on VMX sadly still
crashes and burns on a (likely) unrelated issue.
Best regards,
Maxim Levitsky
Maxim Levitsky (2):
KVM: VMX: create vmx_process_injected_event
KVM: nVM
Refactor the logic that is dealing with parsing of an injected event to a
separate function.
This will be used in the next patch to deal with the events that L1 wants to
inject to L2 in a way that survives migration.
Signed-off-by: Maxim Levitsky
---
arch/x86/kvm/vmx/vmx.c | 60
cases in the future.
CC: sta...@vger.kernel.org
Fixes: a7d5c7ce41ac1 ("KVM: nSVM: delay MSR permission processing to first
nested VM run")
Signed-off-by: Maxim Levitsky
---
arch/x86/kvm/svm/nested.c | 6 ++
1 file changed, 6 insertions(+)
diff --git a/arch/x86/kvm/svm/nested.c
The code to store it on migration exists, but no code was restoring it.
Signed-off-by: Maxim Levitsky
---
arch/x86/kvm/svm/nested.c | 4
1 file changed, 4 insertions(+)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 6208d3a5a3fdb..c1a3d0e996add 100644
by running an IO-intensive task in L2,
and repeatedly migrating the L1.
Suggested-by: Paolo Bonzini
Signed-off-by: Maxim Levitsky
---
arch/x86/kvm/svm/nested.c | 7 +--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index
This should prevent bad things from happening if the user calls the
KVM_SET_NESTED_STATE twice.
Signed-off-by: Maxim Levitsky
---
arch/x86/kvm/svm/nested.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index
We overwrite most of the vmcb fields while doing so, so we must
mark it as dirty.
Signed-off-by: Maxim Levitsky
---
arch/x86/kvm/svm/nested.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 3aa18016832d0..de3dbb5407206 100644
Refactor the logic that is dealing with parsing of an injected event to a
separate function.
This will be used in the next patch to deal with the events that L1 wants to
inject to L2 in a way that survives migration.
Signed-off-by: Maxim Levitsky
---
arch/x86/kvm/svm/svm.c | 58
can cause a nested vmexit prior to that.
Patches 4,5,6 are a few things I found while reviewing the nested migration code.
I don't have a reproducer for them.
Best regards,
Maxim Levitsky
Maxim Levitsky (6):
KVM: SVM: create svm_process_injected_event
KVM: nSVM: fix for disappearing L1
On Thu, 2020-12-10 at 12:48 +0100, Paolo Bonzini wrote:
> On 08/12/20 18:08, Maxim Levitsky wrote:
> > > Even if you support TSCADJUST and let the guest write to it does not
> > > change the per guest offset at all. TSCADJUST is per [v]CPU and adds on
> > >
are tsc is used, and also implement whatever logic is
needed to jump the guest clock forward when this bit is set.
What do you think?
Best regards,
Maxim Levitsky
>
> Paolo
>
On Tue, 2020-12-08 at 09:58 -0600, Oliver Upton wrote:
> +cc Sean's new handle
>
> On Tue, Dec 8, 2020 at 9:57 AM Oliver Upton wrote:
> > On Tue, Dec 8, 2020 at 5:13 AM Maxim Levitsky wrote:
> > > On Mon, 2020-12-07 at 11:29 -0600, Oliver Upton wrote:
> > >
On Tue, 2020-12-08 at 17:40 +0100, Thomas Gleixner wrote:
> On Tue, Dec 08 2020 at 13:13, Maxim Levitsky wrote:
> > On Mon, 2020-12-07 at 11:29 -0600, Oliver Upton wrote:
> > > How would a VMM maintain the phase relationship between guest TSCs
> > > using thes
On Tue, 2020-12-08 at 17:02 +0100, Thomas Gleixner wrote:
> On Tue, Dec 08 2020 at 16:50, Maxim Levitsky wrote:
> > On Mon, 2020-12-07 at 20:29 -0300, Marcelo Tosatti wrote:
> > > > +This ioctl allows to reconstruct the guest's IA32_TSC and TSC_ADJUST
> > > &
On Mon, 2020-12-07 at 20:29 -0300, Marcelo Tosatti wrote:
> On Thu, Dec 03, 2020 at 07:11:16PM +0200, Maxim Levitsky wrote:
> > These two new ioctls allow to more precisly capture and
> > restore guest's TSC state.
> >
> > Both ioctls are meant to be used to accurately
On Mon, 2020-12-07 at 10:04 -0800, Andy Lutomirski wrote:
> > On Dec 7, 2020, at 9:00 AM, Maxim Levitsky wrote:
> >
> > On Mon, 2020-12-07 at 08:53 -0800, Andy Lutomirski wrote:
> > > > > On Dec 7, 2020, at 8:38 AM, Thomas Gleixner
> > > > > wr
On Mon, 2020-12-07 at 11:29 -0600, Oliver Upton wrote:
> On Thu, Dec 3, 2020 at 11:12 AM Maxim Levitsky wrote:
> > These two new ioctls allow to more precisly capture and
> > restore guest's TSC state.
> >
> > Both ioctls are meant to be used to accurately migr
On Mon, 2020-12-07 at 08:53 -0800, Andy Lutomirski wrote:
> > On Dec 7, 2020, at 8:38 AM, Thomas Gleixner wrote:
> >
> > On Mon, Dec 07 2020 at 14:16, Maxim Levitsky wrote:
> > > > On Sun, 2020-12-06 at 17:19 +0100, Thomas Gleixner wrote:
> > >
On Thu, 2020-12-03 at 17:18 -0300, Marcelo Tosatti wrote:
> On Thu, Dec 03, 2020 at 01:39:42PM +0200, Maxim Levitsky wrote:
> > On Tue, 2020-12-01 at 16:48 -0300, Marcelo Tosatti wrote:
> > > On Tue, Dec 01, 2020 at 02:30:39PM +0200, Maxim Levitsky wrote:
> > > > On
On Sun, 2020-12-06 at 17:19 +0100, Thomas Gleixner wrote:
> On Thu, Dec 03 2020 at 19:11, Maxim Levitsky wrote:
> > + case KVM_SET_TSC_STATE: {
> > + struct kvm_tsc_state __user *user_tsc_state = argp;
> > + struct kvm_tsc_state tsc_state;
> &g
These two new ioctls allow capturing and
restoring the guest's TSC state more precisely.
Both ioctls are meant to be used to accurately migrate guest TSC
even when there is a significant downtime during the migration.
Suggested-by: Paolo Bonzini
Signed-off-by: Maxim Levitsky
---
Documentation/virt
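The idea behind migrating the TSC accurately across a long downtime is to capture a (wall-clock nanoseconds, TSC) pair on the source and, on the destination, advance the saved TSC by the elapsed time. This is an illustrative sketch of that arithmetic only — not the actual uapi or field names of the ioctls:

```c
#include <assert.h>
#include <stdint.h>

/* Advance a saved TSC reading by the wall-clock time that elapsed since
 * it was captured, so the guest does not observe the migration downtime
 * as a frozen TSC. tsc_khz is the guest TSC frequency in kHz:
 * ticks = elapsed_ns * (tsc_khz * 1000) / 1e9 = elapsed_ns * tsc_khz / 1e6. */
static uint64_t restore_tsc(uint64_t saved_tsc, uint64_t saved_nsec,
                            uint64_t now_nsec, uint64_t tsc_khz)
{
    uint64_t elapsed_ns = now_nsec - saved_nsec;

    return saved_tsc + elapsed_ns * tsc_khz / 1000000ULL;
}
```

(A production implementation would use mul_u64_u64_div-style helpers to avoid the intermediate overflow this naive multiply can hit for very long downtimes.)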
Run the test once with quirk enabled and once disabled,
and adjust the expected values accordingly.
Signed-off-by: Maxim Levitsky
---
.../selftests/kvm/x86_64/tsc_msrs_test.c | 79 ---
1 file changed, 69 insertions(+), 10 deletions(-)
diff --git a/tools/testing/selftests
o 0
- updated tsc_msr_test unit test to cover this feature
- refactoring
Patches to enable this feature in qemu are in the process of
being sent to qemu-devel mailing list.
Best regards,
Maxim Levitsky
Maxim Levitsky (3):
KVM: x86: implement KVM_{GET|SET}_TSC_STATE
KVM: x86: introd
-off-by: Maxim Levitsky
---
arch/x86/include/uapi/asm/kvm.h | 1 +
arch/x86/kvm/x86.c | 19 ++-
2 files changed, 15 insertions(+), 5 deletions(-)
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 8e76d3701db3f..2a60fc6674164 100644
ccept_events vs check_nested_events")
Signed-off-by: Maxim Levitsky
---
arch/x86/kvm/lapic.c | 15 ---
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index e3ee597ff5404..6a87623aa578e 100644
--- a/arch/x86/kvm/lapic.c
+++ b/
,
Maxim Levitsky
Maxim Levitsky (1):
KVM: x86: ignore SIPIs that are received while not in wait-for-sipi
state
arch/x86/kvm/lapic.c | 15 ---
1 file changed, 8 insertions(+), 7 deletions(-)
--
2.26.2
On Tue, 2020-12-01 at 20:35 +0100, Thomas Gleixner wrote:
> On Mon, Nov 30 2020 at 15:35, Maxim Levitsky wrote:
> > The idea of masterclock is that when the host TSC is synchronized
> > (or as kernel call it, stable), and the guest TSC is synchronized as well,
> > then we
>
> > If the host TSC is not synchronized, then don't even try.
>
> This reminds me: if you’re adding a new kvm feature that tells the guest that
> the TSC works well, could you perhaps only have one structure for all vCPUs
> in the same guest?
I won't mind doing this, b
bit,
or always when KVM
is detected,
(or even when *any* hypervisor is detected)
I also don't mind if we only disable tsc sync logic or
set X86_FEATURE_TSC_RELIABLE which will disable it
and the clocksource watchdog.
Best regards,
Maxim Levitsky
On Tue, 2020-12-01 at 16:48 -0300, Marcelo Tosatti wrote:
> On Tue, Dec 01, 2020 at 02:30:39PM +0200, Maxim Levitsky wrote:
> > On Mon, 2020-11-30 at 16:16 -0300, Marcelo Tosatti wrote:
> > > Hi Maxim,
> > >
> > > On Mon, Nov 30, 2020 at 03:35:57PM +02
On Tue, 2020-12-01 at 20:43 +0100, Thomas Gleixner wrote:
> On Mon, Nov 30 2020 at 15:35, Maxim Levitsky wrote:
> > + struct kvm_tsc_info {
> > + __u32 flags;
> > + __u64 nsec;
> > + __u64 tsc;
> > + __u64 tsc_adjust;
> > + };
> >
On Mon, 2020-11-30 at 16:16 -0300, Marcelo Tosatti wrote:
> Hi Maxim,
>
> On Mon, Nov 30, 2020 at 03:35:57PM +0200, Maxim Levitsky wrote:
> > Hi!
> >
> > This is the first version of the work to make TSC migration more accurate,
> > as was defined by Paulo at:
On Mon, 2020-11-30 at 15:33 +0100, Paolo Bonzini wrote:
> On 30/11/20 14:35, Maxim Levitsky wrote:
> > + if (guest_cpuid_has(vcpu, X86_FEATURE_TSC_ADJUST)) {
> > + tsc_state.tsc_adjust = vcpu->arch.ia32_tsc_adjust_msr;
> > +
On Mon, 2020-11-30 at 15:15 +0100, Paolo Bonzini wrote:
> On 30/11/20 15:11, Maxim Levitsky wrote:
> > On Mon, 2020-11-30 at 14:54 +0100, Paolo Bonzini wrote:
> > > On 30/11/20 14:35, Maxim Levitsky wrote:
> > > > This quirk reflects the fact that w
On Mon, 2020-11-30 at 14:54 +0100, Paolo Bonzini wrote:
> On 30/11/20 14:35, Maxim Levitsky wrote:
> > This quirk reflects the fact that we currently treat MSR_IA32_TSC
> > and MSR_TSC_ADJUST access by the host (e.g qemu) in a way that is different
> > compared to an
This is a summary of a few things that I think are relevant.
Best regards,
Maxim Levitsky
# Random unsynchronized ramblings about the TSC in KVM/Linux
## The KVM's master clock
Under the assumption that
a. Host TSC is synchronized and stable (wasn't marked as unstable).
b. Guest TSC
-off-by: Maxim Levitsky
---
arch/x86/include/uapi/asm/kvm.h | 1 +
arch/x86/kvm/x86.c | 19 ++-
2 files changed, 15 insertions(+), 5 deletions(-)
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 8e76d3701db3f..2a60fc6674164 100644
this feature in qemu are in the process of being sent to
qemu-devel mailing list.
Best regards,
Maxim Levitsky
Maxim Levitsky (2):
KVM: x86: implement KVM_SET_TSC_PRECISE/KVM_GET_TSC_PRECISE
KVM: x86: introduce KVM_X86_QUIRK_TSC_HOST_ACCESS
Documentation/virt/kvm/api.rst | 56
These two new ioctls allow capturing and
restoring the guest's TSC state more precisely.
Both ioctls are meant to be used to accurately migrate guest TSC
even when there is a significant downtime during the migration.
Suggested-by: Paolo Bonzini
Signed-off-by: Maxim Levitsky
---
Documentation/virt
s like one of my local commits slipped through.
Next time I'll check this more carefully.
Can this be fixed or is it too late?
Best regards,
Maxim Levitsky
VM_MSR_RET_INVALID error code,
> > and by adding a new KVM_MSR_RET_FILTERED error code for the
> > userspace filtered msrs.
> >
> > Fixes: 291f35fb2c1d1 ("KVM: x86: report negative values from wrmsr
> > emulation to userspace")
> > Reported-by: Qian Cai
> > Signe
eport negative values from wrmsr emulation to
userspace")
Reported-by: Qian Cai
Signed-off-by: Maxim Levitsky
---
arch/x86/kvm/x86.c | 29 +++--
arch/x86/kvm/x86.h | 8 +++-
2 files changed, 22 insertions(+), 15 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86
On Tue, 2020-10-27 at 16:31 -0400, Qian Cai wrote:
> On Mon, 2020-10-26 at 15:40 -0400, Qian Cai wrote:
> > On Wed, 2020-09-23 at 00:10 +0300, Maxim Levitsky wrote:
> > > This will allow the KVM to report such errors (e.g -ENOMEM)
> > > to the userspace.
> > >
On Thu, 2020-10-08 at 09:52 -0400, Cathy Avery wrote:
> On 10/8/20 9:11 AM, Maxim Levitsky wrote:
> > On Thu, 2020-10-08 at 08:46 -0400, Cathy Avery wrote:
> > > On 10/8/20 6:54 AM, Maxim Levitsky wrote:
> > > > On Thu, 2020-10-08 at 13:39 +0300, Maxim Levitsky wrote
On Thu, 2020-10-08 at 08:46 -0400, Cathy Avery wrote:
> On 10/8/20 6:54 AM, Maxim Levitsky wrote:
> > On Thu, 2020-10-08 at 13:39 +0300, Maxim Levitsky wrote:
> > > On Thu, 2020-10-08 at 13:23 +0300, Maxim Levitsky wrote:
> > > > On Thu, 2020-10-08 at 07
On Thu, 2020-10-08 at 13:39 +0300, Maxim Levitsky wrote:
> On Thu, 2020-10-08 at 13:23 +0300, Maxim Levitsky wrote:
> > On Thu, 2020-10-08 at 07:52 +0200, Paolo Bonzini wrote:
> > > On 08/10/20 00:14, Maxim Levitsky wrote:
> > > > > + i
On Thu, 2020-10-08 at 13:23 +0300, Maxim Levitsky wrote:
> On Thu, 2020-10-08 at 07:52 +0200, Paolo Bonzini wrote:
> > On 08/10/20 00:14, Maxim Levitsky wrote:
> > > > + if (svm->vmcb01->control.asid == 0)
> > > > + svm->vmcb01-&
On Thu, 2020-10-08 at 07:52 +0200, Paolo Bonzini wrote:
> On 08/10/20 00:14, Maxim Levitsky wrote:
> > > + if (svm->vmcb01->control.asid == 0)
> > > + svm->vmcb01->control.asid = svm->nested.vmcb02->control.asid;
> >
> > I think t
.h
> @@ -82,7 +82,9 @@ struct kvm_svm {
> struct kvm_vcpu;
>
> struct svm_nested_state {
> - struct vmcb *hsave;
> + struct vmcb *vmcb02;
> + unsigned long vmcb01_pa;
> + unsigned long vmcb02_pa;
> u64 hsave_msr;
> u64 vm_cr_msr;
> u64 vmcb;
> @@ -102,6 +104,7 @@ struct svm_nested_state {
> struct vcpu_svm {
> struct kvm_vcpu vcpu;
> struct vmcb *vmcb;
> + struct vmcb *vmcb01;
> unsigned long vmcb_pa;
> struct svm_cpu_data *svm_data;
> uint64_t asid_generation;
> @@ -208,10 +211,7 @@ static inline struct vcpu_svm *to_svm(struct kvm_vcpu
> *vcpu)
>
> static inline struct vmcb *get_host_vmcb(struct vcpu_svm *svm)
> {
> - if (is_guest_mode(&svm->vcpu))
> - return svm->nested.hsave;
> - else
> - return svm->vmcb;
> + return svm->vmcb01;
> }
>
> static inline void set_cr_intercept(struct vcpu_svm *svm, int bit)
Honestly I can't find anything seriously wrong with this patch. I tried it,
and actually I was able to boot a fedora guest. The L2 guest is just SLOW.
I mean as slow as drawing a single character while moving in the grub menu,
and later you have to wait a few minutes to just start seeing output
from the kernel on the serial line. It is so slow that systemd times out
on most services, so I wasn't able to get to the GUI. Interestingly the L1
continues to work as if nothing happened. Nothing in the kernel dmesg in either
L1 or L2.
I must say I never had such a puzzling issue.
I debugged it a bit but without any luck. I guess this can be brute-force
debugged by comparing the traces, etc but this probably is out of scope for
me for now.
My guesses on why this could happen:
1. Something wrong with memory types - like guest is using UC memory for
everything.
I can't completely rule that out yet
2. Something wrong with TLB/MMU - I played a bit with asid related things, but
don't see
anything significantly wrong.
3. Dirty bits of vmcb - a test to always set them without this patch and see if
that
tanks performance can be done (didn't do this)
4. Something with interrupts/int_ctl. I tested that the NMI single-step code is
not involved;
vgif=0 doesn't help. avic=0 doesn't help either (tested just in case)
I applied this patch on Linus's mainline branch (it doesn't apply to kvm/queue)
Best regards,
Maxim Levitsky
On Mon, 2020-10-05 at 13:51 +0200, Vitaly Kuznetsov wrote:
> Maxim Levitsky writes:
>
> > On Thu, 2020-10-01 at 15:05 +0200, Vitaly Kuznetsov wrote:
> > > As a preparatory step to allocating vcpu->arch.cpuid_entries dynamically
> > > make kvm_check_cpuid() che
d_entries = e2;
> + vcpu->arch.cpuid_nent = cpuid->nent;
> +
> kvm_update_cpuid_runtime(vcpu);
> kvm_vcpu_after_set_cpuid(vcpu);
> -out:
> - return r;
> +
> + return 0;
> }
>
> int kvm_vcpu_ioctl_get_cpuid2(struct kvm_vcpu *vcpu,
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c4015a43cc8a..f8ed1bde18af 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9877,6 +9877,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
> kvm_mmu_destroy(vcpu);
> srcu_read_unlock(&vcpu->kvm->srcu, idx);
> free_page((unsigned long)vcpu->arch.pio_data);
> + kvfree(vcpu->arch.cpuid_entries);
> if (!lapic_in_kernel(vcpu))
> static_key_slow_dec(&kvm_no_apic_vcpu);
> }
Reviewed-by: Maxim Levitsky
Best regards,
Maxim Levitsky
el)
> #define KVM_NUM_MMU_PAGES (1 << KVM_MMU_HASH_SHIFT)
> #define KVM_MIN_FREE_MMU_PAGES 5
> #define KVM_REFILL_PAGES 25
> -#define KVM_MAX_CPUID_ENTRIES 80
> +#define KVM_MAX_CPUID_ENTRIES 256
> #define KVM_NR_FIXED_MTRR_REGION 88
> #define KVM_NR_VAR_MTRR 8
>
Reviewed-by: Maxim Levitsky
Best regards,
Maxim Levitsky
gt;flags & KVM_CPUID_FLAG_SIGNIFCANT_INDEX)))
> - return e;
> - }
> - return NULL;
> + return cpuid_entry2_find(vcpu->arch.cpuid_entries,
> vcpu->arch.cpuid_nent,
> + function, index);
> }
> EXPORT_SYMBOL_GPL(kvm_find_cpuid_entry);
>
Other than minor note to the commit message, this looks fine, so
Reviewed-by: Maxim Levitsky
Best regards,
Maxim Levitsky
my AMD and Intel machines.
I wasn't able to break it.
Changes from V5: addressed Sean Christopherson's review feedback.
Changes from V6: rebased the code on latest kvm/queue
Best regards,
Maxim Levitsky
Maxim Levitsky (4):
KVM: x86: xen_hvm_config: cleanup return values
KVM: x86
This will allow KVM to report such errors (e.g. -ENOMEM)
to userspace.
Signed-off-by: Maxim Levitsky
---
arch/x86/kvm/emulate.c | 4 ++--
arch/x86/kvm/x86.c | 9 ++---
2 files changed, 8 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
Return 1 on errors that are caused by wrong guest behavior
(which will inject #GP to the guest),
and return a negative error value on issues that are
the kernel's fault (e.g. -ENOMEM).
Signed-off-by: Maxim Levitsky
---
arch/x86/kvm/x86.c | 23 +--
1 file changed, 9 insertions
This way we don't waste memory on VMs which don't use nesting
virtualization even when the host enabled it for them.
Signed-off-by: Maxim Levitsky
---
arch/x86/kvm/svm/nested.c | 42 +++
arch/x86/kvm/svm/svm.c| 61 +--
arch/x86/kvm
This will be used to signal an error to the userspace, in case
the vendor code failed during handling of this msr. (e.g -ENOMEM)
Signed-off-by: Maxim Levitsky
---
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/svm/svm.c | 3 ++-
arch/x86/kvm/svm/svm.h | 2 +-
arch/x86
On Wed, 2020-09-23 at 00:10 +0300, Maxim Levitsky wrote:
> This is the next version of this patch series.
>
> In V5 I adopted Sean Christopherson's suggestion to make .set_efer return
> a negative error (-ENOMEM in this case) which in most cases in kvm
> propagates to the us
On Mon, 2020-09-28 at 22:15 -0700, Sean Christopherson wrote:
> On Wed, Sep 23, 2020 at 12:10:25AM +0300, Maxim Levitsky wrote:
> > This way we don't waste memory on VMs which don't use nesting
> > virtualization even when the host enabled it for them.
> >
> > Si
On Thu, 2020-09-24 at 19:33 +0200, Paolo Bonzini wrote:
> On 21/09/20 12:38, Maxim Levitsky wrote:
> > MSR reads/writes should always access the L1 state, since the (nested)
> > hypervisor should intercept all the msrs it wants to adjust, and these
> > that it doesn't should
+1396,7 @@ static void svm_clear_vintr(struct vcpu_svm *svm)
> /* Drop int_ctl fields related to VINTR injection. */
> svm->vmcb->control.int_ctl &= mask;
> if (is_guest_mode(&svm->vcpu)) {
> - svm->nested.hsave->control.int_ctl &= mask;
> + svm->vmcb01->control.int_ctl &= mask;
>
> WARN_ON((svm->vmcb->control.int_ctl & V_TPR_MASK) !=
> (svm->nested.ctl.int_ctl & V_TPR_MASK));
> @@ -3127,7 +3130,7 @@ bool svm_interrupt_blocked(struct kvm_vcpu *vcpu)
> if (is_guest_mode(vcpu)) {
> /* As long as interrupts are being delivered... */
> if ((svm->nested.ctl.int_ctl & V_INTR_MASKING_MASK)
> - ? !(svm->nested.hsave->save.rflags & X86_EFLAGS_IF)
> + ? !(svm->vmcb01->save.rflags & X86_EFLAGS_IF)
> : !(kvm_get_rflags(vcpu) & X86_EFLAGS_IF))
> return true;
>
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index a798e1731709..e908b83bfa69 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -82,7 +82,9 @@ struct kvm_svm {
> struct kvm_vcpu;
>
> struct svm_nested_state {
> - struct vmcb *hsave;
> + struct vmcb *vmcb02;
> + unsigned long vmcb01_pa;
> + unsigned long vmcb02_pa;
> u64 hsave_msr;
> u64 vm_cr_msr;
> u64 vmcb;
> @@ -102,6 +104,7 @@ struct svm_nested_state {
> struct vcpu_svm {
> struct kvm_vcpu vcpu;
> struct vmcb *vmcb;
> + struct vmcb *vmcb01;
> unsigned long vmcb_pa;
> struct svm_cpu_data *svm_data;
> uint64_t asid_generation;
> @@ -208,10 +211,7 @@ static inline struct vcpu_svm *to_svm(struct kvm_vcpu
> *vcpu)
>
> static inline struct vmcb *get_host_vmcb(struct vcpu_svm *svm)
> {
> - if (is_guest_mode(&svm->vcpu))
> - return svm->nested.hsave;
> - else
> - return svm->vmcb;
> + return svm->vmcb01;
> }
>
> static inline void set_cr_intercept(struct vcpu_svm *svm, int bit)
I was kind of busy this week, but very soon I'll review and test this patch.
Best regards,
Maxim Levitsky
This way we don't waste memory on VMs which don't use nesting
virtualization even when the host enabled it for them.
Signed-off-by: Maxim Levitsky
---
arch/x86/kvm/svm/nested.c | 42 ++
arch/x86/kvm/svm/svm.c| 55 ++-
arch/x86
This will allow KVM to report such errors (e.g. -ENOMEM)
to userspace.
Signed-off-by: Maxim Levitsky
---
arch/x86/kvm/emulate.c | 7 +--
arch/x86/kvm/x86.c | 6 +-
2 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
my AMD and Intel machines.
I wasn't able to break it.
Changes from V5: addressed Sean Christopherson's review feedback.
Best regards,
Maxim Levitsky
Maxim Levitsky (4):
KVM: x86: xen_hvm_config: cleanup return values
KVM: x86: report negative values from wrmsr emulation to userspace
This will be used to signal an error to the userspace, in case
the vendor code failed during handling of this msr. (e.g -ENOMEM)
Signed-off-by: Maxim Levitsky
---
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/svm/svm.c | 3 ++-
arch/x86/kvm/svm/svm.h | 2 +-
arch/x86
Return 1 on errors that are caused by wrong guest behavior
(which will inject #GP to the guest),
and return a negative error value on issues that are
the kernel's fault (e.g. -ENOMEM).
Signed-off-by: Maxim Levitsky
---
arch/x86/kvm/x86.c | 23 +--
1 file changed, 9 insertions
On Mon, 2020-09-21 at 09:08 -0700, Sean Christopherson wrote:
> On Mon, Sep 21, 2020 at 04:19:21PM +0300, Maxim Levitsky wrote:
> > This will allow us to make some MSR writes fatal to the guest
> > (e.g when out of memory condition occurs)
> >
> > Signed-off-by: Maxim
On Mon, 2020-09-21 at 08:41 -0700, Sean Christopherson wrote:
> On Mon, Sep 21, 2020 at 04:19:22PM +0300, Maxim Levitsky wrote:
> > This will be used later to return an error when setting this msr fails.
> >
> > Note that we ignore this return value for qemu initiated writes
On Tue, 2020-09-22 at 17:50 +0300, Maxim Levitsky wrote:
> On Tue, 2020-09-22 at 14:50 +0200, Paolo Bonzini wrote:
> > On 21/09/20 18:23, Sean Christopherson wrote:
> > > Avoid "should" in code comments and describe what the code is doing, not
> > > wha
ng if KVM_GET_MSR is used for e.g.
> debugging the guest.
Could you explain why though? After my patch, the KVM_GET_MSR will consistently
read the L1 TSC, just like all other MSRs as I explained. I guess for debugging,
this should work?
The fact that TSC reads with the guest offset is a nice ex
On Thu, 2020-09-17 at 09:29 -0700, Sean Christopherson wrote:
> On Thu, Sep 17, 2020 at 01:10:48PM +0300, Maxim Levitsky wrote:
> > This way we don't waste memory on VMs which don't use
> > nesting virtualization even if it is available to them.
> >
> > If allocation o
This will allow us to make some MSR writes fatal to the guest
(e.g when out of memory condition occurs)
Signed-off-by: Maxim Levitsky
---
arch/x86/kvm/emulate.c | 7 +--
arch/x86/kvm/x86.c | 5 +++--
2 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/emulate.c b
This way we don't waste memory on VMs which don't use
nesting virtualization even if it is available to them.
If allocation of nested state fails (which should happen
only when the host is about to OOM anyway), use the new KVM_REQ_OUT_OF_MEMORY
request to shut down the guest.
Signed-off-by: Maxim
MSR writes should return 1 when injecting #GP to the guest,
and a negative value when a fatal error (e.g. out of memory)
happened.
Signed-off-by: Maxim Levitsky
---
arch/x86/kvm/x86.c | 23 +--
1 file changed, 9 insertions(+), 14 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch
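The return-value convention from the commit message can be modelled as follows. This is an illustrative sketch, not the kernel code: the real path calls `kvm_inject_gp()` and resumes the guest, which is represented here by an output flag and a `1` ("keep running") return:

```c
#include <assert.h>

/* Models the convention: rc > 0 means the guest misbehaved (inject #GP
 * and keep running), rc < 0 means a host-side failure such as -ENOMEM
 * (propagate to userspace), rc == 0 means the write succeeded. */
static int complete_wrmsr(int rc, int *gp_injected)
{
    *gp_injected = 0;
    if (rc < 0)
        return rc;        /* kernel's fault: report to userspace */
    if (rc) {
        *gp_injected = 1; /* kvm_inject_gp(vcpu, 0) in the real code */
        return 1;         /* 1 = resume the guest */
    }
    return 1;             /* success: also resume the guest */
}
```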
This will be used later to return an error when setting this msr fails.
Note that we ignore this return value for qemu initiated writes to
avoid breaking backward compatibility.
Signed-off-by: Maxim Levitsky
---
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/svm/svm.c | 3
Levitsky
Maxim Levitsky (4):
KVM: x86: xen_hvm_config cleanup return values
KVM: x86: report negative values from wrmsr to userspace
KVM: x86: allow kvm_x86_ops.set_efer to return a value
KVM: nSVM: implement ondemand allocation of the nested state
arch/x86/include/asm/kvm_host.h | 2
ing an endless loop
in L2.
This is tested both with and without -invtsc,tsc-frequency=...
The migration was done by saving the migration stream to a file, and then
loading the qemu with '-incoming'
V2: incorporated feedback from Sean Christopherson (thanks!)
Maxim Levitsky (1):
KVM: x86:
but a write is interpreted as an L1 value.
To fix this, make userspace-initiated reads of IA32_TSC return the L1 value
as well.
Huge thanks to Dave Gilbert for helping me understand this very confusing
semantic of MSR writes.
Signed-off-by: Maxim Levitsky
---
arch/x86/kvm/x86.c | 16
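A toy model of the semantics being fixed: L1's TSC is the host TSC plus the L1 offset, and while L2 runs the hardware-visible TSC additionally includes the L2 offset. The struct and helper below are made up for illustration; the point is only that a userspace read deliberately ignores the L2 offset, so reads and writes are both in L1 terms:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical vcpu state for the model, not KVM's real layout. */
struct vcpu_model {
    uint64_t l1_offset;     /* L1 guest TSC = host TSC + l1_offset */
    uint64_t l2_offset;     /* extra offset applied while L2 runs */
    int      is_guest_mode; /* currently running L2? */
};

/* Userspace (KVM_GET_MSR) read of IA32_TSC: always report the L1 value,
 * whether or not L2 happens to be running at the moment of the read. */
static uint64_t host_read_ia32_tsc(const struct vcpu_model *v, uint64_t host_tsc)
{
    return host_tsc + v->l1_offset; /* l2_offset intentionally ignored */
}
```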
On Mon, 2020-09-21 at 12:25 +0300, Maxim Levitsky wrote:
> On Thu, 2020-09-17 at 09:11 -0700, Sean Christopherson wrote:
> > On Thu, Sep 17, 2020 at 02:07:23PM +0300, Maxim Levitsky wrote:
> > > MSR reads/writes should always access the L1 state, since the (nested)
>