> >
> > Even if the device driver doesn't support migration, you still want to
> > migrate the VM? That may be risky, and we should add the "bad path" for the
> > driver at least.
>
> At a minimum we should have support for hot-plug if we are expecting to
> support migration. You would simply have to
> On Wed, Nov 25, 2015 at 12:21 AM, Lan Tianyu wrote:
> > On Nov 25, 2015 13:30, Alexander Duyck wrote:
> >> No, what I am getting at is that you can't go around and modify the
> >> configuration space for every possible device out there. This
> >> solution won't scale.
> >
I didn't quite understand a couple of things though, perhaps you can
explain:
1) If we ignore the TCP sequence number problem, in an SMP machine
don't we get other randomnesses - e.g. which core completes something
first, or who wins a lock contention, so the output stream might not
Let me clarify this issue. COLO doesn't ignore the TCP sequence
number; it uses a new implementation to keep the sequence number
identical, on a best-effort basis, between the primary VM (PVM) and
secondary VM (SVM). Likely, the VMM has to synchronize the emulation of
randomization number
Thanks Dave:
Whatever randomness in value/branch/code path the PVM and SVM may
have, it is only a performance issue. COLO never assumes the PVM and
SVM have the same internal machine state. From a correctness p.o.v., as
long as the PVM and SVM generate identical responses, we can view the SVM is
Zachary:
Will you extend the logic to cover the situation when the guest runs at
higher than the guest rate but the PCPU is overcommitted? In that case, we can
likely use the time spent when the VCPU is scheduled out to catch up as well.
Of course if the VCPU scheduled out time is not
Arnd Bergmann wrote:
On Friday 30 July 2010 17:51:52 Shirley Ma wrote:
On Fri, 2010-07-30 at 16:53 +0800, Xin, Xiaohui wrote:
Since vhost-net already supports macvtap/tun backends, do you think
it's better to implement zero copy in macvtap/tun than
introducing a new media passthrough
Nadav Har'El wrote:
Hi Avi,
This is a followup of our nested VMX patches that Orit Wasserman
posted in December. We've addressed most of the comments and concerns
that you and others on the mailing list had with the previous patch
set. We hope you'll find these patches easier to understand,
Nadav Har'El wrote:
This patch contains code to prepare the VMCS which can be used to
actually run the L2 guest, vmcs02. prepare_vmcs02 appropriately
merges the information in shadow_vmcs that L1 built for L2 (vmcs12),
and that in the VMCS that we built for L1 (vmcs01).
VMREAD/WRITE can only
+/* Allocate an L0 VMCS (vmcs02) for the current L1 VMCS (vmcs12), if one
+ * does not already exist. The allocation is done in L0 memory, so to avoid
+ * denial-of-service attack by guests, we limit the number of concurrently-
+ * allocated vmcss. A well-behaving L1 will VMCLEAR unused
Nadav Har'El wrote:
This patch implements the VMCLEAR instruction.
Signed-off-by: Nadav Har'El n...@il.ibm.com
---
--- .before/arch/x86/kvm/vmx.c 2010-06-13 15:01:29.0 +0300
+++ .after/arch/x86/kvm/vmx.c 2010-06-13 15:01:29.0 +0300
@@ -138,6 +138,8 @@ struct
Avi Kivity wrote:
On 06/28/2010 09:42 AM, Sheng Yang wrote:
+static void wbinvd_ipi(void *garbage)
+{
+ wbinvd();
+}
Like Jan mentioned, this is quite heavy. What about a clflush()
loop instead? That may take more time, but at least it's
preemptible. Of course, it isn't preemptible
Herbert Xu wrote:
On Wed, Jun 23, 2010 at 06:05:41PM +0800, Dong, Eddie wrote:
I mean once the frontend side driver posts the buffers to the backend
driver, the backend driver will immediately use those buffers to
compose skb or gro_frags and post them to the assigned host NIC
driver
3) As I have mentioned above, with this idea, netdev_alloc_skb() will
allocate as usual, the data pointed by skb->data will be copied into the
first guest buffer.
That means we should reserve sufficient room in guest buffer. For PS
mode supported driver (for example ixgbe), the room will
Herbert Xu wrote:
On Wed, Jun 23, 2010 at 04:09:40PM +0800, Dong, Eddie wrote:
Xiaohui & Herbert:
Mixing copy of head & 0-copy of bulk data imposes additional
challenge to find the guest buffer. The backend driver may be
unable to find a spare guest buffer from virtqueue
A VF interrupt usually happens in 4-8KHZ. How about the virtio?
I assume virtio will be widely used together w/ legacy guest with
INTx mode.
True, but in time it will be replaced by MSI.
Note without vhost virtio is also in userspace, so there are lots of
exits anyway for the status
Avi Kivity wrote:
I am currently investigating a problem with a guest running Linux
malfunctioning in the NMI watchdog code. The problem is that we don't
handle NMI delivery mode for the local APIC LINT0 pin; instead we
expect ExtInt delivery mode or that the line is disabled completely.
Avi Kivity wrote:
On 06/09/2010 06:59 PM, Dong, Eddie wrote:
Besides VF IO interrupt and timer interrupt introduced performance
overhead risk,
VF usually uses MSI
Typo, I mean PV IO.
A VF interrupt usually happens in 4-8KHZ. How about the virtio?
I assume virtio will be widely used
Olivier Berghmans wrote:
Hi,
Thanks for the reply. I have an Intel Q9550 processor. EPT was
introduced in the Nehalem series, which my processor is not part of.
Based on that, I conclude that I am not using EPT and thus using
shadow page tables.
The problem of running (for example) a
The goal of HVM virtualization is to provide the guest (a KVM guest or a Xen
HVM guest) with a platform exactly the same as the native one; however, for
some reasons, this is not strictly followed in today's virtualization
solutions. VMMs normally take shortcuts to make commodity OSes run, for
performance etc. That
Saksena, Abhishek wrote:
Correction to question:-
Can keeping device models and KVM pit synchronized with the host clock
separately result in any issues with booting up OSes?
Time virtualization, especially in SMP guest for legacy OS, is tricky. It can
hardly be really synced.
But, Linux
EOI is one of the key VM exits under high-bandwidth IO such as VT-d with a
10Gb/s NIC. This patch accelerates guest EOI emulation utilizing HW VM Exit
information.
Signed-off-by: Eddie Dong eddie.d...@intel.com
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index
Avi Kivity wrote:
On 07/06/2009 04:42 PM, Dong, Eddie wrote:
EOI is one of the key VM exits under high-bandwidth IO such as VT-d
with a 10Gb/s NIC. This patch accelerates guest EOI emulation
utilizing HW VM Exit information.
Won't this fail if the guest uses STOSD to issue the EOI
Avi Kivity wrote:
Sheng Yang wrote:
I think that means the PV interface for lapic. And yes, we can
support it following MS's interface, but x2apic still seems another
story as you noted... I still don't think supporting x2apic here would
bring us more benefits.
x2apic has the following
Gleb Natapov wrote:
On Thu, May 14, 2009 at 10:34:11PM +0800, Dong, Eddie wrote:
Gleb Natapov wrote:
On Thu, May 14, 2009 at 09:43:33PM +0800, Dong, Eddie wrote:
Avi Kivity wrote:
Dong, Eddie wrote:
OK.
Also back to Gleb's question, the reason I want to do that is to
simplify event
Avi Kivity wrote:
Dong, Eddie wrote:
OK.
Also back to Gleb's question, the reason I want to do that is to
simplify event
generation mechanism in current KVM.
Today KVM uses an additional layer of exception/nmi/interrupt such as
vcpu.arch.exception.pending, vcpu->arch.interrupt.pending
vcpu
Gleb Natapov wrote:
On Thu, May 14, 2009 at 09:43:33PM +0800, Dong, Eddie wrote:
Avi Kivity wrote:
Dong, Eddie wrote:
OK.
Also back to Gleb's question, the reason I want to do that is to
simplify event generation mechanism in current KVM.
Today KVM uses an additional layer of exception/nmi
Gleb Natapov wrote:
On Tue, May 12, 2009 at 11:06:39PM +0800, Dong, Eddie wrote:
I didn't run many tests since our PTS system has stopped working due to
KVM userspace
build changes. But since the logic is pretty simple, I want to
post here to see comments. Thx, eddie
That is OK, You can send two patches. The first one will WARN_ON and
overwrite exception like the current code does. And the second one
will remove WARN_ON explaining that this case is actually possible to
trigger from a guest.
Sounds like you don't want to provide this additional one, so here it is
Avi Kivity wrote:
Dong, Eddie wrote:
I noticed the MACRO for SVM vmcb->control.event_inj and VMX
VM_EXIT_INTR_INFO are almost the same. I have a need to query the event
injection situation in common code so plan to expose this register
read/write to x86.c. Should we define a new format
I noticed the MACRO for SVM vmcb->control.event_inj and VMX VM_EXIT_INTR_INFO
are almost the same. I have a need to query the event injection situation in
common code so plan to expose this register read/write to x86.c. Should we
define a new format for evtinj/VM_EXIT_INTR_INFO as common KVM
I didn't run many tests since our PTS system has stopped working due to KVM
userspace
build changes. But since the logic is pretty simple, I want to post here to
see comments.
Thx, eddie
If there is a pending irq after a virtual exception is injected,
KVM needs to enable IRQ window to trap
Gleb Natapov wrote:
On Mon, May 11, 2009 at 09:04:52AM +0800, Dong, Eddie wrote:
There is no point referring to current code. Current code does not
handle serial exceptions properly. So fix it in your patch
otherwise I propose to use my patch that fixes current code
(http
There is no point referring to current code. Current code does not
handle serial exceptions properly. So fix it in your patch otherwise I
propose to use my patch that fixes current code
(http://patchwork.kernel.org/patch/21829/).
I would like Avi to decide. As comments to the difference
Xiaodong Yi wrote:
It is not a typo. I copied from UnixBench output directly. However, it
must be a bug of Luvalley because even the native Linux benchmark on
Double-Precision Whetstone is not that high. I also noticed that other
benchmarks are all lower than native Linux.
About timing,
Dong, Eddie wrote:
Move Double-Fault generation logic out of page fault
exception generating function to cover more generic case.
Signed-off-by: Eddie Dong eddie.d...@intel.com
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ab1fdac..51a8dad 100644
--- a/arch/x86
Gleb Natapov wrote:
+
+static int exception_class(int vector)
+{
+	if (vector == 14)
+		return EXCPT_PF;
+	else if (vector == 0 || (vector >= 10 && vector <= 13))
+		return EXCPT_CONTRIBUTORY;
+	else
+		return EXCPT_BENIGN;
+}
+
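For reference, the classification quoted above appears to follow the double-fault table in the Intel SDM: a second exception raised while delivering the first becomes #DF when both are contributory, or when the first is a page fault and the second is a page fault or contributory. A minimal standalone sketch of that rule follows; escalates_to_double_fault() is a hypothetical name used for illustration only, not a function from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Exception classes used by the double-fault rule. */
enum excpt_class { EXCPT_BENIGN, EXCPT_CONTRIBUTORY, EXCPT_PF };

/* Vectors: 0 = #DE, 10 = #TS, 11 = #NP, 12 = #SS, 13 = #GP, 14 = #PF. */
static enum excpt_class exception_class(int vector)
{
	assert(vector >= 0 && vector < 32);	/* architectural exceptions only */
	if (vector == 14)
		return EXCPT_PF;
	if (vector == 0 || (vector >= 10 && vector <= 13))
		return EXCPT_CONTRIBUTORY;
	return EXCPT_BENIGN;
}

/*
 * Hypothetical helper: returns true when 'second', raised while
 * delivering 'first', must be turned into a double fault.
 */
static bool escalates_to_double_fault(int first, int second)
{
	enum excpt_class c1 = exception_class(first);
	enum excpt_class c2 = exception_class(second);

	if (c1 == EXCPT_CONTRIBUTORY && c2 == EXCPT_CONTRIBUTORY)
		return true;
	if (c1 == EXCPT_PF && c2 != EXCPT_BENIGN)
		return true;
	return false;	/* otherwise the exceptions are handled serially */
}
```

Any other combination (for example #GP followed by #PF) is delivered serially rather than escalated.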
This makes
ction will be re-executed.
Do you want it to be covered for now? For exception, it is easy but
for IRQ, it needs to be pushed back.
Yes I want it to be covered now otherwise any serial exception
generates flood of "Exception happens serially" messages. This
function does not handle IRQ so
Dong, Eddie wrote:
ction will be re-executed.
Do you want it to be covered for now? For exception, it is easy but
for IRQ, it needs to be pushed back.
Yes I want it to be covered now otherwise any serial exception
generates flood of "Exception happens serially" messages. This
function does
Move Double-Fault generation logic out of page fault
exception generating function to cover more generic case.
Signed-off-by: Eddie Dong eddie.d...@intel.com
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ab1fdac..51a8dad 100644
--- a/arch/x86/kvm/x86.c
+++
The following code deactivates the fpu when CR0.PE is on; any explanation?
The rest of the code activates/deactivates the fpu based on the cr0.TS bit.
thx, eddie
static void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
{
unsigned long guest_cr3;
u64 eptp;
guest_cr3 =
Thx, eddie
commit ad4a9829c8d5b30995f008e32774bd5f555b7e9f
Author: root r...@eddie-wb.localdomain
Date: Thu Apr 2 11:16:03 2009 +0800
Check valid bit of VM_EXIT_INTR_INFO before unblock nmi.
Signed-off-by: Eddie Dong eddie.d...@intel.com
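To illustrate the idea in the commit title: the NMI-unblocking bit of the VM-exit interruption information is only meaningful when the valid bit is set, so the valid bit should be tested first. The sketch below is standalone and not taken from the patch; the bit positions (valid = bit 31, NMI unblocking due to IRET = bit 12) are my reading of the Intel SDM, and nmi_unblocked_by_iret() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define INTR_INFO_UNBLOCK_NMI	(1u << 12)	/* NMI unblocking due to IRET */
#define INTR_INFO_VALID_MASK	(1u << 31)	/* the other fields are valid */

/*
 * Hypothetical helper: report NMI unblocking only when the
 * interruption information field is marked valid.
 */
static bool nmi_unblocked_by_iret(uint32_t exit_intr_info)
{
	if (!(exit_intr_info & INTR_INFO_VALID_MASK))
		return false;	/* stale or undefined bits must be ignored */
	return (exit_intr_info & INTR_INFO_UNBLOCK_NMI) != 0;
}
```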
diff --git a/arch/x86/kvm/vmx.c
Neiger, Gil wrote:
PDPTEs are used only if CR0.PG=CR4.PAE=1.
In that situation, their format depends on the value of IA32_EFER.LMA.
If IA32_EFER.LMA=0, bit 63 is reserved and must be 0 in any PDPTE
that is marked present. The execute-disable setting of a page is
determined only by the PDE
Looks good, but doesn't apply; please check if you are working against
the latest version.
Rebased on top of a317a1e496b22d1520218ecf16a02498b99645e2 + previous rsvd bits
violation check patch.
thx, eddie
Use rsvd_bits_mask in load_pdptrs and remove bit 5-6 from rsvd_bits_mask
per
Just noticed that walk_addr() too can be called from tdp context, so
need to make sure rsvd_bits_mask is initialized in init_kvm_tdp_mmu()
as well.
Yes, fixed.
Thx, eddie
commit b282565503a78e75af643de42fe7bf495e2213ec
Author: root r...@eddie-wb.localdomain
Date: Mon Mar 30 16:57:39 2009
Avi Kivity wrote:
Dong, Eddie wrote:
@@ -2199,6 +2194,9 @@ void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu, int level)
 		context->rsvd_bits_mask[1][0] = 0;
 		break;
 	case PT32E_ROOT_LEVEL:
+		context->rsvd_bits_mask[0][2] = exb_bit_rsvd
+static bool is_rsvd_bits_set(struct kvm_vcpu *vcpu, u64 gpte, int level)
+{
+	int ps = 0;
+
+	if (level == PT_DIRECTORY_LEVEL)
+		ps = !!(gpte & PT_PAGE_SIZE_MASK);
No need for this. If you set rsvd_bits_mask[1][0] ==
rsvd_bits_mask[0][0], then you get the same behaviour.
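The suggestion above can be sketched as follows: if the mask table holds identical entries in both rows for levels that have no large-page form, the lookup can index by bit 7 unconditionally and the level test disappears. This is a standalone illustration under stated assumptions; struct rsvd_ctx and the [ps][level - 1] indexing are assumed here, not copied from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PT_PAGE_SIZE_MASK	(1ULL << 7)	/* PS bit in a directory entry */

/* Hypothetical context: reserved-bit masks indexed by [ps][level - 1]. */
struct rsvd_ctx {
	uint64_t rsvd_bits_mask[2][4];
};

static bool is_rsvd_bits_set(const struct rsvd_ctx *ctx, uint64_t gpte,
			     int level)
{
	assert(level >= 1 && level <= 4);

	/*
	 * No check on the level: for levels without large pages the
	 * caller is expected to fill both rows with the same mask, so
	 * bit 7 selects an equivalent entry either way.
	 */
	int ps = !!(gpte & PT_PAGE_SIZE_MASK);

	return (gpte & ctx->rsvd_bits_mask[ps][level - 1]) != 0;
}
```

With rsvd_bits_mask[1][0] set equal to rsvd_bits_mask[0][0], a last-level entry gets the same answer whether or not bit 7 happens to be set.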
Thanks, Eddie
commit 6688a1fbc37330f2c4e16d1a78050b64e1ce5dcc
Author: root r...@eddie-wb.localdomain
Date: Mon Mar 30 11:31:10 2009 +0800
cleanup to reuse is_long_mode(vcpu)
Signed-off-by: Eddie Dong eddie.d...@intel.com
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
This is followup of rsvd_bits emulation.
thx, eddie
commit 171eb2b2d8282dd913a5d5c6c695fd64e1ddcf4c
Author: root r...@eddie-wb.localdomain
Date: Mon Mar 30 11:39:50 2009 +0800
Use rsvd_bits_mask in load_pdptrs for cleanup and considering EXB bit.
Signed-off-by: Eddie Dong
+	context->rsvd_bits_mask[0] = rsvd_bits(maxphyaddr, 51);
+	context->large_page_rsvd_mask = /* 2MB PDE */
+		rsvd_bits(maxphyaddr, 51) | rsvd_bits(13, 20);
 	return paging64_init_context_common(vcpu, PT64_ROOT_LEVEL);
 }
Isn't bit 63 reserved if NX is disabled?
Sure.
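A rough standalone sketch of how such a mask could be built, including the bit 63 case raised above. It assumes rsvd_bits(s, e) returns a mask with bits s through e inclusive, and that maxphyaddr is at most 51; large_pde_rsvd_mask() and its nx_enabled parameter are hypothetical names for illustration, not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask with bits s..e (inclusive) set. */
static uint64_t rsvd_bits(int s, int e)
{
	assert(s >= 0 && s <= e && e <= 63);
	if (e - s == 63)
		return ~0ULL;	/* avoid an undefined 64-bit shift */
	return ((1ULL << (e - s + 1)) - 1) << s;
}

/*
 * Hypothetical helper: reserved bits of a 2MB PDE in long mode.
 * Bits maxphyaddr..51 and 13..20 are reserved; bit 63 is also
 * reserved when NX is disabled.
 */
static uint64_t large_pde_rsvd_mask(int maxphyaddr, bool nx_enabled)
{
	uint64_t mask = rsvd_bits(maxphyaddr, 51) | rsvd_bits(13, 20);

	if (!nx_enabled)
		mask |= rsvd_bits(63, 63);
	return mask;
}
```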
@@ -2206,6
Will never be used, PDPTEs are loaded by set_cr3(), not walk_addr().
I see, then how about replacing the CR3_PAE_RESERVED_BITS check at cr3
load with
rsvd_bits_mask[2]? It seems the current code is lacking sufficient reserved
bits checks too.
typo, I mean this:
--- a/arch/x86/kvm/x86.c
+++
Need to make sure rsvd_bits_mask[] is maintained on ept and npt, then.
Sure, will be in next patch, post the current modified one.
Thx, eddie
Current KVM doesn't check reserved bits of guest page table entry, but uses
reserved bits to bypass guest #PF in VMX.
This patch adds reserved
Current KVM doesn't check reserved bits of the guest page table, while it may
use reserved bits to bypass guest #PF in VMX.
This patch adds this check while leaving the shadow pte un-constructed if guest
RSVD=1.
Comments?
Thx, eddie
diff --git a/arch/x86/include/asm/kvm_host.h
Kernel pio emulation return value is mistakenly checked, fortunately it is not
hit yet for normal OS bootup :(
Signed-off-by: Eddie Dong eddie.d...@linux.intel.com
commit 98d3dc8b67ba0bc7f494de3ade8f2b5cfcadaeb4
Author: root r...@eddie-wb.localdomain
Date: Thu Mar 19 15:44:39 2009 +0800
Simon Horman wrote:
On Wed, Feb 18, 2009 at 10:08:00PM +1030, Rusty Russell
wrote:
2) Direct NIC attachment This is particularly
interesting with SR-IOV or other multiqueue nics, but
for boutique cases or benchmarks, could be for normal
NICs. So far I have some very sketched-out patches:
What we would rather do in KVM, is have the VFs appear in
the host as standard network devices. We would then like
to back our existing PV driver to this VF directly
bypassing the host networking stack. A key feature here
is being able to fill the VF's receive queue with guest
memory
Matthew Wilcox wrote:
Wouldn't it be more useful to have the iov/N directories
be a symlink to the actual pci_dev used by the virtual
function?
The main concern here is that a VF may be disabled, such as when the PF enters
D3 state or undergoes a reset, and thus be unplugged, but the user won't
Matthew Wilcox wrote:
On Tue, Oct 14, 2008 at 08:23:34AM +0800, Dong, Eddie
wrote:
Matthew Wilcox wrote:
Wouldn't it be more useful to have the iov/N directories
be a symlink to the actual pci_dev used by the virtual
function?
The main concern here is that a VF may be disabled
Matthew Wilcox wrote:
On Tue, Oct 14, 2008 at 10:14:35AM +0800, Yu Zhao wrote:
BTW, the SR-IOV patch is not only for network, some
other devices such as IDE will use the same code base as
well and we imagine it could have other parameters to set
such as the starting LBA of an IDE VF.
As Eddie said,
Tian, Kevin wrote:
From:Avi Kivity
Sent: September 27, 2008 17:50
Yang, Sheng wrote:
After checking the host shared interrupts situation, I got a
question here:
If I understand correctly, the current solution doesn't block
host shared irq, it just comes with a performance penalty.
The penalty comes with host
I don't see how this relates to shared guest interrupts.
Whatever you have on the host side, you still need to
support shared guest interrupts. The only way to avoid
the issue is by using MSI for the guest, and even then we
still have to support interrupt sharing since not all
guests have
Avi Kivity wrote:
Dong, Eddie wrote:
I don't see how this relates to shared guest interrupts.
Whatever you have on the host side, you still need to
support shared guest interrupts. The only way to avoid
the issue is by using MSI for the guest, and even then
we still have to support
Avi Kivity wrote:
Han, Weidong wrote:
Hi all,
The initial passthrough/VT-d patches have been in kvm,
it's time to enhance it, and push them into 2.6.28.
- Shared Interrupt support
Shared guest interrupts is a prerequisite for merging
into mainline. Without this, device
Matt Anger wrote:
Thanks for the info, I've been looking into it by trying
to look
around kvm source code.
Apparently I have to write a kernel driver for the guest
os and then
also write backend driver and modify qemu to use it? Is
that correct?
That seems ugly, especially since now my
Matt Anger wrote:
I was referring to the bounce from host kernel to qemu
and then back
to the host kernel for my BE driver.
Xen:
guest -> guest kernel driver -> host kernel driver
For both situations I need a FE and BE driver, but for
KVM I need to modify QEMU and teach it how to pass the