Avi Kivity wrote:
> Sheng Yang wrote:
>> I think that means the PV interface for lapic. And yes, we can
>> support it following MS's interface, but x2apic still seems another
>> story as you noted... I still don't think supporting x2apic here would
>> bring us more benefits.
>>
>
> x2apic has the foll
EOI is one of the key VM exits under high-bandwidth IO, such as VT-d with a
10Gb/s NIC. This patch accelerates guest EOI emulation by utilizing HW VM exit
information.
Signed-off-by: Eddie Dong
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index ccafe0d..b63138f 100644
--- a/ar
Avi Kivity wrote:
> On 07/06/2009 04:42 PM, Dong, Eddie wrote:
>> EOI is one of the key VM exits under high-bandwidth IO, such as VT-d
>> with a 10Gb/s NIC. This patch accelerates guest EOI emulation by
>> utilizing HW VM exit information.
>>
>
> Won't this
> >
> > I didn't quite understand a couple of things though, perhaps you can
> > explain:
> >1) If we ignore the TCP sequence number problem, in an SMP machine
> > don't we get other randomnesses - e.g. which core completes something
> > first, or who wins a lock contention, so the output strea
> >
> > Let me clarify this issue. COLO doesn't ignore the TCP sequence
> > number, but uses a new implementation to make the sequence number
> > best-effort identical between the primary VM (PVM) and secondary VM
> > (SVM). Likely, VMM has to synchronize the emulation of randomization
> >
> > Thanks Dave:
> > Whatever randomness in value/branch/code path the PVM and SVM may
> > have, it is only a performance issue. COLO never assumes the PVM and
> > SVM have the same internal machine state. From a correctness p.o.v., as
> > long as the PVM and SVM generate identical responses, we can view
Saksena, Abhishek wrote:
> Correction to question:-
>
> Can keeping device models and KVM pit synchronized with the host clock
> result in any issues with booting up OSes?
>
Time virtualization, especially in an SMP guest for a legacy OS, is tricky. It
can hardly be truly synced.
But, Linux has
Avi Kivity wrote:
> I am currently investigating a problem with a guest running Linux
> malfunctioning in the NMI watchdog code. The problem is that we don't
> handle NMI delivery mode for the local APIC LINT0 pin; instead we
> expect ExtInt delivery mode or that the line is disabled completely
Avi Kivity wrote:
> On 06/09/2010 06:59 PM, Dong, Eddie wrote:
>>
>> Besides VF IO interrupt and timer interrupt introduced performance
>> overhead risk,
>
> VF usually uses MSI
Typo, I mean PV IO.
A VF interrupt usually happens at 4-8 kHz. How about virtio?
I a
>> A VF interrupt usually happens at 4-8 kHz. How about virtio?
>> I assume virtio will be widely used together w/ legacy guests in
>> INTx mode.
>>
>
> True, but in time it will be replaced by MSI.
>
> Note without vhost virtio is also in userspace, so there are lots of
> exits anyway for
Matt Anger wrote:
> Thanks for the info, I've been looking into it by trying
> to look
> around kvm source code.
> Apparently I have to write a kernel driver for the guest
> os and then
> also write backend driver and modify qemu to use it? Is
> that correct?
> That seems ugly, especially since
Matt Anger wrote:
> I was referring to the bounce from host kernel to qemu
> and then back
> to the host kernel for my BE driver.
> Xen:
> guest -> guest kernel driver-> host kernel driver
>
> For both situations I need a FE and BE driver, but for
> KVM I need to modify QEMU and teach it how to p
Avi Kivity wrote:
> Han, Weidong wrote:
>> Hi all,
>>
>> The initial passthrough/VT-d patches have been in kvm,
>> it's time to enhance it, and push them into 2.6.28.
>>
>> - Shared Interrupt support
>>
>
> Shared guest interrupts is a prerequisite for merging
> into mainline. Without this
Tian, Kevin wrote:
>> From:Avi Kivity
>> Sent: September 27, 2008 17:50
>>
>> Yang, Sheng wrote:
>>> After checking the host shared interrupts situation, I got a
>>> question here:
>>>
>>> If I understand correctly, the current solution doesn't block
>>> host shared irq, it just comes with a performance penalty.
>>> Th
> I don't see how this relates to shared guest interrupts.
> Whatever you have on the host side, you still need to
> support shared guest interrupts. The only way to avoid
> the issue is by using MSI for the guest, and even then we
> still have to support interrupt sharing since not all
> guests
Avi Kivity wrote:
> Dong, Eddie wrote:
>>> I don't see how this relates to shared guest interrupts.
>>> Whatever you have on the host side, you still need to
>>> support shared guest interrupts. The only way to avoid
>>> the issue is by using MSI for t
Matthew Wilcox wrote:
> Wouldn't it be more useful to have the iov/N directories
> be a symlink to the actual pci_dev used by the virtual
> function?
The main concern here is that a VF may be disabled, such as when the PF enters
D3 state or undergoes a reset and is thus unplugged, but the user won't
re-conf
Matthew Wilcox wrote:
> On Tue, Oct 14, 2008 at 08:23:34AM +0800, Dong, Eddie
> wrote:
>> Matthew Wilcox wrote:
>>> Wouldn't it be more useful to have the iov/N directories
>>> be a symlink to the actual pci_dev used by the virtual
>>> function?
>
Matthew Wilcox wrote:
> On Tue, Oct 14, 2008 at 10:14:35AM +0800, Yu Zhao wrote:
>>> BTW, the SR-IOV patch is not only for network, some
>>> other devices such as IDE will use the same code base as
>>> well and we imagine it could have other parameters to set
>>> such as the starting LBA of an IDE VF.
>>
>
> What we would rather do in KVM, is have the VFs appear in
> the host as standard network devices. We would then like
> to back our existing PV driver to this VF directly
> bypassing the host networking stack. A key feature here
> is being able to fill the VF's receive queue with guest
> memory
The kernel pio emulation return value is checked incorrectly; fortunately it is
not hit yet for normal OS bootup :(
Signed-off-by: Eddie Dong
commit 98d3dc8b67ba0bc7f494de3ade8f2b5cfcadaeb4
Author: root
Date: Thu Mar 19 15:44:39 2009 +0800
fix a bug when kernel PIO is emulated.
diff --gi
> kvm_emulate_pio() returns 1 when emulation is complete,
> and 0 when emulation needs further processing in
> userspace. So I think in both cases cannot_emulate is
> the wrong answer. I think 'in' emulation gets it right.
OK, yes. Do you mean this? I may have misunderstood.
Thx, eddie
diff --git a
Current KVM doesn't check reserved bits of the guest page table, while it may use
reserved bits to bypass guest #PF in VMX.
This patch adds this check while leaving the shadow pte un-constructed if guest
RSVD=1.
Comments?
Thx, eddie
diff --git a/arch/x86/include/asm/kvm_host.h b/ar
>> + context->rsvd_bits_mask[0] = rsvd_bits(maxphyaddr, 51);
>> + context->large_page_rsvd_mask = /* 2MB PDE */
>> + rsvd_bits(maxphyaddr, 51) | rsvd_bits(13, 20);
>> return paging64_init_context_common(vcpu, PT64_ROOT_LEVEL);
>> }
>>
>
> Isn't bit 63 reserved if NX is disabled?
Sure.
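
As a rough illustration of the masks discussed in this exchange, the following standalone C sketch builds a reserved-bit mask for a 2MB PDE. It is not the patch's code: `rsvd_bits()` is re-implemented here from how it is called in the quoted hunk (an inclusive bit range), and `pde_2mb_rsvd_mask()` and its `nx_enabled` parameter are hypothetical names added to show the "bit 63 is reserved if NX is disabled" point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask with bits s..e (inclusive) set; an empty range yields 0. */
static inline uint64_t rsvd_bits(int s, int e)
{
    if (e < s)
        return 0;
    assert(s >= 0 && e <= 63);
    /* Shift counts stay below 64 even for the full 0..63 range. */
    return (~0ULL >> (63 - (e - s))) << s;
}

/*
 * Hypothetical helper: reserved bits of a 2MB PDE in 64-bit paging.
 * Bits maxphyaddr..51 and 13..20 are reserved, as in the quoted hunk;
 * bit 63 is additionally reserved when NX is disabled.
 */
static inline uint64_t pde_2mb_rsvd_mask(int maxphyaddr, bool nx_enabled)
{
    uint64_t mask = rsvd_bits(maxphyaddr, 51) | rsvd_bits(13, 20);

    if (!nx_enabled)
        mask |= rsvd_bits(63, 63);
    return mask;
}
```

A guest PDE would then count as a reserved-bit violation when `(gpte & pde_2mb_rsvd_mask(maxphyaddr, nx_enabled)) != 0`.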
>
>>
>> Will never be used, PDPTEs are loaded by set_cr3(), not walk_addr().
>>
>
> I see, then how about replacing the CR3_PAE_RESERVED_BITS check at cr3
> load with
> rsvd_bits_mask[2]? Seems the current code lacks sufficient reserved
> bits checks too.
>
typo, I mean this:
--- a/arch/x86/kvm/x86.
>
> Need to make sure rsvd_bits_mask[] is maintained on ept and npt, then.
Sure, will be in next patch, post the current modified one.
Thx, eddie
Current KVM doesn't check reserved bits of the guest page table entry, but uses
reserved bits to bypass guest #PF in VMX.
This patch adds reserved b
>> +static bool is_rsvd_bits_set(struct kvm_vcpu *vcpu, u64 gpte, int level)
>> +{
>> +	int ps = 0;
>> +
>> +	if (level == PT_DIRECTORY_LEVEL)
>> +		ps = !!(gpte & PT_PAGE_SIZE_MASK);
>>
>
> No need for this. If you set rsvd_bits_mask[1][0] ==
> rsvd_bits_mask[0][0], then you get the
Thanks, Eddie
commit 6688a1fbc37330f2c4e16d1a78050b64e1ce5dcc
Author: root
Date: Mon Mar 30 11:31:10 2009 +0800
cleanup to reuse is_long_mode(vcpu)
Signed-off-by: Eddie Dong
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index db5021b..affc31d 100644
--- a/arch/x86/kv
This is a follow-up to the rsvd_bits emulation.
thx, eddie
commit 171eb2b2d8282dd913a5d5c6c695fd64e1ddcf4c
Author: root
Date: Mon Mar 30 11:39:50 2009 +0800
Use rsvd_bits_mask in load_pdptrs for cleanup and considering EXB bit.
Signed-off-by: Eddie Dong
diff --git a/arch/x86/kvm/mmu.
>
> Just noticed that walk_addr() too can be called from tdp context, so
> need to make sure rsvd_bits_mask is initialized in init_kvm_tdp_mmu()
> as well.
Yes, fixed.
Thx, eddie
commit b282565503a78e75af643de42fe7bf495e2213ec
Author: root
Date: Mon Mar 30 16:57:39 2009 +0800
Emulate #P
Avi Kivity wrote:
> Dong, Eddie wrote:
>> struct vcpu_svm *svm = to_svm(vcpu);
>>
>> #ifdef CONFIG_X86_64
>> -if (vcpu->arch.shadow_efer & EFER_LME) {
>> +if (is_long_mode(vcpu)) {
>>
>
> is_long_mode() actually tests EFER_LMA, s
Dong, Eddie wrote:
> This is a follow-up to the rsvd_bits emulation.
>
Based on the new rsvd_bits emulation patch.
thx, eddie
commit 2c1472ef2b9fd87a261e8b58a7db11afd6a111dc
Author: root
Date: Mon Mar 30 17:05:47 2009 +0800
Use rsvd_bits_mask in load_pdptrs for cleanup with EXB bit cons
Avi Kivity wrote:
> Dong, Eddie wrote:
>> @@ -2199,6 +2194,9 @@ void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu, int level)
>> 		context->rsvd_bits_mask[1][0] = 0;
>> 		break;
>> 	case PT32E_ROOT_LEVEL:
>> +context-
Neiger, Gil wrote:
> PDPTEs are used only if CR0.PG=CR4.PAE=1.
>
> In that situation, their format depends on the value of IA32_EFER.LMA.
>
> If IA32_EFER.LMA=0, bit 63 is reserved and must be 0 in any PDPTE
> that is marked present. The execute-disable setting of a page is
> determined only by the
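
The one rule that is fully stated in the quoted text above (with IA32_EFER.LMA=0, bit 63 must be 0 in any present PDPTE) can be written as a small check. This is a sketch only: the function and macro names are hypothetical, and the LMA=1 case is deliberately left unchecked because the quoted text is cut off before describing it.

```c
#include <stdbool.h>
#include <stdint.h>

#define PDPTE_PRESENT (1ULL << 0)   /* bit 0: entry is marked present */
#define PDPTE_BIT63   (1ULL << 63)  /* bit 63: reserved when LMA=0 */

/*
 * Hypothetical helper: returns true if a PDPTE breaks the rule quoted
 * above. Only the IA32_EFER.LMA=0 case is covered; entries that are not
 * present are never flagged.
 */
static inline bool pdpte_bit63_violation(uint64_t pdpte, bool efer_lma)
{
    if (!(pdpte & PDPTE_PRESENT))
        return false;
    if (efer_lma)
        return false;   /* not covered by this sketch */
    return (pdpte & PDPTE_BIT63) != 0;
}
```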
>
> Looks good, but doesn't apply; please check if you are working against
> the latest version.
Rebased on top of a317a1e496b22d1520218ecf16a02498b99645e2 + previous rsvd bits
violation check patch.
thx, eddie
Use rsvd_bits_mask in load_pdptrs and remove bit 5-6 from rsvd_bits_mask
pe
Thx, eddie
commit ad4a9829c8d5b30995f008e32774bd5f555b7e9f
Author: root
Date: Thu Apr 2 11:16:03 2009 +0800
Check valid bit of VM_EXIT_INTR_INFO before unblock nmi.
Signed-off-by: Eddie Dong
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index aba41ae..689523a 100644
---
The following code deactivates the fpu when CR0.PE is on, any explanation?
The rest of the code activates/deactivates the fpu based on the cr0.TS bit.
thx, eddie
static void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
{
unsigned long guest_cr3;
u64 eptp;
guest_cr3 = cr3
Move Double-Fault generation logic out of page fault
exception generating function to cover more generic case.
Signed-off-by: Eddie Dong
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ab1fdac..51a8dad 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -162,
Dong, Eddie wrote:
> Move Double-Fault generation logic out of page fault
> exception generating function to cover more generic case.
>
> Signed-off-by: Eddie Dong
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index ab1fdac..51a8dad 100644
>
Gleb Natapov wrote:
>> +
>> +static int exception_class(int vector)
>> +{
>> +	if (vector == 14)
>> +		return EXCPT_PF;
>> +	else if (vector == 0 || (vector >= 10 && vector <= 13))
>> +		return EXCPT_CONTRIBUTORY;
>> +	else
>> +		return EXCPT_BENIGN;
>
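
For readers following the thread, here is a standalone sketch of how a classifier like the quoted `exception_class()` can drive the double-fault decision. The classification mirrors the quoted hunk; `needs_double_fault()` is a hypothetical helper, not taken from the patch, and it encodes the architectural rule as I understand it from the Intel SDM: a contributory exception followed by a contributory one, or a page fault followed by a contributory exception or another page fault, escalates to #DF, while other pairs are handled serially.

```c
#include <stdbool.h>

enum excpt_class { EXCPT_BENIGN, EXCPT_CONTRIBUTORY, EXCPT_PF };

/* Same classification as the quoted hunk: #PF, contributory, or benign. */
static inline enum excpt_class exception_class(int vector)
{
    if (vector == 14)
        return EXCPT_PF;
    else if (vector == 0 || (vector >= 10 && vector <= 13))
        return EXCPT_CONTRIBUTORY;
    else
        return EXCPT_BENIGN;
}

/*
 * Hypothetical helper: true when raising `second` while `first` is still
 * being delivered should be turned into a double fault.
 */
static inline bool needs_double_fault(int first, int second)
{
    enum excpt_class c1 = exception_class(first);
    enum excpt_class c2 = exception_class(second);

    return (c1 == EXCPT_CONTRIBUTORY && c2 == EXCPT_CONTRIBUTORY) ||
           (c1 == EXCPT_PF && c2 != EXCPT_BENIGN);
}
```

Keeping the classification separate from the decision lets the same check cover any pair of vectors rather than only the page-fault path.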
ction will be re-executed.
>>
>> Do you want it to be covered for now? For exception, it is easy but
>> for IRQ, it needs to be pushed back.
>>
> Yes I want it to be covered now otherwise any serial exception
> generates flood of "Exception happens serially" messages. This
> function does not ha
Dong, Eddie wrote:
> ction will be re-executed.
>>>
>>> Do you want it to be covered for now? For exception, it is easy but
>>> for IRQ, it needs to be pushed back.
>>>
>> Yes I want it to be covered now otherwise any serial exception
>> gener
Gleb Natapov wrote:
> On Fri, May 08, 2009 at 06:46:14PM +0800, Dong, Eddie wrote:
>> Dong, Eddie wrote:
>>> ction will be re-executed.
>>>>>
>>>>> Do you want it to be covered for now? For exception, it is easy
>>>>> but for I
> There is no point referring to current code. Current code does not
> handle serial exceptions properly. So fix it in your patch otherwise I
> propose to use my patch that fixes current code
> (http://patchwork.kernel.org/patch/21829/).
>
I would like Avi to decide. As comments to the differen
Xiaodong Yi wrote:
> It is not a typo. I copied from UnixBench output directly. However, it
> must be a bug of Luvalley because even the native Linux benchmark on
> Double-Precision Whetstone is not that high. I also noticed that other
> benchmarks are all lower than native Linux.
>
> About timing,
Gleb Natapov wrote:
> On Mon, May 11, 2009 at 09:04:52AM +0800, Dong, Eddie wrote:
>>
>>> There is no point referring to current code. Current code does not
>>> handle serial exceptions properly. So fix it in your patch
>>> otherwise I propose to use my pat
I noticed the macros for SVM vmcb->control.event_inj and VMX VM_EXIT_INTR_INFO
are almost the same. I need to query the event injection situation in common
code, so I plan to expose this register read/write to x86.c. Should we define a
new format for evtinj/VM_EXIT_INTR_INFO as common KVM format
I didn't run many tests since our PTS system has stopped working due to KVM
userspace build changes. But since the logic is pretty simple, I want to post
it here to see comments.
Thx, eddie
If there is a pending irq after a virtual exception is injected,
KVM needs to enable IRQ window to trap ba
Gleb Natapov wrote:
> On Tue, May 12, 2009 at 11:06:39PM +0800, Dong, Eddie wrote:
>>
>> I didn't run many tests since our PTS system has stopped working due to
>> KVM userspace
>> build changes. But since the logic is pretty simple, I want to
>>
> That is OK, You can send two patches. The first one will WARN_ON and
> overwrite exception like the current code does. And the second one
> will remove WARN_ON explaining that this case is actually possible to
> trigger from a guest.
>
Sounds you don't like to provide this additional one, here i
Avi Kivity wrote:
> Dong, Eddie wrote:
>> I noticed the macros for SVM vmcb->control.event_inj and VMX
>> VM_EXIT_INTR_INFO are almost the same. I need to query the event
>> injection situation in common code, so I plan to expose this register
>> read/write to x86.c.
Avi Kivity wrote:
> Dong, Eddie wrote:
>> OK.
>> Also back to Gleb's question, the reason I want to do that is to
>> simplify event
>> generation mechanism in current KVM.
>>
>> Today KVM use additional layer of exception/nmi/interrupt
Gleb Natapov wrote:
> On Thu, May 14, 2009 at 09:43:33PM +0800, Dong, Eddie wrote:
>> Avi Kivity wrote:
>>> Dong, Eddie wrote:
>>>> OK.
>>>> Also back to Gleb's question, the reason I want to do that is to
>>>> simplify event generation me
Gleb Natapov wrote:
> On Thu, May 14, 2009 at 10:34:11PM +0800, Dong, Eddie wrote:
>> Gleb Natapov wrote:
>>> On Thu, May 14, 2009 at 09:43:33PM +0800, Dong, Eddie wrote:
>>>> Avi Kivity wrote:
>>>>> Dong, Eddie wrote:
>>>>>> OK.
>&
The goal of HVM virtualization is to give the guest, in both KVM guests and Xen
HVM guests, a platform exactly the same as native; however, for some reason, it
is not strictly followed in today's virtualization solutions. VMMs normally take
shortcuts to keep commodity OSes happy, for performance etc. That brin
Olivier Berghmans wrote:
> Hi,
>
> Thanks for the reply. I have an Intel Q9550 processor. EPT was
> introduced in the Nehalem series, which my processor is not part of.
> Based on that, I conclude that I am not using EPT and thus using
> shadow page tables.
>
> The problem of running (for example)
Zachary:
Will you extend the logic to cover the situation when the guest runs at
higher than the guest rate but the PCPU is overcommitted? In that case, likely
we can use the time spent when the VCPU is scheduled out to catch up as well.
Of course if the VCPU scheduled out time is not e
> 3) As I have mentioned above, with this idea, netdev_alloc_skb() will
> allocate
> as usual, the data pointed by skb->data will be copied into the first
> guest buffer.
> That means we should reserve sufficient room in guest buffer. For PS
> mode
> supported driver (for example ixgbe), the ro
Herbert Xu wrote:
> On Wed, Jun 23, 2010 at 04:09:40PM +0800, Dong, Eddie wrote:
>>
>> Xiaohui & Herbert:
>> Mixing copy of head & 0-copy of bulk data imposes an additional
>> challenge to find the guest buffer. The backend driver may be
>>
Herbert Xu wrote:
> On Wed, Jun 23, 2010 at 06:05:41PM +0800, Dong, Eddie wrote:
>>
>> I mean once the frontend side driver posts the buffers to the backend
>> driver, the backend driver will "immediately" use those buffers to
>> compose skb or gro_frags a
Avi Kivity wrote:
> On 06/28/2010 09:42 AM, Sheng Yang wrote:
+static void wbinvd_ipi(void *garbage)
+{
+ wbinvd();
+}
>>> Like Jan mentioned, this is quite heavy. What about a clflush()
>>> loop instead? That may take more time, but at least it's
>>> preemptible. Of c
Avi Kivity wrote:
> On 06/28/2010 10:30 AM, Dong, Eddie wrote:
>>>
>>> Several milliseconds of non-responsiveness may not be acceptable for
>>> some applications. So I think queue_work_on() and a clflush loop is
>>> better than an IPI and wbinvd.
Nadav Har'El wrote:
> This patch implements the VMCLEAR instruction.
>
> Signed-off-by: Nadav Har'El
> ---
> --- .before/arch/x86/kvm/vmx.c 2010-06-13 15:01:29.0 +0300
> +++ .after/arch/x86/kvm/vmx.c 2010-06-13 15:01:29.0 +0300
> @@ -138,6 +138,8 @@ struct __attribute__ ((_
Nadav Har'El wrote:
> This patch implements the VMPTRLD instruction.
>
> Signed-off-by: Nadav Har'El
> ---
> --- .before/arch/x86/kvm/vmx.c 2010-06-13 15:01:29.0 +0300
> +++ .after/arch/x86/kvm/vmx.c 2010-06-13 15:01:29.0 +0300
> @@ -3829,6 +3829,26 @@ static int read_guest
Nadav Har'El wrote:
> This patch contains code to prepare the VMCS which can be used to
> actually run the L2 guest, vmcs02. prepare_vmcs02 appropriately
> merges the information in shadow_vmcs that L1 built for L2 (vmcs12),
> and that in the VMCS that we built for L1 (vmcs01).
>
> VMREAD/WRITE can
> +/* Allocate an L0 VMCS (vmcs02) for the current L1 VMCS (vmcs12), if one
> + * does not already exist. The allocation is done in L0 memory, so to avoid
> + * denial-of-service attack by guests, we limit the number of concurrently-
> + * allocated vmcss. A well-behaving L1 will VMCLEAR unused v
Nadav Har'El wrote:
> Hi Avi,
>
> This is a followup of our nested VMX patches that Orit Wasserman
> posted in December. We've addressed most of the comments and concerns
> that you and others on the mailing list had with the previous patch
> set. We hope you'll find these patches easier to unders
Arnd Bergmann wrote:
> On Friday 30 July 2010 17:51:52 Shirley Ma wrote:
>> On Fri, 2010-07-30 at 16:53 +0800, Xin, Xiaohui wrote:
Since vhost-net already supports macvtap/tun backends, do you think
whether it's better to implement zero copy in macvtap/tun than
inducing a new media p
Simon Horman wrote:
> On Wed, Feb 18, 2009 at 10:08:00PM +1030, Rusty Russell
> wrote:
>>
>> 2) Direct NIC attachment This is particularly
>> interesting with SR-IOV or other multiqueue nics, but
>> for boutique cases or benchmarks, could be for normal
>> NICs. So far I have some very sketched-o
Rusty/Anthony/Avi and all:
I did a read at Rusty's April 18 post of vringfd patch, though I
didn't find explicit code for how to call get_user_skb_frags, it seems
to be clear the user only need to trigger a "flush_tx" command. All the
rest will be automatically turned on, fetching the ring
Sukanto Ghosh wrote:
> Hi all,
>
> Why has the IO device emulation part been kept in
> userspace ?
> IO attempts cause VM-exits to the KVM (running in
> kernel-mode) it then forwards these requests to the
> userspace (mode-switch). After completion of IO in
> userspace, another mode switch is don
> On Wed, Nov 25, 2015 at 12:21 AM, Lan Tianyu wrote:
> > On November 25, 2015 13:30, Alexander Duyck wrote:
> >> No, what I am getting at is that you can't go around and modify the
> >> configuration space for every possible device out there. This
> >> solution won't scale.
> >
> >
> > PCI config spac
> >
> > Even if the device driver doesn't support migration, you still want to
> > migrate the VM? That may be risky and we should add the "bad path" for the
> > driver at least.
>
> At a minimum we should have support for hot-plug if we are expecting to
> support migration. You would simply have to ho