Re: [Qemu-devel] [question] virtio-blk performance degradation happened with virtio-serial

2014-09-01 Thread Amit Shah
On (Mon) 01 Sep 2014 [20:52:46], Zhang Haoyu wrote:
> >>> Hi, all
> >>> 
> >>> I start a VM with virtio-serial (default number of ports: 31), and found 
> >>> that a virtio-blk performance degradation of about 25% happened; this problem 
> >>> can be reproduced 100% of the time.
> >>> without virtio-serial:
> >>> 4k-read-random 1186 IOPS
> >>> with virtio-serial:
> >>> 4k-read-random 871 IOPS
> >>> 
> >>> but if I use the max_ports=2 option to limit the maximum number of virtio-serial 
> >>> ports, the IO performance degradation is not as serious, about 5%.
> >>> 
> >>> And IDE performance does not degrade with virtio-serial.
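The degradation quoted above follows directly from the reported IOPS figures (a quick arithmetic check):

```python
iops_without_serial = 1186  # 4k-read-random, no virtio-serial
iops_with_serial = 871      # 4k-read-random, virtio-serial attached

degradation = (iops_without_serial - iops_with_serial) / iops_without_serial
print(f"{degradation:.1%}")  # → 26.6%
```

So the "about 25%" figure is consistent with the raw numbers.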
> >>
> >>Pretty sure it's related to the MSI vectors in use.  It's possible that
> >>the virtio-serial device takes up all the available vectors in the guest,
> >>leaving old-style IRQs for the virtio-blk device.
> >>
> >I don't think so.
> >I use iometer to test the 64k-read(or write)-sequence case: if I disable 
> >virtio-serial dynamically via device manager->virtio-serial => disable,
> >then the performance improves by about 25% immediately; if I then re-enable 
> >virtio-serial via device manager->virtio-serial => enable,
> >the performance drops back again, very visibly.
> To add a comment:
> although virtio-serial is enabled, I don't use it at all, and the degradation 
> still happens.

Using the vectors= option as mentioned below, you can restrict the
number of MSI vectors the virtio-serial device gets.  You can then
confirm whether it's MSI that's related to these issues.

> >So I think it has nothing to do with legacy interrupt mode, right?
> >
> >I am going to compare the perf top data on the qemu side and the perf kvm 
> >stat data with virtio-serial disabled/enabled in the guest,
> >and also the perf top data inside the guest with virtio-serial 
> >disabled/enabled.
> >Any ideas?
> >
> >Thanks,
> >Zhang Haoyu
> >>If you restrict the number of vectors the virtio-serial device gets
> >>(using the -device virtio-serial-pci,vectors= param), does that make
> >>things better for you?
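A sketch of such an invocation (the disk image name, memory size, and vector count here are placeholders, not taken from this thread):

```shell
# Illustrative only: vectors=4 caps the number of MSI-X vectors the
# virtio-serial device may claim, leaving the remaining guest vectors
# free for virtio-blk.
qemu-system-x86_64 -enable-kvm -m 4G \
    -drive file=guest.img,if=virtio \
    -device virtio-serial-pci,vectors=4
```

Comparing IOPS with different `vectors=` values should show whether MSI vector exhaustion is the cause.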



Amit
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Bug 82211] Cannot boot Xen under KVM with X2APIC enabled

2014-09-01 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=82211

--- Comment #6 from Paolo Bonzini  ---
What version of Xen?  Can you attach the xen.gz file?

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


Re: Nested paging in nested SVM setup

2014-09-01 Thread Valentine Sinitsyn

On 02.09.2014 12:09, Valentine Sinitsyn wrote:

https://www.dropbox.com/s/slbxmxyg74wh9hv/l1mmio-cpu0.txt.gz?dl=0

Forgot to say: the user space is vanilla QEMU 2.1.0 here.

Valentine


[PATCH 1/3] KVM: PPC: Book3S HV: Increase timeout for grabbing secondary threads

2014-09-01 Thread Paul Mackerras
From: Paul Mackerras 

Occasional failures have been seen with split-core mode and migration
where the message "KVM: couldn't grab cpu" appears.  This increases
the length of time that we wait from 1ms to 10ms, which seems to
work around the issue.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kvm/book3s_hv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 27cced9..4526bef 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1489,7 +1489,7 @@ static void kvmppc_remove_runnable(struct kvmppc_vcore *vc,
 static int kvmppc_grab_hwthread(int cpu)
 {
struct paca_struct *tpaca;
-   long timeout = 1000;
+   long timeout = 10000;
 
tpaca = &paca[cpu];
 
-- 
2.1.0.rc1



[PATCH 0/3] KVM: PPC: Book3S: Some miscellaneous fixes

2014-09-01 Thread Paul Mackerras
This series of patches is based on Alex Graf's kvm-ppc-queue branch.
It contains 3 small patches from the tree that we are shipping on
POWER8 machines: a fix for an error that we see very occasionally,
and two minor improvements.  Please apply for 3.18.

Paul.
---
 arch/powerpc/include/asm/kvm_host.h |  2 ++
 arch/powerpc/kvm/book3s_hv.c        |  6 ++++--
 arch/powerpc/kvm/book3s_pr.c        | 39 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 45 insertions(+), 2 deletions(-)




[PATCH 3/3] KVM: PPC: Book3S PR: Implement ARCH_COMPAT register

2014-09-01 Thread Paul Mackerras
This provides basic support for the KVM_REG_PPC_ARCH_COMPAT register
in PR KVM.  At present the value is sanity-checked when set, but
doesn't actually affect anything yet.

Implementing this makes it possible to use a qemu command-line
argument such as "-cpu host,compat=power7" on a POWER8 machine,
just as we would with HV KVM.  In turn this means that we can use the
same libvirt XML for nested virtualization with PR KVM as we do with
HV KVM.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_host.h |  2 ++
 arch/powerpc/kvm/book3s_pr.c        | 39 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 41 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 3502649..f435a889 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -515,6 +515,8 @@ struct kvm_vcpu_arch {
u32 ivor[64];
ulong ivpr;
u32 pvr;
+   u32 pvr_arch;
+   u32 compat_arch;
 
u32 shadow_pid;
u32 shadow_pid1;
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index faffb27..7183bdc 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -398,6 +398,28 @@ static void kvmppc_set_msr_pr(struct kvm_vcpu *vcpu, u64 msr)
kvmppc_handle_ext(vcpu, BOOK3S_INTERRUPT_FP_UNAVAIL, MSR_FP);
 }
 
+/*
+ * Evaluate the architecture level of a PVR value.
+ * The result is in terms of PVR_ARCH_* values.
+ */
+static u32 pvr_to_arch(u32 pvr)
+{
+   switch (PVR_VER(pvr)) {
+   case PVR_POWER5p:
+   return PVR_ARCH_204;
+   case PVR_POWER6:
+   return PVR_ARCH_205;
+   case PVR_POWER7:
+   case PVR_POWER7p:
+   return PVR_ARCH_206;
+   case PVR_POWER8:
+   case PVR_POWER8E:
+   return PVR_ARCH_207;
+   default:
+   return 0;
+   }
+}
+
 void kvmppc_set_pvr_pr(struct kvm_vcpu *vcpu, u32 pvr)
 {
u32 host_pvr;
@@ -473,6 +495,18 @@ void kvmppc_set_pvr_pr(struct kvm_vcpu *vcpu, u32 pvr)
/* Enable HID2.PSE - in case we need it later */
mtspr(SPRN_HID2_GEKKO, mfspr(SPRN_HID2_GEKKO) | (1 << 29));
}
+
+   vcpu->arch.pvr_arch = pvr_to_arch(pvr);
+   if (vcpu->arch.pvr_arch < vcpu->arch.compat_arch)
+   vcpu->arch.compat_arch = 0;
+}
+
+static int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, u32 compat_arch)
+{
+   if (compat_arch > vcpu->arch.pvr_arch)
+   return -EINVAL;
+   vcpu->arch.compat_arch = compat_arch;
+   return 0;
 }
 
 /* Book3s_32 CPUs always have 32 bytes cache line size, which Linux assumes. To
@@ -1332,6 +1366,9 @@ static int kvmppc_get_one_reg_pr(struct kvm_vcpu *vcpu, u64 id,
else
*val = get_reg_val(id, 0);
break;
+   case KVM_REG_PPC_ARCH_COMPAT:
+   *val = get_reg_val(id, vcpu->arch.compat_arch);
+   break;
default:
r = -EINVAL;
break;
@@ -1361,6 +1398,8 @@ static int kvmppc_set_one_reg_pr(struct kvm_vcpu *vcpu, u64 id,
case KVM_REG_PPC_LPCR:
case KVM_REG_PPC_LPCR_64:
kvmppc_set_lpcr_pr(vcpu, set_reg_val(id, *val));
+   break;
+   case KVM_REG_PPC_ARCH_COMPAT:
+   r = kvmppc_set_arch_compat(vcpu, set_reg_val(id, *val));
break;
default:
r = -EINVAL;
-- 
2.1.0.rc1



[PATCH 2/3] KVM: PPC: Book3S HV: Only accept host PVR value for guest PVR

2014-09-01 Thread Paul Mackerras
Since the guest can read the machine's PVR (Processor Version Register)
directly and see the real value, we should disallow userspace from
setting any value for the guest's PVR other than the real host value.
Therefore this makes kvm_arch_vcpu_set_sregs_hv() check the supplied
PVR value and return an error if it is different from the host value,
which has been put into vcpu->arch.pvr at vcpu creation time.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kvm/book3s_hv.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 4526bef..529d10a 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -856,7 +856,9 @@ static int kvm_arch_vcpu_ioctl_set_sregs_hv(struct kvm_vcpu *vcpu,
 {
int i, j;
 
-   kvmppc_set_pvr_hv(vcpu, sregs->pvr);
+   /* Only accept the same PVR as the host's, since we can't spoof it */
+   if (sregs->pvr != vcpu->arch.pvr)
+   return -EINVAL;
 
j = 0;
for (i = 0; i < vcpu->arch.slb_nr; i++) {
-- 
2.1.0.rc1



Re: Nested paging in nested SVM setup

2014-09-01 Thread Valentine Sinitsyn

Hi Paolo,

On 01.09.2014 23:04, Paolo Bonzini wrote:

Valentine, can you produce another trace, this time with both kvm and
kvmmmu events enabled?
I was able to make the trace shorter by grepping only what's happening 
on a single CPU core (#0):


https://www.dropbox.com/s/slbxmxyg74wh9hv/l1mmio-cpu0.txt.gz?dl=0

It was taken with kernel 3.16.1 modules with your paging-tmpl.h patch 
applied.


This time, the trace looks somewhat different, however my code still 
hangs in nested KVM (and doesn't on real HW).


Thanks,
Valentine


[Bug 82211] Cannot boot Xen under KVM with X2APIC enabled

2014-09-01 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=82211

--- Comment #5 from Zhou, Chao  ---
kvm.git+ qemu.git:fd275235_8b303011
kernel version:3.17.0-rc1
test on Ivytown_EP
qemu-system-x86_64 -enable-kvm -m 4G -smp 2 -net nic,macaddr=00:13:13:51:51:15
-net tap,script=/etc/kvm/qemu-ifup nested-xen.qcow -cpu kvm64
the L1 guest panics and reboots; the bug reproduces.

when I try enable_apicv=1 or enable_apicv=0
and create the guest with
qemu-system-x86_64 -enable-kvm -m 4G -smp 2 -net nic,macaddr=00:13:13:51:51:15
-net tap,script=/etc/kvm/qemu-ifup nested-xen.qcow -cpu kvm64
or 
qemu-system-x86_64 -enable-kvm -m 4G -smp 2 -net nic,macaddr=00:13:13:51:51:15
-net tap,script=/etc/kvm/qemu-ifup nested-xen.qcow -cpu host
the bug still reproduces.



[Bug 61411] [Nested]L2 guest failed to start in VMware on KVM

2014-09-01 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=61411

Zhou, Chao changed:

           What           |Removed     |Added
----------------------------------------------------------------------------
         Status           |RESOLVED    |VERIFIED



[Bug 61411] [Nested]L2 guest failed to start in VMware on KVM

2014-09-01 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=61411

--- Comment #1 from Zhou, Chao  ---
kvm.git +qemu.git: fd275235_8b303011
test on Ivytown_EP
kernel version: 3.17.0-rc1
enable ignore_msrs (echo 1 > /sys/module/kvm/parameters/ignore_msrs), then create
the L1 guest
qemu-system-x86_64 --enable-kvm -m 6G -smp 4 -net nic,macaddr=00:16:3e:5a:28:29
-net tap,script=/etc/kvm/qemu-ifup win7-nested.qcow2 -cpu host,-hypervisor
VMware boots up successfully.



[Bug 61411] [Nested]L2 guest failed to start in VMware on KVM

2014-09-01 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=61411

Zhou, Chao changed:

           What           |Removed     |Added
----------------------------------------------------------------------------
         Status           |NEW         |RESOLVED
     Resolution           |---         |CODE_FIX

--- Comment #2 from Zhou, Chao  ---
this commit fixed the bug:
commit a7c0b07d570848e50fce4d31ac01313484d6b844
Author: Wanpeng Li 
Date:   Thu Aug 21 19:46:50 2014 +0800

KVM: nVMX: nested TPR shadow/threshold emulation

This patch fixes bug https://bugzilla.kernel.org/show_bug.cgi?id=61411

The TPR shadow/threshold feature is important for speeding up Windows
guests. Besides, it is a required feature for certain VMMs.

We map the virtual APIC page address and TPR threshold from the L1 VMCS. If
a TPR_BELOW_THRESHOLD VM exit is triggered by the L2 guest and L1 is
interested in it, we inject it into the L1 VMM for handling.

Reviewed-by: Paolo Bonzini 
Signed-off-by: Wanpeng Li 
[Add PAGE_ALIGNED check, do not write useless virtual APIC page address
 if TPR shadowing is disabled. - Paolo]
Signed-off-by: Paolo Bonzini 



Re: [PATCH 0/2] KVM: minor cleanup and optimizations

2014-09-01 Thread Alexander Graf


On 28.08.14 15:13, Radim Krčmář wrote:
> The first patch answers a demand for inline arch functions.
> (There are a lot of constant functions that could be inlined as well.)
> 
> Second patch digs a bit into the history of KVM and removes a useless
> argument that seemed suspicious when preparing the first patch.
> 
> 
> Radim Krčmář (2):
>   KVM: static inline empty kvm_arch functions
>   KVM: remove garbage arg to *hardware_{en,dis}able

Acked-by: Alexander Graf 


Alex


Re: [PATCH] KVM: PPC: Remove shared defines for SPE and AltiVec interrupts

2014-09-01 Thread Alexander Graf


On 01.09.14 12:17, Mihai Caraman wrote:
> We currently decide at compile-time which of the SPE or AltiVec units to
> support exclusively. Guard kernel defines with CONFIG_SPE_POSSIBLE and
> CONFIG_PPC_E500MC and remove shared defines.
> 
> Signed-off-by: Mihai Caraman 

Thanks, applied to kvm-ppc-queue.


Alex


Re: [PATCH] KVM: PPC: Remove the tasklet used by the hrtimer

2014-09-01 Thread Alexander Graf


On 01.09.14 16:19, Mihai Caraman wrote:
> Powerpc timer implementation is a copycat version of s390. Now that they 
> removed
> the tasklet with commit ea74c0ea1b24a6978a6ebc80ba4dbc7b7848b32d follow this
> optimization.
> 
> Signed-off-by: Mihai Caraman 
> Signed-off-by: Bogdan Purcareata 

What could possibly go wrong ... :)

Applied to kvm-ppc-queue.


Alex


Re: Nested paging in nested SVM setup

2014-09-01 Thread Valentine Sinitsyn

Hi Paolo,

On 01.09.2014 23:41, Paolo Bonzini wrote:

Il 21/08/2014 14:28, Valentine Sinitsyn ha scritto:
BTW npt_rsvd does *not* fail on the machine I've been testing on today.
I can confirm the l1mmio test doesn't fail in kvm-unit-tests' master 
anymore. npt_rsvd still does. I also needed to disable the ioio test, or it 
would hang for a long time (this doesn't happen if I use Jan's patched 
KVM that has the IOPM bugs fixed). However, the l1mmio test passes regardless 
of whether I use stock KVM 3.16.1 or a patched version.



Can you retry running the tests with the latest kvm-unit-tests (branch
"master"), gather a trace of kvm and kvmmmu events, and send the
compressed trace.dat my way?
You mean the trace from when the problem reveals itself (not from running 
the tests), I assume? It's around 2G uncompressed (probably I'm enabling 
tracing too early or doing something else wrong). I will look into it 
tomorrow; hopefully I can reduce the size (e.g. by switching to 
uniprocessor mode). Below is a trace snippet similar to the one I 
sent earlier.


--
qemu-system-x86-2728  [002]  1726.426225: kvm_exit: reason 
npf rip 0x8104e876 info 1000f fee000b0
 qemu-system-x86-2728  [002]  1726.426226: kvm_nested_vmexit:rip: 
0x8104e876 reason: npf ext_inf1: 0x0001000f ext_inf2: 
0xfee000b0 ext_int: 0x ext_int_err: 0x
 qemu-system-x86-2728  [002]  1726.426227: kvm_page_fault: 
address fee000b0 error_code f
 qemu-system-x86-2725  [000]  1726.426227: kvm_exit: reason 
npf rip 0x8104e876 info 1000f fee000b0
 qemu-system-x86-2725  [000]  1726.426228: kvm_nested_vmexit:rip: 
0x8104e876 reason: npf ext_inf1: 0x0001000f ext_inf2: 
0xfee000b0 ext_int: 0x ext_int_err: 0x
 qemu-system-x86-2725  [000]  1726.426229: kvm_page_fault: 
address fee000b0 error_code f
 qemu-system-x86-2728  [002]  1726.426229: kvm_emulate_insn: 
0:8104e876:89 b7 00 b0 5f ff (prot64)
 qemu-system-x86-2725  [000]  1726.426230: kvm_emulate_insn: 
0:8104e876:89 b7 00 b0 5f ff (prot64)
 qemu-system-x86-2728  [002]  1726.426231: kvm_mmu_pagetable_walk: addr 
ff5fb0b0 pferr 2 W
 qemu-system-x86-2725  [000]  1726.426231: kvm_mmu_pagetable_walk: addr 
ff5fb0b0 pferr 2 W
 qemu-system-x86-2728  [002]  1726.426231: kvm_mmu_pagetable_walk: addr 
1811000 pferr 6 W|U
 qemu-system-x86-2725  [000]  1726.426232: kvm_mmu_pagetable_walk: addr 
36c49000 pferr 6 W|U
 qemu-system-x86-2728  [002]  1726.426232: kvm_mmu_paging_element: pte 
3c03a027 level 4
 qemu-system-x86-2725  [000]  1726.426232: kvm_mmu_paging_element: pte 
3c03a027 level 4
 qemu-system-x86-2728  [002]  1726.426232: kvm_mmu_paging_element: pte 
3c03d027 level 3
 qemu-system-x86-2725  [000]  1726.426233: kvm_mmu_paging_element: pte 
3c03d027 level 3
 qemu-system-x86-2728  [002]  1726.426233: kvm_mmu_paging_element: pte 
18000e7 level 2
 qemu-system-x86-2725  [000]  1726.426233: kvm_mmu_paging_element: pte 
36c000e7 level 2
 qemu-system-x86-2728  [002]  1726.426233: kvm_mmu_paging_element: pte 
1814067 level 4
 qemu-system-x86-2725  [000]  1726.426233: kvm_mmu_paging_element: pte 
1814067 level 4
 qemu-system-x86-2728  [002]  1726.426233: kvm_mmu_pagetable_walk: addr 
1814000 pferr 6 W|U
 qemu-system-x86-2725  [000]  1726.426234: kvm_mmu_pagetable_walk: addr 
1814000 pferr 6 W|U
 qemu-system-x86-2728  [002]  1726.426234: kvm_mmu_paging_element: pte 
3c03a027 level 4
 qemu-system-x86-2725  [000]  1726.426234: kvm_mmu_paging_element: pte 
3c03a027 level 4
 qemu-system-x86-2728  [002]  1726.426234: kvm_mmu_paging_element: pte 
3c03d027 level 3
 qemu-system-x86-2725  [000]  1726.426235: kvm_mmu_paging_element: pte 
3c03d027 level 3
 qemu-system-x86-2728  [002]  1726.426235: kvm_mmu_paging_element: pte 
18000e7 level 2
 qemu-system-x86-2725  [000]  1726.426235: kvm_mmu_paging_element: pte 
18000e7 level 2
 qemu-system-x86-2728  [002]  1726.426235: kvm_mmu_paging_element: pte 
1816067 level 3
 qemu-system-x86-2725  [000]  1726.426235: kvm_mmu_paging_element: pte 
1816067 level 3
 qemu-system-x86-2728  [002]  1726.426235: kvm_mmu_pagetable_walk: addr 
1816000 pferr 6 W|U
 qemu-system-x86-2725  [000]  1726.426236: kvm_mmu_pagetable_walk: addr 
1816000 pferr 6 W|U
 qemu-system-x86-2728  [002]  1726.426236: kvm_mmu_paging_element: pte 
3c03a027 level 4
 qemu-system-x86-2725  [000]  1726.426236: kvm_mmu_paging_element: pte 
3c03a027 level 4
 qemu-system-x86-2728  [002]  1726.426236: kvm_mmu_paging_element: pte 
3c03d027 level 3
 qemu-system-x86-2725  [000]  1726.426236: kvm_mmu_paging_element: pte 
3c03d027 level 3
 qemu-system-x86-2728  [002]  1726.426236: kvm_mmu_paging_element: pte 
18000e7 level 2
 qemu-system-x86-2725  [000]  1726.426237: kvm_mmu_paging_element: pte 
18000e7 level 2
 qemu-system-x86-2728  [002]  1726.426237: kvm_mmu_paging_element: pte 
1a06067 level 2
 qemu-system-x86-2725  [000]  1726.426237: kv

Re: Nested paging in nested SVM setup

2014-09-01 Thread Paolo Bonzini
Il 21/08/2014 14:28, Valentine Sinitsyn ha scritto:
>> It seems to work for VMX (see the testcase I just sent).  For SVM, can
>> you check if this test works for you, so that we can work on a simple
>> testcase?
> 
> However, npt_rsvd seems to be broken - maybe that is the reason?

BTW npt_rsvd does *not* fail on the machine I've been testing on today.

Can you retry running the tests with the latest kvm-unit-tests (branch
"master"), gather a trace of kvm and kvmmmu events, and send the
compressed trace.dat my way?

Thanks,

Paolo
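One way to gather such a trace, assuming trace-cmd is installed on the L0 host (the 30-second window is an arbitrary placeholder; run the reproducer in the guest during it):

```shell
# Record all kvm and kvmmmu tracepoints while the problem is reproduced,
# then compress the resulting trace.dat before mailing it.
trace-cmd record -e kvm -e kvmmmu -o trace.dat sleep 30
xz trace.dat
```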


Re: Nested paging in nested SVM setup

2014-09-01 Thread Paolo Bonzini
Il 20/08/2014 08:46, Valentine Sinitsyn ha scritto:
> Looks like it is a bug in KVM. I had a chance to run the same code
> bare-metal ([1], line 310 is uncommented for bare-metal case but present
> for nested SVM), and it seems to work as expected. However, When I trace
> it in nested SVM setup, after some successful APIC reads and writes, I
> get the following:

Valentine, can you produce another trace, this time with both kvm and
kvmmmu events enabled?

Thanks,

Paolo


[PATCH kvm-unit-tests] x86: svm: fix typo in setting up nested page tables

2014-09-01 Thread Paolo Bonzini
This will cause problems when accessing memory above the first GB, as
in the l1mmio test.

Signed-off-by: Paolo Bonzini 
---
 x86/svm.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/x86/svm.c b/x86/svm.c
index 1e6908a..00b3191 100644
--- a/x86/svm.c
+++ b/x86/svm.c
@@ -87,7 +87,7 @@ static void setup_svm(void)
 page = alloc_page();
 
 for (j = 0; j < 512; ++j)
-page[j] = (u64)pte[(i * 514) + j] | 0x027ULL;
+page[j] = (u64)pte[(i * 512) + j] | 0x027ULL;
 
 pde[i] = page;
 }
-- 
1.7.1
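To see why the typo above only bites for memory beyond the first mapped unit: with 512 entries per page table, table i should be sourced from pte[i * 512]; with stride 514, every table after the first drifts by 2*i entries (table 0 is unaffected, which matches the commit message's claim that only accesses above the first GB misbehave). A quick illustrative check:

```python
# With 512 PTEs per table, table i should start at pte[i * 512].
# The typo used a stride of 514 instead, shifting table i by 2*i entries.
def first_wrong_entry(i):
    """Offset by which table i's source slice drifts with stride 514."""
    return i * 514 - i * 512  # = 2 * i

assert first_wrong_entry(0) == 0  # first table still maps correctly
for i in (1, 2, 512):
    print(i, first_wrong_entry(i))  # drift grows linearly with i
```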



[PATCH kvm-unit-tests] x86: svm: test reading the LVR register twice

2014-09-01 Thread Paolo Bonzini
The first read will fill the nested page table, the second will find a reserved
bit set.

Signed-off-by: Paolo Bonzini 
---
 x86/svm.c |   13 -
 1 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/x86/svm.c b/x86/svm.c
index cb4c736..1e6908a 100644
--- a/x86/svm.c
+++ b/x86/svm.c
@@ -802,20 +802,23 @@ static void npt_l1mmio_prepare(struct test *test)
 vmcb_ident(test->vmcb);
 }
 
-u32 nested_apic_version;
+u32 nested_apic_version1;
+u32 nested_apic_version2;
 
 static void npt_l1mmio_test(struct test *test)
 {
-u32 *data = (void*)(0xfee00030UL);
+volatile u32 *data = (volatile void*)(0xfee00030UL);
 
-nested_apic_version = *data;
+nested_apic_version1 = *data;
+nested_apic_version2 = *data;
 }
 
 static bool npt_l1mmio_check(struct test *test)
 {
-u32 *data = (void*)(0xfee00030);
+volatile u32 *data = (volatile void*)(0xfee00030);
+u32 lvr = *data;
 
-return (nested_apic_version == *data);
+return nested_apic_version1 == lvr && nested_apic_version2 == lvr;
 }
 
 static void latency_prepare(struct test *test)
-- 
1.7.1



Re: Nested paging in nested SVM setup

2014-09-01 Thread Paolo Bonzini
Il 22/08/2014 06:33, Valentine Sinitsyn ha scritto:
> On 22.08.2014 02:31, Paolo Bonzini wrote:
>> VMX used the right access size already, the tests are separate for VMX
>> and SVM.
> Sure. So the bug is NPT-specific?

Hmm, unfortunately the test cannot reproduce the bug, at least with 3.16.
It only failed due to a (somewhat unbelievable...) typo:

diff --git a/x86/svm.c b/x86/svm.c
index 54d804b..ca1e64e 100644
--- a/x86/svm.c
+++ b/x86/svm.c
@@ -87,7 +87,7 @@ static void setup_svm(void)
 page = alloc_page();
 
 for (j = 0; j < 512; ++j)
-page[j] = (u64)pte[(i * 514) + j] | 0x027ULL;
+page[j] = (u64)pte[(i * 512) + j] | 0x027ULL;
 
 pde[i] = page;
 }

The trace correctly points at APIC_LVR for both the guest read:

 qemu-system-x86-23749 [019]  6718.397998: kvm_exit: reason npf rip 
0x4003ba info 10004 fee00030
 qemu-system-x86-23749 [019]  6718.397998: kvm_nested_vmexit:rip: 
0x004003ba reason: npf ext_inf1: 0x00010004 ext_inf2: 
0xfee00030 ext_int: 0x ext_int_err: 0x
 qemu-system-x86-23749 [019]  6718.397999: kvm_page_fault:   address 
fee00030 error_code 4
 qemu-system-x86-23749 [019]  6718.398009: kvm_emulate_insn: 0:4003ba:a1 30 
00 e0 fe 00 00 00 00 (prot64)
 qemu-system-x86-23749 [019]  6718.398013: kvm_apic: apic_read 
APIC_LVR = 0x1050014
 qemu-system-x86-23749 [019]  6718.398014: kvm_mmio: mmio read len 
4 gpa 0xfee00030 val 0x1050014
 qemu-system-x86-23749 [019]  6718.398015: kvm_entry:vcpu 0

and the host read:

 qemu-system-x86-23749 [019]  6718.398035: kvm_entry:vcpu 0
 qemu-system-x86-23749 [019]  6718.398036: kvm_exit: reason npf rip 
0x4003ca info 1000d fee00030
 qemu-system-x86-23749 [019]  6718.398037: kvm_page_fault:   address 
fee00030 error_code d
 qemu-system-x86-23749 [019]  6718.398039: kvm_emulate_insn: 0:4003ca:a1 30 
00 e0 fe 00 00 00 00 (prot64)
 qemu-system-x86-23749 [019]  6718.398040: kvm_apic: apic_read 
APIC_LVR = 0x1050014
 qemu-system-x86-23749 [019]  6718.398040: kvm_mmio: mmio read len 
4 gpa 0xfee00030 val 0x1050014

The different error codes are because the first read will install the shadow
page.  If I change the test to do two reads, the error codes match.  I will
look at this more closely tomorrow.
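Decoding the two error codes with the standard x86 page-fault error-code bit layout (P=1, W=2, U=4, RSVD=8, I=16) shows the difference is exactly the present and reserved bits:

```python
# Standard x86 #PF / NPF error-code bits.
FLAGS = [(0x1, "P"), (0x2, "W"), (0x4, "U"), (0x8, "RSVD"), (0x10, "I")]

def decode(error_code):
    return [name for bit, name in FLAGS if error_code & bit]

print(decode(0x4))  # → ['U']: read of a not-yet-present page
print(decode(0xd))  # → ['P', 'U', 'RSVD']: present, with a reserved bit set
```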

Paolo


[PATCH] KVM: PPC: Remove the tasklet used by the hrtimer

2014-09-01 Thread Mihai Caraman
The powerpc timer implementation is a copycat of the s390 one. Now that s390
has removed the tasklet in commit ea74c0ea1b24a6978a6ebc80ba4dbc7b7848b32d,
follow the same optimization.

Signed-off-by: Mihai Caraman 
Signed-off-by: Bogdan Purcareata 
---
 arch/powerpc/include/asm/kvm_host.h | 1 -
 arch/powerpc/include/asm/kvm_ppc.h  | 2 +-
 arch/powerpc/kvm/book3s.c   | 4 +---
 arch/powerpc/kvm/booke.c| 4 +---
 arch/powerpc/kvm/powerpc.c  | 8 +-------
 5 files changed, 4 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index cc11aed..3502649 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -611,7 +611,6 @@ struct kvm_vcpu_arch {
u32 cpr0_cfgaddr; /* holds the last set cpr0_cfgaddr */
 
struct hrtimer dec_timer;
-   struct tasklet_struct tasklet;
u64 dec_jiffies;
u64 dec_expires;
unsigned long pending_exceptions;
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index fb86a22..1117360 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -89,7 +89,7 @@ extern int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu);
 extern int kvmppc_emulate_mmio(struct kvm_run *run, struct kvm_vcpu *vcpu);
 extern void kvmppc_emulate_dec(struct kvm_vcpu *vcpu);
 extern u32 kvmppc_get_dec(struct kvm_vcpu *vcpu, u64 tb);
-extern void kvmppc_decrementer_func(unsigned long data);
+extern void kvmppc_decrementer_func(struct kvm_vcpu *vcpu);
 extern int kvmppc_sanity_check(struct kvm_vcpu *vcpu);
 extern int kvmppc_subarch_vcpu_init(struct kvm_vcpu *vcpu);
 extern void kvmppc_subarch_vcpu_uninit(struct kvm_vcpu *vcpu);
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 1b5adda..f23b6a5 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -718,10 +718,8 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
return -EINVAL;
 }
 
-void kvmppc_decrementer_func(unsigned long data)
+void kvmppc_decrementer_func(struct kvm_vcpu *vcpu)
 {
-   struct kvm_vcpu *vcpu = (struct kvm_vcpu *)data;
-
kvmppc_core_queue_dec(vcpu);
kvm_vcpu_kick(vcpu);
 }
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 831c1b4..a4487f4 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -1782,10 +1782,8 @@ void kvmppc_clr_tsr_bits(struct kvm_vcpu *vcpu, u32 tsr_bits)
update_timer_ints(vcpu);
 }
 
-void kvmppc_decrementer_func(unsigned long data)
+void kvmppc_decrementer_func(struct kvm_vcpu *vcpu)
 {
-   struct kvm_vcpu *vcpu = (struct kvm_vcpu *)data;
-
if (vcpu->arch.tcr & TCR_ARE) {
vcpu->arch.dec = vcpu->arch.decar;
kvmppc_emulate_dec(vcpu);
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 19d4755..02a6e2d 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -658,7 +658,6 @@ void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
 {
/* Make sure we're not using the vcpu anymore */
hrtimer_cancel(&vcpu->arch.dec_timer);
-   tasklet_kill(&vcpu->arch.tasklet);
 
kvmppc_remove_vcpu_debugfs(vcpu);
 
@@ -684,16 +683,12 @@ int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
return kvmppc_core_pending_dec(vcpu);
 }
 
-/*
- * low level hrtimer wake routine. Because this runs in hardirq context
- * we schedule a tasklet to do the real work.
- */
 enum hrtimer_restart kvmppc_decrementer_wakeup(struct hrtimer *timer)
 {
struct kvm_vcpu *vcpu;
 
vcpu = container_of(timer, struct kvm_vcpu, arch.dec_timer);
-   tasklet_schedule(&vcpu->arch.tasklet);
+   kvmppc_decrementer_func(vcpu);
 
return HRTIMER_NORESTART;
 }
@@ -703,7 +698,6 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
int ret;
 
hrtimer_init(&vcpu->arch.dec_timer, CLOCK_REALTIME, HRTIMER_MODE_ABS);
-   tasklet_init(&vcpu->arch.tasklet, kvmppc_decrementer_func, (ulong)vcpu);
vcpu->arch.dec_timer.function = kvmppc_decrementer_wakeup;
vcpu->arch.dec_expires = ~(u64)0;
 
-- 
1.7.11.7



Re: [Qemu-devel] [question] virtio-blk performance degradation happened with virtio-serial

2014-09-01 Thread Christian Borntraeger
On 01/09/14 16:03, Christian Borntraeger wrote:
> On 01/09/14 15:29, Paolo Bonzini wrote:
>> Il 01/09/2014 15:22, Christian Borntraeger ha scritto:
>>>>> If virtio-blk and virtio-serial share an IRQ, the guest operating system 
>>>>> has to check each virtqueue for activity. Maybe there is some 
>>>>> inefficiency doing that.
>>>>> AFAIK virtio-serial registers 64 virtqueues (on 31 ports + console) even 
>>>>> if everything is unused.

>>>> That could be the case if MSI is disabled.
>>>
>>> Do the windows virtio drivers enable MSIs, in their inf file?
>>
>> It depends on the version of the drivers, but it is a reasonable guess
>> at what differs between Linux and Windows.  Haoyu, can you give us the
>> output of lspci from a Linux guest?
>>
>> Paolo
> 
> Zhang Haoyu, which virtio drivers did you use?
> 
> I just checked the Fedora virtio driver. The INF file does not contain the 
> MSI enablement as described in
> http://msdn.microsoft.com/en-us/library/windows/hardware/ff544246%28v=vs.85%29.aspx
> That would explain the performance issues - given that the link information 
> is still true.

Sorry, looked at the wrong inf file. The fedora driver does use MSI for serial 
and block.



Re: [Qemu-devel] [question] virtio-blk performance degradation happened with virtio-serial

2014-09-01 Thread Christian Borntraeger
On 01/09/14 15:29, Paolo Bonzini wrote:
> Il 01/09/2014 15:22, Christian Borntraeger ha scritto:
 If virtio-blk and virtio-serial share an IRQ, the guest operating system 
 has to check each virtqueue for activity. Maybe there is some inefficiency 
 doing that.
 AFAIK virtio-serial registers 64 virtqueues (on 31 ports + console) even 
 if everything is unused.
>>>
>>> That could be the case if MSI is disabled.
>>
>> Do the windows virtio drivers enable MSIs, in their inf file?
> 
> It depends on the version of the drivers, but it is a reasonable guess
> at what differs between Linux and Windows.  Haoyu, can you give us the
> output of lspci from a Linux guest?
> 
> Paolo

Zhang Haoyu, which virtio drivers did you use?

I just checked the Fedora virtio driver. The INF file does not contain the MSI 
enablement as described in
http://msdn.microsoft.com/en-us/library/windows/hardware/ff544246%28v=vs.85%29.aspx
That would explain the performance issues - given that the link information is 
still true.
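For reference, the MSI enablement described on the linked MSDN page is done with "Interrupt Management" registry keys in the device's INF. A minimal sketch (the section names are hypothetical; the registry paths and value names follow the MSDN documentation):

```ini
; Hypothetical install section for a virtio device driver
[VirtioSerial_Device.NT.HW]
AddReg = VirtioSerial_MSI_AddReg

[VirtioSerial_MSI_AddReg]
HKR, "Interrupt Management", 0x00000010
HKR, "Interrupt Management\MessageSignaledInterruptProperties", 0x00000010
; MSISupported = 1 opts the device into MSI/MSI-X
HKR, "Interrupt Management\MessageSignaledInterruptProperties", MSISupported, 0x00010001, 1
; Optional cap on the number of message interrupts requested
HKR, "Interrupt Management\MessageSignaledInterruptProperties", MessageNumberLimit, 0x00010001, 4
```

Without these keys Windows falls back to a line-based (shared) interrupt for the device.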



Christian







Re: [Qemu-devel] [question] virtio-blk performance degradation happened with virtio-serial

2014-09-01 Thread Paolo Bonzini
Il 01/09/2014 15:22, Christian Borntraeger ha scritto:
> > > If virtio-blk and virtio-serial share an IRQ, the guest operating system 
> > > has to check each virtqueue for activity. Maybe there is some 
> > > inefficiency doing that.
> > > AFAIK virtio-serial registers 64 virtqueues (on 31 ports + console) even 
> > > if everything is unused.
> > 
> > That could be the case if MSI is disabled.
> 
> Do the windows virtio drivers enable MSIs, in their inf file?

It depends on the version of the drivers, but it is a reasonable guess
at what differs between Linux and Windows.  Haoyu, can you give us the
output of lspci from a Linux guest?

Paolo


Re: [Qemu-devel] [question] virtio-blk performance degradation happened with virtio-serial

2014-09-01 Thread Christian Borntraeger
On 01/09/14 15:12, Paolo Bonzini wrote:
> Il 01/09/2014 15:09, Christian Borntraeger ha scritto:
>> This is just wild guessing:
>> If virtio-blk and virtio-serial share an IRQ, the guest operating system has 
>> to check each virtqueue for activity. Maybe there is some inefficiency doing 
>> that.
>> AFAIK virtio-serial registers 64 virtqueues (on 31 ports + console) even if 
>> everything is unused.
> 
> That could be the case if MSI is disabled.
> 
> Paolo
> 

Do the windows virtio drivers enable MSIs, in their inf file?

Christian



[Bug 82211] Cannot boot Xen under KVM with X2APIC enabled

2014-09-01 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=82211

Paolo Bonzini  changed:

   What|Removed |Added

 Status|NEW |NEEDINFO

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


Re: [Qemu-devel] [question] virtio-blk performance degradation happened with virtio-serial

2014-09-01 Thread Paolo Bonzini
Il 01/09/2014 15:09, Christian Borntraeger ha scritto:
> This is just wild guessing:
> If virtio-blk and virtio-serial share an IRQ, the guest operating system has 
> to check each virtqueue for activity. Maybe there is some inefficiency doing 
> that.
> AFAIK virtio-serial registers 64 virtqueues (on 31 ports + console) even if 
> everything is unused.

That could be the case if MSI is disabled.

Paolo


Re: [Qemu-devel] [question] virtio-blk performance degradation happened with virtio-serial

2014-09-01 Thread Christian Borntraeger
On 01/09/14 14:52, Zhang Haoyu wrote:
 Hi, all

 I start a VM with virtio-serial (default ports number: 31), and found that 
 virtio-blk performance degradation happened, about 25%, this problem can 
 be reproduced 100%.
 without virtio-serial:
 4k-read-random 1186 IOPS
 with virtio-serial:
 4k-read-random 871 IOPS

 but if use max_ports=2 option to limit the max number of virio-serial 
 ports, then the IO performance degradation is not so serious, about 5%.

 And, ide performance degradation does not happen with virtio-serial.
>>>
>>> Pretty sure it's related to MSI vectors in use.  It's possible that
>>> the virtio-serial device takes up all the avl vectors in the guests,
>>> leaving old-style irqs for the virtio-blk device.
>>>
>> I don't think so,
>> I use iometer to test 64k-read(or write)-sequence case, if I disable the 
>> virtio-serial dynamically via device manager->virtio-serial => disable,
>> then the performance get promotion about 25% immediately, then I re-enable 
>> the virtio-serial via device manager->virtio-serial => enable,
>> the performance got back again, very obvious.
> add comments:
> Although the virtio-serial is enabled, I don't use it at all, the degradation 
> still happened.

This is just wild guessing:
If virtio-blk and virtio-serial share an IRQ, the guest operating system has to 
check each virtqueue for activity. Maybe there is some inefficiency doing that.
AFAIK virtio-serial registers 64 virtqueues (on 31 ports + console) even if 
everything is unused.
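The guessed overhead can be modelled in a few lines of C (a toy model, not guest driver source): with a shared line-based interrupt there is no per-queue vector, so the handler has to inspect every virtqueue sharing the line, including queues that are never used.

```c
#include <stdbool.h>
#include <stddef.h>

#define SERIAL_VQS 64  /* virtio-serial: 64 virtqueues for 31 ports + console */
#define BLK_VQS     1  /* virtio-blk: a single request queue in this model */

struct vq {
    bool has_work;
};

/* Returns how many virtqueues had to be inspected for one interrupt. */
static int shared_irq_handler(struct vq *vqs, size_t n)
{
    int inspected = 0;

    for (size_t i = 0; i < n; i++) {
        inspected++;
        if (vqs[i].has_work)
            vqs[i].has_work = false; /* process and acknowledge this queue */
    }
    return inspected;
}
```

With per-queue MSI-X vectors only the queue that fired is serviced; the max_ports=2 result reported above fits this model, since fewer ports mean far fewer queues to scan per interrupt.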

Christian




[Bug 82211] Cannot boot Xen under KVM with X2APIC enabled

2014-09-01 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=82211

Paolo Bonzini  changed:

   What|Removed |Added

Summary|[BISECTED][Nested xen on|Cannot boot Xen under KVM
   |kvm] L1 guest panic and |with X2APIC enabled
   |reboot when L1 guest boot   |
   |up. |

--- Comment #4 from Paolo Bonzini  ---
... but couldn't reproduce the bisection results.  It fails for me in all three
of 3.16, 3.17 and RHEL6.

Maybe the bisection result is specific to a particular KVM module parameter,
for example enable_apicv=1?

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


[RFC v2 4/9] VFIO: platform: handler tests whether the IRQ is forwarded

2014-09-01 Thread Eric Auger
In case the IRQ is forwarded, the VFIO platform IRQ handler no longer
needs to disable the IRQ. In that mode, when the handler completes,
the IRQ is not deactivated; only its priority is lowered.

Some other actor (typically a guest) is expected to deactivate the IRQ,
allowing a new physical IRQ to hit at that point.

In the virtualization use case, the physical IRQ is automatically completed
by the interrupt controller when the guest completes the corresponding
virtual IRQ.
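The handler's new decision can be modelled in plain C (a hypothetical model, not the driver source; the flag value is illustrative): mask the IRQ only if it is automasked *and* not forwarded, since a forwarded IRQ stays enabled while the GIC holds it active until the guest completes the virtual IRQ.

```c
#include <stdbool.h>

#define VFIO_IRQ_INFO_AUTOMASKED (1u << 0) /* illustrative value only */

/* Should the VFIO handler disable (mask) the IRQ after handling it? */
static bool handler_should_mask(unsigned int flags, bool is_forwarded)
{
    return (flags & VFIO_IRQ_INFO_AUTOMASKED) && !is_forwarded;
}
```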

Signed-off-by: Eric Auger 
---
 drivers/vfio/platform/vfio_platform_irq.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/platform/vfio_platform_irq.c b/drivers/vfio/platform/vfio_platform_irq.c
index 6768508..1f851b2 100644
--- a/drivers/vfio/platform/vfio_platform_irq.c
+++ b/drivers/vfio/platform/vfio_platform_irq.c
@@ -88,13 +88,18 @@ static irqreturn_t vfio_irq_handler(int irq, void *dev_id)
struct vfio_platform_irq *irq_ctx = dev_id;
unsigned long flags;
int ret = IRQ_NONE;
+   struct irq_data *d;
+   bool is_forwarded;
 
spin_lock_irqsave(&irq_ctx->lock, flags);
 
if (!irq_ctx->masked) {
ret = IRQ_HANDLED;
+   d = irq_get_irq_data(irq_ctx->hwirq);
+   is_forwarded = irqd_irq_forwarded(d);
 
-   if (irq_ctx->flags & VFIO_IRQ_INFO_AUTOMASKED) {
+   if (irq_ctx->flags & VFIO_IRQ_INFO_AUTOMASKED &&
+   !is_forwarded) {
disable_irq_nosync(irq_ctx->hwirq);
irq_ctx->masked = true;
}
-- 
1.9.1



[RFC v2 1/9] KVM: ARM: VGIC: fix multiple injection of level sensitive forwarded IRQ

2014-09-01 Thread Eric Auger
Fix multiple injection of level-sensitive forwarded IRQs.
With the current code, the second injection fails since the state bitmaps
are not reset (process_maintenance is not called anymore).
The new implementation fully bypasses the vgic state management for
forwarded IRQs (the checks are skipped in vgic_update_irq_pending).
This obviously assumes the forwarded IRQ is injected from the kernel side.
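The intended behaviour can be sketched in a few lines of plain C (a hypothetical model, not the vgic code): the "already queued" check that normally swallows a second level-sensitive injection is skipped when the IRQ is forwarded, because the GIC, not the vgic state machine, now tracks completion.

```c
#include <stdbool.h>

/* Model: should a level-sensitive injection be accepted? */
static bool accept_injection(bool already_queued, bool is_forwarded)
{
    /* Normal level IRQ: a second injection while still queued is dropped. */
    if (already_queued && !is_forwarded)
        return false;
    /* Forwarded IRQ: the vgic state checks are bypassed entirely. */
    return true;
}
```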

Signed-off-by: Eric Auger 

---

An attempt was made to reset the state in __kvm_vgic_sync_hwstate by
checking the emptied LRs of forwarded IRQs. Surprisingly, however, this
solution does not seem to work: at times a new forwarded IRQ injection is
observed while the LR of the previous instance was not yet seen as empty.

v1 -> v2:
- fix vgic state bypass in vgic_queue_hwirq
---
 virt/kvm/arm/vgic.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 0007300..8ef495b 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1259,7 +1259,9 @@ static bool vgic_queue_sgi(struct kvm_vcpu *vcpu, int irq)
 
 static bool vgic_queue_hwirq(struct kvm_vcpu *vcpu, int irq)
 {
-   if (vgic_irq_is_queued(vcpu, irq))
+   bool is_forwarded =  (vgic_get_phys_irq(vcpu, irq) > 0);
+
+   if (vgic_irq_is_queued(vcpu, irq) && !is_forwarded)
return true; /* level interrupt, already queued */
 
if (vgic_queue_irq(vcpu, 0, irq)) {
@@ -1517,14 +1519,18 @@ static bool vgic_update_irq_pending(struct kvm *kvm, int cpuid,
int edge_triggered, level_triggered;
int enabled;
bool ret = true;
+   bool is_forwarded;
 
spin_lock(&dist->lock);
 
vcpu = kvm_get_vcpu(kvm, cpuid);
+   is_forwarded = (vgic_get_phys_irq(vcpu, irq_num) > 0);
+
edge_triggered = vgic_irq_is_edge(vcpu, irq_num);
level_triggered = !edge_triggered;
 
-   if (!vgic_validate_injection(vcpu, irq_num, level)) {
+   if (!is_forwarded &&
+   !vgic_validate_injection(vcpu, irq_num, level)) {
ret = false;
goto out;
}
@@ -1557,7 +1563,8 @@ static bool vgic_update_irq_pending(struct kvm *kvm, int cpuid,
goto out;
}
 
-   if (level_triggered && vgic_irq_is_queued(vcpu, irq_num)) {
+   if (!is_forwarded &&
+   level_triggered && vgic_irq_is_queued(vcpu, irq_num)) {
/*
 * Level interrupt in progress, will be picked up
 * when EOId.
-- 
1.9.1



[RFC v2 3/9] ARM: KVM: Enable the KVM-VFIO device

2014-09-01 Thread Eric Auger
From: Kim Phillips 

Used by KVM-enabled VFIO-based device passthrough support in QEMU.

Signed-off-by: Kim Phillips 
---
 arch/arm/kvm/Kconfig  | 1 +
 arch/arm/kvm/Makefile | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index e519a40..aace254 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -24,6 +24,7 @@ config KVM
select KVM_MMIO
select KVM_ARM_HOST
depends on ARM_VIRT_EXT && ARM_LPAE
+   select KVM_VFIO
select HAVE_KVM_EVENTFD
---help---
  Support hosting virtualized guest machines. You will also
diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
index 859db09..ea1fa76 100644
--- a/arch/arm/kvm/Makefile
+++ b/arch/arm/kvm/Makefile
@@ -15,7 +15,7 @@ AFLAGS_init.o := -Wa,-march=armv7-a$(plus_virt)
 AFLAGS_interrupts.o := -Wa,-march=armv7-a$(plus_virt)
 
 KVM := ../../../virt/kvm
-kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o
+kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o $(KVM)/vfio.o
 
 obj-y += kvm-arm.o init.o interrupts.o
 obj-y += arm.o handle_exit.o guest.o mmu.o emulate.o reset.o
-- 
1.9.1



[RFC v2 2/9] KVM: ARM: VGIC: add forwarded irq rbtree lock

2014-09-01 Thread Eric Auger
Add a lock protecting the rb-tree manipulation. The rb tree can be
searched in one thread (the irqfd handler, for instance) while map/unmap
happens in another.
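The locking discipline the patch introduces can be sketched in userspace C (a hypothetical flat-array map standing in for the rb tree, a pthread mutex standing in for the spinlock): every search and every mutation takes the lock, and every early-exit path releases it before returning.

```c
#include <errno.h>
#include <pthread.h>

#define MAP_SIZE 64

struct irq_map_entry { int virt_irq; int phys_irq; };

static struct irq_map_entry irq_map[MAP_SIZE];
static int irq_map_len;
static pthread_mutex_t irq_map_lock = PTHREAD_MUTEX_INITIALIZER;

/* Map virt_irq -> phys_irq; -EEXIST if already mapped, -ENOMEM if full. */
static int map_phys_irq(int virt_irq, int phys_irq)
{
    pthread_mutex_lock(&irq_map_lock);
    for (int i = 0; i < irq_map_len; i++) {
        if (irq_map[i].virt_irq == virt_irq) {
            pthread_mutex_unlock(&irq_map_lock); /* early exit must unlock */
            return -EEXIST;
        }
    }
    if (irq_map_len == MAP_SIZE) {
        pthread_mutex_unlock(&irq_map_lock);
        return -ENOMEM;
    }
    irq_map[irq_map_len].virt_irq = virt_irq;
    irq_map[irq_map_len].phys_irq = phys_irq;
    irq_map_len++;
    pthread_mutex_unlock(&irq_map_lock);
    return 0;
}

/* Look up the physical IRQ under the lock; -ENOENT if unmapped. */
static int get_phys_irq(int virt_irq)
{
    int ret = -ENOENT;

    pthread_mutex_lock(&irq_map_lock);
    for (int i = 0; i < irq_map_len; i++) {
        if (irq_map[i].virt_irq == virt_irq) {
            ret = irq_map[i].phys_irq;
            break;
        }
    }
    pthread_mutex_unlock(&irq_map_lock);
    return ret;
}
```

The single exit with a stored return value in get_phys_irq mirrors the shape of vgic_get_phys_irq after the patch.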

Signed-off-by: Eric Auger 
---
 include/kvm/arm_vgic.h |  1 +
 virt/kvm/arm/vgic.c| 46 +-
 2 files changed, 38 insertions(+), 9 deletions(-)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 743020f..3da244f 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -177,6 +177,7 @@ struct vgic_dist {
unsigned long   irq_pending_on_cpu;
 
struct rb_root  irq_phys_map;
+   spinlock_t  rb_tree_lock;
 #endif
 };
 
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 8ef495b..dbc2a5a 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1630,9 +1630,15 @@ static struct rb_root *vgic_get_irq_phys_map(struct 
kvm_vcpu *vcpu,
 
 int vgic_map_phys_irq(struct kvm_vcpu *vcpu, int virt_irq, int phys_irq)
 {
-   struct rb_root *root = vgic_get_irq_phys_map(vcpu, virt_irq);
-   struct rb_node **new = &root->rb_node, *parent = NULL;
+   struct rb_root *root;
+   struct rb_node **new, *parent = NULL;
struct irq_phys_map *new_map;
+   struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
+
+   spin_lock(&dist->rb_tree_lock);
+
+   root = vgic_get_irq_phys_map(vcpu, virt_irq);
+   new = &root->rb_node;
 
/* Boilerplate rb_tree code */
while (*new) {
@@ -1644,13 +1650,17 @@ int vgic_map_phys_irq(struct kvm_vcpu *vcpu, int virt_irq, int phys_irq)
new = &(*new)->rb_left;
else if (this->virt_irq > virt_irq)
new = &(*new)->rb_right;
-   else
+   else {
+   spin_unlock(&dist->rb_tree_lock);
return -EEXIST;
+   }
}
 
new_map = kzalloc(sizeof(*new_map), GFP_KERNEL);
-   if (!new_map)
+   if (!new_map) {
+   spin_unlock(&dist->rb_tree_lock);
return -ENOMEM;
+   }
 
new_map->virt_irq = virt_irq;
new_map->phys_irq = phys_irq;
@@ -1658,6 +1668,8 @@ int vgic_map_phys_irq(struct kvm_vcpu *vcpu, int virt_irq, int phys_irq)
rb_link_node(&new_map->node, parent, new);
rb_insert_color(&new_map->node, root);
 
+   spin_unlock(&dist->rb_tree_lock);
+
return 0;
 }
 
@@ -1685,24 +1697,39 @@ static struct irq_phys_map *vgic_irq_map_search(struct kvm_vcpu *vcpu,
 
 int vgic_get_phys_irq(struct kvm_vcpu *vcpu, int virt_irq)
 {
-   struct irq_phys_map *map = vgic_irq_map_search(vcpu, virt_irq);
+   struct irq_phys_map *map;
+   struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
+   int ret;
+
+   spin_lock(&dist->rb_tree_lock);
+   map = vgic_irq_map_search(vcpu, virt_irq);
 
if (map)
-   return map->phys_irq;
+   ret = map->phys_irq;
+   else
+   ret =  -ENOENT;
+
+   spin_unlock(&dist->rb_tree_lock);
+   return ret;
 
-   return -ENOENT;
 }
 
 int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, int virt_irq, int phys_irq)
 {
-   struct irq_phys_map *map = vgic_irq_map_search(vcpu, virt_irq);
+   struct irq_phys_map *map;
+   struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
+
+   spin_lock(&dist->rb_tree_lock);
+
+   map = vgic_irq_map_search(vcpu, virt_irq);
 
if (map && map->phys_irq == phys_irq) {
rb_erase(&map->node, vgic_get_irq_phys_map(vcpu, virt_irq));
kfree(map);
+   spin_unlock(&dist->rb_tree_lock);
return 0;
}
-
+   spin_unlock(&dist->rb_tree_lock);
return -ENOENT;
 }
 
@@ -1898,6 +1925,7 @@ int kvm_vgic_create(struct kvm *kvm)
}
 
spin_lock_init(&kvm->arch.vgic.lock);
+   spin_lock_init(&kvm->arch.vgic.rb_tree_lock);
kvm->arch.vgic.in_kernel = true;
kvm->arch.vgic.vctrl_base = vgic->vctrl_base;
kvm->arch.vgic.vgic_dist_base = VGIC_ADDR_UNDEF;
-- 
1.9.1



[RFC v2 6/9] VFIO: Extend external user API

2014-09-01 Thread Eric Auger
New functions are added, to be called from the ARM KVM-VFIO device.

- vfio_device_get_external_user gets a vfio device from its fd
- vfio_device_put_external_user puts the vfio device
- vfio_external_base_device returns the struct device *, which is
  useful to access the platform_device

Signed-off-by: Eric Auger 

---

v1 -> v2:

- vfio_external_get_base_device renamed into vfio_external_base_device
- vfio_external_get_type removed
---
 drivers/vfio/vfio.c  | 24 
 include/linux/vfio.h |  3 +++
 2 files changed, 27 insertions(+)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 8e84471..282814e 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1401,6 +1401,30 @@ void vfio_group_put_external_user(struct vfio_group *group)
 }
 EXPORT_SYMBOL_GPL(vfio_group_put_external_user);
 
+struct vfio_device *vfio_device_get_external_user(struct file *filep)
+{
+   struct vfio_device *vdev = filep->private_data;
+
+   if (filep->f_op != &vfio_device_fops)
+   return ERR_PTR(-EINVAL);
+
+   vfio_device_get(vdev);
+   return vdev;
+}
+EXPORT_SYMBOL_GPL(vfio_device_get_external_user);
+
+void vfio_device_put_external_user(struct vfio_device *vdev)
+{
+   vfio_device_put(vdev);
+}
+EXPORT_SYMBOL_GPL(vfio_device_put_external_user);
+
+struct device *vfio_external_base_device(struct vfio_device *vdev)
+{
+   return vdev->dev;
+}
+EXPORT_SYMBOL_GPL(vfio_external_base_device);
+
 int vfio_external_user_iommu_id(struct vfio_group *group)
 {
return iommu_group_id(group->iommu_group);
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index ffe04ed..bd4b6cb 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -99,6 +99,9 @@ extern void vfio_group_put_external_user(struct vfio_group *group);
 extern int vfio_external_user_iommu_id(struct vfio_group *group);
 extern long vfio_external_check_extension(struct vfio_group *group,
  unsigned long arg);
+extern struct vfio_device *vfio_device_get_external_user(struct file *filep);
+extern void vfio_device_put_external_user(struct vfio_device *vdev);
+extern struct device *vfio_external_base_device(struct vfio_device *vdev);
 
 struct pci_dev;
 #ifdef CONFIG_EEH
-- 
1.9.1



[RFC v2 5/9] KVM: KVM-VFIO: update user API to program forwarded IRQ

2014-09-01 Thread Eric Auger
Add new device group commands:
- KVM_DEV_VFIO_DEVICE_FORWARD_IRQ and
  KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ

which turn forwarded IRQ mode on/off.

The kvm_arch_forwarded_irq struct describes a forwarded IRQ.

Signed-off-by: Eric Auger 

---

v1 -> v2:
- struct kvm_arch_forwarded_irq moved from arch/arm/include/uapi/asm/kvm.h
  to include/uapi/linux/kvm.h
  also irq_index renamed into index and guest_irq renamed into gsi
- ASSIGN/DEASSIGN renamed into FORWARD/UNFORWARD
---
 Documentation/virtual/kvm/devices/vfio.txt | 26 ++
 include/uapi/linux/kvm.h   |  9 +
 2 files changed, 35 insertions(+)

diff --git a/Documentation/virtual/kvm/devices/vfio.txt b/Documentation/virtual/kvm/devices/vfio.txt
index ef51740..048baa0 100644
--- a/Documentation/virtual/kvm/devices/vfio.txt
+++ b/Documentation/virtual/kvm/devices/vfio.txt
@@ -13,6 +13,7 @@ VFIO-group is held by KVM.
 
 Groups:
   KVM_DEV_VFIO_GROUP
+  KVM_DEV_VFIO_DEVICE
 
 KVM_DEV_VFIO_GROUP attributes:
   KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking
@@ -20,3 +21,28 @@ KVM_DEV_VFIO_GROUP attributes:
 
 For each, kvm_device_attr.addr points to an int32_t file descriptor
 for the VFIO group.
+
+KVM_DEV_VFIO_DEVICE attributes:
+  KVM_DEV_VFIO_DEVICE_FORWARD_IRQ
+  KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ
+
+For each, kvm_device_attr.addr points to a kvm_arch_forwarded_irq struct.
+This user API makes it possible to create a special IRQ handling mode,
+where KVM and a VFIO platform driver collaborate to improve IRQ
+handling performance.
+
+fd represents the file descriptor of a valid VFIO device whose physical
+IRQ, referenced by its index, is injected into the VM guest irq (gsi).
+
+On FORWARD_IRQ, KVM-VFIO device programs:
+- the host, to not complete the physical IRQ itself.
+- the GIC, to automatically complete the physical IRQ when the guest
+  completes the virtual IRQ.
+This avoids trapping the end-of-interrupt for level sensitive IRQ.
+
+On UNFORWARD_IRQ, one returns to the mode where the host completes the
+physical IRQ and the guest completes the virtual IRQ.
+
+It is up to the caller of this API to make sure the IRQ is not
+outstanding when the FORWARD/UNFORWARD is called. This could lead to
+some inconsistency on who is going to complete the IRQ.
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index cf3a2ff..8cd7b0e 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -947,6 +947,12 @@ struct kvm_device_attr {
__u64   addr;   /* userspace address of attr data */
 };
 
+struct kvm_arch_forwarded_irq {
+   __u32 fd; /* file descriptor of the VFIO device */
+   __u32 index; /* VFIO device IRQ index */
+   __u32 gsi; /* gsi, ie. virtual IRQ number */
+};
+
 #define KVM_DEV_TYPE_FSL_MPIC_20   1
 #define KVM_DEV_TYPE_FSL_MPIC_42   2
 #define KVM_DEV_TYPE_XICS  3
@@ -954,6 +960,9 @@ struct kvm_device_attr {
 #define  KVM_DEV_VFIO_GROUP1
 #define   KVM_DEV_VFIO_GROUP_ADD   1
 #define   KVM_DEV_VFIO_GROUP_DEL   2
+#define  KVM_DEV_VFIO_DEVICE   2
+#define   KVM_DEV_VFIO_DEVICE_FORWARD_IRQ  1
+#define   KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ2
 #define KVM_DEV_TYPE_ARM_VGIC_V2   5
 #define KVM_DEV_TYPE_FLIC  6
 
-- 
1.9.1
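To illustrate the proposed API from the userspace side, the attribute could be assembled roughly as follows (a sketch only: the structs and constants are copied from the patch above, not from any released uapi header; the result would then be passed to ioctl(dev_fd, KVM_SET_DEVICE_ATTR, &attr) on the kvm-vfio device fd):

```c
#include <stdint.h>

/* Copied from the patch; not a released uapi. */
#define KVM_DEV_VFIO_DEVICE               2
#define KVM_DEV_VFIO_DEVICE_FORWARD_IRQ   1
#define KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ 2

struct kvm_arch_forwarded_irq {
    uint32_t fd;    /* file descriptor of the VFIO device */
    uint32_t index; /* VFIO device IRQ index */
    uint32_t gsi;   /* virtual IRQ number */
};

/* Mirrors struct kvm_device_attr from include/uapi/linux/kvm.h. */
struct kvm_device_attr {
    uint32_t flags;
    uint32_t group; /* device attribute group */
    uint64_t attr;  /* group attribute id */
    uint64_t addr;  /* userspace address of attr data */
};

/* Build the attr for KVM_SET_DEVICE_ATTR; forward != 0 selects FORWARD_IRQ. */
static struct kvm_device_attr build_forward_attr(struct kvm_arch_forwarded_irq *fwd,
                                                 int forward)
{
    struct kvm_device_attr attr = {
        .group = KVM_DEV_VFIO_DEVICE,
        .attr  = forward ? KVM_DEV_VFIO_DEVICE_FORWARD_IRQ
                         : KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ,
        .addr  = (uint64_t)(uintptr_t)fwd,
    };
    return attr;
}
```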



[RFC v2 0/9] KVM-VFIO IRQ forward control

2014-09-01 Thread Eric Auger
This RFC proposes an integration of "ARM: Forwarding physical
interrupts to a guest VM" (http://lwn.net/Articles/603514/) into
KVM.

It makes it possible to transform a VFIO platform driver IRQ into a
forwarded IRQ. The direct benefit is that, for a level-sensitive IRQ, a
VM switch can be avoided on guest virtual IRQ completion. Before this
patch, a maintenance IRQ was triggered on virtual IRQ completion.

When the IRQ is forwarded, the VFIO platform driver no longer needs to
disable the IRQ. Indeed, when returning from the IRQ handler the IRQ is
not deactivated; only its priority is lowered. This means the same IRQ
cannot hit again before the guest completes the virtual IRQ and the GIC
automatically deactivates the corresponding physical IRQ.

Besides, the injection is still based on irqfd triggering. The only
impact on the irqfd process is that resamplefd is no longer called on
virtual IRQ completion, since the latter becomes "transparent".

The current integration is based on an extension of the KVM-VFIO
device, previously used by KVM to interact with VFIO groups. The
patch series now enables KVM to directly interact with a VFIO
platform device. The VFIO external API was extended for that purpose.

The KVM-VFIO device can get/put the vfio platform device, check its
integrity and type, and get the IRQ number associated with an IRQ index.

The IRQ forward programming is architecture specific (basically,
virtual interrupt controller programming). However, the whole
infrastructure is kept generic.

From a user point of view, the functionality is provided through new
KVM-VFIO device commands, KVM_DEV_VFIO_DEVICE_(UN)FORWARD_IRQ,
and the capability can be checked with KVM_HAS_DEVICE_ATTR.
Assignment can only be changed when the physical IRQ is not active.
It is the responsibility of the user to check this.

This patch series has the following dependencies:
- "ARM: Forwarding physical interrupts to a guest VM"
  (http://lwn.net/Articles/603514/)
- [PATCH v3] irqfd for ARM
- and obviously the VFIO platform driver series:
  [RFC PATCH v6 00/20] VFIO support for platform devices on ARM
  https://www.mail-archive.com/kvm@vger.kernel.org/msg103247.html

Integrated pieces can be found at
ssh://git.linaro.org/people/eric.auger/linux.git
on branch 3.17rc3_irqfd_forward_integ_v2

This was tested on Calxeda Midway, assigning the xgmac main IRQ.

v1 -> v2:
- forward control is moved from architecture specific file into generic
  vfio.c module.
  only kvm_arch_set_fwd_state remains architecture specific
- integrate Kim's patch which enables KVM-VFIO for ARM
- fix vgic state bypass in vgic_queue_hwirq
- struct kvm_arch_forwarded_irq moved from arch/arm/include/uapi/asm/kvm.h
  to include/uapi/linux/kvm.h
  also irq_index renamed into index and guest_irq renamed into gsi
- ASSIGN/DEASSIGN renamed into FORWARD/UNFORWARD
- vfio_external_get_base_device renamed into vfio_external_base_device
- vfio_external_get_type removed
- kvm_vfio_external_get_base_device renamed into kvm_vfio_external_base_device
- __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD

Eric Auger (8):
  KVM: ARM: VGIC: fix multiple injection of level sensitive forwarded
IRQ
  KVM: ARM: VGIC: add forwarded irq rbtree lock
  VFIO: platform: handler tests whether the IRQ is forwarded
  KVM: KVM-VFIO: update user API to program forwarded IRQ
  VFIO: Extend external user API
  KVM: KVM-VFIO: add new VFIO external API hooks
  KVM: KVM-VFIO: generic KVM_DEV_VFIO_DEVICE command and IRQ forwarding
control
  KVM: KVM-VFIO: ARM forwarding control

Kim Phillips (1):
  ARM: KVM: Enable the KVM-VFIO device

 Documentation/virtual/kvm/devices/vfio.txt |  26 ++
 arch/arm/include/asm/kvm_host.h|   7 +
 arch/arm/kvm/Kconfig   |   1 +
 arch/arm/kvm/Makefile  |   4 +-
 arch/arm/kvm/kvm_vfio_arm.c|  85 +
 drivers/vfio/platform/vfio_platform_irq.c  |   7 +-
 drivers/vfio/vfio.c|  24 ++
 include/kvm/arm_vgic.h |   1 +
 include/linux/kvm_host.h   |  27 ++
 include/linux/vfio.h   |   3 +
 include/uapi/linux/kvm.h   |   9 +
 virt/kvm/arm/vgic.c|  59 +++-
 virt/kvm/vfio.c| 497 -
 13 files changed, 733 insertions(+), 17 deletions(-)
 create mode 100644 arch/arm/kvm/kvm_vfio_arm.c

-- 
1.9.1



Re: [Qemu-devel] [question] virtio-blk performance degradation happened with virtio-serial

2014-09-01 Thread Zhang Haoyu
>> >> Hi, all
>> >> 
>> >> I start a VM with virtio-serial (default ports number: 31), and found 
>> >> that virtio-blk performance degradation happened, about 25%, this problem 
>> >> can be reproduced 100%.
>> >> without virtio-serial:
>> >> 4k-read-random 1186 IOPS
>> >> with virtio-serial:
>> >> 4k-read-random 871 IOPS
>> >> 
>> >> but if use max_ports=2 option to limit the max number of virio-serial 
>> >> ports, then the IO performance degradation is not so serious, about 5%.
>> >> 
>> >> And, ide performance degradation does not happen with virtio-serial.
>> >
>> >Pretty sure it's related to MSI vectors in use.  It's possible that
>> >the virtio-serial device takes up all the avl vectors in the guests,
>> >leaving old-style irqs for the virtio-blk device.
>> >
>> I don't think so,
>> I use iometer to test 64k-read(or write)-sequence case, if I disable the 
>> virtio-serial dynamically via device manager->virtio-serial => disable,
>> then the performance get promotion about 25% immediately, then I re-enable 
>> the virtio-serial via device manager->virtio-serial => enable,
>> the performance got back again, very obvious.
>> So, I think it has no business with legacy interrupt mode, right?
>> 
>> I am going to observe the difference of perf top data on qemu and perf kvm 
>> stat data when disable/enable virtio-serial in guest,
>> and the difference of perf top data on guest when disable/enable 
>> virtio-serial in guest,
>> any ideas?
>
>So it's a windows guest; it could be something windows driver
>specific, then?  Do you see the same on Linux guests too?
>
I suspect it is Windows driver specific, too.
I have not tested a Linux guest; I'll test it later.

Thanks,
Zhang Haoyu
>   Amit



[RFC v2 7/9] KVM: KVM-VFIO: add new VFIO external API hooks

2014-09-01 Thread Eric Auger
Add functions that implement the gateway to the extended
external VFIO API:
- kvm_vfio_device_get_external_user
- kvm_vfio_device_put_external_user
- kvm_vfio_external_base_device

Signed-off-by: Eric Auger 

---

v1 -> v2:
- kvm_vfio_external_get_base_device renamed into
  kvm_vfio_external_base_device
- kvm_vfio_external_get_type removed
---
 arch/arm/include/asm/kvm_host.h |  5 +
 virt/kvm/vfio.c | 45 +
 2 files changed, 50 insertions(+)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 6dfb404..1aee6bb 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -171,6 +171,11 @@ void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
 unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu);
 int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices);
 
+struct vfio_device;
+struct vfio_device *kvm_vfio_device_get_external_user(struct file *filep);
+void kvm_vfio_device_put_external_user(struct vfio_device *vdev);
+struct device *kvm_vfio_external_base_device(struct vfio_device *vdev);
+
 /* We do not have shadow page tables, hence the empty hooks */
 static inline int kvm_age_hva(struct kvm *kvm, unsigned long hva)
 {
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index ba1a93f..76dc7a1 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -59,6 +59,51 @@ static void kvm_vfio_group_put_external_user(struct vfio_group *vfio_group)
symbol_put(vfio_group_put_external_user);
 }
 
+struct vfio_device *kvm_vfio_device_get_external_user(struct file *filep)
+{
+   struct vfio_device *vdev;
+   struct vfio_device *(*fn)(struct file *);
+
+   fn = symbol_get(vfio_device_get_external_user);
+   if (!fn)
+   return ERR_PTR(-EINVAL);
+
+   vdev = fn(filep);
+
+   symbol_put(vfio_device_get_external_user);
+
+   return vdev;
+}
+
+void kvm_vfio_device_put_external_user(struct vfio_device *vdev)
+{
+   void (*fn)(struct vfio_device *);
+
+   fn = symbol_get(vfio_device_put_external_user);
+   if (!fn)
+   return;
+
+   fn(vdev);
+
+   symbol_put(vfio_device_put_external_user);
+}
+
+struct device *kvm_vfio_external_base_device(struct vfio_device *vdev)
+{
+   struct device *(*fn)(struct vfio_device *);
+   struct device *dev;
+
+   fn = symbol_get(vfio_external_base_device);
+   if (!fn)
+   return NULL;
+
+   dev = fn(vdev);
+
+   symbol_put(vfio_external_base_device);
+
+   return dev;
+}
+
 static bool kvm_vfio_group_is_coherent(struct vfio_group *vfio_group)
 {
long (*fn)(struct vfio_group *, unsigned long);
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC v2 8/9] KVM: KVM-VFIO: generic KVM_DEV_VFIO_DEVICE command and IRQ forwarding control

2014-09-01 Thread Eric Auger
This patch introduces a new KVM_DEV_VFIO_DEVICE attribute.

This is a new control channel which enables KVM to cooperate with
viable VFIO devices.

The kvm-vfio device now holds a list of devices (kvm_vfio_device)
in addition to a list of groups (kvm_vfio_group). The new
infrastructure makes it possible to check the validity of the VFIO
device file descriptor and to get and hold a reference to it.

The first concrete implemented command is IRQ forward control:
KVM_DEV_VFIO_DEVICE_FORWARD_IRQ, KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ.

It consists of programming the VFIO driver and KVM in a consistent
manner so that an optimized IRQ injection/completion path is set up.
Each kvm_vfio_device holds a list of forwarded IRQs. When a
kvm_vfio_device is put, the implementation makes sure the forwarded
IRQs are returned to the normal handling state (non-forwarded).

The forwarding programming is architecture specific, embodied by the
kvm_arch_set_fwd_state function. Its implementation is given in a
separate patch file.

The forwarding control modality is enabled by the
__KVM_HAVE_ARCH_KVM_VFIO_FORWARD define.

Signed-off-by: Eric Auger 

---

v1 -> v2:
- __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
- original patch file separated into 2 parts: generic part moved in vfio.c
  and ARM specific part(kvm_arch_set_fwd_state)
---
 include/linux/kvm_host.h |  27 +++
 virt/kvm/vfio.c  | 452 ++-
 2 files changed, 477 insertions(+), 2 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a4c33b3..24350dc 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1065,6 +1065,21 @@ struct kvm_device_ops {
  unsigned long arg);
 };
 
+enum kvm_fwd_irq_action {
+   KVM_VFIO_IRQ_SET_FORWARD,
+   KVM_VFIO_IRQ_SET_NORMAL,
+   KVM_VFIO_IRQ_CLEANUP,
+};
+
+/* internal structure describing a forwarded IRQ */
+struct kvm_fwd_irq {
+   struct list_head link;
+   __u32 index; /* platform device irq index */
+   __u32 hwirq; /* physical IRQ */
+   __u32 gsi; /* virtual IRQ */
+   struct kvm_vcpu *vcpu; /* vcpu to inject into */
+};
+
 void kvm_device_get(struct kvm_device *dev);
 void kvm_device_put(struct kvm_device *dev);
 struct kvm_device *kvm_device_from_filp(struct file *filp);
@@ -1075,6 +1090,18 @@ extern struct kvm_device_ops kvm_vfio_ops;
 extern struct kvm_device_ops kvm_arm_vgic_v2_ops;
 extern struct kvm_device_ops kvm_flic_ops;
 
+#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
+int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd,
+  enum kvm_fwd_irq_action action);
+
+#else
+static inline int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd,
+enum kvm_fwd_irq_action action)
+{
+   return 0;
+}
+#endif
+
 #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
 
 static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 76dc7a1..e4a81c4 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -18,14 +18,24 @@
 #include 
 #include 
 #include 
+#include 
 
 struct kvm_vfio_group {
struct list_head node;
struct vfio_group *vfio_group;
 };
 
+struct kvm_vfio_device {
+   struct list_head node;
+   struct vfio_device *vfio_device;
+   /* list of forwarded IRQs for that VFIO device */
+   struct list_head fwd_irq_list;
+   int fd;
+};
+
 struct kvm_vfio {
struct list_head group_list;
+   struct list_head device_list;
struct mutex lock;
bool noncoherent;
 };
@@ -246,12 +256,441 @@ static int kvm_vfio_set_group(struct kvm_device *dev, 
long attr, u64 arg)
return -ENXIO;
 }
 
+/**
+ * kvm_vfio_get_vfio_device - return the vfio_device corresponding to this fd
+ * @fd: fd of the vfio platform device
+ *
+ * Checks that the fd refers to a VFIO device and
+ * increments its reference counter.
+ */
+static struct vfio_device *kvm_vfio_get_vfio_device(int fd)
+{
+   struct fd f;
+   struct vfio_device *vdev;
+
+   f = fdget(fd);
+   if (!f.file)
+   return NULL;
+   vdev = kvm_vfio_device_get_external_user(f.file);
+   fdput(f);
+   return vdev;
+}
+
+/**
+ * kvm_vfio_put_vfio_device - put the vfio platform device
+ * @vdev: vfio_device to put
+ *
+ * Decrements its reference counter.
+ */
+static void kvm_vfio_put_vfio_device(struct vfio_device *vdev)
+{
+   kvm_vfio_device_put_external_user(vdev);
+}
+
+/**
+ * kvm_vfio_find_device - look for the device in the assigned
+ * device list
+ * @kv: the kvm-vfio device
+ * @vdev: the vfio_device to look for
+ *
+ * Returns the associated kvm_vfio_device if the device is known,
+ * meaning at least one IRQ is forwarded for this device.
+ * If the device is not registered, returns NULL.
+ */
+struct kvm_vfio_device *kvm_vfio_find_device(struct kvm_vfio *kv,
+struct vfio_device *vdev)
+{
+   struct kvm_vfio_device *kvm_vdev_iter;
+
+ 

[RFC v2 9/9] KVM: KVM-VFIO: ARM forwarding control

2014-09-01 Thread Eric Auger
Enables forwarding control for ARM. By defining
__KVM_HAVE_ARCH_KVM_VFIO_FORWARD the patch enables
KVM_DEV_VFIO_DEVICE_FORWARD/UNFORWARD_IRQ command on ARM. As a
result it brings an optimized injection/completion handling for
forwarded IRQ. The ARM specific part is implemented in a new module,
kvm_vfio_arm.c

Signed-off-by: Eric Auger 
---
 arch/arm/include/asm/kvm_host.h |  2 +
 arch/arm/kvm/Makefile   |  2 +-
 arch/arm/kvm/kvm_vfio_arm.c | 85 +
 3 files changed, 88 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm/kvm/kvm_vfio_arm.c

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 1aee6bb..dfd3b05 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -25,6 +25,8 @@
 #include 
 #include 
 
+#define __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
+
 #if defined(CONFIG_KVM_ARM_MAX_VCPUS)
 #define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS
 #else
diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
index ea1fa76..26a5a42 100644
--- a/arch/arm/kvm/Makefile
+++ b/arch/arm/kvm/Makefile
@@ -19,7 +19,7 @@ kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o 
$(KVM)/eventfd.o $(KVM)/vf
 
 obj-y += kvm-arm.o init.o interrupts.o
 obj-y += arm.o handle_exit.o guest.o mmu.o emulate.o reset.o
-obj-y += coproc.o coproc_a15.o coproc_a7.o mmio.o psci.o perf.o
+obj-y += coproc.o coproc_a15.o coproc_a7.o mmio.o psci.o perf.o kvm_vfio_arm.o
 obj-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic.o
 obj-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic-v2.o
 obj-$(CONFIG_KVM_ARM_TIMER) += $(KVM)/arm/arch_timer.o
diff --git a/arch/arm/kvm/kvm_vfio_arm.c b/arch/arm/kvm/kvm_vfio_arm.c
new file mode 100644
index 000..0d316b1
--- /dev/null
+++ b/arch/arm/kvm/kvm_vfio_arm.c
@@ -0,0 +1,85 @@
+/*
+ * Copyright (C) 2014 Linaro Ltd.
+ * Authors: Eric Auger 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/**
+ * kvm_arch_set_fwd_state - change the forwarded state of an IRQ
+ * @pfwd: the forwarded irq struct
+ * @action: action to perform (set forward, set back normal, cleanup)
+ *
+ * programs the GIC and VGIC
+ * returns the VGIC map/unmap return status
+ * It is the responsibility of the caller to make sure the physical IRQ
+ * is not active. There is a critical section between the start of the
+ * VFIO IRQ handler and LR programming.
+ */
+int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd,
+  enum kvm_fwd_irq_action action)
+{
+   int ret;
+   struct irq_desc *desc = irq_to_desc(pfwd->hwirq);
+   struct irq_data *d = &desc->irq_data;
+   struct irq_chip *chip = desc->irq_data.chip;
+
+   disable_irq(pfwd->hwirq);
+   /* no fwd state change can happen if the IRQ is in progress */
+   if (irqd_irq_inprogress(d)) {
+   kvm_err("%s cannot change fwd state (IRQ %d in progress)\n",
+   __func__, pfwd->hwirq);
+   enable_irq(pfwd->hwirq);
+   return -1;
+   }
+
+   if (action == KVM_VFIO_IRQ_SET_FORWARD) {
+   irqd_set_irq_forwarded(d);
+   ret = vgic_map_phys_irq(pfwd->vcpu,
+   pfwd->gsi + VGIC_NR_PRIVATE_IRQS,
+   pfwd->hwirq);
+   } else if (action == KVM_VFIO_IRQ_SET_NORMAL) {
+   irqd_clr_irq_forwarded(d);
+   ret = vgic_unmap_phys_irq(pfwd->vcpu,
+ pfwd->gsi +
+   VGIC_NR_PRIVATE_IRQS,
+ pfwd->hwirq);
+   } else if (action == KVM_VFIO_IRQ_CLEANUP) {
+   irqd_clr_irq_forwarded(d);
+   /*
+* In case the guest did not complete the
+* virtual IRQ, complete it here.
+* When cleanup is called, the VCPUs have already
+* been freed, so do not manipulate the VGIC.
+*/
+   chip->irq_eoi(d);
+   ret = 0;
+   } else {
+   enable_irq(pfwd->hwirq);
+   ret = -EINVAL;
+   }
+
+   enable_irq(pfwd->hwirq);
+   return ret;
+}
-- 
1.9.1



Re: [Qemu-devel] [question] virtio-blk performance degradation happened with virito-serial

2014-09-01 Thread Zhang Haoyu
>>> Hi, all
>>> 
>>> I start a VM with virtio-serial (default ports number: 31), and found that 
>>> virtio-blk performance degradation happened, about 25%, this problem can be 
>>> reproduced 100%.
>>> without virtio-serial:
>>> 4k-read-random 1186 IOPS
>>> with virtio-serial:
>>> 4k-read-random 871 IOPS
>>> 
>>> but if use max_ports=2 option to limit the max number of virio-serial 
>>> ports, then the IO performance degradation is not so serious, about 5%.
>>> 
>>> And, ide performance degradation does not happen with virtio-serial.
>>
>>Pretty sure it's related to MSI vectors in use.  It's possible that
>>the virtio-serial device takes up all the avl vectors in the guests,
>>leaving old-style irqs for the virtio-blk device.
>>
>I don't think so,
>I use iometer to test 64k-read(or write)-sequence case, if I disable the 
>virtio-serial dynamically via device manager->virtio-serial => disable,
>then the performance get promotion about 25% immediately, then I re-enable the 
>virtio-serial via device manager->virtio-serial => enable,
>the performance got back again, very obvious.
An additional note:
although virtio-serial is enabled, I don't use it at all, yet the degradation
still happens.

>So, I think it has no business with legacy interrupt mode, right?
>
>I am going to observe the difference of perf top data on qemu and perf kvm 
>stat data when disable/enable virtio-serial in guest,
>and the difference of perf top data on guest when disable/enable virtio-serial 
>in guest,
>any ideas?
>
>Thanks,
>Zhang Haoyu
>>If you restrict the number of vectors the virtio-serial device gets
>>(using the -device virtio-serial-pci,vectors= param), does that make
>>things better for you?
>>
>>
>>  Amit
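
For concreteness, the restriction Amit suggests would be expressed on the QEMU
command line roughly as follows (an untested sketch; the disk image, memory
size, and vector/port counts are placeholders, not values from the thread):

```
qemu-system-x86_64 -m 2048 -enable-kvm \
    -drive file=guest.img,if=virtio \
    -device virtio-serial-pci,vectors=4,max_ports=2
```

With fewer MSI vectors consumed by virtio-serial, the guest should be able to
keep MSI-X for virtio-blk; if the degradation persists with this setup, vector
exhaustion is likely not the cause.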



Re: [Qemu-devel] [question] virtio-blk performance degradation happened with virito-serial

2014-09-01 Thread Amit Shah
On (Mon) 01 Sep 2014 [20:38:20], Zhang Haoyu wrote:
> >> Hi, all
> >> 
> >> I start a VM with virtio-serial (default ports number: 31), and found that 
> >> virtio-blk performance degradation happened, about 25%, this problem can 
> >> be reproduced 100%.
> >> without virtio-serial:
> >> 4k-read-random 1186 IOPS
> >> with virtio-serial:
> >> 4k-read-random 871 IOPS
> >> 
> >> but if use max_ports=2 option to limit the max number of virio-serial 
> >> ports, then the IO performance degradation is not so serious, about 5%.
> >> 
> >> And, ide performance degradation does not happen with virtio-serial.
> >
> >Pretty sure it's related to MSI vectors in use.  It's possible that
> >the virtio-serial device takes up all the avl vectors in the guests,
> >leaving old-style irqs for the virtio-blk device.
> >
> I don't think so,
> I use iometer to test 64k-read(or write)-sequence case, if I disable the 
> virtio-serial dynamically via device manager->virtio-serial => disable,
> then the performance get promotion about 25% immediately, then I re-enable 
> the virtio-serial via device manager->virtio-serial => enable,
> the performance got back again, very obvious.
> So, I think it has no business with legacy interrupt mode, right?
> 
> I am going to observe the difference of perf top data on qemu and perf kvm 
> stat data when disable/enable virtio-serial in guest,
> and the difference of perf top data on guest when disable/enable 
> virtio-serial in guest,
> any ideas?

So it's a windows guest; it could be something windows driver
specific, then?  Do you see the same on Linux guests too?

Amit


[Bug 82211] [BISECTED][Nested xen on kvm] L1 guest panic and reboot when L1 guest boot up.

2014-09-01 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=82211

Paolo Bonzini  changed:

 What           |Removed        |Added
 ------------------------------------------------
 CC             |               |bonz...@gnu.org

--- Comment #3 from Paolo Bonzini  ---
Reproduced.  This is caused by "-cpu host" and, in particular, by x2apic.  This
command line fails:

/usr/libexec/qemu-kvm \
-kernel xen-4.4.0 \
-append 'noreboot loglvl=all com1=115200,8n1 console=com1' \
-serial mon:stdio \
-initrd /boot/vmlinuz-2.6.18-348.el5xen -cpu kvm64,+x2apic

It works with "-cpu kvm64".

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


Re: [Qemu-devel] [question] virtio-blk performance degradation happened with virito-serial

2014-09-01 Thread Zhang Haoyu
>> Hi, all
>> 
>> I start a VM with virtio-serial (default ports number: 31), and found that 
>> virtio-blk performance degradation happened, about 25%, this problem can be 
>> reproduced 100%.
>> without virtio-serial:
>> 4k-read-random 1186 IOPS
>> with virtio-serial:
>> 4k-read-random 871 IOPS
>> 
>> but if use max_ports=2 option to limit the max number of virio-serial ports, 
>> then the IO performance degradation is not so serious, about 5%.
>> 
>> And, ide performance degradation does not happen with virtio-serial.
>
>Pretty sure it's related to MSI vectors in use.  It's possible that
>the virtio-serial device takes up all the avl vectors in the guests,
>leaving old-style irqs for the virtio-blk device.
>
I don't think so.
I use iometer to test the 64k-read (or write) sequence case: if I disable
virtio-serial dynamically via device manager->virtio-serial => disable,
then performance improves by about 25% immediately; when I re-enable
virtio-serial via device manager->virtio-serial => enable,
performance drops back again, very obviously.
So, I think it is unrelated to legacy interrupt mode, right?

I am going to compare perf top data on qemu and perf kvm stat data
with virtio-serial enabled vs. disabled in the guest,
and perf top data inside the guest in both cases.
Any ideas?

Thanks,
Zhang Haoyu
>If you restrict the number of vectors the virtio-serial device gets
>(using the -device virtio-serial-pci,vectors= param), does that make
>things better for you?
>
>
>   Amit



Re: [PATCH] KVM: mmio: cleanup kvm_set_mmio_spte_mask

2014-09-01 Thread Paolo Bonzini
On 01/09/2014 12:44, Tiejun Chen wrote:
> Just reuse rsvd_bits() inside kvm_set_mmio_spte_mask()
> for slightly better code.
> 
> Signed-off-by: Tiejun Chen 
> ---
>  arch/x86/kvm/mmu.c | 5 -
>  arch/x86/kvm/mmu.h | 5 +
>  arch/x86/kvm/x86.c | 2 +-
>  3 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 9314678..ae5a085 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -296,11 +296,6 @@ static bool check_mmio_spte(struct kvm *kvm, u64 spte)
>   return likely(kvm_gen == spte_gen);
>  }
>  
> -static inline u64 rsvd_bits(int s, int e)
> -{
> - return ((1ULL << (e - s + 1)) - 1) << s;
> -}
> -
>  void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
>   u64 dirty_mask, u64 nx_mask, u64 x_mask)
>  {
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index b982112..bde8ee7 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -56,6 +56,11 @@
>  #define PFERR_RSVD_MASK (1U << PFERR_RSVD_BIT)
>  #define PFERR_FETCH_MASK (1U << PFERR_FETCH_BIT)
>  
> +static inline u64 rsvd_bits(int s, int e)
> +{
> + return ((1ULL << (e - s + 1)) - 1) << s;
> +}
> +
>  int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 addr, u64 
> sptes[4]);
>  void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask);
>  
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 8f1e22d..a933d4e 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5545,7 +5545,7 @@ static void kvm_set_mmio_spte_mask(void)
>* entry to generate page fault with PFER.RSV = 1.
>*/
>/* Mask the reserved physical address bits. */
> - mask = ((1ull << (51 - maxphyaddr + 1)) - 1) << maxphyaddr;
> + mask = rsvd_bits(maxphyaddr, 51);
>  
>   /* Bit 62 is always reserved for 32bit host. */
>   mask |= 0x3ull << 62;
> 

Nice, thanks.

Paolo


[PATCH] KVM: mmio: cleanup kvm_set_mmio_spte_mask

2014-09-01 Thread Tiejun Chen
Just reuse rsvd_bits() inside kvm_set_mmio_spte_mask()
for slightly better code.

Signed-off-by: Tiejun Chen 
---
 arch/x86/kvm/mmu.c | 5 -
 arch/x86/kvm/mmu.h | 5 +
 arch/x86/kvm/x86.c | 2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9314678..ae5a085 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -296,11 +296,6 @@ static bool check_mmio_spte(struct kvm *kvm, u64 spte)
return likely(kvm_gen == spte_gen);
 }
 
-static inline u64 rsvd_bits(int s, int e)
-{
-   return ((1ULL << (e - s + 1)) - 1) << s;
-}
-
 void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
u64 dirty_mask, u64 nx_mask, u64 x_mask)
 {
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index b982112..bde8ee7 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -56,6 +56,11 @@
 #define PFERR_RSVD_MASK (1U << PFERR_RSVD_BIT)
 #define PFERR_FETCH_MASK (1U << PFERR_FETCH_BIT)
 
+static inline u64 rsvd_bits(int s, int e)
+{
+   return ((1ULL << (e - s + 1)) - 1) << s;
+}
+
 int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 addr, u64 sptes[4]);
 void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8f1e22d..a933d4e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5545,7 +5545,7 @@ static void kvm_set_mmio_spte_mask(void)
 * entry to generate page fault with PFER.RSV = 1.
 */
 /* Mask the reserved physical address bits. */
-   mask = ((1ull << (51 - maxphyaddr + 1)) - 1) << maxphyaddr;
+   mask = rsvd_bits(maxphyaddr, 51);
 
/* Bit 62 is always reserved for 32bit host. */
mask |= 0x3ull << 62;
-- 
1.9.1



[PATCH] KVM: PPC: Remove shared defines for SPE and AltiVec interrupts

2014-09-01 Thread Mihai Caraman
We currently decide at compile-time which of the SPE or AltiVec units to
support exclusively. Guard kernel defines with CONFIG_SPE_POSSIBLE and
CONFIG_PPC_E500MC and remove shared defines.

Signed-off-by: Mihai Caraman 
---
 arch/powerpc/include/asm/kvm_asm.h | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_asm.h 
b/arch/powerpc/include/asm/kvm_asm.h
index b8901c4..68644c7 100644
--- a/arch/powerpc/include/asm/kvm_asm.h
+++ b/arch/powerpc/include/asm/kvm_asm.h
@@ -53,17 +53,17 @@
 #define BOOKE_INTERRUPT_DEBUG 15
 
 /* E500 */
-#define BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL 32
-#define BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST 33
-/*
- * TODO: Unify 32-bit and 64-bit kernel exception handlers to use same defines
- */
-#define BOOKE_INTERRUPT_SPE_UNAVAIL BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL
-#define BOOKE_INTERRUPT_SPE_FP_DATA BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST
-#define BOOKE_INTERRUPT_ALTIVEC_UNAVAIL BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL
-#define BOOKE_INTERRUPT_ALTIVEC_ASSIST \
-   BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST
+#ifdef CONFIG_SPE_POSSIBLE
+#define BOOKE_INTERRUPT_SPE_UNAVAIL 32
+#define BOOKE_INTERRUPT_SPE_FP_DATA 33
 #define BOOKE_INTERRUPT_SPE_FP_ROUND 34
+#endif
+
+#ifdef CONFIG_PPC_E500MC
+#define BOOKE_INTERRUPT_ALTIVEC_UNAVAIL 32
+#define BOOKE_INTERRUPT_ALTIVEC_ASSIST 33
+#endif
+
 #define BOOKE_INTERRUPT_PERFORMANCE_MONITOR 35
 #define BOOKE_INTERRUPT_DOORBELL 36
 #define BOOKE_INTERRUPT_DOORBELL_CRITICAL 37
-- 
1.7.11.7



Re: [PATCH] KVM: EVENTFD: remove inclusion of irq.h

2014-09-01 Thread Paolo Bonzini
On 01/09/2014 10:36, Eric Auger wrote:
> No longer needed; irq.h would be empty on ARM.
> 
> Signed-off-by: Eric Auger 
> 
> ---
> 
> I don't think irq.h is needed anymore since Paul Mackerras' work. However
> I did not compile for all architectures.
> ---
>  virt/kvm/eventfd.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> index 3c5981c..0c712a7 100644
> --- a/virt/kvm/eventfd.c
> +++ b/virt/kvm/eventfd.c
> @@ -36,7 +36,6 @@
>  #include 
>  #include 
>  
> -#include "irq.h"
>  #include "iodev.h"
>  
>  #ifdef CONFIG_HAVE_KVM_IRQFD
> 

Acked-by: Paolo Bonzini 

Christoffer, please include this via the ARM tree, together with ARM
irqfd support.  Thanks,

Paolo


Re: [PATCH 06/16] KVM: Add KVM_EXIT_SYSTEM_EVENT to user space API header

2014-09-01 Thread Peter Maydell
On 1 September 2014 10:56, Christoffer Dall  wrote:
> The thing is that we're not exposing PSCI to user space, we're just
> exposing a system event, so it feels a bit weird to rely on user space's
> correct interpretation of a more generic API, to correctly implement
> PSCI in the kernel.

Yeah; if somebody wants to argue that the other set of semantics
make more sense considered purely as a KVM kernel-to-user
API I have no objection. (QEMU's current "we'll do that at some
point in the future" implementation follows the typical semantics
for reset/shutdown triggered by a device register write I think:
the write-to-device-register instruction will generally complete
and CPU execution continue before the prodded device can
get the system reset process done. But we don't necessarily
need to be bound by that idea.)

-- PMM


Re: [PATCH 06/16] KVM: Add KVM_EXIT_SYSTEM_EVENT to user space API header

2014-09-01 Thread Christoffer Dall
On Mon, Sep 01, 2014 at 10:30:17AM +0100, Peter Maydell wrote:
> On 1 September 2014 10:20, Christoffer Dall  
> wrote:
> > On Fri, Aug 29, 2014 at 06:39:09PM +0100, Peter Maydell wrote:
> >> Talking with Ard I realised that there's actually a hole in the
> >> specification of this new ABI. Did we intend these shutdown
> >> and reset exits to be:
> >>  (1) requests from the guest for the shutdown/reset to be
> >>scheduled in the near future (and we'll continue to execute
> >>the guest until the shutdown actually happens)
> >>  (2) requests for shutdown/reset right now, with no further
> >>guest instructions to be executed
> >>
> >> ?
> >>
> >> As currently implemented in QEMU we get behaviour (1),
> >> but I think the kernel PSCI implementation assumes
> >> behaviour (2). Who's right?
> >>
> > For the arm/arm64 use of this API (currently the only one?) the host
> > would not break or anything like that if you keep executing the VM, but
> > the guest will expect that no other instructions are executed after this
> > call.
> 
> Well, if we do that then between QEMU and KVM we've
> violated the PSCI ABI we're supposed to provide, so somebody
> is wrong :-)
> 
> I guess that since the kernel already implements "assume
> userspace won't resume the guest vcpu" the path of least
> resistance is to make userspace follow that.

The thing is that we're not exposing PSCI to user space, we're just
exposing a system event, so it feels a bit weird to rely on user space's
correct interpretation of a more generic API, to correctly implement
PSCI in the kernel.  On the other hand, user space can always break the
guest as it sees fit...

-Christoffer


Re: [Qemu-devel] [PATCH v3 2/2] docs: update ivshmem device spec

2014-09-01 Thread David Marchand

On 08/28/2014 11:49 AM, Stefan Hajnoczi wrote:

On Tue, Aug 26, 2014 at 01:04:30PM +0200, Paolo Bonzini wrote:

On 26/08/2014 08:47, David Marchand wrote:


Using a version message supposes we want to keep ivshmem-server and QEMU
separated (for example, in two distribution packages) while we can avoid
this, so why would we do so?

If we want the ivshmem-server to come with QEMU, then both are supposed
to be aligned on your system.


What about upgrading QEMU and ivshmem-server while you have existing
guests?  You cannot restart ivshmem-server, and the new QEMU would have
to talk to the old ivshmem-server.


Version negotiation also helps avoid confusion if someone combines
ivshmem-server and QEMU from different origins (e.g. built from source
and distro packaged).

It's a safeguard to prevent hard-to-diagnose failures when the system is
misconfigured.



Hmm, so you want the code to be defensive against misuse; fair enough.

I wanted to keep modifications to ivshmem as small as possible in a
first phase (all the more so as there are existing ivshmem users out
there who I think would be impacted by a protocol change).


Sending the version as the first "vm_id" with an associated fd of -1,
before sending the real client id, should work with existing QEMU client
code (hw/misc/ivshmem.c).


Do you have a better idea ?
Is there a best practice in QEMU for "version negotiation" that could 
work with ivshmem protocol ?


I have a v4 ready with this (and all the pending comments), I will send 
it later unless a better idea is exposed.



Thanks.

--
David Marchand


KVM call for agenda for 2014-09-02

2014-09-01 Thread Juan Quintela

Hi

Please, send any topic that you are interested in covering.

 Thanks, Juan.

 Call details:

 15:00 CEST
 13:00 UTC
 09:00 EDT

 Every two weeks

By popular demand, a google calendar public entry with it

 
https://www.google.com/calendar/embed?src=dG9iMXRqcXAzN3Y4ZXZwNzRoMHE4a3BqcXNAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ

 (Let me know if you have any problems with the calendar entry)

If you need phone number details,  contact me privately

Thanks, Juan.


Re: [PATCH 06/16] KVM: Add KVM_EXIT_SYSTEM_EVENT to user space API header

2014-09-01 Thread Peter Maydell
On 1 September 2014 10:20, Christoffer Dall  wrote:
> On Fri, Aug 29, 2014 at 06:39:09PM +0100, Peter Maydell wrote:
>> Talking with Ard I realised that there's actually a hole in the
>> specification of this new ABI. Did we intend these shutdown
>> and reset exits to be:
>>  (1) requests from the guest for the shutdown/reset to be
>>scheduled in the near future (and we'll continue to execute
>>the guest until the shutdown actually happens)
>>  (2) requests for shutdown/reset right now, with no further
>>guest instructions to be executed
>>
>> ?
>>
>> As currently implemented in QEMU we get behaviour (1),
>> but I think the kernel PSCI implementation assumes
>> behaviour (2). Who's right?
>>
> For the arm/arm64 use of this API (currently the only one?) the host
> would not break or anything like that if you keep executing the VM, but
> the guest will expect that no other instructions are executed after this
> call.

Well, if we do that then between QEMU and KVM we've
violated the PSCI ABI we're supposed to provide, so somebody
is wrong :-)

I guess that since the kernel already implements "assume
userspace won't resume the guest vcpu" the path of least
resistance is to make userspace follow that.

What does kvmtool do here (if it implements PSCI shutdown
and reset at all)?

thanks
-- PMM


[Bug 81841] amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge

2014-09-01 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=81841

--- Comment #18 from Joerg Roedel  ---
The fix is now upstream and part of Linux v3.17-rc2.



Re: [PATCH 2/2] KVM: remove garbage arg to *hardware_{en,dis}able

2014-09-01 Thread Christoffer Dall
On Thu, Aug 28, 2014 at 03:13:03PM +0200, Radim Krčmář wrote:
> In the beginning was on_each_cpu(), which required an unused argument to
> kvm_arch_ops.hardware_{en,dis}able, but this was soon forgotten.
> 
> Remove unnecessary arguments that stem from this.
> 
> Signed-off-by: Radim Krčmář 

For the arm/arm64 part:

Acked-by: Christoffer Dall 


Re: [PATCH 1/2] KVM: static inline empty kvm_arch functions

2014-09-01 Thread Christoffer Dall
On Thu, Aug 28, 2014 at 03:13:02PM +0200, Radim Krčmář wrote:
> Using static inline is going to save a few bytes and cycles.
> For example on powerpc, the difference is 700 B after stripping.
> (5 kB before)
> 
> This patch also deals with two overlooked empty functions:
> kvm_arch_flush_shadow was not removed from arch/mips/kvm/mips.c
>   2df72e9bc KVM: split kvm_arch_flush_shadow
> and kvm_arch_sched_in never made it into arch/ia64/kvm/kvm-ia64.c.
>   e790d9ef6 KVM: add kvm_arch_sched_in
> 
> Signed-off-by: Radim Krčmář 

For the arm/arm64 part:

Acked-by: Christoffer Dall 


Re: [PATCH 06/16] KVM: Add KVM_EXIT_SYSTEM_EVENT to user space API header

2014-09-01 Thread Christoffer Dall
On Fri, Aug 29, 2014 at 06:39:09PM +0100, Peter Maydell wrote:
> On 25 May 2014 19:18, Christoffer Dall  wrote:
> > From: Anup Patel 
> >
> > Currently, we don't have an exit reason to notify user space about
> > a system-level event (for e.g. system reset or shutdown) triggered
> > by the VCPU. This patch adds exit reason KVM_EXIT_SYSTEM_EVENT for
> > this purpose. We can also inform user space about the 'type' and
> > architecture specific 'flags' of a system-level event using the
> > kvm_run structure.
> >
> > This newly added KVM_EXIT_SYSTEM_EVENT will be used by KVM ARM/ARM64
> > in-kernel PSCI v0.2 support to reset/shutdown VMs.
> 
> > --- a/Documentation/virtual/kvm/api.txt
> > +++ b/Documentation/virtual/kvm/api.txt
> > @@ -2740,6 +2740,21 @@ It gets triggered whenever both KVM_CAP_PPC_EPR are 
> > enabled and an
> >  external interrupt has just been delivered into the guest. User space
> >  should put the acknowledged interrupt vector into the 'epr' field.
> >
> > +   /* KVM_EXIT_SYSTEM_EVENT */
> > +   struct {
> > +#define KVM_SYSTEM_EVENT_SHUTDOWN   1
> > +#define KVM_SYSTEM_EVENT_RESET  2
> > +   __u32 type;
> > +   __u64 flags;
> > +   } system_event;
> > +
> > +If exit_reason is KVM_EXIT_SYSTEM_EVENT then the vcpu has triggered
> > +a system-level event using some architecture specific mechanism (hypercall
> > +or some special instruction). In case of ARM/ARM64, this is triggered using
> > +HVC instruction based PSCI call from the vcpu. The 'type' field describes
> > +the system-level event type. The 'flags' field describes architecture
> > +specific flags for the system-level event.
> 
> Talking with Ard I realised that there's actually a hole in the
> specification of this new ABI. Did we intend these shutdown
> and reset exits to be:
>  (1) requests from the guest for the shutdown/reset to be
>scheduled in the near future (and we'll continue to execute
>the guest until the shutdown actually happens)
>  (2) requests for shutdown/reset right now, with no further
>guest instructions to be executed
> 
> ?
> 
> As currently implemented in QEMU we get behaviour (1),
> but I think the kernel PSCI implementation assumes
> behaviour (2). Who's right?
> 
For the arm/arm64 use of this API (currently the only one?) the host
would not break or anything like that if you keep executing the VM, but
the guest will expect that no other instructions are executed after this
call.

The PSCI spec states that it's the responsibility of the PSCI
implementation (here KVM) to ensure that "all cores are in a known state
with caches cleaned".  I guess we don't need
to worry about the latter, but we could handle the former by pausing all
VCPUs prior to exiting with the SHUTDOWN system event.  In that
scenario, user space could choose to do either (1) or (2), but it gets a
little fishy with a reset if we set the pause flag, because we would
then at least need to specify in this ABI that this happens for
ARM/ARM64 on reset.

We could clarify this ABI to state that user space should not run any
VCPUs after receiving this event, but the above change should probably
be made anyhow, to make sure KVM implements PSCI as fully as it can in
the kernel?

-Christoffer


[PATCH v2 2/2] KVM: PPC: Book3E: Enable e6500 core

2014-09-01 Thread Mihai Caraman
Now that AltiVec and hardware thread support are in place, enable the e6500 core.

Signed-off-by: Mihai Caraman 
---
v2:
 - new patch

 arch/powerpc/kvm/e500mc.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/powerpc/kvm/e500mc.c b/arch/powerpc/kvm/e500mc.c
index bf8f99f..2fdc872 100644
--- a/arch/powerpc/kvm/e500mc.c
+++ b/arch/powerpc/kvm/e500mc.c
@@ -180,6 +180,16 @@ int kvmppc_core_check_processor_compat(void)
r = 0;
else if (strcmp(cur_cpu_spec->cpu_name, "e5500") == 0)
r = 0;
+#ifdef CONFIG_ALTIVEC
+   /*
+* Since guests have the privilege to enable AltiVec, we need AltiVec
+* support in the host to save/restore their context.
+* Don't use CPU_FTR_ALTIVEC to identify cores with AltiVec unit
+* because it's cleared in the absence of CONFIG_ALTIVEC!
+*/
+   else if (strcmp(cur_cpu_spec->cpu_name, "e6500") == 0)
+   r = 0;
+#endif
else
r = -ENOTSUPP;
 
-- 
1.7.11.7



[PATCH v2 1/2] KVM: PPC: e500mc: Add support for single threaded vcpus on e6500 core

2014-09-01 Thread Mihai Caraman
ePAPR represents hardware threads as cpu node properties in device tree.
So with existing QEMU, hardware threads are simply exposed as vcpus with
one hardware thread.

The e6500 core shares TLBs between hardware threads. Without a tlb write
conditional instruction, the Linux kernel uses per-core mechanisms to
protect against duplicate TLB entries.

The guest is unable to detect real sibling threads, so it can't use the
TLB protection mechanism. An alternative solution is to use the hypervisor
to allocate different lpids to a guest's vcpus that run simultaneously on
real sibling threads. On systems with two threads per core, this patch
halves the size of the lpid pool that the allocator sees and uses two
lpids per VM. Even numbers are used to speed up vcpu lpid computation with
consecutive lpids per VM: vm1 will use lpids 2 and 3, vm2 lpids 4 and 5,
and so on.

Signed-off-by: Mihai Caraman 
---
v2:
 - halve the size of the lpid pool that the allocator sees to get rid of
   ifdefs in the headers and to have lpids correlated.

 arch/powerpc/include/asm/kvm_booke.h |  5 +++-
 arch/powerpc/kvm/e500.h  | 20 
 arch/powerpc/kvm/e500_mmu_host.c | 18 +++---
 arch/powerpc/kvm/e500mc.c| 46 ++--
 4 files changed, 65 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_booke.h 
b/arch/powerpc/include/asm/kvm_booke.h
index f7aa5cc..630134d 100644
--- a/arch/powerpc/include/asm/kvm_booke.h
+++ b/arch/powerpc/include/asm/kvm_booke.h
@@ -23,7 +23,10 @@
 #include 
 #include 
 
-/* LPIDs we support with this build -- runtime limit may be lower */
+/*
+ * Number of available lpids. Only the low-order 6 bits of LPID register are
+ * implemented on e500mc+ cores.
+ */
 #define KVMPPC_NR_LPIDS64
 
 #define KVMPPC_INST_EHPRIV 0x7c00021c
diff --git a/arch/powerpc/kvm/e500.h b/arch/powerpc/kvm/e500.h
index a326178..7b74453 100644
--- a/arch/powerpc/kvm/e500.h
+++ b/arch/powerpc/kvm/e500.h
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 enum vcpu_ftr {
VCPU_FTR_MMU_V2
@@ -289,6 +290,25 @@ void kvmppc_e500_tlbil_all(struct kvmppc_vcpu_e500 
*vcpu_e500);
 #define kvmppc_e500_get_tlb_stid(vcpu, gtlbe)   get_tlb_tid(gtlbe)
 #define get_tlbmiss_tid(vcpu)   get_cur_pid(vcpu)
 #define get_tlb_sts(gtlbe)  (gtlbe->mas1 & MAS1_TS)
+
+/*
+ * This function should be called with preemption disabled,
+ * and the returned value is valid only in that context.
+ */
+static inline int get_thread_specific_lpid(int vm_lpid)
+{
+   int vcpu_lpid = vm_lpid;
+
+   if (threads_per_core == 2)
+   vcpu_lpid |= smp_processor_id() & 1;
+
+   return vcpu_lpid;
+}
+
+static inline int get_lpid(struct kvm_vcpu *vcpu)
+{
+   return get_thread_specific_lpid(vcpu->kvm->arch.lpid);
+}
 #else
 unsigned int kvmppc_e500_get_tlb_stid(struct kvm_vcpu *vcpu,
  struct kvm_book3e_206_tlb_entry *gtlbe);
diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index 08f14bb..c8795a6 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -69,7 +69,8 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int 
usermode)
  * writing shadow tlb entry to host TLB
  */
 static inline void __write_host_tlbe(struct kvm_book3e_206_tlb_entry *stlbe,
-uint32_t mas0)
+uint32_t mas0,
+uint32_t lpid)
 {
unsigned long flags;
 
@@ -80,7 +81,7 @@ static inline void __write_host_tlbe(struct 
kvm_book3e_206_tlb_entry *stlbe,
mtspr(SPRN_MAS3, (u32)stlbe->mas7_3);
mtspr(SPRN_MAS7, (u32)(stlbe->mas7_3 >> 32));
 #ifdef CONFIG_KVM_BOOKE_HV
-   mtspr(SPRN_MAS8, stlbe->mas8);
+   mtspr(SPRN_MAS8, MAS8_TGS | get_thread_specific_lpid(lpid));
 #endif
asm volatile("isync; tlbwe" : : : "memory");
 
@@ -129,11 +130,12 @@ static inline void write_host_tlbe(struct 
kvmppc_vcpu_e500 *vcpu_e500,
 
if (tlbsel == 0) {
mas0 = get_host_mas0(stlbe->mas2);
-   __write_host_tlbe(stlbe, mas0);
+   __write_host_tlbe(stlbe, mas0, vcpu_e500->vcpu.kvm->arch.lpid);
} else {
__write_host_tlbe(stlbe,
  MAS0_TLBSEL(1) |
- MAS0_ESEL(to_htlb1_esel(sesel)));
+ MAS0_ESEL(to_htlb1_esel(sesel)),
+ vcpu_e500->vcpu.kvm->arch.lpid);
}
 }
 
@@ -176,7 +178,7 @@ void kvmppc_map_magic(struct kvm_vcpu *vcpu)
   MAS3_SW | MAS3_SR | MAS3_UW | MAS3_UR;
magic.mas8 = 0;
 
-   __write_host_tlbe(&magic, MAS0_TLBSEL(1) | MAS0_ESEL(tlbcam_index));
+   __write_host_tlbe(&magic, MAS0_TLBSEL(1) | MAS0_ESEL(tlbcam_index), 0);
preempt_enable();
 }
 #endif
@@ -317,10 +319,6 @

[PATCH v3] ARM: KVM: add irqfd support

2014-09-01 Thread Eric Auger
This patch enables irqfd on ARM.

The irqfd framework makes it possible to inject a virtual IRQ into a
guest upon an eventfd trigger. User space uses the KVM_IRQFD VM ioctl to
provide KVM with a kvm_irqfd struct that associates a VM, an eventfd, and
a virtual IRQ number (aka the gsi). When an actor signals the eventfd
(typically a VFIO platform driver), the kvm irqfd subsystem injects the
provided virtual IRQ into the guest.

Resamplefd is also supported for level-sensitive interrupts, i.e. the
user can provide another eventfd that is triggered when the completion
of the virtual IRQ (gsi) is detected by the GIC.

The gsi must correspond to a shared peripheral interrupt (SPI), i.e. the
GIC interrupt ID is gsi+32.

This patch enables CONFIG_HAVE_KVM_EVENTFD and CONFIG_HAVE_KVM_IRQFD.
CONFIG_HAVE_KVM_IRQCHIP is removed. No IRQ routing table is used
(irqchip.c and irq_comm.c are not used).

Both the KVM_CAP_IRQFD and KVM_CAP_IRQFD_RESAMPLE capabilities are exposed.

Signed-off-by: Eric Auger 

---

This patch series supersedes the previous series featuring GSI routing
(https://patches.linaro.org/32261/)

The patch series has the following dependencies:
- arm/arm64: KVM: Various VGIC cleanups and improvements
  https://lists.cs.columbia.edu/pipermail/kvmarm/2014-June/009979.html
- "KVM: EVENTFD: remove inclusion of irq.h"

All pieces can be found on git://git.linaro.org/people/eric.auger/linux.git
branch irqfd_norouting_integ_v3

This work was tested with the Calxeda Midway xgmac main interrupt, using
qemu-system-arm and the QEMU VFIO platform device.

v2 -> v3:
- removal of irq.h from eventfd.c put in a separate patch to increase
  visibility
- properly expose KVM_CAP_IRQFD capability in arm.c
- remove CONFIG_HAVE_KVM_IRQCHIP, which is meaningful only if irq_comm.c is used

v1 -> v2:
- rebase on 3.17rc1
- move of the dist unlock in process_maintenance
- remove of dist lock in __kvm_vgic_sync_hwstate
- rewording of the commit message (add resamplefd reference)
- remove irq.h
---
 Documentation/virtual/kvm/api.txt |  5 +++-
 arch/arm/include/uapi/asm/kvm.h   |  3 +++
 arch/arm/kvm/Kconfig  |  4 +--
 arch/arm/kvm/Makefile |  2 +-
 arch/arm/kvm/arm.c|  3 +++
 virt/kvm/arm/vgic.c   | 56 ---
 6 files changed, 65 insertions(+), 8 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index beae3fd..8118b12 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2204,7 +2204,7 @@ into the hash PTE second double word).
 4.75 KVM_IRQFD
 
 Capability: KVM_CAP_IRQFD
-Architectures: x86 s390
+Architectures: x86 s390 arm
 Type: vm ioctl
 Parameters: struct kvm_irqfd (in)
 Returns: 0 on success, -1 on error
@@ -2230,6 +2230,9 @@ Note that closing the resamplefd is not sufficient to 
disable the
 irqfd.  The KVM_IRQFD_FLAG_RESAMPLE is only necessary on assignment
 and need not be specified with KVM_IRQFD_FLAG_DEASSIGN.
 
+On ARM/arm64 the injected interrupt must be a shared peripheral interrupt (SPI).
+This means the programmed GIC interrupt ID is gsi+32.
+
 4.76 KVM_PPC_ALLOCATE_HTAB
 
 Capability: KVM_CAP_PPC_ALLOC_HTAB
diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
index e6ebdd3..3034c66 100644
--- a/arch/arm/include/uapi/asm/kvm.h
+++ b/arch/arm/include/uapi/asm/kvm.h
@@ -194,6 +194,9 @@ struct kvm_arch_memory_slot {
 /* Highest supported SPI, from VGIC_NR_IRQS */
 #define KVM_ARM_IRQ_GIC_MAX127
 
+/* One single KVM irqchip, ie. the VGIC */
+#define KVM_NR_IRQCHIPS  1
+
 /* PSCI interface */
 #define KVM_PSCI_FN_BASE   0x95c1ba5e
 #define KVM_PSCI_FN(n) (KVM_PSCI_FN_BASE + (n))
diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index 466bd29..e519a40 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -24,6 +24,7 @@ config KVM
select KVM_MMIO
select KVM_ARM_HOST
depends on ARM_VIRT_EXT && ARM_LPAE
+   select HAVE_KVM_EVENTFD
---help---
  Support hosting virtualized guest machines. You will also
  need to select one or more of the processor modules below.
@@ -55,7 +56,7 @@ config KVM_ARM_MAX_VCPUS
 config KVM_ARM_VGIC
bool "KVM support for Virtual GIC"
depends on KVM_ARM_HOST && OF
-   select HAVE_KVM_IRQCHIP
+   select HAVE_KVM_IRQFD
default y
---help---
  Adds support for a hardware assisted, in-kernel GIC emulation.
@@ -63,7 +64,6 @@ config KVM_ARM_VGIC
 config KVM_ARM_TIMER
bool "KVM support for Architected Timers"
depends on KVM_ARM_VGIC && ARM_ARCH_TIMER
-   select HAVE_KVM_IRQCHIP
default y
---help---
  Adds support for the Architected Timers in virtual machines
diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
index f7057ed..859db09 100644
--- a/arch/arm/kvm/Makefile
+++ b/arch/arm/kvm/Makefile
@@ -15,7 +15,7 @@ AFLAGS_init.o := -Wa,-march=armv7-a$(plus_virt)
 AFLAGS_in

RE: [PATCH 1/2] KVM: PPC: e500mc: Add support for single threaded vcpus on e6500 core

2014-09-01 Thread mihai.cara...@freescale.com
I am abandoning this patch; I will send a v2 with a minor fix for 85xx.

Mike

> -Original Message-
> From: Mihai Caraman [mailto:mihai.cara...@freescale.com]
> Sent: Friday, August 29, 2014 8:04 PM
> To: kvm-...@vger.kernel.org
> Cc: kvm@vger.kernel.org; Caraman Mihai Claudiu-B02008
> Subject: [PATCH 1/2] KVM: PPC: e500mc: Add support for single threaded
> vcpus on e6500 core
> 
> ePAPR represents hardware threads as cpu node properties in device tree.
> So with existing QEMU, hardware threads are simply exposed as vcpus with
> one hardware thread.
> 
> The e6500 core shares TLBs between hardware threads. Without a tlb write
> conditional instruction, the Linux kernel uses per-core mechanisms to
> protect against duplicate TLB entries.
> 
> The guest is unable to detect real sibling threads, so it can't use a
> TLB protection mechanism. An alternative solution is to use the
> hypervisor
> to allocate different lpids to a guest's vcpus running simultaneously on
> real sibling threads. On systems with two threads per core this patch
> halves the size of the lpid pool that the allocator sees and uses two
> lpids per VM.
> Use even numbers to speed up vcpu lpid computation with consecutive lpids
> per VM: vm1 will use lpids 2 and 3, vm2 lpids 4 and 5, and so on.
> 
> Signed-off-by: Mihai Caraman 
> ---
>  arch/powerpc/include/asm/kvm_booke.h |  5 +++-
>  arch/powerpc/kvm/e500.h  | 20 
>  arch/powerpc/kvm/e500_mmu_host.c | 16 ++---
>  arch/powerpc/kvm/e500mc.c| 46 ++
> --
>  4 files changed, 64 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_booke.h
> b/arch/powerpc/include/asm/kvm_booke.h
> index f7aa5cc..630134d 100644
> --- a/arch/powerpc/include/asm/kvm_booke.h
> +++ b/arch/powerpc/include/asm/kvm_booke.h
> @@ -23,7 +23,10 @@
>  #include 
>  #include 
> 
> -/* LPIDs we support with this build -- runtime limit may be lower */
> +/*
> + * Number of available lpids. Only the low-order 6 bits of LPID register
> are
> + * implemented on e500mc+ cores.
> + */
>  #define KVMPPC_NR_LPIDS64
> 
>  #define KVMPPC_INST_EHPRIV   0x7c00021c
> diff --git a/arch/powerpc/kvm/e500.h b/arch/powerpc/kvm/e500.h
> index a326178..7b74453 100644
> --- a/arch/powerpc/kvm/e500.h
> +++ b/arch/powerpc/kvm/e500.h
> @@ -22,6 +22,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  enum vcpu_ftr {
>   VCPU_FTR_MMU_V2
> @@ -289,6 +290,25 @@ void kvmppc_e500_tlbil_all(struct kvmppc_vcpu_e500
> *vcpu_e500);
>  #define kvmppc_e500_get_tlb_stid(vcpu, gtlbe)   get_tlb_tid(gtlbe)
>  #define get_tlbmiss_tid(vcpu)   get_cur_pid(vcpu)
>  #define get_tlb_sts(gtlbe)  (gtlbe->mas1 & MAS1_TS)
> +
> +/*
> + * This function should be called with preemption disabled
> + * and the returned value is valid only in that context
> + */
> +static inline int get_thread_specific_lpid(int vm_lpid)
> +{
> + int vcpu_lpid = vm_lpid;
> +
> + if (threads_per_core == 2)
> + vcpu_lpid |= smp_processor_id() & 1;
> +
> + return vcpu_lpid;
> +}
> +
> +static inline int get_lpid(struct kvm_vcpu *vcpu)
> +{
> + return get_thread_specific_lpid(vcpu->kvm->arch.lpid);
> +}
>  #else
>  unsigned int kvmppc_e500_get_tlb_stid(struct kvm_vcpu *vcpu,
> struct kvm_book3e_206_tlb_entry *gtlbe);
> diff --git a/arch/powerpc/kvm/e500_mmu_host.c
> b/arch/powerpc/kvm/e500_mmu_host.c
> index 08f14bb..5759608 100644
> --- a/arch/powerpc/kvm/e500_mmu_host.c
> +++ b/arch/powerpc/kvm/e500_mmu_host.c
> @@ -69,7 +69,8 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int
> usermode)
>   * writing shadow tlb entry to host TLB
>   */
>  static inline void __write_host_tlbe(struct kvm_book3e_206_tlb_entry
> *stlbe,
> -  uint32_t mas0)
> +  uint32_t mas0,
> +  uint32_t lpid)
>  {
>   unsigned long flags;
> 
> @@ -80,7 +81,7 @@ static inline void __write_host_tlbe(struct
> kvm_book3e_206_tlb_entry *stlbe,
>   mtspr(SPRN_MAS3, (u32)stlbe->mas7_3);
>   mtspr(SPRN_MAS7, (u32)(stlbe->mas7_3 >> 32));
>  #ifdef CONFIG_KVM_BOOKE_HV
> - mtspr(SPRN_MAS8, stlbe->mas8);
> + mtspr(SPRN_MAS8, MAS8_TGS | get_thread_specific_lpid(lpid));
>  #endif
>   asm volatile("isync; tlbwe" : : : "memory");
> 
> @@ -129,11 +130,12 @@ static inline void write_host_tlbe(struct
> kvmppc_vcpu_e500 *vcpu_e500,
> 
>   if (tlbsel == 0) {
>   mas0 = get_host_mas0(stlbe->mas2);
> - __write_host_tlbe(stlbe, mas0);
> + __write_host_tlbe(stlbe, mas0, vcpu_e500->vcpu.kvm-
> >arch.lpid);
>   } else {
>   __write_host_tlbe(stlbe,
> MAS0_TLBSEL(1) |
> -   MAS0_ESEL(to_htlb1_esel(sesel)));
> +   MAS0_ESEL(to_htlb1_esel(sesel)),
> +   

[PATCH] KVM: EVENTFD: remove inclusion of irq.h

2014-09-01 Thread Eric Auger
It is no longer needed; irq.h would be empty on ARM.

Signed-off-by: Eric Auger 

---

I don't think irq.h is needed anymore since Paul Mackerras' work. However
I did not compile for all architectures.
---
 virt/kvm/eventfd.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 3c5981c..0c712a7 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -36,7 +36,6 @@
 #include 
 #include 
 
-#include "irq.h"
 #include "iodev.h"
 
 #ifdef CONFIG_HAVE_KVM_IRQFD
-- 
1.9.1



Re: [PATCH RFC] virtio-pci: share config interrupt between virtio devices

2014-09-01 Thread Michael S. Tsirkin
On Mon, Sep 01, 2014 at 03:58:02PM +0800, Amos Kong wrote:
> On Mon, Sep 01, 2014 at 09:37:30AM +0300, Michael S. Tsirkin wrote:
> >
> 
> Hi Michael,
> 
> > On Mon, Sep 01, 2014 at 01:41:54PM +0800, Amos Kong wrote:
> > > One VM only has 128 MSI-X interrupts, and the virtio-config interrupt
> > > has little workload. This patch shares one normal interrupt
> 
> Thanks for your quick reply.
> 
> > normal == INT#x? Please don't call it normal. The proper name is
> > "legacy INT#x".
>  
> OK
> 
> > So you are trying to use legacy INT#x at the same time
> > with MSI-X?  This does not work: the PCI spec says:
> > While enabled for MSI or MSI-X
> > operation, a function is prohibited from using its INTx# pin (if
> > implemented) to request
> > service (MSI, MSI-X, and INTx# are mutually exclusive).
> 
> It means we can't use pin-based and message-signaled interrupts together
> for one PCI device. I will study this problem.
>  
> > does the patch work for you? If it does it might be a (minor) spec
> > violation in kvm.
> 
> I did some basic testing (multiple nics, scp, ping, etc.); it works.

It's quite likely there were no config interrupts in your
basic test. Trigger some event that requires a config interrupt
to function; for example, drop the link and see if the guest notices.

> > Besides, INT#x really leads to terrible performance because
> > sharing is forced even if there aren't many devices.
> > 
> > Why do we need INT#x?
> > How about setting IRQF_SHARED for the config interrupt
> > while using MSI-X?
> > You'd have to read ISR to check that the interrupt was
> > intended for your device.
>  
> I have a draft patch to share one MSI-X vector for all virtio-config
> interrupts, but it has some problems with hotplugging devices. I will
> continue down this path.
>  
> > > for configuration between virtio devices.
> > > 
> > > Signed-off-by: Amos Kong 
> > > ---
> > >  drivers/virtio/virtio_pci.c | 41 
> > > -
> > >  1 file changed, 16 insertions(+), 25 deletions(-)
> > > 
> > > diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
> > > index 3d1463c..b1263b3 100644
> > > --- a/drivers/virtio/virtio_pci.c
> > > +++ b/drivers/virtio/virtio_pci.c
> > > @@ -52,6 +52,7 @@ struct virtio_pci_device
> > >   /* Name strings for interrupts. This size should be enough,
> > >* and I'm too lazy to allocate each name separately. */
> > >   char (*msix_names)[256];
> > > + char config_msix_name[256];
> > >   /* Number of available vectors */
> > >   unsigned msix_vectors;
> > >   /* Vectors allocated, excluding per-vq vectors if any */
> > > @@ -282,12 +283,6 @@ static void vp_free_vectors(struct virtio_device 
> > > *vdev)
> > >   free_cpumask_var(vp_dev->msix_affinity_masks[i]);
> > >  
> > >   if (vp_dev->msix_enabled) {
> > > - /* Disable the vector used for configuration */
> > > - iowrite16(VIRTIO_MSI_NO_VECTOR,
> > > -   vp_dev->ioaddr + VIRTIO_MSI_CONFIG_VECTOR);
> > > - /* Flush the write out to device */
> > > - ioread16(vp_dev->ioaddr + VIRTIO_MSI_CONFIG_VECTOR);
> > > -
> > >   pci_disable_msix(vp_dev->pci_dev);
> > >   vp_dev->msix_enabled = 0;
> > >   }
> > > @@ -339,24 +334,18 @@ static int vp_request_msix_vectors(struct 
> > > virtio_device *vdev, int nvectors,
> > >   goto error;
> > >   vp_dev->msix_enabled = 1;
> > >  
> > > - /* Set the vector used for configuration */
> > > - v = vp_dev->msix_used_vectors;
> > > - snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
> > > + /* Set shared IRQ for configuration */
> > > + snprintf(vp_dev->config_msix_name, sizeof(*vp_dev->msix_names),
> > >"%s-config", name);
> > > - err = request_irq(vp_dev->msix_entries[v].vector,
> > > -   vp_config_changed, 0, vp_dev->msix_names[v],
> > > + err = request_irq(vp_dev->pci_dev->irq,
> > > +   vp_config_changed,
> > > +   IRQF_SHARED,
> > > +   vp_dev->config_msix_name,
> > > vp_dev);
> > > - if (err)
> > > - goto error;
> > > - ++vp_dev->msix_used_vectors;
> > > -
> > > - iowrite16(v, vp_dev->ioaddr + VIRTIO_MSI_CONFIG_VECTOR);
> > > - /* Verify we had enough resources to assign the vector */
> > > - v = ioread16(vp_dev->ioaddr + VIRTIO_MSI_CONFIG_VECTOR);
> > > - if (v == VIRTIO_MSI_NO_VECTOR) {
> > > - err = -EBUSY;
> > > + if (!err)
> > > + vp_dev->intx_enabled = 1;
> > > + else
> > >   goto error;
> > > - }
> > >  
> > >   if (!per_vq_vectors) {
> > >   /* Shared vector for all VQs */
> > > @@ -535,14 +524,16 @@ static int vp_try_to_find_vqs(struct virtio_device 
> > > *vdev, unsigned nvqs,
> > >   goto error_request;
> > >   } else {
> > >   if (per_vq_vectors) {
> > > - /* Best option: one for change interrupt, one per vq. */
> > > - nvectors = 1;
> > > + /* Best option: one normal interrupt for

Re: [PATCH RFC] virtio-pci: share config interrupt between virtio devices

2014-09-01 Thread Amos Kong
On Mon, Sep 01, 2014 at 09:37:30AM +0300, Michael S. Tsirkin wrote:
>

Hi Michael,

> On Mon, Sep 01, 2014 at 01:41:54PM +0800, Amos Kong wrote:
> > One VM only has 128 MSI-X interrupts, and the virtio-config interrupt
> > has little workload. This patch shares one normal interrupt

Thanks for your quick reply.

> normal == INT#x? Please don't call it normal. The proper name is
> "legacy INT#x".
 
OK

> So you are trying to use legacy INT#x at the same time
> with MSI-X?  This does not work: the PCI spec says:
>   While enabled for MSI or MSI-X
>   operation, a function is prohibited from using its INTx# pin (if
>   implemented) to request
>   service (MSI, MSI-X, and INTx# are mutually exclusive).

It means we can't use pin-based and message-signaled interrupts together
for one PCI device. I will study this problem.
 
> does the patch work for you? If it does it might be a (minor) spec
> violation in kvm.

I did some basic testing (multiple nics, scp, ping, etc.); it works.
 
> Besides, INT#x really leads to terrible performance because
> sharing is forced even if there aren't many devices.
> 
> Why do we need INT#x?
> How about setting IRQF_SHARED for the config interrupt
> while using MSI-X?
> You'd have to read ISR to check that the interrupt was
> intended for your device.
 
I have a draft patch to share one MSI-X vector for all virtio-config
interrupts, but it has some problems with hotplugging devices. I will
continue down this path.
 
> > for configuration between virtio devices.
> > 
> > Signed-off-by: Amos Kong 
> > ---
> >  drivers/virtio/virtio_pci.c | 41 -
> >  1 file changed, 16 insertions(+), 25 deletions(-)
> > 
> > diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
> > index 3d1463c..b1263b3 100644
> > --- a/drivers/virtio/virtio_pci.c
> > +++ b/drivers/virtio/virtio_pci.c
> > @@ -52,6 +52,7 @@ struct virtio_pci_device
> > /* Name strings for interrupts. This size should be enough,
> >  * and I'm too lazy to allocate each name separately. */
> > char (*msix_names)[256];
> > +   char config_msix_name[256];
> > /* Number of available vectors */
> > unsigned msix_vectors;
> > /* Vectors allocated, excluding per-vq vectors if any */
> > @@ -282,12 +283,6 @@ static void vp_free_vectors(struct virtio_device *vdev)
> > free_cpumask_var(vp_dev->msix_affinity_masks[i]);
> >  
> > if (vp_dev->msix_enabled) {
> > -   /* Disable the vector used for configuration */
> > -   iowrite16(VIRTIO_MSI_NO_VECTOR,
> > - vp_dev->ioaddr + VIRTIO_MSI_CONFIG_VECTOR);
> > -   /* Flush the write out to device */
> > -   ioread16(vp_dev->ioaddr + VIRTIO_MSI_CONFIG_VECTOR);
> > -
> > pci_disable_msix(vp_dev->pci_dev);
> > vp_dev->msix_enabled = 0;
> > }
> > @@ -339,24 +334,18 @@ static int vp_request_msix_vectors(struct 
> > virtio_device *vdev, int nvectors,
> > goto error;
> > vp_dev->msix_enabled = 1;
> >  
> > -   /* Set the vector used for configuration */
> > -   v = vp_dev->msix_used_vectors;
> > -   snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
> > +   /* Set shared IRQ for configuration */
> > +   snprintf(vp_dev->config_msix_name, sizeof(*vp_dev->msix_names),
> >  "%s-config", name);
> > -   err = request_irq(vp_dev->msix_entries[v].vector,
> > - vp_config_changed, 0, vp_dev->msix_names[v],
> > +   err = request_irq(vp_dev->pci_dev->irq,
> > + vp_config_changed,
> > + IRQF_SHARED,
> > + vp_dev->config_msix_name,
> >   vp_dev);
> > -   if (err)
> > -   goto error;
> > -   ++vp_dev->msix_used_vectors;
> > -
> > -   iowrite16(v, vp_dev->ioaddr + VIRTIO_MSI_CONFIG_VECTOR);
> > -   /* Verify we had enough resources to assign the vector */
> > -   v = ioread16(vp_dev->ioaddr + VIRTIO_MSI_CONFIG_VECTOR);
> > -   if (v == VIRTIO_MSI_NO_VECTOR) {
> > -   err = -EBUSY;
> > +   if (!err)
> > +   vp_dev->intx_enabled = 1;
> > +   else
> > goto error;
> > -   }
> >  
> > if (!per_vq_vectors) {
> > /* Shared vector for all VQs */
> > @@ -535,14 +524,16 @@ static int vp_try_to_find_vqs(struct virtio_device 
> > *vdev, unsigned nvqs,
> > goto error_request;
> > } else {
> > if (per_vq_vectors) {
> > -   /* Best option: one for change interrupt, one per vq. */
> > -   nvectors = 1;
> > +   /* Best option: one normal interrupt for change,
> > +  one msix per vq. */
> > +   nvectors = 0;
> > for (i = 0; i < nvqs; ++i)
> > if (callbacks[i])
> > ++nvectors;
> > } else {
> > -   /* Second best: one for change, shared for all vqs. */
> > -   nvector