performance trouble

2012-01-23 Thread David Cure

Hello,

I use several KVM boxes with no problems at all, except for one
application that has a bad response time.

The VM runs Windows 2008 R2 and the application is a
client-server app developed with Progress Software; it talks to an Oracle
database (on another server), and we access this app through RDS/TSE.
The physical server runs Debian testing in order to have qemu-kvm 1.0,
Linux kernel 3.1 and libvirt 0.9.8. We use virtio for disk and network
with the latest Windows drivers (from RH).

We have two reference servers: one physical and one running
VMware.

Response time:
o physical = 7s
o VM under VMware = 8s
o VM under KVM = 12s (for completeness: with qemu-kvm 0.12.5 and
kernel 2.6.32 we get 22s).

I attach the libvirt XML of my VM.

How can I see what is happening? Do you have any ideas for improving
performance?

David.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Bug 42635] New: PCIe passthrough broken with AMD iommu after s2disk / resume

2012-01-23 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42635

           Summary: PCIe passthrough broken with AMD iommu after s2disk / resume
           Product: Virtualization
           Version: unspecified
    Kernel Version: 3.1
          Platform: All
        OS/Version: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: kvm
        AssignedTo: virtualization_...@kernel-bugs.osdl.org
        ReportedBy: kmuel...@justmail.de
        Regression: No


A PCIe ethernet device is passed through to the guest and runs fine. Some time
later, the guest is shut down (e.g. virsh shutdown guest) and the host is
suspended (s2disk) and resumed again. The guest is started again (and it starts
fine), but the device in the guest isn't working any more: it can be seen and
its own address can be pinged, but nothing is reachable beyond the device. The
rx and tx counters in the ifconfig output are always 0.

If the guest is shut down and the device is bound to the host, it works as
expected. If the device is afterwards bound to the guest again, it still doesn't
work in the guest, just as right after the resume.

kernel: 3.1 / 64bit / smp
kvm: 1.0
board: GA-990XA-UD3

If you need more information, I can provide it - feel free to ask!


Regards,
Klaus

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are watching the assignee of the bug.


[Bug 42636] New: PCI passthrough does not work with AMD iommu for PCI device

2012-01-23 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42636

           Summary: PCI passthrough does not work with AMD iommu for PCI device
           Product: Virtualization
           Version: unspecified
    Kernel Version: 3.1
          Platform: All
        OS/Version: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: kvm
        AssignedTo: virtualization_...@kernel-bugs.osdl.org
        ReportedBy: kmuel...@justmail.de
        Regression: No


I want to pass this PCI device through to a KVM guest:

05:07.0 Network controller: RaLink RT2800 802.11n PCI
Subsystem: Linksys Device 0067
Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV+ VGASnoop- ParErr-
Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=slow TAbort- TAbort-
MAbort- SERR- PERR- INTx-
Interrupt: pin A routed to IRQ 21
Region 0: Memory at fdae (32-bit, non-prefetchable) [size=64K]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-

Unfortunately, I'm always getting an error during virsh start guest:

Failed to assign device hostdev0 : Device or resource busy.
qemu-kvm: -device pci-assign,host=05:06.0,id=hostdev0,configfd=20,bus=pci.0,addr=0x4: Device 'pci-assign' could not be initialized.

If I add this device (05:06.0) to the guest as well, I get exactly the same
error again. Of course, I unloaded the module of this additional device
before trying to pass it through to the guest.

lspci -vvs 05:06.0
05:06.0 Ethernet controller: Intel Corporation 82557/8/9/0/1 Ethernet Pro 100
(rev 0c)
Subsystem: Intel Corporation EtherExpress PRO/100 S Desktop Adapter
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium TAbort-
TAbort- MAbort- SERR- PERR- INTx-
Latency: 64 (2000ns min, 14000ns max), Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 20
Region 0: Memory at fdaff000 (32-bit, non-prefetchable) [size=4K]
Region 1: I/O ports at af00 [size=64]
Region 2: Memory at fdaa (32-bit, non-prefetchable) [size=128K]
[virtual] Expansion ROM at fd90 [disabled] [size=64K]
Capabilities: [dc] Power Management version 2
Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA
PME(D0+,D1+,D2+,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=2 PME-


lspci -vt
-[:00]-+-00.0  ATI Technologies Inc RD890 PCI to PCI bridge (external gfx0
port B)
   +-00.2  ATI Technologies Inc Device 5a23
   +-02.0-[01]--+-00.0  ATI Technologies Inc NI Turks [AMD Radeon HD
6500]
   |\-00.1  ATI Technologies Inc Device aa90
   +-04.0-[02]00.0  Device 1b6f:7023
   +-09.0-[03]00.0  Realtek Semiconductor Co., Ltd. RTL8111/8168B
PCI Express Gigabit Ethernet controller
   +-0a.0-[04]00.0  Device 1b6f:7023
   +-11.0  ATI Technologies Inc SB700/SB800 SATA Controller [AHCI mode]
   +-12.0  ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
   +-12.2  ATI Technologies Inc SB700/SB800 USB EHCI Controller
   +-13.0  ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
   +-13.2  ATI Technologies Inc SB700/SB800 USB EHCI Controller
   +-14.0  ATI Technologies Inc SBx00 SMBus Controller
   +-14.1  ATI Technologies Inc SB700/SB800 IDE Controller
   +-14.2  ATI Technologies Inc SBx00 Azalia (Intel HDA)
   +-14.3  ATI Technologies Inc SB700/SB800 LPC host controller
   +-14.4-[05]--+-06.0  Intel Corporation 82557/8/9/0/1 Ethernet Pro
100
   |\-07.0  RaLink RT2800 802.11n PCI
   +-14.5  ATI Technologies Inc SB700/SB800 USB OHCI2 Controller
   +-15.0-[06]--
   +-16.0  ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
   +-16.2  ATI Technologies Inc SB700/SB800 USB EHCI Controller
   +-18.0  Advanced Micro Devices [AMD] Device 1600
   +-18.1  Advanced Micro Devices [AMD] Device 1601
   +-18.2  Advanced Micro Devices [AMD] Device 1602
   +-18.3  Advanced Micro Devices [AMD] Device 1603
   +-18.4  Advanced Micro Devices [AMD] Device 1604
   \-18.5  Advanced Micro Devices [AMD] Device 1605

dmesg | grep AMD-Vi
[0.610182] AMD-Vi: device: 00:00.2 cap: 0040 seg: 0 flags: 3e info 1300
[0.610184] AMD-Vi:mmio-addr: fec3
[0.610359] AMD-Vi:   DEV_SELECT_RANGE_START  devid: 00:00.0 flags: 00
[0.610360] AMD-Vi:   DEV_RANGE_END   devid: 00:00.2
[0.610362] AMD-Vi:   DEV_SELECT  devid: 00:02.0 flags:
00
[0.610363] AMD-Vi:   

Re: performance trouble

2012-01-23 Thread David Cure
On Mon, Jan 23, 2012 at 09:28:37AM +0100, David Cure wrote:

> I attach the libvirt xml of my vm.

I forgot to attach it ;)

David.


rds.xml
Description: XML document


signature.asc
Description: Digital signature


[PATCH 0/4] KVM: Decouple rmap_pde from lpage_info write_count

2012-01-23 Thread Takuya Yoshikawa
The last one is an RFC patch:

I think it is better to refactor the rmap code, if needed, before
architectures other than x86 start supporting large pages.

Takuya

 arch/ia64/kvm/kvm-ia64.c|8 
 arch/powerpc/kvm/book3s_64_mmu_hv.c |6 +++---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c |4 ++--
 arch/x86/kvm/mmu.c  |   24 ++--
 arch/x86/kvm/mmu_audit.c|4 +---
 arch/x86/kvm/x86.c  |4 ++--
 include/linux/kvm_host.h|   10 --
 virt/kvm/kvm_main.c |   29 +
 8 files changed, 47 insertions(+), 42 deletions(-)

-- 
1.7.5.4



[PATCH 1/4] KVM: MMU: Use gfn_to_rmap() in audit_write_protection()

2012-01-23 Thread Takuya Yoshikawa
We want to eliminate direct access to the rmap array.

Signed-off-by: Takuya Yoshikawa <yoshikawa.tak...@oss.ntt.co.jp>
---
 arch/x86/kvm/mmu_audit.c |4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu_audit.c b/arch/x86/kvm/mmu_audit.c
index 6eabae3..e62fa4f 100644
--- a/arch/x86/kvm/mmu_audit.c
+++ b/arch/x86/kvm/mmu_audit.c
@@ -190,15 +190,13 @@ static void check_mappings_rmap(struct kvm *kvm, struct kvm_mmu_page *sp)
 
 static void audit_write_protection(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
-    struct kvm_memory_slot *slot;
     unsigned long *rmapp;
     u64 *spte;
 
     if (sp->role.direct || sp->unsync || sp->role.invalid)
         return;
 
-    slot = gfn_to_memslot(kvm, sp->gfn);
-    rmapp = &slot->rmap[sp->gfn - slot->base_gfn];
+    rmapp = gfn_to_rmap(kvm, sp->gfn, PT_PAGE_TABLE_LEVEL);
 
     spte = rmap_next(rmapp, NULL);
     while (spte) {
-- 
1.7.5.4



[PATCH 2/4] KVM: MMU: Use __gfn_to_rmap() in kvm_handle_hva()

2012-01-23 Thread Takuya Yoshikawa
We can hide the implementation details and treat every level uniformly.

Signed-off-by: Takuya Yoshikawa <yoshikawa.tak...@oss.ntt.co.jp>
---
 arch/x86/kvm/mmu.c |   12 ++--
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 844fcce..0e82d9d 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1133,14 +1133,14 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
             gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT;
             gfn_t gfn = memslot->base_gfn + gfn_offset;
 
-            ret = handler(kvm, &memslot->rmap[gfn_offset], data);
+            ret = 0;
 
-            for (j = 0; j < KVM_NR_PAGE_SIZES - 1; ++j) {
-                struct kvm_lpage_info *linfo;
+            for (j = PT_PAGE_TABLE_LEVEL;
+                 j < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++j) {
+                unsigned long *rmapp;
 
-                linfo = lpage_info_slot(gfn, memslot,
-                                        PT_DIRECTORY_LEVEL + j);
-                ret |= handler(kvm, &linfo->rmap_pde, data);
+                rmapp = __gfn_to_rmap(gfn, j, memslot);
+                ret |= handler(kvm, rmapp, data);
             }
             trace_kvm_age_page(hva, memslot, ret);
             retval |= ret;
-- 
1.7.5.4



[PATCH 3/4] KVM: Introduce gfn_to_index() which returns the index for a given level

2012-01-23 Thread Takuya Yoshikawa
We can also use this for PT_PAGE_TABLE_LEVEL to treat every level
uniformly.
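
As a rough illustration of the index computation described above, here is a minimal user-space sketch. It assumes the x86 definition of KVM_HPAGE_GFN_SHIFT (9 bits per level, level 1 being the 4 KiB level); lpages_for_slot() is a hypothetical helper name used only here, mirroring the lpages computation in __kvm_set_memory_region().

```c
#include <stdint.h>

typedef uint64_t gfn_t;

/* Assumed x86 values: level 1 = 4 KiB pages, each level up covers 512x more. */
#define PT_PAGE_TABLE_LEVEL 1
#define KVM_HPAGE_GFN_SHIFT(level) (((level) - 1) * 9)

/* Index of gfn inside a slot that starts at base_gfn, in units of the given level. */
static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t base_gfn, int level)
{
    /* For PT_PAGE_TABLE_LEVEL the shift is 0, so this is just gfn - base_gfn. */
    return (gfn >> KVM_HPAGE_GFN_SHIFT(level)) -
           (base_gfn >> KVM_HPAGE_GFN_SHIFT(level));
}

/* Hypothetical helper: entries needed at this level to cover npages pages. */
static inline gfn_t lpages_for_slot(gfn_t base_gfn, gfn_t npages, int level)
{
    return gfn_to_index(base_gfn + npages - 1, base_gfn, level) + 1;
}
```

With these assumptions, a slot of two 4 KiB pages that straddles a 2 MiB boundary needs two level-2 entries, which is why the index of the last page is taken rather than npages being divided by 512.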

Signed-off-by: Takuya Yoshikawa <yoshikawa.tak...@oss.ntt.co.jp>
---
 arch/x86/kvm/mmu.c   |3 +--
 include/linux/kvm_host.h |7 +++
 virt/kvm/kvm_main.c  |4 +---
 3 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 0e82d9d..12f5c99 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -688,8 +688,7 @@ static struct kvm_lpage_info *lpage_info_slot(gfn_t gfn,
 {
     unsigned long idx;
 
-    idx = (gfn >> KVM_HPAGE_GFN_SHIFT(level)) -
-          (slot->base_gfn >> KVM_HPAGE_GFN_SHIFT(level));
+    idx = gfn_to_index(gfn, slot->base_gfn, level);
     return &slot->lpage_info[level - 2][idx];
 }
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index eada8e6..06d4e41 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -656,6 +656,13 @@ static inline int memslot_id(struct kvm *kvm, gfn_t gfn)
     return gfn_to_memslot(kvm, gfn)->id;
 }
 
+static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t base_gfn, int level)
+{
+    /* KVM_HPAGE_GFN_SHIFT(PT_PAGE_TABLE_LEVEL) must be 0. */
+    return (gfn >> KVM_HPAGE_GFN_SHIFT(level)) -
+        (base_gfn >> KVM_HPAGE_GFN_SHIFT(level));
+}
+
 static inline unsigned long gfn_to_hva_memslot(struct kvm_memory_slot *slot,
                                                gfn_t gfn)
 {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 9f32bff..4f2574f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -803,9 +803,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
         if (new.lpage_info[i])
             continue;
 
-        lpages = 1 + ((base_gfn + npages - 1)
-                     >> KVM_HPAGE_GFN_SHIFT(level));
-        lpages -= base_gfn >> KVM_HPAGE_GFN_SHIFT(level);
+        lpages = gfn_to_index(base_gfn + npages - 1, base_gfn, level) + 1;
 
         new.lpage_info[i] = vzalloc(lpages * sizeof(*new.lpage_info[i]));
 
-- 
1.7.5.4



[RFC PATCH 4/4] KVM: Decouple rmap_pde from lpage_info write_count

2012-01-23 Thread Takuya Yoshikawa
Though we have one rmap array for every level, those for large pages,
called rmap_pde, are coupled with write_count information and constitute
lpage_info arrays.

To hide these implementation details, we are now using __gfn_to_rmap(),
which includes a likely(level == PT_PAGE_TABLE_LEVEL) heuristic; this
is not good because we know that it always fails for higher levels.

Furthermore, when we traverse rmap arrays to write protect pages during
dirty logging, the current layout reduces the locality of their elements
by placing write_count next to rmap_pde in lpage_info.

This patch mitigates this problem by decoupling rmap_pde from lpage_info
write_count and making the rmap array two-dimensional, so that it also
holds the old rmap_pde elements.
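
To make the locality argument concrete, here is a simplified user-space sketch of the two layouts. The struct and function names (lpage_info_old, slot_new, slot_rmap and so on) are hypothetical stand-ins, not the kernel's definitions; only the idea of indexing rmap[level - 1][idx] is taken from the diff below.

```c
#include <stddef.h>

/* Old layout (simplified): each large-page entry interleaves the rmap head
 * with its write_count, so walking rmap heads also strides over the counts. */
struct lpage_info_old {
    unsigned long rmap_pde;
    int write_count;
};

/* New layout (simplified): write_count lives on its own ... */
struct lpage_info_new {
    int write_count;
};

#define NR_PAGE_SIZES 3 /* assumed number of levels for this sketch */

/* ... and every level, including the 4 KiB one, has a plain rmap array. */
struct slot_new {
    unsigned long *rmap[NR_PAGE_SIZES];
    struct lpage_info_new *lpage_info[NR_PAGE_SIZES - 1];
};

/* Bytes between two consecutive rmap heads of one level, old vs. new. */
static inline size_t rmap_stride_old(void) { return sizeof(struct lpage_info_old); }
static inline size_t rmap_stride_new(void) { return sizeof(unsigned long); }

/* Uniform lookup with no special case per level; level 1 is the 4 KiB level. */
static inline unsigned long *slot_rmap(struct slot_new *slot, int level, size_t idx)
{
    return &slot->rmap[level - 1][idx];
}
```

In this sketch the old stride is always larger than the new one, which is the effect the dirty-logging traversal is expected to benefit from; any actual gain would have to be measured.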

Signed-off-by: Takuya Yoshikawa <yoshikawa.tak...@oss.ntt.co.jp>
---
 arch/ia64/kvm/kvm-ia64.c|8 
 arch/powerpc/kvm/book3s_64_mmu_hv.c |6 +++---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c |4 ++--
 arch/x86/kvm/mmu.c  |9 +++--
 arch/x86/kvm/x86.c  |4 ++--
 include/linux/kvm_host.h|3 +--
 virt/kvm/kvm_main.c |   25 -
 7 files changed, 31 insertions(+), 28 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 8ca7261..b17eaa1 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1376,8 +1376,8 @@ static void kvm_release_vm_pages(struct kvm *kvm)
     kvm_for_each_memslot(memslot, slots) {
         base_gfn = memslot->base_gfn;
         for (j = 0; j < memslot->npages; j++) {
-            if (memslot->rmap[j])
-                put_page((struct page *)memslot->rmap[j]);
+            if (memslot->rmap[0][j])
+                put_page((struct page *)memslot->rmap[0][j]);
         }
     }
 }
@@ -1591,12 +1591,12 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
             kvm_set_pmt_entry(kvm, base_gfn + i,
                     pfn << PAGE_SHIFT,
                 _PAGE_AR_RWX | _PAGE_MA_WB);
-            memslot->rmap[i] = (unsigned long)pfn_to_page(pfn);
+            memslot->rmap[0][i] = (unsigned long)pfn_to_page(pfn);
         } else {
             kvm_set_pmt_entry(kvm, base_gfn + i,
                     GPFN_PHYS_MMIO | (pfn << PAGE_SHIFT),
                     _PAGE_MA_UC);
-            memslot->rmap[i] = 0;
+            memslot->rmap[0][i] = 0;
         }
     }
 
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 783cd35..81f9036 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -631,7 +631,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
         goto out_unlock;
     hpte[0] = (hpte[0] & ~HPTE_V_ABSENT) | HPTE_V_VALID;
 
-    rmap = &memslot->rmap[gfn - memslot->base_gfn];
+    rmap = &memslot->rmap[0][gfn - memslot->base_gfn];
     lock_rmap(rmap);
 
     /* Check if we might have been invalidated; let the guest retry if so */
@@ -693,7 +693,7 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
         if (hva >= start && hva < end) {
             gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT;
 
-            ret = handler(kvm, &memslot->rmap[gfn_offset],
+            ret = handler(kvm, &memslot->rmap[0][gfn_offset],
                       memslot->base_gfn + gfn_offset);
             retval |= ret;
         }
@@ -928,7 +928,7 @@ long kvmppc_hv_get_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
     unsigned long *rmapp, *map;
 
     preempt_disable();
-    rmapp = memslot->rmap;
+    rmapp = memslot->rmap[0];
     map = memslot->dirty_bitmap;
     for (i = 0; i < memslot->npages; ++i) {
         if (kvm_test_clear_dirty(kvm, rmapp))
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 5f3c60b..4df9b4a 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -103,7 +103,7 @@ static void remove_revmap_chain(struct kvm *kvm, long pte_index,
     if (!memslot || (memslot->flags & KVM_MEMSLOT_INVALID))
         return;
 
-    rmap = real_vmalloc_addr(&memslot->rmap[gfn - memslot->base_gfn]);
+    rmap = real_vmalloc_addr(&memslot->rmap[0][gfn - memslot->base_gfn]);
     lock_rmap(rmap);
 
     head = *rmap & KVMPPC_RMAP_INDEX;
@@ -199,7 +199,7 @@ long kvmppc_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
         if (!slot_is_aligned(memslot, psize))
             return H_PARAMETER;
         slot_fn = gfn - memslot->base_gfn;
-        rmap = &memslot->rmap[slot_fn];
+        rmap = &memslot->rmap[0][slot_fn];
 
if 

[Bug 42636] PCI passthrough does not work with AMD iommu for PCI device

2012-01-23 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42636


Alex Williamson <alex.william...@redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |alex.william...@redhat.com




--- Comment #1 from Alex Williamson <alex.william...@redhat.com>  2012-01-23 14:56:55 ---
   +-14.4-[05]--+-06.0  Intel Corporation 82557/8/9/0/1 Ethernet Pro 100
   |            \-07.0  RaLink RT2800 802.11n PCI

[0.610382] AMD-Vi:   DEV_SELECT  devid: 00:14.4 flags: 00
[0.610384] AMD-Vi:   DEV_ALIAS_RANGE devid: 05:00.0 flags: 00 devid_to: 00:14.4
[0.610385] AMD-Vi:   DEV_RANGE_END   devid: 05:1f.7

The devices are behind a PCIe-to-PCI bridge (00:14.4), so both devices get
aliased to the same device.  You'll need to either add both devices to the
guest or sequester the other device by binding it to pci-stub.



[Bug 42635] PCIe passthrough broken with AMD iommu after s2disk / resume

2012-01-23 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42635


Alex Williamson <alex.william...@redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |alex.william...@redhat.com




--- Comment #1 from Alex Williamson <alex.william...@redhat.com>  2012-01-23 15:01:29 ---
Between the first and second paragraph, you seem to be saying that the device
never works in the guest on the second invocation of the guest.  Is that true? 
What is the device?



[PATCH kvm-unit-tests 0/4] VM86 testcase and run_tests.sh

2012-01-23 Thread Kevin Wolf
This adds a test case for task switches into/out of VM86. This test case
currently fails on KVM but passes with TCG. I'll send out KVM fixes together
with this series.

I also included a small shell script that just runs tests and prints a
PASS/FAIL message for each. I've been using this script locally for a while,
but maybe someone else finds it handy, too.

Kevin Wolf (4):
  Add run_tests.sh
  Add taskswitch testcases to unittest.cfg
  Fix i386 build
  x86/taskswitch_vm86: Task switches into/out of VM86

 config-i386.mak   |3 +-
 lib/x86/desc.c|   39 +-
 lib/x86/desc.h|   36 
 lib/x86/vm.c  |4 +-
 lib/x86/vm.h  |1 +
 run_tests.sh  |  107 +
 x86/taskswitch_vm86.c |   59 +++
 x86/unittests.cfg |   18 
 8 files changed, 227 insertions(+), 40 deletions(-)
 create mode 100755 run_tests.sh
 create mode 100644 x86/taskswitch_vm86.c

-- 
1.7.6.5



[PATCH kvm-unit-tests 1/4] Add run_tests.sh

2012-01-23 Thread Kevin Wolf
This adds a convenient way to run all tests without having to set up
Autotest.

Signed-off-by: Kevin Wolf <kw...@redhat.com>
---
 run_tests.sh |  107 ++
 1 files changed, 107 insertions(+), 0 deletions(-)
 create mode 100755 run_tests.sh

diff --git a/run_tests.sh b/run_tests.sh
new file mode 100755
index 000..8d152b0
--- /dev/null
+++ b/run_tests.sh
@@ -0,0 +1,107 @@
+#!/bin/bash
+
+testroot=x86
+config=$testroot/unittests.cfg
+qemu=${qemu:-qemu-system-x86_64}
+verbose=0
+
+function run()
+{
+    local testname="$1"
+    local groups="$2"
+    local smp="$3"
+    local kernel="$4"
+    local opts="$5"
+
+    if [ -z "$testname" ]; then
+        return
+    fi
+
+    if [ -n "$only_group" ] && ! grep -q "$only_group" <<<$groups; then
+        return
+    fi
+
+    cmdline="$qemu -display none -enable-kvm -device testdev,chardev=testlog -chardev stdio,id=testlog -kernel $kernel -smp $smp $opts"
+    if [ $verbose != 0 ]; then
+        echo $cmdline
+    fi
+
+    # extra_params in the config file may contain backticks that need to be
+    # expanded, so use eval to start qemu
+    eval $cmdline >> test.log
+
+    if [ $? == 0 ]; then
+        echo "PASS $1"
+    else
+        echo "FAIL $1"
+    fi
+}
+
+function run_all()
+{
+    local config="$1"
+    local testname
+    local smp
+    local kernel
+    local opts
+    local groups
+
+    exec {config_fd}<$config
+
+    while read -u $config_fd line; do
+        if [[ "$line" =~ ^\[(.*)\]$ ]]; then
+            run "$testname" "$groups" "$smp" "$kernel" "$opts"
+            testname=${BASH_REMATCH[1]}
+            smp=1
+            kernel=""
+            opts=""
+            groups=""
+        elif [[ $line =~ ^file\ *=\ *(.*)$ ]]; then
+            kernel=$testroot/${BASH_REMATCH[1]}
+        elif [[ $line =~ ^smp\ *=\ *(.*)$ ]]; then
+            smp=${BASH_REMATCH[1]}
+        elif [[ $line =~ ^extra_params\ *=\ *(.*)$ ]]; then
+            opts=${BASH_REMATCH[1]}
+        elif [[ $line =~ ^groups\ *=\ *(.*)$ ]]; then
+            groups=${BASH_REMATCH[1]}
+        fi
+    done
+
+    run "$testname" "$groups" "$smp" "$kernel" "$opts"
+
+    exec {config_fd}<&-
+}
+
+function usage()
+{
+cat <<EOF
+
+Usage: $0 [-g group] [-h] [-v]
+
+    -g: Only execute tests in the given group
+    -h: Output this help text
+    -v: Enables verbose mode
+
+EOF
+}
+
+echo > test.log
+while getopts "g:hv" opt; do
+    case $opt in
+        g)
+            only_group=$OPTARG
+            ;;
+        h)
+            usage
+            exit
+            ;;
+        v)
+            verbose=1
+            ;;
+        *)
+            exit
+            ;;
+    esac
+done
+
+run_all $config
-- 
1.7.6.5



[PATCH kvm-unit-tests 2/4] Add taskswitch testcases to unittest.cfg

2012-01-23 Thread Kevin Wolf
Signed-off-by: Kevin Wolf <kw...@redhat.com>
---
 x86/unittests.cfg |   12 
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/x86/unittests.cfg b/x86/unittests.cfg
index 065020a..dac7d44 100644
--- a/x86/unittests.cfg
+++ b/x86/unittests.cfg
@@ -64,6 +64,18 @@ file = svm.flat
 smp = 2
 extra_params = -cpu qemu64,-svm
 
+[taskswitch]
+file = taskswitch.flat
+smp = 2
+extra_params = -cpu qemu64,-svm
+groups = task
+
+[taskswitch2]
+file = taskswitch2.flat
+smp = 2
+extra_params = -cpu qemu64,-svm
+groups = task
+
 [kvmclock_test]
 file = kvmclock_test.flat
 smp = 2
-- 
1.7.6.5



[PATCH kvm-unit-tests 3/4] Fix i386 build

2012-01-23 Thread Kevin Wolf
Commit 1d946e07 removed idt, but left a reference to idt in i386-only
code.

Signed-off-by: Kevin Wolf <kw...@redhat.com>
---
 lib/x86/desc.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/lib/x86/desc.c b/lib/x86/desc.c
index c268955..770c250 100644
--- a/lib/x86/desc.c
+++ b/lib/x86/desc.c
@@ -329,7 +329,7 @@ void setup_gdt(void)
 
 static void set_idt_task_gate(int vec, u16 sel)
 {
-    idt_entry_t *e = &idt[vec];
+    idt_entry_t *e = &boot_idt[vec];
 
     memset(e, 0, sizeof *e);
 
-- 
1.7.6.5



[PATCH kvm-unit-tests 4/4] x86/taskswitch_vm86: Task switches into/out of VM86

2012-01-23 Thread Kevin Wolf
This adds a test case that jumps into VM86 by iret-ing to a TSS and back
to Protected Mode using a task gate in the IDT.
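
For readers unfamiliar with the mechanism, here is a minimal sketch of the data side of such a switch. The tss32_t layout is copied from lib/x86/desc.h as moved by this patch; EFLAGS_VM is the architectural VM flag (bit 17); tss_prepare_vm86() is a hypothetical helper for illustration and is not claimed to match what the test case actually does.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t u16;
typedef uint32_t u32;

/* 32-bit TSS, field for field as in lib/x86/desc.h from this patch. */
typedef struct {
    u16 prev;
    u16 res1;
    u32 esp0;
    u16 ss0;
    u16 res2;
    u32 esp1;
    u16 ss1;
    u16 res3;
    u32 esp2;
    u16 ss2;
    u16 res4;
    u32 cr3;
    u32 eip;
    u32 eflags;
    u32 eax, ecx, edx, ebx, esp, ebp, esi, edi;
    u16 es;
    u16 res5;
    u16 cs;
    u16 res6;
    u16 ss;
    u16 res7;
    u16 ds;
    u16 res8;
    u16 fs;
    u16 res9;
    u16 gs;
    u16 res10;
    u16 ldt;
    u16 res11;
    u16 t:1;
    u16 res12:15;
    u16 iomap_base;
} tss32_t;

#define EFLAGS_VM (1u << 17) /* architectural VM86 flag in EFLAGS */

/* Hypothetical helper: fill in a TSS image so that a task switch to it
 * resumes at cs:ip in VM86 mode. Only the fields relevant here are set. */
static inline void tss_prepare_vm86(tss32_t *tss, u16 cs, u16 ip)
{
    tss->cs = cs;
    tss->eip = ip;
    tss->eflags |= EFLAGS_VM;
}
```

With GCC or Clang on x86 this struct should come out at the architectural 104 bytes, with iomap_base at offset 102; bit-field packing is implementation-defined, so other compilers may differ.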

Signed-off-by: Kevin Wolf <kw...@redhat.com>
---
 config-i386.mak   |3 +-
 lib/x86/desc.c|   37 +-
 lib/x86/desc.h|   36 +
 lib/x86/vm.c  |4 +-
 lib/x86/vm.h  |1 +
 x86/taskswitch_vm86.c |   59 +
 x86/unittests.cfg |6 +
 7 files changed, 107 insertions(+), 39 deletions(-)
 create mode 100644 x86/taskswitch_vm86.c

diff --git a/config-i386.mak b/config-i386.mak
index de52f3d..b5c3b9c 100644
--- a/config-i386.mak
+++ b/config-i386.mak
@@ -5,9 +5,10 @@ ldarch = elf32-i386
 CFLAGS += -D__i386__
 CFLAGS += -I $(KERNELDIR)/include
 
-tests = $(TEST_DIR)/taskswitch.flat $(TEST_DIR)/taskswitch2.flat
+tests = $(TEST_DIR)/taskswitch.flat $(TEST_DIR)/taskswitch2.flat $(TEST_DIR)/taskswitch_vm86.flat
 
 include config-x86-common.mak
 
 $(TEST_DIR)/taskswitch.elf: $(cstart.o) $(TEST_DIR)/taskswitch.o
 $(TEST_DIR)/taskswitch2.elf: $(cstart.o) $(TEST_DIR)/taskswitch2.o
+$(TEST_DIR)/taskswitch_vm86.elf: $(cstart.o) $(TEST_DIR)/taskswitch_vm86.o
diff --git a/lib/x86/desc.c b/lib/x86/desc.c
index 770c250..c4a3607 100644
--- a/lib/x86/desc.c
+++ b/lib/x86/desc.c
@@ -27,41 +27,6 @@ typedef struct {
u8 base_high;
 } gdt_entry_t;
 
-typedef struct {
-   u16 prev;
-   u16 res1;
-   u32 esp0;
-   u16 ss0;
-   u16 res2;
-   u32 esp1;
-   u16 ss1;
-   u16 res3;
-   u32 esp2;
-   u16 ss2;
-   u16 res4;
-   u32 cr3;
-   u32 eip;
-   u32 eflags;
-   u32 eax, ecx, edx, ebx, esp, ebp, esi, edi;
-   u16 es;
-   u16 res5;
-   u16 cs;
-   u16 res6;
-   u16 ss;
-   u16 res7;
-   u16 ds;
-   u16 res8;
-   u16 fs;
-   u16 res9;
-   u16 gs;
-   u16 res10;
-   u16 ldt;
-   u16 res11;
-   u16 t:1;
-   u16 res12:15;
-   u16 iomap_base;
-} tss32_t;
-
 extern idt_entry_t boot_idt[256];
 
 void set_idt_entry(int vec, void *addr, int dpl)
@@ -327,7 +292,7 @@ void setup_gdt(void)
                   ".Lflush2: "::"r"(0x10));
 }
 
-static void set_idt_task_gate(int vec, u16 sel)
+void set_idt_task_gate(int vec, u16 sel)
 {
     idt_entry_t *e = &boot_idt[vec];
 
diff --git a/lib/x86/desc.h b/lib/x86/desc.h
index 0b4897c..f819452 100644
--- a/lib/x86/desc.h
+++ b/lib/x86/desc.h
@@ -24,6 +24,41 @@ struct ex_regs {
 unsigned long rflags;
 };
 
+typedef struct {
+   u16 prev;
+   u16 res1;
+   u32 esp0;
+   u16 ss0;
+   u16 res2;
+   u32 esp1;
+   u16 ss1;
+   u16 res3;
+   u32 esp2;
+   u16 ss2;
+   u16 res4;
+   u32 cr3;
+   u32 eip;
+   u32 eflags;
+   u32 eax, ecx, edx, ebx, esp, ebp, esi, edi;
+   u16 es;
+   u16 res5;
+   u16 cs;
+   u16 res6;
+   u16 ss;
+   u16 res7;
+   u16 ds;
+   u16 res8;
+   u16 fs;
+   u16 res9;
+   u16 gs;
+   u16 res10;
+   u16 ldt;
+   u16 res11;
+   u16 t:1;
+   u16 res12:15;
+   u16 iomap_base;
+} tss32_t;
+
 #define ASM_TRY(catch)                                  \
     "movl $0, %%gs:4 \n\t"                              \
     ".pushsection .data.ex \n\t"                        \
@@ -44,6 +79,7 @@ unsigned exception_error_code(void);
 void set_idt_entry(int vec, void *addr, int dpl);
 void set_idt_sel(int vec, u16 sel);
 void set_gdt_entry(int num, u32 base,  u32 limit, u8 access, u8 gran);
+void set_idt_task_gate(int vec, u16 sel);
 void set_intr_task_gate(int e, void *fn);
 void print_current_tss_info(void);
 void handle_exception(u8 v, void (*func)(struct ex_regs *regs));
diff --git a/lib/x86/vm.c b/lib/x86/vm.c
index abbb0c9..aae044a 100644
--- a/lib/x86/vm.c
+++ b/lib/x86/vm.c
@@ -108,14 +108,14 @@ void install_large_page(unsigned long *cr3,
   unsigned long phys,
   void *virt)
 {
-    install_pte(cr3, 2, virt, phys | PTE_PRESENT | PTE_WRITE | PTE_PSE, 0);
+    install_pte(cr3, 2, virt, phys | PTE_PRESENT | PTE_WRITE | PTE_USER | PTE_PSE, 0);
 }
 
 void install_page(unsigned long *cr3,
   unsigned long phys,
   void *virt)
 {
-install_pte(cr3, 1, virt, phys | PTE_PRESENT | PTE_WRITE, 0);
+install_pte(cr3, 1, virt, phys | PTE_PRESENT | PTE_WRITE | PTE_USER, 0);
 }
 
 
diff --git a/lib/x86/vm.h b/lib/x86/vm.h
index bf8fd52..aebc5c3 100644
--- a/lib/x86/vm.h
+++ b/lib/x86/vm.h
@@ -13,6 +13,7 @@
 #define PTE_PRESENT (1ull << 0)
 #define PTE_PSE     (1ull << 7)
 #define PTE_WRITE   (1ull << 1)
+#define PTE_USER    (1ull << 2)
 #define PTE_ADDR    (0xff000ull)
 
 void setup_vm();
diff --git a/x86/taskswitch_vm86.c b/x86/taskswitch_vm86.c
new file mode 100644
index 000..363cb00
--- /dev/null
+++ b/x86/taskswitch_vm86.c
@@ -0,0 +1,59 @@
+#include "libcflat.h"
+#include "desc.h"
+#include 

[PATCH 1/3] KVM: x86 emulator: Fix task switch privilege checks

2012-01-23 Thread Kevin Wolf
Currently, all task switches check privileges against the DPL of the
TSS. This is only correct for jmp/call to a TSS. If a task gate is used,
the DPL of this task gate is used for the check instead. Exceptions,
external interrupts and iret shouldn't perform any check.

This patch fixes the problem for VMX. For SVM, the logic used to
determine the source of the task switch is buggy, so we can't pass
useful information to the emulator there and just disable the check in
all cases.
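
The three cases can be sketched as a small decision function. This is an illustration only: the enum and function names are hypothetical, and the final comparison is assumed to stay the same as in the lines the patch removes (both the selector's RPL and the CPL are compared against the chosen DPL).

```c
#include <stdbool.h>

/* Hypothetical classification of what caused the task switch. */
enum switch_source {
    SRC_SOFT_INT_GATE, /* software interrupt through a task gate */
    SRC_EXCEPTION_IRQ, /* exception or external interrupt */
    SRC_IRET,          /* iret to the previous task */
    SRC_JMP_CALL       /* jmp/call to a TSS */
};

/* Returns the DPL to check against, or -1 when no check applies. */
static int dpl_to_check(enum switch_source src, int task_gate_dpl, int tss_dpl)
{
    switch (src) {
    case SRC_SOFT_INT_GATE:
        return task_gate_dpl;
    case SRC_JMP_CALL:
        return tss_dpl;
    default:
        return -1; /* exception, IRQ, iret: no privilege check */
    }
}

/* True when the switch must be refused (the emulator would inject #GP). */
static bool task_switch_denied(enum switch_source src, int rpl, int cpl,
                               int task_gate_dpl, int tss_dpl)
{
    int dpl = dpl_to_check(src, task_gate_dpl, tss_dpl);

    if (dpl == -1)
        return false;
    return rpl > dpl || cpl > dpl;
}
```

For example, code at CPL 3 using a software interrupt through a task gate with DPL 3 is allowed even if the target TSS has DPL 0, whereas a direct jmp to that TSS is refused.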

Signed-off-by: Kevin Wolf <kw...@redhat.com>
---
 arch/x86/include/asm/kvm_emulate.h |2 +-
 arch/x86/include/asm/kvm_host.h|4 +-
 arch/x86/kvm/emulate.c |   51 +++-
 arch/x86/kvm/svm.c |3 +-
 arch/x86/kvm/vmx.c |5 ++-
 arch/x86/kvm/x86.c |6 ++--
 6 files changed, 55 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index ab4092e..c8a9cf3 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -372,7 +372,7 @@ bool x86_page_table_writing_insn(struct x86_emulate_ctxt *ctxt);
 #define EMULATION_INTERCEPTED 2
 int x86_emulate_insn(struct x86_emulate_ctxt *ctxt);
 int emulator_task_switch(struct x86_emulate_ctxt *ctxt,
-                         u16 tss_selector, int reason,
+                         u16 tss_selector, int idt_index, int reason,
                          bool has_error_code, u32 error_code);
 int emulate_int_real(struct x86_emulate_ctxt *ctxt, int irq);
 #endif /* _ASM_X86_KVM_X86_EMULATE_H */
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 52d6640..0533fc4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -741,8 +741,8 @@ int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu);
 void kvm_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg);
 int kvm_load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, int seg);
 
-int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int reason,
-   bool has_error_code, u32 error_code);
+int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index,
+   int reason, bool has_error_code, u32 error_code);
 
 int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
 int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 05a562b..1b98a2c 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1151,6 +1151,22 @@ static int pio_in_emulated(struct x86_emulate_ctxt *ctxt,
     return 1;
 }
 
+static int read_interrupt_descriptor(struct x86_emulate_ctxt *ctxt,
+                                     u16 index, struct kvm_desc_struct *desc)
+{
+    struct kvm_desc_ptr dt;
+    ulong addr;
+
+    ctxt->ops->get_idt(ctxt, &dt);
+
+    if (dt.size < index * 8 + 7)
+        return emulate_gp(ctxt, index << 3 | 0x2);
+
+    addr = dt.address + index * 8;
+    return ctxt->ops->read_std(ctxt, addr, desc, sizeof *desc,
+                               &ctxt->exception);
+}
+
 static void get_descriptor_table_ptr(struct x86_emulate_ctxt *ctxt,
                                      u16 selector, struct desc_ptr *dt)
 {
@@ -2350,7 +2366,7 @@ static int task_switch_32(struct x86_emulate_ctxt *ctxt,
 }
 
 static int emulator_do_task_switch(struct x86_emulate_ctxt *ctxt,
-  u16 tss_selector, int reason,
+  u16 tss_selector, int idt_index, int reason,
   bool has_error_code, u32 error_code)
 {
    struct x86_emulate_ops *ops = ctxt->ops;
@@ -2360,6 +2376,7 @@ static int emulator_do_task_switch(struct x86_emulate_ctxt *ctxt,
    ulong old_tss_base =
        ops->get_cached_segment_base(ctxt, VCPU_SREG_TR);
u32 desc_limit;
+   int dpl;
 
/* FIXME: old_tss_base == ~0 ? */
 
@@ -2372,12 +2389,32 @@ static int emulator_do_task_switch(struct x86_emulate_ctxt *ctxt,
 
    /* FIXME: check that next_tss_desc is tss */
 
-   if (reason != TASK_SWITCH_IRET) {
-       if ((tss_selector & 3) > next_tss_desc.dpl ||
-           ops->cpl(ctxt) > next_tss_desc.dpl)
-           return emulate_gp(ctxt, 0);
+   /*
+    * Check privileges. The three cases are task switch caused by...
+    *
+    * 1. Software interrupt: Check against DPL of the task gate
+    * 2. Exception/IRQ/iret: No check is performed
+    * 3. jmp/call: Check agains DPL of the TSS
+    */
+   dpl = -1;
+   if (reason == TASK_SWITCH_GATE) {
+       if (idt_index != -1) {
+           struct kvm_desc_struct task_gate_desc;
+
+           ret = read_interrupt_descriptor(ctxt, idt_index,
+                                           &task_gate_desc);
+   if 

[PATCH 2/3] KVM: x86 emulator: VM86 segments must have DPL 3

2012-01-23 Thread Kevin Wolf
Setting the segment DPL to 0 for at least the VM86 code segment makes
the VM entry fail on VMX.

Signed-off-by: Kevin Wolf <kw...@redhat.com>
---
 arch/x86/kvm/emulate.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 1b98a2c..833969e 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1243,6 +1243,8 @@ static int load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
        seg_desc.type = 3;
        seg_desc.p = 1;
        seg_desc.s = 1;
+       if (ctxt->mode == X86EMUL_MODE_VM86)
+           seg_desc.dpl = 3;
        goto load;
    }
 
-- 
1.7.6.5



[PATCH 3/3] KVM: x86 emulator: Allow PM/VM86 switch during task switch

2012-01-23 Thread Kevin Wolf
Task switches can switch between Protected Mode and VM86. The current
mode must be updated during the task switch emulation so that the new
segment selectors are interpreted correctly and privilege checks
succeed.

Signed-off-by: Kevin Wolf <kw...@redhat.com>
---
 arch/x86/include/asm/kvm_emulate.h |1 +
 arch/x86/kvm/emulate.c |   17 +
 arch/x86/kvm/x86.c |6 ++
 3 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index c8a9cf3..4a21c7d 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -176,6 +176,7 @@ struct x86_emulate_ops {
void (*set_idt)(struct x86_emulate_ctxt *ctxt, struct desc_ptr *dt);
ulong (*get_cr)(struct x86_emulate_ctxt *ctxt, int cr);
int (*set_cr)(struct x86_emulate_ctxt *ctxt, int cr, ulong val);
+   void (*set_rflags)(struct x86_emulate_ctxt *ctxt, ulong val);
int (*cpl)(struct x86_emulate_ctxt *ctxt);
int (*get_dr)(struct x86_emulate_ctxt *ctxt, int dr, ulong *dest);
int (*set_dr)(struct x86_emulate_ctxt *ctxt, int dr, ulong value);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 833969e..52fce89 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2273,6 +2273,23 @@ static int load_state_from_tss32(struct x86_emulate_ctxt *ctxt,
        return emulate_gp(ctxt, 0);
    ctxt->_eip = tss->eip;
    ctxt->eflags = tss->eflags | 2;
+
+   /*
+    * If we're switching between Protected Mode and VM86, we need to make
+    * sure to update the mode before loading the segment descriptors so
+    * that the selectors are interpreted correctly.
+    *
+    * Need to get it to the vcpu struct immediately because it influences
+    * the CPL which is checked when loading the segment descriptors.
+    */
+   if (ctxt->eflags & X86_EFLAGS_VM)
+       ctxt->mode = X86EMUL_MODE_VM86;
+   else
+       ctxt->mode = X86EMUL_MODE_PROT32;
+
+   ctxt->ops->set_rflags(ctxt, ctxt->eflags);
+
+   /* General purpose registers */
    ctxt->regs[VCPU_REGS_RAX] = tss->eax;
    ctxt->regs[VCPU_REGS_RCX] = tss->ecx;
    ctxt->regs[VCPU_REGS_RDX] = tss->edx;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index dc3e945..502b5c3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4040,6 +4040,11 @@ static int emulator_set_cr(struct x86_emulate_ctxt *ctxt, int cr, ulong val)
    return res;
 }
 
+static void emulator_set_rflags(struct x86_emulate_ctxt *ctxt, ulong val)
+{
+   kvm_set_rflags(emul_to_vcpu(ctxt), val);
+}
+
 static int emulator_get_cpl(struct x86_emulate_ctxt *ctxt)
 {
    return kvm_x86_ops->get_cpl(emul_to_vcpu(ctxt));
@@ -4199,6 +4204,7 @@ static struct x86_emulate_ops emulate_ops = {
.set_idt = emulator_set_idt,
.get_cr  = emulator_get_cr,
.set_cr  = emulator_set_cr,
+   .set_rflags  = emulator_set_rflags,
.cpl = emulator_get_cpl,
.get_dr  = emulator_get_dr,
.set_dr  = emulator_set_dr,
-- 
1.7.6.5



Re: [PATCH kvm-unit-tests 4/4] x86/taskswitch_vm86: Task switches into/out of VM86

2012-01-23 Thread Gleb Natapov
On Mon, Jan 23, 2012 at 05:07:13PM +0100, Kevin Wolf wrote:
> This adds a test case that jumps into VM86 by iret-ing to a TSS and back
> to Protected Mode using a task gate in the IDT.
>
Can you add the test case to taskswitch2.c?

 Signed-off-by: Kevin Wolf <kw...@redhat.com>
 ---
  config-i386.mak   |3 +-
  lib/x86/desc.c|   37 +-
  lib/x86/desc.h|   36 +
  lib/x86/vm.c  |4 +-
  lib/x86/vm.h  |1 +
  x86/taskswitch_vm86.c |   59 +
  x86/unittests.cfg |6 +
  7 files changed, 107 insertions(+), 39 deletions(-)
  create mode 100644 x86/taskswitch_vm86.c
 
 diff --git a/config-i386.mak b/config-i386.mak
 index de52f3d..b5c3b9c 100644
 --- a/config-i386.mak
 +++ b/config-i386.mak
 @@ -5,9 +5,10 @@ ldarch = elf32-i386
  CFLAGS += -D__i386__
  CFLAGS += -I $(KERNELDIR)/include
  
 -tests = $(TEST_DIR)/taskswitch.flat $(TEST_DIR)/taskswitch2.flat
 +tests = $(TEST_DIR)/taskswitch.flat $(TEST_DIR)/taskswitch2.flat $(TEST_DIR)/taskswitch_vm86.flat
  
  include config-x86-common.mak
  
  $(TEST_DIR)/taskswitch.elf: $(cstart.o) $(TEST_DIR)/taskswitch.o
  $(TEST_DIR)/taskswitch2.elf: $(cstart.o) $(TEST_DIR)/taskswitch2.o
 +$(TEST_DIR)/taskswitch_vm86.elf: $(cstart.o) $(TEST_DIR)/taskswitch_vm86.o
 diff --git a/lib/x86/desc.c b/lib/x86/desc.c
 index 770c250..c4a3607 100644
 --- a/lib/x86/desc.c
 +++ b/lib/x86/desc.c
 @@ -27,41 +27,6 @@ typedef struct {
   u8 base_high;
  } gdt_entry_t;
  
 -typedef struct {
 - u16 prev;
 - u16 res1;
 - u32 esp0;
 - u16 ss0;
 - u16 res2;
 - u32 esp1;
 - u16 ss1;
 - u16 res3;
 - u32 esp2;
 - u16 ss2;
 - u16 res4;
 - u32 cr3;
 - u32 eip;
 - u32 eflags;
 - u32 eax, ecx, edx, ebx, esp, ebp, esi, edi;
 - u16 es;
 - u16 res5;
 - u16 cs;
 - u16 res6;
 - u16 ss;
 - u16 res7;
 - u16 ds;
 - u16 res8;
 - u16 fs;
 - u16 res9;
 - u16 gs;
 - u16 res10;
 - u16 ldt;
 - u16 res11;
 - u16 t:1;
 - u16 res12:15;
 - u16 iomap_base;
 -} tss32_t;
 -
  extern idt_entry_t boot_idt[256];
  
  void set_idt_entry(int vec, void *addr, int dpl)
 @@ -327,7 +292,7 @@ void setup_gdt(void)
          ".Lflush2: "::"r"(0x10));
  }
  
 -static void set_idt_task_gate(int vec, u16 sel)
 +void set_idt_task_gate(int vec, u16 sel)
  {
      idt_entry_t *e = &boot_idt[vec];
  
 diff --git a/lib/x86/desc.h b/lib/x86/desc.h
 index 0b4897c..f819452 100644
 --- a/lib/x86/desc.h
 +++ b/lib/x86/desc.h
 @@ -24,6 +24,41 @@ struct ex_regs {
  unsigned long rflags;
  };
  
 +typedef struct {
 + u16 prev;
 + u16 res1;
 + u32 esp0;
 + u16 ss0;
 + u16 res2;
 + u32 esp1;
 + u16 ss1;
 + u16 res3;
 + u32 esp2;
 + u16 ss2;
 + u16 res4;
 + u32 cr3;
 + u32 eip;
 + u32 eflags;
 + u32 eax, ecx, edx, ebx, esp, ebp, esi, edi;
 + u16 es;
 + u16 res5;
 + u16 cs;
 + u16 res6;
 + u16 ss;
 + u16 res7;
 + u16 ds;
 + u16 res8;
 + u16 fs;
 + u16 res9;
 + u16 gs;
 + u16 res10;
 + u16 ldt;
 + u16 res11;
 + u16 t:1;
 + u16 res12:15;
 + u16 iomap_base;
 +} tss32_t;
 +
  #define ASM_TRY(catch)                      \
      "movl $0, %%gs:4 \n\t"                  \
      ".pushsection .data.ex \n\t"            \
 @@ -44,6 +79,7 @@ unsigned exception_error_code(void);
  void set_idt_entry(int vec, void *addr, int dpl);
  void set_idt_sel(int vec, u16 sel);
  void set_gdt_entry(int num, u32 base,  u32 limit, u8 access, u8 gran);
 +void set_idt_task_gate(int vec, u16 sel);
  void set_intr_task_gate(int e, void *fn);
  void print_current_tss_info(void);
  void handle_exception(u8 v, void (*func)(struct ex_regs *regs));
 diff --git a/lib/x86/vm.c b/lib/x86/vm.c
 index abbb0c9..aae044a 100644
 --- a/lib/x86/vm.c
 +++ b/lib/x86/vm.c
 @@ -108,14 +108,14 @@ void install_large_page(unsigned long *cr3,
unsigned long phys,
void *virt)
  {
 -install_pte(cr3, 2, virt, phys | PTE_PRESENT | PTE_WRITE | PTE_PSE, 0);
 +install_pte(cr3, 2, virt, phys | PTE_PRESENT | PTE_WRITE | PTE_USER | PTE_PSE, 0);
  }
  
  void install_page(unsigned long *cr3,
unsigned long phys,
void *virt)
  {
 -install_pte(cr3, 1, virt, phys | PTE_PRESENT | PTE_WRITE, 0);
 +install_pte(cr3, 1, virt, phys | PTE_PRESENT | PTE_WRITE | PTE_USER, 0);
  }
  
  
 diff --git a/lib/x86/vm.h b/lib/x86/vm.h
 index bf8fd52..aebc5c3 100644
 --- a/lib/x86/vm.h
 +++ b/lib/x86/vm.h
 @@ -13,6 +13,7 @@
  #define PTE_PRESENT (1ull << 0)
  #define PTE_PSE     (1ull << 7)
  #define PTE_WRITE   (1ull << 1)
 +#define PTE_USER    (1ull << 2)
  #define PTE_ADDR(0xff000ull)
  
  void setup_vm();
 diff --git a/x86/taskswitch_vm86.c b/x86/taskswitch_vm86.c
 

Continuous reboots on qemu-kvm master

2012-01-23 Thread erik . rull
Hi all,

I get continuous reboots on my guest system, including these dmesg entries:

[   31.770538] device tap0 entered promiscuous mode
[   31.770554] br0: port 2(tap0) entering learning state
[   39.259921] kvm: 1517: cpu0 unhandled wrmsr: 0x0 data 400080532d74
[   39.259936] kvm: 1517: cpu0 unhandled wrmsr: 0x0 data 0
[   39.259946] kvm: 1517: cpu0 unhandled wrmsr: 0x0 data 0
[   44.870691] kvm: 1517: cpu0 unhandled wrmsr: 0x0 data 400080532d74
[   44.870801] kvm: 1517: cpu0 unhandled wrmsr: 0x0 data 0
[   44.870901] kvm: 1517: cpu0 unhandled wrmsr: 0x0 data 0
[   46.727081] br0: port 2(tap0) entering forwarding state
[   50.481469] kvm: 1517: cpu0 unhandled wrmsr: 0x0 data 400080532d74
[   50.481583] kvm: 1517: cpu0 unhandled wrmsr: 0x0 data 0
[   50.481685] kvm: 1517: cpu0 unhandled wrmsr: 0x0 data 0
[   55.827950] br0: port 2(tap0) entering disabled state
[   55.828110] device tap0 left promiscuous mode
[   55.828200] br0: port 2(tap0) entering disabled state


My ./configure is:
 ./configure --prefix= --target-list=x86_64-softmmu --disable-vnc-png
--disable-vnc-jpeg --disable-vnc-tls --disable-vnc-sasl --audio-card-list=
--audio-drv-list= --enable-sdl --disable-xen --disable-brlapi
--disable-bluez --disable-nptl --disable-curl --disable-guest-agent
--disable-guest-base --disable-werror --disable-attr

My qemu cmdline is:
/usr/X11R6/bin/qemu-system-x86_64 -serial /dev/ttyS2 -readconfig
/etc/ich9-ehci-uhci.cfg -device usb-host,bus=ehci.0 -device usb-tablet
-drive file=/dev/sda2,cache=off -m 1536 -net nic -net
tap,script=/etc/qemu-ifup -no-acpi -monitor stdio -L /usr/X11R6/share/qemu
-boot c -localtime -enable-kvm

Was fine with qemu-kvm-1.0 and the same options!

Best regards,

Erik




Re: [PATCH kvm-unit-tests 4/4] x86/taskswitch_vm86: Task switches into/out of VM86

2012-01-23 Thread Kevin Wolf
On 23.01.2012 17:10, Gleb Natapov wrote:
> On Mon, Jan 23, 2012 at 05:07:13PM +0100, Kevin Wolf wrote:
>> This adds a test case that jumps into VM86 by iret-ing to a TSS and back
>> to Protected Mode using a task gate in the IDT.
>>
> Can you add the test case to taskswitch2.c?

That's actually what I intended to do at first, but there's nothing to
share and having a clean environment that can't interfere with other
tests feels nicer.

What would we gain from merging the files?

Kevin


 

[Bug 42635] PCIe passthrough broken with AMD iommu after s2disk / resume

2012-01-23 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42635





--- Comment #2 from Klaus Mueller <kmuel...@justmail.de>  2012-01-23 16:18:33 ---
The device never works in the guest after s2disk/resume cycle of the host. But
it always works if it's used by the host itself - even after a s2disk/resume
cycle.

The device is (as seen by the host):

03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI
Express Gigabit Ethernet controller (rev 06)
Subsystem: Giga-byte Technology GA-EP45-DS5 Motherboard
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 45
Region 0: I/O ports at ee00 [size=256]
Region 2: Memory at fdcff000 (64-bit, prefetchable) [size=4K]
Region 4: Memory at fdcf8000 (64-bit, prefetchable) [size=16K]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
PME(D0+,D1+,D2+,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: fee0f00c  Data: 4122
Capabilities: [70] Express (v2) Endpoint, MSI 01
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s 512ns,
L1 64us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal-
Unsupported-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+
TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency
L0 unlimited, L1 64us
ClockPM+ Surprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+
DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance-
SpeedDis-, Selectable De-emphasis: -6dB
 Transmit Margin: Normal Operating Range,
EnterModifiedCompliance- ComplianceSOS-
 Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB
Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
Vector table: BAR=4 offset=
PBA: BAR=4 offset=0800
Capabilities: [d0] Vital Product Data
Unknown small resource type 00, will not decode more.
Capabilities: [100 v1] Advanced Error Reporting
UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
Capabilities: [140 v1] Virtual Channel
Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
Arb:Fixed- WRR32- WRR64- WRR128-
Ctrl:   ArbSelect=Fixed
Status: InProgress-
VC0:Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb:Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
Status: NegoPending- InProgress-
Capabilities: [160 v1] Device Serial Number 12-34-56-78-12-34-56-78
Kernel driver in use: pci-stub


One more thing: the device also stops working in the guest after the guest
reboots itself. I have to shut the guest down with virsh shutdown and start it
again with virsh start to get the device working. Even that does not help if
the host went through a suspend / resume cycle in between.

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are watching the assignee of the bug.


Re: [PATCH kvm-unit-tests 4/4] x86/taskswitch_vm86: Task switches into/out of VM86

2012-01-23 Thread Gleb Natapov
On Mon, Jan 23, 2012 at 05:20:22PM +0100, Kevin Wolf wrote:
> On 23.01.2012 17:10, Gleb Natapov wrote:
> > On Mon, Jan 23, 2012 at 05:07:13PM +0100, Kevin Wolf wrote:
> >> This adds a test case that jumps into VM86 by iret-ing to a TSS and back
> >> to Protected Mode using a task gate in the IDT.
> >>
> > Can you add the test case to taskswitch2.c?
> >
Running one test to check all aspects of taskswitch emulation.

> That's actually what I intended to do at first, but there's nothing to
> share and having a clean environment that can't interfere with other
> tests feels nicer.
>
> What would we gain from merging the files?
>
> Kevin
 
 
  

[Bug 42636] PCI passthrough does not work with AMD iommu for PCI device

2012-01-23 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42636





--- Comment #2 from Klaus Mueller <kmuel...@justmail.de>  2012-01-23 16:26:49 ---
Well, I did exactly what you proposed, but I got the same error again when I
tried to attach both devices. This is the relevant part of the XML file:

<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x' bus='0x05' slot='0x06' function='0x0'/>
  </source>
</hostdev>

<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x' bus='0x05' slot='0x07' function='0x0'/>
  </source>
</hostdev>

Did I make a mistake?



Re: [PATCH kvm-unit-tests 4/4] x86/taskswitch_vm86: Task switches into/out of VM86

2012-01-23 Thread Kevin Wolf
On 23.01.2012 17:22, Gleb Natapov wrote:
> On Mon, Jan 23, 2012 at 05:20:22PM +0100, Kevin Wolf wrote:
>> On 23.01.2012 17:10, Gleb Natapov wrote:
>>> On Mon, Jan 23, 2012 at 05:07:13PM +0100, Kevin Wolf wrote:
>>>> This adds a test case that jumps into VM86 by iret-ing to a TSS and back
>>>> to Protected Mode using a task gate in the IDT.
>>>>
>>> Can you add the test case to taskswitch2.c?
>>>
> Running one test to check all aspects of taskswitch emulation.

(We all know that top-posting is disliked, but middle-posting looks even
crazier!)

Does having one test provide any value in and of itself? It's just an
implementation detail of the test suite. When testing the KVM patches I
ran all three test cases with './run_tests.sh -g task', which is
hopefully easy enough.

Kevin

 

Re: [PATCH kvm-unit-tests 4/4] x86/taskswitch_vm86: Task switches into/out of VM86

2012-01-23 Thread Gleb Natapov
On Mon, Jan 23, 2012 at 05:32:59PM +0100, Kevin Wolf wrote:
> On 23.01.2012 17:22, Gleb Natapov wrote:
> > On Mon, Jan 23, 2012 at 05:20:22PM +0100, Kevin Wolf wrote:
> >> On 23.01.2012 17:10, Gleb Natapov wrote:
> >>> On Mon, Jan 23, 2012 at 05:07:13PM +0100, Kevin Wolf wrote:
> >>>> This adds a test case that jumps into VM86 by iret-ing to a TSS and back
> >>>> to Protected Mode using a task gate in the IDT.
> >>>>
> >>> Can you add the test case to taskswitch2.c?
> >>>
> > Running one test to check all aspects of taskswitch emulation.
> >
> (We all know that top-posting is disliked, but middle-posting looks even
> crazier!)
>
Inserting replies at random places is a new cool thing!

> Does having one test provide any value in and of itself? It's just an
> implementation detail of the test suite. When testing the KVM patches I
> ran all three test cases with './run_tests.sh -g task', which is
> hopefully easy enough.
>
I think it does. I do not have to use an external script to combine tests
on the same topic, or even remember that such a script exists. We do not
create a separate test for each emulated instruction either. And I
usually run qemu on a different machine from the one I compile it on, so I
need special tricks to make those test scripts work. Of course, if putting
this code into the existing test file is hard, a separate test is OK, but is
that really the case here?

 Kevin
 
  
  That's actually what I intended to do at first, but there's nothing to
  share and having a clean environment that can't interfere with other
  tests feels nicer.
 
  What would we gain from merging the files?
 
  Kevin
 
 
 
  Signed-off-by: Kevin Wolf <kw...@redhat.com>
  ---
   config-i386.mak   |3 +-
   lib/x86/desc.c|   37 +-
   lib/x86/desc.h|   36 +
   lib/x86/vm.c  |4 +-
   lib/x86/vm.h  |1 +
   x86/taskswitch_vm86.c |   59 
  +
   x86/unittests.cfg |6 +
   7 files changed, 107 insertions(+), 39 deletions(-)
   create mode 100644 x86/taskswitch_vm86.c
 
  diff --git a/config-i386.mak b/config-i386.mak
  index de52f3d..b5c3b9c 100644
  --- a/config-i386.mak
  +++ b/config-i386.mak
  @@ -5,9 +5,10 @@ ldarch = elf32-i386
   CFLAGS += -D__i386__
   CFLAGS += -I $(KERNELDIR)/include
   
  -tests = $(TEST_DIR)/taskswitch.flat $(TEST_DIR)/taskswitch2.flat
  +tests = $(TEST_DIR)/taskswitch.flat $(TEST_DIR)/taskswitch2.flat $(TEST_DIR)/taskswitch_vm86.flat
   
   include config-x86-common.mak
   
   $(TEST_DIR)/taskswitch.elf: $(cstart.o) $(TEST_DIR)/taskswitch.o
   $(TEST_DIR)/taskswitch2.elf: $(cstart.o) $(TEST_DIR)/taskswitch2.o
  +$(TEST_DIR)/taskswitch_vm86.elf: $(cstart.o) $(TEST_DIR)/taskswitch_vm86.o
  diff --git a/lib/x86/desc.c b/lib/x86/desc.c
  index 770c250..c4a3607 100644
  --- a/lib/x86/desc.c
  +++ b/lib/x86/desc.c
  @@ -27,41 +27,6 @@ typedef struct {
   u8 base_high;
   } gdt_entry_t;
   
  -typedef struct {
  -u16 prev;
  -u16 res1;
  -u32 esp0;
  -u16 ss0;
  -u16 res2;
  -u32 esp1;
  -u16 ss1;
  -u16 res3;
  -u32 esp2;
  -u16 ss2;
  -u16 res4;
  -u32 cr3;
  -u32 eip;
  -u32 eflags;
  -u32 eax, ecx, edx, ebx, esp, ebp, esi, edi;
  -u16 es;
  -u16 res5;
  -u16 cs;
  -u16 res6;
  -u16 ss;
  -u16 res7;
  -u16 ds;
  -u16 res8;
  -u16 fs;
  -u16 res9;
  -u16 gs;
  -u16 res10;
  -u16 ldt;
  -u16 res11;
  -u16 t:1;
  -u16 res12:15;
  -u16 iomap_base;
  -} tss32_t;
  -
   extern idt_entry_t boot_idt[256];
   
   void set_idt_entry(int vec, void *addr, int dpl)
  @@ -327,7 +292,7 @@ void setup_gdt(void)
 ".Lflush2: "::"r"(0x10));
   }
   
  -static void set_idt_task_gate(int vec, u16 sel)
  +void set_idt_task_gate(int vec, u16 sel)
   {
   idt_entry_t *e = &boot_idt[vec];
   
  diff --git a/lib/x86/desc.h b/lib/x86/desc.h
  index 0b4897c..f819452 100644
  --- a/lib/x86/desc.h
  +++ b/lib/x86/desc.h
  @@ -24,6 +24,41 @@ struct ex_regs {
   unsigned long rflags;
   };
   
  +typedef struct {
  +u16 prev;
  +u16 res1;
  +u32 esp0;
  +u16 ss0;
  +u16 res2;
  +u32 esp1;
  +u16 ss1;
  +u16 res3;
  +u32 esp2;
  +u16 ss2;
  +u16 res4;
  +u32 cr3;
  +u32 eip;
  +u32 eflags;
  +u32 eax, ecx, edx, ebx, esp, ebp, esi, edi;
  +u16 es;
  +u16 res5;
  +u16 cs;
  +u16 res6;
  +u16 ss;
  +u16 res7;
  +u16 ds;
  +u16 res8;
  +u16 fs;
  +u16 res9;
  +u16 gs;
  +u16 res10;
  +u16 ldt;
  +u16 res11;
  +u16 t:1;
  +u16 res12:15;
  +u16 iomap_base;
  +} tss32_t;
  +
   #define 

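The tss32_t definition that this patch moves into lib/x86/desc.h has to match the hardware layout of a 32-bit TSS, which is 104 bytes. The following is a minimal userspace sketch of a layout check, not part of the patch: the u16/u32 typedefs, the compile-time assertions and the helpers vm86_linear() and tss_mark_vm86() are illustrative assumptions. It also assumes a compiler (such as GCC or Clang) that accepts 16-bit bit-field types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t u16;
typedef uint32_t u32;

/* Field order copied from the tss32_t in the patch. */
typedef struct {
	u16 prev;
	u16 res1;
	u32 esp0;
	u16 ss0;
	u16 res2;
	u32 esp1;
	u16 ss1;
	u16 res3;
	u32 esp2;
	u16 ss2;
	u16 res4;
	u32 cr3;
	u32 eip;
	u32 eflags;
	u32 eax, ecx, edx, ebx, esp, ebp, esi, edi;
	u16 es;
	u16 res5;
	u16 cs;
	u16 res6;
	u16 ss;
	u16 res7;
	u16 ds;
	u16 res8;
	u16 fs;
	u16 res9;
	u16 gs;
	u16 res10;
	u16 ldt;
	u16 res11;
	u16 t:1;
	u16 res12:15;
	u16 iomap_base;
} tss32_t;

/* Compile-time checks against the architectural 32-bit TSS offsets. */
_Static_assert(sizeof(tss32_t) == 104, "32-bit TSS must be 104 bytes");
_Static_assert(offsetof(tss32_t, cr3) == 28, "CR3 is at offset 28");
_Static_assert(offsetof(tss32_t, eflags) == 36, "EFLAGS is at offset 36");
_Static_assert(offsetof(tss32_t, es) == 72, "ES is at offset 72");
_Static_assert(offsetof(tss32_t, iomap_base) == 102, "I/O map base is at offset 102");

/* EFLAGS.VM (bit 17) selects virtual-8086 mode when loaded from a TSS. */
#define EFLAGS_VM (1u << 17)

/* Hypothetical helper: linear address of seg:off as seen in VM86 mode. */
static inline u32 vm86_linear(u16 seg, u16 off)
{
	return ((u32)seg << 4) + off;
}

/* Hypothetical helper: mark a TSS so that switching to it enters VM86. */
static inline void tss_mark_vm86(tss32_t *tss)
{
	assert(tss != NULL);
	tss->eflags |= EFLAGS_VM;
}
```

If a field were added or reordered by mistake, the build would fail at the assertions rather than the guest faulting during the task switch.
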
[PATCH v2 0/5] VFIO core framework

2012-01-23 Thread Alex Williamson
This series includes the core framework for the VFIO driver.
VFIO is a userspace driver interface meant to replace both the
KVM device assignment code as well as interfaces like UIO.  Please
see patch 1/5 for a complete description of VFIO, what it can do,
and how it's designed.

This series can also be found here:

git://github.com/awilliam/linux-vfio.git vfio-next

This plus the PCI VFIO bus driver for exposing PCI devices to
userspace can be found here:

git://github.com/awilliam/linux-vfio.git vfio-next-staging

or here for a linux-2.6.git based tree:

git://github.com/awilliam/linux-vfio.git vfio-linux-staging

A fully functional qemu driver for doing non-KVM based PCI
device assignment can be found here:

git://github.com/awilliam/qemu-vfio.git vfio-ng

I'd like to propose VFIO for inclusion in Linux 3.4, starting with
this core framework series.  Once we have agreement on these, I'll
split up and post the VFIO PCI bus driver for inclusion as well.
I can also host the above vfio-next branch for inclusion in
linux-next.  Please review and comment.  Thanks,

Alex

v2: Interrupt setup ioctl rework based on comments by Konrad.
The interrupt ioctls are no longer exclusively targeted
at eventfds, allowing for more flexibility of other vfio
bus drivers making use of alternate mechanisms.  Also
updated vfio_iommu_info to report common IOMMU geometry
fields that we know we're going to need for Freescale
PAMU.  Additional ioctls and fields to be added via flags
as they're implemented in the IOMMU API.

---

Alex Williamson (5):
  vfio: VFIO core Kconfig and Makefile
  vfio: VFIO core IOMMU mapping support
  vfio: VFIO core group interface
  vfio: VFIO core header
  vfio: Introduce documentation for VFIO driver


 Documentation/ioctl/ioctl-number.txt |1 
 Documentation/vfio.txt   |  359 ++
 MAINTAINERS  |8 
 drivers/Kconfig  |2 
 drivers/Makefile |1 
 drivers/vfio/Kconfig |8 
 drivers/vfio/Makefile|3 
 drivers/vfio/vfio_iommu.c|  611 +
 drivers/vfio/vfio_main.c | 1248 ++
 drivers/vfio/vfio_private.h  |   36 +
 include/linux/vfio.h |  395 +++
 11 files changed, 2672 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/vfio.txt
 create mode 100644 drivers/vfio/Kconfig
 create mode 100644 drivers/vfio/Makefile
 create mode 100644 drivers/vfio/vfio_iommu.c
 create mode 100644 drivers/vfio/vfio_main.c
 create mode 100644 drivers/vfio/vfio_private.h
 create mode 100644 include/linux/vfio.h


[PATCH v2 1/5] vfio: Introduce documentation for VFIO driver

2012-01-23 Thread Alex Williamson
Including rationale for design, example usage and API description.

Signed-off-by: Alex Williamson <alex.william...@redhat.com>
---

 Documentation/vfio.txt |  359 
 1 files changed, 359 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/vfio.txt

diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
new file mode 100644
index 000..4dfccf6
--- /dev/null
+++ b/Documentation/vfio.txt
@@ -0,0 +1,359 @@
+VFIO - Virtual Function I/O[1]
+---
+Many modern systems now provide DMA and interrupt remapping facilities
+to help ensure I/O devices behave within the boundaries they've been
+allotted.  This includes x86 hardware with AMD-Vi and Intel VT-d,
+POWER systems with Partitionable Endpoints (PEs) and embedded PowerPC
+systems such as Freescale PAMU.  The VFIO driver is an IOMMU/device
+agnostic framework for exposing direct device access to userspace, in
+a secure, IOMMU protected environment.  In other words, this allows
+safe[2], non-privileged, userspace drivers.
+
+Why do we want that?  Virtual machines often make use of direct device
+access (device assignment) when configured for the highest possible
+I/O performance.  From a device and host perspective, this simply
+turns the VM into a userspace driver, with the benefits of
+significantly reduced latency, higher bandwidth, and direct use of
+bare-metal device drivers[3].
+
+Some applications, particularly in the high performance computing
+field, also benefit from low-overhead, direct device access from
+userspace.  Examples include network adapters (often non-TCP/IP based)
+and compute accelerators.  Prior to VFIO, these drivers had to either
+go through the full development cycle to become a proper upstream
+driver, be maintained out of tree, or make use of the UIO framework,
+which has no notion of IOMMU protection, limited interrupt support,
+and requires root privileges to access things like PCI configuration
+space.
+
+The VFIO driver framework intends to unify these, replacing both the
+KVM PCI specific device assignment code as well as provide a more
+secure, more featureful userspace driver environment than UIO.
+
+Groups, Devices, and IOMMUs
+---
+
+Userspace drivers are primarily concerned with manipulating individual
+devices and setting up mappings in the IOMMU for those devices.
+Unfortunately, the IOMMU doesn't always have the granularity to track
+mappings for an individual device.  Sometimes this is a topology
+barrier, such as a PCIe-to-PCI bridge interposing the device and
+IOMMU, other times this is an IOMMU limitation.  In any case, the
+reality is that devices are not always independent with respect to the
+IOMMU.  Translations setup for one device can be used by another
+device in these scenarios.
+
+The IOMMU API exposes these relationships by identifying an IOMMU
+group for these dependent devices.  Devices on the same bus with the
+same IOMMU group (or just group for this document) are not isolated
+from each other with respect to DMA mappings.  For userspace usage,
+this logically means that instead of being able to grant ownership of
+an individual device, we must grant ownership of a group, which may
+contain one or more devices.
+
+These groups therefore become a fundamental component of VFIO and the
+working unit we use for exposing devices and granting permissions to
+userspace.  In addition, VFIO makes efforts to ensure the integrity of
+the group for user access.  This includes ensuring that all devices
+within the group are controlled by VFIO (vs native host drivers)
+before allowing a user to access any member of the group or the IOMMU
+mappings, as well as maintaining the group viability as devices are
+dynamically added or removed from the system.
+
+To access a device through VFIO, a user must open a character device
+for the group that the device belongs to and then issue an ioctl to
+retrieve a file descriptor for the individual device.  This ensures
+that the user has permissions to the group (file based access to the
+/dev entry) and allows a check point at which VFIO can deny access to
+the device if the group is not viable (all devices within the group
+controlled by VFIO).  A file descriptor for the IOMMU is obtained in the
+same fashion.
+
+VFIO defines a standard set of APIs for access to devices and a
+modular interface for adding new, bus-specific VFIO device drivers.
+We call these VFIO bus drivers.  The vfio-pci module is an example
+of a bus driver for exposing PCI devices.  When the bus driver module
+is loaded it enumerates all of the devices for its bus, registering
+each device with the vfio core along with a set of callbacks.  For
+buses that support hotplug, the bus driver also adds itself to the
+notification chain for such events.  The callbacks registered with
+each device 

[PATCH v2 2/5] vfio: VFIO core header

2012-01-23 Thread Alex Williamson
This defines both the user and bus driver APIs.

Signed-off-by: Alex Williamson <alex.william...@redhat.com>
---

 Documentation/ioctl/ioctl-number.txt |1 
 include/linux/vfio.h |  395 ++
 2 files changed, 396 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/vfio.h

diff --git a/Documentation/ioctl/ioctl-number.txt 
b/Documentation/ioctl/ioctl-number.txt
index 2550754..79c5ef8 100644
--- a/Documentation/ioctl/ioctl-number.txt
+++ b/Documentation/ioctl/ioctl-number.txt
@@ -88,6 +88,7 @@ Code  Seq#(hex)   Include FileComments
and kernel/power/user.c
 '8'all SNP8023 advanced NIC card
mailto:m...@solidum.com
+';'64-83   linux/vfio.h
 '@'00-0F   linux/radeonfb.hconflict!
 '@'00-0F   drivers/video/aty/aty128fb.cconflict!
 'A'00-1F   linux/apm_bios.hconflict!
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
new file mode 100644
index 000..797dbe4
--- /dev/null
+++ b/include/linux/vfio.h
@@ -0,0 +1,395 @@
+/*
+ * VFIO API definition
+ *
+ * Copyright (C) 2012 Red Hat, Inc.  All rights reserved.
+ * Author: Alex Williamson <alex.william...@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#ifndef VFIO_H
+#define VFIO_H
+
+#include <linux/types.h>
+
+#ifdef __KERNEL__  /* Internal VFIO-core/bus driver API */
+
+/**
+ * struct vfio_device_ops - VFIO bus driver device callbacks
+ *
+ * @match: Return true if buf describes the device
+ * @claim: Force driver to attach to device
+ * @open: Called when userspace receives file descriptor for device
+ * @release: Called when userspace releases file descriptor for device
+ * @read: Perform read(2) on device file descriptor
+ * @write: Perform write(2) on device file descriptor
+ * @ioctl: Perform ioctl(2) on device file descriptor, supporting VFIO_DEVICE_*
+ * operations documented below
+ * @mmap: Perform mmap(2) on a region of the device file descriptor
+ */
+struct vfio_device_ops {
+   bool(*match)(struct device *dev, const char *buf);
+   int (*claim)(struct device *dev);
+   int (*open)(void *device_data);
+   void(*release)(void *device_data);
+   ssize_t (*read)(void *device_data, char __user *buf,
+   size_t count, loff_t *ppos);
+   ssize_t (*write)(void *device_data, const char __user *buf,
+size_t count, loff_t *size);
+   long(*ioctl)(void *device_data, unsigned int cmd,
+unsigned long arg);
+   int (*mmap)(void *device_data, struct vm_area_struct *vma);
+};
+
+/**
+ * vfio_group_add_dev() - Add a device to the vfio-core
+ *
+ * @dev: Device to add
+ * @ops: VFIO bus driver callbacks for device
+ *
+ * This registration makes the VFIO core aware of the device, creates
+ * group objects as required and exposes chardevs under /dev/vfio.
+ *
+ * Return 0 on success, errno on failure.
+ */
+extern int vfio_group_add_dev(struct device *dev,
+ const struct vfio_device_ops *ops);
+
+/**
+ * vfio_group_del_dev() - Remove a device from the vfio-core
+ *
+ * @dev: Device to remove
+ *
+ * Remove a device previously added to the VFIO core, removing groups
+ * and chardevs as necessary.
+ */
+extern void vfio_group_del_dev(struct device *dev);
+
+/**
+ * vfio_bind_dev() - Indicate device is bound to the VFIO bus driver and
+ *   register private data structure for ops callbacks.
+ *
+ * @dev: Device being bound
+ * @device_data: VFIO bus driver private data
+ *
+ * This registration indicates that a device previously registered with
+ * vfio_group_add_dev() is now available for use by the VFIO core.  When
+ * all devices within a group are available, the group is viable and may
+ * be used by userspace drivers.  Typically called from VFIO bus driver
+ * probe function.
+ *
+ * Return 0 on success, errno on failure
+ */
+extern int vfio_bind_dev(struct device *dev, void *device_data);
+
+/**
+ * vfio_unbind_dev() - Indicate device is unbinding from VFIO bus driver
+ *
+ * @dev: Device being unbound
+ *
+ * De-registration of the device previously registered with vfio_bind_dev()
+ * from VFIO.  Upon completion, the device is no longer available for use by
+ * the VFIO core.  Typically called from the VFIO bus driver remove function.
+ * The VFIO core will attempt to release the device from users and may take
+ * measures to free the device and/or block as necessary.
+ *
+ * Returns pointer to private device_data structure registered with
+ * vfio_bind_dev().
+ */
+extern void *vfio_unbind_dev(struct device *dev);
+
+
+/**
+ * offsetofend(TYPE, MEMBER)
+ *
+ * @TYPE: The type of the structure
+ * @MEMBER: The member within the 

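The offsetofend() kernel-doc above is cut off in this archive. Assuming, as the name suggests, that the macro yields the offset of the first byte past MEMBER, a userspace sketch could look like the following. The macro body, struct example_info and the two helper functions are illustrative assumptions and are not taken from vfio.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumed definition: offset of the first byte after MEMBER in TYPE.
 * sizeof does not evaluate its operand, so the null pointer is never
 * dereferenced.
 */
#define offsetofend(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/* Hypothetical ioctl argument that grows over time. */
struct example_info {
	uint32_t argsz;  /* number of bytes the caller actually provided */
	uint32_t flags;  /* present since the first revision */
	uint64_t extra;  /* added in a later revision */
};

/* True if a caller-supplied argsz is large enough to contain 'flags'. */
static int example_has_flags(uint32_t argsz)
{
	return argsz >= offsetofend(struct example_info, flags);
}

/* True if a caller-supplied argsz is large enough to contain 'extra'. */
static int example_has_extra(uint32_t argsz)
{
	return argsz >= offsetofend(struct example_info, extra);
}
```

A check of this shape would let a handler accept older, shorter argument structures while still rejecting ones too short for the fields it needs. Whether the VFIO ioctls use the macro in exactly this way is not visible in the truncated text.
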
[PATCH v2 3/5] vfio: VFIO core group interface

2012-01-23 Thread Alex Williamson
This provides the base group management with conduits to the
IOMMU driver and VFIO bus drivers.

Signed-off-by: Alex Williamson <alex.william...@redhat.com>
---

 drivers/vfio/vfio_main.c| 1248 +++
 drivers/vfio/vfio_private.h |   36 +
 2 files changed, 1284 insertions(+), 0 deletions(-)
 create mode 100644 drivers/vfio/vfio_main.c
 create mode 100644 drivers/vfio/vfio_private.h

diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
new file mode 100644
index 000..fcd6476
--- /dev/null
+++ b/drivers/vfio/vfio_main.c
@@ -0,0 +1,1248 @@
+/*
+ * VFIO framework
+ *
+ * Copyright (C) 2012 Red Hat, Inc.  All rights reserved.
+ * Author: Alex Williamson <alex.william...@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Derived from original vfio:
+ * Copyright 2010 Cisco Systems, Inc.  All rights reserved.
+ * Author: Tom Lyon, p...@cisco.com
+ */
+
+#include <linux/cdev.h>
+#include <linux/compat.h>
+#include <linux/device.h>
+#include <linux/file.h>
+#include <linux/anon_inodes.h>
+#include <linux/fs.h>
+#include <linux/idr.h>
+#include <linux/iommu.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/uaccess.h>
+#include <linux/vfio.h>
+#include <linux/wait.h>
+
+#include "vfio_private.h"
+
+#define DRIVER_VERSION "0.2"
+#define DRIVER_AUTHOR  "Alex Williamson <alex.william...@redhat.com>"
+#define DRIVER_DESC    "VFIO - User Level meta-driver"
+
+static struct vfio {
+   dev_t   devt;
+   struct cdev cdev;
+   struct list_headgroup_list;
+   struct mutexlock;
+   struct kref kref;
+   struct class*class;
+   struct idr  idr;
+   wait_queue_head_t   release_q;
+} vfio;
+
+static const struct file_operations vfio_group_fops;
+
+struct vfio_group {
+   dev_t   devt;
+   unsigned intgroupid;
+   struct bus_type *bus;
+   struct vfio_iommu   *iommu;
+   struct list_headdevice_list;
+   struct list_headiommu_next;
+   struct list_headgroup_next;
+   struct device   *dev;
+   struct kobject  *devices_kobj;
+   int refcnt;
+   booltainted;
+};
+
+struct vfio_device {
+   struct device   *dev;
+   const struct vfio_device_ops*ops;
+   struct vfio_group   *group;
+   struct list_headdevice_next;
+   boolattached;
+   booldeleteme;
+   int refcnt;
+   void*device_data;
+};
+
+/*
+ * Helper functions called under vfio.lock
+ */
+
+/* Return true if any devices within a group are opened */
+static bool __vfio_group_devs_inuse(struct vfio_group *group)
+{
+   struct list_head *pos;
+
+   list_for_each(pos, &group->device_list) {
+   struct vfio_device *device;
+
+   device = list_entry(pos, struct vfio_device, device_next);
+   if (device->refcnt)
+   return true;
+   }
+   return false;
+}
+
+/*
+ * Return true if any of the groups attached to an iommu are opened.
+ * We can only tear apart merged groups when nothing is left open.
+ */
+static bool __vfio_iommu_groups_inuse(struct vfio_iommu *iommu)
+{
+   struct list_head *pos;
+
+   list_for_each(pos, &iommu->group_list) {
+   struct vfio_group *group;
+
+   group = list_entry(pos, struct vfio_group, iommu_next);
+   if (group->refcnt)
+   return true;
+   }
+   return false;
+}
+
+/*
+ * An iommu is in use if it has a file descriptor open or if any of
+ * the groups assigned to the iommu have devices open.
+ */
+static bool __vfio_iommu_inuse(struct vfio_iommu *iommu)
+{
+   struct list_head *pos;
+
+   if (iommu->refcnt)
+   return true;
+
+   list_for_each(pos, &iommu->group_list) {
+   struct vfio_group *group;
+
+   group = list_entry(pos, struct vfio_group, iommu_next);
+
+   if (__vfio_group_devs_inuse(group))
+   return true;
+   }
+   return false;
+}
+
+static void __vfio_group_set_iommu(struct vfio_group *group,
+  struct vfio_iommu *iommu)
+{
+   if (group->iommu)
+   list_del(&group->iommu_next);
+   if (iommu)
+   list_add(&group->iommu_next, &iommu->group_list);
+
+   group->iommu = iommu;
+}
+
+static void __vfio_iommu_detach_dev(struct vfio_iommu *iommu,
+   struct vfio_device *device)
+{
+   if (WARN_ON(!iommu->domain && 

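The helpers above walk a group's device list to decide whether anything is still open, and the header text earlier in the series says a group is viable only once all of its devices are available to VFIO. A simplified userspace model of those two checks is sketched below. It uses a plain singly linked list instead of the kernel's list_head, and the model_* names and the bound field are hypothetical stand-ins, not the structures from vfio_main.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct vfio_device. */
struct model_device {
	bool bound;                  /* bound to a VFIO bus driver */
	int refcnt;                  /* non-zero while userspace holds it open */
	struct model_device *next;
};

/* Hypothetical, simplified stand-in for struct vfio_group. */
struct model_group {
	struct model_device *devices;
};

/* Return true if any device within the group is opened. */
static bool model_group_devs_inuse(const struct model_group *group)
{
	const struct model_device *dev;

	assert(group != NULL);
	for (dev = group->devices; dev; dev = dev->next)
		if (dev->refcnt)
			return true;
	return false;
}

/*
 * Return true if every device in the group is bound to VFIO.
 * An empty group is reported as viable here; that is a choice made
 * for this sketch only.
 */
static bool model_group_viable(const struct model_group *group)
{
	const struct model_device *dev;

	assert(group != NULL);
	for (dev = group->devices; dev; dev = dev->next)
		if (!dev->bound)
			return false;
	return true;
}
```

In the real code these walks run under vfio.lock; the model leaves locking out so that only the list logic is shown.
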
[PATCH v2 4/5] vfio: VFIO core IOMMU mapping support

2012-01-23 Thread Alex Williamson
Backing for operations on the IOMMU object, including DMA
mapping and unmapping.

Signed-off-by: Alex Williamson <alex.william...@redhat.com>
---

 drivers/vfio/vfio_iommu.c |  611 +
 1 files changed, 611 insertions(+), 0 deletions(-)
 create mode 100644 drivers/vfio/vfio_iommu.c

diff --git a/drivers/vfio/vfio_iommu.c b/drivers/vfio/vfio_iommu.c
new file mode 100644
index 000..49e6b2d
--- /dev/null
+++ b/drivers/vfio/vfio_iommu.c
@@ -0,0 +1,611 @@
+/*
+ * VFIO: IOMMU DMA mapping support
+ *
+ * Copyright (C) 2012 Red Hat, Inc.  All rights reserved.
+ * Author: Alex Williamson <alex.william...@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Derived from original vfio:
+ * Copyright 2010 Cisco Systems, Inc.  All rights reserved.
+ * Author: Tom Lyon, p...@cisco.com
+ */
+
+#include <linux/compat.h>
+#include <linux/device.h>
+#include <linux/fs.h>
+#include <linux/iommu.h>
+#include <linux/module.h>
+#include <linux/mm.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/vfio.h>
+#include <linux/workqueue.h>
+
+#include "vfio_private.h"
+
+struct vfio_dma_map_entry {
+   struct list_headlist;
+   dma_addr_t  iova;   /* Device address */
+   unsigned long   vaddr;  /* Process virtual addr */
+   longnpage;  /* Number of pages */
+   int prot;   /* IOMMU_READ/WRITE */
+};
+
+/*
+ * This code handles mapping and unmapping of user data buffers
+ * into DMA'ble space using the IOMMU
+ */
+
+#define NPAGE_TO_SIZE(npage)   ((size_t)(npage) << PAGE_SHIFT)
+
+struct vwork {
+   struct mm_struct*mm;
+   longnpage;
+   struct work_struct  work;
+};
+
+/* delayed decrement/increment for locked_vm */
+static void vfio_lock_acct_bg(struct work_struct *work)
+{
+   struct vwork *vwork = container_of(work, struct vwork, work);
+   struct mm_struct *mm;
+
+   mm = vwork->mm;
+   down_write(&mm->mmap_sem);
+   mm->locked_vm += vwork->npage;
+   up_write(&mm->mmap_sem);
+   mmput(mm);
+   kfree(vwork);
+}
+
+static void vfio_lock_acct(long npage)
+{
+   struct vwork *vwork;
+   struct mm_struct *mm;
+
+   if (!current->mm)
+   return; /* process exited */
+
+   if (down_write_trylock(&current->mm->mmap_sem)) {
+   current->mm->locked_vm += npage;
+   up_write(&current->mm->mmap_sem);
+   return;
+   }
+
+   /*
+* Couldn't get mmap_sem lock, so must setup to update
+* mm->locked_vm later. If locked_vm were atomic, we
+* wouldn't need this silliness
+*/
+   vwork = kmalloc(sizeof(struct vwork), GFP_KERNEL);
+   if (!vwork)
+   return;
+   mm = get_task_mm(current);
+   if (!mm) {
+   kfree(vwork);
+   return;
+   }
+   INIT_WORK(&vwork->work, vfio_lock_acct_bg);
+   vwork->mm = mm;
+   vwork->npage = npage;
+   schedule_work(&vwork->work);
+}
+
+/*
+ * Some mappings aren't backed by a struct page, for example an mmap'd
+ * MMIO range for our own or another device.  These use a different
+ * pfn conversion and shouldn't be tracked as locked pages.
+ */
+static bool is_invalid_reserved_pfn(unsigned long pfn)
+{
+   if (pfn_valid(pfn)) {
+   bool reserved;
+   struct page *tail = pfn_to_page(pfn);
+   struct page *head = compound_trans_head(tail);
+   reserved = !!(PageReserved(head));
+   if (head != tail) {
+   /*
+* head is not a dangling pointer
+* (compound_trans_head takes care of that)
+* but the hugepage may have been split
+* from under us (and we may not hold a
+* reference count on the head page so it can
+* be reused before we run PageReferenced), so
+* we've to check PageTail before returning
+* what we just read.
+*/
+   smp_rmb();
+   if (PageTail(tail))
+   return reserved;
+   }
+   return PageReserved(tail);
+   }
+
+   return true;
+}
+
+static int put_pfn(unsigned long pfn, int prot)
+{
+   if (!is_invalid_reserved_pfn(pfn)) {
+   struct page *page = pfn_to_page(pfn);
+   if (prot & IOMMU_WRITE)
+   SetPageDirty(page);
+   put_page(page);
+   return 1;
+   }
+   return 0;
+}
+
+/* Unmap DMA region */
+static long __vfio_dma_do_unmap(struct vfio_iommu *iommu, dma_addr_t 

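vfio_lock_acct() above updates locked_vm immediately when it can take mmap_sem without sleeping and otherwise defers the update to a work item. The pattern can be modelled in userspace as below. This is only a sketch: a C11 atomic_flag stands in for mmap_sem, an atomic counter stands in for the queued vwork items, and every model_* name is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the parts of mm_struct used by the accounting. */
struct model_mm {
	atomic_flag lock;      /* stands in for mmap_sem */
	long locked_vm;        /* only touched while 'lock' is held */
	atomic_long deferred;  /* stands in for pending work items */
};

static void model_mm_init(struct model_mm *mm)
{
	assert(mm != NULL);
	atomic_flag_clear(&mm->lock);
	mm->locked_vm = 0;
	atomic_init(&mm->deferred, 0);
}

/*
 * Fast path: apply the delta now if the lock is free.
 * Slow path: record the delta for later instead of blocking.
 * Returns true if the update was applied immediately.
 */
static bool model_lock_acct(struct model_mm *mm, long npage)
{
	if (!atomic_flag_test_and_set_explicit(&mm->lock, memory_order_acquire)) {
		mm->locked_vm += npage;
		atomic_flag_clear_explicit(&mm->lock, memory_order_release);
		return true;
	}
	atomic_fetch_add(&mm->deferred, npage);
	return false;
}

/* Background half: wait for the lock, then fold in the deferred deltas. */
static void model_lock_acct_bg(struct model_mm *mm)
{
	while (atomic_flag_test_and_set_explicit(&mm->lock, memory_order_acquire))
		; /* spin; the kernel code sleeps in down_write() instead */
	mm->locked_vm += atomic_exchange(&mm->deferred, 0);
	atomic_flag_clear_explicit(&mm->lock, memory_order_release);
}
```

As the comment in the patch notes, none of this would be needed if locked_vm were itself atomic; the deferral exists only because the counter is protected by a lock that may not be taken in the caller's context.
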
[PATCH v2 5/5] vfio: VFIO core Kconfig and Makefile

2012-01-23 Thread Alex Williamson
Enable the base code.

Signed-off-by: Alex Williamson <alex.william...@redhat.com>
---

 MAINTAINERS   |8 
 drivers/Kconfig   |2 ++
 drivers/Makefile  |1 +
 drivers/vfio/Kconfig  |8 
 drivers/vfio/Makefile |3 +++
 5 files changed, 22 insertions(+), 0 deletions(-)
 create mode 100644 drivers/vfio/Kconfig
 create mode 100644 drivers/vfio/Makefile

diff --git a/MAINTAINERS b/MAINTAINERS
index df8cb66..2f3a5c8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7129,6 +7129,14 @@ S:   Maintained
 F: Documentation/filesystems/vfat.txt
 F: fs/fat/
 
+VFIO DRIVER
+M: Alex Williamson <alex.william...@redhat.com>
+L: kvm@vger.kernel.org
+S: Maintained
+F: Documentation/vfio.txt
+F: drivers/vfio/
+F: include/linux/vfio.h
+
 VIDEOBUF2 FRAMEWORK
 M: Pawel Osciak <pa...@osciak.com>
 M: Marek Szyprowski <m.szyprow...@samsung.com>
diff --git a/drivers/Kconfig b/drivers/Kconfig
index d5138e6..f168bf3 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -114,6 +114,8 @@ source "drivers/auxdisplay/Kconfig"
 
 source "drivers/uio/Kconfig"
 
+source "drivers/vfio/Kconfig"
+
 source "drivers/vlynq/Kconfig"
 
 source "drivers/virtio/Kconfig"
diff --git a/drivers/Makefile b/drivers/Makefile
index 71a1f16..6be03a1 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -59,6 +59,7 @@ obj-$(CONFIG_ATM) += atm/
 obj-$(CONFIG_FUSION)   += message/
 obj-y  += firewire/
 obj-$(CONFIG_UIO)  += uio/
+obj-$(CONFIG_VFIO) += vfio/
 obj-y  += cdrom/
 obj-y  += auxdisplay/
 obj-$(CONFIG_PCCARD)   += pcmcia/
diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
new file mode 100644
index 000..9acb1e7
--- /dev/null
+++ b/drivers/vfio/Kconfig
@@ -0,0 +1,8 @@
+menuconfig VFIO
+   tristate "VFIO Non-Privileged userspace driver framework"
+   depends on IOMMU_API
+   help
+ VFIO provides a framework for secure userspace device drivers.
+ See Documentation/vfio.txt for more details.
+
+ If you don't know what to do here, say N.
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
new file mode 100644
index 000..088faf1
--- /dev/null
+++ b/drivers/vfio/Makefile
@@ -0,0 +1,3 @@
+vfio-y := vfio_main.o vfio_iommu.o
+
+obj-$(CONFIG_VFIO) := vfio.o



KVM call agenda for Tuesday 24

2012-01-23 Thread Markus Armbruster
Please send in any agenda items you are interested in covering.

Cheers,

Markus


Re: [Qemu-devel] [PATCH 00/20] [PULL] qemu-kvm.git uq/master queue

2012-01-23 Thread Anthony Liguori

On 01/20/2012 11:26 AM, Marcelo Tosatti wrote:

The following changes since commit 8c4ec5c0269bda18bb777a64b2008088d1c632dc:

   pxa2xx_keypad: fix unbalanced parenthesis. (2012-01-17 02:14:42 +0100)

are available in the git repository at:
   git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git uq/master


Applied.  Thanks.

Regards,

Anthony Liguori



Jan Kiszka (18):
   msi: Generalize msix_supported to msi_supported
   kvm: Move kvmclock into hw/kvm folder
   apic: Stop timer on reset
   apic: Inject external NMI events via LINT1
   apic: Introduce apic_report_irq_delivered
   apic: Factor out base class for KVM reuse
   apic: Open-code timer save/restore
   i8259: Completely privatize PicState
   i8259: Factor out base class for KVM reuse
   ioapic: Drop post-load irr initialization
   ioapic: Factor out base class for KVM reuse
   memory: Introduce memory_region_init_reservation
   kvm: Introduce core services for in-kernel irqchip support
   kvm: x86: Establish IRQ0 override control
   kvm: x86: Add user space part for in-kernel APIC
   kvm: x86: Add user space part for in-kernel i8259
   kvm: x86: Add user space part for in-kernel IOAPIC
   kvm: Activate in-kernel irqchip support

Vadim Rozenfeld (2):
   hyper-v: introduce Hyper-V support infrastructure.
   hyper-v: initialize Hyper-V CPUID leaves.

  Makefile.objs  |2 +-
  Makefile.target|8 +-
  configure  |1 +
  cpus.c |6 +-
  hw/apic.c  |  356 ++--
  hw/apic.h  |1 +
  hw/apic_common.c   |  302 ++
  hw/apic_internal.h |  115 +
  hw/i8259.c |  163 --
  hw/i8259_common.c  |  147 +
  hw/i8259_internal.h|   76 +
  hw/ioapic.c|  142 ++--
  hw/ioapic_common.c |  104 
  hw/ioapic_internal.h   |   97 +++
  hw/kvm/apic.c  |  138 
  hw/{kvmclock.c => kvm/clock.c} |4 +-
  hw/{kvmclock.h => kvm/clock.h} |0
  hw/kvm/i8259.c |  128 ++
  hw/kvm/ioapic.c|  114 +
  hw/msi.c   |8 +
  hw/msi.h   |2 +
  hw/msix.c  |9 +-
  hw/msix.h  |2 -
  hw/pc.c|   20 ++-
  hw/pc.h|8 +-
  hw/pc_piix.c   |   69 +++-
  kvm-all.c  |  154 +
  kvm-stub.c |5 +
  kvm.h  |   14 ++
  memory.c   |   36 
  memory.h   |   16 ++
  qemu-config.c  |4 +
  qemu-options.hx|5 +-
  sysemu.h   |1 -
  target-i386/cpuid.c|   14 ++
  target-i386/hyperv.c   |   64 +++
  target-i386/hyperv.h   |   43 +
  target-i386/kvm.c  |  114 +-
  trace-events   |2 +-
  vl.c   |1 -
  40 files changed, 1902 insertions(+), 593 deletions(-)
  create mode 100644 hw/apic_common.c
  create mode 100644 hw/apic_internal.h
  create mode 100644 hw/i8259_common.c
  create mode 100644 hw/i8259_internal.h
  create mode 100644 hw/ioapic_common.c
  create mode 100644 hw/ioapic_internal.h
  create mode 100644 hw/kvm/apic.c
  rename hw/{kvmclock.c => kvm/clock.c} (98%)
  rename hw/{kvmclock.h => kvm/clock.h} (100%)
  create mode 100644 hw/kvm/i8259.c
  create mode 100644 hw/kvm/ioapic.c
  create mode 100644 target-i386/hyperv.c
  create mode 100644 target-i386/hyperv.h






Re: [PATCH] KVM: Factor out kvm_vcpu_kick to arch-generic code

2012-01-23 Thread Marcelo Tosatti
On Thu, Jan 19, 2012 at 10:22:41PM -0500, Christoffer Dall wrote:
 The kvm_vcpu_kick function performs roughly the same functionality on
 most all architectures, so we shouldn't have separate copies.
 
 PowerPC keeps a pointer to interchanging waitqueues on the vcpu_arch
 structure and to accommodate this special need a
 __KVM_HAVE_ARCH_VCPU_GET_WQ define and accompanying function
 kvm_arch_vcpu_wq have been defined. For all other architectures this
 is a generic inline that just returns &vcpu->wq;
 
 This patch applies to Linus' tree on the Linux 3.3-rc1 tag.
 
 Signed-off-by: Christoffer Dall <c.d...@virtualopensystems.com>
 ---
  arch/ia64/include/asm/kvm_host.h|1 +
  arch/ia64/kvm/kvm-ia64.c|   15 ---
  arch/powerpc/include/asm/kvm_host.h |6 ++
  arch/powerpc/kvm/powerpc.c  |   12 ++--
  arch/x86/kvm/x86.c  |   17 -
  include/linux/kvm_host.h|8 
  virt/kvm/kvm_main.c |   23 +++
  7 files changed, 40 insertions(+), 42 deletions(-)
 
 diff --git a/arch/ia64/include/asm/kvm_host.h 
 b/arch/ia64/include/asm/kvm_host.h
 index 2689ee5..06a5e91 100644
 --- a/arch/ia64/include/asm/kvm_host.h
 +++ b/arch/ia64/include/asm/kvm_host.h
 @@ -365,6 +365,7 @@ struct thash_cb {
  };
  
  struct kvm_vcpu_stat {
 + u32 halt_wakeup;
  };
  
  struct kvm_vcpu_arch {
 diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
 index 43f4c92..f22ffb6 100644
 --- a/arch/ia64/kvm/kvm-ia64.c
 +++ b/arch/ia64/kvm/kvm-ia64.c
 @@ -1851,21 +1851,6 @@ void kvm_arch_hardware_unsetup(void)
  {
  }
  
 -void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
 -{
 - int me;
 - int cpu = vcpu->cpu;
 -
 - if (waitqueue_active(&vcpu->wq))
 - wake_up_interruptible(&vcpu->wq);
 -
 - me = get_cpu();
 - if (cpu != me && (unsigned) cpu < nr_cpu_ids && cpu_online(cpu))
 - if (!test_and_set_bit(KVM_REQ_KICK, &vcpu->requests))
 - smp_send_reschedule(cpu);
 - put_cpu();
 -}
 -
  int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq)
  {
   return __apic_accept_irq(vcpu, irq->vector);
 diff --git a/arch/powerpc/include/asm/kvm_host.h 
 b/arch/powerpc/include/asm/kvm_host.h
 index bf8af5d..b687444 100644
 --- a/arch/powerpc/include/asm/kvm_host.h
 +++ b/arch/powerpc/include/asm/kvm_host.h
 @@ -438,4 +438,10 @@ struct kvm_vcpu_arch {
  #define KVMPPC_VCPU_BUSY_IN_HOST 1
  #define KVMPPC_VCPU_RUNNABLE 2
  
 +#define __KVM_HAVE_ARCH_VCPU_GET_WQ 1
 +static inline wait_queue_head *kvm_arch_vcpu_wq(struct kvm_vcpu *vcpu)
 +{
 + return vcpu->arch.wqp;
 +}
 +
  #endif /* __POWERPC_KVM_HOST_H__ */
 diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
 index 607fbdf..30cd621 100644
 --- a/arch/powerpc/kvm/powerpc.c
 +++ b/arch/powerpc/kvm/powerpc.c
 @@ -311,10 +311,7 @@ static void kvmppc_decrementer_func(unsigned long data)
  
   kvmppc_core_queue_dec(vcpu);
  
 - if (waitqueue_active(vcpu->arch.wqp)) {
 - wake_up_interruptible(vcpu->arch.wqp);
 - vcpu->stat.halt_wakeup++;
 - }
 + kvm_vcpu_kick(vcpu);
  }
  
  /*
 @@ -572,12 +569,7 @@ int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, 
 struct kvm_interrupt *irq)
  
   kvmppc_core_queue_external(vcpu, irq);
  
 - if (waitqueue_active(vcpu->arch.wqp)) {
 - wake_up_interruptible(vcpu->arch.wqp);
 - vcpu->stat.halt_wakeup++;
 - } else if (vcpu->cpu != -1) {
 - smp_send_reschedule(vcpu->cpu);
 - }
 + kvm_vcpu_kick(vcpu);
  
   return 0;
  }
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index c38efd7..6de0af8 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -6688,23 +6688,6 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
kvm_cpu_has_interrupt(vcpu));
  }
  
 -void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
 -{
 - int me;
 - int cpu = vcpu->cpu;
 -
 - if (waitqueue_active(&vcpu->wq)) {
 - wake_up_interruptible(&vcpu->wq);
 - ++vcpu->stat.halt_wakeup;
 - }
 -
 - me = get_cpu();
 - if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
 - if (kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE)
 - smp_send_reschedule(cpu);
 - put_cpu();
 -}
 -
  int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu)
  {
   return kvm_x86_ops->interrupt_allowed(vcpu);
 diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
 index d526231..301ae34 100644
 --- a/include/linux/kvm_host.h
 +++ b/include/linux/kvm_host.h
 @@ -407,6 +407,7 @@ void mark_page_dirty_in_slot(struct kvm *kvm, struct 
 kvm_memory_slot *memslot,
gfn_t gfn);
  
  void kvm_vcpu_block(struct kvm_vcpu *vcpu);
 +void kvm_vcpu_kick(struct kvm_vcpu *vcpu);
  void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu);
  void kvm_resched(struct kvm_vcpu *vcpu);
  void kvm_load_guest_fpu(struct 
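
Editorial illustration (not part of the posted patch): the commit message above describes a per-architecture hook that selects which wait queue the shared kick code wakes. The following is a minimal user-space sketch of that override pattern only. Every name in it (struct wait_queue, struct vcpu, SKETCH_HAVE_ARCH_VCPU_GET_WQ, vcpu_wq, vcpu_kick) is a hypothetical stand-in, and the IPI half of the real kvm_vcpu_kick() is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel wait queue head. */
struct wait_queue {
    int waiters;   /* number of sleepers on this queue */
    int wakeups;   /* number of wake-ups delivered */
};

/* Hypothetical per-arch state: an arch may point at a different queue. */
struct vcpu_arch {
    struct wait_queue *wqp;
};

struct vcpu {
    struct wait_queue wq;      /* default queue embedded in the vcpu */
    struct vcpu_arch arch;
    unsigned int halt_wakeup;  /* statistics counter */
};

/*
 * An architecture that needs its own queue defines the macro and gets the
 * first accessor; everyone else gets the generic one returning &v->wq.
 */
#ifdef SKETCH_HAVE_ARCH_VCPU_GET_WQ
static inline struct wait_queue *vcpu_wq(struct vcpu *v)
{
    return v->arch.wqp;
}
#else
static inline struct wait_queue *vcpu_wq(struct vcpu *v)
{
    return &v->wq;
}
#endif

/* Shared wake-up half of the kick; the cross-CPU reschedule is omitted. */
static void vcpu_kick(struct vcpu *v)
{
    struct wait_queue *wq = vcpu_wq(v);

    assert(wq != NULL);
    if (wq->waiters > 0) {
        wq->wakeups++;
        v->halt_wakeup++;
    }
}
```

Built without the macro, vcpu_kick() wakes the embedded queue and bumps halt_wakeup only when someone is waiting. The point of the pattern is that the common code never needs to know which queue an architecture uses.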

Re: [PATCH 0/4] KVM: Decouple rmap_pde from lpage_info write_count

2012-01-23 Thread Marcelo Tosatti
On Mon, Jan 23, 2012 at 07:42:04PM +0900, Takuya Yoshikawa wrote:
 The last one is an RFC patch:
 
 I think it is better to refactor the rmap things, if needed, before
 architectures other than x86 start supporting large pages.
 
   Takuya

Looks good to me.



Re: Continuous reboots on qemu-kvm master

2012-01-23 Thread Marcelo Tosatti
On Mon, Jan 23, 2012 at 05:12:07PM +0100, erik.r...@rdsoftware.de wrote:
 Hi all,
 
 I get continuous reboots on my guest system, including these dmesg entries:
 
 [   31.770538] device tap0 entered promiscuous mode
 [   31.770554] br0: port 2(tap0) entering learning state
 [   39.259921] kvm: 1517: cpu0 unhandled wrmsr: 0x0 data 400080532d74
 [   39.259936] kvm: 1517: cpu0 unhandled wrmsr: 0x0 data 0
 [   39.259946] kvm: 1517: cpu0 unhandled wrmsr: 0x0 data 0
 [   44.870691] kvm: 1517: cpu0 unhandled wrmsr: 0x0 data 400080532d74
 [   44.870801] kvm: 1517: cpu0 unhandled wrmsr: 0x0 data 0
 [   44.870901] kvm: 1517: cpu0 unhandled wrmsr: 0x0 data 0
 [   46.727081] br0: port 2(tap0) entering forwarding state
 [   50.481469] kvm: 1517: cpu0 unhandled wrmsr: 0x0 data 400080532d74
 [   50.481583] kvm: 1517: cpu0 unhandled wrmsr: 0x0 data 0
 [   50.481685] kvm: 1517: cpu0 unhandled wrmsr: 0x0 data 0
 [   55.827950] br0: port 2(tap0) entering disabled state
 [   55.828110] device tap0 left promiscuous mode
 [   55.828200] br0: port 2(tap0) entering disabled state
 
 
 My ./configure is:
  ./configure --prefix= --target-list=x86_64-softmmu --disable-vnc-png
 --disable-vnc-jpeg --disable-vnc-tls --disable-vnc-sasl --audio-card-list=
 --audio-drv-list= --enable-sdl --disable-xen --disable-brlapi
 --disable-bluez --disable-nptl --disable-curl --disable-guest-agent
 --disable-guest-base --disable-werror --disable-attr
 
 My qemu cmdline is:
 /usr/X11R6/bin/qemu-system-x86_64 -serial /dev/ttyS2 -readconfig
 /etc/ich9-ehci-uhci.cfg -device usb-host,bus=ehci.0 -device usb-tablet
 -drive file=/dev/sda2,cache=off -m 1536 -net nic -net
 tap,script=/etc/qemu-ifup -no-acpi -monitor stdio -L /usr/X11R6/share/qemu
 -boot c -localtime -enable-kvm
 
 Was fine with qemu-kvm-1.0 and the same options!
 
 Best regards,
 
 Erik

Erik, 

Can you bisect to find the culprit, please?



Re: Continuous reboots on qemu-kvm master

2012-01-23 Thread Erik Rull

Marcelo Tosatti wrote:
> On Mon, Jan 23, 2012 at 05:12:07PM +0100, erik.r...@rdsoftware.de wrote:
>> Hi all,
>>
>> I get continuous reboots on my guest system, including these dmesg entries:
>>
>> [   31.770538] device tap0 entered promiscuous mode
>> [   31.770554] br0: port 2(tap0) entering learning state
>> [   39.259921] kvm: 1517: cpu0 unhandled wrmsr: 0x0 data 400080532d74
>> [   39.259936] kvm: 1517: cpu0 unhandled wrmsr: 0x0 data 0
>> [   39.259946] kvm: 1517: cpu0 unhandled wrmsr: 0x0 data 0
>> [   44.870691] kvm: 1517: cpu0 unhandled wrmsr: 0x0 data 400080532d74
>> [   44.870801] kvm: 1517: cpu0 unhandled wrmsr: 0x0 data 0
>> [   44.870901] kvm: 1517: cpu0 unhandled wrmsr: 0x0 data 0
>> [   46.727081] br0: port 2(tap0) entering forwarding state
>> [   50.481469] kvm: 1517: cpu0 unhandled wrmsr: 0x0 data 400080532d74
>> [   50.481583] kvm: 1517: cpu0 unhandled wrmsr: 0x0 data 0
>> [   50.481685] kvm: 1517: cpu0 unhandled wrmsr: 0x0 data 0
>> [   55.827950] br0: port 2(tap0) entering disabled state
>> [   55.828110] device tap0 left promiscuous mode
>> [   55.828200] br0: port 2(tap0) entering disabled state
>>
>> My ./configure is:
>>  ./configure --prefix= --target-list=x86_64-softmmu --disable-vnc-png
>> --disable-vnc-jpeg --disable-vnc-tls --disable-vnc-sasl --audio-card-list=
>> --audio-drv-list= --enable-sdl --disable-xen --disable-brlapi
>> --disable-bluez --disable-nptl --disable-curl --disable-guest-agent
>> --disable-guest-base --disable-werror --disable-attr
>>
>> My qemu cmdline is:
>> /usr/X11R6/bin/qemu-system-x86_64 -serial /dev/ttyS2 -readconfig
>> /etc/ich9-ehci-uhci.cfg -device usb-host,bus=ehci.0 -device usb-tablet
>> -drive file=/dev/sda2,cache=off -m 1536 -net nic -net
>> tap,script=/etc/qemu-ifup -no-acpi -monitor stdio -L /usr/X11R6/share/qemu
>> -boot c -localtime -enable-kvm
>>
>> Was fine with qemu-kvm-1.0 and the same options!
>>
>> Best regards,
>>
>> Erik
>
> Erik,
>
> Can you bisect to find the culprit, please?

I will try to do that. Currently I have to find another issue between 
0.15.0 and 1.0 :-) After having found that, I will continue bisecting here :-)


Best regards,

Erik


[PATCH] qemu-kvm: Resolve unneeded diffs to upstream in pc-bios

2012-01-23 Thread Jan Kiszka
None of those files have any meaning for today's qemu-kvm.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---

Note that I removed the binary patch to delete pc-bios/openbios-sparc.
It seems to have caused trouble getting this onto the list.

 pc-bios/bios-vista.diff |   17 -----------------
 pc-bios/bochs-manifest  |   24 ------------------------
 pc-bios/openbios-sparc  |  Bin 506966 -> 0 bytes
 3 files changed, 0 insertions(+), 41 deletions(-)
 delete mode 100644 pc-bios/bios-vista.diff
 delete mode 100644 pc-bios/bochs-manifest
 delete mode 100644 pc-bios/openbios-sparc

diff --git a/pc-bios/bios-vista.diff b/pc-bios/bios-vista.diff
deleted file mode 100644
index 684a310..0000000
--- a/pc-bios/bios-vista.diff
+++ /dev/null
@@ -1,17 +0,0 @@
-Index: rombios32.c
-===================================================================
-RCS file: /cvsroot/bochs/bochs/bios/rombios32.c,v
-retrieving revision 1.9
-diff -u -w -r1.9 rombios32.c
---- rombios32.c 20 Feb 2007 09:36:55 -0000  1.9
-+++ rombios32.c 2 May 2007 06:07:31 -0000
-@@ -1191,7 +1191,7 @@
- {
-     memcpy(h->signature, sig, 4);
-     h->length = cpu_to_le32(len);
--    h->revision = 0;
-+    h->revision = 1;
- #ifdef BX_QEMU
-     memcpy(h->oem_id, "QEMU  ", 6);
-     memcpy(h->oem_table_id, "QEMU", 4);
-
diff --git a/pc-bios/bochs-manifest b/pc-bios/bochs-manifest
deleted file mode 100644
index 1b25aa4..0000000
--- a/pc-bios/bochs-manifest
+++ /dev/null
@@ -1,24 +0,0 @@
-.cvsignore                        1.2
-BIOS-bochs-latest                 1.145
-BIOS-bochs-legacy                 1.9
-Makefile.in                       1.26
-VGABIOS-elpin-2.40                1.4
-VGABIOS-elpin-LICENSE             1.3
-VGABIOS-lgpl-README               1.9
-VGABIOS-lgpl-latest               1.13
-VGABIOS-lgpl-latest-cirrus        1.5
-VGABIOS-lgpl-latest-cirrus-debug  1.5
-VGABIOS-lgpl-latest-debug         1.9
-acpi-dsdt.dsl                     1.1
-acpi-dsdt.hex                     1.1
-apmbios.S                         1.5
-bios_usage                        1.1
-biossums.c                        1.3
-makesym.perl                      1.1
-notes                             1.1
-rombios.c                         1.178
-rombios.h                         1.4
-rombios32.c                       1.9
-rombios32.ld                      1.1
-rombios32start.S                  1.3
-usage.cc                          1.4
diff --git a/pc-bios/openbios-sparc b/pc-bios/openbios-sparc
deleted file mode 100644
index 7a729aa81ba39b3ed037ac7fad1db4616818738b..0000000000000000000000000000000000000000
GIT binary patch

[ removed to shrink posting size ]

-- 
1.7.3.4


[no subject]

2012-01-23 Thread Sergei Trofimovich
subscribe kvm


Re: [PATCH v2] qemu-kvm: Remove icache flush from cpu_physical_memory_rw

2012-01-23 Thread Scott Wood
On 01/19/2012 12:04 PM, Jan Kiszka wrote:
> On 2012-01-19 18:54, Marcelo Tosatti wrote:
>> On Thu, Jan 19, 2012 at 01:39:24PM +0100, Jan Kiszka wrote:
>>> This is at best a PPC topic but according to [1] even there unneeded. In
>>> any case, remove this diff to upstream, it should be handled there if
>>> actually needed.
>>
>> [1] ?
>
> Oops.
>
> --8<--
>
> This is at best a PPC topic but according to [1] even there unneeded. In
> any case, remove this diff to upstream, it should be handled there if
> actually needed.
>
> [1] http://thread.gmane.org/gmane.comp.emulators.qemu/119022/focus=119086

That says that it's unneeded on (some?) IBM Power systems.  We need it
on Freescale chips.  I submitted an upstream-QEMU patch to do this flush
(referenced in that thread, still not applied) because I was seeing
cache problems when loading images.

-Scott



Videos for kvm forum 2010

2012-01-23 Thread Nick H
Hello All,

Non-development question, apologies if I am posting to the wrong list,
but I cannot seem to find linux kvm forum 2010 videos at the following
link:

http://www.linux-kvm.org/page/KVM_Forum_2010

Is there some place else where they might be present ?

Nick


[no subject]

2012-01-23 Thread Gabe Black
Hi, I think I've tracked down the bug that causes
"KVM_GET_SUPPORTED_CPUID failed: Argument list too long" errors when
using the kvm tool. Basically, this (possibly squished) code seems to
be to blame:

case 0xd: {
    int i;

    entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
    for (i = 1; *nent < maxnent && i < 64; ++i) {
        if (entry[i].eax == 0)
            continue;
        do_cpuid_1_ent(&entry[i], function, i);
        entry[i].flags |=
            KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
        ++*nent;
    }
    break;
}

You can see there's a check whether entry[i].eax is 0, but it isn't
until the next line that entry[i] is actually filled in. That means
that whether or not an entry is filled in for the 0xd function is
essentially random, and that can lead to the loss of valid entries. It
also means that nent may be incremented too often, and since all 64
entries are iterated over, that can fill up the available storage and
cause that error.

I tested my theory by commenting out the if (100% failure rate) and
moving it after do_cpuid_1_ent (100% success rate). Since this is a
non-deterministic failure, that isn't really conclusive, but I'm fairly
confident my fix is correct. I don't know exactly what your procedure
is for submitting patches, but one is attached.

Gabe
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 77c9d86..35d7ae0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2414,9 +2414,9 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 
 		entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
 		for (i = 1; *nent < maxnent && i < 64; ++i) {
+			do_cpuid_1_ent(&entry[i], function, i);
 			if (entry[i].eax == 0)
 				continue;
-			do_cpuid_1_ent(&entry[i], function, i);
 			entry[i].flags |=
 			   KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
 			++*nent;
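
Editorial illustration (not part of the posted patch): the ordering problem described in the message above can be reproduced in user space. This is a minimal sketch under stated assumptions: struct cpuid_entry and fill_entry() are hypothetical stand-ins for struct kvm_cpuid_entry2 and do_cpuid_1_ent(), and the stand-in pretends that only sub-leaves 1 and 2 report a non-zero eax. It models only how many entries get counted, not the kernel's actual buffer handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SUBLEAVES 64  /* loop bound taken from the quoted code */

/* Hypothetical stand-in for struct kvm_cpuid_entry2; only eax matters here. */
struct cpuid_entry {
    uint32_t index;
    uint32_t eax;
};

/*
 * Hypothetical stand-in for do_cpuid_1_ent(): pretend that only
 * sub-leaves 1 and 2 exist and report a non-zero eax.
 */
static void fill_entry(struct cpuid_entry *e, uint32_t index)
{
    e->index = index;
    e->eax = (index == 1 || index == 2) ? 0x100u + index : 0u;
}

/*
 * Order in the quoted code: entry[i].eax is tested before entry[i] is
 * written, so the outcome depends on what the buffer held beforehand.
 */
static int count_check_then_fill(struct cpuid_entry *entry, int nent, int maxnent)
{
    assert(entry != NULL);
    for (int i = 1; nent < maxnent && i < MAX_SUBLEAVES; ++i) {
        if (entry[i].eax == 0)
            continue;
        fill_entry(&entry[i], (uint32_t)i);
        ++nent;
    }
    return nent;
}

/* Proposed order: fill first, then skip sub-leaves that report eax == 0. */
static int count_fill_then_check(struct cpuid_entry *entry, int nent, int maxnent)
{
    assert(entry != NULL);
    for (int i = 1; nent < maxnent && i < MAX_SUBLEAVES; ++i) {
        fill_entry(&entry[i], (uint32_t)i);
        if (entry[i].eax == 0)
            continue;
        ++nent;
    }
    return nent;
}
```

With a buffer full of stale non-zero data, the first variant counts every one of the 63 sub-leaves; with a zeroed buffer it counts none. The second variant counts only the two sub-leaves the stand-in reports, whatever the buffer contained, which matches the behaviour the message argues for.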


Fix for bug that causes KVM_GET_SUPPORTED_CPUID failed errors.

2012-01-23 Thread Gabe Black
Sorry, forgot to add a subject.

Gabe

On Mon, Jan 23, 2012 at 9:18 PM, Gabe Black gabebl...@google.com wrote:
 Hi, I think I've tracked down the bug that causes
 "KVM_GET_SUPPORTED_CPUID failed: Argument list too long" errors when
 using the kvm tool. Basically, this (possibly squished) code seems to
 be to blame:

 case 0xd: {
     int i;

     entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
     for (i = 1; *nent < maxnent && i < 64; ++i) {
         if (entry[i].eax == 0)
             continue;
         do_cpuid_1_ent(&entry[i], function, i);
         entry[i].flags |=
             KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
         ++*nent;
     }
     break;
 }

 You can see there's a check whether entry[i].eax is 0, but it isn't
 until the next line that entry[i] is actually filled in. That means
 that whether or not an entry is filled in for the 0xd function is
 essentially random, and that can lead to the loss of valid entries. It
 also means that nent may be incremented too often, and since all 64
 entries are iterated over, that can fill up the available storage and
 cause that error.

 I tested my theory by commenting out the if (100% failure rate) and
 moving it after do_cpuid_1_ent (100% success rate). Since this is a
 non-deterministic failure that isn't really conclusive, but I'm fairly
 confident my fix is correct. I don't know exactly what your procedure
 is for submitting patches, but one is attached.

 Gabe


Re: Fix for bug that causes KVM_GET_SUPPORTED_CPUID failed errors.

2012-01-23 Thread Sasha Levin
The GET_SUPPORTED_CPUID bug has been fixed and shouldn't be happening
from v3.2 onwards.

Do you still see the issue in older versions?

On Mon, 2012-01-23 at 21:20 -0800, Gabe Black wrote:
 Sorry, forgot to add a subject.
 
 Gabe
 
 On Mon, Jan 23, 2012 at 9:18 PM, Gabe Black gabebl...@google.com wrote:
  Hi, I think I've tracked down the bug that causes
  "KVM_GET_SUPPORTED_CPUID failed: Argument list too long" errors when
  using the kvm tool. Basically, this (possibly squished) code seems to
  be to blame:
 
  case 0xd: {
      int i;
 
      entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
      for (i = 1; *nent < maxnent && i < 64; ++i) {
          if (entry[i].eax == 0)
              continue;
          do_cpuid_1_ent(&entry[i], function, i);
          entry[i].flags |=
              KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
          ++*nent;
      }
      break;
  }
 
  You can see there's a check whether entry[i].eax is 0, but it isn't
  until the next line that entry[i] is actually filled in. That means
  that whether or not an entry is filled in for the 0xd function is
  essentially random, and that can lead to the loss of valid entries. It
  also means that nent may be incremented too often, and since all 64
  entries are iterated over, that can fill up the available storage and
  cause that error.
 
  I tested my theory by commenting out the if (100% failure rate) and
  moving it after do_cpuid_1_ent (100% success rate). Since this is a
  non-deterministic failure that isn't really conclusive, but I'm fairly
  confident my fix is correct. I don't know exactly what your procedure
  is for submitting patches, but one is attached.
 
  Gabe


-- 

Sasha.



[PATCH 0/4] KVM: Decouple rmap_pde from lpage_info write_count

2012-01-23 Thread Takuya Yoshikawa
The last one is an RFC patch:

I think it is better to refactor the rmap things, if needed, before
architectures other than x86 start supporting large pages.

Takuya

 arch/ia64/kvm/kvm-ia64.c|8 
 arch/powerpc/kvm/book3s_64_mmu_hv.c |6 +++---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c |4 ++--
 arch/x86/kvm/mmu.c  |   24 ++--
 arch/x86/kvm/mmu_audit.c|4 +---
 arch/x86/kvm/x86.c  |4 ++--
 include/linux/kvm_host.h|   10 --
 virt/kvm/kvm_main.c |   29 +
 8 files changed, 47 insertions(+), 42 deletions(-)

-- 
1.7.5.4

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/4] KVM: MMU: Use gfn_to_rmap() in audit_write_protection()

2012-01-23 Thread Takuya Yoshikawa
We want to eliminate direct access to the rmap array.

Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
---
 arch/x86/kvm/mmu_audit.c |4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu_audit.c b/arch/x86/kvm/mmu_audit.c
index 6eabae3..e62fa4f 100644
--- a/arch/x86/kvm/mmu_audit.c
+++ b/arch/x86/kvm/mmu_audit.c
@@ -190,15 +190,13 @@ static void check_mappings_rmap(struct kvm *kvm, struct 
kvm_mmu_page *sp)
 
 static void audit_write_protection(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
-	struct kvm_memory_slot *slot;
 	unsigned long *rmapp;
 	u64 *spte;
 
 	if (sp->role.direct || sp->unsync || sp->role.invalid)
 		return;
 
-	slot = gfn_to_memslot(kvm, sp->gfn);
-	rmapp = &slot->rmap[sp->gfn - slot->base_gfn];
+	rmapp = gfn_to_rmap(kvm, sp->gfn, PT_PAGE_TABLE_LEVEL);
 
 	spte = rmap_next(rmapp, NULL);
 	while (spte) {
-- 
1.7.5.4



[PATCH 2/4] KVM: MMU: Use __gfn_to_rmap() in kvm_handle_hva()

2012-01-23 Thread Takuya Yoshikawa
We can hide the implementation details and treat every level uniformly.

Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
---
 arch/x86/kvm/mmu.c |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 844fcce..0e82d9d 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1133,14 +1133,14 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned 
long hva,
 			gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT;
 			gfn_t gfn = memslot->base_gfn + gfn_offset;
 
-			ret = handler(kvm, &memslot->rmap[gfn_offset], data);
+			ret = 0;
 
-			for (j = 0; j < KVM_NR_PAGE_SIZES - 1; ++j) {
-				struct kvm_lpage_info *linfo;
+			for (j = PT_PAGE_TABLE_LEVEL;
+			     j < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++j) {
+				unsigned long *rmapp;
 
-				linfo = lpage_info_slot(gfn, memslot,
-							PT_DIRECTORY_LEVEL + j);
-				ret |= handler(kvm, &linfo->rmap_pde, data);
+				rmapp = __gfn_to_rmap(gfn, j, memslot);
+				ret |= handler(kvm, rmapp, data);
 			}
 			trace_kvm_age_page(hva, memslot, ret);
 			retval |= ret;
-- 
1.7.5.4



[RFC PATCH 4/4] KVM: Decouple rmap_pde from lpage_info write_count

2012-01-23 Thread Takuya Yoshikawa
Though we have one rmap array for every level, those for large pages,
called rmap_pde, are coupled with write_count information and constitute
lpage_info arrays.

To hide these implementation details, we are now using __gfn_to_rmap(),
which includes a likely(level == PT_PAGE_TABLE_LEVEL) heuristic; this
is not good because we know that it always fails for higher levels.

Furthermore, when we traverse rmap arrays to write protect pages during
dirty logging, the current layout reduces the locality of their elements
by placing write_count next to rmap_pde in lpage_info.

This patch mitigates this problem by decoupling rmap_pde from lpage_info
write_count and making the rmap array two-dimensional, so that it also
holds the old rmap_pde elements.

Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
---
 arch/ia64/kvm/kvm-ia64.c|8 
 arch/powerpc/kvm/book3s_64_mmu_hv.c |6 +++---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c |4 ++--
 arch/x86/kvm/mmu.c  |9 +++--
 arch/x86/kvm/x86.c  |4 ++--
 include/linux/kvm_host.h|3 +--
 virt/kvm/kvm_main.c |   25 -
 7 files changed, 31 insertions(+), 28 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 8ca7261..b17eaa1 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1376,8 +1376,8 @@ static void kvm_release_vm_pages(struct kvm *kvm)
 	kvm_for_each_memslot(memslot, slots) {
 		base_gfn = memslot->base_gfn;
 		for (j = 0; j < memslot->npages; j++) {
-			if (memslot->rmap[j])
-				put_page((struct page *)memslot->rmap[j]);
+			if (memslot->rmap[0][j])
+				put_page((struct page *)memslot->rmap[0][j]);
 		}
 	}
 }
@@ -1591,12 +1591,12 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 			kvm_set_pmt_entry(kvm, base_gfn + i,
 					pfn << PAGE_SHIFT,
 					_PAGE_AR_RWX | _PAGE_MA_WB);
-			memslot->rmap[i] = (unsigned long)pfn_to_page(pfn);
+			memslot->rmap[0][i] = (unsigned long)pfn_to_page(pfn);
 		} else {
 			kvm_set_pmt_entry(kvm, base_gfn + i,
 					GPFN_PHYS_MMIO | (pfn << PAGE_SHIFT),
 					_PAGE_MA_UC);
-			memslot->rmap[i] = 0;
+			memslot->rmap[0][i] = 0;
 		}
 	}
 
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 783cd35..81f9036 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -631,7 +631,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
 		goto out_unlock;
 	hpte[0] = (hpte[0] & ~HPTE_V_ABSENT) | HPTE_V_VALID;
 
-	rmap = &memslot->rmap[gfn - memslot->base_gfn];
+	rmap = &memslot->rmap[0][gfn - memslot->base_gfn];
 	lock_rmap(rmap);
 
 	/* Check if we might have been invalidated; let the guest retry if so */
@@ -693,7 +693,7 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long 
hva,
 		if (hva >= start && hva < end) {
 			gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT;
 
-			ret = handler(kvm, &memslot->rmap[gfn_offset],
+			ret = handler(kvm, &memslot->rmap[0][gfn_offset],
 				      memslot->base_gfn + gfn_offset);
 			retval |= ret;
 		}
@@ -928,7 +928,7 @@ long kvmppc_hv_get_dirty_log(struct kvm *kvm, struct 
kvm_memory_slot *memslot)
 	unsigned long *rmapp, *map;
 
 	preempt_disable();
-	rmapp = memslot->rmap;
+	rmapp = memslot->rmap[0];
 	map = memslot->dirty_bitmap;
 	for (i = 0; i < memslot->npages; ++i) {
 		if (kvm_test_clear_dirty(kvm, rmapp))
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c 
b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 5f3c60b..4df9b4a 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -103,7 +103,7 @@ static void remove_revmap_chain(struct kvm *kvm, long 
pte_index,
 	if (!memslot || (memslot->flags & KVM_MEMSLOT_INVALID))
 		return;
 
-	rmap = real_vmalloc_addr(&memslot->rmap[gfn - memslot->base_gfn]);
+	rmap = real_vmalloc_addr(&memslot->rmap[0][gfn - memslot->base_gfn]);
 	lock_rmap(rmap);
 
 	head = *rmap & KVMPPC_RMAP_INDEX;
@@ -199,7 +199,7 @@ long kvmppc_h_enter(struct kvm_vcpu *vcpu, unsigned long 
flags,
 		if (!slot_is_aligned(memslot, psize))
 			return H_PARAMETER;
 		slot_fn = gfn - memslot->base_gfn;
-		rmap = &memslot->rmap[slot_fn];
+		rmap = &memslot->rmap[0][slot_fn];
 
if 
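
Editorial illustration (not part of the posted patch): the layout change described in the commit message can be sketched in user space. This is a simplified model under stated assumptions. The struct and function names (slot_old, slot_new, rmap_old, rmap_new, stride_old, stride_new) are hypothetical, NR_PAGE_SIZES is assumed to be 3, and levels are numbered from 1 the way PT_PAGE_TABLE_LEVEL is.

```c
#include <assert.h>
#include <stddef.h>

#define NR_PAGE_SIZES 3  /* assumption: three page sizes, as on x86 */

/* Old layout (simplified): large-page rmap heads sit next to write_count. */
struct lpage_info_old {
    unsigned long rmap_pde;
    int write_count;
};

struct slot_old {
    unsigned long *rmap;                                   /* smallest page size only */
    struct lpage_info_old *lpage_info[NR_PAGE_SIZES - 1];  /* one array per large level */
};

/* New layout: one rmap array per level; write_count is kept separately. */
struct lpage_info_new {
    int write_count;
};

struct slot_new {
    unsigned long *rmap[NR_PAGE_SIZES];
    struct lpage_info_new *lpage_info[NR_PAGE_SIZES - 1];
};

/* Old lookup: level 1 is a special case, higher levels go via lpage_info. */
static unsigned long *rmap_old(struct slot_old *s, int level, size_t idx)
{
    assert(level >= 1 && level <= NR_PAGE_SIZES);
    if (level == 1)
        return &s->rmap[idx];
    return &s->lpage_info[level - 2][idx].rmap_pde;
}

/* New lookup: every level is indexed the same way, with no branch. */
static unsigned long *rmap_new(struct slot_new *s, int level, size_t idx)
{
    assert(level >= 1 && level <= NR_PAGE_SIZES);
    return &s->rmap[level - 1][idx];
}

/* Byte distance between consecutive large-page rmap heads in each layout. */
static size_t stride_old(void) { return sizeof(struct lpage_info_old); }
static size_t stride_new(void) { return sizeof(unsigned long); }
```

In this model the new lookup needs no level check, and consecutive large-page rmap heads are packed more tightly than in the old layout, which is the locality argument the commit message makes for dirty-logging traversals.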
