Re: [PATCH v10 12/13] KVM: PPC: Add support for IOMMU in-kernel handling

2013-09-08 Thread Gleb Natapov
On Fri, Sep 06, 2013 at 08:40:26PM +1000, Alexey Kardashevskiy wrote:
 This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT
 and H_STUFF_TCE requests targeted at an IOMMU TCE table without passing
 them to user space, which saves time on switching to user space and back.
 
 Both real and virtual modes are supported. The kernel first tries to
 handle a TCE request in real mode; if that fails, it passes the request
 to virtual mode to complete the operation. If the virtual mode
 handler also fails, the request is passed to user space.
 
 The first user of this is VFIO on POWER. Trampolines to the VFIO external
 user API functions are required for this patch.
 
 This adds a SPAPR TCE IOMMU KVM device to associate a logical bus
 number (LIOBN) with a VFIO IOMMU group fd and enable in-kernel handling
 of map/unmap requests. The device supports a single attribute which is
 a struct with LIOBN and IOMMU fd. When the attribute is set, the device
 establishes the connection between KVM and VFIO.
 
 Tests show that this patch increases transmission speed from 220MB/s
 to 750..1020MB/s on 10Gb network (Chelsio CXGB3 10Gb ethernet card).
 
 Signed-off-by: Paul Mackerras pau...@samba.org
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 
 ---
 
 Changes:
 v10:
 * all IOMMU TCE links are handled by one KVM device now
 * KVM device has its own list of TCE descriptors
 * the search-by-liobn function was extended to search through
 emulated and IOMMU lists
 
 v9:
 * KVM_CAP_SPAPR_TCE_IOMMU ioctl to KVM replaced with SPAPR TCE IOMMU
 KVM device
 * release_spapr_tce_table() is not shared between different TCE types
 * reduced the patch size by moving KVM device bits and VFIO external API
 trampolines to separate patches
 * moved documentation from Documentation/virtual/kvm/api.txt to
 Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
 
 v8:
 * fixed warnings from checkpatch.pl
 
 2013/07/11:
 * removed multiple #ifdef IOMMU_API as IOMMU_API is always enabled
 for KVM_BOOK3S_64
 * kvmppc_gpa_to_hva_and_get also returns the host physical address. This is
 of little use here, but the next patch (hugepages support) will use it more.
 
 2013/07/06:
 * added realmode arch_spin_lock to protect TCE table from races
 in real and virtual modes
 * POWERPC IOMMU API is changed to support real mode
 * iommu_take_ownership and iommu_release_ownership are protected by
 iommu_table's locks
 * VFIO external user API use rewritten
 * multiple small fixes
 
 2013/06/27:
 * tce_list page is referenced now in order to protect it from accidental
 invalidation during H_PUT_TCE_INDIRECT execution
 * added use of the external user VFIO API
 
 2013/06/05:
 * changed capability number
 * changed ioctl number
 * update the doc article number
 
 2013/05/20:
 * removed get_user() from real mode handlers
 * kvm_vcpu_arch::tce_tmp usage extended. Now the real mode handler puts
 translated TCEs there, tries realmode_get_page() on those and, if it fails,
 passes control over to the virtual mode handler, which tries to finish
 the request handling
 * kvmppc_lookup_pte() now does realmode_get_page() protected by BUSY bit
 on a page
 * The only reason to pass the request to user mode now is when user mode
 did not register the TCE table in the kernel; in all other cases the virtual
 mode handler is expected to do the job
 ---
  .../virtual/kvm/devices/spapr_tce_iommu.txt|  40 +++
  arch/powerpc/include/asm/kvm_host.h|   8 +
  arch/powerpc/include/uapi/asm/kvm.h|   5 -
  arch/powerpc/kvm/book3s_64_vio.c   | 327 -
  arch/powerpc/kvm/book3s_64_vio_hv.c| 142 +
  arch/powerpc/kvm/powerpc.c |   1 +
  include/linux/kvm_host.h   |   1 +
  virt/kvm/kvm_main.c|   5 +
  8 files changed, 517 insertions(+), 12 deletions(-)
  create mode 100644 Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
 
 diff --git a/Documentation/virtual/kvm/devices/spapr_tce_iommu.txt b/Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
 new file mode 100644
 index 000..b911945
 --- /dev/null
 +++ b/Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
 @@ -0,0 +1,40 @@
 +SPAPR TCE IOMMU device
 +
 +Capability: KVM_CAP_SPAPR_TCE_IOMMU
 +Architectures: powerpc
 +
 +Device type supported: KVM_DEV_TYPE_SPAPR_TCE_IOMMU
 +
 +Groups:
 +  KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE
 +  Attributes: one VFIO IOMMU fd per LIOBN, indexed by LIOBN
 +
 +This is a completely made-up device which provides an API to link
 +a logical bus number (LIOBN) and an IOMMU group. User space has
 +to create a new SPAPR TCE IOMMU device once per KVM session
 +and use set_attr to add or remove a logical bus.
 +
 +LIOBN is a PCI bus identifier from PPC64-server (sPAPR) DMA hypercalls
 +(H_PUT_TCE, H_PUT_TCE_INDIRECT, H_STUFF_TCE).
 +IOMMU group is a minimal isolated device set which can be passed to
 +the user space via VFIO.
 +
 +The userspace 

[PATCH v10 12/13] KVM: PPC: Add support for IOMMU in-kernel handling

2013-09-06 Thread Alexey Kardashevskiy
This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT
and H_STUFF_TCE requests targeted at an IOMMU TCE table without passing
them to user space, which saves time on switching to user space and back.

Both real and virtual modes are supported. The kernel first tries to
handle a TCE request in real mode; if that fails, it passes the request
to virtual mode to complete the operation. If the virtual mode
handler also fails, the request is passed to user space.
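
The fallback order described above can be sketched roughly as follows.
This is only an illustration, not code from the patch: the handler type,
struct tce_path, dispatch_put_tce() and the stub handlers are hypothetical
names, and H_TOO_HARD is assumed here to be the "cannot handle at this
level" return value.

```c
/* Illustrative return codes, defined locally for this sketch only. */
enum { H_SUCCESS = 0, H_TOO_HARD = 9999 };

/* Which level ended up being responsible for the request. */
enum tce_stage { TCE_REAL_MODE, TCE_VIRT_MODE, TCE_USER_SPACE };

/* Hypothetical handler signature: one TCE update for a given LIOBN. */
typedef long (*tce_handler)(unsigned long liobn, unsigned long ioba,
			    unsigned long tce);

/* Hypothetical pair of in-kernel handlers. */
struct tce_path {
	tce_handler real_mode;
	tce_handler virt_mode;
};

/*
 * Try real mode first, then virtual mode. If both report H_TOO_HARD,
 * the caller has to pass the request to user space.
 */
static enum tce_stage dispatch_put_tce(const struct tce_path *path,
				       unsigned long liobn,
				       unsigned long ioba,
				       unsigned long tce, long *ret)
{
	*ret = path->real_mode(liobn, ioba, tce);
	if (*ret != H_TOO_HARD)
		return TCE_REAL_MODE;

	*ret = path->virt_mode(liobn, ioba, tce);
	if (*ret != H_TOO_HARD)
		return TCE_VIRT_MODE;

	return TCE_USER_SPACE;
}

/* Stub handlers standing in for the real implementations. */
static long handler_ok(unsigned long liobn, unsigned long ioba,
		       unsigned long tce)
{
	(void)liobn; (void)ioba; (void)tce;
	return H_SUCCESS;
}

static long handler_too_hard(unsigned long liobn, unsigned long ioba,
			     unsigned long tce)
{
	(void)liobn; (void)ioba; (void)tce;
	return H_TOO_HARD;
}
```

In this sketch any return value other than H_TOO_HARD means the request
was completed (successfully or with an error) at that level, so only
requests that neither in-kernel handler can deal with reach user space.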

The first user of this is VFIO on POWER. Trampolines to the VFIO external
user API functions are required for this patch.

This adds a SPAPR TCE IOMMU KVM device to associate a logical bus
number (LIOBN) with a VFIO IOMMU group fd and enable in-kernel handling
of map/unmap requests. The device supports a single attribute which is
a struct with LIOBN and IOMMU fd. When the attribute is set, the device
establishes the connection between KVM and VFIO.
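
As a rough model of the association described above, the device can be
thought of as a small table of (LIOBN, fd) pairs. The sketch below is
not the patch's data structure: struct tce_iommu_link, struct
tce_iommu_dev, link_find(), link_add() and MAX_LINKS are hypothetical
names chosen for illustration, and a fixed-size array stands in for
whatever list the kernel code actually keeps.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_LINKS 8	/* arbitrary limit for this sketch */

/* Hypothetical link descriptor: one IOMMU group fd per LIOBN. */
struct tce_iommu_link {
	uint64_t liobn;
	int fd;
};

/* Hypothetical per-VM device state holding all links. */
struct tce_iommu_dev {
	struct tce_iommu_link links[MAX_LINKS];
	size_t count;
};

/* Return the link registered for @liobn, or NULL if there is none. */
static const struct tce_iommu_link *
link_find(const struct tce_iommu_dev *dev, uint64_t liobn)
{
	for (size_t i = 0; i < dev->count; i++)
		if (dev->links[i].liobn == liobn)
			return &dev->links[i];
	return NULL;
}

/*
 * Register a new LIOBN-to-fd link. Returns 0 on success, -1 if the
 * LIOBN is already linked or the table is full.
 */
static int link_add(struct tce_iommu_dev *dev, uint64_t liobn, int fd)
{
	if (link_find(dev, liobn) || dev->count == MAX_LINKS)
		return -1;

	dev->links[dev->count].liobn = liobn;
	dev->links[dev->count].fd = fd;
	dev->count++;
	return 0;
}
```

A hypercall handler would then look up the LIOBN it was given and, if no
link is found, leave the request to be handled elsewhere.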

Tests show that this patch increases transmission speed from 220MB/s
to 750..1020MB/s on 10Gb network (Chelsio CXGB3 10Gb ethernet card).

Signed-off-by: Paul Mackerras pau...@samba.org
Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru

---

Changes:
v10:
* all IOMMU TCE links are handled by one KVM device now
* KVM device has its own list of TCE descriptors
* the search-by-liobn function was extended to search through
emulated and IOMMU lists

v9:
* KVM_CAP_SPAPR_TCE_IOMMU ioctl to KVM replaced with SPAPR TCE IOMMU
KVM device
* release_spapr_tce_table() is not shared between different TCE types
* reduced the patch size by moving KVM device bits and VFIO external API
trampolines to separate patches
* moved documentation from Documentation/virtual/kvm/api.txt to
Documentation/virtual/kvm/devices/spapr_tce_iommu.txt

v8:
* fixed warnings from checkpatch.pl

2013/07/11:
* removed multiple #ifdef IOMMU_API as IOMMU_API is always enabled
for KVM_BOOK3S_64
* kvmppc_gpa_to_hva_and_get also returns the host physical address. This is
of little use here, but the next patch (hugepages support) will use it more.

2013/07/06:
* added realmode arch_spin_lock to protect TCE table from races
in real and virtual modes
* POWERPC IOMMU API is changed to support real mode
* iommu_take_ownership and iommu_release_ownership are protected by
iommu_table's locks
* VFIO external user API use rewritten
* multiple small fixes

2013/06/27:
* tce_list page is referenced now in order to protect it from accidental
invalidation during H_PUT_TCE_INDIRECT execution
* added use of the external user VFIO API

2013/06/05:
* changed capability number
* changed ioctl number
* update the doc article number

2013/05/20:
* removed get_user() from real mode handlers
* kvm_vcpu_arch::tce_tmp usage extended. Now the real mode handler puts
translated TCEs there, tries realmode_get_page() on those and, if it fails,
passes control over to the virtual mode handler, which tries to finish
the request handling
* kvmppc_lookup_pte() now does realmode_get_page() protected by BUSY bit
on a page
* The only reason to pass the request to user mode now is when user mode
did not register the TCE table in the kernel; in all other cases the virtual
mode handler is expected to do the job
---
 .../virtual/kvm/devices/spapr_tce_iommu.txt|  40 +++
 arch/powerpc/include/asm/kvm_host.h|   8 +
 arch/powerpc/include/uapi/asm/kvm.h|   5 -
 arch/powerpc/kvm/book3s_64_vio.c   | 327 -
 arch/powerpc/kvm/book3s_64_vio_hv.c| 142 +
 arch/powerpc/kvm/powerpc.c |   1 +
 include/linux/kvm_host.h   |   1 +
 virt/kvm/kvm_main.c|   5 +
 8 files changed, 517 insertions(+), 12 deletions(-)
 create mode 100644 Documentation/virtual/kvm/devices/spapr_tce_iommu.txt

diff --git a/Documentation/virtual/kvm/devices/spapr_tce_iommu.txt b/Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
new file mode 100644
index 000..b911945
--- /dev/null
+++ b/Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
@@ -0,0 +1,40 @@
+SPAPR TCE IOMMU device
+
+Capability: KVM_CAP_SPAPR_TCE_IOMMU
+Architectures: powerpc
+
+Device type supported: KVM_DEV_TYPE_SPAPR_TCE_IOMMU
+
+Groups:
+  KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE
+  Attributes: one VFIO IOMMU fd per LIOBN, indexed by LIOBN
+
+This is a completely made-up device which provides an API to link
+a logical bus number (LIOBN) and an IOMMU group. User space has
+to create a new SPAPR TCE IOMMU device once per KVM session
+and use set_attr to add or remove a logical bus.
+
+LIOBN is a PCI bus identifier from PPC64-server (sPAPR) DMA hypercalls
+(H_PUT_TCE, H_PUT_TCE_INDIRECT, H_STUFF_TCE).
+IOMMU group is a minimal isolated device set which can be passed to
+the user space via VFIO.
+
+The userspace adds the new LIOBN-IOMMU link by calling KVM_SET_DEVICE_ATTR
+with the attribute initialized as shown below:
+struct kvm_device_attr attr = {
+   .flags = 0,
+   .group =