Re: [PATCH v9 12/13] KVM: PPC: Add support for IOMMU in-kernel handling

2013-09-06 Thread Gleb Natapov
On Fri, Sep 06, 2013 at 09:38:21AM +1000, Alexey Kardashevskiy wrote:
 On 09/06/2013 04:10 AM, Gleb Natapov wrote:
  On Wed, Sep 04, 2013 at 02:01:28AM +1000, Alexey Kardashevskiy wrote:
  On 09/03/2013 08:53 PM, Gleb Natapov wrote:
  On Mon, Sep 02, 2013 at 01:14:29PM +1000, Alexey Kardashevskiy wrote:
  On 09/01/2013 10:06 PM, Gleb Natapov wrote:
  On Wed, Aug 28, 2013 at 06:50:41PM +1000, Alexey Kardashevskiy wrote:
  This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT
  and H_STUFF_TCE requests targeted at an IOMMU TCE table without passing
  them to user space, which saves the cost of switching to user space and
  back.
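The three hypercalls named above all operate on a TCE table indexed by I/O bus address. Below is a minimal, self-contained C sketch of their semantics as I understand them: H_PUT_TCE writes one entry, H_PUT_TCE_INDIRECT writes entries taken from a guest-supplied list, and H_STUFF_TCE writes one value into a range. The table size, the 4K IOMMU page size and all struct/function names are assumptions for illustration. The real in-kernel handlers also translate guest addresses and program the hardware IOMMU, which is left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Defined locally so the sketch compiles on its own. */
#define H_SUCCESS       0
#define H_PARAMETER     (-4)
#define TCE_PAGE_SHIFT  12      /* assumption: 4K IOMMU pages */
#define TCE_TABLE_SIZE  64      /* assumption: tiny table for the demo */
#define TCE_LIST_MAX    512     /* one 4K page holds 512 8-byte TCEs */

/* Hypothetical stand-in for the in-kernel copy of a guest TCE table. */
struct tce_table {
    uint64_t entries[TCE_TABLE_SIZE];
};

/* Map (ioba, npages) to a starting index, rejecting unaligned or
 * out-of-range requests with H_PARAMETER. */
static long tce_range(uint64_t ioba, uint64_t npages, size_t *first)
{
    uint64_t idx = ioba >> TCE_PAGE_SHIFT;

    if (ioba & ((1ULL << TCE_PAGE_SHIFT) - 1))
        return H_PARAMETER;
    if (npages == 0 || idx >= TCE_TABLE_SIZE || npages > TCE_TABLE_SIZE - idx)
        return H_PARAMETER;
    *first = (size_t)idx;
    return H_SUCCESS;
}

/* H_PUT_TCE: store a single TCE. */
static long put_tce(struct tce_table *t, uint64_t ioba, uint64_t tce)
{
    size_t i;
    long ret = tce_range(ioba, 1, &i);

    if (ret == H_SUCCESS)
        t->entries[i] = tce;
    return ret;
}

/* H_PUT_TCE_INDIRECT: store npages TCEs read from a guest-supplied list. */
static long put_tce_indirect(struct tce_table *t, uint64_t ioba,
                             const uint64_t *tce_list, uint64_t npages)
{
    size_t i;
    long ret;

    assert(tce_list != NULL);
    if (npages > TCE_LIST_MAX)
        return H_PARAMETER;
    ret = tce_range(ioba, npages, &i);
    if (ret != H_SUCCESS)
        return ret;
    for (uint64_t k = 0; k < npages; k++)
        t->entries[i + k] = tce_list[k];
    return H_SUCCESS;
}

/* H_STUFF_TCE: store the same TCE value in npages consecutive entries. */
static long stuff_tce(struct tce_table *t, uint64_t ioba,
                      uint64_t tce_value, uint64_t npages)
{
    size_t i;
    long ret = tce_range(ioba, npages, &i);

    if (ret != H_SUCCESS)
        return ret;
    for (uint64_t k = 0; k < npages; k++)
        t->entries[i + k] = tce_value;
    return H_SUCCESS;
}
```

Handling all three in the kernel means none of them needs an exit to QEMU as long as the table is known to KVM.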
 
  Both real and virtual modes are supported. The kernel tries to
  handle a TCE request in real mode; if that fails, it passes the request
  to the virtual mode handler to complete the operation. If the virtual
  mode handler fails, the request is passed to user space.
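The real mode, virtual mode, user space escalation described above can be modelled as a simple chain. This is a hypothetical sketch, not the patch's code: the struct fields and function names are made up, and the two boolean conditions stand in for the real reasons a handler gives up. Per the changelog below, user space is only expected to be reached when no TCE table was registered in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Where a request finally got handled. */
enum tce_level { LEVEL_REAL_MODE, LEVEL_VIRTUAL_MODE, LEVEL_USER_SPACE };

/* Hypothetical summary of the conditions each level cares about. */
struct tce_request {
    bool table_registered;  /* user space linked this LIOBN to an in-kernel table */
    bool pages_reachable;   /* every page involved can be used without sleeping */
};

/* Real mode runs with translation off and must not sleep, so in this
 * model it gives up on anything it cannot complete immediately. */
static bool try_real_mode(const struct tce_request *r)
{
    return r->table_registered && r->pages_reachable;
}

/* Virtual mode may sleep and fault pages in, so in this model only a
 * missing in-kernel table stops it. */
static bool try_virtual_mode(const struct tce_request *r)
{
    return r->table_registered;
}

/* Escalate real mode -> virtual mode -> user space. */
static enum tce_level handle_tce_request(const struct tce_request *r)
{
    assert(r != NULL);
    if (try_real_mode(r))
        return LEVEL_REAL_MODE;
    if (try_virtual_mode(r))
        return LEVEL_VIRTUAL_MODE;
    return LEVEL_USER_SPACE;    /* exit to QEMU, which emulates the hcall */
}
```

The cheapest path is tried first, so the common case never leaves real mode, and each fallback only costs extra when the faster level genuinely cannot finish.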
 
  The first user of this is VFIO on POWER. Trampolines to the VFIO external
  user API functions are required for this patch.
 
  This adds a SPAPR TCE IOMMU KVM device to associate a logical I/O bus
  number (LIOBN) with a VFIO IOMMU group fd and enable in-kernel handling
  of map/unmap requests. The device supports a single attribute, which is
  a struct holding the LIOBN and the IOMMU fd. When the attribute is set,
  the device establishes the connection between KVM and VFIO.
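From user space, a device like this would presumably be driven through the generic KVM device ioctls: KVM_CREATE_DEVICE on the VM fd, then KVM_SET_DEVICE_ATTR on the returned device fd. The sketch below assumes that flow. KVM_CREATE_DEVICE, KVM_SET_DEVICE_ATTR, struct kvm_create_device and struct kvm_device_attr come from <linux/kvm.h>. The numeric values of KVM_DEV_TYPE_SPAPR_TCE_IOMMU and KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE, the layout of the { LIOBN, IOMMU fd } payload and the helper names are placeholders, since none of them are given here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/kvm.h>

/* Placeholders: the names come from the v9 patch, the values do not. */
#ifndef KVM_DEV_TYPE_SPAPR_TCE_IOMMU
#define KVM_DEV_TYPE_SPAPR_TCE_IOMMU 0xff          /* hypothetical value */
#endif
#ifndef KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE
#define KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE 0      /* hypothetical value */
#endif

/* Hypothetical layout of the { LIOBN, IOMMU fd } attribute payload. */
struct spapr_tce_iommu_linkage {
    uint64_t liobn;
    uint32_t fd;
    uint32_t flags;
};

/* Point a kvm_device_attr at the linkage payload; the kernel would copy
 * it from attr->addr when KVM_SET_DEVICE_ATTR is issued. */
static void prepare_linkage_attr(struct kvm_device_attr *attr,
                                 const struct spapr_tce_iommu_linkage *link)
{
    assert(attr != NULL && link != NULL);
    memset(attr, 0, sizeof(*attr));
    attr->group = KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE;
    attr->addr = (uint64_t)(uintptr_t)link;
}

/* One device per logical bus: create it on the VM fd, then set the single
 * linkage attribute on the new device fd. Returns the device fd or -1. */
static int link_liobn_to_group(int vm_fd, uint64_t liobn, int group_fd)
{
    struct kvm_create_device cd = { .type = KVM_DEV_TYPE_SPAPR_TCE_IOMMU };
    struct spapr_tce_iommu_linkage link = { .liobn = liobn, .fd = (uint32_t)group_fd };
    struct kvm_device_attr attr;

    if (ioctl(vm_fd, KVM_CREATE_DEVICE, &cd) < 0)
        return -1;
    prepare_linkage_attr(&attr, &link);
    if (ioctl((int)cd.fd, KVM_SET_DEVICE_ATTR, &attr) < 0) {
        close((int)cd.fd);   /* do not leak the half-configured device */
        return -1;
    }
    return (int)cd.fd;
}
```

With this interface QEMU would call link_liobn_to_group() once per LIOBN, and each call costs one device fd.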
 
  Tests show that this patch increases transmission speed from 220MB/s
  to 750-1020MB/s on a 10Gb network (Chelsio CXGB3 10Gb Ethernet card).
 
  Signed-off-by: Paul Mackerras pau...@samba.org
  Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 
  ---
 
  Changes:
  v9:
  * KVM_CAP_SPAPR_TCE_IOMMU ioctl to KVM replaced with SPAPR TCE IOMMU
  KVM device
  * release_spapr_tce_table() is not shared between different TCE types
  * reduced the patch size by moving VFIO external API
  trampolines to a separate patch
  * moved documentation from Documentation/virtual/kvm/api.txt to
  Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
 
  v8:
  * fixed warnings from checkpatch.pl
 
  2013/07/11:
  * removed multiple #ifdef IOMMU_API as IOMMU_API is always enabled
  for KVM_BOOK3S_64
  * kvmppc_gpa_to_hva_and_get also returns the host physical address. This
  is of little use here, but the next patch (hugepage support) will use it
  more.
 
  2013/07/06:
  * added a real-mode arch_spin_lock to protect the TCE table from races
  between the real and virtual mode handlers
  * POWERPC IOMMU API is changed to support real mode
  * iommu_take_ownership and iommu_release_ownership are protected by
  iommu_table's locks
  * VFIO external user API use rewritten
  * multiple small fixes
 
  2013/06/27:
  * the tce_list page is now referenced in order to protect it from accidental
  invalidation during H_PUT_TCE_INDIRECT execution
  * added use of the external user VFIO API
 
  2013/06/05:
  * changed capability number
  * changed ioctl number
  * updated the doc article number
 
  2013/05/20:
  * removed get_user() from real mode handlers
  * kvm_vcpu_arch::tce_tmp usage extended. Now the real mode handler puts
  translated TCEs there and tries realmode_get_page() on them; if that
  fails, it passes control to the virtual mode handler, which tries to
  finish handling the request
  * kvmppc_lookup_pte() now does realmode_get_page() protected by the BUSY
  bit on a page
  * The only reason to pass the request to user mode now is when user mode
  did not register the TCE table in the kernel; in all other cases the
  virtual mode handler is expected to do the job
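The tce_tmp scheme in the 2013/05/20 entry can be pictured as a resumable loop: real mode translates the whole list once into a per-vcpu buffer, commits entries until it meets a page it cannot take, and virtual mode continues from that index. A hypothetical C sketch follows. The struct, the demo page getters and DEMO_NOT_RESIDENT are made up for illustration, and the guest-to-host address translation is reduced to a plain copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TCE_LIST_MAX        512            /* one 4K page of 8-byte TCEs */
#define DEMO_NOT_RESIDENT   (1ULL << 63)   /* hypothetical marker, demo only */

/* Hypothetical per-vcpu scratch area, loosely modelled on kvm_vcpu_arch::tce_tmp. */
struct tce_scratch {
    uint64_t tce_tmp[TCE_LIST_MAX];  /* TCEs translated by the real mode handler */
    size_t npages;                   /* how many entries tce_tmp holds */
    size_t done;                     /* how many entries are already committed */
};

typedef bool (*get_page_fn)(uint64_t tce);

enum stage { STAGE_REAL, STAGE_VIRTUAL, STAGE_USER };

/* Demo getters: real mode cannot reach "non-resident" pages, virtual mode can. */
static bool demo_realmode_get_page(uint64_t tce) { return !(tce & DEMO_NOT_RESIDENT); }
static bool demo_virtmode_get_page(uint64_t tce) { (void)tce; return true; }

/* Commit entries from s->done onwards; stop at the first page that cannot be taken. */
static bool commit_tces(struct tce_scratch *s, uint64_t *dest, get_page_fn get_page)
{
    while (s->done < s->npages) {
        uint64_t tce = s->tce_tmp[s->done];

        if (!get_page(tce))
            return false;
        dest[s->done] = tce;
        s->done++;
    }
    return true;
}

/* Real mode fills tce_tmp and commits what it can; virtual mode resumes from
 * the first entry real mode could not handle, reusing the translated list. */
static enum stage put_tce_indirect_two_stage(struct tce_scratch *s, uint64_t *dest,
                                             const uint64_t *tce_list, size_t npages,
                                             get_page_fn rm_get_page,
                                             get_page_fn vm_get_page)
{
    assert(npages <= TCE_LIST_MAX);
    for (size_t i = 0; i < npages; i++)
        s->tce_tmp[i] = tce_list[i];   /* real GPA-to-host translation omitted */
    s->npages = npages;
    s->done = 0;

    if (commit_tces(s, dest, rm_get_page))
        return STAGE_REAL;
    if (commit_tces(s, dest, vm_get_page))
        return STAGE_VIRTUAL;
    return STAGE_USER;
}
```

Keeping the progress index in the scratch area is what lets the virtual mode handler finish the request without redoing the entries real mode already committed.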
  ---
   .../virtual/kvm/devices/spapr_tce_iommu.txt|  37 +++
   arch/powerpc/include/asm/kvm_host.h|   4 +
   arch/powerpc/kvm/book3s_64_vio.c   | 310 -
   arch/powerpc/kvm/book3s_64_vio_hv.c| 122 
   arch/powerpc/kvm/powerpc.c |   1 +
   include/linux/kvm_host.h   |   1 +
   virt/kvm/kvm_main.c|   5 +
   7 files changed, 477 insertions(+), 3 deletions(-)
   create mode 100644 Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
 
  diff --git a/Documentation/virtual/kvm/devices/spapr_tce_iommu.txt b/Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
  new file mode 100644
  index 000..4bc8fc3
  --- /dev/null
  +++ b/Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
  @@ -0,0 +1,37 @@
  +SPAPR TCE IOMMU device
  +
  +Capability: KVM_CAP_SPAPR_TCE_IOMMU
  +Architectures: powerpc
  +
  +Device type supported: KVM_DEV_TYPE_SPAPR_TCE_IOMMU
  +
  +Groups:
  +  KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE
  +  Attributes: single attribute with pair { LIOBN, IOMMU fd }
  +
  +This is a completely made-up device which provides an API to link
  +a logical I/O bus number (LIOBN) and an IOMMU group. User space has
  +to create a new SPAPR TCE IOMMU device per logical bus.
  +
  Why not have one device that can handle multiple links?
 
 
  I can do that. If I make it so, it 

Re: [PATCH v9 12/13] KVM: PPC: Add support for IOMMU in-kernel handling

2013-09-06 Thread Alexey Kardashevskiy
 bothers me is it is just a
 Maybe I do not understand the usage pattern here. Why do you feel that a
 device that can handle multiple links is worse than a device per link? How
 many logical buses are there usually? How often are they created/destroyed?
 I am not insisting on the change, just trying to understand why you do not
 like it.


 It is usually one PCI host bus adapter per IOMMU group, which is usually one
 PCI card (or 2-3 cards for legacy PCI-X), and they are created when QEMU-KVM
 starts. Not many. And they live until KVM exits.

 My point is: why would I want to put all links into one device? It is all just
 a matter of taste and nothing more. Or I am missing something, but I do not
 see what. If it is all about making things kosher/halal/orthodox, then I have
 more stuff to do, like reworking the emulated TCEs. But if it is for
 performance (I do not know, just guessing) or something like that, then I'll
 fix it; I just need to know what I am fixing.

 Each device creates an fd; if you can eventually have a lot of them, this
 will be a bottleneck. You are saying this is not the case, so let's go
 with the proposed interface.


 Did you decide not to answer the email which Ben sent yesterday, or did you
 just not see it? Just checking :)

 Haven't seen it. Which one?


Subject: Re: [PATCH v9 12/13] KVM: PPC: Add support for IOMMU in-kernel handling
Date: Thu, 05 Sep 2013 14:05:09 +1000
From: Benjamin Herrenschmidt b...@kernel.crashing.org
To: Gleb Natapov g...@redhat.com
CC: Alexey Kardashevskiy a...@ozlabs.ru, linuxppc-dev@lists.ozlabs.org,
 David Gibson da...@gibson.dropbear.id.au, Paul Mackerras pau...@samba.org,
 Paolo Bonzini pbonz...@redhat.com, Alexander Graf ag...@suse.de,
 k...@vger.kernel.org, linux-ker...@vger.kernel.org, kvm-...@vger.kernel.org,
linux...@kvack.org


-- 
Alexey
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH v9 12/13] KVM: PPC: Add support for IOMMU in-kernel handling

2013-09-06 Thread Gleb Natapov
On Thu, Sep 05, 2013 at 02:05:09PM +1000, Benjamin Herrenschmidt wrote:
 On Tue, 2013-09-03 at 13:53 +0300, Gleb Natapov wrote:
   Or is supporting all IOMMU links (and leaving emulated stuff as is) in one
   device the last thing I have to do before you ack the patch?
    
  I am more concerned about the API here. Internal implementation details I
  leave to powerpc experts :)
 
 So Gleb, I want to step in for a bit here.
 
 While I understand that the new KVM device API is all nice and shiny and that
 this whole thing should probably have been KVM devices in the first place (had
 they existed or had we been told back then), the point is, the API for handling
 HW IOMMUs that Alexey is trying to add is an extension of an existing
 mechanism used for emulated IOMMUs.
 
 The internal data structure is shared, and fundamentally, by forcing him to
 use that new KVM device for the new stuff, we create an oddball API with
 an ioctl for one type of IOMMU and a KVM device for the other, which makes
 the implementation a complete mess in the kernel (and you should care :-)
 
Is it an unfixable mess? Even if Alexey does what you suggested earlier?

  - Convert *both* existing TCE objects to the new
  KVM_CREATE_DEVICE, and have some backward compat code for the old one.

The point is that an implementation can usually be changed, but an API is
much harder to change.

 So for something completely new, I would tend to agree with you. However, I
 still think that for this specific case, we should just plonk-in the original
 ioctl proposed by Alexey and be done with it.
 
Do you think this is the last extension to the IOMMU code, or will we see
more and use the same justification to continue adding ioctls?

--
Gleb.


Re: [PATCH v9 12/13] KVM: PPC: Add support for IOMMU in-kernel handling

2013-09-06 Thread Alexey Kardashevskiy
On 09/06/2013 04:57 PM, Gleb Natapov wrote:
 On Thu, Sep 05, 2013 at 02:05:09PM +1000, Benjamin Herrenschmidt wrote:
 On Tue, 2013-09-03 at 13:53 +0300, Gleb Natapov wrote:
 Or is supporting all IOMMU links (and leaving emulated stuff as is) in one
 device the last thing I have to do before you ack the patch?

 I am more concerned about the API here. Internal implementation details I
 leave to powerpc experts :)

 So Gleb, I want to step in for a bit here.

 While I understand that the new KVM device API is all nice and shiny and that
 this whole thing should probably have been KVM devices in the first place (had
 they existed or had we been told back then), the point is, the API for handling
 HW IOMMUs that Alexey is trying to add is an extension of an existing
 mechanism used for emulated IOMMUs.

 The internal data structure is shared, and fundamentally, by forcing him to
 use that new KVM device for the new stuff, we create an oddball API with
 an ioctl for one type of IOMMU and a KVM device for the other, which makes
 the implementation a complete mess in the kernel (and you should care :-)

 Is it an unfixable mess? Even if Alexey does what you suggested earlier?
 
   - Convert *both* existing TCE objects to the new
   KVM_CREATE_DEVICE, and have some backward compat code for the old one.



If we convert the *old* objects, then for compatibility we will need one KVM
device (more precisely, one fd) per TCE object (not one for all TCE objects),
as user space (upstream QEMU) maps the table with mmap() and does not use
any offset when mapping this fd.
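The mmap() argument can be made concrete with a toy model. If one fd exposed several TCE tables, the mmap() page offset is the natural way to choose between them; a caller that always maps at offset 0 can then only ever reach the first table. The layout below (table k at page offset k * pages_per_table) and the names are a hypothetical scheme for illustration, not something the KVM code implements.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fd multiplexing several TCE tables: table k is exposed at
 * mmap page offset k * pages_per_table. */
struct shared_tce_fd {
    size_t ntables;
    size_t pages_per_table;
};

/* Return the index of the table an mmap() at page offset pgoff would map,
 * or -1 if pgoff does not start a table. */
static long table_for_pgoff(const struct shared_tce_fd *f, size_t pgoff)
{
    size_t k;

    assert(f != NULL && f->pages_per_table > 0);
    if (pgoff % f->pages_per_table != 0)
        return -1;
    k = pgoff / f->pages_per_table;
    return k < f->ntables ? (long)k : -1;
}
```

Since existing QEMU always passes offset 0, table_for_pgoff(&f, 0) is the only mapping it could ever obtain from such an fd, which is why compatibility points towards one fd per TCE object.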

The current KVM device implementation does not even support mmap().

So I would go with the KVM device patch I posted, and I really want to avoid
putting all TCEs into one device.



 The point is that an implementation can usually be changed, but an API is
 much harder to change.
 
 So for something completely new, I would tend to agree with you. However, I
 still think that for this specific case, we should just plonk-in the original
 ioctl proposed by Alexey and be done with it.

 Do you think this is the last extension to the IOMMU code, or will we see
 more and use the same justification to continue adding ioctls?




-- 
Alexey


Re: [PATCH v9 12/13] KVM: PPC: Add support for IOMMU in-kernel handling

2013-09-06 Thread Alexey Kardashevskiy
On 09/06/2013 04:57 PM, Gleb Natapov wrote:


Ok. I give up :) I implemented the KVM device the way you suggested. Could you
please have a look? It is [PATCH v10 12/13] KVM: PPC: Add support for
IOMMU in-kernel handling, attached to this thread. Thanks!



-- 
Alexey


Re: [PATCH v9 12/13] KVM: PPC: Add support for IOMMU in-kernel handling

2013-09-05 Thread Gleb Natapov
On Wed, Sep 04, 2013 at 02:01:28AM +1000, Alexey Kardashevskiy wrote:
 On 09/03/2013 08:53 PM, Gleb Natapov wrote:
  On Mon, Sep 02, 2013 at 01:14:29PM +1000, Alexey Kardashevskiy wrote:
  On 09/01/2013 10:06 PM, Gleb Natapov wrote:
  On Wed, Aug 28, 2013 at 06:50:41PM +1000, Alexey Kardashevskiy wrote:
  Why not have one device that can handle multiple links?
 
 
  I can do that. If I make it so, it won't even look like a device at all, just
  some weird interface to KVM, but ok. What bothers me is it is just a
  Maybe I do not understand 

Re: [PATCH v9 12/13] KVM: PPC: Add support for IOMMU in-kernel handling

2013-09-05 Thread Alexey Kardashevskiy
On 09/06/2013 04:10 AM, Gleb Natapov wrote:
 On Wed, Sep 04, 2013 at 02:01:28AM +1000, Alexey Kardashevskiy wrote:
 On 09/03/2013 08:53 PM, Gleb Natapov wrote:
 On Mon, Sep 02, 2013 at 01:14:29PM +1000, Alexey Kardashevskiy wrote:
 On 09/01/2013 10:06 PM, Gleb Natapov wrote:
 On Wed, Aug 28, 2013 at 06:50:41PM +1000, Alexey Kardashevskiy wrote:
 Why not have one device that can handle multiple links?


 I can do that. If I make it so, it won't even look like a device at all, just
 some weird interface to KVM, but ok. What bothers me is it is just a
 Maybe I do not understand the usage pattern here. Why do you feel that a device
 that can handle multiple 

Re: [PATCH v9 12/13] KVM: PPC: Add support for IOMMU in-kernel handling

2013-09-04 Thread Benjamin Herrenschmidt
On Tue, 2013-09-03 at 13:53 +0300, Gleb Natapov wrote:
  Or is supporting all IOMMU links (and leaving emulated stuff as is) in one
  device the last thing I have to do before you ack the patch?
  
 I am more concerned about the API here. Internal implementation details I
 leave to powerpc experts :)

So Gleb, I want to step in for a bit here.

While I understand that the new KVM device API is all nice and shiny and that
this whole thing should probably have been KVM devices in the first place (had
they existed or had we been told back then), the point is, the API for handling
HW IOMMUs that Alexey is trying to add is an extension of an existing mechanism
used for emulated IOMMUs.

The internal data structure is shared, and fundamentally, by forcing him to
use that new KVM device for the new stuff, we create an oddball API with
an ioctl for one type of IOMMU and a KVM device for the other, which makes
the implementation a complete mess in the kernel (and you should care :-)

So for something completely new, I would tend to agree with you. However, I
still think that for this specific case, we should just plonk-in the original
ioctl proposed by Alexey and be done with it.

Cheers,
Ben.




Re: [PATCH v9 12/13] KVM: PPC: Add support for IOMMU in-kernel handling

2013-09-03 Thread Gleb Natapov
On Mon, Sep 02, 2013 at 01:14:29PM +1000, Alexey Kardashevskiy wrote:
 On 09/01/2013 10:06 PM, Gleb Natapov wrote:
  On Wed, Aug 28, 2013 at 06:50:41PM +1000, Alexey Kardashevskiy wrote:
  Why not have one device that can handle multiple links?
 
 
 I can do that. If I make it so, it won't even look like a device at all, just
 some weird interface to KVM, but ok. What bothers me is it is just a
Maybe I do not understand the usage pattern here. Why do you feel that a device
that can handle multiple links is worse than a device per link? How many logical

Re: [PATCH v9 12/13] KVM: PPC: Add support for IOMMU in-kernel handling

2013-09-03 Thread Alexey Kardashevskiy
On 09/03/2013 08:53 PM, Gleb Natapov wrote:
 On Mon, Sep 02, 2013 at 01:14:29PM +1000, Alexey Kardashevskiy wrote:
 On 09/01/2013 10:06 PM, Gleb Natapov wrote:
 On Wed, Aug 28, 2013 at 06:50:41PM +1000, Alexey Kardashevskiy wrote:
 +SPAPR TCE IOMMU device
 +
 +Capability: KVM_CAP_SPAPR_TCE_IOMMU
 +Architectures: powerpc
 +
 +Device type supported: KVM_DEV_TYPE_SPAPR_TCE_IOMMU
 +
 +Groups:
 +  KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE
 +  Attributes: single attribute with pair { LIOBN, IOMMU fd}
 +
 +This is a completely made-up device which provides an API to link
 +a logical bus number (LIOBN) and an IOMMU group. The user space has
 +to create a new SPAPR TCE IOMMU device per logical bus.
 +
 Why not have one device that can handle multiple links?


 I can do that. If I make it so, it won't even look like a device at all, just
 some weird interface to KVM, but ok. What bothers me is that it is just a

 Maybe I do not understand the usage pattern here. Why do you feel that a device
 that can handle multiple links is worse than a device per link? How many logical
 buses are there usually? How often are they created/destroyed? I am not 

Re: [PATCH v9 12/13] KVM: PPC: Add support for IOMMU in-kernel handling

2013-09-01 Thread Gleb Natapov
On Wed, Aug 28, 2013 at 06:50:41PM +1000, Alexey Kardashevskiy wrote:
 +SPAPR TCE IOMMU device
 +
 +Capability: KVM_CAP_SPAPR_TCE_IOMMU
 +Architectures: powerpc
 +
 +Device type supported: KVM_DEV_TYPE_SPAPR_TCE_IOMMU
 +
 +Groups:
 +  KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE
 +  Attributes: single attribute with pair { LIOBN, IOMMU fd}
 +
 +This is a completely made-up device which provides an API to link
 +a logical bus number (LIOBN) and an IOMMU group. The user space has
 +to create a new SPAPR TCE IOMMU device per logical bus.
 +
Why not have one device that can handle multiple links?

 +LIOBN is a PCI bus identifier used by the PPC64-server (sPAPR) DMA
 +hypercalls (H_PUT_TCE, H_PUT_TCE_INDIRECT, H_STUFF_TCE).
 +An IOMMU group is a minimal isolated set of devices which can be passed
 +to user space via VFIO.
 +
 +Right after creation the device is in an uninitialized state and requires
 +the KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE attribute to be set.
 +The attribute contains the LIOBN, the IOMMU fd and flags:
 +
 +struct kvm_create_spapr_tce_iommu_linkage {
 + __u64 liobn;
 + __u32 fd;
 + __u32 flags;
 +};
 +
 

Re: [PATCH v9 12/13] KVM: PPC: Add support for IOMMU in-kernel handling

2013-09-01 Thread Alexey Kardashevskiy
On 09/01/2013 10:06 PM, Gleb Natapov wrote:
 On Wed, Aug 28, 2013 at 06:50:41PM +1000, Alexey Kardashevskiy wrote:
 +SPAPR TCE IOMMU device
 +
 +Capability: KVM_CAP_SPAPR_TCE_IOMMU
 +Architectures: powerpc
 +
 +Device type supported: KVM_DEV_TYPE_SPAPR_TCE_IOMMU
 +
 +Groups:
 +  KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE
 +  Attributes: single attribute with pair { LIOBN, IOMMU fd}
 +
 +This is a completely made-up device which provides an API to link
 +a logical bus number (LIOBN) and an IOMMU group. The user space has
 +to create a new SPAPR TCE IOMMU device per logical bus.
 +
 Why not have one device that can handle multiple links?


I can do that. If I make it so, it won't even look like a device at all, just
some weird interface to KVM, but ok. What bothers me is that it is just a
question of what I will have to do next. I can easily predict a
suggestion to move the kvmppc_spapr_tce_table list (the links list) from
kvm->arch.spapr_tce_tables to that device, but I cannot do that for obvious
compatibility reasons: the list is already used for emulated devices (for
starters, they need mmap()).

Or 

[PATCH v9 12/13] KVM: PPC: Add support for IOMMU in-kernel handling

2013-08-28 Thread Alexey Kardashevskiy
This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT
and H_STUFF_TCE requests targeted at an IOMMU TCE table without passing
them to user space, which saves the time spent switching to user space and back.
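To make the request types concrete, here is a minimal user-space model of the table that H_PUT_TCE and H_STUFF_TCE update. It is not the patch's code: struct tce_table, find_table(), stuff_tce() and put_tce() are hypothetical, the 4 KiB IOMMU page size is an assumption, and the return-code macros are local stand-ins. Returning H_TOO_HARD for an unknown LIOBN mirrors the changelog's rule that only requests for tables not registered in the kernel go to user space. H_PUT_TCE_INDIRECT, which reads a list of TCEs from a guest page, is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the hcall return codes used by this sketch. */
#define H_SUCCESS	0
#define H_PARAMETER	(-4)
#define H_TOO_HARD	9999
#define TCE_SHIFT	12	/* assumption: 4 KiB IOMMU pages */

/* Hypothetical per-LIOBN table, kept on a singly linked list. */
struct tce_table {
	uint64_t liobn;		/* logical bus this DMA window belongs to */
	uint64_t window_size;	/* window size in bytes */
	uint64_t *entries;	/* one TCE per IOMMU page of the window */
	struct tce_table *next;
};

static struct tce_table *find_table(struct tce_table *head, uint64_t liobn)
{
	for (; head; head = head->next)
		if (head->liobn == liobn)
			return head;
	return NULL;
}

/* H_STUFF_TCE-style update: write one TCE value into npages consecutive
 * entries starting at I/O bus address ioba. */
static long stuff_tce(struct tce_table *head, uint64_t liobn, uint64_t ioba,
		      uint64_t tce, uint64_t npages)
{
	struct tce_table *t = find_table(head, liobn);
	uint64_t nslots, idx, i;

	if (!t)
		return H_TOO_HARD;	/* not registered in the kernel: user space */
	if (ioba & ((1ULL << TCE_SHIFT) - 1))
		return H_PARAMETER;	/* ioba must be page aligned */
	nslots = t->window_size >> TCE_SHIFT;
	idx = ioba >> TCE_SHIFT;
	if (idx >= nslots || npages > nslots - idx)
		return H_PARAMETER;	/* range falls outside the window */
	for (i = 0; i < npages; i++)
		t->entries[idx + i] = tce;
	return H_SUCCESS;
}

/* H_PUT_TCE-style update: the single-entry case. */
static long put_tce(struct tce_table *head, uint64_t liobn, uint64_t ioba,
		    uint64_t tce)
{
	return stuff_tce(head, liobn, ioba, tce, 1);
}
```

The bounds check is written as `npages > nslots - idx` rather than `idx + npages > nslots` so that a huge guest-supplied npages cannot wrap around and slip past it.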

Both real and virtual modes are supported. The kernel tries to
handle a TCE request in real mode; if that fails, it passes the request
to the virtual mode handler to complete the operation. If the virtual mode
handler fails as well, the request is passed to user space.
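The fallback order can be sketched as a dispatcher that tries the cheapest context first. This is an illustrative model only: tce_dispatch(), the tce_handler typedef and the two demo handlers are hypothetical, and the sketch assumes the convention used by the Book3S HV real-mode handlers, where "cannot finish here" is signalled by returning H_TOO_HARD (defined locally below). Any other return value, including an error, is treated as final and goes back to the guest.

```c
#include <assert.h>

/* Local stand-ins for the hcall return codes used by this sketch. */
#define H_SUCCESS	0
#define H_TOO_HARD	9999	/* "cannot finish here, retry in a slower context" */

/* Where a request was finally completed. */
enum handled_in { IN_REALMODE, IN_VIRTMODE, IN_USERSPACE };

typedef long (*tce_handler)(unsigned long liobn, unsigned long ioba,
			    unsigned long tce);

/* Try real mode first, then virtual mode; fall back only on H_TOO_HARD. */
static enum handled_in tce_dispatch(tce_handler realmode, tce_handler virtmode,
				    unsigned long liobn, unsigned long ioba,
				    unsigned long tce)
{
	if (realmode(liobn, ioba, tce) != H_TOO_HARD)
		return IN_REALMODE;
	if (virtmode(liobn, ioba, tce) != H_TOO_HARD)
		return IN_VIRTMODE;
	return IN_USERSPACE;	/* exit to user space (e.g. QEMU) */
}

/* Demo handlers used to exercise the chain. */
static long handler_ok(unsigned long liobn, unsigned long ioba,
		       unsigned long tce)
{
	(void)liobn; (void)ioba; (void)tce;
	return H_SUCCESS;
}

static long handler_too_hard(unsigned long liobn, unsigned long ioba,
			     unsigned long tce)
{
	(void)liobn; (void)ioba; (void)tce;
	return H_TOO_HARD;
}
```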

The first user of this is VFIO on POWER. Trampolines to the VFIO external
user API functions are required for this patch.

This adds a SPAPR TCE IOMMU KVM device to associate a logical bus
number (LIOBN) with a VFIO IOMMU group fd and to enable in-kernel handling
of map/unmap requests. The device supports a single attribute, which is
a struct holding the LIOBN and the IOMMU fd. When the attribute is set, the
device establishes the connection between KVM and VFIO.
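A sketch of the attribute payload and of the uninitialized-to-linked transition, assuming the struct layout given in the documentation below. In a real VMM, a pointer to this struct would travel in the addr field of struct kvm_device_attr through KVM_SET_DEVICE_ATTR on the fd returned by KVM_CREATE_DEVICE. Here the device side is a hypothetical stand-in (struct tce_iommu_dev, tce_iommu_set_linkage()), and returning -EINVAL for non-zero flags and -EBUSY for a second link are assumptions, not behavior taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirror of the attribute payload from the patch's documentation,
 * using <stdint.h> types in place of the kernel's __u64/__u32. */
struct kvm_create_spapr_tce_iommu_linkage {
	uint64_t liobn;
	uint32_t fd;
	uint32_t flags;
};

/* 8 + 4 + 4 bytes with no implicit padding, checked at compile time. */
_Static_assert(sizeof(struct kvm_create_spapr_tce_iommu_linkage) == 16,
	       "unexpected padding in linkage struct");
_Static_assert(offsetof(struct kvm_create_spapr_tce_iommu_linkage, flags) == 12,
	       "unexpected flags offset");

/* Hypothetical device state: unlinked right after creation. */
struct tce_iommu_dev {
	int linked;
	struct kvm_create_spapr_tce_iommu_linkage link;
};

/* Hypothetical set-attribute handler for the LINKAGE group. */
static int tce_iommu_set_linkage(struct tce_iommu_dev *dev, const void *addr)
{
	struct kvm_create_spapr_tce_iommu_linkage l;

	memcpy(&l, addr, sizeof(l));	/* stands in for copy_from_user() */
	if (l.flags)
		return -EINVAL;		/* assumption: no flags defined yet */
	if (dev->linked)
		return -EBUSY;		/* assumption: one link per device */
	dev->link = l;
	dev->linked = 1;
	return 0;
}
```

Keeping an explicit 32-bit flags field after the 32-bit fd makes the struct a padding-free 16 bytes, so 32-bit and 64-bit user space agree on its layout.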

Tests show that this patch increases transmission speed from 220MB/s
to 750..1020MB/s on a 10Gb network (Chelsio CXGB3 10Gb Ethernet card).
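A quick sanity check of those figures, assuming decimal units (1 MB/s = 10^6 B/s, so a 10 Gb/s link carries at most 1250 MB/s before protocol overhead):

```latex
\frac{750}{220} \approx 3.4, \qquad
\frac{1020}{220} \approx 4.6, \qquad
\frac{1020}{1250} \approx 0.82
```

That is a 3.4x to 4.6x speedup, with the best case at roughly 82% of nominal line rate.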

Signed-off-by: Paul Mackerras pau...@samba.org
Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru

---

Changes:
v9:
* KVM_CAP_SPAPR_TCE_IOMMU ioctl to KVM replaced with SPAPR TCE IOMMU
KVM device
* release_spapr_tce_table() is not shared between different TCE types
* reduced the patch size by moving VFIO external API
trampolines to a separate patch
* moved documentation from Documentation/virtual/kvm/api.txt to
Documentation/virtual/kvm/devices/spapr_tce_iommu.txt

v8:
* fixed warnings from checkpatch.pl

2013/07/11:
* removed multiple #ifdef IOMMU_API as IOMMU_API is always enabled
for KVM_BOOK3S_64
* kvmppc_gpa_to_hva_and_get also returns the host physical address. This is of
little use here, but the next patch (hugepage support) will use it more.

2013/07/06:
* added realmode arch_spin_lock to protect TCE table from races
in real and virtual modes
* POWERPC IOMMU API is changed to support real mode
* iommu_take_ownership and iommu_release_ownership are protected by
iommu_table's locks
* VFIO external user API use rewritten
* multiple small fixes

2013/06/27:
* tce_list page is referenced now in order to protect it from accidental
invalidation during H_PUT_TCE_INDIRECT execution
* added use of the external user VFIO API

2013/06/05:
* changed capability number
* changed ioctl number
* updated the doc article number

2013/05/20:
* removed get_user() from real mode handlers
* kvm_vcpu_arch::tce_tmp usage extended. Now the real mode handler puts
translated TCEs there and tries realmode_get_page() on them; if that fails,
it passes control to the virtual mode handler, which tries to finish
handling the request
* kvmppc_lookup_pte() now does realmode_get_page() protected by BUSY bit
on a page
* The only reason to pass the request to user mode now is when the user mode
did not register a TCE table in the kernel; in all other cases the virtual mode
handler is expected to do the job
---
 .../virtual/kvm/devices/spapr_tce_iommu.txt|  37 +++
 arch/powerpc/include/asm/kvm_host.h|   4 +
 arch/powerpc/kvm/book3s_64_vio.c   | 310 -
 arch/powerpc/kvm/book3s_64_vio_hv.c| 122 
 arch/powerpc/kvm/powerpc.c |   1 +
 include/linux/kvm_host.h   |   1 +
 virt/kvm/kvm_main.c|   5 +
 7 files changed, 477 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/virtual/kvm/devices/spapr_tce_iommu.txt

diff --git a/Documentation/virtual/kvm/devices/spapr_tce_iommu.txt b/Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
new file mode 100644
index 000..4bc8fc3
--- /dev/null
+++ b/Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
@@ -0,0 +1,37 @@
+SPAPR TCE IOMMU device
+
+Capability: KVM_CAP_SPAPR_TCE_IOMMU
+Architectures: powerpc
+
+Device type supported: KVM_DEV_TYPE_SPAPR_TCE_IOMMU
+
+Groups:
+  KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE
+  Attributes: single attribute with pair { LIOBN, IOMMU fd}
+
+This is a completely made-up device which provides an API to link
+a logical bus number (LIOBN) and an IOMMU group. The user space has
+to create a new SPAPR TCE IOMMU device per logical bus.
+
+LIOBN is a PCI bus identifier used by the PPC64-server (sPAPR) DMA
+hypercalls (H_PUT_TCE, H_PUT_TCE_INDIRECT, H_STUFF_TCE).
+An IOMMU group is a minimal isolated set of devices which can be passed
+to user space via VFIO.
+
+Right after creation the device is in an uninitialized state and requires
+the KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE attribute to be set.
+The attribute contains the LIOBN, the IOMMU fd and flags:
+
+struct kvm_create_spapr_tce_iommu_linkage {
+   __u64 liobn;
+   __u32 fd;
+   __u32 flags;
+};
+
+The user space creates the SPAPR TCE IOMMU device, obtains
+an IOMMU fd via the VFIO ABI and sets the attribute on the SPAPR TCE IOMMU
+device. At the moment the attribute is set, the SPAPR TCE IOMMU
+device links the LIOBN to the IOMMU group and makes