Re: [PATCH 5/5] KVM: assigned dev: MSI-X mask support

2010-11-09 Thread Marcelo Tosatti
On Mon, Nov 08, 2010 at 01:41:40PM +0800, Sheng Yang wrote:
 On Saturday 06 November 2010 08:24:51 Marcelo Tosatti wrote:
  On Fri, Nov 05, 2010 at 06:53:50PM +0800, Sheng Yang wrote:
   On Friday 05 November 2010 16:43:33 Michael S. Tsirkin wrote:
On Fri, Nov 05, 2010 at 10:44:19AM +0800, Sheng Yang wrote:
   +};
   +
   +This ioctl would enable in-kernel MSI-X emulation, which would
   handle MSI-X +mask bit in the kernel.
  
  What happens on repeated calls when it's already enabled?
  How does one disable after it's enabled?
 
 Suppose this should only be called once. But again would update the
 MMIO base.

So maybe add this all to documentation.

 It
 would be disabled along with device deassignment.

So what are you going to do when guest enables and later disables MSIX?
Disable assignment completely?
   
   This device goes with PCI resources allocation, rather than IRQ
   assignment.
  
  What about hot plug, for example? I can't see how this can function
  without unregistering regions.
 
 The register time is when PCI resources allocation, and the unregister time 
 is 
 when device was deassigned.
 
 I suppose hot plug path would also notify kernel to deassign the device?

Right.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] KVM: assigned dev: MSI-X mask support

2010-11-07 Thread Sheng Yang
On Friday 05 November 2010 21:27:42 Michael S. Tsirkin wrote:
 On Fri, Nov 05, 2010 at 06:53:50PM +0800, Sheng Yang wrote:
  On Friday 05 November 2010 16:43:33 Michael S. Tsirkin wrote:
   On Fri, Nov 05, 2010 at 10:44:19AM +0800, Sheng Yang wrote:
  +};
  +
  +This ioctl would enable in-kernel MSI-X emulation, which would
  handle MSI-X +mask bit in the kernel.
 
 What happens on repeated calls when it's already enabled?
 How does one disable after it's enabled?

Suppose this should only be called once. But again would update the
MMIO base.
   
   So maybe add this all to documentation.
   
It
would be disabled along with device deassignment.
   
   So what are you going to do when guest enables and later disables MSIX?
   Disable assignment completely?
  
  This device goes with PCI resources allocation, rather than IRQ
  assignment.
 
 I see. I guess we can live with it, but it seems tied to a specific
 mode of use. Would be better to support enabling together with msix too,
 this requires an ability to disable.
 
We enable it explicitly because
not all devices have MSI-X capability.

  +
 
 This is very specialized for assigned devices.  I would like an
 interface not tied to assigned devices explicitly
 (even if the implementation at the moment only works
 for assigned devices, I can patch that in later). E.g. vhost-net
 would benefit from such an extension too.  Maybe tie this to a
 special GSI and this GSI can be an assigned device or not?

You're talking about KVM_ASSIGN_GET_MSIX_ENTRY?
   
   Yes but also, and more importantly, about KVM_ASSIGN_REG_MSIX_MMIO
   and kvm_assigned_msix_mmio really.
   
We can't tie it to GSI, because we
should also cover entries without GSI. The entry number should be
fine for all kinds of devices using MSI-X.
   
   I don't completely understand: entries without GSI
   never need to inject an interrupt. Why do we care
   about them? Let's pass such accesses to userspace.
   
   In any case, I think MSIX GSIs are inexpensive (we never search them
   all), so we could create GSIs for all entries if we wanted to.
  
  All mask bit accessing controlled by kernel, in order to keep consistent.
 
 Well I think passing whatever we can't handle to userspace makes
 complete sense. We do this for address/data writes after all.
 If we wanted to do it *all* in kernel we would handle
 address/data there too. I think Avi suggested this at some point.

I think Avi's point was we handle mask bit half in the kernel and half in the 
userspace, which makes API complex. I agree with this so I moved all mask bit 
handling to kernel.

Data/address is another issue, due to QEmu own the routing table. We can change 
it 
in the future, but this is not related to this patch.
 
 The main point is msix mmio BAR handling is completely unrelated
 to the assigned device. Emulated devices have an msix BAR as well.
 So tying it to an assigned devices is suboptimal, we'll need to
 grow yet another interface for these.

If you only means the name, we can change that. But I think all MSI-X entries 
would have entry number, regardless of if it got GSI number. Keep entry number 
as 
parameter makes more sense to me.

And still, the patch's only current user is assigned device. I am glad if it 
can 
cover the other future case along with it, but that's not the primary target of 
this patch.
 
I am not sure about what PV one would looks like.
I use kvm_assigned_msix_entry
just because it can be easily reused.
   
   Not only PV, emulated devices too. OK let's try to figure out:
   
   What happens with userspace is, we inject interrupts either through
   ioctls or eventfds. Avoiding an exit to userspace on MSIX
   access is beneficial in exactly the same way.
   
   We could have an ioctl to pass in the table addresses, and mapping
   table to relevant GSIs. For example, add kvm_msix_mmio with table
   start/end, also specify which GSIs are covered.
   Maybe ask that userspace allocate all GSIs for a table consequitively?
   Or add kvm_irq_routing_msix which is same as msi but
   also has the mmio addresses? Bonus points for avoiding the need
   to scan all table on each write. For example, don't scan at all,
   instead just set a bit in KVM, and set irq entry on the next
   interrupt only: we have KVM_MAX_MSIX_PER_DEV as 256 so maybe a small 32
   byte bitmap would be enough for this, avoding a linear scan.
   
   When we pass in kvm_msix_mmio, kvm would start registering masked
   state.
   
  +4.48 KVM_ASSIGN_GET_MSIX_ENTRY
  +
  +Capability: KVM_CAP_DEVICE_MSIX_MASK
  +Architectures: x86
  +Type: vm ioctl
  +Parameters: struct kvm_assigned_msix_entry (in and out)
  +Returns: 0 on success, !0 on error
  +
  +struct kvm_assigned_msix_entry {
  +   /* Assigned device's ID */
  +   __u32 assigned_dev_id;
  +   /* Ignored */
  +   

Re: [PATCH 5/5] KVM: assigned dev: MSI-X mask support

2010-11-07 Thread Sheng Yang
On Saturday 06 November 2010 08:24:51 Marcelo Tosatti wrote:
 On Fri, Nov 05, 2010 at 06:53:50PM +0800, Sheng Yang wrote:
  On Friday 05 November 2010 16:43:33 Michael S. Tsirkin wrote:
   On Fri, Nov 05, 2010 at 10:44:19AM +0800, Sheng Yang wrote:
  +};
  +
  +This ioctl would enable in-kernel MSI-X emulation, which would
  handle MSI-X +mask bit in the kernel.
 
 What happens on repeated calls when it's already enabled?
 How does one disable after it's enabled?

Suppose this should only be called once. But again would update the
MMIO base.
   
   So maybe add this all to documentation.
   
It
would be disabled along with device deassignment.
   
   So what are you going to do when guest enables and later disables MSIX?
   Disable assignment completely?
  
  This device goes with PCI resources allocation, rather than IRQ
  assignment.
 
 What about hot plug, for example? I can't see how this can function
 without unregistering regions.

The register time is when PCI resources allocation, and the unregister time is 
when device was deassigned.

I suppose hot plug path would also notify kernel to deassign the device?

--
regards
Yang, Sheng
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] KVM: assigned dev: MSI-X mask support

2010-11-07 Thread Michael S. Tsirkin
On Mon, Nov 08, 2010 at 11:18:34AM +0800, Sheng Yang wrote:
 On Friday 05 November 2010 21:27:42 Michael S. Tsirkin wrote:
  On Fri, Nov 05, 2010 at 06:53:50PM +0800, Sheng Yang wrote:
   On Friday 05 November 2010 16:43:33 Michael S. Tsirkin wrote:
On Fri, Nov 05, 2010 at 10:44:19AM +0800, Sheng Yang wrote:
   +};
   +
   +This ioctl would enable in-kernel MSI-X emulation, which would
   handle MSI-X +mask bit in the kernel.
  
  What happens on repeated calls when it's already enabled?
  How does one disable after it's enabled?
 
 Suppose this should only be called once. But again would update the
 MMIO base.

So maybe add this all to documentation.

 It
 would be disabled along with device deassignment.

So what are you going to do when guest enables and later disables MSIX?
Disable assignment completely?
   
   This device goes with PCI resources allocation, rather than IRQ
   assignment.
  
  I see. I guess we can live with it, but it seems tied to a specific
  mode of use. Would be better to support enabling together with msix too,
  this requires an ability to disable.
  
 We enable it explicitly because
 not all devices have MSI-X capability.
 
   +
  
  This is very specialized for assigned devices.  I would like an
  interface not tied to assigned devices explicitly
  (even if the implementation at the moment only works
  for assigned devices, I can patch that in later). E.g. vhost-net
  would benefit from such an extension too.  Maybe tie this to a
  special GSI and this GSI can be an assigned device or not?
 
 You're talking about KVM_ASSIGN_GET_MSIX_ENTRY?

Yes but also, and more importantly, about KVM_ASSIGN_REG_MSIX_MMIO
and kvm_assigned_msix_mmio really.

 We can't tie it to GSI, because we
 should also cover entries without GSI. The entry number should be
 fine for all kinds of devices using MSI-X.

I don't completely understand: entries without GSI
never need to inject an interrupt. Why do we care
about them? Let's pass such accesses to userspace.

In any case, I think MSIX GSIs are inexpensive (we never search them
all), so we could create GSIs for all entries if we wanted to.
   
   All mask bit accessing controlled by kernel, in order to keep consistent.
  
  Well I think passing whatever we can't handle to userspace makes
  complete sense. We do this for address/data writes after all.
  If we wanted to do it *all* in kernel we would handle
  address/data there too. I think Avi suggested this at some point.
 
 I think Avi's point was we handle mask bit half in the kernel and half in the 
 userspace, which makes API complex. I agree with this so I moved all mask bit 
 handling to kernel.
 
 Data/address is another issue, due to QEmu own the routing table. We can 
 change it 
 in the future, but this is not related to this patch.
  
  The main point is msix mmio BAR handling is completely unrelated
  to the assigned device. Emulated devices have an msix BAR as well.
  So tying it to an assigned devices is suboptimal, we'll need to
  grow yet another interface for these.
 
 If you only means the name, we can change that.

I mean it shouldn't use the assigned device id. Use some other index:
I suggested adding a new GSI type for this, but can be something else
if you prefer.

 But I think all MSI-X entries 
 would have entry number, regardless of if it got GSI number. Keep entry 
 number as 
 parameter makes more sense to me.
 
 And still, the patch's only current user is assigned device. I am glad if it 
 can 
 cover the other future case along with it, but that's not the primary target 
 of 
 this patch.

It's a problem I think. Stop thinking about the patch for a moment
and think about kvm as a whole. We know we will want such an interface for
emulated and PV devices. What then? Add another interface and a bunch of
code to support it?  Point being, since userspace interfaces need to be
supported forever, they should be generic if possible.


  
 I am not sure about what PV one would looks like.
 I use kvm_assigned_msix_entry
 just because it can be easily reused.

Not only PV, emulated devices too. OK let's try to figure out:

What happens with userspace is, we inject interrupts either through
ioctls or eventfds. Avoiding an exit to userspace on MSIX
access is beneficial in exactly the same way.

We could have an ioctl to pass in the table addresses, and mapping
table to relevant GSIs. For example, add kvm_msix_mmio with table
start/end, also specify which GSIs are covered.
Maybe ask that userspace allocate all GSIs for a table consequitively?
Or add kvm_irq_routing_msix which is same as msi but
also has the mmio addresses? Bonus points for avoiding the need
to scan all table on each write. For example, don't scan at all,
instead just 

[PATCH 5/5] KVM: assigned dev: MSI-X mask support

2010-11-05 Thread Sheng Yang
This patch enable per-vector mask for assigned devices using MSI-X.

This patch provided two new APIs: one is for guest to specific device's MSI-X
table address in MMIO, the other is for userspace to get information about mask
bit.

All the mask bit operation are kept in kernel, in order to accelerate.
Userspace shouldn't access the device MMIO directly for the information,
instead it should uses provided API to do so.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 Documentation/kvm/api.txt |   48 
 arch/x86/kvm/x86.c|1 +
 include/linux/kvm.h   |   22 -
 include/linux/kvm_host.h  |4 +
 virt/kvm/assigned-dev.c   |  286 -
 5 files changed, 359 insertions(+), 2 deletions(-)

diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
index b336266..76f800b 100644
--- a/Documentation/kvm/api.txt
+++ b/Documentation/kvm/api.txt
@@ -1085,6 +1085,54 @@ of 4 instructions that make up a hypercall.
 If any additional field gets added to this structure later on, a bit for that
 additional piece of information will be set in the flags bitmap.
 
+4.47 KVM_ASSIGN_REG_MSIX_MMIO
+
+Capability: KVM_CAP_DEVICE_MSIX_MASK
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_assigned_msix_mmio (in)
+Returns: 0 on success, !0 on error
+
+struct kvm_assigned_msix_mmio {
+   /* Assigned device's ID */
+   __u32 assigned_dev_id;
+   /* Must be 0 */
+   __u32 flags;
+   /* MSI-X table MMIO address */
+   __u64 base_addr;
+   /* Maximum entries contained in the table, = KVM_MAX_MSIX_PER_DEV */
+   __u32 max_entries_nr;
+   /* Must be 0, reserved for future use */
+   __u32 reserved;
+};
+
+This ioctl would enable in-kernel MSI-X emulation, which would handle MSI-X
+mask bit in the kernel.
+
+4.48 KVM_ASSIGN_GET_MSIX_ENTRY
+
+Capability: KVM_CAP_DEVICE_MSIX_MASK
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_assigned_msix_entry (in and out)
+Returns: 0 on success, !0 on error
+
+struct kvm_assigned_msix_entry {
+   /* Assigned device's ID */
+   __u32 assigned_dev_id;
+   /* Ignored */
+   __u32 gsi;
+   /* The index of entry in the MSI-X table */
+   __u16 entry;
+   /* Querying flags and returning status */
+   __u16 flags;
+   /* Must be 0 */
+   __u16 padding[2];
+};
+
+This ioctl would allow userspace to get the status of one specific MSI-X
+entry. Currently we support mask bit status querying.
+
 5. The kvm_run structure
 
 Application code obtains a pointer to the kvm_run structure by
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f3f86b2..8fd5121 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1926,6 +1926,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_DEBUGREGS:
case KVM_CAP_X86_ROBUST_SINGLESTEP:
case KVM_CAP_XSAVE:
+   case KVM_CAP_DEVICE_MSIX_MASK:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 919ae53..bfe5707 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -540,6 +540,9 @@ struct kvm_ppc_pvinfo {
 #endif
 #define KVM_CAP_PPC_GET_PVINFO 57
 #define KVM_CAP_PPC_IRQ_LEVEL 58
+#ifdef __KVM_HAVE_MSIX
+#define KVM_CAP_DEVICE_MSIX_MASK 59
+#endif
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -671,6 +674,10 @@ struct kvm_clock_data {
 #define KVM_XEN_HVM_CONFIG_IOW(KVMIO,  0x7a, struct kvm_xen_hvm_config)
 #define KVM_SET_CLOCK _IOW(KVMIO,  0x7b, struct kvm_clock_data)
 #define KVM_GET_CLOCK _IOR(KVMIO,  0x7c, struct kvm_clock_data)
+#define KVM_ASSIGN_GET_MSIX_ENTRY _IOWR(KVMIO,  0x7d, \
+   struct kvm_assigned_msix_entry)
+#define KVM_ASSIGN_REG_MSIX_MMIO  _IOW(KVMIO,  0x7e, \
+   struct kvm_assigned_msix_mmio)
 /* Available with KVM_CAP_PIT_STATE2 */
 #define KVM_GET_PIT2  _IOR(KVMIO,  0x9f, struct kvm_pit_state2)
 #define KVM_SET_PIT2  _IOW(KVMIO,  0xa0, struct kvm_pit_state2)
@@ -787,11 +794,24 @@ struct kvm_assigned_msix_nr {
 };
 
 #define KVM_MAX_MSIX_PER_DEV   256
+
+#define KVM_MSIX_FLAG_MASK (1  0)
+#define KVM_MSIX_FLAG_QUERY_MASK   (1  15)
+
 struct kvm_assigned_msix_entry {
__u32 assigned_dev_id;
__u32 gsi;
__u16 entry; /* The index of entry in the MSI-X table */
-   __u16 padding[3];
+   __u16 flags;
+   __u16 padding[2];
+};
+
+struct kvm_assigned_msix_mmio {
+   __u32 assigned_dev_id;
+   __u32 flags;
+   __u64 base_addr;
+   __u32 max_entries_nr;
+   __u32 reserved;
 };
 
 #endif /* __LINUX_KVM_H */
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index e2ecbac..f58aaca 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -464,6 +464,10 @@ struct kvm_assigned_dev_kernel {
struct pci_dev *dev;
struct kvm 

Re: [PATCH 5/5] KVM: assigned dev: MSI-X mask support

2010-11-05 Thread Michael S. Tsirkin
On Fri, Nov 05, 2010 at 10:44:19AM +0800, Sheng Yang wrote:
   +};
   +
   +This ioctl would enable in-kernel MSI-X emulation, which would handle
   MSI-X +mask bit in the kernel.
  
  What happens on repeated calls when it's already enabled?
  How does one disable after it's enabled?
 
 Suppose this should only be called once. But again would update the MMIO base.


So maybe add this all to documentation.

 It 
 would be disabled along with device deassignment.

So what are you going to do when guest enables and later disables MSIX?
Disable assignment completely?

 We enable it explicitly because 
 not all devices have MSI-X capability.



  
   +
  
  This is very specialized for assigned devices.  I would like an
  interface not tied to assigned devices explicitly
  (even if the implementation at the moment only works
  for assigned devices, I can patch that in later). E.g. vhost-net would
  benefit from such an extension too.  Maybe tie this to a special
  GSI and this GSI can be an assigned device or not?
 
 You're talking about KVM_ASSIGN_GET_MSIX_ENTRY?

Yes but also, and more importantly, about KVM_ASSIGN_REG_MSIX_MMIO
and kvm_assigned_msix_mmio really.

 We can't tie it to GSI, because we 
 should also cover entries without GSI. The entry number should be fine for 
 all 
 kinds of devices using MSI-X.

I don't completely understand: entries without GSI
never need to inject an interrupt. Why do we care
about them? Let's pass such accesses to userspace.

In any case, I think MSIX GSIs are inexpensive (we never search them
all), so we could create GSIs for all entries if we wanted to.

 I am not sure about what PV one would looks like.
 I use kvm_assigned_msix_entry 
 just because it can be easily reused.

Not only PV, emulated devices too. OK let's try to figure out:

What happens with userspace is, we inject interrupts either through
ioctls or eventfds. Avoiding an exit to userspace on MSIX
access is beneficial in exactly the same way.

We could have an ioctl to pass in the table addresses, and mapping
table to relevant GSIs. For example, add kvm_msix_mmio with table
start/end, also specify which GSIs are covered. Maybe ask that
userspace allocate all GSIs for a table consequitively?
Or add kvm_irq_routing_msix which is same as msi but
also has the mmio addresses? Bonus points for avoiding the need
to scan all table on each write. For example, don't scan at all,
instead just set a bit in KVM, and set irq entry on the next
interrupt only: we have KVM_MAX_MSIX_PER_DEV as 256 so maybe a small 32 byte 
bitmap
would be enough for this, avoding a linear scan.

When we pass in kvm_msix_mmio, kvm would start registering masked state.




  
   +4.48 KVM_ASSIGN_GET_MSIX_ENTRY
   +
   +Capability: KVM_CAP_DEVICE_MSIX_MASK
   +Architectures: x86
   +Type: vm ioctl
   +Parameters: struct kvm_assigned_msix_entry (in and out)
   +Returns: 0 on success, !0 on error
   +
   +struct kvm_assigned_msix_entry {
   + /* Assigned device's ID */
   + __u32 assigned_dev_id;
   + /* Ignored */
   + __u32 gsi;
   + /* The index of entry in the MSI-X table */
   + __u16 entry;
   + /* Querying flags and returning status */
   + __u16 flags;
   + /* Must be 0 */
   + __u16 padding[2];
   +};
   +
   +This ioctl would allow userspace to get the status of one specific MSI-X
   +entry. Currently we support mask bit status querying.
   +
   
5. The kvm_run structure

Application code obtains a pointer to the kvm_run structure by
   
   diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
   index f3f86b2..8fd5121 100644
   --- a/arch/x86/kvm/x86.c
   +++ b/arch/x86/kvm/x86.c
   @@ -1926,6 +1926,7 @@ int kvm_dev_ioctl_check_extension(long ext)
   
 case KVM_CAP_DEBUGREGS:
 case KVM_CAP_X86_ROBUST_SINGLESTEP:
   
 case KVM_CAP_XSAVE:
   + case KVM_CAP_DEVICE_MSIX_MASK:
 r = 1;
 break;
 
 case KVM_CAP_COALESCED_MMIO:
   diff --git a/include/linux/kvm.h b/include/linux/kvm.h
   index 919ae53..eafafb1 100644
   --- a/include/linux/kvm.h
   +++ b/include/linux/kvm.h
   @@ -540,6 +540,9 @@ struct kvm_ppc_pvinfo {
   
#endif
#define KVM_CAP_PPC_GET_PVINFO 57
#define KVM_CAP_PPC_IRQ_LEVEL 58
   
   +#ifdef __KVM_HAVE_MSIX
   +#define KVM_CAP_DEVICE_MSIX_MASK 59
   +#endif
  
  We seem to have these HAVE macros all over.
  Avi, what's the idea? Let's drop them?
 
 Um? This kind of marcos are used here to avoid vendor specific string in the 
 header 
 file. 

Am I the only one that dislikes the ifdefs we have all over this code then?
Rest of linux tries to keep ifdefs local by stubbing functionality out.

   + BUG_ON(adev-msix_mmio_base == 0);
  
  Below I see
  adev-msix_mmio_base = msix_mmio-base_addr;
  which comes from userspace. BUG_ON is an unfriendly way to
  tell user about a bug in qemu.
 
  Anyway, I don't think we should special-case 0 gpa.
  It's up to the user where to base the table.
  Use a separate flag to signal that table was initialized.
 
 Base_addr 

Re: [PATCH 5/5] KVM: assigned dev: MSI-X mask support

2010-11-05 Thread Michael S. Tsirkin
On Fri, Nov 05, 2010 at 03:16:23PM +0800, Sheng Yang wrote:
 This patch enable per-vector mask for assigned devices using MSI-X.
 
 This patch provided two new APIs: one is for guest to specific device's MSI-X
 table address in MMIO, the other is for userspace to get information about 
 mask
 bit.
 
 All the mask bit operation are kept in kernel, in order to accelerate.
 Userspace shouldn't access the device MMIO directly for the information,
 instead it should uses provided API to do so.
 
 Signed-off-by: Sheng Yang sh...@linux.intel.com

Is this v2? Or repost?  If v2 you would want to supply a changelog.

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] KVM: assigned dev: MSI-X mask support

2010-11-05 Thread Sheng Yang
On Friday 05 November 2010 16:43:33 Michael S. Tsirkin wrote:
 On Fri, Nov 05, 2010 at 10:44:19AM +0800, Sheng Yang wrote:
+};
+
+This ioctl would enable in-kernel MSI-X emulation, which would
handle MSI-X +mask bit in the kernel.
   
   What happens on repeated calls when it's already enabled?
   How does one disable after it's enabled?
  
  Suppose this should only be called once. But again would update the MMIO
  base.
 
 So maybe add this all to documentation.
 
  It
  would be disabled along with device deassignment.
 
 So what are you going to do when guest enables and later disables MSIX?
 Disable assignment completely?

This device goes with PCI resources allocation, rather than IRQ assignment.
 
  We enable it explicitly because
  not all devices have MSI-X capability.
  
+
   
   This is very specialized for assigned devices.  I would like an
   interface not tied to assigned devices explicitly
   (even if the implementation at the moment only works
   for assigned devices, I can patch that in later). E.g. vhost-net would
   benefit from such an extension too.  Maybe tie this to a special
   GSI and this GSI can be an assigned device or not?
  
  You're talking about KVM_ASSIGN_GET_MSIX_ENTRY?
 
 Yes but also, and more importantly, about KVM_ASSIGN_REG_MSIX_MMIO
 and kvm_assigned_msix_mmio really.
 
  We can't tie it to GSI, because we
  should also cover entries without GSI. The entry number should be fine
  for all kinds of devices using MSI-X.
 
 I don't completely understand: entries without GSI
 never need to inject an interrupt. Why do we care
 about them? Let's pass such accesses to userspace.

 In any case, I think MSIX GSIs are inexpensive (we never search them
 all), so we could create GSIs for all entries if we wanted to.

All mask bit accessing controlled by kernel, in order to keep consistent.
 
  I am not sure about what PV one would looks like.
  I use kvm_assigned_msix_entry
  just because it can be easily reused.
 
 Not only PV, emulated devices too. OK let's try to figure out:
 
 What happens with userspace is, we inject interrupts either through
 ioctls or eventfds. Avoiding an exit to userspace on MSIX
 access is beneficial in exactly the same way.
 
 We could have an ioctl to pass in the table addresses, and mapping
 table to relevant GSIs. For example, add kvm_msix_mmio with table
 start/end, also specify which GSIs are covered.
 Maybe ask that userspace allocate all GSIs for a table consequitively?
 Or add kvm_irq_routing_msix which is same as msi but
 also has the mmio addresses? Bonus points for avoiding the need
 to scan all table on each write. For example, don't scan at all,
 instead just set a bit in KVM, and set irq entry on the next
 interrupt only: we have KVM_MAX_MSIX_PER_DEV as 256 so maybe a small 32
 byte bitmap would be enough for this, avoding a linear scan.
 
 When we pass in kvm_msix_mmio, kvm would start registering masked state.
 
+4.48 KVM_ASSIGN_GET_MSIX_ENTRY
+
+Capability: KVM_CAP_DEVICE_MSIX_MASK
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_assigned_msix_entry (in and out)
+Returns: 0 on success, !0 on error
+
+struct kvm_assigned_msix_entry {
+   /* Assigned device's ID */
+   __u32 assigned_dev_id;
+   /* Ignored */
+   __u32 gsi;
+   /* The index of entry in the MSI-X table */
+   __u16 entry;
+   /* Querying flags and returning status */
+   __u16 flags;
+   /* Must be 0 */
+   __u16 padding[2];
+};
+
+This ioctl would allow userspace to get the status of one specific
MSI-X +entry. Currently we support mask bit status querying.
+

 5. The kvm_run structure
 
 Application code obtains a pointer to the kvm_run structure by

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f3f86b2..8fd5121 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1926,6 +1926,7 @@ int kvm_dev_ioctl_check_extension(long ext)

case KVM_CAP_DEBUGREGS:
case KVM_CAP_X86_ROBUST_SINGLESTEP:

case KVM_CAP_XSAVE:
+   case KVM_CAP_DEVICE_MSIX_MASK:
r = 1;
break;

case KVM_CAP_COALESCED_MMIO:
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 919ae53..eafafb1 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -540,6 +540,9 @@ struct kvm_ppc_pvinfo {

 #endif
 #define KVM_CAP_PPC_GET_PVINFO 57
 #define KVM_CAP_PPC_IRQ_LEVEL 58

+#ifdef __KVM_HAVE_MSIX
+#define KVM_CAP_DEVICE_MSIX_MASK 59
+#endif
   
   We seem to have these HAVE macros all over.
   Avi, what's the idea? Let's drop them?
  
  Um? This kind of marcos are used here to avoid vendor specific string in
  the header file.
 
 Am I the only one that dislikes the ifdefs we have all over this code then?
 Rest of 

Re: [PATCH 5/5] KVM: assigned dev: MSI-X mask support

2010-11-05 Thread Sheng Yang
On Friday 05 November 2010 16:51:33 Michael S. Tsirkin wrote:
 On Fri, Nov 05, 2010 at 03:16:23PM +0800, Sheng Yang wrote:
  This patch enable per-vector mask for assigned devices using MSI-X.
  
  This patch provided two new APIs: one is for guest to specific device's
  MSI-X table address in MMIO, the other is for userspace to get
  information about mask bit.
  
  All the mask bit operation are kept in kernel, in order to accelerate.
  Userspace shouldn't access the device MMIO directly for the information,
  instead it should uses provided API to do so.
  
  Signed-off-by: Sheng Yang sh...@linux.intel.com
 
 Is this v2? Or repost?  If v2 you would want to supply a changelog.

Just an short update after your comments. I supposed the in-reply-to work?

--
regards
Yang, Sheng
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] KVM: assigned dev: MSI-X mask support

2010-11-05 Thread Sheng Yang
On Friday 05 November 2010 18:53:50 Sheng Yang wrote:
 On Friday 05 November 2010 16:43:33 Michael S. Tsirkin wrote:
  On Fri, Nov 05, 2010 at 10:44:19AM +0800, Sheng Yang wrote:
 +};
 +
 +This ioctl would enable in-kernel MSI-X emulation, which would
 handle MSI-X +mask bit in the kernel.

What happens on repeated calls when it's already enabled?
How does one disable after it's enabled?
   
   Suppose this should only be called once. But again would update the
   MMIO base.
  
  So maybe add this all to documentation.
  
   It
   would be disabled along with device deassignment.
  
  So what are you going to do when guest enables and later disables MSIX?
  Disable assignment completely?
 
 This device goes with PCI resources allocation, rather than IRQ assignment.
 
   We enable it explicitly because
   not all devices have MSI-X capability.
   
 +

This is very specialized for assigned devices.  I would like an
interface not tied to assigned devices explicitly
(even if the implementation at the moment only works
for assigned devices, I can patch that in later). E.g. vhost-net
would benefit from such an extension too.  Maybe tie this to a
special GSI and this GSI can be an assigned device or not?
   
   You're talking about KVM_ASSIGN_GET_MSIX_ENTRY?
  
  Yes but also, and more importantly, about KVM_ASSIGN_REG_MSIX_MMIO
  and kvm_assigned_msix_mmio really.
  
   We can't tie it to GSI, because we
   should also cover entries without GSI. The entry number should be fine
   for all kinds of devices using MSI-X.
  
  I don't completely understand: entries without GSI
  never need to inject an interrupt. Why do we care
  about them? Let's pass such accesses to userspace.
  
  In any case, I think MSIX GSIs are inexpensive (we never search them
  all), so we could create GSIs for all entries if we wanted to.
 
 All mask bit accessing controlled by kernel, in order to keep consistent.
 
   I am not sure about what PV one would looks like.
   I use kvm_assigned_msix_entry
   just because it can be easily reused.
  
  Not only PV, emulated devices too. OK let's try to figure out:
  
  What happens with userspace is, we inject interrupts either through
  ioctls or eventfds. Avoiding an exit to userspace on MSIX
  access is beneficial in exactly the same way.
  
  We could have an ioctl to pass in the table addresses, and mapping
  table to relevant GSIs. For example, add kvm_msix_mmio with table
  start/end, also specify which GSIs are covered.
  Maybe ask that userspace allocate all GSIs for a table consequitively?
  Or add kvm_irq_routing_msix which is same as msi but
  also has the mmio addresses? Bonus points for avoiding the need
  to scan all table on each write. For example, don't scan at all,
  instead just set a bit in KVM, and set irq entry on the next
  interrupt only: we have KVM_MAX_MSIX_PER_DEV as 256 so maybe a small 32
  byte bitmap would be enough for this, avoding a linear scan.
  
  When we pass in kvm_msix_mmio, kvm would start registering masked state.
  
 +4.48 KVM_ASSIGN_GET_MSIX_ENTRY
 +
 +Capability: KVM_CAP_DEVICE_MSIX_MASK
 +Architectures: x86
 +Type: vm ioctl
 +Parameters: struct kvm_assigned_msix_entry (in and out)
 +Returns: 0 on success, !0 on error
 +
 +struct kvm_assigned_msix_entry {
 + /* Assigned device's ID */
 + __u32 assigned_dev_id;
 + /* Ignored */
 + __u32 gsi;
 + /* The index of entry in the MSI-X table */
 + __u16 entry;
 + /* Querying flags and returning status */
 + __u16 flags;
 + /* Must be 0 */
 + __u16 padding[2];
 +};
 +
 +This ioctl would allow userspace to get the status of one specific
 MSI-X +entry. Currently we support mask bit status querying.
 +
 
  5. The kvm_run structure
  
  Application code obtains a pointer to the kvm_run structure by
 
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index f3f86b2..8fd5121 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -1926,6 +1926,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 
   case KVM_CAP_DEBUGREGS:
   case KVM_CAP_X86_ROBUST_SINGLESTEP:
 
   case KVM_CAP_XSAVE:
 + case KVM_CAP_DEVICE_MSIX_MASK:
   r = 1;
   break;
   
   case KVM_CAP_COALESCED_MMIO:
 diff --git a/include/linux/kvm.h b/include/linux/kvm.h
 index 919ae53..eafafb1 100644
 --- a/include/linux/kvm.h
 +++ b/include/linux/kvm.h
 @@ -540,6 +540,9 @@ struct kvm_ppc_pvinfo {
 
  #endif
  #define KVM_CAP_PPC_GET_PVINFO 57
  #define KVM_CAP_PPC_IRQ_LEVEL 58
 
 +#ifdef __KVM_HAVE_MSIX
 +#define KVM_CAP_DEVICE_MSIX_MASK 59
 +#endif

We seem to have these HAVE macros all over.
Avi, what's the idea? Let's drop them?
   
   Um? This kind of marcos are 

Re: [PATCH 5/5] KVM: assigned dev: MSI-X mask support

2010-11-05 Thread Michael S. Tsirkin
On Fri, Nov 05, 2010 at 06:53:50PM +0800, Sheng Yang wrote:
 On Friday 05 November 2010 16:43:33 Michael S. Tsirkin wrote:
  On Fri, Nov 05, 2010 at 10:44:19AM +0800, Sheng Yang wrote:
 +};
 +
 +This ioctl would enable in-kernel MSI-X emulation, which would
 handle MSI-X +mask bit in the kernel.

What happens on repeated calls when it's already enabled?
How does one disable after it's enabled?
   
   Suppose this should only be called once. But again would update the MMIO
   base.
  
  So maybe add this all to documentation.
  
   It
   would be disabled along with device deassignment.
  
  So what are you going to do when guest enables and later disables MSIX?
  Disable assignment completely?
 
 This device goes with PCI resources allocation, rather than IRQ assignment.

I see. I guess we can live with it, but it seems tied to a specific
mode of use. Would be better to support enabling together with msix too, this
requires an ability to disable.

  
   We enable it explicitly because
   not all devices have MSI-X capability.
   
 +

This is very specialized for assigned devices.  I would like an
interface not tied to assigned devices explicitly
(even if the implementation at the moment only works
for assigned devices, I can patch that in later). E.g. vhost-net would
benefit from such an extension too.  Maybe tie this to a special
GSI and this GSI can be an assigned device or not?
   
   You're talking about KVM_ASSIGN_GET_MSIX_ENTRY?
  
  Yes but also, and more importantly, about KVM_ASSIGN_REG_MSIX_MMIO
  and kvm_assigned_msix_mmio really.
  
   We can't tie it to GSI, because we
   should also cover entries without GSI. The entry number should be fine
   for all kinds of devices using MSI-X.
  
  I don't completely understand: entries without GSI
  never need to inject an interrupt. Why do we care
  about them? Let's pass such accesses to userspace.
 
  In any case, I think MSIX GSIs are inexpensive (we never search them
  all), so we could create GSIs for all entries if we wanted to.
 
 All mask bit accessing controlled by kernel, in order to keep consistent.


Well I think passing whatever we can't handle to userspace makes
complete sense. We do this for address/data writes after all.
If we wanted to do it *all* in kernel we would handle
address/data there too. I think Avi suggested this at some point.

The main point is msix mmio BAR handling is completely unrelated
to the assigned device. Emulated devices have an msix BAR as well.
So tying it to an assigned devices is suboptimal, we'll need to
grow yet another interface for these.

  
   I am not sure about what PV one would looks like.
   I use kvm_assigned_msix_entry
   just because it can be easily reused.
  
  Not only PV, emulated devices too. OK let's try to figure out:
  
  What happens with userspace is, we inject interrupts either through
  ioctls or eventfds. Avoiding an exit to userspace on MSIX
  access is beneficial in exactly the same way.
  
  We could have an ioctl to pass in the table addresses, and mapping
  table to relevant GSIs. For example, add kvm_msix_mmio with table
  start/end, also specify which GSIs are covered.
  Maybe ask that userspace allocate all GSIs for a table consequitively?
  Or add kvm_irq_routing_msix which is same as msi but
  also has the mmio addresses? Bonus points for avoiding the need
  to scan all table on each write. For example, don't scan at all,
  instead just set a bit in KVM, and set irq entry on the next
  interrupt only: we have KVM_MAX_MSIX_PER_DEV as 256 so maybe a small 32
  byte bitmap would be enough for this, avoding a linear scan.
  
  When we pass in kvm_msix_mmio, kvm would start registering masked state.
  
 +4.48 KVM_ASSIGN_GET_MSIX_ENTRY
 +
 +Capability: KVM_CAP_DEVICE_MSIX_MASK
 +Architectures: x86
 +Type: vm ioctl
 +Parameters: struct kvm_assigned_msix_entry (in and out)
 +Returns: 0 on success, !0 on error
 +
 +struct kvm_assigned_msix_entry {
 + /* Assigned device's ID */
 + __u32 assigned_dev_id;
 + /* Ignored */
 + __u32 gsi;
 + /* The index of entry in the MSI-X table */
 + __u16 entry;
 + /* Querying flags and returning status */
 + __u16 flags;
 + /* Must be 0 */
 + __u16 padding[2];
 +};
 +
 +This ioctl would allow userspace to get the status of one specific
 MSI-X +entry. Currently we support mask bit status querying.
 +
 
  5. The kvm_run structure
  
  Application code obtains a pointer to the kvm_run structure by
 
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index f3f86b2..8fd5121 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -1926,6 +1926,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 
   case KVM_CAP_DEBUGREGS:
   case KVM_CAP_X86_ROBUST_SINGLESTEP:
 
   case KVM_CAP_XSAVE:

Re: [PATCH 5/5] KVM: assigned dev: MSI-X mask support

2010-11-05 Thread Michael S. Tsirkin
On Fri, Nov 05, 2010 at 08:01:53PM +0800, Sheng Yang wrote:
  Go back to me if someone want to implement MSI device assignment on
  big-endian machine.
 
 Sorry, just realized it's very likely that I don't have an big endian machine 
 to 
 test it even at that time...
 
 I think it's really should be done by someone would use and test it.

I'm not asking you do to that. Just tag endian-ness explicitly to
make it easier for whoever will do it.

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] KVM: assigned dev: MSI-X mask support

2010-11-05 Thread Marcelo Tosatti
On Fri, Nov 05, 2010 at 06:53:50PM +0800, Sheng Yang wrote:
 On Friday 05 November 2010 16:43:33 Michael S. Tsirkin wrote:
  On Fri, Nov 05, 2010 at 10:44:19AM +0800, Sheng Yang wrote:
 +};
 +
 +This ioctl would enable in-kernel MSI-X emulation, which would
 handle MSI-X +mask bit in the kernel.

What happens on repeated calls when it's already enabled?
How does one disable after it's enabled?
   
   Suppose this should only be called once. But again would update the MMIO
   base.
  
  So maybe add this all to documentation.
  
   It
   would be disabled along with device deassignment.
  
  So what are you going to do when guest enables and later disables MSIX?
  Disable assignment completely?
 
 This device goes with PCI resources allocation, rather than IRQ assignment.

What about hot plug, for example? I can't see how this can function
without unregistering regions.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/5] KVM: assigned dev: MSI-X mask support

2010-11-04 Thread Sheng Yang
This patch enable per-vector mask for assigned devices using MSI-X.

This patch provided two new APIs: one is for guest to specific device's MSI-X
table address in MMIO, the other is for userspace to get information about mask
bit.

All the mask bit operation are kept in kernel, in order to accelerate.
Userspace shouldn't access the device MMIO directly for the information,
instead it should uses provided API to do so.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 Documentation/kvm/api.txt |   46 +++
 arch/x86/kvm/x86.c|1 +
 include/linux/kvm.h   |   21 +++-
 include/linux/kvm_host.h  |3 +
 virt/kvm/assigned-dev.c   |  285 -
 5 files changed, 354 insertions(+), 2 deletions(-)

diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
index b336266..69b993c 100644
--- a/Documentation/kvm/api.txt
+++ b/Documentation/kvm/api.txt
@@ -1085,6 +1085,52 @@ of 4 instructions that make up a hypercall.
 If any additional field gets added to this structure later on, a bit for that
 additional piece of information will be set in the flags bitmap.
 
+4.47 KVM_ASSIGN_REG_MSIX_MMIO
+
+Capability: KVM_CAP_DEVICE_MSIX_MASK
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_assigned_msix_mmio (in)
+Returns: 0 on success, !0 on error
+
+struct kvm_assigned_msix_mmio {
+   /* Assigned device's ID */
+   __u32 assigned_dev_id;
+   /* Must be 0 */
+   __u32 flags;
+   /* MSI-X table MMIO address */
+   __u64 base_addr;
+   /* Must be 0, reserved for future use */
+   __u64 reserved;
+};
+
+This ioctl would enable in-kernel MSI-X emulation, which would handle MSI-X
+mask bit in the kernel.
+
+4.48 KVM_ASSIGN_GET_MSIX_ENTRY
+
+Capability: KVM_CAP_DEVICE_MSIX_MASK
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_assigned_msix_entry (in and out)
+Returns: 0 on success, !0 on error
+
+struct kvm_assigned_msix_entry {
+   /* Assigned device's ID */
+   __u32 assigned_dev_id;
+   /* Ignored */
+   __u32 gsi;
+   /* The index of entry in the MSI-X table */
+   __u16 entry;
+   /* Querying flags and returning status */
+   __u16 flags;
+   /* Must be 0 */
+   __u16 padding[2];
+};
+
+This ioctl would allow userspace to get the status of one specific MSI-X
+entry. Currently we support mask bit status querying.
+
 5. The kvm_run structure
 
 Application code obtains a pointer to the kvm_run structure by
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f3f86b2..8fd5121 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1926,6 +1926,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_DEBUGREGS:
case KVM_CAP_X86_ROBUST_SINGLESTEP:
case KVM_CAP_XSAVE:
+   case KVM_CAP_DEVICE_MSIX_MASK:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 919ae53..eafafb1 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -540,6 +540,9 @@ struct kvm_ppc_pvinfo {
 #endif
 #define KVM_CAP_PPC_GET_PVINFO 57
 #define KVM_CAP_PPC_IRQ_LEVEL 58
+#ifdef __KVM_HAVE_MSIX
+#define KVM_CAP_DEVICE_MSIX_MASK 59
+#endif
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -671,6 +674,10 @@ struct kvm_clock_data {
 #define KVM_XEN_HVM_CONFIG_IOW(KVMIO,  0x7a, struct kvm_xen_hvm_config)
 #define KVM_SET_CLOCK _IOW(KVMIO,  0x7b, struct kvm_clock_data)
 #define KVM_GET_CLOCK _IOR(KVMIO,  0x7c, struct kvm_clock_data)
+#define KVM_ASSIGN_GET_MSIX_ENTRY _IOWR(KVMIO,  0x7d, \
+   struct kvm_assigned_msix_entry)
+#define KVM_ASSIGN_REG_MSIX_MMIO  _IOW(KVMIO,  0x7e, \
+   struct kvm_assigned_msix_mmio)
 /* Available with KVM_CAP_PIT_STATE2 */
 #define KVM_GET_PIT2  _IOR(KVMIO,  0x9f, struct kvm_pit_state2)
 #define KVM_SET_PIT2  _IOW(KVMIO,  0xa0, struct kvm_pit_state2)
@@ -787,11 +794,23 @@ struct kvm_assigned_msix_nr {
 };
 
 #define KVM_MAX_MSIX_PER_DEV   256
+
+#define KVM_MSIX_FLAG_MASK (1  0)
+#define KVM_MSIX_FLAG_QUERY_MASK   (1  15)
+
 struct kvm_assigned_msix_entry {
__u32 assigned_dev_id;
__u32 gsi;
__u16 entry; /* The index of entry in the MSI-X table */
-   __u16 padding[3];
+   __u16 flags;
+   __u16 padding[2];
+};
+
+struct kvm_assigned_msix_mmio {
+   __u32 assigned_dev_id;
+   __u32 flags;
+   __u64 base_addr;
+   __u64 reserved;
 };
 
 #endif /* __LINUX_KVM_H */
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index e2ecbac..5ac4db9 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -464,6 +464,9 @@ struct kvm_assigned_dev_kernel {
struct pci_dev *dev;
struct kvm *kvm;
spinlock_t assigned_dev_lock;
+   DECLARE_BITMAP(msix_mask_bitmap, KVM_MAX_MSIX_PER_DEV);
+   gpa_t msix_mmio_base;
+  

Re: [PATCH 5/5] KVM: assigned dev: MSI-X mask support

2010-11-04 Thread Michael S. Tsirkin
On Thu, Nov 04, 2010 at 02:15:21PM +0800, Sheng Yang wrote:
 This patch enable per-vector mask for assigned devices using MSI-X.
 
 This patch provided two new APIs: one is for guest to specific device's MSI-X
 table address in MMIO, the other is for userspace to get information about 
 mask
 bit.
 
 All the mask bit operation are kept in kernel, in order to accelerate.
 Userspace shouldn't access the device MMIO directly for the information,
 instead it should uses provided API to do so.
 
 Signed-off-by: Sheng Yang sh...@linux.intel.com
 ---
  Documentation/kvm/api.txt |   46 +++
  arch/x86/kvm/x86.c|1 +
  include/linux/kvm.h   |   21 +++-
  include/linux/kvm_host.h  |3 +
  virt/kvm/assigned-dev.c   |  285 
 -
  5 files changed, 354 insertions(+), 2 deletions(-)
 
 diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
 index b336266..69b993c 100644
 --- a/Documentation/kvm/api.txt
 +++ b/Documentation/kvm/api.txt
 @@ -1085,6 +1085,52 @@ of 4 instructions that make up a hypercall.
  If any additional field gets added to this structure later on, a bit for that
  additional piece of information will be set in the flags bitmap.
  
 +4.47 KVM_ASSIGN_REG_MSIX_MMIO
 +
 +Capability: KVM_CAP_DEVICE_MSIX_MASK
 +Architectures: x86
 +Type: vm ioctl
 +Parameters: struct kvm_assigned_msix_mmio (in)
 +Returns: 0 on success, !0 on error
 +
 +struct kvm_assigned_msix_mmio {
 + /* Assigned device's ID */
 + __u32 assigned_dev_id;
 + /* Must be 0 */
 + __u32 flags;
 + /* MSI-X table MMIO address */
 + __u64 base_addr;
 + /* Must be 0, reserved for future use */
 + __u64 reserved;

We also want length I think.

 +};
 +
 +This ioctl would enable in-kernel MSI-X emulation, which would handle MSI-X
 +mask bit in the kernel.

What happens on repeated calls when it's already enabled?
How does one disable after it's enabled?

 +

This is very specialized for assigned devices.  I would like an
interface not tied to assigned devices explicitly
(even if the implementation at the moment only works
for assigned devices, I can patch that in later). E.g. vhost-net would
benefit from such an extension too.  Maybe tie this to a special
GSI and this GSI can be an assigned device or not?


 +4.48 KVM_ASSIGN_GET_MSIX_ENTRY
 +
 +Capability: KVM_CAP_DEVICE_MSIX_MASK
 +Architectures: x86
 +Type: vm ioctl
 +Parameters: struct kvm_assigned_msix_entry (in and out)
 +Returns: 0 on success, !0 on error
 +
 +struct kvm_assigned_msix_entry {
 + /* Assigned device's ID */
 + __u32 assigned_dev_id;
 + /* Ignored */
 + __u32 gsi;
 + /* The index of entry in the MSI-X table */
 + __u16 entry;
 + /* Querying flags and returning status */
 + __u16 flags;
 + /* Must be 0 */
 + __u16 padding[2];
 +};
 +
 +This ioctl would allow userspace to get the status of one specific MSI-X
 +entry. Currently we support mask bit status querying.
 +
  5. The kvm_run structure
  
  Application code obtains a pointer to the kvm_run structure by
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index f3f86b2..8fd5121 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -1926,6 +1926,7 @@ int kvm_dev_ioctl_check_extension(long ext)
   case KVM_CAP_DEBUGREGS:
   case KVM_CAP_X86_ROBUST_SINGLESTEP:
   case KVM_CAP_XSAVE:
 + case KVM_CAP_DEVICE_MSIX_MASK:
   r = 1;
   break;
   case KVM_CAP_COALESCED_MMIO:
 diff --git a/include/linux/kvm.h b/include/linux/kvm.h
 index 919ae53..eafafb1 100644
 --- a/include/linux/kvm.h
 +++ b/include/linux/kvm.h
 @@ -540,6 +540,9 @@ struct kvm_ppc_pvinfo {
  #endif
  #define KVM_CAP_PPC_GET_PVINFO 57
  #define KVM_CAP_PPC_IRQ_LEVEL 58
 +#ifdef __KVM_HAVE_MSIX
 +#define KVM_CAP_DEVICE_MSIX_MASK 59
 +#endif
  

We seem to have these HAVE macros all over.
Avi, what's the idea? Let's drop them?

  #ifdef KVM_CAP_IRQ_ROUTING
  
 @@ -671,6 +674,10 @@ struct kvm_clock_data {
  #define KVM_XEN_HVM_CONFIG_IOW(KVMIO,  0x7a, struct 
 kvm_xen_hvm_config)
  #define KVM_SET_CLOCK _IOW(KVMIO,  0x7b, struct kvm_clock_data)
  #define KVM_GET_CLOCK _IOR(KVMIO,  0x7c, struct kvm_clock_data)
 +#define KVM_ASSIGN_GET_MSIX_ENTRY _IOWR(KVMIO,  0x7d, \
 + struct kvm_assigned_msix_entry)
 +#define KVM_ASSIGN_REG_MSIX_MMIO  _IOW(KVMIO,  0x7e, \
 + struct kvm_assigned_msix_mmio)
  /* Available with KVM_CAP_PIT_STATE2 */
  #define KVM_GET_PIT2  _IOR(KVMIO,  0x9f, struct kvm_pit_state2)
  #define KVM_SET_PIT2  _IOW(KVMIO,  0xa0, struct kvm_pit_state2)
 @@ -787,11 +794,23 @@ struct kvm_assigned_msix_nr {
  };
  
  #define KVM_MAX_MSIX_PER_DEV 256
 +
 +#define KVM_MSIX_FLAG_MASK   (1  0)
 +#define KVM_MSIX_FLAG_QUERY_MASK (1  15)
 +
  struct kvm_assigned_msix_entry {
   __u32 assigned_dev_id;
   __u32 gsi;
   

Re: [PATCH 5/5] KVM: assigned dev: MSI-X mask support

2010-11-04 Thread Sheng Yang
On Thursday 04 November 2010 18:43:22 Michael S. Tsirkin wrote:
 On Thu, Nov 04, 2010 at 02:15:21PM +0800, Sheng Yang wrote:
  This patch enable per-vector mask for assigned devices using MSI-X.
  
  This patch provided two new APIs: one is for guest to specific device's
  MSI-X table address in MMIO, the other is for userspace to get
  information about mask bit.
  
  All the mask bit operation are kept in kernel, in order to accelerate.
  Userspace shouldn't access the device MMIO directly for the information,
  instead it should uses provided API to do so.
  
  Signed-off-by: Sheng Yang sh...@linux.intel.com
  ---
  
   Documentation/kvm/api.txt |   46 +++
   arch/x86/kvm/x86.c|1 +
   include/linux/kvm.h   |   21 +++-
   include/linux/kvm_host.h  |3 +
   virt/kvm/assigned-dev.c   |  285
   - 5 files changed, 
354
   insertions(+), 2 deletions(-)
  
  diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
  index b336266..69b993c 100644
  --- a/Documentation/kvm/api.txt
  +++ b/Documentation/kvm/api.txt
  @@ -1085,6 +1085,52 @@ of 4 instructions that make up a hypercall.
  
   If any additional field gets added to this structure later on, a bit for
   that additional piece of information will be set in the flags bitmap.
  
  +4.47 KVM_ASSIGN_REG_MSIX_MMIO
  +
  +Capability: KVM_CAP_DEVICE_MSIX_MASK
  +Architectures: x86
  +Type: vm ioctl
  +Parameters: struct kvm_assigned_msix_mmio (in)
  +Returns: 0 on success, !0 on error
  +
  +struct kvm_assigned_msix_mmio {
  +   /* Assigned device's ID */
  +   __u32 assigned_dev_id;
  +   /* Must be 0 */
  +   __u32 flags;
  +   /* MSI-X table MMIO address */
  +   __u64 base_addr;
  +   /* Must be 0, reserved for future use */
  +   __u64 reserved;
 
 We also want length I think.

OK, everybody ask for it...
 
  +};
  +
  +This ioctl would enable in-kernel MSI-X emulation, which would handle
  MSI-X +mask bit in the kernel.
 
 What happens on repeated calls when it's already enabled?
 How does one disable after it's enabled?

Suppose this should only be called once. But again would update the MMIO base. 
It 
would be disabled along with device deassignment. We enable it explicitly 
because 
not all devices have MSI-X capability.

 
  +
 
 This is very specialized for assigned devices.  I would like an
 interface not tied to assigned devices explicitly
 (even if the implementation at the moment only works
 for assigned devices, I can patch that in later). E.g. vhost-net would
 benefit from such an extension too.  Maybe tie this to a special
 GSI and this GSI can be an assigned device or not?

You're talking about KVM_ASSIGN_GET_MSIX_ENTRY? We can't tie it to GSI, because 
we 
should also cover entries without GSI. The entry number should be fine for all 
kinds of devices using MSI-X.

I am not sure about what PV one would looks like. I use kvm_assigned_msix_entry 
just because it can be easily reused.
 
  +4.48 KVM_ASSIGN_GET_MSIX_ENTRY
  +
  +Capability: KVM_CAP_DEVICE_MSIX_MASK
  +Architectures: x86
  +Type: vm ioctl
  +Parameters: struct kvm_assigned_msix_entry (in and out)
  +Returns: 0 on success, !0 on error
  +
  +struct kvm_assigned_msix_entry {
  +   /* Assigned device's ID */
  +   __u32 assigned_dev_id;
  +   /* Ignored */
  +   __u32 gsi;
  +   /* The index of entry in the MSI-X table */
  +   __u16 entry;
  +   /* Querying flags and returning status */
  +   __u16 flags;
  +   /* Must be 0 */
  +   __u16 padding[2];
  +};
  +
  +This ioctl would allow userspace to get the status of one specific MSI-X
  +entry. Currently we support mask bit status querying.
  +
  
   5. The kvm_run structure
   
   Application code obtains a pointer to the kvm_run structure by
  
  diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
  index f3f86b2..8fd5121 100644
  --- a/arch/x86/kvm/x86.c
  +++ b/arch/x86/kvm/x86.c
  @@ -1926,6 +1926,7 @@ int kvm_dev_ioctl_check_extension(long ext)
  
  case KVM_CAP_DEBUGREGS:
  case KVM_CAP_X86_ROBUST_SINGLESTEP:
  
  case KVM_CAP_XSAVE:
  +   case KVM_CAP_DEVICE_MSIX_MASK:
  r = 1;
  break;
  
  case KVM_CAP_COALESCED_MMIO:
  diff --git a/include/linux/kvm.h b/include/linux/kvm.h
  index 919ae53..eafafb1 100644
  --- a/include/linux/kvm.h
  +++ b/include/linux/kvm.h
  @@ -540,6 +540,9 @@ struct kvm_ppc_pvinfo {
  
   #endif
   #define KVM_CAP_PPC_GET_PVINFO 57
   #define KVM_CAP_PPC_IRQ_LEVEL 58
  
  +#ifdef __KVM_HAVE_MSIX
  +#define KVM_CAP_DEVICE_MSIX_MASK 59
  +#endif
 
 We seem to have these HAVE macros all over.
 Avi, what's the idea? Let's drop them?

Um? This kind of marcos are used here to avoid vendor specific string in the 
header 
file. 
 
   #ifdef KVM_CAP_IRQ_ROUTING
  
  @@ -671,6 +674,10 @@ struct kvm_clock_data {
  
   #define KVM_XEN_HVM_CONFIG_IOW(KVMIO,  0x7a, struct
   kvm_xen_hvm_config) #define KVM_SET_CLOCK _IOW(KVMIO, 
   0x7b, struct kvm_clock_data) #define 

Re: [PATCH 5/5] KVM: assigned dev: MSI-X mask support

2010-11-04 Thread Sheng Yang
On Friday 05 November 2010 10:44:19 Sheng Yang wrote:
 On Thursday 04 November 2010 18:43:22 Michael S. Tsirkin wrote:
  On Thu, Nov 04, 2010 at 02:15:21PM +0800, Sheng Yang wrote:
   This patch enable per-vector mask for assigned devices using MSI-X.
   
   This patch provided two new APIs: one is for guest to specific device's
   MSI-X table address in MMIO, the other is for userspace to get
   information about mask bit.
   
   All the mask bit operation are kept in kernel, in order to accelerate.
   Userspace shouldn't access the device MMIO directly for the
   information, instead it should uses provided API to do so.
   
   Signed-off-by: Sheng Yang sh...@linux.intel.com
   ---
   
Documentation/kvm/api.txt |   46 +++
arch/x86/kvm/x86.c|1 +
include/linux/kvm.h   |   21 +++-
include/linux/kvm_host.h  |3 +
virt/kvm/assigned-dev.c   |  285
- 5 files changed,
 
 354
 
insertions(+), 2 deletions(-)
   
   diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
   index b336266..69b993c 100644
   --- a/Documentation/kvm/api.txt
   +++ b/Documentation/kvm/api.txt
   @@ -1085,6 +1085,52 @@ of 4 instructions that make up a hypercall.
   
If any additional field gets added to this structure later on, a bit
for that additional piece of information will be set in the flags
bitmap.
   
   +4.47 KVM_ASSIGN_REG_MSIX_MMIO
   +
   +Capability: KVM_CAP_DEVICE_MSIX_MASK
   +Architectures: x86
   +Type: vm ioctl
   +Parameters: struct kvm_assigned_msix_mmio (in)
   +Returns: 0 on success, !0 on error
   +
   +struct kvm_assigned_msix_mmio {
   + /* Assigned device's ID */
   + __u32 assigned_dev_id;
   + /* Must be 0 */
   + __u32 flags;
   + /* MSI-X table MMIO address */
   + __u64 base_addr;
   + /* Must be 0, reserved for future use */
   + __u64 reserved;
  
  We also want length I think.
 
 OK, everybody ask for it...
 
   +};
   +
   +This ioctl would enable in-kernel MSI-X emulation, which would handle
   MSI-X +mask bit in the kernel.
  
  What happens on repeated calls when it's already enabled?
  How does one disable after it's enabled?
 
 Suppose this should only be called once. But again would update the MMIO
 base. It would be disabled along with device deassignment. We enable it
 explicitly because not all devices have MSI-X capability.
 
   +
  
  This is very specialized for assigned devices.  I would like an
  interface not tied to assigned devices explicitly
  (even if the implementation at the moment only works
  for assigned devices, I can patch that in later). E.g. vhost-net would
  benefit from such an extension too.  Maybe tie this to a special
  GSI and this GSI can be an assigned device or not?
 
 You're talking about KVM_ASSIGN_GET_MSIX_ENTRY? We can't tie it to GSI,
 because we should also cover entries without GSI. The entry number should
 be fine for all kinds of devices using MSI-X.
 
 I am not sure about what PV one would looks like. I use
 kvm_assigned_msix_entry just because it can be easily reused.
 
   +4.48 KVM_ASSIGN_GET_MSIX_ENTRY
   +
   +Capability: KVM_CAP_DEVICE_MSIX_MASK
   +Architectures: x86
   +Type: vm ioctl
   +Parameters: struct kvm_assigned_msix_entry (in and out)
   +Returns: 0 on success, !0 on error
   +
   +struct kvm_assigned_msix_entry {
   + /* Assigned device's ID */
   + __u32 assigned_dev_id;
   + /* Ignored */
   + __u32 gsi;
   + /* The index of entry in the MSI-X table */
   + __u16 entry;
   + /* Querying flags and returning status */
   + __u16 flags;
   + /* Must be 0 */
   + __u16 padding[2];
   +};
   +
   +This ioctl would allow userspace to get the status of one specific
   MSI-X +entry. Currently we support mask bit status querying.
   +
   
5. The kvm_run structure

Application code obtains a pointer to the kvm_run structure by
   
   diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
   index f3f86b2..8fd5121 100644
   --- a/arch/x86/kvm/x86.c
   +++ b/arch/x86/kvm/x86.c
   @@ -1926,6 +1926,7 @@ int kvm_dev_ioctl_check_extension(long ext)
   
 case KVM_CAP_DEBUGREGS:
 case KVM_CAP_X86_ROBUST_SINGLESTEP:
   
 case KVM_CAP_XSAVE:
   + case KVM_CAP_DEVICE_MSIX_MASK:
 r = 1;
 break;
 
 case KVM_CAP_COALESCED_MMIO:
   diff --git a/include/linux/kvm.h b/include/linux/kvm.h
   index 919ae53..eafafb1 100644
   --- a/include/linux/kvm.h
   +++ b/include/linux/kvm.h
   @@ -540,6 +540,9 @@ struct kvm_ppc_pvinfo {
   
#endif
#define KVM_CAP_PPC_GET_PVINFO 57
#define KVM_CAP_PPC_IRQ_LEVEL 58
   
   +#ifdef __KVM_HAVE_MSIX
   +#define KVM_CAP_DEVICE_MSIX_MASK 59
   +#endif
  
  We seem to have these HAVE macros all over.
  Avi, what's the idea? Let's drop them?
 
 Um? This kind of marcos are used here to avoid vendor specific string in
 the header file.
 
#ifdef KVM_CAP_IRQ_ROUTING
   
   @@ -671,6 +674,10 @@ struct kvm_clock_data {
   
#define KVM_XEN_HVM_CONFIG