Re: [PATCH 0/4 v12] MSI-X MMIO support for KVM

2011-03-03 Thread Sheng Yang
On Wed, Mar 02, 2011 at 03:51:20PM -0300, Marcelo Tosatti wrote:
 On Wed, Mar 02, 2011 at 11:23:14AM +0200, Michael S. Tsirkin wrote:
  On Wed, Mar 02, 2011 at 03:26:53PM +0800, Sheng Yang wrote:
   Change from v10:
   1. Update according to the comments of Michael.
   2. Use mmio_needed to exit to userspace according to Marcelo's comments.
  
  PCI-wise, I don't see anything to complain about.
  So ack the PCI bits.
  You guys decide on the rest.
  
  Several things I suggested previously that are not
  related to the PCI point of view:
  
  1. In msix_table_mmio_write, we fill in ext_data even if
 we are not going to exit to userspace in the end.
 It seems a trivial optimization to only do it if we exit.
  2. Instead of filling in ext_data, and then copying to vcpu,
 we could fill the data in vcpu directly.
  3. MSIX is not an error. So returning -ENOTSYNC to signal
 it is ugly. It would be cleaner to return negative
 value on error, and positive exit code to trigger exit.
  4. Patch 4/4 adds whitespace errors that git complains about.
  
  With changes 2 and 3, arch/x86/kvm/x86.c would not
  need to know about msix at all.
  
  As I said these are all suggestions unrelated to pci,
  and I don't know what Avi/Marcelo think about 1 and 2.
  3 and 4 are easy to fix though.
 
 All minor IMO (I prefer ENOTSYNC as it's meaningful); whitespace
 can be fixed while applying.
 
 Avi, can you please ACK?
 
 Sheng, we will fix any further comments. Thanks!

Thanks! It's my pleasure to work with you guys. Hope we can work together in
the future. :)

-- 
regards
Yang, Sheng
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
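
Suggestion 3 in the review above (a negative value on error, a positive exit code to trigger an exit) can be sketched as follows. This is a minimal illustration, not code from the patch: every `sketch_`/`SKETCH_` name is hypothetical, the exit-code value 19 is taken from the kvm.h hunk of patch 3/4, and the choice of which accesses stay in the kernel is an assumption based on the 16-byte MSI-X entry layout (vector control at offset 12).

```c
#include <errno.h>

/* Hypothetical exit code; the value mirrors the patch's kvm.h hunk. */
#define SKETCH_EXIT_MSIX_ROUTING_UPDATE 19

/* Hypothetical stand-in for the exit_reason field of struct kvm_run. */
struct sketch_run {
    int exit_reason;
};

/*
 * Hypothetical MSI-X table write handler using the proposed convention:
 *   < 0  error
 *   = 0  handled entirely in the kernel
 *   > 0  exit code that should be reported to userspace
 */
static int sketch_table_write(unsigned int offset, int len)
{
    if (len != 4 || (offset & 3))
        return -EINVAL;                     /* unsupported access size */
    if ((offset % 16) == 12)
        return 0;                           /* vector control (mask bit) */
    return SKETCH_EXIT_MSIX_ROUTING_UPDATE; /* address or data changed */
}

/*
 * Generic caller: it needs no MSI-X knowledge, it only forwards the
 * positive code. Returns 1 if an exit to userspace is needed, 0 if the
 * write was handled, or a negative errno.
 */
static int sketch_dispatch(struct sketch_run *run, unsigned int offset, int len)
{
    int r = sketch_table_write(offset, len);

    if (r <= 0)
        return r;
    run->exit_reason = r;
    return 1;
}
```

With this shape the generic caller never tests for a specific errno, which is the point behind the remark that x86.c would not need to know about MSI-X at all.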


Re: [PATCH 3/4] KVM: Emulate MSI-X table in kernel

2011-03-01 Thread Sheng Yang
On Tue, Mar 01, 2011 at 02:20:02PM +0200, Michael S. Tsirkin wrote:
 On Tue, Mar 01, 2011 at 02:10:37PM +0800, Sheng Yang wrote:
  On Monday 28 February 2011 19:27:29 Michael S. Tsirkin wrote:
   On Mon, Feb 28, 2011 at 03:20:04PM +0800, Sheng Yang wrote:
Then we can support mask bit operation of assigned devices now.

Signed-off-by: Sheng Yang sh...@linux.intel.com
   
   A general question: we implement mmio read and write
   operation here, but seem to do nothing
   about ordering. In particular, pci mmio reads are normally
   assumed to flush all writes out to the device
   and all device writes in to the CPU.
   
   How exactly does it work here?
  
  I don't understand the problem... We emulate all operations here, so where is
  the ordering issue?
  
  And Michael, thanks for your detailed comments, but could you give all your
  comments at once? I've tried my best to address them, but it still feels like
  endless comments are coming.
 
 Hmm, sorry about that. At least some of the comments are in
 the new code, so I could not have commented on it earlier ...
 E.g. the ext_data thing only appeared in the latest version
 or the one before that. In some cases such as the non-standard
 error codes used, I don't always understand the logic, as
 the error handling is switched to standard conventions
 so comments appear as the logic becomes apparent. This
 applies to EFAULT handling below.
 
  And I will leave Intel this Friday; I want to get it done
  before I leave.
 
 Permanently?
 I think it would be helpful, in that case, if you publish some
 testing data detailing how best to test the patch,
 which hardware etc.  We will then do our best to carry on, fix up
 the remaining nits if any, test and apply the patch.

Yes. I will relocate to California and start working for another company next
month.

Since the patch is already at v11, if there is no big logic issue, I really
want to get it done before I leave. So I will still try my best to address
all comments and get it checked in ASAP.
 
   
---

 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/Makefile   |2 +-
 arch/x86/kvm/mmu.c  |2 +
 arch/x86/kvm/x86.c  |   40 -
 include/linux/kvm.h |   28 
 include/linux/kvm_host.h|   36 +
 virt/kvm/assigned-dev.c |   44 ++
 virt/kvm/kvm_main.c |   38 +-
 virt/kvm/msix_mmio.c|  301 +++
 virt/kvm/msix_mmio.h|   25
 10 files changed, 504 insertions(+), 13 deletions(-)
 create mode 100644 virt/kvm/msix_mmio.c
 create mode 100644 virt/kvm/msix_mmio.h

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index aa75f21..4a390a4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -635,6 +635,7 @@ enum emulation_result {

EMULATE_DONE,   /* no further processing */
EMULATE_DO_MMIO,  /* kvm_run filled with mmio request */
EMULATE_FAIL, /* can't emulate this instruction */

+   EMULATE_USERSPACE_EXIT, /* we need exit to userspace */

 };
 
 #define EMULTYPE_NO_DECODE (1 << 0)

diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index f15501f..3a0d851 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -7,7 +7,7 @@ CFLAGS_vmx.o := -I.

 kvm-y  += $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o \
coalesced_mmio.o irq_comm.o eventfd.o \
-   assigned-dev.o)
+   assigned-dev.o msix_mmio.o)
 kvm-$(CONFIG_IOMMU_API)+= $(addprefix ../../../virt/kvm/, iommu.o)
 kvm-$(CONFIG_KVM_ASYNC_PF) += $(addprefix ../../../virt/kvm/, async_pf.o)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9cafbb4..912dca4 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3358,6 +3358,8 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u32 error_code,

case EMULATE_DO_MMIO:
++vcpu->stat.mmio_exits;
/* fall through */

+   case EMULATE_USERSPACE_EXIT:
+   /* fall through */

case EMULATE_FAIL:
return 0;

default:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 21b84e2..87308eb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1966,6 +1966,7 @@ int kvm_dev_ioctl_check_extension(long ext)

case KVM_CAP_X86_ROBUST_SINGLESTEP:
case KVM_CAP_XSAVE:

case KVM_CAP_ASYNC_PF:
+   case KVM_CAP_MSIX_MMIO:
r = 1

Re: [PATCH 3/4] KVM: Emulate MSI-X table in kernel

2011-03-01 Thread Sheng Yang
On Tue, Mar 01, 2011 at 02:20:02PM +0200, Michael S. Tsirkin wrote:
 On Tue, Mar 01, 2011 at 02:10:37PM +0800, Sheng Yang wrote:
  On Monday 28 February 2011 19:27:29 Michael S. Tsirkin wrote:
   On Mon, Feb 28, 2011 at 03:20:04PM +0800, Sheng Yang wrote:
@@ -1877,6 +1879,24 @@ static long kvm_vm_ioctl(struct file *filp,

mutex_unlock(&kvm->lock);
break;
 
 #endif

+   case KVM_REGISTER_MSIX_MMIO: {
+   struct kvm_msix_mmio_user mmio_user;
+
+   r = -EFAULT;
+   if (copy_from_user(&mmio_user, argp, sizeof mmio_user))
+   goto out;
+   r = kvm_vm_ioctl_register_msix_mmio(kvm, &mmio_user);
+   break;
+   }
+   case KVM_UNREGISTER_MSIX_MMIO: {
+   struct kvm_msix_mmio_user mmio_user;
+
+   r = -EFAULT;
+   if (copy_from_user(&mmio_user, argp, sizeof mmio_user))
+   goto out;
+   r = kvm_vm_ioctl_unregister_msix_mmio(kvm, &mmio_user);
+   break;
+   }

default:
r = kvm_arch_vm_ioctl(filp, ioctl, arg);
if (r == -ENOTTY)

@@ -1988,6 +2008,12 @@ static int kvm_dev_ioctl_create_vm(void)

return r;

}
 
 #endif

+   r = kvm_register_msix_mmio_dev(kvm);
+   if (r < 0) {
+   kvm_put_kvm(kvm);
+   return r;
+   }
+
   
   Need to fix error handling below as well?
   Better do it with chained gotos if yes.
  
  Let's make it another separate patch. 
 
 Well you add a new failure mode, you need to cleanup
 properly ...

Oh, I think kvm_put_kvm() should take care of this?

-- 
regards
Yang, Sheng
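
For readers unfamiliar with the "chained gotos" idiom mentioned in the review comment above, here is a minimal sketch. It is not taken from kvm_main.c: the `sketch_` names and the three acquisition steps are hypothetical, and whether such unwinding is needed in the real code depends on whether kvm_put_kvm() already releases the registration, as the reply suggests.

```c
#include <errno.h>

static int sketch_live;   /* number of resources currently held */

/* Hypothetical acquisition step that can be forced to fail. */
static int sketch_acquire(int should_fail)
{
    if (should_fail)
        return -ENOMEM;
    sketch_live++;
    return 0;
}

static void sketch_release(void)
{
    sketch_live--;
}

/*
 * fail_at: 0 means every step succeeds, 1..3 selects the failing step.
 * Each label undoes exactly the steps that completed before the failure,
 * in reverse order, so adding a new failure mode only adds one label.
 */
static int sketch_create_vm(int fail_at)
{
    int r;

    r = sketch_acquire(fail_at == 1);   /* e.g. the VM object */
    if (r < 0)
        goto out;
    r = sketch_acquire(fail_at == 2);   /* e.g. the MSI-X MMIO device */
    if (r < 0)
        goto out_put_vm;
    r = sketch_acquire(fail_at == 3);   /* e.g. the file descriptor */
    if (r < 0)
        goto out_unregister;
    return 0;

out_unregister:
    sketch_release();
out_put_vm:
    sketch_release();
out:
    return r;
}
```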


Re: [PATCH 3/4] KVM: Emulate MSI-X table in kernel

2011-03-01 Thread Sheng Yang
On Wednesday 02 March 2011 04:18:58 Marcelo Tosatti wrote:
 On Fri, Feb 25, 2011 at 10:29:38AM +0200, Michael S. Tsirkin wrote:
  On Fri, Feb 25, 2011 at 02:28:02PM +0800, Sheng Yang wrote:
   On Thursday 24 February 2011 18:45:08 Michael S. Tsirkin wrote:
On Thu, Feb 24, 2011 at 05:51:04PM +0800, Sheng Yang wrote:
 Then we can support mask bit operation of assigned devices now.
 
 Signed-off-by: Sheng Yang sh...@linux.intel.com

Doesn't look like all comments got addressed.
E.g. gpa_t entry_base is still there and in reality
you said it's a host virtual address so
should be void __user *;
   
   Would update it.
   
And ENOTSYNC meaning 'MSIX' is pretty hacky.
   
   I'd like to discuss it later. We may need some work on all MMIO
   handling side to make it more straightforward. But I don't want to
   bundle it with this one...
  
  It's not PCI related so I'll defer to Avi/Marcelo on this.
  Are you guys happy with the ENOTSYNC meaning 'MSIX'
 
 What would be a better alternative to ENOTSYNC? Can't see any.
 
  and userspace_exit_needed hacks in this code?
 
 I thought this was handled by mmio_needed in a previous patch?
 
 Since x86_emulate_instruction does
 
 } else if (vcpu->mmio_needed) {
 if (vcpu->mmio_is_write)
 vcpu->mmio_needed = 0;
 r = EMULATE_DO_MMIO;
 
 It should be fine. Sheng why did you introduce userspace_exit_needed?

Because strictly speaking it's not an MMIO exit. I didn't know whether Avi would
object to the confusing concept here, so I introduced another type of exit.

But if it's OK, I will still use mmio_needed in the next version, which is also
simpler.

--
regards
Yang, Sheng


[PATCH 1/4] KVM: Move struct kvm_io_device to kvm_host.h

2011-03-01 Thread Sheng Yang
Then it can be used by other struct in kvm_host.h

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 include/linux/kvm_host.h |   23 +++
 virt/kvm/iodev.h |   25 +
 2 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b5021db..7d313e0 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -98,6 +98,29 @@ int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
 int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
 #endif
 
+struct kvm_io_device;
+
+/**
+ * kvm_io_device_ops are called under kvm slots_lock.
+ * read and write handlers return 0 if the transaction has been handled,
+ * or non-zero to have it passed to the next device.
+ **/
+struct kvm_io_device_ops {
+   int (*read)(struct kvm_io_device *this,
+   gpa_t addr,
+   int len,
+   void *val);
+   int (*write)(struct kvm_io_device *this,
+gpa_t addr,
+int len,
+const void *val);
+   void (*destructor)(struct kvm_io_device *this);
+};
+
+struct kvm_io_device {
+   const struct kvm_io_device_ops *ops;
+};
+
 struct kvm_vcpu {
struct kvm *kvm;
 #ifdef CONFIG_PREEMPT_NOTIFIERS
diff --git a/virt/kvm/iodev.h b/virt/kvm/iodev.h
index 12fd3ca..d1f5651 100644
--- a/virt/kvm/iodev.h
+++ b/virt/kvm/iodev.h
@@ -17,32 +17,9 @@
 #define __KVM_IODEV_H__
 
 #include <linux/kvm_types.h>
+#include <linux/kvm_host.h>
 #include <asm/errno.h>
 
-struct kvm_io_device;
-
-/**
- * kvm_io_device_ops are called under kvm slots_lock.
- * read and write handlers return 0 if the transaction has been handled,
- * or non-zero to have it passed to the next device.
- **/
-struct kvm_io_device_ops {
-   int (*read)(struct kvm_io_device *this,
-   gpa_t addr,
-   int len,
-   void *val);
-   int (*write)(struct kvm_io_device *this,
-gpa_t addr,
-int len,
-const void *val);
-   void (*destructor)(struct kvm_io_device *this);
-};
-
-
-struct kvm_io_device {
-   const struct kvm_io_device_ops *ops;
-};
-
 static inline void kvm_iodevice_init(struct kvm_io_device *dev,
 const struct kvm_io_device_ops *ops)
 {
-- 
1.7.0.1
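
The comment moved by this patch states the handler contract: a read or write handler returns 0 once it has handled the transaction, and non-zero to let the next device try. A minimal userspace sketch of that contract follows. The `sketch_` types are simplified stand-ins rather than the kernel definitions, and returning -EOPNOTSUPP when nobody claims the access is an assumption of this sketch.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

typedef unsigned long long sketch_gpa_t;   /* stand-in for the kernel's gpa_t */

struct sketch_io_device;

struct sketch_io_device_ops {
    int (*read)(struct sketch_io_device *this, sketch_gpa_t addr,
                int len, void *val);
    int (*write)(struct sketch_io_device *this, sketch_gpa_t addr,
                 int len, const void *val);
};

struct sketch_io_device {
    const struct sketch_io_device_ops *ops;
};

/* Toy device that claims [base, base + size) and stores one 32-bit value. */
struct sketch_reg_dev {
    struct sketch_io_device dev;   /* first member, so we can cast back */
    sketch_gpa_t base;
    sketch_gpa_t size;
    unsigned int reg;
};

static int sketch_reg_write(struct sketch_io_device *this, sketch_gpa_t addr,
                            int len, const void *val)
{
    struct sketch_reg_dev *d = (struct sketch_reg_dev *)this;

    if (len != 4 || addr < d->base || addr + len > d->base + d->size)
        return -EOPNOTSUPP;        /* not ours: pass to the next device */
    memcpy(&d->reg, val, sizeof(d->reg));
    return 0;                      /* handled */
}

static const struct sketch_io_device_ops sketch_reg_ops = {
    .read = NULL,
    .write = sketch_reg_write,
};

/* Offer the write to each device in turn; stop at the first that returns 0. */
static int sketch_bus_write(struct sketch_io_device **devs, size_t n,
                            sketch_gpa_t addr, int len, const void *val)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (devs[i]->ops->write &&
            !devs[i]->ops->write(devs[i], addr, len, val))
            return 0;
    return -EOPNOTSUPP;            /* nobody claimed the access */
}
```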



[PATCH 3/4] KVM: Emulate MSI-X table in kernel

2011-03-01 Thread Sheng Yang
Then we can support mask bit operation of assigned devices now.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 arch/x86/kvm/Makefile|2 +-
 arch/x86/kvm/x86.c   |   31 -
 include/linux/kvm.h  |   28 +
 include/linux/kvm_host.h |   34 ++
 virt/kvm/assigned-dev.c  |   41 +++
 virt/kvm/kvm_main.c  |   38 ++-
 virt/kvm/msix_mmio.c |  291 ++
 virt/kvm/msix_mmio.h |   26 
 8 files changed, 479 insertions(+), 12 deletions(-)
 create mode 100644 virt/kvm/msix_mmio.c
 create mode 100644 virt/kvm/msix_mmio.h

diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index f15501f..3a0d851 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -7,7 +7,7 @@ CFLAGS_vmx.o := -I.
 
 kvm-y  += $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o \
coalesced_mmio.o irq_comm.o eventfd.o \
-   assigned-dev.o)
+   assigned-dev.o msix_mmio.o)
 kvm-$(CONFIG_IOMMU_API)+= $(addprefix ../../../virt/kvm/, iommu.o)
 kvm-$(CONFIG_KVM_ASYNC_PF) += $(addprefix ../../../virt/kvm/, async_pf.o)
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 21b84e2..9bafaca 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1966,6 +1966,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_X86_ROBUST_SINGLESTEP:
case KVM_CAP_XSAVE:
case KVM_CAP_ASYNC_PF:
+   case KVM_CAP_MSIX_MMIO:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
@@ -3809,6 +3810,7 @@ static int emulator_write_emulated_onepage(unsigned long addr,
 {
gpa_t gpa;
struct kvm_io_ext_data ext_data;
+   int r;
 
gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, exception);
 
@@ -3824,18 +3826,33 @@ static int emulator_write_emulated_onepage(unsigned long addr,
 
 mmio:
trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, bytes, gpa, *(u64 *)val);
+   r = vcpu_mmio_write(vcpu, gpa, bytes, val, &ext_data);
/*
 * Is this MMIO handled locally?
 */
-   if (!vcpu_mmio_write(vcpu, gpa, bytes, val, &ext_data))
+   if (!r)
return X86EMUL_CONTINUE;
 
-   vcpu->mmio_needed = 1;
-   vcpu->run->exit_reason = KVM_EXIT_MMIO;
-   vcpu->run->mmio.phys_addr = vcpu->mmio_phys_addr = gpa;
-   vcpu->run->mmio.len = vcpu->mmio_size = bytes;
-   vcpu->run->mmio.is_write = vcpu->mmio_is_write = 1;
-   memcpy(vcpu->run->mmio.data, val, bytes);
+   if (r == -ENOTSYNC) {
+   vcpu->mmio_needed = 1;
+   vcpu->mmio_is_write = 1;
+   vcpu->run->exit_reason = KVM_EXIT_MSIX_ROUTING_UPDATE;
+   vcpu->run->msix_routing.dev_id =
+   ext_data.msix_routing.dev_id;
+   vcpu->run->msix_routing.type =
+   ext_data.msix_routing.type;
+   vcpu->run->msix_routing.entry_idx =
+   ext_data.msix_routing.entry_idx;
+   vcpu->run->msix_routing.flags =
+   ext_data.msix_routing.flags;
+   } else  {
+   vcpu->mmio_needed = 1;
+   vcpu->run->exit_reason = KVM_EXIT_MMIO;
+   vcpu->run->mmio.phys_addr = vcpu->mmio_phys_addr = gpa;
+   vcpu->run->mmio.len = vcpu->mmio_size = bytes;
+   vcpu->run->mmio.is_write = vcpu->mmio_is_write = 1;
+   memcpy(vcpu->run->mmio.data, val, bytes);
+   }
 
return X86EMUL_CONTINUE;
 }
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index ea2dc1a..4393e4e 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -161,6 +161,7 @@ struct kvm_pit_config {
 #define KVM_EXIT_NMI  16
 #define KVM_EXIT_INTERNAL_ERROR   17
 #define KVM_EXIT_OSI  18
+#define KVM_EXIT_MSIX_ROUTING_UPDATE 19
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 #define KVM_INTERNAL_ERROR_EMULATION 1
@@ -264,6 +265,13 @@ struct kvm_run {
struct {
__u64 gprs[32];
} osi;
+   /* KVM_EXIT_MSIX_ROUTING_UPDATE*/
+   struct {
+   __u32 dev_id;
+   __u16 type;
+   __u16 entry_idx;
+   __u64 flags;
+   } msix_routing;
/* Fix the size of the union. */
char padding[256];
};
@@ -541,6 +549,7 @@ struct kvm_ppc_pvinfo {
 #define KVM_CAP_PPC_GET_PVINFO 57
 #define KVM_CAP_PPC_IRQ_LEVEL 58
 #define KVM_CAP_ASYNC_PF 59
+#define KVM_CAP_MSIX_MMIO 60
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -672,6 +681,9 @@ struct kvm_clock_data {
 #define KVM_XEN_HVM_CONFIG _IOW(KVMIO,  0x7a, struct kvm_xen_hvm_config)
 #define KVM_SET_CLOCK _IOW(KVMIO,  0x7b, struct kvm_clock_data)
 #define KVM_GET_CLOCK _IOR(KVMIO,  0x7c, struct kvm_clock_data)
+/* Available with KVM_CAP_MSIX_MMIO */
+#define

[PATCH 2/4] KVM: Add kvm_io_ext_data to IO handler

2011-03-01 Thread Sheng Yang
Add a new parameter to IO writing handler, so that we can transfer information
from IO handler to caller.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 arch/x86/kvm/i8254.c  |6 --
 arch/x86/kvm/i8259.c  |3 ++-
 arch/x86/kvm/lapic.c  |3 ++-
 arch/x86/kvm/x86.c|   13 -
 include/linux/kvm_host.h  |9 +++--
 virt/kvm/coalesced_mmio.c |3 ++-
 virt/kvm/eventfd.c|2 +-
 virt/kvm/ioapic.c |2 +-
 virt/kvm/iodev.h  |6 --
 virt/kvm/kvm_main.c   |4 ++--
 10 files changed, 33 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index efad723..bd8f0c5 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -439,7 +439,8 @@ static inline int pit_in_range(gpa_t addr)
 }
 
 static int pit_ioport_write(struct kvm_io_device *this,
-   gpa_t addr, int len, const void *data)
+   gpa_t addr, int len, const void *data,
+   struct kvm_io_ext_data *ext_data)
 {
struct kvm_pit *pit = dev_to_pit(this);
struct kvm_kpit_state *pit_state = &pit->pit_state;
@@ -585,7 +586,8 @@ static int pit_ioport_read(struct kvm_io_device *this,
 }
 
 static int speaker_ioport_write(struct kvm_io_device *this,
-   gpa_t addr, int len, const void *data)
+   gpa_t addr, int len, const void *data,
+   struct kvm_io_ext_data *ext_data)
 {
struct kvm_pit *pit = speaker_to_pit(this);
struct kvm_kpit_state *pit_state = &pit->pit_state;
diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index 3cece05..96b1070 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -480,7 +480,8 @@ static inline struct kvm_pic *to_pic(struct kvm_io_device *dev)
 }
 
 static int picdev_write(struct kvm_io_device *this,
-gpa_t addr, int len, const void *val)
+gpa_t addr, int len, const void *val,
+struct kvm_io_ext_data *ext_data)
 {
struct kvm_pic *s = to_pic(this);
unsigned char data = *(unsigned char *)val;
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 93cf9d0..f413e9c 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -836,7 +836,8 @@ static int apic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 }
 
 static int apic_mmio_write(struct kvm_io_device *this,
-   gpa_t address, int len, const void *data)
+   gpa_t address, int len, const void *data,
+   struct kvm_io_ext_data *ext_data)
 {
struct kvm_lapic *apic = to_lapic(this);
unsigned int offset = address - apic->base_address;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fa708c9..21b84e2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3571,13 +3571,14 @@ static void kvm_init_msr_list(void)
 }
 
 static int vcpu_mmio_write(struct kvm_vcpu *vcpu, gpa_t addr, int len,
-  const void *v)
+  const void *v, struct kvm_io_ext_data *ext_data)
 {
if (vcpu->arch.apic &&
-   !kvm_iodevice_write(&vcpu->arch.apic->dev, addr, len, v))
+   !kvm_iodevice_write(&vcpu->arch.apic->dev, addr, len, v, ext_data))
return 0;
 
-   return kvm_io_bus_write(vcpu->kvm, KVM_MMIO_BUS, addr, len, v);
+   return kvm_io_bus_write(vcpu->kvm, KVM_MMIO_BUS,
+   addr, len, v, ext_data);
 }
 
 static int vcpu_mmio_read(struct kvm_vcpu *vcpu, gpa_t addr, int len, void *v)
@@ -3807,6 +3808,7 @@ static int emulator_write_emulated_onepage(unsigned long addr,
   struct kvm_vcpu *vcpu)
 {
gpa_t gpa;
+   struct kvm_io_ext_data ext_data;
 
gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, exception);
 
@@ -3825,7 +3827,7 @@ mmio:
/*
 * Is this MMIO handled locally?
 */
-   if (!vcpu_mmio_write(vcpu, gpa, bytes, val))
+   if (!vcpu_mmio_write(vcpu, gpa, bytes, val, &ext_data))
return X86EMUL_CONTINUE;
 
vcpu->mmio_needed = 1;
@@ -3940,6 +3942,7 @@ static int kernel_pio(struct kvm_vcpu *vcpu, void *pd)
 {
/* TODO: String I/O for in kernel device */
int r;
+   struct kvm_io_ext_data ext_data;
 
if (vcpu->arch.pio.in)
r = kvm_io_bus_read(vcpu->kvm, KVM_PIO_BUS, vcpu->arch.pio.port,
@@ -3947,7 +3950,7 @@ static int kernel_pio(struct kvm_vcpu *vcpu, void *pd)
else
r = kvm_io_bus_write(vcpu->kvm, KVM_PIO_BUS,
 vcpu->arch.pio.port, vcpu->arch.pio.size,
-pd);
+pd, &ext_data);
return r;
 }
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7d313e0..a32c53e 100644

[PATCH 0/4 v12] MSI-X MMIO support for KVM

2011-03-01 Thread Sheng Yang
Change from v10:
1. Update according to the comments of Michael.
2. Use mmio_needed to exit to userspace according to Marcelo's comments.

Sheng Yang (4):
  KVM: Move struct kvm_io_device to kvm_host.h
  KVM: Add kvm_io_ext_data to IO handler
  KVM: Emulate MSI-X table in kernel
  KVM: Add documents for MSI-X MMIO API

 Documentation/kvm/api.txt |   58 +
 arch/x86/kvm/Makefile |2 +-
 arch/x86/kvm/i8254.c  |6 +-
 arch/x86/kvm/i8259.c  |3 +-
 arch/x86/kvm/lapic.c  |3 +-
 arch/x86/kvm/x86.c|   42 +--
 include/linux/kvm.h   |   28 +
 include/linux/kvm_host.h  |   64 ++-
 virt/kvm/assigned-dev.c   |   41 +++
 virt/kvm/coalesced_mmio.c |3 +-
 virt/kvm/eventfd.c|2 +-
 virt/kvm/ioapic.c |2 +-
 virt/kvm/iodev.h  |   31 +
 virt/kvm/kvm_main.c   |   40 ++-
 virt/kvm/msix_mmio.c  |  291 +
 virt/kvm/msix_mmio.h  |   26 
 16 files changed, 591 insertions(+), 51 deletions(-)
 create mode 100644 virt/kvm/msix_mmio.c
 create mode 100644 virt/kvm/msix_mmio.h



[PATCH 4/4] KVM: Add documents for MSI-X MMIO API

2011-03-01 Thread Sheng Yang

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 Documentation/kvm/api.txt |   58 +
 1 files changed, 58 insertions(+), 0 deletions(-)

diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
index e1a9297..dd10c3b 100644
--- a/Documentation/kvm/api.txt
+++ b/Documentation/kvm/api.txt
@@ -1263,6 +1263,53 @@ struct kvm_assigned_msix_entry {
__u16 padding[3];
 };
 
+4.54 KVM_REGISTER_MSIX_MMIO
+
+Capability: KVM_CAP_MSIX_MMIO
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_msix_mmio_user (in)
+Returns: 0 on success, -1 on error
+
+This API indicates an MSI-X MMIO address of a guest device. Then all MMIO
+operation would be handled by kernel. When necessary(e.g. MSI data/address
+changed), KVM would exit to userspace using KVM_EXIT_MSIX_ROUTING_UPDATE to
+indicate the MMIO modification and require userspace to update IRQ routing
+table.
+
+NOTICE: Writing the MSI-X MMIO page after it was registered with this API may
+be dangerous for userspace program. The writing during VM running may result
+in synchronization issue therefore the assigned device can't work properly.
+The writing is allowed when VM is not running and can be used as save/restore
+mechanism.
+
+struct kvm_msix_mmio_user {
+   __u32 dev_id;
+   __u16 type; /* Device type and MMIO address type */
+   __u16 max_entries_nr;   /* Maximum entries supported */
+   __u64 base_addr;/* Guest physical address of MMIO */
+   __u64 base_va;  /* Host virtual address of MMIO mapping */
+   __u64 flags;/* Reserved for now */
+   __u64 reserved[4];
+};
+
+Current device type can be:
+#define KVM_MSIX_MMIO_TYPE_ASSIGNED_DEV    (1 << 0)
+
+Current MMIO type can be:
+#define KVM_MSIX_MMIO_TYPE_BASE_TABLE  (1 << 8)
+
+4.55 KVM_UNREGISTER_MSIX_MMIO
+
+Capability: KVM_CAP_MSIX_MMIO
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_msix_mmio_user (in)
+Returns: 0 on success, -1 on error
+
+This API would unregister the specific MSI-X MMIO, indicated by dev_id and
+type fields of struct kvm_msix_mmio_user.
+
 5. The kvm_run structure
 
 Application code obtains a pointer to the kvm_run structure by
@@ -1445,6 +1492,17 @@ Userspace can now handle the hypercall and when it's done modify the gprs as
 necessary. Upon guest entry all guest GPRs will then be replaced by the values
 in this struct.
 
+   /* KVM_EXIT_MSIX_ROUTING_UPDATE*/
+   struct {
+   __u32 dev_id;
+   __u16 type;
+   __u16 entry_idx;
+   __u64 flags;
+   } msix_routing;
+
+KVM_EXIT_MSIX_ROUTING_UPDATE indicates one MSI-X entry has been modified, and
+userspace need to update the correlated routing table.
+
/* Fix the size of the union. */
char padding[256];
};
-- 
1.7.0.1
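
As a rough illustration of the interface documented in this patch, the sketch below shows how userspace might fill the registration structure before issuing KVM_REGISTER_MSIX_MMIO. It makes several assumptions: the struct is re-declared locally with stdint types and `sketch_` names instead of including a kernel header, the two type flags are combined in a single `type` field as the field comment suggests, and the ioctl call itself is left out because the request number is not shown in this thread.

```c
#include <stdint.h>
#include <string.h>

/* Flag values as given in the api.txt hunk above. */
#define SKETCH_MSIX_MMIO_TYPE_ASSIGNED_DEV (1 << 0)
#define SKETCH_MSIX_MMIO_TYPE_BASE_TABLE   (1 << 8)

/* Local mirror of struct kvm_msix_mmio_user from the patch. */
struct sketch_msix_mmio_user {
    uint32_t dev_id;
    uint16_t type;             /* device type and MMIO address type */
    uint16_t max_entries_nr;   /* maximum entries supported */
    uint64_t base_addr;        /* guest physical address of MMIO */
    uint64_t base_va;          /* host virtual address of MMIO mapping */
    uint64_t flags;            /* reserved for now */
    uint64_t reserved[4];
};

/* Fill a request describing the MSI-X table of an assigned device. */
static void sketch_fill_msix_mmio(struct sketch_msix_mmio_user *u,
                                  uint32_t dev_id, uint16_t nr_entries,
                                  uint64_t guest_pa, void *host_va)
{
    memset(u, 0, sizeof(*u));  /* flags and reserved fields stay zero */
    u->dev_id = dev_id;
    u->type = SKETCH_MSIX_MMIO_TYPE_ASSIGNED_DEV |
              SKETCH_MSIX_MMIO_TYPE_BASE_TABLE;
    u->max_entries_nr = nr_entries;
    u->base_addr = guest_pa;
    u->base_va = (uint64_t)(uintptr_t)host_va;
}
```

According to the documentation text, the same dev_id and type fields are what KVM_UNREGISTER_MSIX_MMIO uses to identify the mapping to remove, so a caller would typically keep the filled structure around.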



Re: [PATCH 3/4] KVM: Emulate MSI-X table in kernel

2011-02-28 Thread Sheng Yang
On Monday 28 February 2011 19:27:29 Michael S. Tsirkin wrote:
 On Mon, Feb 28, 2011 at 03:20:04PM +0800, Sheng Yang wrote:
  Then we can support mask bit operation of assigned devices now.
  
  Signed-off-by: Sheng Yang sh...@linux.intel.com
 
 A general question: we implement mmio read and write
 operation here, but seem to do nothing
 about ordering. In particular, pci mmio reads are normally
 assumed to flush all writes out to the device
 and all device writes in to the CPU.
 
 How exactly does it work here?

I don't understand the problem... We emulate all operations here, so where is
the ordering issue?

And Michael, thanks for your detailed comments, but could you give all your
comments at once? I've tried my best to address them, but it still feels like
endless comments are coming. And I will leave Intel this Friday; I want to get
it done before I leave.
 
  ---
  
   arch/x86/include/asm/kvm_host.h |1 +
   arch/x86/kvm/Makefile   |2 +-
   arch/x86/kvm/mmu.c  |2 +
   arch/x86/kvm/x86.c  |   40 -
   include/linux/kvm.h |   28 
   include/linux/kvm_host.h|   36 +
   virt/kvm/assigned-dev.c |   44 ++
   virt/kvm/kvm_main.c |   38 +-
   virt/kvm/msix_mmio.c|  301 +++
   virt/kvm/msix_mmio.h|   25
   10 files changed, 504 insertions(+), 13 deletions(-)
   create mode 100644 virt/kvm/msix_mmio.c
   create mode 100644 virt/kvm/msix_mmio.h
  
  diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
  index aa75f21..4a390a4 100644
  --- a/arch/x86/include/asm/kvm_host.h
  +++ b/arch/x86/include/asm/kvm_host.h
  @@ -635,6 +635,7 @@ enum emulation_result {
  
  EMULATE_DONE,   /* no further processing */
  EMULATE_DO_MMIO,  /* kvm_run filled with mmio request */
  EMULATE_FAIL, /* can't emulate this instruction */
  
  +   EMULATE_USERSPACE_EXIT, /* we need exit to userspace */
  
   };
   
   #define EMULTYPE_NO_DECODE (1 << 0)
  
  diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
  index f15501f..3a0d851 100644
  --- a/arch/x86/kvm/Makefile
  +++ b/arch/x86/kvm/Makefile
  @@ -7,7 +7,7 @@ CFLAGS_vmx.o := -I.
  
   kvm-y  += $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o \
  coalesced_mmio.o irq_comm.o eventfd.o \
  -   assigned-dev.o)
  +   assigned-dev.o msix_mmio.o)
   kvm-$(CONFIG_IOMMU_API)+= $(addprefix ../../../virt/kvm/, iommu.o)
   kvm-$(CONFIG_KVM_ASYNC_PF) += $(addprefix ../../../virt/kvm/, async_pf.o)
  
  diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
  index 9cafbb4..912dca4 100644
  --- a/arch/x86/kvm/mmu.c
  +++ b/arch/x86/kvm/mmu.c
  @@ -3358,6 +3358,8 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u32 error_code,
  
  case EMULATE_DO_MMIO:
  ++vcpu->stat.mmio_exits;
  /* fall through */
  
  +   case EMULATE_USERSPACE_EXIT:
  +   /* fall through */
  
  case EMULATE_FAIL:
  return 0;
  
  default:
  diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
  index 21b84e2..87308eb 100644
  --- a/arch/x86/kvm/x86.c
  +++ b/arch/x86/kvm/x86.c
  @@ -1966,6 +1966,7 @@ int kvm_dev_ioctl_check_extension(long ext)
  
  case KVM_CAP_X86_ROBUST_SINGLESTEP:
  case KVM_CAP_XSAVE:
  
  case KVM_CAP_ASYNC_PF:
  +   case KVM_CAP_MSIX_MMIO:
  r = 1;
  break;
  
  case KVM_CAP_COALESCED_MMIO:
  @@ -3809,6 +3810,7 @@ static int emulator_write_emulated_onepage(unsigned long addr,
  
   {
   
  gpa_t gpa;
  struct kvm_io_ext_data ext_data;
  
  +   int r;
  
  gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, exception);
  
  @@ -3824,18 +3826,32 @@ static int emulator_write_emulated_onepage(unsigned long addr,
  
   mmio:
  trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, bytes, gpa, *(u64 *)val);
  
  +   r = vcpu_mmio_write(vcpu, gpa, bytes, val, &ext_data);
  
  /*
  
   * Is this MMIO handled locally?
   */
  
  -   if (!vcpu_mmio_write(vcpu, gpa, bytes, val, &ext_data))
  +   if (!r)
  
  return X86EMUL_CONTINUE;
  
  -   vcpu->mmio_needed = 1;
  -   vcpu->run->exit_reason = KVM_EXIT_MMIO;
  -   vcpu->run->mmio.phys_addr = vcpu->mmio_phys_addr = gpa;
  -   vcpu->run->mmio.len = vcpu->mmio_size = bytes;
  -   vcpu->run->mmio.is_write = vcpu->mmio_is_write = 1;
  -   memcpy(vcpu->run->mmio.data, val, bytes);
  +   if (r == -ENOTSYNC) {
 
 Can you replace -ENOTSYNC with KVM_EXIT_MSIX_ROUTING_UPDATE all over
 please?

How about letting Avi/Marcelo decide?
 
  +   vcpu->userspace_exit_needed = 1;
  +   vcpu->run->exit_reason = KVM_EXIT_MSIX_ROUTING_UPDATE;
  +   vcpu->run->msix_routing.dev_id =
  +   ext_data.msix_routing.dev_id;
  +   vcpu->run->msix_routing.type

Re: [PATCH 2/4] KVM: Add kvm_io_ext_data to IO handler

2011-02-27 Thread Sheng Yang
On Friday 25 February 2011 16:12:30 Michael S. Tsirkin wrote:
 On Fri, Feb 25, 2011 at 11:23:30AM +0800, Sheng Yang wrote:
  On Thursday 24 February 2011 18:22:19 Michael S. Tsirkin wrote:
   On Thu, Feb 24, 2011 at 05:51:03PM +0800, Sheng Yang wrote:
Add a new parameter to IO writing handler, so that we can transfer
information from IO handler to caller.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---

 arch/x86/kvm/i8254.c  |6 --
 arch/x86/kvm/i8259.c  |3 ++-
 arch/x86/kvm/lapic.c  |3 ++-
 arch/x86/kvm/x86.c|   13 -
 include/linux/kvm_host.h  |   12 ++--
 virt/kvm/coalesced_mmio.c |3 ++-
 virt/kvm/eventfd.c|2 +-
 virt/kvm/ioapic.c |2 +-
 virt/kvm/iodev.h  |6 --
 virt/kvm/kvm_main.c   |4 ++--
 10 files changed, 36 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index efad723..bd8f0c5 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -439,7 +439,8 @@ static inline int pit_in_range(gpa_t addr)

 }
 
 static int pit_ioport_write(struct kvm_io_device *this,

-   gpa_t addr, int len, const void *data)
+   gpa_t addr, int len, const void *data,
+   struct kvm_io_ext_data *ext_data)

 {
 
struct kvm_pit *pit = dev_to_pit(this);
struct kvm_kpit_state *pit_state = &pit->pit_state;

@@ -585,7 +586,8 @@ static int pit_ioport_read(struct kvm_io_device
*this,

 }
 
 static int speaker_ioport_write(struct kvm_io_device *this,

-   gpa_t addr, int len, const void *data)
+   gpa_t addr, int len, const void *data,
+   struct kvm_io_ext_data *ext_data)

 {
 
struct kvm_pit *pit = speaker_to_pit(this);
struct kvm_kpit_state *pit_state = &pit->pit_state;

diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index 3cece05..96b1070 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -480,7 +480,8 @@ static inline struct kvm_pic *to_pic(struct
kvm_io_device *dev)

 }
 
 static int picdev_write(struct kvm_io_device *this,

-gpa_t addr, int len, const void *val)
+gpa_t addr, int len, const void *val,
+struct kvm_io_ext_data *ext_data)

 {
 
struct kvm_pic *s = to_pic(this);
unsigned char data = *(unsigned char *)val;

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 93cf9d0..f413e9c 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -836,7 +836,8 @@ static int apic_reg_write(struct kvm_lapic *apic,
u32 reg, u32 val)

 }
 
 static int apic_mmio_write(struct kvm_io_device *this,

-   gpa_t address, int len, const void *data)
+   gpa_t address, int len, const void *data,
+   struct kvm_io_ext_data *ext_data)

 {
 
struct kvm_lapic *apic = to_lapic(this);
unsigned int offset = address - apic->base_address;

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fa708c9..21b84e2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3571,13 +3571,14 @@ static void kvm_init_msr_list(void)

 }
 
 static int vcpu_mmio_write(struct kvm_vcpu *vcpu, gpa_t addr, int
 len,

-  const void *v)
+  const void *v, struct kvm_io_ext_data 
*ext_data)

 {
 
if (vcpu->arch.apic &&

-   !kvm_iodevice_write(&vcpu->arch.apic->dev, addr, len, v))
+   !kvm_iodevice_write(&vcpu->arch.apic->dev, addr, len, v, ext_data))

return 0;

-   return kvm_io_bus_write(vcpu->kvm, KVM_MMIO_BUS, addr, len, v);
+   return kvm_io_bus_write(vcpu->kvm, KVM_MMIO_BUS,
+   addr, len, v, ext_data);

 }
 
 static int vcpu_mmio_read(struct kvm_vcpu *vcpu, gpa_t addr, int
 len, void *v)

@@ -3807,6 +3808,7 @@ static int
emulator_write_emulated_onepage(unsigned long addr,

   struct kvm_vcpu *vcpu)
 
 {
 
gpa_t gpa;

+   struct kvm_io_ext_data ext_data;

gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, exception);

@@ -3825,7 +3827,7 @@ mmio:
/*

 * Is this MMIO handled locally?
 */

-   if (!vcpu_mmio_write(vcpu, gpa

Re: [PATCH 3/4] KVM: Emulate MSI-X table in kernel

2011-02-27 Thread Sheng Yang
On Friday 25 February 2011 16:29:38 Michael S. Tsirkin wrote:
 On Fri, Feb 25, 2011 at 02:28:02PM +0800, Sheng Yang wrote:
  On Thursday 24 February 2011 18:45:08 Michael S. Tsirkin wrote:
   On Thu, Feb 24, 2011 at 05:51:04PM +0800, Sheng Yang wrote:
Then we can support mask bit operation of assigned devices now.

Signed-off-by: Sheng Yang sh...@linux.intel.com
   
   Doesn't look like all comments got addressed.
   E.g. gpa_t entry_base is still there and in reality
   you said it's a host virtual address so
   should be void __user *;
  
  Would update it.
  
   And ENOTSYNC meaning 'MSIX' is pretty hacky.
  
  I'd like to discuss it later. We may need some work on all MMIO handling
  side to make it more straightforward. But I don't want to bundle it with
  this one...
 
 It's not PCI related so I'll defer to Avi/Marcelo on this.
 Are you guys happy with the ENOTSYNC meaning 'MSIX'
 and userspace_exit_needed hacks in this code?
 
---

 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/Makefile   |2 +-
 arch/x86/kvm/mmu.c  |2 +
 arch/x86/kvm/x86.c  |   40 -
 include/linux/kvm.h |   28 
 include/linux/kvm_host.h|   34 +
 virt/kvm/assigned-dev.c |   44 ++
 virt/kvm/kvm_main.c |   38 +-
 virt/kvm/msix_mmio.c|  296 +++
 virt/kvm/msix_mmio.h|   25 
 
 10 files changed, 497 insertions(+), 13 deletions(-)
 create mode 100644 virt/kvm/msix_mmio.c
 create mode 100644 virt/kvm/msix_mmio.h

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index aa75f21..4a390a4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -635,6 +635,7 @@ enum emulation_result {

EMULATE_DONE,   /* no further processing */
EMULATE_DO_MMIO,  /* kvm_run filled with mmio request */
EMULATE_FAIL, /* can't emulate this instruction */

+   EMULATE_USERSPACE_EXIT, /* we need exit to userspace */

 };
 
 #define EMULTYPE_NO_DECODE (1 << 0)

diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index f15501f..3a0d851 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -7,7 +7,7 @@ CFLAGS_vmx.o := -I.

 kvm-y  += $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o \
coalesced_mmio.o irq_comm.o eventfd.o \

-   assigned-dev.o)
+   assigned-dev.o msix_mmio.o)

 kvm-$(CONFIG_IOMMU_API)+= $(addprefix ../../../virt/kvm/, iommu.o)
 kvm-$(CONFIG_KVM_ASYNC_PF) += $(addprefix ../../../virt/kvm/, async_pf.o)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9cafbb4..912dca4 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3358,6 +3358,8 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu,
gva_t cr2, u32 error_code,

case EMULATE_DO_MMIO:
++vcpu->stat.mmio_exits;
/* fall through */

+   case EMULATE_USERSPACE_EXIT:
+   /* fall through */

case EMULATE_FAIL:
return 0;

default:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 21b84e2..87308eb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1966,6 +1966,7 @@ int kvm_dev_ioctl_check_extension(long ext)

case KVM_CAP_X86_ROBUST_SINGLESTEP:
case KVM_CAP_XSAVE:

case KVM_CAP_ASYNC_PF:
+   case KVM_CAP_MSIX_MMIO:
r = 1;
break;

case KVM_CAP_COALESCED_MMIO:
@@ -3809,6 +3810,7 @@ static int
emulator_write_emulated_onepage(unsigned long addr,

 {
 
gpa_t gpa;
struct kvm_io_ext_data ext_data;

+   int r;

gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, exception);

@@ -3824,18 +3826,32 @@ static int
emulator_write_emulated_onepage(unsigned long addr,

 mmio:
trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, bytes, gpa, *(u64 *)val);

+   r = vcpu_mmio_write(vcpu, gpa, bytes, val, &ext_data);

/*

 * Is this MMIO handled locally?
 */

-   if (!vcpu_mmio_write(vcpu, gpa, bytes, val, &ext_data))
+   if (!r)

return X86EMUL_CONTINUE;

-   vcpu->mmio_needed = 1;
-   vcpu->run->exit_reason = KVM_EXIT_MMIO;
-   vcpu->run->mmio.phys_addr = vcpu->mmio_phys_addr = gpa;
-   vcpu->run->mmio.len = vcpu->mmio_size = bytes

[PATCH 2/4] KVM: Add kvm_io_ext_data to IO handler

2011-02-27 Thread Sheng Yang
Add a new parameter to IO writing handler, so that we can transfer information
from IO handler to caller.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 arch/x86/kvm/i8254.c  |6 --
 arch/x86/kvm/i8259.c  |3 ++-
 arch/x86/kvm/lapic.c  |3 ++-
 arch/x86/kvm/x86.c|   13 -
 include/linux/kvm_host.h  |9 +++--
 virt/kvm/coalesced_mmio.c |3 ++-
 virt/kvm/eventfd.c|2 +-
 virt/kvm/ioapic.c |2 +-
 virt/kvm/iodev.h  |6 --
 virt/kvm/kvm_main.c   |4 ++--
 10 files changed, 33 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index efad723..bd8f0c5 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -439,7 +439,8 @@ static inline int pit_in_range(gpa_t addr)
 }
 
 static int pit_ioport_write(struct kvm_io_device *this,
-   gpa_t addr, int len, const void *data)
+   gpa_t addr, int len, const void *data,
+   struct kvm_io_ext_data *ext_data)
 {
struct kvm_pit *pit = dev_to_pit(this);
struct kvm_kpit_state *pit_state = &pit->pit_state;
@@ -585,7 +586,8 @@ static int pit_ioport_read(struct kvm_io_device *this,
 }
 
 static int speaker_ioport_write(struct kvm_io_device *this,
-   gpa_t addr, int len, const void *data)
+   gpa_t addr, int len, const void *data,
+   struct kvm_io_ext_data *ext_data)
 {
struct kvm_pit *pit = speaker_to_pit(this);
struct kvm_kpit_state *pit_state = &pit->pit_state;
diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index 3cece05..96b1070 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -480,7 +480,8 @@ static inline struct kvm_pic *to_pic(struct kvm_io_device *dev)
 }
 
 static int picdev_write(struct kvm_io_device *this,
-gpa_t addr, int len, const void *val)
+gpa_t addr, int len, const void *val,
+struct kvm_io_ext_data *ext_data)
 {
struct kvm_pic *s = to_pic(this);
unsigned char data = *(unsigned char *)val;
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 93cf9d0..f413e9c 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -836,7 +836,8 @@ static int apic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 }
 
 static int apic_mmio_write(struct kvm_io_device *this,
-   gpa_t address, int len, const void *data)
+   gpa_t address, int len, const void *data,
+   struct kvm_io_ext_data *ext_data)
 {
struct kvm_lapic *apic = to_lapic(this);
unsigned int offset = address - apic->base_address;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fa708c9..21b84e2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3571,13 +3571,14 @@ static void kvm_init_msr_list(void)
 }
 
 static int vcpu_mmio_write(struct kvm_vcpu *vcpu, gpa_t addr, int len,
-  const void *v)
+  const void *v, struct kvm_io_ext_data *ext_data)
 {
if (vcpu->arch.apic &&
-   !kvm_iodevice_write(&vcpu->arch.apic->dev, addr, len, v))
+   !kvm_iodevice_write(&vcpu->arch.apic->dev, addr, len, v, ext_data))
return 0;
 
-   return kvm_io_bus_write(vcpu->kvm, KVM_MMIO_BUS, addr, len, v);
+   return kvm_io_bus_write(vcpu->kvm, KVM_MMIO_BUS,
+   addr, len, v, ext_data);
 }
 
 static int vcpu_mmio_read(struct kvm_vcpu *vcpu, gpa_t addr, int len, void *v)
@@ -3807,6 +3808,7 @@ static int emulator_write_emulated_onepage(unsigned long addr,
   struct kvm_vcpu *vcpu)
 {
gpa_t gpa;
+   struct kvm_io_ext_data ext_data;
 
gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, exception);
 
@@ -3825,7 +3827,7 @@ mmio:
/*
 * Is this MMIO handled locally?
 */
-   if (!vcpu_mmio_write(vcpu, gpa, bytes, val))
+   if (!vcpu_mmio_write(vcpu, gpa, bytes, val, &ext_data))
return X86EMUL_CONTINUE;
 
vcpu->mmio_needed = 1;
@@ -3940,6 +3942,7 @@ static int kernel_pio(struct kvm_vcpu *vcpu, void *pd)
 {
/* TODO: String I/O for in kernel device */
int r;
+   struct kvm_io_ext_data ext_data;
 
if (vcpu->arch.pio.in)
r = kvm_io_bus_read(vcpu->kvm, KVM_PIO_BUS, vcpu->arch.pio.port,
@@ -3947,7 +3950,7 @@ static int kernel_pio(struct kvm_vcpu *vcpu, void *pd)
else
r = kvm_io_bus_write(vcpu->kvm, KVM_PIO_BUS,
 vcpu->arch.pio.port, vcpu->arch.pio.size,
-pd);
+pd, &ext_data);
return r;
 }
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7d313e0..a32c53e 100644
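
As an illustration only, the out-parameter pattern this patch introduces can be sketched in user space as follows: a write handler fills an extra structure and the caller inspects it afterwards. Every name prefixed with sketch_ is hypothetical. The msix_routing fields mirror the ones referenced in patch 3/4, and SKETCH_ENOTSYNC stands in for the kernel-internal ENOTSYNC, whose value (522) is an assumption here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel-internal ENOTSYNC errno (assumed value). */
#define SKETCH_ENOTSYNC 522

/* Hypothetical mirror of struct kvm_io_ext_data; only the msix_routing
 * fields referenced in patch 3/4 are modelled. */
struct sketch_io_ext_data {
	struct {
		uint32_t dev_id;
		uint16_t type;
		uint16_t entry_idx;
		uint64_t flags;
	} msix_routing;
};

/* Write handler: describes the modified entry in *ext_data and asks the
 * caller to exit to userspace by returning -SKETCH_ENOTSYNC. */
int sketch_msix_write(uint32_t dev_id, uint16_t entry_idx,
		      struct sketch_io_ext_data *ext_data)
{
	assert(ext_data != NULL);
	ext_data->msix_routing.dev_id = dev_id;
	ext_data->msix_routing.type = 0;
	ext_data->msix_routing.entry_idx = entry_idx;
	ext_data->msix_routing.flags = 0;
	return -SKETCH_ENOTSYNC;
}

/* Caller: returns 1 and copies the routing info to *out when the handler
 * requested a userspace exit, 0 otherwise. */
int sketch_caller(uint32_t dev_id, uint16_t entry_idx,
		  struct sketch_io_ext_data *out)
{
	struct sketch_io_ext_data ext_data = { { 0, 0, 0, 0 } };
	int r = sketch_msix_write(dev_id, entry_idx, &ext_data);

	if (r != -SKETCH_ENOTSYNC)
		return 0;
	*out = ext_data;
	return 1;
}
```

The sketch zero-initialises ext_data in the caller so that a handler which ignores the parameter leaves it in a defined state; whether the kernel code needs the same is a separate question.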

[PATCH 0/4 v11] MSI-X MMIO support for KVM

2011-02-27 Thread Sheng Yang
Change from v9:
Update according to the comments of Alex and Michael.

Note that this patchset is still based on 2.6.37, due to a blocking bug on
assigned devices in upstream at the moment.

Sheng Yang (4):
  KVM: Move struct kvm_io_device to kvm_host.h
  KVM: Add kvm_io_ext_data to IO handler
  KVM: Emulate MSI-X table in kernel
  KVM: Add documents for MSI-X MMIO API

 Documentation/kvm/api.txt   |   58 
 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/Makefile   |2 +-
 arch/x86/kvm/i8254.c|6 +-
 arch/x86/kvm/i8259.c|3 +-
 arch/x86/kvm/lapic.c|3 +-
 arch/x86/kvm/mmu.c  |2 +
 arch/x86/kvm/x86.c  |   51 +--
 include/linux/kvm.h |   28 
 include/linux/kvm_host.h|   66 +-
 virt/kvm/assigned-dev.c |   44 ++
 virt/kvm/coalesced_mmio.c   |3 +-
 virt/kvm/eventfd.c  |2 +-
 virt/kvm/ioapic.c   |2 +-
 virt/kvm/iodev.h|   31 +
 virt/kvm/kvm_main.c |   40 +-
 virt/kvm/msix_mmio.c|  301 +++
 virt/kvm/msix_mmio.h|   25 
 18 files changed, 616 insertions(+), 52 deletions(-)
 create mode 100644 virt/kvm/msix_mmio.c
 create mode 100644 virt/kvm/msix_mmio.h



[PATCH 3/4] KVM: Emulate MSI-X table in kernel

2011-02-27 Thread Sheng Yang
With the MSI-X table emulated in the kernel, we can now support mask bit
operations for assigned devices.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/Makefile   |2 +-
 arch/x86/kvm/mmu.c  |2 +
 arch/x86/kvm/x86.c  |   40 -
 include/linux/kvm.h |   28 
 include/linux/kvm_host.h|   36 +
 virt/kvm/assigned-dev.c |   44 ++
 virt/kvm/kvm_main.c |   38 +-
 virt/kvm/msix_mmio.c|  301 +++
 virt/kvm/msix_mmio.h|   25 
 10 files changed, 504 insertions(+), 13 deletions(-)
 create mode 100644 virt/kvm/msix_mmio.c
 create mode 100644 virt/kvm/msix_mmio.h

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index aa75f21..4a390a4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -635,6 +635,7 @@ enum emulation_result {
EMULATE_DONE,   /* no further processing */
EMULATE_DO_MMIO,  /* kvm_run filled with mmio request */
EMULATE_FAIL, /* can't emulate this instruction */
+   EMULATE_USERSPACE_EXIT, /* we need exit to userspace */
 };
 
 #define EMULTYPE_NO_DECODE (1 << 0)
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index f15501f..3a0d851 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -7,7 +7,7 @@ CFLAGS_vmx.o := -I.
 
 kvm-y  += $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o \
coalesced_mmio.o irq_comm.o eventfd.o \
-   assigned-dev.o)
+   assigned-dev.o msix_mmio.o)
 kvm-$(CONFIG_IOMMU_API)+= $(addprefix ../../../virt/kvm/, iommu.o)
 kvm-$(CONFIG_KVM_ASYNC_PF) += $(addprefix ../../../virt/kvm/, async_pf.o)
 
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9cafbb4..912dca4 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3358,6 +3358,8 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u32 error_code,
case EMULATE_DO_MMIO:
++vcpu->stat.mmio_exits;
/* fall through */
+   case EMULATE_USERSPACE_EXIT:
+   /* fall through */
case EMULATE_FAIL:
return 0;
default:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 21b84e2..87308eb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1966,6 +1966,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_X86_ROBUST_SINGLESTEP:
case KVM_CAP_XSAVE:
case KVM_CAP_ASYNC_PF:
+   case KVM_CAP_MSIX_MMIO:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
@@ -3809,6 +3810,7 @@ static int emulator_write_emulated_onepage(unsigned long addr,
 {
gpa_t gpa;
struct kvm_io_ext_data ext_data;
+   int r;
 
gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, exception);
 
@@ -3824,18 +3826,32 @@ static int emulator_write_emulated_onepage(unsigned long addr,
 
 mmio:
trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, bytes, gpa, *(u64 *)val);
+   r = vcpu_mmio_write(vcpu, gpa, bytes, val, &ext_data);
/*
 * Is this MMIO handled locally?
 */
-   if (!vcpu_mmio_write(vcpu, gpa, bytes, val, &ext_data))
+   if (!r)
return X86EMUL_CONTINUE;
 
-   vcpu->mmio_needed = 1;
-   vcpu->run->exit_reason = KVM_EXIT_MMIO;
-   vcpu->run->mmio.phys_addr = vcpu->mmio_phys_addr = gpa;
-   vcpu->run->mmio.len = vcpu->mmio_size = bytes;
-   vcpu->run->mmio.is_write = vcpu->mmio_is_write = 1;
-   memcpy(vcpu->run->mmio.data, val, bytes);
+   if (r == -ENOTSYNC) {
+   vcpu->userspace_exit_needed = 1;
+   vcpu->run->exit_reason = KVM_EXIT_MSIX_ROUTING_UPDATE;
+   vcpu->run->msix_routing.dev_id =
+   ext_data.msix_routing.dev_id;
+   vcpu->run->msix_routing.type =
+   ext_data.msix_routing.type;
+   vcpu->run->msix_routing.entry_idx =
+   ext_data.msix_routing.entry_idx;
+   vcpu->run->msix_routing.flags =
+   ext_data.msix_routing.flags;
+   } else  {
+   vcpu->mmio_needed = 1;
+   vcpu->run->exit_reason = KVM_EXIT_MMIO;
+   vcpu->run->mmio.phys_addr = vcpu->mmio_phys_addr = gpa;
+   vcpu->run->mmio.len = vcpu->mmio_size = bytes;
+   vcpu->run->mmio.is_write = vcpu->mmio_is_write = 1;
+   memcpy(vcpu->run->mmio.data, val, bytes);
+   }
 
return X86EMUL_CONTINUE;
 }
@@ -4469,6 +4485,8 @@ done:
r = EMULATE_DO_MMIO;
} else if (r == EMULATION_RESTART)
goto restart;
+   else if (vcpu->userspace_exit_needed)
+   r = EMULATE_USERSPACE_EXIT;
else
r = EMULATE_DONE;
 
@@ -5397,12 +5415,18 @@ int
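
For readers unfamiliar with the table being emulated, the following sketch shows how a byte offset into an MSI-X table maps to an entry index and how the mask bit is recognised. It relies only on the standard PCI layout (16-byte entries, Vector Control dword at offset 12, mask in bit 0); the function and macro names are hypothetical and are not taken from msix_mmio.c.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MSIX_ENTRY_SIZE        16u  /* bytes per table entry */
#define SKETCH_MSIX_ENTRY_VECTOR_CTRL 12u  /* offset of Vector Control */
#define SKETCH_MSIX_CTRL_MASKBIT      0x1u /* bit 0: per-vector mask */

/* Index of the table entry that a byte offset into the table falls in. */
uint32_t sketch_entry_index(uint64_t table_offset)
{
	return (uint32_t)(table_offset / SKETCH_MSIX_ENTRY_SIZE);
}

/* Non-zero when the offset addresses the Vector Control dword. */
int sketch_is_vector_ctrl(uint64_t table_offset)
{
	return (table_offset % SKETCH_MSIX_ENTRY_SIZE) ==
	       SKETCH_MSIX_ENTRY_VECTOR_CTRL;
}

/* Non-zero when a value written to Vector Control masks the vector. */
int sketch_write_masks_vector(uint32_t ctrl_value)
{
	return (ctrl_value & SKETCH_MSIX_CTRL_MASKBIT) != 0;
}

/* Byte offset of the Vector Control dword of entry idx. */
uint64_t sketch_vector_ctrl_offset(uint32_t idx)
{
	uint64_t off = (uint64_t)idx * SKETCH_MSIX_ENTRY_SIZE +
		       SKETCH_MSIX_ENTRY_VECTOR_CTRL;

	assert(sketch_entry_index(off) == idx);
	return off;
}
```

A guest write of 1 to offset 0x1c, for example, lands on the Vector Control dword of entry 1 and masks that vector.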

[PATCH 4/4] KVM: Add documents for MSI-X MMIO API

2011-02-27 Thread Sheng Yang

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 Documentation/kvm/api.txt |   58 +
 1 files changed, 58 insertions(+), 0 deletions(-)

diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
index e1a9297..dd10c3b 100644
--- a/Documentation/kvm/api.txt
+++ b/Documentation/kvm/api.txt
@@ -1263,6 +1263,53 @@ struct kvm_assigned_msix_entry {
__u16 padding[3];
 };
 
+4.54 KVM_REGISTER_MSIX_MMIO
+
+Capability: KVM_CAP_MSIX_MMIO
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_msix_mmio_user (in)
+Returns: 0 on success, -1 on error
+
+This API registers the MSI-X MMIO address of a guest device. All MMIO
+operations on that region are then handled by the kernel. When necessary
+(e.g. when the MSI data/address changes), KVM exits to userspace with
+KVM_EXIT_MSIX_ROUTING_UPDATE to report the MMIO modification and to ask
+userspace to update the IRQ routing table.
+
+NOTICE: Writing to the MSI-X MMIO page after it has been registered with
+this API may be dangerous for a userspace program. Writes made while the VM
+is running may cause synchronization issues that keep the assigned device
+from working properly. Writes are allowed while the VM is not running and
+can be used as a save/restore mechanism.
+
+struct kvm_msix_mmio_user {
+   __u32 dev_id;
+   __u16 type; /* Device type and MMIO address type */
+   __u16 max_entries_nr;   /* Maximum entries supported */
+   __u64 base_addr;/* Guest physical address of MMIO */
+   __u64 base_va;  /* Host virtual address of MMIO mapping */
+   __u64 flags;/* Reserved for now */
+   __u64 reserved[4];
+};
+
+Current device type can be:
+#define KVM_MSIX_MMIO_TYPE_ASSIGNED_DEV (1 << 0)
+
+Current MMIO type can be:
+#define KVM_MSIX_MMIO_TYPE_BASE_TABLE  (1 << 8)
+
+4.55 KVM_UNREGISTER_MSIX_MMIO
+
+Capability: KVM_CAP_MSIX_MMIO
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_msix_mmio_user (in)
+Returns: 0 on success, -1 on error
+
+This API unregisters the MSI-X MMIO region identified by the dev_id and
+type fields of struct kvm_msix_mmio_user.
+
 5. The kvm_run structure
 
 Application code obtains a pointer to the kvm_run structure by
@@ -1445,6 +1492,17 @@ Userspace can now handle the hypercall and when it's done modify the gprs as
 necessary. Upon guest entry all guest GPRs will then be replaced by the values
 in this struct.
 
+   /* KVM_EXIT_MSIX_ROUTING_UPDATE */
+   struct {
+   __u32 dev_id;
+   __u16 type;
+   __u16 entry_idx;
+   __u64 flags;
+   } msix_routing;
+
+KVM_EXIT_MSIX_ROUTING_UPDATE indicates that one MSI-X entry has been modified
+and that userspace needs to update the corresponding routing table entry.
+
/* Fix the size of the union. */
char padding[256];
};
-- 
1.7.0.1
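
A rough userspace sketch of preparing the registration request documented above might look like this. The structure and the two type bits are local copies of the definitions proposed in this patch, renamed with a sketch_ prefix because they are not assumed to exist in any installed <linux/kvm.h>. The ioctl call itself is left out, since the ioctl number is not part of this excerpt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local copies of the proposed definitions (hypothetical names). */
#define SKETCH_MSIX_MMIO_TYPE_ASSIGNED_DEV (1 << 0)
#define SKETCH_MSIX_MMIO_TYPE_BASE_TABLE   (1 << 8)

struct sketch_msix_mmio_user {
	uint32_t dev_id;
	uint16_t type;           /* Device type and MMIO address type */
	uint16_t max_entries_nr; /* Maximum entries supported */
	uint64_t base_addr;      /* Guest physical address of MMIO */
	uint64_t base_va;        /* Host virtual address of MMIO mapping */
	uint64_t flags;          /* Reserved for now */
	uint64_t reserved[4];
};

/* Fill a request for the MSI-X table of an assigned device. Reserved
 * fields and flags are zeroed, as is usual for extensible ioctl structs. */
void sketch_fill_msix_mmio(struct sketch_msix_mmio_user *req,
			   uint32_t dev_id, uint16_t nr_entries,
			   uint64_t guest_pa, void *host_va)
{
	assert(req != NULL);
	memset(req, 0, sizeof(*req));
	req->dev_id = dev_id;
	req->type = SKETCH_MSIX_MMIO_TYPE_ASSIGNED_DEV |
		    SKETCH_MSIX_MMIO_TYPE_BASE_TABLE;
	req->max_entries_nr = nr_entries;
	req->base_addr = guest_pa;
	req->base_va = (uint64_t)(uintptr_t)host_va;
}
```

According to the text above, the filled structure would then be passed to the KVM_REGISTER_MSIX_MMIO vm ioctl, and the same dev_id and type would later identify the region for KVM_UNREGISTER_MSIX_MMIO.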



[PATCH 1/4] KVM: Move struct kvm_io_device to kvm_host.h

2011-02-27 Thread Sheng Yang
This allows other structs in kvm_host.h to use it.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 include/linux/kvm_host.h |   23 +++
 virt/kvm/iodev.h |   25 +
 2 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b5021db..7d313e0 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -98,6 +98,29 @@ int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
 int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
 #endif
 
+struct kvm_io_device;
+
+/**
+ * kvm_io_device_ops are called under kvm slots_lock.
+ * read and write handlers return 0 if the transaction has been handled,
+ * or non-zero to have it passed to the next device.
+ **/
+struct kvm_io_device_ops {
+   int (*read)(struct kvm_io_device *this,
+   gpa_t addr,
+   int len,
+   void *val);
+   int (*write)(struct kvm_io_device *this,
+gpa_t addr,
+int len,
+const void *val);
+   void (*destructor)(struct kvm_io_device *this);
+};
+
+struct kvm_io_device {
+   const struct kvm_io_device_ops *ops;
+};
+
 struct kvm_vcpu {
struct kvm *kvm;
 #ifdef CONFIG_PREEMPT_NOTIFIERS
diff --git a/virt/kvm/iodev.h b/virt/kvm/iodev.h
index 12fd3ca..d1f5651 100644
--- a/virt/kvm/iodev.h
+++ b/virt/kvm/iodev.h
@@ -17,32 +17,9 @@
 #define __KVM_IODEV_H__
 
 #include <linux/kvm_types.h>
+#include <linux/kvm_host.h>
 #include <asm/errno.h>
 
-struct kvm_io_device;
-
-/**
- * kvm_io_device_ops are called under kvm slots_lock.
- * read and write handlers return 0 if the transaction has been handled,
- * or non-zero to have it passed to the next device.
- **/
-struct kvm_io_device_ops {
-   int (*read)(struct kvm_io_device *this,
-   gpa_t addr,
-   int len,
-   void *val);
-   int (*write)(struct kvm_io_device *this,
-gpa_t addr,
-int len,
-const void *val);
-   void (*destructor)(struct kvm_io_device *this);
-};
-
-
-struct kvm_io_device {
-   const struct kvm_io_device_ops *ops;
-};
-
 static inline void kvm_iodevice_init(struct kvm_io_device *dev,
 const struct kvm_io_device_ops *ops)
 {
-- 
1.7.0.1
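
The comment moved by this patch says that a handler returns 0 when it has handled the transaction and non-zero to pass it on to the next device. A minimal user-space sketch of a bus honouring that contract is shown below. All sketch_ names are hypothetical, and returning -EOPNOTSUPP when nobody claims the access is an assumption modelled on the kernel's I/O bus helpers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sketch_gpa_t;

struct sketch_io_device;

struct sketch_io_device_ops {
	int (*write)(struct sketch_io_device *this, sketch_gpa_t addr,
		     int len, const void *val);
};

struct sketch_io_device {
	const struct sketch_io_device_ops *ops;
};

/* Example handler: claims only addresses in [0x1000, 0x2000). */
int sketch_page_write(struct sketch_io_device *this, sketch_gpa_t addr,
		      int len, const void *val)
{
	(void)this;
	(void)len;
	(void)val;
	return (addr >= 0x1000 && addr < 0x2000) ? 0 : -EOPNOTSUPP;
}

const struct sketch_io_device_ops sketch_page_ops = {
	.write = sketch_page_write,
};

/* Offer the write to each device in turn; stop at the first one that
 * returns 0. A device without a write handler is skipped. */
int sketch_bus_write(struct sketch_io_device **devs, size_t n,
		     sketch_gpa_t addr, int len, const void *val)
{
	size_t i;

	assert(devs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (devs[i]->ops->write &&
		    devs[i]->ops->write(devs[i], addr, len, val) == 0)
			return 0;
	}
	return -EOPNOTSUPP;
}
```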



Re: [PATCH 2/3] KVM: Emulate MSI-X table in kernel

2011-02-24 Thread Sheng Yang
On Wednesday 23 February 2011 16:45:37 Michael S. Tsirkin wrote:
 On Wed, Feb 23, 2011 at 02:59:04PM +0800, Sheng Yang wrote:
  On Wednesday 23 February 2011 08:19:21 Alex Williamson wrote:
   On Sun, 2011-01-30 at 13:11 +0800, Sheng Yang wrote:
Then we can support mask bit operation of assigned devices now.
   
   Looks pretty good overall.  A few comments below.  It seems like we
   should be able to hook this into vfio with a small stub in kvm.  We
   just need to be able to communicate disabling and enabling of
   individual msix vectors.  For brute force, we could do this with a lot
   of eventfds, triggered by kvm and consumed by vfio, two per MSI-X
   vector.  Not sure if there's something smaller that could do it. 
   Thanks,
  
  Alex, thanks for your comments. See my comments below:
   Alex
   
Signed-off-by: Sheng Yang sh...@linux.intel.com
---

 arch/x86/kvm/Makefile|2 +-
 arch/x86/kvm/x86.c   |8 +-
 include/linux/kvm.h  |   21 
 include/linux/kvm_host.h |   25 
 virt/kvm/assigned-dev.c  |   44 +++
 virt/kvm/kvm_main.c  |   38 ++-
 virt/kvm/msix_mmio.c |  286 ++
 virt/kvm/msix_mmio.h |   25 
 
 8 files changed, 442 insertions(+), 7 deletions(-)
 create mode 100644 virt/kvm/msix_mmio.c
 create mode 100644 virt/kvm/msix_mmio.h

diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index f15501f..3a0d851 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -7,7 +7,7 @@ CFLAGS_vmx.o := -I.

 kvm-y  += $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o \
coalesced_mmio.o irq_comm.o eventfd.o \

-   assigned-dev.o)
+   assigned-dev.o msix_mmio.o)

 kvm-$(CONFIG_IOMMU_API)+= $(addprefix ../../../virt/kvm/, iommu.o)
 kvm-$(CONFIG_KVM_ASYNC_PF) += $(addprefix ../../../virt/kvm/, async_pf.o)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fa708c9..89bf12c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1966,6 +1966,7 @@ int kvm_dev_ioctl_check_extension(long ext)

case KVM_CAP_X86_ROBUST_SINGLESTEP:
case KVM_CAP_XSAVE:

case KVM_CAP_ASYNC_PF:
+   case KVM_CAP_MSIX_MMIO:
r = 1;
break;

case KVM_CAP_COALESCED_MMIO:
@@ -3807,6 +3808,7 @@ static int
emulator_write_emulated_onepage(unsigned long addr,

   struct kvm_vcpu *vcpu)
 
 {
 
gpa_t gpa;

+   int r;

gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, exception);

@@ -3822,14 +3824,16 @@ static int
emulator_write_emulated_onepage(unsigned long addr,

 mmio:
trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, bytes, gpa, *(u64 *)val);

+   r = vcpu_mmio_write(vcpu, gpa, bytes, val);

/*

 * Is this MMIO handled locally?
 */

-   if (!vcpu_mmio_write(vcpu, gpa, bytes, val))
+   if (!r)

return X86EMUL_CONTINUE;

vcpu->mmio_needed = 1;

-   vcpu->run->exit_reason = KVM_EXIT_MMIO;
+   vcpu->run->exit_reason = (r == -ENOTSYNC) ?
+   KVM_EXIT_MSIX_ROUTING_UPDATE : KVM_EXIT_MMIO;
 
 This use of -ENOTSYNC is IMO confusing.
 How about we make vcpu_mmio_write return the positive exit reason?
 Negative value will mean an error.

Makes sense. I will update it.
 
vcpu->run->mmio.phys_addr = vcpu->mmio_phys_addr = gpa;
vcpu->run->mmio.len = vcpu->mmio_size = bytes;
vcpu->run->mmio.is_write = vcpu->mmio_is_write = 1;

diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index ea2dc1a..ad9df4b 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -161,6 +161,7 @@ struct kvm_pit_config {

 #define KVM_EXIT_NMI  16
 #define KVM_EXIT_INTERNAL_ERROR   17
 #define KVM_EXIT_OSI  18

+#define KVM_EXIT_MSIX_ROUTING_UPDATE 19

 /* For KVM_EXIT_INTERNAL_ERROR */
 #define KVM_INTERNAL_ERROR_EMULATION 1

@@ -541,6 +542,7 @@ struct kvm_ppc_pvinfo {

 #define KVM_CAP_PPC_GET_PVINFO 57
 #define KVM_CAP_PPC_IRQ_LEVEL 58
 #define KVM_CAP_ASYNC_PF 59

+#define KVM_CAP_MSIX_MMIO 60

 #ifdef KVM_CAP_IRQ_ROUTING

@@ -672,6 +674,9 @@ struct kvm_clock_data {

 #define KVM_XEN_HVM_CONFIG _IOW(KVMIO,  0x7a, struct kvm_xen_hvm_config)
 #define KVM_SET_CLOCK _IOW(KVMIO,  0x7b, struct kvm_clock_data)
 #define KVM_GET_CLOCK _IOR(KVMIO,  0x7c, struct

Re: [PATCH 2/3] KVM: Emulate MSI-X table in kernel

2011-02-24 Thread Sheng Yang
On Wednesday 23 February 2011 16:45:37 Michael S. Tsirkin wrote:
 On Wed, Feb 23, 2011 at 02:59:04PM +0800, Sheng Yang wrote:
  On Wednesday 23 February 2011 08:19:21 Alex Williamson wrote:
   On Sun, 2011-01-30 at 13:11 +0800, Sheng Yang wrote:
Then we can support mask bit operation of assigned devices now.
   
   Looks pretty good overall.  A few comments below.  It seems like we
   should be able to hook this into vfio with a small stub in kvm.  We
   just need to be able to communicate disabling and enabling of
   individual msix vectors.  For brute force, we could do this with a lot
   of eventfds, triggered by kvm and consumed by vfio, two per MSI-X
   vector.  Not sure if there's something smaller that could do it. 
   Thanks,
  
  Alex, thanks for your comments. See my comments below:
   Alex
   
Signed-off-by: Sheng Yang sh...@linux.intel.com
---

 arch/x86/kvm/Makefile|2 +-
 arch/x86/kvm/x86.c   |8 +-
 include/linux/kvm.h  |   21 
 include/linux/kvm_host.h |   25 
 virt/kvm/assigned-dev.c  |   44 +++
 virt/kvm/kvm_main.c  |   38 ++-
 virt/kvm/msix_mmio.c |  286 ++
 virt/kvm/msix_mmio.h |   25 
 
 8 files changed, 442 insertions(+), 7 deletions(-)
 create mode 100644 virt/kvm/msix_mmio.c
 create mode 100644 virt/kvm/msix_mmio.h

diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index f15501f..3a0d851 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -7,7 +7,7 @@ CFLAGS_vmx.o := -I.

 kvm-y  += $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o \
coalesced_mmio.o irq_comm.o eventfd.o \

-   assigned-dev.o)
+   assigned-dev.o msix_mmio.o)

 kvm-$(CONFIG_IOMMU_API)+= $(addprefix ../../../virt/kvm/, iommu.o)
 kvm-$(CONFIG_KVM_ASYNC_PF) += $(addprefix ../../../virt/kvm/, async_pf.o)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fa708c9..89bf12c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1966,6 +1966,7 @@ int kvm_dev_ioctl_check_extension(long ext)

case KVM_CAP_X86_ROBUST_SINGLESTEP:
case KVM_CAP_XSAVE:

case KVM_CAP_ASYNC_PF:
+   case KVM_CAP_MSIX_MMIO:
r = 1;
break;

case KVM_CAP_COALESCED_MMIO:
@@ -3807,6 +3808,7 @@ static int
emulator_write_emulated_onepage(unsigned long addr,

   struct kvm_vcpu *vcpu)
 
 {
 
gpa_t gpa;

+   int r;

gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, exception);

@@ -3822,14 +3824,16 @@ static int
emulator_write_emulated_onepage(unsigned long addr,

 mmio:
trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, bytes, gpa, *(u64 *)val);

+   r = vcpu_mmio_write(vcpu, gpa, bytes, val);

/*

 * Is this MMIO handled locally?
 */

-   if (!vcpu_mmio_write(vcpu, gpa, bytes, val))
+   if (!r)

return X86EMUL_CONTINUE;

vcpu->mmio_needed = 1;

-   vcpu->run->exit_reason = KVM_EXIT_MMIO;
+   vcpu->run->exit_reason = (r == -ENOTSYNC) ?
+   KVM_EXIT_MSIX_ROUTING_UPDATE : KVM_EXIT_MMIO;
 
 This use of -ENOTSYNC is IMO confusing.
 How about we make vcpu_mmio_write return the positive exit reason?
 Negative value will mean an error.

In fact, a negative value currently means that something more needs to be done,
the same as for an MMIO exit. For now I think we can keep it, or update them all
later.
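
For comparison, the convention suggested in the quoted review (0 for handled, a positive exit reason to trigger an exit, a negative value for an error) could be sketched as below. This is an illustration under stated assumptions, not code from any version of the patch: the sketch_ names are hypothetical, and the two exit-reason values are copied from KVM_EXIT_MMIO and the proposed KVM_EXIT_MSIX_ROUTING_UPDATE.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Values mirror KVM_EXIT_MMIO and the proposed
 * KVM_EXIT_MSIX_ROUTING_UPDATE; the names are local to this sketch. */
enum {
	SKETCH_EXIT_MMIO = 6,
	SKETCH_EXIT_MSIX_ROUTING_UPDATE = 19
};

enum sketch_action {
	SKETCH_CONTINUE,          /* handled in the kernel */
	SKETCH_EXIT_TO_USERSPACE, /* *exit_reason says why */
	SKETCH_FAIL               /* genuine error */
};

/* Interpret a handler result: 0 means handled, a positive value is the
 * exit reason to report, and a negative value is an errno-style error. */
enum sketch_action sketch_classify(int r, int *exit_reason)
{
	assert(exit_reason != NULL);
	if (r == 0)
		return SKETCH_CONTINUE;
	if (r < 0)
		return SKETCH_FAIL;
	*exit_reason = r;
	return SKETCH_EXIT_TO_USERSPACE;
}
```

With this shape the caller never compares against a specific errno, which is the property the review comment asks for; the cost is that every existing handler returning a negative "not handled" value would have to be audited.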

--
regards
Yang, Sheng


[PATCH 0/4 v10] MSI-X MMIO support for KVM

2011-02-24 Thread Sheng Yang
Change from v8:
1. Fix one MSI-X routing update exit bug.
2. Update according to the comments of Alex and Michael.

Note that this patchset is still based on 2.6.37, due to a blocking bug on
assigned devices in upstream at the moment.

Sheng Yang (4):
  KVM: Move struct kvm_io_device to kvm_host.h
  KVM: Add kvm_io_ext_data to IO handler
  KVM: Emulate MSI-X table in kernel
  KVM: Add documents for MSI-X MMIO API

 Documentation/kvm/api.txt   |   58 
 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/Makefile   |2 +-
 arch/x86/kvm/i8254.c|6 +-
 arch/x86/kvm/i8259.c|3 +-
 arch/x86/kvm/lapic.c|3 +-
 arch/x86/kvm/mmu.c  |2 +
 arch/x86/kvm/x86.c  |   51 +--
 include/linux/kvm.h |   28 
 include/linux/kvm_host.h|   67 +-
 virt/kvm/assigned-dev.c |   44 ++
 virt/kvm/coalesced_mmio.c   |3 +-
 virt/kvm/eventfd.c  |2 +-
 virt/kvm/ioapic.c   |2 +-
 virt/kvm/iodev.h|   31 +
 virt/kvm/kvm_main.c |   40 +-
 virt/kvm/msix_mmio.c|  296 +++
 virt/kvm/msix_mmio.h|   25 
 18 files changed, 612 insertions(+), 52 deletions(-)
 create mode 100644 virt/kvm/msix_mmio.c
 create mode 100644 virt/kvm/msix_mmio.h



[PATCH 2/4] KVM: Add kvm_io_ext_data to IO handler

2011-02-24 Thread Sheng Yang
Add a new parameter to IO writing handler, so that we can transfer information
from IO handler to caller.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 arch/x86/kvm/i8254.c  |6 --
 arch/x86/kvm/i8259.c  |3 ++-
 arch/x86/kvm/lapic.c  |3 ++-
 arch/x86/kvm/x86.c|   13 -
 include/linux/kvm_host.h  |   12 ++--
 virt/kvm/coalesced_mmio.c |3 ++-
 virt/kvm/eventfd.c|2 +-
 virt/kvm/ioapic.c |2 +-
 virt/kvm/iodev.h  |6 --
 virt/kvm/kvm_main.c   |4 ++--
 10 files changed, 36 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index efad723..bd8f0c5 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -439,7 +439,8 @@ static inline int pit_in_range(gpa_t addr)
 }
 
 static int pit_ioport_write(struct kvm_io_device *this,
-   gpa_t addr, int len, const void *data)
+   gpa_t addr, int len, const void *data,
+   struct kvm_io_ext_data *ext_data)
 {
struct kvm_pit *pit = dev_to_pit(this);
struct kvm_kpit_state *pit_state = &pit->pit_state;
@@ -585,7 +586,8 @@ static int pit_ioport_read(struct kvm_io_device *this,
 }
 
 static int speaker_ioport_write(struct kvm_io_device *this,
-   gpa_t addr, int len, const void *data)
+   gpa_t addr, int len, const void *data,
+   struct kvm_io_ext_data *ext_data)
 {
struct kvm_pit *pit = speaker_to_pit(this);
struct kvm_kpit_state *pit_state = &pit->pit_state;
diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index 3cece05..96b1070 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -480,7 +480,8 @@ static inline struct kvm_pic *to_pic(struct kvm_io_device *dev)
 }
 
 static int picdev_write(struct kvm_io_device *this,
-gpa_t addr, int len, const void *val)
+gpa_t addr, int len, const void *val,
+struct kvm_io_ext_data *ext_data)
 {
struct kvm_pic *s = to_pic(this);
unsigned char data = *(unsigned char *)val;
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 93cf9d0..f413e9c 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -836,7 +836,8 @@ static int apic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 }
 
 static int apic_mmio_write(struct kvm_io_device *this,
-   gpa_t address, int len, const void *data)
+   gpa_t address, int len, const void *data,
+   struct kvm_io_ext_data *ext_data)
 {
struct kvm_lapic *apic = to_lapic(this);
unsigned int offset = address - apic->base_address;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fa708c9..21b84e2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3571,13 +3571,14 @@ static void kvm_init_msr_list(void)
 }
 
 static int vcpu_mmio_write(struct kvm_vcpu *vcpu, gpa_t addr, int len,
-  const void *v)
+  const void *v, struct kvm_io_ext_data *ext_data)
 {
if (vcpu->arch.apic &&
-   !kvm_iodevice_write(&vcpu->arch.apic->dev, addr, len, v))
+   !kvm_iodevice_write(&vcpu->arch.apic->dev, addr, len, v, ext_data))
return 0;
 
-   return kvm_io_bus_write(vcpu->kvm, KVM_MMIO_BUS, addr, len, v);
+   return kvm_io_bus_write(vcpu->kvm, KVM_MMIO_BUS,
+   addr, len, v, ext_data);
 }
 
 static int vcpu_mmio_read(struct kvm_vcpu *vcpu, gpa_t addr, int len, void *v)
@@ -3807,6 +3808,7 @@ static int emulator_write_emulated_onepage(unsigned long addr,
   struct kvm_vcpu *vcpu)
 {
gpa_t gpa;
+   struct kvm_io_ext_data ext_data;
 
gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, exception);
 
@@ -3825,7 +3827,7 @@ mmio:
/*
 * Is this MMIO handled locally?
 */
-   if (!vcpu_mmio_write(vcpu, gpa, bytes, val))
+   if (!vcpu_mmio_write(vcpu, gpa, bytes, val, &ext_data))
return X86EMUL_CONTINUE;
 
vcpu->mmio_needed = 1;
@@ -3940,6 +3942,7 @@ static int kernel_pio(struct kvm_vcpu *vcpu, void *pd)
 {
/* TODO: String I/O for in kernel device */
int r;
+   struct kvm_io_ext_data ext_data;
 
if (vcpu->arch.pio.in)
r = kvm_io_bus_read(vcpu->kvm, KVM_PIO_BUS, vcpu->arch.pio.port,
@@ -3947,7 +3950,7 @@ static int kernel_pio(struct kvm_vcpu *vcpu, void *pd)
else
r = kvm_io_bus_write(vcpu->kvm, KVM_PIO_BUS,
 vcpu->arch.pio.port, vcpu->arch.pio.size,
-pd);
+pd, &ext_data);
return r;
 }
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7d313e0..6bb211d 100644

[PATCH 3/4] KVM: Emulate MSI-X table in kernel

2011-02-24 Thread Sheng Yang
Then we can support mask bit operation of assigned devices now.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/Makefile   |2 +-
 arch/x86/kvm/mmu.c  |2 +
 arch/x86/kvm/x86.c  |   40 -
 include/linux/kvm.h |   28 
 include/linux/kvm_host.h|   34 +
 virt/kvm/assigned-dev.c |   44 ++
 virt/kvm/kvm_main.c |   38 +-
 virt/kvm/msix_mmio.c|  296 +++
 virt/kvm/msix_mmio.h|   25 
 10 files changed, 497 insertions(+), 13 deletions(-)
 create mode 100644 virt/kvm/msix_mmio.c
 create mode 100644 virt/kvm/msix_mmio.h

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index aa75f21..4a390a4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -635,6 +635,7 @@ enum emulation_result {
EMULATE_DONE,   /* no further processing */
EMULATE_DO_MMIO,  /* kvm_run filled with mmio request */
EMULATE_FAIL, /* can't emulate this instruction */
+   EMULATE_USERSPACE_EXIT, /* we need exit to userspace */
 };
 
 #define EMULTYPE_NO_DECODE (1 << 0)
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index f15501f..3a0d851 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -7,7 +7,7 @@ CFLAGS_vmx.o := -I.
 
 kvm-y  += $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o \
coalesced_mmio.o irq_comm.o eventfd.o \
-   assigned-dev.o)
+   assigned-dev.o msix_mmio.o)
 kvm-$(CONFIG_IOMMU_API)+= $(addprefix ../../../virt/kvm/, iommu.o)
 kvm-$(CONFIG_KVM_ASYNC_PF) += $(addprefix ../../../virt/kvm/, async_pf.o)
 
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9cafbb4..912dca4 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3358,6 +3358,8 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u32 error_code,
case EMULATE_DO_MMIO:
++vcpu->stat.mmio_exits;
/* fall through */
+   case EMULATE_USERSPACE_EXIT:
+   /* fall through */
case EMULATE_FAIL:
return 0;
default:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 21b84e2..87308eb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1966,6 +1966,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_X86_ROBUST_SINGLESTEP:
case KVM_CAP_XSAVE:
case KVM_CAP_ASYNC_PF:
+   case KVM_CAP_MSIX_MMIO:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
@@ -3809,6 +3810,7 @@ static int emulator_write_emulated_onepage(unsigned long addr,
 {
gpa_t gpa;
struct kvm_io_ext_data ext_data;
+   int r;
 
gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, exception);
 
@@ -3824,18 +3826,32 @@ static int emulator_write_emulated_onepage(unsigned long addr,
 
 mmio:
trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, bytes, gpa, *(u64 *)val);
+   r = vcpu_mmio_write(vcpu, gpa, bytes, val, &ext_data);
/*
 * Is this MMIO handled locally?
 */
-   if (!vcpu_mmio_write(vcpu, gpa, bytes, val, &ext_data))
+   if (!r)
return X86EMUL_CONTINUE;
 
-   vcpu->mmio_needed = 1;
-   vcpu->run->exit_reason = KVM_EXIT_MMIO;
-   vcpu->run->mmio.phys_addr = vcpu->mmio_phys_addr = gpa;
-   vcpu->run->mmio.len = vcpu->mmio_size = bytes;
-   vcpu->run->mmio.is_write = vcpu->mmio_is_write = 1;
-   memcpy(vcpu->run->mmio.data, val, bytes);
+   if (r == -ENOTSYNC) {
+       vcpu->userspace_exit_needed = 1;
+       vcpu->run->exit_reason = KVM_EXIT_MSIX_ROUTING_UPDATE;
+       vcpu->run->msix_routing.dev_id =
+           ext_data.msix_routing.dev_id;
+       vcpu->run->msix_routing.type =
+           ext_data.msix_routing.type;
+       vcpu->run->msix_routing.entry_idx =
+           ext_data.msix_routing.entry_idx;
+       vcpu->run->msix_routing.flags =
+           ext_data.msix_routing.flags;
+   } else  {
+       vcpu->mmio_needed = 1;
+       vcpu->run->exit_reason = KVM_EXIT_MMIO;
+       vcpu->run->mmio.phys_addr = vcpu->mmio_phys_addr = gpa;
+       vcpu->run->mmio.len = vcpu->mmio_size = bytes;
+       vcpu->run->mmio.is_write = vcpu->mmio_is_write = 1;
+       memcpy(vcpu->run->mmio.data, val, bytes);
+   }
 
return X86EMUL_CONTINUE;
 }
@@ -4469,6 +4485,8 @@ done:
r = EMULATE_DO_MMIO;
} else if (r == EMULATION_RESTART)
goto restart;
+   else if (vcpu->userspace_exit_needed)
+   r = EMULATE_USERSPACE_EXIT;
else
r = EMULATE_DONE;
 
@@ -5397,12 +5415,18 @@ int

[PATCH 4/4] KVM: Add documents for MSI-X MMIO API

2011-02-24 Thread Sheng Yang

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 Documentation/kvm/api.txt |   58 +
 1 files changed, 58 insertions(+), 0 deletions(-)

diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
index e1a9297..dd10c3b 100644
--- a/Documentation/kvm/api.txt
+++ b/Documentation/kvm/api.txt
@@ -1263,6 +1263,53 @@ struct kvm_assigned_msix_entry {
__u16 padding[3];
 };
 
+4.54 KVM_REGISTER_MSIX_MMIO
+
+Capability: KVM_CAP_MSIX_MMIO
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_msix_mmio_user (in)
+Returns: 0 on success, -1 on error
+
+This API registers the MSI-X MMIO address of a guest device. From then on, all
+MMIO operations on it are handled by the kernel. When necessary (e.g. MSI
+data/address changed), KVM exits to userspace with
+KVM_EXIT_MSIX_ROUTING_UPDATE to indicate the MMIO modification and requires
+userspace to update the IRQ routing table.
+
+NOTICE: Writing the MSI-X MMIO page after it has been registered with this API
+may be dangerous for a userspace program. Writing while the VM is running may
+cause synchronization issues, so that the assigned device cannot work properly.
+Writing is allowed when the VM is not running and can be used as a
+save/restore mechanism.
+
+struct kvm_msix_mmio_user {
+   __u32 dev_id;
+   __u16 type; /* Device type and MMIO address type */
+   __u16 max_entries_nr;   /* Maximum entries supported */
+   __u64 base_addr;/* Guest physical address of MMIO */
+   __u64 base_va;  /* Host virtual address of MMIO mapping */
+   __u64 flags;/* Reserved for now */
+   __u64 reserved[4];
+};
+
+Current device type can be:
+#define KVM_MSIX_MMIO_TYPE_ASSIGNED_DEV    (1 << 0)
+
+Current MMIO type can be:
+#define KVM_MSIX_MMIO_TYPE_BASE_TABLE  (1 << 8)
+
+4.55 KVM_UNREGISTER_MSIX_MMIO
+
+Capability: KVM_CAP_MSIX_MMIO
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_msix_mmio_user (in)
+Returns: 0 on success, -1 on error
+
+This API unregisters the MSI-X MMIO region identified by the dev_id and
+type fields of struct kvm_msix_mmio_user.
+
 5. The kvm_run structure
 
 Application code obtains a pointer to the kvm_run structure by
@@ -1445,6 +1492,17 @@ Userspace can now handle the hypercall and when it's done modify the gprs as
 necessary. Upon guest entry all guest GPRs will then be replaced by the values
 in this struct.
 
+   /* KVM_EXIT_MSIX_ROUTING_UPDATE*/
+   struct {
+   __u32 dev_id;
+   __u16 type;
+   __u16 entry_idx;
+   __u64 flags;
+   } msix_routing;
+
+KVM_EXIT_MSIX_ROUTING_UPDATE indicates that one MSI-X entry has been modified
+and that userspace needs to update the corresponding routing table.
+
/* Fix the size of the union. */
char padding[256];
};
-- 
1.7.0.1
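
For readers following the proposed api.txt text, here is a minimal sketch of how userspace might fill in the registration request. It is not part of the series: the struct is a local mirror of the proposed struct kvm_msix_mmio_user, msix_mmio_fill() is a hypothetical helper, and the two flag values are the ones quoted in the documentation patch. The ioctl request number is not shown in this posting, so the actual ioctl() call is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Flag values as quoted in the proposed api.txt text. */
#define KVM_MSIX_MMIO_TYPE_ASSIGNED_DEV (1 << 0)
#define KVM_MSIX_MMIO_TYPE_BASE_TABLE   (1 << 8)

/*
 * Local mirror of the proposed struct kvm_msix_mmio_user (assumption:
 * a real program would get it from <linux/kvm.h> on a kernel carrying
 * this series).
 */
struct msix_mmio_user {
	uint32_t dev_id;
	uint16_t type;           /* device type and MMIO address type */
	uint16_t max_entries_nr; /* maximum entries supported */
	uint64_t base_addr;      /* guest physical address of MMIO */
	uint64_t base_va;        /* host virtual address of MMIO mapping */
	uint64_t flags;          /* reserved for now */
	uint64_t reserved[4];
};

/* Hypothetical helper: build a request for an assigned device's MSI-X table. */
static void msix_mmio_fill(struct msix_mmio_user *req, uint32_t dev_id,
			   uint16_t nr_entries, uint64_t guest_pa,
			   void *host_va)
{
	assert(req != NULL);

	/* Zero everything first so flags and reserved[] stay clear. */
	memset(req, 0, sizeof(*req));
	req->dev_id = dev_id;
	req->type = KVM_MSIX_MMIO_TYPE_ASSIGNED_DEV |
		    KVM_MSIX_MMIO_TYPE_BASE_TABLE;
	req->max_entries_nr = nr_entries;
	req->base_addr = guest_pa;
	req->base_va = (uint64_t)(uintptr_t)host_va;
}
```

The filled request would then be passed to the KVM_REGISTER_MSIX_MMIO vm ioctl; the same dev_id and type pair is what KVM_UNREGISTER_MSIX_MMIO uses to find the region again.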



[PATCH 1/4] KVM: Move struct kvm_io_device to kvm_host.h

2011-02-24 Thread Sheng Yang
Then it can be used by other structs in kvm_host.h.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 include/linux/kvm_host.h |   23 +++
 virt/kvm/iodev.h |   25 +
 2 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b5021db..7d313e0 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -98,6 +98,29 @@ int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
 int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
 #endif
 
+struct kvm_io_device;
+
+/**
+ * kvm_io_device_ops are called under kvm slots_lock.
+ * read and write handlers return 0 if the transaction has been handled,
+ * or non-zero to have it passed to the next device.
+ **/
+struct kvm_io_device_ops {
+   int (*read)(struct kvm_io_device *this,
+   gpa_t addr,
+   int len,
+   void *val);
+   int (*write)(struct kvm_io_device *this,
+gpa_t addr,
+int len,
+const void *val);
+   void (*destructor)(struct kvm_io_device *this);
+};
+
+struct kvm_io_device {
+   const struct kvm_io_device_ops *ops;
+};
+
 struct kvm_vcpu {
struct kvm *kvm;
 #ifdef CONFIG_PREEMPT_NOTIFIERS
diff --git a/virt/kvm/iodev.h b/virt/kvm/iodev.h
index 12fd3ca..d1f5651 100644
--- a/virt/kvm/iodev.h
+++ b/virt/kvm/iodev.h
@@ -17,32 +17,9 @@
 #define __KVM_IODEV_H__
 
 #include <linux/kvm_types.h>
+#include <linux/kvm_host.h>
 #include <asm/errno.h>
 
-struct kvm_io_device;
-
-/**
- * kvm_io_device_ops are called under kvm slots_lock.
- * read and write handlers return 0 if the transaction has been handled,
- * or non-zero to have it passed to the next device.
- **/
-struct kvm_io_device_ops {
-   int (*read)(struct kvm_io_device *this,
-   gpa_t addr,
-   int len,
-   void *val);
-   int (*write)(struct kvm_io_device *this,
-gpa_t addr,
-int len,
-const void *val);
-   void (*destructor)(struct kvm_io_device *this);
-};
-
-
-struct kvm_io_device {
-   const struct kvm_io_device_ops *ops;
-};
-
 static inline void kvm_iodevice_init(struct kvm_io_device *dev,
 const struct kvm_io_device_ops *ops)
 {
-- 
1.7.0.1
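
The handler contract in the comment moved by this patch (return 0 when the transaction was handled, non-zero to pass it to the next device) can be illustrated with a small standalone sketch. The types below are simplified stand-ins for the kernel ones, and toy_dev with its 0x1000..0x1fff window is purely hypothetical; only the shape of the ops table follows the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t gpa_t; /* stand-in for the kernel's gpa_t */

struct io_device;

/* Same shape as the kvm_io_device_ops moved by the patch. */
struct io_device_ops {
	int (*read)(struct io_device *this, gpa_t addr, int len, void *val);
	int (*write)(struct io_device *this, gpa_t addr, int len,
		     const void *val);
	void (*destructor)(struct io_device *this);
};

struct io_device {
	const struct io_device_ops *ops;
};

/* Dispatch a write; a device without a write handler declines it. */
static int iodevice_write(struct io_device *dev, gpa_t addr, int len,
			  const void *val)
{
	assert(dev != NULL && dev->ops != NULL);
	return dev->ops->write ? dev->ops->write(dev, addr, len, val)
			       : -EOPNOTSUPP;
}

/* Hypothetical device that claims guest addresses 0x1000..0x1fff. */
struct toy_dev {
	struct io_device dev; /* first member, so the cast below is valid */
	uint8_t last;
};

static int toy_write(struct io_device *this, gpa_t addr, int len,
		     const void *val)
{
	struct toy_dev *t = (struct toy_dev *)this;

	if (addr < 0x1000 || addr + len > 0x2000)
		return -EOPNOTSUPP; /* not ours: let the next device try */
	t->last = *(const uint8_t *)val;
	return 0; /* handled */
}

static const struct io_device_ops toy_ops = { .write = toy_write };
```

A caller walks its device list and stops at the first handler that returns 0, which is why the ops table can live in kvm_host.h and be shared by any structure that embeds a device.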



Re: [PATCH 2/4] KVM: Add kvm_io_ext_data to IO handler

2011-02-24 Thread Sheng Yang
On Thursday 24 February 2011 18:22:19 Michael S. Tsirkin wrote:
 On Thu, Feb 24, 2011 at 05:51:03PM +0800, Sheng Yang wrote:
  Add a new parameter to IO writing handler, so that we can transfer
  information from IO handler to caller.
  
  Signed-off-by: Sheng Yang sh...@linux.intel.com
  ---
  
   arch/x86/kvm/i8254.c  |6 --
   arch/x86/kvm/i8259.c  |3 ++-
   arch/x86/kvm/lapic.c  |3 ++-
   arch/x86/kvm/x86.c|   13 -
   include/linux/kvm_host.h  |   12 ++--
   virt/kvm/coalesced_mmio.c |3 ++-
   virt/kvm/eventfd.c|2 +-
   virt/kvm/ioapic.c |2 +-
   virt/kvm/iodev.h  |6 --
   virt/kvm/kvm_main.c   |4 ++--
   10 files changed, 36 insertions(+), 18 deletions(-)
  
  diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
  index efad723..bd8f0c5 100644
  --- a/arch/x86/kvm/i8254.c
  +++ b/arch/x86/kvm/i8254.c
  @@ -439,7 +439,8 @@ static inline int pit_in_range(gpa_t addr)
  
   }
   
   static int pit_ioport_write(struct kvm_io_device *this,
  
  -   gpa_t addr, int len, const void *data)
  +   gpa_t addr, int len, const void *data,
  +   struct kvm_io_ext_data *ext_data)
  
   {
   
  struct kvm_pit *pit = dev_to_pit(this);
  struct kvm_kpit_state *pit_state = &pit->pit_state;
  
  @@ -585,7 +586,8 @@ static int pit_ioport_read(struct kvm_io_device
  *this,
  
   }
   
   static int speaker_ioport_write(struct kvm_io_device *this,
  
  -   gpa_t addr, int len, const void *data)
  +   gpa_t addr, int len, const void *data,
  +   struct kvm_io_ext_data *ext_data)
  
   {
   
  struct kvm_pit *pit = speaker_to_pit(this);
  struct kvm_kpit_state *pit_state = &pit->pit_state;
  
  diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
  index 3cece05..96b1070 100644
  --- a/arch/x86/kvm/i8259.c
  +++ b/arch/x86/kvm/i8259.c
  @@ -480,7 +480,8 @@ static inline struct kvm_pic *to_pic(struct
  kvm_io_device *dev)
  
   }
   
   static int picdev_write(struct kvm_io_device *this,
  
  -gpa_t addr, int len, const void *val)
  +gpa_t addr, int len, const void *val,
  +struct kvm_io_ext_data *ext_data)
  
   {
   
  struct kvm_pic *s = to_pic(this);
  unsigned char data = *(unsigned char *)val;
  
  diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
  index 93cf9d0..f413e9c 100644
  --- a/arch/x86/kvm/lapic.c
  +++ b/arch/x86/kvm/lapic.c
  @@ -836,7 +836,8 @@ static int apic_reg_write(struct kvm_lapic *apic, u32
  reg, u32 val)
  
   }
   
   static int apic_mmio_write(struct kvm_io_device *this,
  
  -   gpa_t address, int len, const void *data)
  +   gpa_t address, int len, const void *data,
  +   struct kvm_io_ext_data *ext_data)
  
   {
   
  struct kvm_lapic *apic = to_lapic(this);
  unsigned int offset = address - apic->base_address;
  
  diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
  index fa708c9..21b84e2 100644
  --- a/arch/x86/kvm/x86.c
  +++ b/arch/x86/kvm/x86.c
  @@ -3571,13 +3571,14 @@ static void kvm_init_msr_list(void)
  
   }
   
   static int vcpu_mmio_write(struct kvm_vcpu *vcpu, gpa_t addr, int len,
  
  -  const void *v)
  +  const void *v, struct kvm_io_ext_data *ext_data)
  
   {
   
  if (vcpu->arch.apic &&
  
  -   !kvm_iodevice_write(&vcpu->arch.apic->dev, addr, len, v))
  +   !kvm_iodevice_write(&vcpu->arch.apic->dev, addr, len, v, ext_data))
  
  return 0;
  
  -   return kvm_io_bus_write(vcpu->kvm, KVM_MMIO_BUS, addr, len, v);
  +   return kvm_io_bus_write(vcpu->kvm, KVM_MMIO_BUS,
  +   addr, len, v, ext_data);
  
   }
   
   static int vcpu_mmio_read(struct kvm_vcpu *vcpu, gpa_t addr, int len,
   void *v)
  
  @@ -3807,6 +3808,7 @@ static int emulator_write_emulated_onepage(unsigned
  long addr,
  
 struct kvm_vcpu *vcpu)
   
   {
   
  gpa_t gpa;
  
  +   struct kvm_io_ext_data ext_data;
  
  gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, exception);
  
  @@ -3825,7 +3827,7 @@ mmio:
  /*
  
   * Is this MMIO handled locally?
   */
  
  -   if (!vcpu_mmio_write(vcpu, gpa, bytes, val))
  +   if (!vcpu_mmio_write(vcpu, gpa, bytes, val, &ext_data))
  
  return X86EMUL_CONTINUE;
  
  vcpu->mmio_needed = 1;
  
  @@ -3940,6 +3942,7 @@ static int kernel_pio(struct kvm_vcpu *vcpu, void
  *pd)
  
   {
   
  /* TODO: String I/O for in kernel device */
  int r;
  
  +   struct kvm_io_ext_data ext_data;
  
  if (vcpu->arch.pio.in)
  
  r = kvm_io_bus_read(vcpu->kvm, KVM_PIO_BUS, vcpu->arch.pio.port,
  
  @@ -3947,7 +3950,7 @@ static int kernel_pio(struct kvm_vcpu *vcpu, void
  *pd)
  
  else

Re: [PATCH 2/3] KVM: Emulate MSI-X table in kernel

2011-02-24 Thread Sheng Yang
On Thursday 24 February 2011 18:11:44 Michael S. Tsirkin wrote:
 On Thu, Feb 24, 2011 at 04:08:22PM +0800, Sheng Yang wrote:
  On Wednesday 23 February 2011 16:45:37 Michael S. Tsirkin wrote:
   On Wed, Feb 23, 2011 at 02:59:04PM +0800, Sheng Yang wrote:
On Wednesday 23 February 2011 08:19:21 Alex Williamson wrote:
 On Sun, 2011-01-30 at 13:11 +0800, Sheng Yang wrote:
  Then we can support mask bit operation of assigned devices now.
 
 Looks pretty good overall.  A few comments below.  It seems like we
 should be able to hook this into vfio with a small stub in kvm.  We
 just need to be able to communicate disabling and enabling of
 individual msix vectors.  For brute force, we could do this with a
 lot of eventfds, triggered by kvm and consumed by vfio, two per
 MSI-X vector.  Not sure if there's something smaller that could do
 it. Thanks,

Alex, thanks for your comments. See my comments below:
 Alex
 
  Signed-off-by: Sheng Yang sh...@linux.intel.com
  ---
  
   arch/x86/kvm/Makefile|2 +-
   arch/x86/kvm/x86.c   |8 +-
   include/linux/kvm.h  |   21 
   include/linux/kvm_host.h |   25 
   virt/kvm/assigned-dev.c  |   44 +++
   virt/kvm/kvm_main.c  |   38 ++-
   virt/kvm/msix_mmio.c |  286 ++
   virt/kvm/msix_mmio.h |   25 
   
   8 files changed, 442 insertions(+), 7 deletions(-)
   create mode 100644 virt/kvm/msix_mmio.c
   create mode 100644 virt/kvm/msix_mmio.h
  
  diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
  index f15501f..3a0d851 100644
  --- a/arch/x86/kvm/Makefile
  +++ b/arch/x86/kvm/Makefile
  @@ -7,7 +7,7 @@ CFLAGS_vmx.o := -I.
  
   kvm-y  += $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o \
  
  coalesced_mmio.o irq_comm.o eventfd.o \
  
  -   assigned-dev.o)
  +   assigned-dev.o msix_mmio.o)
  
   kvm-$(CONFIG_IOMMU_API)+= $(addprefix ../../../virt/kvm/, iommu.o)
   kvm-$(CONFIG_KVM_ASYNC_PF) += $(addprefix ../../../virt/kvm/, async_pf.o)
  
  diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
  index fa708c9..89bf12c 100644
  --- a/arch/x86/kvm/x86.c
  +++ b/arch/x86/kvm/x86.c
  @@ -1966,6 +1966,7 @@ int kvm_dev_ioctl_check_extension(long ext)
  
  case KVM_CAP_X86_ROBUST_SINGLESTEP:
  case KVM_CAP_XSAVE:
  
  case KVM_CAP_ASYNC_PF:
  +   case KVM_CAP_MSIX_MMIO:
  r = 1;
  break;
  
  case KVM_CAP_COALESCED_MMIO:
  @@ -3807,6 +3808,7 @@ static int
  emulator_write_emulated_onepage(unsigned long addr,
  
 struct kvm_vcpu *vcpu)
   
   {
   
  gpa_t gpa;
  
  +   int r;
  
  gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, exception);
  
  @@ -3822,14 +3824,16 @@ static int
  emulator_write_emulated_onepage(unsigned long addr,
  
   mmio:
  trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, bytes, gpa, *(u64 *)val);
  
  +   r = vcpu_mmio_write(vcpu, gpa, bytes, val);
  
  /*
  
   * Is this MMIO handled locally?
   */
  
  -   if (!vcpu_mmio_write(vcpu, gpa, bytes, val))
  +   if (!r)
  
  return X86EMUL_CONTINUE;
  
  vcpu->mmio_needed = 1;
  
  -   vcpu->run->exit_reason = KVM_EXIT_MMIO;
  +   vcpu->run->exit_reason = (r == -ENOTSYNC) ?
  +   KVM_EXIT_MSIX_ROUTING_UPDATE : KVM_EXIT_MMIO;
   
   This use of -ENOTSYNC is IMO confusing.
   How about we make vcpu_mmio_write return the positive exit reason?
   Negative value will mean an error.
  
  Makes sense. I will update it.
  
  vcpu->run->mmio.phys_addr = vcpu->mmio_phys_addr = gpa;
  vcpu->run->mmio.len = vcpu->mmio_size = bytes;
  vcpu->run->mmio.is_write = vcpu->mmio_is_write = 1;
  
  diff --git a/include/linux/kvm.h b/include/linux/kvm.h
  index ea2dc1a..ad9df4b 100644
  --- a/include/linux/kvm.h
  +++ b/include/linux/kvm.h
  @@ -161,6 +161,7 @@ struct kvm_pit_config {
  
   #define KVM_EXIT_NMI  16
   #define KVM_EXIT_INTERNAL_ERROR   17
   #define KVM_EXIT_OSI  18
  
  +#define KVM_EXIT_MSIX_ROUTING_UPDATE 19
  
   /* For KVM_EXIT_INTERNAL_ERROR */
   #define KVM_INTERNAL_ERROR_EMULATION 1
  
  @@ -541,6 +542,7 @@ struct kvm_ppc_pvinfo {
  
   #define KVM_CAP_PPC_GET_PVINFO 57
   #define KVM_CAP_PPC_IRQ_LEVEL 58
   #define KVM_CAP_ASYNC_PF 59
  
  +#define KVM_CAP_MSIX_MMIO 60
  
   #ifdef KVM_CAP_IRQ_ROUTING

Re: [PATCH 2/3] KVM: Emulate MSI-X table in kernel

2011-02-24 Thread Sheng Yang
On Thursday 24 February 2011 18:17:34 Michael S. Tsirkin wrote:
 On Thu, Feb 24, 2011 at 05:44:20PM +0800, Sheng Yang wrote:
  On Wednesday 23 February 2011 16:45:37 Michael S. Tsirkin wrote:
   On Wed, Feb 23, 2011 at 02:59:04PM +0800, Sheng Yang wrote:
On Wednesday 23 February 2011 08:19:21 Alex Williamson wrote:
 On Sun, 2011-01-30 at 13:11 +0800, Sheng Yang wrote:
  Then we can support mask bit operation of assigned devices now.
 
 Looks pretty good overall.  A few comments below.  It seems like we
 should be able to hook this into vfio with a small stub in kvm.  We
 just need to be able to communicate disabling and enabling of
 individual msix vectors.  For brute force, we could do this with a
 lot of eventfds, triggered by kvm and consumed by vfio, two per
 MSI-X vector.  Not sure if there's something smaller that could do
 it. Thanks,

Alex, thanks for your comments. See my comments below:
 Alex
 
  Signed-off-by: Sheng Yang sh...@linux.intel.com
  ---
  
   arch/x86/kvm/Makefile|2 +-
   arch/x86/kvm/x86.c   |8 +-
   include/linux/kvm.h  |   21 
   include/linux/kvm_host.h |   25 
   virt/kvm/assigned-dev.c  |   44 +++
   virt/kvm/kvm_main.c  |   38 ++-
   virt/kvm/msix_mmio.c |  286 ++
   virt/kvm/msix_mmio.h |   25 
   
   8 files changed, 442 insertions(+), 7 deletions(-)
   create mode 100644 virt/kvm/msix_mmio.c
   create mode 100644 virt/kvm/msix_mmio.h
  
  diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
  index f15501f..3a0d851 100644
  --- a/arch/x86/kvm/Makefile
  +++ b/arch/x86/kvm/Makefile
  @@ -7,7 +7,7 @@ CFLAGS_vmx.o := -I.
  
   kvm-y  += $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o \
  
  coalesced_mmio.o irq_comm.o eventfd.o \
  
  -   assigned-dev.o)
  +   assigned-dev.o msix_mmio.o)
  
   kvm-$(CONFIG_IOMMU_API)+= $(addprefix ../../../virt/kvm/, iommu.o)
   kvm-$(CONFIG_KVM_ASYNC_PF) += $(addprefix ../../../virt/kvm/, async_pf.o)
  
  diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
  index fa708c9..89bf12c 100644
  --- a/arch/x86/kvm/x86.c
  +++ b/arch/x86/kvm/x86.c
  @@ -1966,6 +1966,7 @@ int kvm_dev_ioctl_check_extension(long ext)
  
  case KVM_CAP_X86_ROBUST_SINGLESTEP:
  case KVM_CAP_XSAVE:
  
  case KVM_CAP_ASYNC_PF:
  +   case KVM_CAP_MSIX_MMIO:
  r = 1;
  break;
  
  case KVM_CAP_COALESCED_MMIO:
  @@ -3807,6 +3808,7 @@ static int
  emulator_write_emulated_onepage(unsigned long addr,
  
 struct kvm_vcpu *vcpu)
   
   {
   
  gpa_t gpa;
  
  +   int r;
  
  gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, exception);
  
  @@ -3822,14 +3824,16 @@ static int
  emulator_write_emulated_onepage(unsigned long addr,
  
   mmio:
  trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, bytes, gpa, *(u64 *)val);
  
  +   r = vcpu_mmio_write(vcpu, gpa, bytes, val);
  
  /*
  
   * Is this MMIO handled locally?
   */
  
  -   if (!vcpu_mmio_write(vcpu, gpa, bytes, val))
  +   if (!r)
  
  return X86EMUL_CONTINUE;
  
  vcpu->mmio_needed = 1;
  
  -   vcpu->run->exit_reason = KVM_EXIT_MMIO;
  +   vcpu->run->exit_reason = (r == -ENOTSYNC) ?
  +   KVM_EXIT_MSIX_ROUTING_UPDATE : KVM_EXIT_MMIO;
   
   This use of -ENOTSYNC is IMO confusing.
   How about we make vcpu_mmio_write return the positive exit reason?
   Negative value will mean an error.
  
  In fact, currently a negative value means something more needs to be done,
  the same as an MMIO exit.
 
 So it would be
   if (!r)
   return X86EMUL_CONTINUE;
 
   vcpu->run->exit_reason = r;
 
  Now I think we can keep it, or update them all later.
 
 The way to do this would be
 1. patch to return KVM_EXIT_MMIO on mmio
 2. your patch that returns KVM_EXIT_MSIX_ROUTING_UPDATE on top
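
The convention suggested in the quoted review (0 for handled, a positive exit reason to trigger an exit, a negative value for a real error) could look roughly like the sketch below. This is an illustration only, not code from either patch: sketch_run and sketch_finish_write() are hypothetical names, 6 is the value of KVM_EXIT_MMIO in <linux/kvm.h>, and 19 is the value this series proposes for KVM_EXIT_MSIX_ROUTING_UPDATE.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_EXIT_MMIO                 6 /* KVM_EXIT_MMIO */
#define SKETCH_EXIT_MSIX_ROUTING_UPDATE 19 /* proposed in this series */

/* Hypothetical stand-in for the fields of kvm_run that matter here. */
struct sketch_run {
	int exit_reason; /* reported to userspace */
	int exit_needed; /* set when the vcpu must leave the kernel */
};

/*
 * Interpret the result r of an MMIO write attempt:
 *   r == 0  the write was fully handled in the kernel;
 *   r  > 0  r is the exit reason, so record it and request an exit;
 *   r  < 0  a genuine error, passed back unchanged.
 */
static int sketch_finish_write(struct sketch_run *run, int r)
{
	assert(run != NULL);

	if (r <= 0)
		return r;
	run->exit_reason = r;
	run->exit_needed = 1;
	return 0;
}
```

With this shape the caller never has to compare against a specific errno such as -ENOTSYNC, which is the point of the suggestion.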

It's not that straightforward. In most cases, the reason vcpu_mmio_write()
returns < 0 is that KVM itself is unable to complete the request. That much is
straightforward. But no single handler in the chain can decide that the result
should be KVM_EXIT_MMIO; that is only known once all of them have failed to
handle the access.

I am not sure we want every single handler to say "I want KVM_EXIT_MMIO"
instead of returning an error. We can discuss this further, but since it is
not an API/ABI change, I think we can get the patch in first.
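
A rough standalone sketch of the point about the handler chain, with hypothetical names and address windows throughout: each handler only knows whether the access is its own, and only the caller that has walked the whole chain can conclude that the access must go to userspace as MMIO.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical handler type: 0 means handled, non-zero means "not mine". */
typedef int (*sketch_write_fn)(unsigned long addr, int len, const void *val);

/* Two toy handlers, each claiming its own address window. */
static int sketch_dev_a_write(unsigned long addr, int len, const void *val)
{
	(void)len;
	(void)val;
	return (addr >= 0x1000 && addr < 0x1100) ? 0 : -EOPNOTSUPP;
}

static int sketch_dev_b_write(unsigned long addr, int len, const void *val)
{
	(void)len;
	(void)val;
	return (addr >= 0x2000 && addr < 0x2100) ? 0 : -EOPNOTSUPP;
}

/* Walk the chain; stop at the first handler that accepts the write. */
static int sketch_bus_write(const sketch_write_fn *devs, size_t n,
			    unsigned long addr, int len, const void *val)
{
	size_t i;

	assert(devs != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (!devs[i](addr, len, val))
			return 0;
	return -EOPNOTSUPP; /* nobody in the kernel took it */
}

/* Only here, after the whole chain failed, is an MMIO exit the answer. */
static int sketch_needs_mmio_exit(const sketch_write_fn *devs, size_t n,
				  unsigned long addr, int len,
				  const void *val)
{
	return sketch_bus_write(devs, n, addr, len, val) != 0;
}
```

In this arrangement sketch_dev_a_write() declining an address says nothing about whether sketch_dev_b_write() will take it, so neither can return "exit with MMIO" on its own.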

--
regards
Yang, Sheng

 
  --
  regards
  Yang, Sheng

Re: [PATCH 3/4] KVM: Emulate MSI-X table in kernel

2011-02-24 Thread Sheng Yang
On Thursday 24 February 2011 18:45:08 Michael S. Tsirkin wrote:
 On Thu, Feb 24, 2011 at 05:51:04PM +0800, Sheng Yang wrote:
  Then we can support mask bit operation of assigned devices now.
  
  Signed-off-by: Sheng Yang sh...@linux.intel.com
 
 Doesn't look like all comments got addressed.
 E.g. gpa_t entry_base is still there and in reality
 you said it's a host virtual address so
 should be void __user *;

Will update it.

 And ENOTSYNC meaning 'MSIX' is pretty hacky.

I'd like to discuss it later. We may need some work on the whole MMIO handling
side to make it more straightforward. But I don't want to bundle it with this
one...
 
  ---
  
   arch/x86/include/asm/kvm_host.h |1 +
   arch/x86/kvm/Makefile   |2 +-
   arch/x86/kvm/mmu.c  |2 +
   arch/x86/kvm/x86.c  |   40 -
   include/linux/kvm.h |   28 
   include/linux/kvm_host.h|   34 +
   virt/kvm/assigned-dev.c |   44 ++
   virt/kvm/kvm_main.c |   38 +-
   virt/kvm/msix_mmio.c|  296 +++
   virt/kvm/msix_mmio.h|   25 
   10 files changed, 497 insertions(+), 13 deletions(-)
   create mode 100644 virt/kvm/msix_mmio.c
   create mode 100644 virt/kvm/msix_mmio.h
  
  diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
  index aa75f21..4a390a4 100644
  --- a/arch/x86/include/asm/kvm_host.h
  +++ b/arch/x86/include/asm/kvm_host.h
  @@ -635,6 +635,7 @@ enum emulation_result {
  
  EMULATE_DONE,   /* no further processing */
  EMULATE_DO_MMIO,  /* kvm_run filled with mmio request */
  EMULATE_FAIL, /* can't emulate this instruction */
  
  +   EMULATE_USERSPACE_EXIT, /* we need exit to userspace */
  
   };
   
   #define EMULTYPE_NO_DECODE (1 << 0)
  
  diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
  index f15501f..3a0d851 100644
  --- a/arch/x86/kvm/Makefile
  +++ b/arch/x86/kvm/Makefile
  @@ -7,7 +7,7 @@ CFLAGS_vmx.o := -I.
  
   kvm-y  += $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o \
   
  coalesced_mmio.o irq_comm.o eventfd.o \
  
  -   assigned-dev.o)
  +   assigned-dev.o msix_mmio.o)
  
   kvm-$(CONFIG_IOMMU_API)+= $(addprefix ../../../virt/kvm/, iommu.o)
   kvm-$(CONFIG_KVM_ASYNC_PF) += $(addprefix ../../../virt/kvm/, async_pf.o)
  
  diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
  index 9cafbb4..912dca4 100644
  --- a/arch/x86/kvm/mmu.c
  +++ b/arch/x86/kvm/mmu.c
  @@ -3358,6 +3358,8 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t
  cr2, u32 error_code,
  
  case EMULATE_DO_MMIO:
  ++vcpu->stat.mmio_exits;
  /* fall through */
  
  +   case EMULATE_USERSPACE_EXIT:
  +   /* fall through */
  
  case EMULATE_FAIL:
  return 0;
  
  default:
  diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
  index 21b84e2..87308eb 100644
  --- a/arch/x86/kvm/x86.c
  +++ b/arch/x86/kvm/x86.c
  @@ -1966,6 +1966,7 @@ int kvm_dev_ioctl_check_extension(long ext)
  
  case KVM_CAP_X86_ROBUST_SINGLESTEP:
  case KVM_CAP_XSAVE:
  
  case KVM_CAP_ASYNC_PF:
  +   case KVM_CAP_MSIX_MMIO:
  r = 1;
  break;
  
  case KVM_CAP_COALESCED_MMIO:
  @@ -3809,6 +3810,7 @@ static int emulator_write_emulated_onepage(unsigned
  long addr,
  
   {
   
  gpa_t gpa;
  struct kvm_io_ext_data ext_data;
  
  +   int r;
  
  gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, exception);
  
  @@ -3824,18 +3826,32 @@ static int
  emulator_write_emulated_onepage(unsigned long addr,
  
   mmio:
  trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, bytes, gpa, *(u64 *)val);
  
  +   r = vcpu_mmio_write(vcpu, gpa, bytes, val, &ext_data);
  
  /*
  
   * Is this MMIO handled locally?
   */
  
  -   if (!vcpu_mmio_write(vcpu, gpa, bytes, val, &ext_data))
  +   if (!r)
  
  return X86EMUL_CONTINUE;
  
  -   vcpu->mmio_needed = 1;
  -   vcpu->run->exit_reason = KVM_EXIT_MMIO;
  -   vcpu->run->mmio.phys_addr = vcpu->mmio_phys_addr = gpa;
  -   vcpu->run->mmio.len = vcpu->mmio_size = bytes;
  -   vcpu->run->mmio.is_write = vcpu->mmio_is_write = 1;
  -   memcpy(vcpu->run->mmio.data, val, bytes);
  +   if (r == -ENOTSYNC) {
  +       vcpu->userspace_exit_needed = 1;
  +       vcpu->run->exit_reason = KVM_EXIT_MSIX_ROUTING_UPDATE;
  +       vcpu->run->msix_routing.dev_id =
  +           ext_data.msix_routing.dev_id;
  +       vcpu->run->msix_routing.type =
  +           ext_data.msix_routing.type;
  +       vcpu->run->msix_routing.entry_idx =
  +           ext_data.msix_routing.entry_idx;
  +       vcpu->run->msix_routing.flags =
  +           ext_data.msix_routing.flags;
  +   } else  {
  +       vcpu->mmio_needed = 1;
  +       vcpu->run->exit_reason = KVM_EXIT_MMIO

[PATCH 3/4] KVM: Emulate MSI-X table in kernel

2011-02-24 Thread Sheng Yang
Then we can support mask bit operation of assigned devices now.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/Makefile   |2 +-
 arch/x86/kvm/mmu.c  |2 +
 arch/x86/kvm/x86.c  |   40 -
 include/linux/kvm.h |   28 
 include/linux/kvm_host.h|   34 +
 virt/kvm/assigned-dev.c |   44 ++
 virt/kvm/kvm_main.c |   38 +-
 virt/kvm/msix_mmio.c|  302 +++
 virt/kvm/msix_mmio.h|   25 
 10 files changed, 503 insertions(+), 13 deletions(-)
 create mode 100644 virt/kvm/msix_mmio.c
 create mode 100644 virt/kvm/msix_mmio.h

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index aa75f21..4a390a4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -635,6 +635,7 @@ enum emulation_result {
EMULATE_DONE,   /* no further processing */
EMULATE_DO_MMIO,  /* kvm_run filled with mmio request */
EMULATE_FAIL, /* can't emulate this instruction */
+   EMULATE_USERSPACE_EXIT, /* we need exit to userspace */
 };
 
 #define EMULTYPE_NO_DECODE (1 << 0)
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index f15501f..3a0d851 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -7,7 +7,7 @@ CFLAGS_vmx.o := -I.
 
 kvm-y  += $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o \
coalesced_mmio.o irq_comm.o eventfd.o \
-   assigned-dev.o)
+   assigned-dev.o msix_mmio.o)
 kvm-$(CONFIG_IOMMU_API)+= $(addprefix ../../../virt/kvm/, iommu.o)
 kvm-$(CONFIG_KVM_ASYNC_PF) += $(addprefix ../../../virt/kvm/, async_pf.o)
 
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9cafbb4..912dca4 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3358,6 +3358,8 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u32 error_code,
case EMULATE_DO_MMIO:
++vcpu->stat.mmio_exits;
/* fall through */
+   case EMULATE_USERSPACE_EXIT:
+   /* fall through */
case EMULATE_FAIL:
return 0;
default:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 21b84e2..87308eb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1966,6 +1966,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_X86_ROBUST_SINGLESTEP:
case KVM_CAP_XSAVE:
case KVM_CAP_ASYNC_PF:
+   case KVM_CAP_MSIX_MMIO:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
@@ -3809,6 +3810,7 @@ static int emulator_write_emulated_onepage(unsigned long addr,
 {
gpa_t gpa;
struct kvm_io_ext_data ext_data;
+   int r;
 
gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, exception);
 
@@ -3824,18 +3826,32 @@ static int emulator_write_emulated_onepage(unsigned long addr,
 
 mmio:
trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, bytes, gpa, *(u64 *)val);
+   r = vcpu_mmio_write(vcpu, gpa, bytes, val, &ext_data);
/*
 * Is this MMIO handled locally?
 */
-   if (!vcpu_mmio_write(vcpu, gpa, bytes, val, &ext_data))
+   if (!r)
return X86EMUL_CONTINUE;
 
-   vcpu->mmio_needed = 1;
-   vcpu->run->exit_reason = KVM_EXIT_MMIO;
-   vcpu->run->mmio.phys_addr = vcpu->mmio_phys_addr = gpa;
-   vcpu->run->mmio.len = vcpu->mmio_size = bytes;
-   vcpu->run->mmio.is_write = vcpu->mmio_is_write = 1;
-   memcpy(vcpu->run->mmio.data, val, bytes);
+   if (r == -ENOTSYNC) {
+       vcpu->userspace_exit_needed = 1;
+       vcpu->run->exit_reason = KVM_EXIT_MSIX_ROUTING_UPDATE;
+       vcpu->run->msix_routing.dev_id =
+           ext_data.msix_routing.dev_id;
+       vcpu->run->msix_routing.type =
+           ext_data.msix_routing.type;
+       vcpu->run->msix_routing.entry_idx =
+           ext_data.msix_routing.entry_idx;
+       vcpu->run->msix_routing.flags =
+           ext_data.msix_routing.flags;
+   } else  {
+       vcpu->mmio_needed = 1;
+       vcpu->run->exit_reason = KVM_EXIT_MMIO;
+       vcpu->run->mmio.phys_addr = vcpu->mmio_phys_addr = gpa;
+       vcpu->run->mmio.len = vcpu->mmio_size = bytes;
+       vcpu->run->mmio.is_write = vcpu->mmio_is_write = 1;
+       memcpy(vcpu->run->mmio.data, val, bytes);
+   }
 
return X86EMUL_CONTINUE;
 }
@@ -4469,6 +4485,8 @@ done:
r = EMULATE_DO_MMIO;
} else if (r == EMULATION_RESTART)
goto restart;
+   else if (vcpu->userspace_exit_needed)
+   r = EMULATE_USERSPACE_EXIT;
else
r = EMULATE_DONE;
 
@@ -5397,12 +5415,18 @@ int

[PATCH 3/4 v10 UPDATED] KVM: Emulate MSI-X table in kernel

2011-02-24 Thread Sheng Yang
Then we can support mask bit operation of assigned devices now.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/Makefile   |2 +-
 arch/x86/kvm/mmu.c  |2 +
 arch/x86/kvm/x86.c  |   40 -
 include/linux/kvm.h |   28 
 include/linux/kvm_host.h|   34 +
 virt/kvm/assigned-dev.c |   44 ++
 virt/kvm/kvm_main.c |   38 +-
 virt/kvm/msix_mmio.c|  302 +++
 virt/kvm/msix_mmio.h|   25 
 10 files changed, 503 insertions(+), 13 deletions(-)
 create mode 100644 virt/kvm/msix_mmio.c
 create mode 100644 virt/kvm/msix_mmio.h

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index aa75f21..4a390a4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -635,6 +635,7 @@ enum emulation_result {
EMULATE_DONE,   /* no further processing */
EMULATE_DO_MMIO,  /* kvm_run filled with mmio request */
EMULATE_FAIL, /* can't emulate this instruction */
+   EMULATE_USERSPACE_EXIT, /* we need exit to userspace */
 };
 
 #define EMULTYPE_NO_DECODE (1 << 0)
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index f15501f..3a0d851 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -7,7 +7,7 @@ CFLAGS_vmx.o := -I.
 
 kvm-y  += $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o \
coalesced_mmio.o irq_comm.o eventfd.o \
-   assigned-dev.o)
+   assigned-dev.o msix_mmio.o)
 kvm-$(CONFIG_IOMMU_API) += $(addprefix ../../../virt/kvm/, iommu.o)
 kvm-$(CONFIG_KVM_ASYNC_PF) += $(addprefix ../../../virt/kvm/, async_pf.o)
 
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9cafbb4..912dca4 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3358,6 +3358,8 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, 
u32 error_code,
case EMULATE_DO_MMIO:
        ++vcpu->stat.mmio_exits;
/* fall through */
+   case EMULATE_USERSPACE_EXIT:
+   /* fall through */
case EMULATE_FAIL:
return 0;
default:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 21b84e2..87308eb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1966,6 +1966,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_X86_ROBUST_SINGLESTEP:
case KVM_CAP_XSAVE:
case KVM_CAP_ASYNC_PF:
+   case KVM_CAP_MSIX_MMIO:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
@@ -3809,6 +3810,7 @@ static int emulator_write_emulated_onepage(unsigned long 
addr,
 {
gpa_t gpa;
struct kvm_io_ext_data ext_data;
+   int r;
 
gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, exception);
 
@@ -3824,18 +3826,32 @@ static int emulator_write_emulated_onepage(unsigned 
long addr,
 
 mmio:
trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, bytes, gpa, *(u64 *)val);
+   r = vcpu_mmio_write(vcpu, gpa, bytes, val, &ext_data);
    /*
     * Is this MMIO handled locally?
     */
-   if (!vcpu_mmio_write(vcpu, gpa, bytes, val, &ext_data))
+   if (!r)
        return X86EMUL_CONTINUE;
 
-   vcpu->mmio_needed = 1;
-   vcpu->run->exit_reason = KVM_EXIT_MMIO;
-   vcpu->run->mmio.phys_addr = vcpu->mmio_phys_addr = gpa;
-   vcpu->run->mmio.len = vcpu->mmio_size = bytes;
-   vcpu->run->mmio.is_write = vcpu->mmio_is_write = 1;
-   memcpy(vcpu->run->mmio.data, val, bytes);
+   if (r == -ENOTSYNC) {
+       vcpu->userspace_exit_needed = 1;
+       vcpu->run->exit_reason = KVM_EXIT_MSIX_ROUTING_UPDATE;
+       vcpu->run->msix_routing.dev_id =
+           ext_data.msix_routing.dev_id;
+       vcpu->run->msix_routing.type =
+           ext_data.msix_routing.type;
+       vcpu->run->msix_routing.entry_idx =
+           ext_data.msix_routing.entry_idx;
+       vcpu->run->msix_routing.flags =
+           ext_data.msix_routing.flags;
+   } else  {
+       vcpu->mmio_needed = 1;
+       vcpu->run->exit_reason = KVM_EXIT_MMIO;
+       vcpu->run->mmio.phys_addr = vcpu->mmio_phys_addr = gpa;
+       vcpu->run->mmio.len = vcpu->mmio_size = bytes;
+       vcpu->run->mmio.is_write = vcpu->mmio_is_write = 1;
+       memcpy(vcpu->run->mmio.data, val, bytes);
+   }
 
return X86EMUL_CONTINUE;
 }
@@ -4469,6 +4485,8 @@ done:
r = EMULATE_DO_MMIO;
} else if (r == EMULATION_RESTART)
goto restart;
+   else if (vcpu->userspace_exit_needed)
+   r = EMULATE_USERSPACE_EXIT;
else
r = EMULATE_DONE;
 
@@ -5397,12 +5415,18 @@ int
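
The three-way result handling in emulator_write_emulated_onepage() above can be modelled outside the kernel. The following is only a sketch of that control flow: every sketch_* name is invented for illustration, SKETCH_ENOTSYNC is a placeholder value rather than the kernel's definition, and the exit reason numbers are taken from the headers quoted in this thread.

```c
#include <stdint.h>
#include <string.h>

/* Placeholder constants for this sketch only. */
#define SKETCH_ENOTSYNC                  1000 /* stand-in for ENOTSYNC */
#define SKETCH_EXIT_MMIO                 6
#define SKETCH_EXIT_MSIX_ROUTING_UPDATE  19

/* Mirrors the msix_routing member added to struct kvm_run in this series. */
struct sketch_msix_routing {
    uint32_t dev_id;
    uint16_t type;
    uint16_t entry_idx;
    uint64_t flags;
};

/* Reduced stand-in for the vcpu/kvm_run state touched by the patch. */
struct sketch_run {
    int exit_reason;
    int mmio_needed;
    int userspace_exit_needed;
    struct sketch_msix_routing msix_routing;
};

/*
 * r is the result of the in-kernel write handler:
 *   0                 -> handled in kernel, no exit
 *   -SKETCH_ENOTSYNC  -> handled, but userspace must update IRQ routing
 *   anything else     -> not handled, fall back to a plain MMIO exit
 * Returns 1 when an exit to userspace is required, 0 otherwise.
 */
static int sketch_classify_write(int r, const struct sketch_msix_routing *ext,
                                 struct sketch_run *run)
{
    if (r == 0)
        return 0;

    if (r == -SKETCH_ENOTSYNC) {
        run->userspace_exit_needed = 1;
        run->exit_reason = SKETCH_EXIT_MSIX_ROUTING_UPDATE;
        run->msix_routing = *ext;
    } else {
        run->mmio_needed = 1;
        run->exit_reason = SKETCH_EXIT_MMIO;
    }
    return 1;
}
```

Copying the whole ext structure in one assignment is a simplification; the patch copies the four fields one by one.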

Re: [PATCH 2/3] KVM: Emulate MSI-X table in kernel

2011-02-22 Thread Sheng Yang
On Wednesday 23 February 2011 08:19:21 Alex Williamson wrote:
 On Sun, 2011-01-30 at 13:11 +0800, Sheng Yang wrote:
  Then we can support mask bit operation of assigned devices now.
 
 Looks pretty good overall.  A few comments below.  It seems like we
 should be able to hook this into vfio with a small stub in kvm.  We just
 need to be able to communicate disabling and enabling of individual msix
 vectors.  For brute force, we could do this with a lot of eventfds,
 triggered by kvm and consumed by vfio, two per MSI-X vector.  Not sure
 if there's something smaller that could do it.  Thanks,

Alex, thanks for your comments. See my comments below:

 
 Alex
 
  Signed-off-by: Sheng Yang sh...@linux.intel.com
  ---
  
   arch/x86/kvm/Makefile|2 +-
   arch/x86/kvm/x86.c   |8 +-
   include/linux/kvm.h  |   21 
   include/linux/kvm_host.h |   25 
   virt/kvm/assigned-dev.c  |   44 +++
   virt/kvm/kvm_main.c  |   38 ++-
   virt/kvm/msix_mmio.c |  286
   ++ virt/kvm/msix_mmio.h
   |   25 
   8 files changed, 442 insertions(+), 7 deletions(-)
   create mode 100644 virt/kvm/msix_mmio.c
   create mode 100644 virt/kvm/msix_mmio.h
  
  diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
  index f15501f..3a0d851 100644
  --- a/arch/x86/kvm/Makefile
  +++ b/arch/x86/kvm/Makefile
  @@ -7,7 +7,7 @@ CFLAGS_vmx.o := -I.
  
   kvm-y  += $(addprefix ../../../virt/kvm/, kvm_main.o 
  ioapic.o \
   
  coalesced_mmio.o irq_comm.o eventfd.o \
  
  -   assigned-dev.o)
  +   assigned-dev.o msix_mmio.o)
  
   kvm-$(CONFIG_IOMMU_API)+= $(addprefix ../../../virt/kvm/, iommu.o)
   kvm-$(CONFIG_KVM_ASYNC_PF) += $(addprefix ../../../virt/kvm/,
   async_pf.o)
  
  diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
  index fa708c9..89bf12c 100644
  --- a/arch/x86/kvm/x86.c
  +++ b/arch/x86/kvm/x86.c
  @@ -1966,6 +1966,7 @@ int kvm_dev_ioctl_check_extension(long ext)
  
  case KVM_CAP_X86_ROBUST_SINGLESTEP:
  case KVM_CAP_XSAVE:
  
  case KVM_CAP_ASYNC_PF:
  +   case KVM_CAP_MSIX_MMIO:
  r = 1;
  break;
  
  case KVM_CAP_COALESCED_MMIO:
  @@ -3807,6 +3808,7 @@ static int emulator_write_emulated_onepage(unsigned
  long addr,
  
 struct kvm_vcpu *vcpu)
   
   {
   
  gpa_t gpa;
  
  +   int r;
  
  gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, exception);
  
  @@ -3822,14 +3824,16 @@ static int
  emulator_write_emulated_onepage(unsigned long addr,
  
   mmio:
  trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, bytes, gpa, *(u64 *)val);
  
  +   r = vcpu_mmio_write(vcpu, gpa, bytes, val);
  
  /*
  
   * Is this MMIO handled locally?
   */
  
  -   if (!vcpu_mmio_write(vcpu, gpa, bytes, val))
  +   if (!r)
  
  return X86EMUL_CONTINUE;
  
  vcpu->mmio_needed = 1;
  
  -   vcpu->run->exit_reason = KVM_EXIT_MMIO;
  +   vcpu->run->exit_reason = (r == -ENOTSYNC) ?
  +   KVM_EXIT_MSIX_ROUTING_UPDATE : KVM_EXIT_MMIO;
  
  vcpu->run->mmio.phys_addr = vcpu->mmio_phys_addr = gpa;
  vcpu->run->mmio.len = vcpu->mmio_size = bytes;
  vcpu->run->mmio.is_write = vcpu->mmio_is_write = 1;
  
  diff --git a/include/linux/kvm.h b/include/linux/kvm.h
  index ea2dc1a..ad9df4b 100644
  --- a/include/linux/kvm.h
  +++ b/include/linux/kvm.h
  @@ -161,6 +161,7 @@ struct kvm_pit_config {
  
   #define KVM_EXIT_NMI  16
   #define KVM_EXIT_INTERNAL_ERROR   17
   #define KVM_EXIT_OSI  18
  
  +#define KVM_EXIT_MSIX_ROUTING_UPDATE 19
  
   /* For KVM_EXIT_INTERNAL_ERROR */
   #define KVM_INTERNAL_ERROR_EMULATION 1
  
  @@ -541,6 +542,7 @@ struct kvm_ppc_pvinfo {
  
   #define KVM_CAP_PPC_GET_PVINFO 57
   #define KVM_CAP_PPC_IRQ_LEVEL 58
   #define KVM_CAP_ASYNC_PF 59
  
  +#define KVM_CAP_MSIX_MMIO 60
  
   #ifdef KVM_CAP_IRQ_ROUTING
  
  @@ -672,6 +674,9 @@ struct kvm_clock_data {
  
   #define KVM_XEN_HVM_CONFIG_IOW(KVMIO,  0x7a, struct
   kvm_xen_hvm_config) #define KVM_SET_CLOCK _IOW(KVMIO, 
   0x7b, struct kvm_clock_data) #define KVM_GET_CLOCK
   _IOR(KVMIO,  0x7c, struct kvm_clock_data)
  
  +/* Available with KVM_CAP_MSIX_MMIO */
  +#define KVM_REGISTER_MSIX_MMIO_IOW(KVMIO,  0x7d, struct
  kvm_msix_mmio_user) +#define KVM_UNREGISTER_MSIX_MMIO  _IOW(KVMIO, 
  0x7e, struct kvm_msix_mmio_user)
  
   /* Available with KVM_CAP_PIT_STATE2 */
   #define KVM_GET_PIT2  _IOR(KVMIO,  0x9f, struct
   kvm_pit_state2) #define KVM_SET_PIT2  _IOW(KVMIO,  0xa0,
   struct kvm_pit_state2)
  
  @@ -795,4 +800,20 @@ struct kvm_assigned_msix_entry {
  
  __u16 padding[3];
   
   };
  
  +#define KVM_MSIX_MMIO_TYPE_ASSIGNED_DEV    (1 << 0)
  +
  +#define KVM_MSIX_MMIO_TYPE_BASE_TABLE      (1 << 8)
  +
  +#define KVM_MSIX_MMIO_TYPE_DEV_MASK        0x00ff

Re: [PATCH 2/3] KVM: Emulate MSI-X table in kernel

2011-02-18 Thread Sheng Yang
On Thursday 03 February 2011 09:05:55 Marcelo Tosatti wrote:
 On Sun, Jan 30, 2011 at 01:11:15PM +0800, Sheng Yang wrote:
  Then we can support mask bit operation of assigned devices now.
  
  Signed-off-by: Sheng Yang sh...@linux.intel.com
  
  +int kvm_vm_ioctl_register_msix_mmio(struct kvm *kvm,
  +   struct kvm_msix_mmio_user *mmio_user)
  +{
  +   struct kvm_msix_mmio_dev *mmio_dev = &kvm->msix_mmio_dev;
  +   struct kvm_msix_mmio *mmio = NULL;
  +   int r = 0, i;
  +
  +   mutex_lock(&mmio_dev->lock);
  +   for (i = 0; i < mmio_dev->mmio_nr; i++) {
  +   if (mmio_dev->mmio[i].dev_id == mmio_user->dev_id &&
  +   (mmio_dev->mmio[i].type & KVM_MSIX_MMIO_TYPE_DEV_MASK) ==
  +   (mmio_user->type & KVM_MSIX_MMIO_TYPE_DEV_MASK)) {
  +   mmio = &mmio_dev->mmio[i];
  +   if (mmio->max_entries_nr != mmio_user->max_entries_nr) {
  +   r = -EINVAL;
  +   goto out;
  +   }
  +   break;
  +   }
 
 Why allow this ioctl to succeed if there's an entry already present?
 This case is broken as mmio_dev->mmio_nr is increased below.

Oh, it's a bug to let mmio_nr increase even when the MMIO entry is found. I've fixed it.

The reason we allow multiple calls is that userspace can register different
types of address here (table address and PBA address).

 PCI bits must be reviewed...

Pardon? PCI related things are already in 2.6.38-rc.

--
regards
Yang, Sheng




[PATCH 0/4 v9] MSI-X MMIO support for KVM

2011-02-18 Thread Sheng Yang
Sorry for the long delay, I just came back from vacation...

Change from v8:
1. Update struct kvm_run to contain MSI-X routing update exit specific
information.
2. Fix a mmio_nr counting bug.

Note that this patchset is still based on 2.6.37, because a bug currently
blocks assigned devices upstream.

Sheng Yang (4):
  KVM: Move struct kvm_io_device to kvm_host.h
  KVM: Add kvm_io_ext_data to IO handler
  KVM: Emulate MSI-X table in kernel
  KVM: Add documents for MSI-X MMIO API

 Documentation/kvm/api.txt |   58 +
 arch/x86/kvm/Makefile |2 +-
 arch/x86/kvm/i8254.c  |6 +-
 arch/x86/kvm/i8259.c  |3 +-
 arch/x86/kvm/lapic.c  |3 +-
 arch/x86/kvm/x86.c|   40 +--
 include/linux/kvm.h   |   28 +
 include/linux/kvm_host.h  |   65 ++-
 virt/kvm/assigned-dev.c   |   44 +++
 virt/kvm/coalesced_mmio.c |3 +-
 virt/kvm/eventfd.c|2 +-
 virt/kvm/ioapic.c |2 +-
 virt/kvm/iodev.h  |   31 +
 virt/kvm/kvm_main.c   |   40 ++-
 virt/kvm/msix_mmio.c  |  293 +
 virt/kvm/msix_mmio.h  |   25 
 16 files changed, 594 insertions(+), 51 deletions(-)
 create mode 100644 virt/kvm/msix_mmio.c
 create mode 100644 virt/kvm/msix_mmio.h



[PATCH 1/4] KVM: Move struct kvm_io_device to kvm_host.h

2011-02-18 Thread Sheng Yang
Then it can be used by other structs in kvm_host.h.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 include/linux/kvm_host.h |   23 +++
 virt/kvm/iodev.h |   25 +
 2 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b5021db..7d313e0 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -98,6 +98,29 @@ int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, 
gfn_t gfn,
 int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
 #endif
 
+struct kvm_io_device;
+
+/**
+ * kvm_io_device_ops are called under kvm slots_lock.
+ * read and write handlers return 0 if the transaction has been handled,
+ * or non-zero to have it passed to the next device.
+ **/
+struct kvm_io_device_ops {
+   int (*read)(struct kvm_io_device *this,
+   gpa_t addr,
+   int len,
+   void *val);
+   int (*write)(struct kvm_io_device *this,
+gpa_t addr,
+int len,
+const void *val);
+   void (*destructor)(struct kvm_io_device *this);
+};
+
+struct kvm_io_device {
+   const struct kvm_io_device_ops *ops;
+};
+
 struct kvm_vcpu {
struct kvm *kvm;
 #ifdef CONFIG_PREEMPT_NOTIFIERS
diff --git a/virt/kvm/iodev.h b/virt/kvm/iodev.h
index 12fd3ca..d1f5651 100644
--- a/virt/kvm/iodev.h
+++ b/virt/kvm/iodev.h
@@ -17,32 +17,9 @@
 #define __KVM_IODEV_H__
 
 #include <linux/kvm_types.h>
+#include <linux/kvm_host.h>
 #include <asm/errno.h>
 
-struct kvm_io_device;
-
-/**
- * kvm_io_device_ops are called under kvm slots_lock.
- * read and write handlers return 0 if the transaction has been handled,
- * or non-zero to have it passed to the next device.
- **/
-struct kvm_io_device_ops {
-   int (*read)(struct kvm_io_device *this,
-   gpa_t addr,
-   int len,
-   void *val);
-   int (*write)(struct kvm_io_device *this,
-gpa_t addr,
-int len,
-const void *val);
-   void (*destructor)(struct kvm_io_device *this);
-};
-
-
-struct kvm_io_device {
-   const struct kvm_io_device_ops *ops;
-};
-
 static inline void kvm_iodevice_init(struct kvm_io_device *dev,
 const struct kvm_io_device_ops *ops)
 {
-- 
1.7.0.1
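
The handler convention described in the comment moved by this patch (return 0 if the transaction was handled, non-zero to pass it to the next device) can be illustrated with a small standalone model. This is a sketch only: the sketch_* types and the range-matching device are invented here, locking is left out, and -1 stands in for whatever error value the real bus code returns.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t gpa_t;

struct sketch_io_device;

/* Reduced form of the ops table: only the write handler is modelled. */
struct sketch_io_device_ops {
    int (*write)(struct sketch_io_device *this, gpa_t addr, int len,
                 const void *val);
};

struct sketch_io_device {
    const struct sketch_io_device_ops *ops;
    gpa_t base;   /* start of the range this device claims */
    gpa_t size;   /* length of that range */
    int writes;   /* number of writes accepted so far */
};

/* Returns 0 when the write falls in our range, non-zero otherwise. */
static int sketch_range_write(struct sketch_io_device *this, gpa_t addr,
                              int len, const void *val)
{
    (void)val;
    if (addr < this->base || addr + len > this->base + this->size)
        return -1;
    this->writes++;
    return 0;
}

static const struct sketch_io_device_ops sketch_range_ops = {
    .write = sketch_range_write,
};

/* Offer the write to each device in turn until one handles it. */
static int sketch_bus_write(struct sketch_io_device **devs, size_t n,
                            gpa_t addr, int len, const void *val)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (!devs[i]->ops->write(devs[i], addr, len, val))
            return 0;
    return -1; /* nobody claimed it; the caller falls back to userspace */
}
```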



[PATCH 2/4] KVM: Add kvm_io_ext_data to IO handler

2011-02-18 Thread Sheng Yang
Add a new parameter to the IO write handlers, so that a handler can pass
information back to its caller.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 arch/x86/kvm/i8254.c  |6 --
 arch/x86/kvm/i8259.c  |3 ++-
 arch/x86/kvm/lapic.c  |3 ++-
 arch/x86/kvm/x86.c|   13 -
 include/linux/kvm_host.h  |   12 ++--
 virt/kvm/coalesced_mmio.c |3 ++-
 virt/kvm/eventfd.c|2 +-
 virt/kvm/ioapic.c |2 +-
 virt/kvm/iodev.h  |6 --
 virt/kvm/kvm_main.c   |4 ++--
 10 files changed, 36 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index efad723..bd8f0c5 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -439,7 +439,8 @@ static inline int pit_in_range(gpa_t addr)
 }
 
 static int pit_ioport_write(struct kvm_io_device *this,
-   gpa_t addr, int len, const void *data)
+   gpa_t addr, int len, const void *data,
+   struct kvm_io_ext_data *ext_data)
 {
struct kvm_pit *pit = dev_to_pit(this);
        struct kvm_kpit_state *pit_state = &pit->pit_state;
@@ -585,7 +586,8 @@ static int pit_ioport_read(struct kvm_io_device *this,
 }
 
 static int speaker_ioport_write(struct kvm_io_device *this,
-   gpa_t addr, int len, const void *data)
+   gpa_t addr, int len, const void *data,
+   struct kvm_io_ext_data *ext_data)
 {
struct kvm_pit *pit = speaker_to_pit(this);
        struct kvm_kpit_state *pit_state = &pit->pit_state;
diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index 3cece05..96b1070 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -480,7 +480,8 @@ static inline struct kvm_pic *to_pic(struct kvm_io_device 
*dev)
 }
 
 static int picdev_write(struct kvm_io_device *this,
-gpa_t addr, int len, const void *val)
+gpa_t addr, int len, const void *val,
+struct kvm_io_ext_data *ext_data)
 {
struct kvm_pic *s = to_pic(this);
unsigned char data = *(unsigned char *)val;
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 93cf9d0..f413e9c 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -836,7 +836,8 @@ static int apic_reg_write(struct kvm_lapic *apic, u32 reg, 
u32 val)
 }
 
 static int apic_mmio_write(struct kvm_io_device *this,
-   gpa_t address, int len, const void *data)
+   gpa_t address, int len, const void *data,
+   struct kvm_io_ext_data *ext_data)
 {
struct kvm_lapic *apic = to_lapic(this);
        unsigned int offset = address - apic->base_address;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fa708c9..21b84e2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3571,13 +3571,14 @@ static void kvm_init_msr_list(void)
 }
 
 static int vcpu_mmio_write(struct kvm_vcpu *vcpu, gpa_t addr, int len,
-  const void *v)
+  const void *v, struct kvm_io_ext_data *ext_data)
 {
        if (vcpu->arch.apic &&
-           !kvm_iodevice_write(&vcpu->arch.apic->dev, addr, len, v))
+           !kvm_iodevice_write(&vcpu->arch.apic->dev, addr, len, v, ext_data))
                return 0;
 
-   return kvm_io_bus_write(vcpu->kvm, KVM_MMIO_BUS, addr, len, v);
+   return kvm_io_bus_write(vcpu->kvm, KVM_MMIO_BUS,
+                           addr, len, v, ext_data);
 }
 
 static int vcpu_mmio_read(struct kvm_vcpu *vcpu, gpa_t addr, int len, void *v)
@@ -3807,6 +3808,7 @@ static int emulator_write_emulated_onepage(unsigned long 
addr,
   struct kvm_vcpu *vcpu)
 {
gpa_t gpa;
+   struct kvm_io_ext_data ext_data;
 
gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, exception);
 
@@ -3825,7 +3827,7 @@ mmio:
/*
 * Is this MMIO handled locally?
 */
-   if (!vcpu_mmio_write(vcpu, gpa, bytes, val))
+   if (!vcpu_mmio_write(vcpu, gpa, bytes, val, &ext_data))
        return X86EMUL_CONTINUE;
 
    vcpu->mmio_needed = 1;
@@ -3940,6 +3942,7 @@ static int kernel_pio(struct kvm_vcpu *vcpu, void *pd)
 {
/* TODO: String I/O for in kernel device */
int r;
+   struct kvm_io_ext_data ext_data;
 
        if (vcpu->arch.pio.in)
                r = kvm_io_bus_read(vcpu->kvm, KVM_PIO_BUS, vcpu->arch.pio.port,
@@ -3947,7 +3950,7 @@ static int kernel_pio(struct kvm_vcpu *vcpu, void *pd)
        else
                r = kvm_io_bus_write(vcpu->kvm, KVM_PIO_BUS,
                                     vcpu->arch.pio.port, vcpu->arch.pio.size,
-                                    pd);
+                                    pd, &ext_data);
return r;
 }
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7d313e0..6bb211d 100644

[PATCH 4/4] KVM: Add documents for MSI-X MMIO API

2011-02-18 Thread Sheng Yang

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 Documentation/kvm/api.txt |   58 +
 1 files changed, 58 insertions(+), 0 deletions(-)

diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
index e1a9297..dd10c3b 100644
--- a/Documentation/kvm/api.txt
+++ b/Documentation/kvm/api.txt
@@ -1263,6 +1263,53 @@ struct kvm_assigned_msix_entry {
__u16 padding[3];
 };
 
+4.54 KVM_REGISTER_MSIX_MMIO
+
+Capability: KVM_CAP_MSIX_MMIO
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_msix_mmio_user (in)
+Returns: 0 on success, -1 on error
+
+This API registers the MSI-X MMIO address of a guest device. All MMIO
+operations on that range are then handled by the kernel. When necessary (e.g.
+when MSI data/address changes), KVM exits to userspace with
+KVM_EXIT_MSIX_ROUTING_UPDATE to report the MMIO modification, and userspace
+must then update the IRQ routing table.
+
+NOTICE: Writing to the MSI-X MMIO page from userspace after it has been
+registered with this API may be dangerous. Writes made while the VM is running
+may cause synchronization issues that keep the assigned device from working
+properly. Writes are allowed while the VM is not running and can be used as a
+save/restore mechanism.
+
+struct kvm_msix_mmio_user {
+   __u32 dev_id;
+   __u16 type; /* Device type and MMIO address type */
+   __u16 max_entries_nr;   /* Maximum entries supported */
+   __u64 base_addr;/* Guest physical address of MMIO */
+   __u64 base_va;  /* Host virtual address of MMIO mapping */
+   __u64 flags;/* Reserved for now */
+   __u64 reserved[4];
+};
+
+Current device type can be:
+#define KVM_MSIX_MMIO_TYPE_ASSIGNED_DEV    (1 << 0)
+
+Current MMIO type can be:
+#define KVM_MSIX_MMIO_TYPE_BASE_TABLE      (1 << 8)
+
+4.55 KVM_UNREGISTER_MSIX_MMIO
+
+Capability: KVM_CAP_MSIX_MMIO
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_msix_mmio_user (in)
+Returns: 0 on success, -1 on error
+
+This API unregisters the MSI-X MMIO range identified by the dev_id and
+type fields of struct kvm_msix_mmio_user.
+
 5. The kvm_run structure
 
 Application code obtains a pointer to the kvm_run structure by
@@ -1445,6 +1492,17 @@ Userspace can now handle the hypercall and when it's 
done modify the gprs as
 necessary. Upon guest entry all guest GPRs will then be replaced by the values
 in this struct.
 
+   /* KVM_EXIT_MSIX_ROUTING_UPDATE*/
+   struct {
+   __u32 dev_id;
+   __u16 type;
+   __u16 entry_idx;
+   __u64 flags;
+   } msix_routing;
+
+KVM_EXIT_MSIX_ROUTING_UPDATE indicates that one MSI-X entry has been modified
+and that userspace needs to update the corresponding routing table.
+
/* Fix the size of the union. */
char padding[256];
};
-- 
1.7.0.1
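
As a rough illustration of how userspace might prepare the argument for KVM_REGISTER_MSIX_MMIO as documented above, the sketch below fills a struct kvm_msix_mmio_user for an assigned device's MSI-X table. The structure and constants are re-declared locally from the header proposed in this series, with plain stdint types in place of __u32 and friends; sketch_fill_table_registration() is a made-up helper, and the ioctl call itself is left out.

```c
#include <stdint.h>
#include <string.h>

/* Re-declared from the header proposed in this series. */
#define KVM_MSIX_MMIO_TYPE_ASSIGNED_DEV  (1 << 0)
#define KVM_MSIX_MMIO_TYPE_BASE_TABLE    (1 << 8)
#define KVM_MSIX_MMIO_TYPE_DEV_MASK      0x00ff
#define KVM_MSIX_MMIO_TYPE_BASE_MASK     0xff00

struct kvm_msix_mmio_user {
    uint32_t dev_id;
    uint16_t type;            /* device type and MMIO address type */
    uint16_t max_entries_nr;  /* maximum entries supported */
    uint64_t base_addr;       /* guest physical address of MMIO */
    uint64_t base_va;         /* host virtual address of MMIO mapping */
    uint64_t flags;           /* reserved for now */
    uint64_t reserved[4];
};

/* Hypothetical helper: describe the MSI-X table of one assigned device. */
static void sketch_fill_table_registration(struct kvm_msix_mmio_user *u,
                                           uint32_t dev_id,
                                           uint64_t guest_phys,
                                           void *host_va,
                                           uint16_t max_entries)
{
    /* Zero everything so flags and reserved[] start out as 0. */
    memset(u, 0, sizeof(*u));
    u->dev_id = dev_id;
    u->type = KVM_MSIX_MMIO_TYPE_ASSIGNED_DEV |
              KVM_MSIX_MMIO_TYPE_BASE_TABLE;
    u->max_entries_nr = max_entries;
    u->base_addr = guest_phys;
    u->base_va = (uint64_t)(uintptr_t)host_va;
}
```

The filled structure would then be passed to the vm ioctl; the low byte of type selects the device type and the high byte the address type, matching the two masks.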



[PATCH 0/4 v9] qemu-kvm: MSI-X MMIO support for assigned device

2011-02-18 Thread Sheng Yang
Update with kernel patches v9.

Sheng Yang (4):
  qemu-kvm: device assignment: Enabling MSI-X according to the entries'
mask bit
  qemu-kvm: Ioctl for MSIX MMIO support
  qemu-kvm: Header file update for MSI-X MMIO support
  qemu-kvm: MSI-X MMIO support for assigned device

 hw/device-assignment.c  |  284 +--
 hw/device-assignment.h  |5 +-
 kvm/include/linux/kvm.h |   28 +
 qemu-kvm.c  |   60 ++
 qemu-kvm.h  |   26 +
 5 files changed, 366 insertions(+), 37 deletions(-)



[PATCH 4/4] qemu-kvm: MSI-X MMIO support for assigned device

2011-02-18 Thread Sheng Yang

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 hw/device-assignment.c |  106 +--
 hw/device-assignment.h |3 +
 qemu-kvm.c |   46 +
 qemu-kvm.h |   19 +
 4 files changed, 160 insertions(+), 14 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 5c162c4..09e3b99 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -71,6 +71,11 @@ static void assigned_device_pci_cap_write_config(PCIDevice 
*pci_dev,
 static uint32_t assigned_device_pci_cap_read_config(PCIDevice *pci_dev,
 uint32_t address, int len);
 
+static uint32_t calc_assigned_dev_id(uint16_t seg, uint8_t bus, uint8_t devfn)
+{
+    return (uint32_t)seg << 16 | (uint32_t)bus << 8 | (uint32_t)devfn;
+}
+
 static uint32_t assigned_dev_ioport_rw(AssignedDevRegion *dev_region,
uint32_t addr, int len, uint32_t *val)
 {
@@ -274,6 +279,10 @@ static void assigned_dev_iomem_map(PCIDevice *pci_dev, int 
region_num,
     AssignedDevRegion *region = &r_dev->v_addrs[region_num];
     PCIRegion *real_region = &r_dev->real_device.regions[region_num];
 int ret = 0;
+#ifdef KVM_CAP_MSIX_MMIO
+int cap_mask = kvm_check_extension(kvm_state, KVM_CAP_MSIX_MMIO);
+struct kvm_msix_mmio_user msix_mmio;
+#endif
 
     DEBUG("e_phys=%08" FMT_PCIBUS " r_virt=%p type=%d len=%08" FMT_PCIBUS " region_num=%d \n",
           e_phys, region->u.r_virtbase, type, e_size, region_num);
@@ -292,6 +301,23 @@ static void assigned_dev_iomem_map(PCIDevice *pci_dev, int 
region_num,
 
 cpu_register_physical_memory(e_phys + offset,
                 TARGET_PAGE_SIZE, r_dev->mmio_index);
+#ifdef KVM_CAP_MSIX_MMIO
+if (cap_mask) {
+    r_dev->guest_msix_table_addr = e_phys + offset;
+    memset(&msix_mmio, 0, sizeof msix_mmio);
+    msix_mmio.dev_id = calc_assigned_dev_id(r_dev->h_segnr,
+            r_dev->h_busnr, r_dev->h_devfn);
+    msix_mmio.type = KVM_MSIX_MMIO_TYPE_ASSIGNED_DEV |
+                     KVM_MSIX_MMIO_TYPE_BASE_TABLE;
+    msix_mmio.base_addr = e_phys + offset;
+    msix_mmio.base_va = (unsigned long)r_dev->msix_table_page;
+    msix_mmio.max_entries_nr = r_dev->max_msix_entries_nr;
+    msix_mmio.flags = 0;
+    ret = kvm_register_msix_mmio(kvm_context, &msix_mmio);
+    if (ret)
+        fprintf(stderr, "fail to register in-kernel msix_mmio!\n");
+}
+#endif
 }
 }
 
@@ -854,11 +880,6 @@ static void free_assigned_device(AssignedDevice *dev)
 }
 }
 
-static uint32_t calc_assigned_dev_id(uint16_t seg, uint8_t bus, uint8_t devfn)
-{
-    return (uint32_t)seg << 16 | (uint32_t)bus << 8 | (uint32_t)devfn;
-}
-
 static void assign_failed_examine(AssignedDevice *dev)
 {
 char name[PATH_MAX], dir[PATH_MAX], driver[PATH_MAX] = {}, *ns;
@@ -1268,6 +1289,9 @@ static int assigned_dev_update_msix_mmio(PCIDevice 
*pci_dev,
 return r;
 }
 
+static int assigned_dev_update_routing_handler(void *opaque,
+struct kvm_msix_routing_data *data);
+
 static void assigned_dev_update_msix(PCIDevice *pci_dev, unsigned int ctrl_pos)
 {
 struct kvm_assigned_irq assigned_irq_data;
@@ -1494,7 +1518,9 @@ static int assigned_device_pci_cap_init(PCIDevice 
*pci_dev)
     msix_table_entry = pci_get_long(pci_dev->config + pos + PCI_MSIX_TABLE);
     bar_nr = msix_table_entry & PCI_MSIX_BIR;
     msix_table_entry &= ~PCI_MSIX_BIR;
-    dev->msix_table_addr = pci_region[bar_nr].base_addr + msix_table_entry;
+    dev->msix_table_addr = pci_region[bar_nr].base_addr +
+                           msix_table_entry;
+
     dev->max_msix_entries_nr = get_msix_entries_max_nr(dev);
 }
 #endif
@@ -1678,11 +1704,10 @@ static uint32_t msix_mmio_readw(void *opaque, 
target_phys_addr_t addr)
             (8 * (addr & 3))) & 0xffff;
 }
 
-static void msix_mmio_writel(void *opaque,
- target_phys_addr_t addr, uint32_t val)
+static void assigned_dev_update_routing(void *opaque,
+struct kvm_msix_routing_data *data)
 {
 AssignedDevice *adev = opaque;
-    unsigned int offset = addr & 0xfff;
     void *page = adev->msix_table_page;
     int ctrl_word, index;
     struct kvm_irq_routing_entry new_entry = {};
@@ -1691,11 +1716,7 @@ static void msix_mmio_writel(void *opaque,
     struct PCIDevice *pci_dev = &adev->dev;
     uint8_t cap = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX);
 
-    DEBUG("write to MSI-X entry table mmio offset 0x%lx, val 0x%x\n",
-          addr, val);
-    memcpy((void *)((char *)page + offset), &val, 4);
-
-    index = offset / 16;
+    index = data->entry_idx;
 
     /* Check if mask bit is being accessed */
     memcpy(&msg_addr, (char *)page + index * 16, 4
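
The dev_id passed to the kernel in this patch packs the PCI segment, bus and devfn into one 32-bit value. A standalone version of calc_assigned_dev_id(), with the shift operators written out, might look like the sketch below. sketch_devfn() is a hypothetical helper added only for the example; it uses the usual PCI encoding of slot and function.

```c
#include <stdint.h>

/* Same packing as in the patch: segment in bits 31..16, bus in 15..8, devfn in 7..0. */
static uint32_t calc_assigned_dev_id(uint16_t seg, uint8_t bus, uint8_t devfn)
{
    return (uint32_t)seg << 16 | (uint32_t)bus << 8 | (uint32_t)devfn;
}

/* Hypothetical helper: devfn is slot in the upper 5 bits, function in the lower 3. */
static uint8_t sketch_devfn(uint8_t slot, uint8_t func)
{
    return (uint8_t)(((slot & 0x1f) << 3) | (func & 0x07));
}
```

For a device at 0001:02:03.0 this gives 0x00010218, which is the value both the registration and the routing-update exit would carry under this scheme.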

[PATCH 2/4] qemu-kvm: Ioctl for MSIX MMIO support

2011-02-18 Thread Sheng Yang

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 qemu-kvm.c |   14 ++
 qemu-kvm.h |7 +++
 2 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/qemu-kvm.c b/qemu-kvm.c
index 49cd683..d282c95 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -1050,6 +1050,20 @@ int kvm_assign_set_msix_entry(kvm_context_t kvm,
 }
 #endif
 
+#ifdef KVM_CAP_MSIX_MMIO
+int kvm_register_msix_mmio(kvm_context_t kvm,
+   struct kvm_msix_mmio_user *mmio_user)
+{
+return kvm_vm_ioctl(kvm_state, KVM_REGISTER_MSIX_MMIO, mmio_user);
+}
+
+int kvm_unregister_msix_mmio(kvm_context_t kvm,
+ struct kvm_msix_mmio_user *mmio_user)
+{
+return kvm_vm_ioctl(kvm_state, KVM_UNREGISTER_MSIX_MMIO, mmio_user);
+}
+#endif
+
 #if defined(KVM_CAP_IRQFD) && defined(CONFIG_EVENTFD)
 
 #include <sys/eventfd.h>
diff --git a/qemu-kvm.h b/qemu-kvm.h
index 88cf276..48ff52d 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -602,6 +602,13 @@ int kvm_assign_set_msix_entry(kvm_context_t kvm,
   struct kvm_assigned_msix_entry *entry);
 #endif
 
+#ifdef KVM_CAP_MSIX_MMIO
+int kvm_register_msix_mmio(kvm_context_t kvm,
+   struct kvm_msix_mmio_user *mmio_user);
+int kvm_unregister_msix_mmio(kvm_context_t kvm,
+ struct kvm_msix_mmio_user *mmio_user);
+#endif
+
 #else   /* !CONFIG_KVM */
 
 typedef struct kvm_context *kvm_context_t;
-- 
1.7.0.1



[PATCH 3/4] qemu-kvm: Header file update for MSI-X MMIO support

2011-02-18 Thread Sheng Yang

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 kvm/include/linux/kvm.h |   28 
 1 files changed, 28 insertions(+), 0 deletions(-)

diff --git a/kvm/include/linux/kvm.h b/kvm/include/linux/kvm.h
index e46729e..dcb8f54 100644
--- a/kvm/include/linux/kvm.h
+++ b/kvm/include/linux/kvm.h
@@ -161,6 +161,7 @@ struct kvm_pit_config {
 #define KVM_EXIT_NMI  16
 #define KVM_EXIT_INTERNAL_ERROR   17
 #define KVM_EXIT_OSI  18
+#define KVM_EXIT_MSIX_ROUTING_UPDATE 19
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 #define KVM_INTERNAL_ERROR_EMULATION 1
@@ -264,6 +265,13 @@ struct kvm_run {
struct {
__u64 gprs[32];
} osi;
+   /* KVM_EXIT_MSIX_ROUTING_UPDATE*/
+   struct {
+   __u32 dev_id;
+   __u16 type;
+   __u16 entry_idx;
+   __u64 flags;
+   } msix_routing;
/* Fix the size of the union. */
char padding[256];
};
@@ -530,6 +538,7 @@ struct kvm_enable_cap {
 #ifdef __KVM_HAVE_XCRS
 #define KVM_CAP_XCRS 56
 #endif
+#define KVM_CAP_MSIX_MMIO 60
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -660,6 +669,9 @@ struct kvm_clock_data {
 #define KVM_XEN_HVM_CONFIG        _IOW(KVMIO,  0x7a, struct kvm_xen_hvm_config)
 #define KVM_SET_CLOCK             _IOW(KVMIO,  0x7b, struct kvm_clock_data)
 #define KVM_GET_CLOCK             _IOR(KVMIO,  0x7c, struct kvm_clock_data)
+/* Available with KVM_CAP_MSIX_MMIO */
+#define KVM_REGISTER_MSIX_MMIO    _IOW(KVMIO, 0x7d, struct kvm_msix_mmio_user)
+#define KVM_UNREGISTER_MSIX_MMIO  _IOW(KVMIO, 0x7e, struct kvm_msix_mmio_user)
 /* Available with KVM_CAP_PIT_STATE2 */
 #define KVM_GET_PIT2  _IOR(KVMIO,  0x9f, struct kvm_pit_state2)
 #define KVM_SET_PIT2  _IOW(KVMIO,  0xa0, struct kvm_pit_state2)
@@ -781,4 +793,20 @@ struct kvm_assigned_msix_entry {
__u16 padding[3];
 };
 
+#define KVM_MSIX_MMIO_TYPE_ASSIGNED_DEV    (1 << 0)
+
+#define KVM_MSIX_MMIO_TYPE_BASE_TABLE      (1 << 8)
+
+#define KVM_MSIX_MMIO_TYPE_DEV_MASK        0x00ff
+#define KVM_MSIX_MMIO_TYPE_BASE_MASK       0xff00
+struct kvm_msix_mmio_user {
+   __u32 dev_id;
+   __u16 type;
+   __u16 max_entries_nr;
+   __u64 base_addr;
+   __u64 base_va;
+   __u64 flags;
+   __u64 reserved[4];
+};
+
 #endif /* __LINUX_KVM_H */
-- 
1.7.0.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/4] qemu-kvm: device assignment: Enabling MSI-X according to the entries' mask bit

2011-02-18 Thread Sheng Yang
The old MSI-X enabling method assumes that the entries are written before MSI-X
is enabled, but some OSes, e.g. FreeBSD, don't do this. This patch handles
that case by enabling entries according to their mask bits.

Also, according to the PCI spec, the mask bit of each MSI-X table entry should
be set after reset.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 hw/device-assignment.c |  188 +---
 hw/device-assignment.h |2 +-
 2 files changed, 162 insertions(+), 28 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index e5205cf..5c162c4 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1146,15 +1146,12 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev, 
unsigned int ctrl_pos)
 #endif
 
 #ifdef KVM_CAP_DEVICE_MSIX
-static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
+
+#define PCI_MSIX_CTRL_MASKBIT  1ul
+static int get_msix_entries_max_nr(AssignedDevice *adev)
 {
-AssignedDevice *adev = container_of(pci_dev, AssignedDevice, dev);
-uint16_t entries_nr = 0, entries_max_nr;
-int pos = 0, i, r = 0;
-uint32_t msg_addr, msg_upper_addr, msg_data, msg_ctrl;
-struct kvm_assigned_msix_nr msix_nr;
-struct kvm_assigned_msix_entry msix_entry;
-void *va = adev-msix_table_page;
+int pos, entries_max_nr;
+    PCIDevice *pci_dev = &adev->dev;
 
 pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX);
 
@@ -1162,20 +1159,48 @@ static int assigned_dev_update_msix_mmio(PCIDevice 
*pci_dev)
     entries_max_nr &= PCI_MSIX_TABSIZE;
 entries_max_nr += 1;
 
+return entries_max_nr;
+}
+
+static int assigned_dev_msix_entry_masked(AssignedDevice *adev, int entry)
+{
+uint32_t msg_ctrl;
+    void *va = adev->msix_table_page;
+
+    memcpy(&msg_ctrl, va + entry * 16 + 12, 4);
+    return (msg_ctrl & PCI_MSIX_CTRL_MASKBIT);
+}
+
+static int get_msix_valid_entries_nr(AssignedDevice *adev,
+uint16_t entries_max_nr)
+{
+    void *va = adev->msix_table_page;
+    uint32_t msg_ctrl;
+    uint16_t entries_nr = 0;
+    int i;
+
     /* Get the usable entry number for allocating */
     for (i = 0; i < entries_max_nr; i++) {
         memcpy(&msg_ctrl, va + i * 16 + 12, 4);
-        memcpy(&msg_data, va + i * 16 + 8, 4);
         /* Ignore unused entry even it's unmasked */
-        if (msg_data == 0)
+        if (assigned_dev_msix_entry_masked(adev, i))
             continue;
         entries_nr ++;
     }
+return entries_nr;
+}
+
+static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev,
+ uint16_t entries_nr,
+ uint16_t entries_max_nr)
+{
+AssignedDevice *adev = container_of(pci_dev, AssignedDevice, dev);
+int i, r = 0;
+uint32_t msg_addr, msg_upper_addr, msg_data, msg_ctrl;
+struct kvm_assigned_msix_nr msix_nr;
+struct kvm_assigned_msix_entry msix_entry;
+void *va = adev->msix_table_page;
 
-if (entries_nr == 0) {
-fprintf(stderr, "MSI-X entry number is zero!\n");
-return -EINVAL;
-}
 msix_nr.assigned_dev_id = calc_assigned_dev_id(adev->h_segnr, adev->h_busnr,
   (uint8_t)adev->h_devfn);
 msix_nr.entry_nr = entries_nr;
@@ -1187,6 +1212,8 @@ static int assigned_dev_update_msix_mmio(PCIDevice 
*pci_dev)
 }
 
 free_dev_irq_entries(adev);
+memset(pci_dev->msix_entry_used, 0, KVM_MAX_MSIX_PER_DEV *
+sizeof(*pci_dev->msix_entry_used));
 adev->irq_entries_nr = entries_nr;
 adev->entry = calloc(entries_nr, sizeof(struct kvm_irq_routing_entry));
 if (!adev->entry) {
@@ -1200,10 +1227,10 @@ static int assigned_dev_update_msix_mmio(PCIDevice 
*pci_dev)
 if (entries_nr >= msix_nr.entry_nr)
 break;
 memcpy(&msg_ctrl, va + i * 16 + 12, 4);
-memcpy(&msg_data, va + i * 16 + 8, 4);
-if (msg_data == 0)
+if (assigned_dev_msix_entry_masked(adev, i))
 continue;
 
+memcpy(&msg_data, va + i * 16 + 8, 4);
 memcpy(&msg_addr, va + i * 16, 4);
 memcpy(&msg_upper_addr, va + i * 16 + 4, 4);
 
@@ -1217,17 +1244,18 @@ static int assigned_dev_update_msix_mmio(PCIDevice 
*pci_dev)
 adev->entry[entries_nr].u.msi.address_lo = msg_addr;
 adev->entry[entries_nr].u.msi.address_hi = msg_upper_addr;
 adev->entry[entries_nr].u.msi.data = msg_data;
-DEBUG("MSI-X data 0x%x, MSI-X addr_lo 0x%x\n!", msg_data, msg_addr);
-   kvm_add_routing_entry(&adev->entry[entries_nr]);
+DEBUG("MSI-X data 0x%x, MSI-X addr_lo 0x%x!\n", msg_data, msg_addr);
+kvm_add_routing_entry(&adev->entry[entries_nr]);
 
 msix_entry.gsi = adev->entry[entries_nr].gsi;
 msix_entry.entry = i;
+pci_dev->msix_entry_used[i] = 1;
 r = kvm_assign_set_msix_entry(kvm_context, &msix_entry);
 if (r) {
 fprintf(stderr, "fail to set MSI-X entry! %s\n", strerror(-r));
 break;
 }
-DEBUG(MSI-X

Re: [PATCH 2/3] KVM: Emulate MSI-X table in kernel

2011-01-31 Thread Sheng Yang
On Mon, Jan 31, 2011 at 03:09:09PM +0200, Avi Kivity wrote:
 On 01/30/2011 06:38 AM, Sheng Yang wrote:
 (Sorry, missed this mail...)
 
 On Mon, Jan 17, 2011 at 02:29:44PM +0200, Avi Kivity wrote:
   On 01/06/2011 12:19 PM, Sheng Yang wrote:
   Then we can support mask bit operation of assigned devices now.
   
   
   
   +int kvm_assigned_device_update_msix_mask_bit(struct kvm *kvm,
   +int assigned_dev_id, int entry, bool mask)
   +{
   +int r = -EFAULT;
   +struct kvm_assigned_dev_kernel *adev;
   +int i;
   +
   +if (!irqchip_in_kernel(kvm))
   +return r;
   +
   +mutex_lock(&kvm->lock);
   +adev = kvm_find_assigned_dev(&kvm->arch.assigned_dev_head,
   +  assigned_dev_id);
   +if (!adev)
   +goto out;
   +
   +for (i = 0; i < adev->entries_nr; i++)
   +if (adev->host_msix_entries[i].entry == entry) {
   +if (mask)
   +disable_irq_nosync(
   +adev->host_msix_entries[i].vector);
 
   Is it okay to call disable_irq_nosync() here?  IIRC we don't check
   the mask bit on irq delivery, so we may forward an interrupt to the
   guest after the mask bit was set.
 
   What does pci say about the mask bit?  when does it take effect?
 
   Another question is whether disable_irq_nosync() actually programs
   the device mask bit, or not.  If it does, then it's slow, and it may
   be better to leave interrupts enabled but have an internal pending
   bit.  If it doesn't program the mask bit, it's fine.
 
 I think Michael and Jan had explained this.
 
   +else
   +enable_irq(adev->host_msix_entries[i].vector);
   +r = 0;
   +break;
   +}
   +out:
   +mutex_unlock(&kvm->lock);
   +return r;
   +}
   
   +
   +static int msix_table_mmio_read(struct kvm_io_device *this, gpa_t addr, int len,
   +void *val)
   +{
   +struct kvm_msix_mmio_dev *mmio_dev =
   +container_of(this, struct kvm_msix_mmio_dev, table_dev);
   +struct kvm_msix_mmio *mmio;
   +int idx, ret = 0, entry, offset, r;
   +
   +mutex_lock(&mmio_dev->lock);
   +idx = get_mmio_table_index(mmio_dev, addr, len);
   +if (idx < 0) {
   +ret = -EOPNOTSUPP;
   +goto out;
   +}
   +if ((addr & 0x3) || (len != 4 && len != 8))
   +goto out;
   +
   +offset = addr & 0xf;
   +if (offset == PCI_MSIX_ENTRY_VECTOR_CTRL && len == 8)
   +goto out;
   +
   +mmio = &mmio_dev->mmio[idx];
   +entry = (addr - mmio->table_base_addr) / PCI_MSIX_ENTRY_SIZE;
   +r = copy_from_user(val, (void __user *)(mmio->table_base_va +
   +entry * PCI_MSIX_ENTRY_SIZE + offset), len);
   +if (r)
   +goto out;
 
   and return ret == 0?
 
 Yes. This operation should be handled by the in-kernel MSI-X MMIO, so we return 0
 in order to omit this action. We can add a warning for it later.
 
 But it failed.  We need to return -EFAULT.

So it would return to QEMU. OK, let QEMU print a warning about it.

-- 
regards
Yang, Sheng
 
 The same as above.
 
   +
   +if ((offset < PCI_MSIX_ENTRY_VECTOR_CTRL && len == 4) ||
   +(offset < PCI_MSIX_ENTRY_DATA && len == 8))
   +ret = -ENOTSYNC;
 
   goto out?
 
 No. This check only determines whether the MSI data/address was touched, and the
 line below checks whether we need to operate on the mask bit, because in theory
 the guest can use len=8 to modify MSI-X data and ctrl at the same time.
 
 
 Ok, makes sense.
 
 -- 
 error compiling committee.c: too many arguments to function
 



Re: [PATCH 3/3] KVM: Add documents for MSI-X MMIO API

2011-01-31 Thread Sheng Yang
On Mon, Jan 31, 2011 at 03:24:27PM +0200, Avi Kivity wrote:
 On 01/26/2011 11:05 AM, Sheng Yang wrote:
 On Tuesday 25 January 2011 20:47:38 Avi Kivity wrote:
   On 01/19/2011 10:21 AM, Sheng Yang wrote:
   We already got a guest MMIO address for that in the exit
   information. I've created a chain of handlers in qemu to handle it.
 
But we already decoded the table and entry...
   
 But the handler is still wrapped by vcpu_mmio_write(), as a part of MMIO.
 So it's not quite handy to get the table and entry out.
 
   The kernel handler can create a new kvm_run exit description.
 
   Also the updater in the userspace
   
 can share most of the logic with the ordinary userspace MMIO handler, which
 takes the address as a parameter. So I think we don't need to pass the decoded
 table_id and entry to userspace.
 
   It's mixing layers, which always leads to trouble.  For one, the user
   handler shouldn't do anything with the write since the kernel already
   wrote it into the table.  For another, if two vcpus write to the same
   entry simultaneously, you could see different ordering in the kernel and
   userspace, and get inconsistent results.
 
 The shared logic is not about writing, but about interpreting what's written.
 The old MMIO handler would write the data, then interpret it; our new MMIO
 handler would only share the logic of interpretation. I think that's fair enough?
 
 It doesn't make sense from an API point of view.  You registered a
 table of entries, you expect an exit on that table to point to the
 table and entry that got changed.
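
As a rough illustration of that point (not code from the patch series), the sketch below maps a guest MMIO address back to the registered table and the entry index, which is the information such an exit would carry. The struct msix_table_desc and the function msix_addr_to_entry are hypothetical names chosen for this example; only the 16-byte entry size and the (addr - base) / entry_size arithmetic come from the patch under discussion.

```c
#include <stdint.h>

#define MSIX_ENTRY_SIZE 16  /* bytes per MSI-X table entry */

/* Hypothetical description of one registered MSI-X table. */
struct msix_table_desc {
    uint32_t dev_id;          /* device the table belongs to */
    uint64_t base_addr;       /* guest physical address of the table */
    uint16_t max_entries_nr;  /* number of entries in the table */
};

/*
 * Return the index of the entry that contains addr, or -1 if addr
 * lies outside the table.
 */
static int msix_addr_to_entry(const struct msix_table_desc *t, uint64_t addr)
{
    uint64_t idx;

    if (addr < t->base_addr)
        return -1;
    idx = (addr - t->base_addr) / MSIX_ENTRY_SIZE;
    if (idx >= t->max_entries_nr)
        return -1;
    return (int)idx;
}
```

With a decode like this done once in the kernel, the exit could report (dev_id, entry) directly, and userspace would not have to repeat the address arithmetic.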

OK, I will update this when I come back to work (in about two weeks...).

-- 
regards
Yang, Sheng

 
 -- 
 error compiling committee.c: too many arguments to function
 



Re: [PATCH 2/3] KVM: Emulate MSI-X table in kernel

2011-01-29 Thread Sheng Yang
(Sorry, missed this mail...)

On Mon, Jan 17, 2011 at 02:29:44PM +0200, Avi Kivity wrote:
 On 01/06/2011 12:19 PM, Sheng Yang wrote:
 Then we can support mask bit operation of assigned devices now.
 
 
 
 +int kvm_assigned_device_update_msix_mask_bit(struct kvm *kvm,
 +int assigned_dev_id, int entry, bool mask)
 +{
 +int r = -EFAULT;
 +struct kvm_assigned_dev_kernel *adev;
 +int i;
 +
 +if (!irqchip_in_kernel(kvm))
 +return r;
 +
 +mutex_lock(&kvm->lock);
 +adev = kvm_find_assigned_dev(&kvm->arch.assigned_dev_head,
 +  assigned_dev_id);
 +if (!adev)
 +goto out;
 +
 +for (i = 0; i < adev->entries_nr; i++)
 +if (adev->host_msix_entries[i].entry == entry) {
 +if (mask)
 +disable_irq_nosync(
 +adev->host_msix_entries[i].vector);
 
 Is it okay to call disable_irq_nosync() here?  IIRC we don't check
 the mask bit on irq delivery, so we may forward an interrupt to the
 guest after the mask bit was set.
 
 What does pci say about the mask bit?  when does it take effect?
 
 Another question is whether disable_irq_nosync() actually programs
 the device mask bit, or not.  If it does, then it's slow, and it may
 be better to leave interrupts enabled but have an internal pending
 bit.  If it doesn't program the mask bit, it's fine.

I think Michael and Jan had explained this.
 
 +else
 +enable_irq(adev->host_msix_entries[i].vector);
 +r = 0;
 +break;
 +}
 +out:
 +mutex_unlock(&kvm->lock);
 +return r;
 +}
 
 +
 +static int msix_table_mmio_read(struct kvm_io_device *this, gpa_t addr, int len,
 +void *val)
 +{
 +struct kvm_msix_mmio_dev *mmio_dev =
 +container_of(this, struct kvm_msix_mmio_dev, table_dev);
 +struct kvm_msix_mmio *mmio;
 +int idx, ret = 0, entry, offset, r;
 +
 +mutex_lock(&mmio_dev->lock);
 +idx = get_mmio_table_index(mmio_dev, addr, len);
 +if (idx < 0) {
 +ret = -EOPNOTSUPP;
 +goto out;
 +}
 +if ((addr & 0x3) || (len != 4 && len != 8))
 +goto out;
 +
 +offset = addr & 0xf;
 +if (offset == PCI_MSIX_ENTRY_VECTOR_CTRL && len == 8)
 +goto out;
 +
 +mmio = &mmio_dev->mmio[idx];
 +entry = (addr - mmio->table_base_addr) / PCI_MSIX_ENTRY_SIZE;
 +r = copy_from_user(val, (void __user *)(mmio->table_base_va +
 +entry * PCI_MSIX_ENTRY_SIZE + offset), len);
 +if (r)
 +goto out;
 
 and return ret == 0?

Yes. This operation should be handled by the in-kernel MSI-X MMIO, so we return 0
in order to omit this action. We can add a warning for it later.
 
 +out:
 +mutex_unlock(&mmio_dev->lock);
 +return ret;
 +}
 +
 +static int msix_table_mmio_write(struct kvm_io_device *this, gpa_t addr,
 +int len, const void *val)
 +{
 +struct kvm_msix_mmio_dev *mmio_dev =
 +container_of(this, struct kvm_msix_mmio_dev, table_dev);
 +struct kvm_msix_mmio *mmio;
 +int idx, entry, offset, ret = 0, r = 0;
 +gpa_t entry_base;
 +u32 old_ctrl, new_ctrl;
 +u32 *ctrl_pos;
 +
 +mutex_lock(&mmio_dev->lock);
 +idx = get_mmio_table_index(mmio_dev, addr, len);
 +if (idx < 0) {
 +ret = -EOPNOTSUPP;
 +goto out;
 +}
 +if ((addr & 0x3) || (len != 4 && len != 8))
 +goto out;
 +
 +offset = addr & 0xF;
 +if (offset == PCI_MSIX_ENTRY_VECTOR_CTRL && len == 8)
 +goto out;
 +
 +mmio = &mmio_dev->mmio[idx];
 +entry = (addr - mmio->table_base_addr) / PCI_MSIX_ENTRY_SIZE;
 +entry_base = mmio->table_base_va + entry * PCI_MSIX_ENTRY_SIZE;
 +ctrl_pos = (u32 *)(entry_base + PCI_MSIX_ENTRY_VECTOR_CTRL);
 +
 +if (get_user(old_ctrl, ctrl_pos))
 +goto out;
 +
 +/* No allow writing to other fields when entry is unmasked */
 +if (!(old_ctrl & PCI_MSIX_ENTRY_CTRL_MASKBIT) &&
 +offset != PCI_MSIX_ENTRY_VECTOR_CTRL)
 +goto out;
 +
 +if (copy_to_user((void __user *)(entry_base + offset), val, len))
 +goto out;
 +
 +if (get_user(new_ctrl, ctrl_pos))
 +goto out;
 
 here, too.

The same as above.
 
 +
 +if ((offset < PCI_MSIX_ENTRY_VECTOR_CTRL && len == 4) ||
 +(offset < PCI_MSIX_ENTRY_DATA && len == 8))
 +ret = -ENOTSYNC;
 
 goto out?

No. This check only determines whether the MSI data/address was touched, and the
line below checks whether we need to operate on the mask bit, because in theory
the guest can use len=8 to modify MSI-X data and ctrl at the same time.
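
To make the two independent questions concrete, here is a minimal sketch that classifies a table write by the byte range it covers. It is not the condition used in the patch: the struct msix_write_effect and the function classify_msix_write are hypothetical, and the sketch assumes the access has already passed the alignment and length checks (4-byte aligned, len of 4 or 8, not crossing an entry boundary).

```c
#include <stdbool.h>
#include <stdint.h>

#define MSIX_ENTRY_SIZE        16  /* bytes per MSI-X table entry */
#define MSIX_ENTRY_VECTOR_CTRL 12  /* offset of the Vector Control dword */

/* Hypothetical result type: which parts of the entry a write covers. */
struct msix_write_effect {
    bool touches_routing;  /* address low/high or data was written */
    bool touches_ctrl;     /* the Vector Control dword was written */
};

/* Classify a validated write of len bytes at guest address addr. */
static struct msix_write_effect classify_msix_write(uint64_t addr, int len)
{
    struct msix_write_effect e;
    unsigned int start = (unsigned int)(addr & (MSIX_ENTRY_SIZE - 1));
    unsigned int end = start + (unsigned int)len;  /* exclusive */

    /* Bytes 0..11 hold address low, address high and data. */
    e.touches_routing = start < MSIX_ENTRY_VECTOR_CTRL;
    /* Bytes 12..15 hold Vector Control, which contains the mask bit. */
    e.touches_ctrl = end > MSIX_ENTRY_VECTOR_CTRL;
    return e;
}
```

An 8-byte write at offset 8 sets both flags, which is the data-plus-ctrl case mentioned above, so neither check can short-circuit the other.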

-- 
regards
Yang, Sheng
 
 +if (old_ctrl == new_ctrl)
 +goto out;
 +if (!(old_ctrl & PCI_MSIX_ENTRY_CTRL_MASKBIT) &&
 +(new_ctrl & PCI_MSIX_ENTRY_CTRL_MASKBIT))
 +r = update_msix_mask_bit(mmio_dev->kvm

[PATCH 3/3] KVM: Add documents for MSI-X MMIO API

2011-01-29 Thread Sheng Yang

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 Documentation/kvm/api.txt |   47 +
 1 files changed, 47 insertions(+), 0 deletions(-)

diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
index e1a9297..e6b7a1d 100644
--- a/Documentation/kvm/api.txt
+++ b/Documentation/kvm/api.txt
@@ -1263,6 +1263,53 @@ struct kvm_assigned_msix_entry {
__u16 padding[3];
 };
 
+4.54 KVM_REGISTER_MSIX_MMIO
+
+Capability: KVM_CAP_MSIX_MMIO
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_msix_mmio_user (in)
+Returns: 0 on success, -1 on error
+
+This API indicates an MSI-X MMIO address of a guest device. Then all MMIO
+operation would be handled by kernel. When necessary(e.g. MSI data/address
+changed), KVM would exit to userspace using KVM_EXIT_MSIX_ROUTING_UPDATE to
+indicate the MMIO modification and require userspace to update IRQ routing
+table.
+
+NOTICE: Writing the MSI-X MMIO page after it was registered with this API may
+be dangerous for userspace program. The writing during VM running may result
+in synchronization issue therefore the assigned device can't work properly.
+The writing is allowed when VM is not running and can be used as save/restore
+mechanism.
+
+struct kvm_msix_mmio_user {
+   __u32 dev_id;
+   __u16 type; /* Device type and MMIO address type */
+   __u16 max_entries_nr;   /* Maximum entries supported */
+   __u64 base_addr;/* Guest physical address of MMIO */
+   __u64 base_va;  /* Host virtual address of MMIO mapping */
+   __u64 flags;/* Reserved for now */
+   __u64 reserved[4];
+};
+
+Current device type can be:
+#define KVM_MSIX_MMIO_TYPE_ASSIGNED_DEV (1 << 0)
+
+Current MMIO type can be:
+#define KVM_MSIX_MMIO_TYPE_BASE_TABLE  (1 << 8)
+
+4.55 KVM_UNREGISTER_MSIX_MMIO
+
+Capability: KVM_CAP_MSIX_MMIO
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_msix_mmio_user (in)
+Returns: 0 on success, -1 on error
+
+This API would unregister the specific MSI-X MMIO, indicated by dev_id and
+type fields of struct kvm_msix_mmio_user.
+
 5. The kvm_run structure
 
 Application code obtains a pointer to the kvm_run structure by
-- 
1.7.0.1



[PATCH 1/3] KVM: Move struct kvm_io_device to kvm_host.h

2011-01-29 Thread Sheng Yang
Then it can be used by other struct in kvm_host.h

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 include/linux/kvm_host.h |   23 +++
 virt/kvm/iodev.h |   25 +
 2 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b5021db..7d313e0 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -98,6 +98,29 @@ int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, 
gfn_t gfn,
 int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
 #endif
 
+struct kvm_io_device;
+
+/**
+ * kvm_io_device_ops are called under kvm slots_lock.
+ * read and write handlers return 0 if the transaction has been handled,
+ * or non-zero to have it passed to the next device.
+ **/
+struct kvm_io_device_ops {
+   int (*read)(struct kvm_io_device *this,
+   gpa_t addr,
+   int len,
+   void *val);
+   int (*write)(struct kvm_io_device *this,
+gpa_t addr,
+int len,
+const void *val);
+   void (*destructor)(struct kvm_io_device *this);
+};
+
+struct kvm_io_device {
+   const struct kvm_io_device_ops *ops;
+};
+
 struct kvm_vcpu {
struct kvm *kvm;
 #ifdef CONFIG_PREEMPT_NOTIFIERS
diff --git a/virt/kvm/iodev.h b/virt/kvm/iodev.h
index 12fd3ca..d1f5651 100644
--- a/virt/kvm/iodev.h
+++ b/virt/kvm/iodev.h
@@ -17,32 +17,9 @@
 #define __KVM_IODEV_H__
 
 #include <linux/kvm_types.h>
+#include <linux/kvm_host.h>
 #include <asm/errno.h>
 
-struct kvm_io_device;
-
-/**
- * kvm_io_device_ops are called under kvm slots_lock.
- * read and write handlers return 0 if the transaction has been handled,
- * or non-zero to have it passed to the next device.
- **/
-struct kvm_io_device_ops {
-   int (*read)(struct kvm_io_device *this,
-   gpa_t addr,
-   int len,
-   void *val);
-   int (*write)(struct kvm_io_device *this,
-gpa_t addr,
-int len,
-const void *val);
-   void (*destructor)(struct kvm_io_device *this);
-};
-
-
-struct kvm_io_device {
-   const struct kvm_io_device_ops *ops;
-};
-
 static inline void kvm_iodevice_init(struct kvm_io_device *dev,
 const struct kvm_io_device_ops *ops)
 {
-- 
1.7.0.1



[PATCH 2/3] KVM: Emulate MSI-X table in kernel

2011-01-29 Thread Sheng Yang
Then we can support mask bit operation of assigned devices now.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 arch/x86/kvm/Makefile|2 +-
 arch/x86/kvm/x86.c   |8 +-
 include/linux/kvm.h  |   21 
 include/linux/kvm_host.h |   25 
 virt/kvm/assigned-dev.c  |   44 +++
 virt/kvm/kvm_main.c  |   38 ++-
 virt/kvm/msix_mmio.c |  286 ++
 virt/kvm/msix_mmio.h |   25 
 8 files changed, 442 insertions(+), 7 deletions(-)
 create mode 100644 virt/kvm/msix_mmio.c
 create mode 100644 virt/kvm/msix_mmio.h

diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index f15501f..3a0d851 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -7,7 +7,7 @@ CFLAGS_vmx.o := -I.
 
 kvm-y  += $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o \
coalesced_mmio.o irq_comm.o eventfd.o \
-   assigned-dev.o)
+   assigned-dev.o msix_mmio.o)
 kvm-$(CONFIG_IOMMU_API)+= $(addprefix ../../../virt/kvm/, iommu.o)
 kvm-$(CONFIG_KVM_ASYNC_PF) += $(addprefix ../../../virt/kvm/, async_pf.o)
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fa708c9..89bf12c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1966,6 +1966,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_X86_ROBUST_SINGLESTEP:
case KVM_CAP_XSAVE:
case KVM_CAP_ASYNC_PF:
+   case KVM_CAP_MSIX_MMIO:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
@@ -3807,6 +3808,7 @@ static int emulator_write_emulated_onepage(unsigned long 
addr,
   struct kvm_vcpu *vcpu)
 {
gpa_t gpa;
+   int r;
 
gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, exception);
 
@@ -3822,14 +3824,16 @@ static int emulator_write_emulated_onepage(unsigned 
long addr,
 
 mmio:
trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, bytes, gpa, *(u64 *)val);
+   r = vcpu_mmio_write(vcpu, gpa, bytes, val);
/*
 * Is this MMIO handled locally?
 */
-   if (!vcpu_mmio_write(vcpu, gpa, bytes, val))
+   if (!r)
return X86EMUL_CONTINUE;
 
vcpu->mmio_needed = 1;
-   vcpu->run->exit_reason = KVM_EXIT_MMIO;
+   vcpu->run->exit_reason = (r == -ENOTSYNC) ?
+   KVM_EXIT_MSIX_ROUTING_UPDATE : KVM_EXIT_MMIO;
vcpu->run->mmio.phys_addr = vcpu->mmio_phys_addr = gpa;
vcpu->run->mmio.len = vcpu->mmio_size = bytes;
vcpu->run->mmio.is_write = vcpu->mmio_is_write = 1;
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index ea2dc1a..ad9df4b 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -161,6 +161,7 @@ struct kvm_pit_config {
 #define KVM_EXIT_NMI  16
 #define KVM_EXIT_INTERNAL_ERROR   17
 #define KVM_EXIT_OSI  18
+#define KVM_EXIT_MSIX_ROUTING_UPDATE 19
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 #define KVM_INTERNAL_ERROR_EMULATION 1
@@ -541,6 +542,7 @@ struct kvm_ppc_pvinfo {
 #define KVM_CAP_PPC_GET_PVINFO 57
 #define KVM_CAP_PPC_IRQ_LEVEL 58
 #define KVM_CAP_ASYNC_PF 59
+#define KVM_CAP_MSIX_MMIO 60
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -672,6 +674,9 @@ struct kvm_clock_data {
 #define KVM_XEN_HVM_CONFIG _IOW(KVMIO,  0x7a, struct kvm_xen_hvm_config)
 #define KVM_SET_CLOCK _IOW(KVMIO,  0x7b, struct kvm_clock_data)
 #define KVM_GET_CLOCK _IOR(KVMIO,  0x7c, struct kvm_clock_data)
+/* Available with KVM_CAP_MSIX_MMIO */
+#define KVM_REGISTER_MSIX_MMIO _IOW(KVMIO,  0x7d, struct kvm_msix_mmio_user)
+#define KVM_UNREGISTER_MSIX_MMIO  _IOW(KVMIO,  0x7e, struct kvm_msix_mmio_user)
 /* Available with KVM_CAP_PIT_STATE2 */
 #define KVM_GET_PIT2  _IOR(KVMIO,  0x9f, struct kvm_pit_state2)
 #define KVM_SET_PIT2  _IOW(KVMIO,  0xa0, struct kvm_pit_state2)
@@ -795,4 +800,20 @@ struct kvm_assigned_msix_entry {
__u16 padding[3];
 };
 
+#define KVM_MSIX_MMIO_TYPE_ASSIGNED_DEV (1 << 0)
+
+#define KVM_MSIX_MMIO_TYPE_BASE_TABLE  (1 << 8)
+
+#define KVM_MSIX_MMIO_TYPE_DEV_MASK 0x00ff
+#define KVM_MSIX_MMIO_TYPE_BASE_MASK   0xff00
+struct kvm_msix_mmio_user {
+   __u32 dev_id;
+   __u16 type;
+   __u16 max_entries_nr;
+   __u64 base_addr;
+   __u64 base_va;
+   __u64 flags;
+   __u64 reserved[4];
+};
+
 #endif /* __LINUX_KVM_H */
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7d313e0..c10670c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -233,6 +233,27 @@ struct kvm_memslots {
KVM_PRIVATE_MEM_SLOTS];
 };
 
+#define KVM_MSIX_MMIO_MAX 32
+
+struct kvm_msix_mmio {
+   u32 dev_id;
+   u16 type;
+   u16 max_entries_nr;
+   u64 flags;
+   gpa_t table_base_addr;
+   hva_t table_base_va;
+   gpa_t pba_base_addr;
+   hva_t pba_base_va

[PATCH 0/3 v8] MSI-X MMIO support for KVM

2011-01-29 Thread Sheng Yang
Change from v7:
Update according to Marcelo and Avi's comments.

BTW: I will be on vacation for Chinese New Year soon, and will be back in
mid-February.

Sheng Yang (3):
  KVM: Move struct kvm_io_device to kvm_host.h
  KVM: Emulate MSI-X table in kernel
  KVM: Add documents for MSI-X MMIO API

 Documentation/kvm/api.txt |   47 
 arch/x86/kvm/Makefile |2 +-
 arch/x86/kvm/x86.c|8 +-
 include/linux/kvm.h   |   21 
 include/linux/kvm_host.h  |   48 
 virt/kvm/assigned-dev.c   |   44 +++
 virt/kvm/iodev.h  |   25 +
 virt/kvm/kvm_main.c   |   38 ++-
 virt/kvm/msix_mmio.c  |  286 +
 virt/kvm/msix_mmio.h  |   25 
 10 files changed, 513 insertions(+), 31 deletions(-)
 create mode 100644 virt/kvm/msix_mmio.c
 create mode 100644 virt/kvm/msix_mmio.h



[PATCH 0/4 v8] qemu-kvm: MSI-X MMIO support for assigned device

2011-01-29 Thread Sheng Yang
Update with kernel patches v8.

Sheng Yang (4):
  qemu-kvm: device assignment: Enabling MSI-X according to the entries'
mask bit
  qemu-kvm: Ioctl for MSIX MMIO support
  qemu-kvm: Header file update for MSI-X MMIO support
  qemu-kvm: MSI-X MMIO support for assigned device

 hw/device-assignment.c  |  275 --
 hw/device-assignment.h  |5 +-
 kvm/include/linux/kvm.h |   21 
 qemu-kvm.c  |   54 +
 qemu-kvm.h  |   18 +++
 5 files changed, 336 insertions(+), 37 deletions(-)



[PATCH 4/4] qemu-kvm: MSI-X MMIO support for assigned device

2011-01-29 Thread Sheng Yang

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 hw/device-assignment.c |   93 +--
 hw/device-assignment.h |3 ++
 qemu-kvm.c |   40 
 qemu-kvm.h |   11 ++
 4 files changed, 135 insertions(+), 12 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index f81050f..bddee2a 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -70,6 +70,11 @@ static void assigned_device_pci_cap_write_config(PCIDevice 
*pci_dev,
 static uint32_t assigned_device_pci_cap_read_config(PCIDevice *pci_dev,
 uint32_t address, int len);
 
+static uint32_t calc_assigned_dev_id(uint16_t seg, uint8_t bus, uint8_t devfn)
+{
+return (uint32_t)seg << 16 | (uint32_t)bus << 8 | (uint32_t)devfn;
+}
+
 static uint32_t assigned_dev_ioport_rw(AssignedDevRegion *dev_region,
uint32_t addr, int len, uint32_t *val)
 {
@@ -272,6 +277,10 @@ static void assigned_dev_iomem_map(PCIDevice *pci_dev, int 
region_num,
 AssignedDevRegion *region = &r_dev->v_addrs[region_num];
 PCIRegion *real_region = &r_dev->real_device.regions[region_num];
 int ret = 0;
+#ifdef KVM_CAP_MSIX_MMIO
+int cap_mask = kvm_check_extension(kvm_state, KVM_CAP_MSIX_MMIO);
+struct kvm_msix_mmio_user msix_mmio;
+#endif
 
 DEBUG("e_phys=%08" FMT_PCIBUS " r_virt=%p type=%d len=%08" FMT_PCIBUS " region_num=%d \n",
   e_phys, region->u.r_virtbase, type, e_size, region_num);
@@ -290,6 +299,23 @@ static void assigned_dev_iomem_map(PCIDevice *pci_dev, int 
region_num,
 
 cpu_register_physical_memory(e_phys + offset,
 TARGET_PAGE_SIZE, r_dev->mmio_index);
+#ifdef KVM_CAP_MSIX_MMIO
+if (cap_mask) {
+r_dev->guest_msix_table_addr = e_phys + offset;
+memset(&msix_mmio, 0, sizeof msix_mmio);
+msix_mmio.dev_id = calc_assigned_dev_id(r_dev->h_segnr,
+r_dev->h_busnr, r_dev->h_devfn);
+msix_mmio.type = KVM_MSIX_MMIO_TYPE_ASSIGNED_DEV |
+   KVM_MSIX_MMIO_TYPE_BASE_TABLE;
+msix_mmio.base_addr = e_phys + offset;
+msix_mmio.base_va = (unsigned long)r_dev->msix_table_page;
+msix_mmio.max_entries_nr = r_dev->max_msix_entries_nr;
+msix_mmio.flags = 0;
+ret = kvm_register_msix_mmio(kvm_context, &msix_mmio);
+if (ret)
+fprintf(stderr, "fail to register in-kernel msix_mmio!\n");
+}
+#endif
 }
 }
 
@@ -852,11 +878,6 @@ static void free_assigned_device(AssignedDevice *dev)
 }
 }
 
-static uint32_t calc_assigned_dev_id(uint16_t seg, uint8_t bus, uint8_t devfn)
-{
-return (uint32_t)seg << 16 | (uint32_t)bus << 8 | (uint32_t)devfn;
-}
-
 static void assign_failed_examine(AssignedDevice *dev)
 {
 char name[PATH_MAX], dir[PATH_MAX], driver[PATH_MAX] = {}, *ns;
@@ -1263,6 +1284,8 @@ static int assigned_dev_update_msix_mmio(PCIDevice 
*pci_dev,
 return r;
 }
 
+static int assigned_dev_update_routing_handler(void *opaque, unsigned long 
addr);
+
 static void assigned_dev_update_msix(PCIDevice *pci_dev, unsigned int ctrl_pos)
 {
 struct kvm_assigned_irq assigned_irq_data;
@@ -1486,7 +1509,9 @@ static int assigned_device_pci_cap_init(PCIDevice 
*pci_dev)
 msix_table_entry = pci_get_long(pci_dev->config + pos + PCI_MSIX_TABLE);
 bar_nr = msix_table_entry & PCI_MSIX_BIR;
 msix_table_entry &= ~PCI_MSIX_BIR;
-dev->msix_table_addr = pci_region[bar_nr].base_addr + msix_table_entry;
+dev->msix_table_addr = pci_region[bar_nr].base_addr +
+   msix_table_entry;
+
 dev->max_msix_entries_nr = get_msix_entries_max_nr(dev);
 }
 #endif
@@ -1670,8 +1695,7 @@ static uint32_t msix_mmio_readw(void *opaque, 
target_phys_addr_t addr)
 (8 * (addr & 3))) & 0xffff;
 }
 
-static void msix_mmio_writel(void *opaque,
- target_phys_addr_t addr, uint32_t val)
+static void assigned_dev_update_routing(void *opaque, unsigned long addr)
 {
 AssignedDevice *adev = opaque;
 unsigned int offset = addr & 0xfff;
@@ -1683,10 +1707,6 @@ static void msix_mmio_writel(void *opaque,
 struct PCIDevice *pci_dev = &adev->dev;
 uint8_t cap = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX);
 
-DEBUG("write to MSI-X entry table mmio offset 0x%lx, val 0x%x\n",
-   addr, val);
-memcpy((void *)((char *)page + offset), &val, 4);
-
 index = offset / 16;
 
 /* Check if mask bit is being accessed */
@@ -1762,6 +1782,41 @@ static void msix_mmio_writel(void *opaque,
 adev->entry[entry_idx].u.msi.data = msg_data;
 }
 
+static int assigned_dev_update_routing_handler(void *opaque, unsigned long 
addr)
+{
+AssignedDevice *adev = opaque;
+
+if (addr >= adev->guest_msix_table_addr

[PATCH 2/4] qemu-kvm: Ioctl for MSIX MMIO support

2011-01-29 Thread Sheng Yang

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 qemu-kvm.c |   14 ++
 qemu-kvm.h |7 +++
 2 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/qemu-kvm.c b/qemu-kvm.c
index 471306b..956b62a 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -1050,6 +1050,20 @@ int kvm_assign_set_msix_entry(kvm_context_t kvm,
 }
 #endif
 
+#ifdef KVM_CAP_MSIX_MMIO
+int kvm_register_msix_mmio(kvm_context_t kvm,
+   struct kvm_msix_mmio_user *mmio_user)
+{
+return kvm_vm_ioctl(kvm_state, KVM_REGISTER_MSIX_MMIO, mmio_user);
+}
+
+int kvm_unregister_msix_mmio(kvm_context_t kvm,
+ struct kvm_msix_mmio_user *mmio_user)
+{
+return kvm_vm_ioctl(kvm_state, KVM_UNREGISTER_MSIX_MMIO, mmio_user);
+}
+#endif
+
 #if defined(KVM_CAP_IRQFD) && defined(CONFIG_EVENTFD)
 
 #include <sys/eventfd.h>
diff --git a/qemu-kvm.h b/qemu-kvm.h
index 7e6edfb..86799e6 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -602,6 +602,13 @@ int kvm_assign_set_msix_entry(kvm_context_t kvm,
   struct kvm_assigned_msix_entry *entry);
 #endif
 
+#ifdef KVM_CAP_MSIX_MMIO
+int kvm_register_msix_mmio(kvm_context_t kvm,
+   struct kvm_msix_mmio_user *mmio_user);
+int kvm_unregister_msix_mmio(kvm_context_t kvm,
+ struct kvm_msix_mmio_user *mmio_user);
+#endif
+
 #else   /* !CONFIG_KVM */
 
 typedef struct kvm_context *kvm_context_t;
-- 
1.7.0.1



[PATCH 3/4] qemu-kvm: Header file update for MSI-X MMIO support

2011-01-29 Thread Sheng Yang

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 kvm/include/linux/kvm.h |   21 +
 1 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/kvm/include/linux/kvm.h b/kvm/include/linux/kvm.h
index e46729e..7b6d5b9 100644
--- a/kvm/include/linux/kvm.h
+++ b/kvm/include/linux/kvm.h
@@ -161,6 +161,7 @@ struct kvm_pit_config {
 #define KVM_EXIT_NMI  16
 #define KVM_EXIT_INTERNAL_ERROR   17
 #define KVM_EXIT_OSI  18
+#define KVM_EXIT_MSIX_ROUTING_UPDATE 19
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 #define KVM_INTERNAL_ERROR_EMULATION 1
@@ -530,6 +531,7 @@ struct kvm_enable_cap {
 #ifdef __KVM_HAVE_XCRS
 #define KVM_CAP_XCRS 56
 #endif
+#define KVM_CAP_MSIX_MMIO 60
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -660,6 +662,9 @@ struct kvm_clock_data {
 #define KVM_XEN_HVM_CONFIG _IOW(KVMIO,  0x7a, struct kvm_xen_hvm_config)
 #define KVM_SET_CLOCK _IOW(KVMIO,  0x7b, struct kvm_clock_data)
 #define KVM_GET_CLOCK _IOR(KVMIO,  0x7c, struct kvm_clock_data)
+/* Available with KVM_CAP_MSIX_MMIO */
+#define KVM_REGISTER_MSIX_MMIO _IOW(KVMIO, 0x7d, struct kvm_msix_mmio_user)
+#define KVM_UNREGISTER_MSIX_MMIO  _IOW(KVMIO, 0x7e, struct kvm_msix_mmio_user)
 /* Available with KVM_CAP_PIT_STATE2 */
 #define KVM_GET_PIT2  _IOR(KVMIO,  0x9f, struct kvm_pit_state2)
 #define KVM_SET_PIT2  _IOW(KVMIO,  0xa0, struct kvm_pit_state2)
@@ -781,4 +786,20 @@ struct kvm_assigned_msix_entry {
__u16 padding[3];
 };
 
+#define KVM_MSIX_MMIO_TYPE_ASSIGNED_DEV (1 << 0)
+
+#define KVM_MSIX_MMIO_TYPE_BASE_TABLE  (1 << 8)
+
+#define KVM_MSIX_MMIO_TYPE_DEV_MASK 0x00ff
+#define KVM_MSIX_MMIO_TYPE_BASE_MASK   0xff00
+struct kvm_msix_mmio_user {
+   __u32 dev_id;
+   __u16 type;
+   __u16 max_entries_nr;
+   __u64 base_addr;
+   __u64 base_va;
+   __u64 flags;
+   __u64 reserved[4];
+};
+
 #endif /* __LINUX_KVM_H */
-- 
1.7.0.1



[PATCH 1/4] qemu-kvm: device assignment: Enabling MSI-X according to the entries' mask bit

2011-01-29 Thread Sheng Yang
The old MSI-X enabling method assumes the entries are written before MSI-X is
enabled, but some OSes do not obey this, e.g. FreeBSD. This patch fixes
this.

Also, according to the PCI spec, mask bit of MSI-X table should be set
after reset.
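
For illustration only, the following sketch shows the two ideas in this description on a plain byte buffer standing in for the MSI-X table page: setting the mask bit of every entry at reset, and counting only unmasked entries as usable. The function names are hypothetical and are not those used in the patch; the 16-byte entry size, the Vector Control offset of 12 and mask bit 0 match the constants the patch relies on, and the buffer is accessed in host byte order as the patch does.

```c
#include <stdint.h>
#include <string.h>

#define MSIX_ENTRY_SIZE        16  /* bytes per MSI-X table entry */
#define MSIX_ENTRY_VECTOR_CTRL 12  /* offset of the Vector Control dword */
#define MSIX_CTRL_MASKBIT      1u  /* bit 0 of Vector Control */

/* Clear the table and set the mask bit of every entry, as after reset. */
static void msix_table_reset(uint8_t *table, unsigned int nr_entries)
{
    const uint32_t ctrl = MSIX_CTRL_MASKBIT;
    unsigned int i;

    memset(table, 0, (size_t)nr_entries * MSIX_ENTRY_SIZE);
    for (i = 0; i < nr_entries; i++)
        memcpy(table + i * MSIX_ENTRY_SIZE + MSIX_ENTRY_VECTOR_CTRL,
               &ctrl, sizeof(ctrl));
}

/* Return 1 if the given entry has its mask bit set, 0 otherwise. */
static int msix_entry_masked(const uint8_t *table, unsigned int entry)
{
    uint32_t ctrl;

    memcpy(&ctrl, table + entry * MSIX_ENTRY_SIZE + MSIX_ENTRY_VECTOR_CTRL,
           sizeof(ctrl));
    return (ctrl & MSIX_CTRL_MASKBIT) != 0;
}

/* Count the entries the guest has unmasked, i.e. the ones worth routing. */
static unsigned int msix_count_unmasked(const uint8_t *table,
                                        unsigned int nr_entries)
{
    unsigned int i, n = 0;

    for (i = 0; i < nr_entries; i++)
        if (!msix_entry_masked(table, i))
            n++;
    return n;
}
```

Because the count depends only on the mask bits, it stays correct whether the guest fills in the entries before or after it enables MSI-X.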

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 hw/device-assignment.c |  188 +---
 hw/device-assignment.h |2 +-
 2 files changed, 162 insertions(+), 28 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 8446cd4..f81050f 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1141,15 +1141,12 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev, 
unsigned int ctrl_pos)
 #endif
 
 #ifdef KVM_CAP_DEVICE_MSIX
-static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
+
+#define PCI_MSIX_CTRL_MASKBIT  1ul
+static int get_msix_entries_max_nr(AssignedDevice *adev)
 {
-AssignedDevice *adev = container_of(pci_dev, AssignedDevice, dev);
-uint16_t entries_nr = 0, entries_max_nr;
-int pos = 0, i, r = 0;
-uint32_t msg_addr, msg_upper_addr, msg_data, msg_ctrl;
-struct kvm_assigned_msix_nr msix_nr;
-struct kvm_assigned_msix_entry msix_entry;
-void *va = adev->msix_table_page;
+int pos, entries_max_nr;
+PCIDevice *pci_dev = &adev->dev;
 
 pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX);
 
@@ -1157,20 +1154,48 @@ static int assigned_dev_update_msix_mmio(PCIDevice 
*pci_dev)
 entries_max_nr &= PCI_MSIX_TABSIZE;
 entries_max_nr += 1;
 
+return entries_max_nr;
+}
+
+static int assigned_dev_msix_entry_masked(AssignedDevice *adev, int entry)
+{
+uint32_t msg_ctrl;
+void *va = adev->msix_table_page;
+
+memcpy(&msg_ctrl, va + entry * 16 + 12, 4);
+return (msg_ctrl & PCI_MSIX_CTRL_MASKBIT);
+}
+
+static int get_msix_valid_entries_nr(AssignedDevice *adev,
+uint16_t entries_max_nr)
+{
+void *va = adev->msix_table_page;
+uint32_t msg_ctrl;
+uint16_t entries_nr = 0;
+int i;
+
 /* Get the usable entry number for allocating */
 for (i = 0; i < entries_max_nr; i++) {
 memcpy(&msg_ctrl, va + i * 16 + 12, 4);
-memcpy(&msg_data, va + i * 16 + 8, 4);
 /* Ignore unused entry even it's unmasked */
-if (msg_data == 0)
+if (assigned_dev_msix_entry_masked(adev, i))
 continue;
 entries_nr ++;
 }
+return entries_nr;
+}
+
+static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev,
+ uint16_t entries_nr,
+ uint16_t entries_max_nr)
+{
+AssignedDevice *adev = container_of(pci_dev, AssignedDevice, dev);
+int i, r = 0;
+uint32_t msg_addr, msg_upper_addr, msg_data, msg_ctrl;
+struct kvm_assigned_msix_nr msix_nr;
+struct kvm_assigned_msix_entry msix_entry;
+void *va = adev->msix_table_page;
 
-if (entries_nr == 0) {
-fprintf(stderr, "MSI-X entry number is zero!\n");
-return -EINVAL;
-}
 msix_nr.assigned_dev_id = calc_assigned_dev_id(adev->h_segnr, adev->h_busnr,
   (uint8_t)adev->h_devfn);
 msix_nr.entry_nr = entries_nr;
@@ -1182,6 +1207,8 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
 }
 
 free_dev_irq_entries(adev);
+memset(pci_dev->msix_entry_used, 0, KVM_MAX_MSIX_PER_DEV *
+sizeof(*pci_dev->msix_entry_used));
 adev->irq_entries_nr = entries_nr;
 adev->entry = calloc(entries_nr, sizeof(struct kvm_irq_routing_entry));
 if (!adev->entry) {
@@ -1195,10 +1222,10 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
 if (entries_nr >= msix_nr.entry_nr)
 break;
 memcpy(&msg_ctrl, va + i * 16 + 12, 4);
-memcpy(&msg_data, va + i * 16 + 8, 4);
-if (msg_data == 0)
+if (assigned_dev_msix_entry_masked(adev, i))
 continue;
 
+memcpy(&msg_data, va + i * 16 + 8, 4);
 memcpy(&msg_addr, va + i * 16, 4);
 memcpy(&msg_upper_addr, va + i * 16 + 4, 4);
 
@@ -1212,17 +1239,18 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
 adev->entry[entries_nr].u.msi.address_lo = msg_addr;
 adev->entry[entries_nr].u.msi.address_hi = msg_upper_addr;
 adev->entry[entries_nr].u.msi.data = msg_data;
-DEBUG("MSI-X data 0x%x, MSI-X addr_lo 0x%x\n!", msg_data, msg_addr);
-   kvm_add_routing_entry(&adev->entry[entries_nr]);
+DEBUG("MSI-X data 0x%x, MSI-X addr_lo 0x%x!\n", msg_data, msg_addr);
+kvm_add_routing_entry(&adev->entry[entries_nr]);
 
 msix_entry.gsi = adev->entry[entries_nr].gsi;
 msix_entry.entry = i;
+pci_dev->msix_entry_used[i] = 1;
 r = kvm_assign_set_msix_entry(kvm_context, &msix_entry);
 if (r) {
 fprintf(stderr, "fail to set MSI-X entry! %s\n", strerror(-r));
 break;
 }
-DEBUG(MSI-X
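
For illustration, the mask-bit test that this patch switches to can be
sketched in standalone C. This is a minimal sketch and not the qemu-kvm code:
the names msix_entry_masked(), msix_count_unmasked() and msix_table_reset()
and the MSIX_* constants are made up for the example, the table is a plain
byte buffer standing in for msix_table_page, and it assumes a little-endian
host so that the in-memory dword matches the device layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One MSI-X table entry is 16 bytes: addr_lo, addr_hi, data, vector control. */
#define MSIX_ENTRY_SIZE        16
#define MSIX_ENTRY_VECTOR_CTRL 12   /* byte offset of the Vector Control dword */
#define MSIX_CTRL_MASKBIT      1u   /* bit 0 of Vector Control: entry is masked */

/* Return nonzero if entry `entry` of the shadow table at `table` is masked. */
static int msix_entry_masked(const uint8_t *table, size_t entry)
{
    uint32_t ctrl;

    assert(table != NULL);
    /* memcpy keeps the load well defined for a plain byte buffer. */
    memcpy(&ctrl, table + entry * MSIX_ENTRY_SIZE + MSIX_ENTRY_VECTOR_CTRL,
           sizeof(ctrl));
    return (ctrl & MSIX_CTRL_MASKBIT) != 0;
}

/* Count the entries the guest has unmasked, i.e. the ones that need routing. */
static size_t msix_count_unmasked(const uint8_t *table, size_t max_entries)
{
    size_t i, n = 0;

    for (i = 0; i < max_entries; i++)
        if (!msix_entry_masked(table, i))
            n++;
    return n;
}

/* Put a table into its post-reset state: every entry masked, the rest zero. */
static void msix_table_reset(uint8_t *table, size_t max_entries)
{
    const uint32_t masked = MSIX_CTRL_MASKBIT;
    size_t i;

    memset(table, 0, max_entries * MSIX_ENTRY_SIZE);
    for (i = 0; i < max_entries; i++)
        memcpy(table + i * MSIX_ENTRY_SIZE + MSIX_ENTRY_VECTOR_CTRL,
               &masked, sizeof(masked));
}
```

Counting by mask bit rather than by "data == 0" means an entry that the guest
programs only after MSI-X has been enabled is picked up as soon as it is
unmasked, which is the FreeBSD case described in the commit message.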

Re: [PATCH 3/3] KVM: Add documents for MSI-X MMIO API

2011-01-26 Thread Sheng Yang
On Tuesday 25 January 2011 20:47:38 Avi Kivity wrote:
 On 01/19/2011 10:21 AM, Sheng Yang wrote:
  We already got an guest MMIO address for that in the exit
  information. I've created a chain of handler in qemu to handle it.

But we already decoded the table and entry...
  
  But the handler is still wrapped by vcpu_mmio_write(), as a part of MMIO.
  So it's not quite handy to get the table and entry out.
 
 The kernel handler can create a new kvm_run exit description.
 
Also the updater in the userspace
  
  can share the most logic with ordinary userspace MMIO handler, which take
  address as parameter. So I think we don't need to pass the decoded
  table_id and entry to userspace.
 
 It's mixing layers, which always leads to trouble.  For one, the user
 handler shouldn't do anything with the write since the kernel already
 wrote it into the table.  For another, if two vcpus write to the same
 entry simultaneously, you could see different ordering in the kernel and
 userspace, and get inconsistent results.

The shared logic is not about writing, but about interpreting what has been
written. The old MMIO handler would write the data and then interpret it; our
new MMIO handler would only share the interpretation logic. I think that's
fair enough?

--
regards
Yang, Sheng


Re: [PATCH 3/3] KVM: Add documents for MSI-X MMIO API

2011-01-19 Thread Sheng Yang
On Monday 17 January 2011 20:45:55 Avi Kivity wrote:
 On 01/17/2011 02:35 PM, Sheng Yang wrote:
  On Monday 17 January 2011 20:21:45 Avi Kivity wrote:
On 01/06/2011 12:19 PM, Sheng Yang wrote:
  Signed-off-by: Sheng Yangsh...@linux.intel.com
  ---
  
    Documentation/kvm/api.txt |   41 +
    1 files changed, 41 insertions(+), 0 deletions(-)
  
  diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
  index e1a9297..4978b94 100644
  --- a/Documentation/kvm/api.txt
  +++ b/Documentation/kvm/api.txt
  @@ -1263,6 +1263,47 @@ struct kvm_assigned_msix_entry {
  
__u16 padding[3];

};
  
  +4.54 KVM_REGISTER_MSIX_MMIO
  +
  +Capability: KVM_CAP_MSIX_MMIO
  +Architectures: x86
  +Type: vm ioctl
  +Parameters: struct kvm_msix_mmio_user (in)
  +Returns: 0 on success, -1 on error
  +
  +This API indicates an MSI-X MMIO address of a guest device. Then all MMIO
  +operation would be handled by kernel. When necessary(e.g. MSI data/address
  +changed), KVM would exit to userspace using KVM_EXIT_MSIX_ROUTING_UPDATE to
  +indicate the MMIO modification and require userspace to update IRQ routing
  +table.
  +
  +struct kvm_msix_mmio_user {
  + __u32 dev_id;
  + __u16 type;             /* Device type and MMIO address type */
  + __u16 max_entries_nr;   /* Maximum entries supported */
  + __u64 base_addr;        /* Guest physical address of MMIO */
  + __u64 base_va;          /* Host virtual address of MMIO mapping */
  + __u64 flags;            /* Reserved for now */
  + __u64 reserved[4];
  +};
  +
  +Current device type can be:
  +#define KVM_MSIX_MMIO_TYPE_ASSIGNED_DEV  (1 << 0)
  +
  +Current MMIO type can be:
  +#define KVM_MSIX_MMIO_TYPE_BASE_TABLE    (1 << 8)
  +

How does userspace know which entry of which table changed?  Need a
field in struct kvm_run for that.
  
  We already got an guest MMIO address for that in the exit information.
  I've created a chain of handler in qemu to handle it.
 
 But we already decoded the table and entry...

But the handler is still wrapped by vcpu_mmio_write(), as a part of MMIO, so
it's not very convenient to get the table and entry out. Also, the updater in
userspace can share most of its logic with the ordinary userspace MMIO
handler, which takes an address as its parameter. So I think we don't need to
pass the decoded table_id and entry to userspace.

--
regards
Yang, Sheng


Re: [PATCH 2/3] KVM: Emulate MSI-X table in kernel

2011-01-19 Thread Sheng Yang
On Monday 17 January 2011 20:39:30 Marcelo Tosatti wrote:
 On Mon, Jan 17, 2011 at 08:18:22PM +0800, Sheng Yang wrote:
+   goto out;
+
+   mmio = &mmio_dev->mmio[idx];
+   entry = (addr - mmio->table_base_addr) / PCI_MSIX_ENTRY_SIZE;
+   entry_base = mmio->table_base_va + entry * PCI_MSIX_ENTRY_SIZE;
+   ctrl_pos = (u32 *)(entry_base + PCI_MSIX_ENTRY_VECTOR_CTRL);
+
+   if (get_user(old_ctrl, ctrl_pos))
+   goto out;
+
+   /* No allow writing to other fields when entry is unmasked */
+   if (!(old_ctrl & PCI_MSIX_ENTRY_CTRL_MASKBIT) &&
+   offset != PCI_MSIX_ENTRY_VECTOR_CTRL)
+   goto out;
+
+   if (copy_to_user((void __user *)(entry_base + offset), val, len))
+   goto out;
   
   Instead of copying to/from userspace (which is subject to swapin,
   unexpected values), you could include the guest written value in a
   kvm_run structure, along with address. Qemu-kvm would use that to
   synchronize its copy of the table, on KVM_EXIT_MSIX_ROUTING_UPDATE
   exit.
  
  We want to acelerate MSI-X mask bit accessing, which won't exit to
  userspace in the most condition. That's the cost we want to optimize.
  Also it's possible to userspace to read the correct value of MMIO(but
  mostly userspace can't write to it in order to prevent synchronize
  issue).
 
 Yes, i meant exit to userspace only when necessary, but instead of
 copying directly everytime record the value the guest has written in
 kvm_run and exit with KVM_EXIT_MSIX_ROUTING_UPDATE.
 
+   if (get_user(new_ctrl, ctrl_pos))
+   goto out;
+
+   if ((offset < PCI_MSIX_ENTRY_VECTOR_CTRL && len == 4) ||
+   (offset < PCI_MSIX_ENTRY_DATA && len == 8))
+   ret = -ENOTSYNC;
+   if (old_ctrl == new_ctrl)
+   goto out;
+   if (!(old_ctrl & PCI_MSIX_ENTRY_CTRL_MASKBIT) &&
+   (new_ctrl & PCI_MSIX_ENTRY_CTRL_MASKBIT))
+   r = update_msix_mask_bit(mmio_dev->kvm, mmio, entry, 1);
+   else if ((old_ctrl & PCI_MSIX_ENTRY_CTRL_MASKBIT) &&
+   !(new_ctrl & PCI_MSIX_ENTRY_CTRL_MASKBIT))
+   r = update_msix_mask_bit(mmio_dev->kvm, mmio, entry, 0);
   
   Then you rely on the kernel copy of the values to enable/disable irq.
  
  Yes, they are guest owned assigned IRQs. Any potential issue?
 
 Nothing prevents the kernel from enabling or disabling irq multiple
 times with this code (what prevents it is a qemu-kvm that behaves as
 expected). This is not very good.
 
 Perhaps the guest can only harm itself with that, but i'm not sure.

MSI-X interrupts are not shared, so I think the guest can only harm itself if
it does this wrong.
 
 And also if an msix table page is swapped out guest will hang.
 
+   return r;
+}
   
   This is not consistent with registration, where there are no checks
   regarding validity of assigned device id. So why is it necessary?
  
  I am not quite understand. We need to free mmio anyway, otherwise it
  would result in wrong mmio interception...
 
 If you return -EOPNOTSUPP in case kvm_find_assigned_dev fails in the
 read/write paths, there is no need to free anything.

It may work with assigned devices, but the potential users of this include
vfio drivers and emulated devices. And I don't think it's a good idea to have
a registration process but no free process...

--
regards
Yang, Sheng

 
   There is a lock ordering problem BTW:
   
   - read/write handlers: dev->lock -> kvm->lock
   - vm_ioctl_deassign_device -> kvm_free_msix_mmio: kvm->lock ->
   dev->lock
  
  Good catch! Would fix it(and other comments of course).
  
  --
  regards
  Yang, Sheng
  
+
+int kvm_vm_ioctl_unregister_msix_mmio(struct kvm *kvm,
+ struct kvm_msix_mmio_user *mmio_user)
+{
+   struct kvm_msix_mmio mmio;
+
+   mmio.dev_id = mmio_user->dev_id;
+   mmio.type = mmio_user->type;
+
+   return kvm_free_msix_mmio(kvm, &mmio);
+}
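
The mask-bit handling quoted in this message can be summarised as a small
decision function. This is a standalone sketch under stated assumptions, not
the kernel code: msix_classify_write() and the enum are hypothetical names,
and the function only classifies a write from the old and new Vector Control
values instead of touching any IRQ.

```c
#include <assert.h>
#include <stdint.h>

#define MSIX_ENTRY_VECTOR_CTRL 12   /* offset of Vector Control in an entry */
#define MSIX_CTRL_MASKBIT      1u   /* bit 0: entry is masked */

enum msix_write_action {
    MSIX_WRITE_REJECT,    /* address/data write while the entry is unmasked */
    MSIX_WRITE_NO_CHANGE, /* accepted, mask state unchanged */
    MSIX_WRITE_MASK,      /* unmasked -> masked: host IRQ should be disabled */
    MSIX_WRITE_UNMASK     /* masked -> unmasked: host IRQ should be enabled */
};

/*
 * Classify a guest write to one MSI-X table entry.
 * old_ctrl: Vector Control before the write
 * new_ctrl: Vector Control after the write
 * offset:   byte offset of the write within the 16-byte entry
 */
static enum msix_write_action
msix_classify_write(uint32_t old_ctrl, uint32_t new_ctrl, unsigned int offset)
{
    int was_masked = (old_ctrl & MSIX_CTRL_MASKBIT) != 0;
    int now_masked = (new_ctrl & MSIX_CTRL_MASKBIT) != 0;

    assert(offset < 16);
    /* Only Vector Control may be written while the entry is live. */
    if (!was_masked && offset != MSIX_ENTRY_VECTOR_CTRL)
        return MSIX_WRITE_REJECT;
    /* Act only on a real transition, never on a repeated state. */
    if (was_masked == now_masked)
        return MSIX_WRITE_NO_CHANGE;
    return now_masked ? MSIX_WRITE_MASK : MSIX_WRITE_UNMASK;
}
```

Acting only on a transition is what keeps a well-behaved guest from causing
repeated enable or disable calls; as noted in the thread, it does not by
itself protect against a table copy that changes behind the kernel's back.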


Re: [PATCH 2/3] KVM: Emulate MSI-X table in kernel

2011-01-17 Thread Sheng Yang
On Monday 17 January 2011 19:54:47 Marcelo Tosatti wrote:
 On Thu, Jan 06, 2011 at 06:19:44PM +0800, Sheng Yang wrote:
  Then we can support mask bit operation of assigned devices now.
  
  Signed-off-by: Sheng Yang sh...@linux.intel.com
  ---
  
   arch/x86/kvm/Makefile|2 +-
   arch/x86/kvm/x86.c   |8 +-
   include/linux/kvm.h  |   21 
   include/linux/kvm_host.h |   25 
   virt/kvm/assigned-dev.c  |   44 +++
   virt/kvm/kvm_main.c  |   38 ++-
    virt/kvm/msix_mmio.c |  284 ++
    virt/kvm/msix_mmio.h |   25 
   8 files changed, 440 insertions(+), 7 deletions(-)
   create mode 100644 virt/kvm/msix_mmio.c
   create mode 100644 virt/kvm/msix_mmio.h
  
  diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
  index ae72ae6..7444dcd 100644
  --- a/virt/kvm/assigned-dev.c
  +++ b/virt/kvm/assigned-dev.c
  @@ -18,6 +18,7 @@
  
    #include <linux/interrupt.h>
    #include <linux/slab.h>
    #include "irq.h"
   
   +#include "msix_mmio.h"
  
   static struct kvm_assigned_dev_kernel *kvm_find_assigned_dev(struct
   list_head *head,
   
int assigned_dev_id)
  
  @@ -191,12 +192,25 @@ static void kvm_free_assigned_irq(struct kvm *kvm,
  
  kvm_deassign_irq(kvm, assigned_dev, assigned_dev->irq_requested_type);
   
   }
  
  +static void assigned_device_free_msix_mmio(struct kvm *kvm,
  +   struct kvm_assigned_dev_kernel *adev)
  +{
  +   struct kvm_msix_mmio mmio;
  +
  +   mmio.dev_id = adev->assigned_dev_id;
  +   mmio.type = KVM_MSIX_MMIO_TYPE_ASSIGNED_DEV |
  +   KVM_MSIX_MMIO_TYPE_BASE_TABLE;
  +   kvm_free_msix_mmio(kvm, &mmio);
  +}
  +
  
   static void kvm_free_assigned_device(struct kvm *kvm,
   
   struct kvm_assigned_dev_kernel
   *assigned_dev)
   
   {
   
  kvm_free_assigned_irq(kvm, assigned_dev);
  
  +   assigned_device_free_msix_mmio(kvm, assigned_dev);
  +
  
  __pci_reset_function(assigned_dev->dev);
  pci_restore_state(assigned_dev->dev);
  
  @@ -785,3 +799,33 @@ out:
  return r;
   
   }
  
  +int kvm_assigned_device_update_msix_mask_bit(struct kvm *kvm,
  +   int assigned_dev_id, int entry, bool mask)
  +{
  +   int r = -EFAULT;
  +   struct kvm_assigned_dev_kernel *adev;
  +   int i;
  +
  +   if (!irqchip_in_kernel(kvm))
  +   return r;
  +
  +   mutex_lock(&kvm->lock);
  +   adev = kvm_find_assigned_dev(&kvm->arch.assigned_dev_head,
  + assigned_dev_id);
  +   if (!adev)
  +   goto out;
  +
  +   for (i = 0; i < adev->entries_nr; i++)
  +   if (adev->host_msix_entries[i].entry == entry) {
  +   if (mask)
  +   disable_irq_nosync(
  +   adev->host_msix_entries[i].vector);
  +   else
  +   enable_irq(adev->host_msix_entries[i].vector);
  +   r = 0;
  +   break;
  +   }
 
 Should check if the host irq is registered as MSIX type.
 
  +out:
  +   mutex_unlock(&kvm->lock);
  +   return r;
  +}
  diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
  index b1b6cbb..b7807c8 100644
  --- a/virt/kvm/kvm_main.c
  +++ b/virt/kvm/kvm_main.c
  @@ -56,6 +56,7 @@
  
   #include "coalesced_mmio.h"
   #include "async_pf.h"
  
  +#include "msix_mmio.h"
  
   #define CREATE_TRACE_POINTS
   #include <trace/events/kvm.h>
  
  @@ -509,6 +510,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
  
  struct mm_struct *mm = kvm->mm;
  
  kvm_arch_sync_events(kvm);
  
  +   kvm_unregister_msix_mmio_dev(kvm);
  
  spin_lock(&kvm_lock);
  list_del(&kvm->vm_list);
  spin_unlock(&kvm_lock);
  
  @@ -1877,6 +1879,24 @@ static long kvm_vm_ioctl(struct file *filp,
  
  mutex_unlock(&kvm->lock);
  break;
   
   #endif
  
  +   case KVM_REGISTER_MSIX_MMIO: {
  +   struct kvm_msix_mmio_user mmio_user;
  +
  +   r = -EFAULT;
  +   if (copy_from_user(&mmio_user, argp, sizeof mmio_user))
  +   goto out;
  +   r = kvm_vm_ioctl_register_msix_mmio(kvm, &mmio_user);
  +   break;
  +   }
  +   case KVM_UNREGISTER_MSIX_MMIO: {
  +   struct kvm_msix_mmio_user mmio_user;
  +
  +   r = -EFAULT;
  +   if (copy_from_user(&mmio_user, argp, sizeof mmio_user))
  +   goto out;
  +   r = kvm_vm_ioctl_unregister_msix_mmio(kvm, &mmio_user);
  +   break;
  +   }
  
  default:
  r = kvm_arch_vm_ioctl(filp, ioctl, arg);
  if (r == -ENOTTY)
  
  @@ -1988,6 +2008,12 @@ static int kvm_dev_ioctl_create_vm(void)
  
  return r;
  
  }
   
   #endif
  
  +   r = kvm_register_msix_mmio_dev(kvm);
  +   if (r < 0) {
  +   kvm_put_kvm(kvm);
  +   return r;
  +   }
  +
  
  r

Re: [PATCH 3/3] KVM: Add documents for MSI-X MMIO API

2011-01-17 Thread Sheng Yang
On Monday 17 January 2011 20:21:45 Avi Kivity wrote:
 On 01/06/2011 12:19 PM, Sheng Yang wrote:
  Signed-off-by: Sheng Yangsh...@linux.intel.com
  ---
  
    Documentation/kvm/api.txt |   41 +
    1 files changed, 41 insertions(+), 0 deletions(-)
  
  diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
  index e1a9297..4978b94 100644
  --- a/Documentation/kvm/api.txt
  +++ b/Documentation/kvm/api.txt
  @@ -1263,6 +1263,47 @@ struct kvm_assigned_msix_entry {
  
  __u16 padding[3];

};
  
  +4.54 KVM_REGISTER_MSIX_MMIO
  +
  +Capability: KVM_CAP_MSIX_MMIO
  +Architectures: x86
  +Type: vm ioctl
  +Parameters: struct kvm_msix_mmio_user (in)
  +Returns: 0 on success, -1 on error
  +
  +This API indicates an MSI-X MMIO address of a guest device. Then all MMIO
  +operation would be handled by kernel. When necessary(e.g. MSI data/address
  +changed), KVM would exit to userspace using KVM_EXIT_MSIX_ROUTING_UPDATE to
  +indicate the MMIO modification and require userspace to update IRQ routing
  +table.
  +
  +struct kvm_msix_mmio_user {
  +   __u32 dev_id;
  +   __u16 type; /* Device type and MMIO address type */
  +   __u16 max_entries_nr;   /* Maximum entries supported */
  +   __u64 base_addr;/* Guest physical address of MMIO */
  +   __u64 base_va;  /* Host virtual address of MMIO mapping */
  +   __u64 flags;/* Reserved for now */
  +   __u64 reserved[4];
  +};
  +
  +Current device type can be:
  +#define KVM_MSIX_MMIO_TYPE_ASSIGNED_DEV    (1 << 0)
  +
  +Current MMIO type can be:
  +#define KVM_MSIX_MMIO_TYPE_BASE_TABLE      (1 << 8)
  +
 
 How does userspace know which entry of which table changed?  Need a
 field in struct kvm_run for that.

We already get a guest MMIO address for that in the exit information. I've
created a chain of handlers in qemu to handle it.
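
As a rough illustration of why the address alone is enough, the table entry
and the field within it can be recovered from the faulting guest physical
address and the registered table base. The sketch below is standalone C, not
qemu code; msix_decode() and struct msix_access are hypothetical names
introduced for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MSIX_ENTRY_SIZE 16   /* bytes per MSI-X table entry */

struct msix_access {
    uint32_t entry;   /* index of the table entry that was hit */
    uint32_t offset;  /* byte offset inside that entry: 0, 4, 8 or 12 */
};

/*
 * Decode an MMIO access at guest physical address `addr` against a table
 * that starts at `table_base` and holds `max_entries` entries.
 * Returns 0 and fills `out` on success, -1 if the access is outside the table.
 */
static int msix_decode(uint64_t table_base, uint32_t max_entries,
                       uint64_t addr, uint32_t len, struct msix_access *out)
{
    uint64_t size = (uint64_t)max_entries * MSIX_ENTRY_SIZE;
    uint64_t rel;

    assert(out != NULL);
    if (addr < table_base)
        return -1;
    rel = addr - table_base;
    /* Written this way so that rel + len cannot overflow. */
    if (len == 0 || rel >= size || len > size - rel)
        return -1;
    out->entry = (uint32_t)(rel / MSIX_ENTRY_SIZE);
    out->offset = (uint32_t)(rel % MSIX_ENTRY_SIZE);
    return 0;
}
```

For example, with a table at 0xfebf0000, a 4-byte write to 0xfebf002c decodes
to entry 2, offset 12, i.e. the Vector Control dword of the third entry.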

--
regards
Yang, Sheng


Re: [PATCH 0/3 v7] MSI-X MMIO support for KVM

2011-01-11 Thread Sheng Yang
On Thursday 06 January 2011 18:19:42 Sheng Yang wrote:
 Change from v6:
 1. Discard PBA support. But we can still add it later.
 2. Fix one memory reference bug
 3. Add automatically MMIO unregister after device was deassigned.
 4. Update according to Avi's comments.
 5. Add documents for new API.

Avi?

--
regards
Yang, Sheng

 
 Notice this patchset depends on two PCI patches named:
 
 PCI: MSI: Move MSI-X entry definition to pci_regs.h
 PCI: Add mask bit definition for MSI-X table
 
 These two patches are in the Jesse's pci-2.6 tree. Do I need to repost
 them?
 
 Sheng Yang (3):
   KVM: Move struct kvm_io_device to kvm_host.h
   KVM: Emulate MSI-X table in kernel
   KVM: Add documents for MSI-X MMIO API
 
  Documentation/kvm/api.txt |   41 +++
  arch/x86/kvm/Makefile |2 +-
  arch/x86/kvm/x86.c|8 +-
  include/linux/kvm.h   |   21 
  include/linux/kvm_host.h  |   48 
  virt/kvm/assigned-dev.c   |   44 +++
  virt/kvm/iodev.h  |   25 +
  virt/kvm/kvm_main.c   |   38 ++-
  virt/kvm/msix_mmio.c      |  284 +
  virt/kvm/msix_mmio.h      |   25 
  10 files changed, 505 insertions(+), 31 deletions(-)
  create mode 100644 virt/kvm/msix_mmio.c
  create mode 100644 virt/kvm/msix_mmio.h


[PATCH 1/3] KVM: Move struct kvm_io_device to kvm_host.h

2011-01-06 Thread Sheng Yang
Then it can be used by other structs in kvm_host.h.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 include/linux/kvm_host.h |   23 +++
 virt/kvm/iodev.h |   25 +
 2 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b5021db..7d313e0 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -98,6 +98,29 @@ int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
 int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
 #endif
 
+struct kvm_io_device;
+
+/**
+ * kvm_io_device_ops are called under kvm slots_lock.
+ * read and write handlers return 0 if the transaction has been handled,
+ * or non-zero to have it passed to the next device.
+ **/
+struct kvm_io_device_ops {
+   int (*read)(struct kvm_io_device *this,
+   gpa_t addr,
+   int len,
+   void *val);
+   int (*write)(struct kvm_io_device *this,
+gpa_t addr,
+int len,
+const void *val);
+   void (*destructor)(struct kvm_io_device *this);
+};
+
+struct kvm_io_device {
+   const struct kvm_io_device_ops *ops;
+};
+
 struct kvm_vcpu {
struct kvm *kvm;
 #ifdef CONFIG_PREEMPT_NOTIFIERS
diff --git a/virt/kvm/iodev.h b/virt/kvm/iodev.h
index 12fd3ca..d1f5651 100644
--- a/virt/kvm/iodev.h
+++ b/virt/kvm/iodev.h
@@ -17,32 +17,9 @@
 #define __KVM_IODEV_H__
 
 #include <linux/kvm_types.h>
+#include <linux/kvm_host.h>
 #include <asm/errno.h>
 
-struct kvm_io_device;
-
-/**
- * kvm_io_device_ops are called under kvm slots_lock.
- * read and write handlers return 0 if the transaction has been handled,
- * or non-zero to have it passed to the next device.
- **/
-struct kvm_io_device_ops {
-   int (*read)(struct kvm_io_device *this,
-   gpa_t addr,
-   int len,
-   void *val);
-   int (*write)(struct kvm_io_device *this,
-gpa_t addr,
-int len,
-const void *val);
-   void (*destructor)(struct kvm_io_device *this);
-};
-
-
-struct kvm_io_device {
-   const struct kvm_io_device_ops *ops;
-};
-
 static inline void kvm_iodevice_init(struct kvm_io_device *dev,
 const struct kvm_io_device_ops *ops)
 {
-- 
1.7.0.1
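
The return convention documented above for kvm_io_device_ops (0 means the
transaction was handled, non-zero passes it on to the next device) can be
illustrated with a small userspace mock. This is only a sketch: struct
io_device, range_write() and bus_write() are invented for the example and are
not the kernel's types or functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct io_device;

/* Mock of an ops table with the same "0 = handled" convention. */
struct io_device_ops {
    int (*write)(struct io_device *self, uint64_t addr, int len,
                 const void *val);
};

struct io_device {
    const struct io_device_ops *ops;
    uint64_t base;      /* first address this device claims */
    uint64_t size;      /* length of the claimed range */
    int writes_seen;    /* how many writes this device handled */
};

/* Claim the write only if it falls entirely inside this device's range. */
static int range_write(struct io_device *self, uint64_t addr, int len,
                       const void *val)
{
    (void)val;
    if (addr < self->base || addr + (uint64_t)len > self->base + self->size)
        return -1;   /* not ours: let the next device try */
    self->writes_seen++;
    return 0;        /* handled */
}

static const struct io_device_ops range_ops = { .write = range_write };

/*
 * Offer the write to each device in turn. Returns 0 as soon as one device
 * handles it, non-zero if none did (the caller would then fall back to
 * another path, such as an exit to userspace).
 */
static int bus_write(struct io_device **devs, size_t n, uint64_t addr,
                     int len, const void *val)
{
    size_t i;

    assert(devs != NULL);
    for (i = 0; i < n; i++)
        if (devs[i]->ops->write(devs[i], addr, len, val) == 0)
            return 0;
    return -1;
}
```

An in-kernel MSI-X table handler fits this model as one more device on the
chain that claims only the registered table range.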



[PATCH 0/3 v7] MSI-X MMIO support for KVM

2011-01-06 Thread Sheng Yang
Changes from v6:
1. Discard PBA support. We can still add it later.
2. Fix one memory reference bug.
3. Automatically unregister MMIO after a device is deassigned.
4. Update according to Avi's comments.
5. Add documentation for the new API.

Note that this patchset depends on two PCI patches:

PCI: MSI: Move MSI-X entry definition to pci_regs.h
PCI: Add mask bit definition for MSI-X table

These two patches are in Jesse's pci-2.6 tree. Do I need to repost them?

Sheng Yang (3):
  KVM: Move struct kvm_io_device to kvm_host.h
  KVM: Emulate MSI-X table in kernel
  KVM: Add documents for MSI-X MMIO API

 Documentation/kvm/api.txt |   41 +++
 arch/x86/kvm/Makefile |2 +-
 arch/x86/kvm/x86.c|8 +-
 include/linux/kvm.h   |   21 
 include/linux/kvm_host.h  |   48 
 virt/kvm/assigned-dev.c   |   44 +++
 virt/kvm/iodev.h  |   25 +
 virt/kvm/kvm_main.c   |   38 ++-
 virt/kvm/msix_mmio.c  |  284 +
 virt/kvm/msix_mmio.h  |   25 
 10 files changed, 505 insertions(+), 31 deletions(-)
 create mode 100644 virt/kvm/msix_mmio.c
 create mode 100644 virt/kvm/msix_mmio.h



[PATCH 2/3] KVM: Emulate MSI-X table in kernel

2011-01-06 Thread Sheng Yang
With this, we can support mask bit operations for assigned devices.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 arch/x86/kvm/Makefile|2 +-
 arch/x86/kvm/x86.c   |8 +-
 include/linux/kvm.h  |   21 
 include/linux/kvm_host.h |   25 
 virt/kvm/assigned-dev.c  |   44 +++
 virt/kvm/kvm_main.c  |   38 ++-
 virt/kvm/msix_mmio.c |  284 ++
 virt/kvm/msix_mmio.h |   25 
 8 files changed, 440 insertions(+), 7 deletions(-)
 create mode 100644 virt/kvm/msix_mmio.c
 create mode 100644 virt/kvm/msix_mmio.h

diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index f15501f..3a0d851 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -7,7 +7,7 @@ CFLAGS_vmx.o := -I.
 
 kvm-y  += $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o \
coalesced_mmio.o irq_comm.o eventfd.o \
-   assigned-dev.o)
+   assigned-dev.o msix_mmio.o)
 kvm-$(CONFIG_IOMMU_API)+= $(addprefix ../../../virt/kvm/, iommu.o)
 kvm-$(CONFIG_KVM_ASYNC_PF) += $(addprefix ../../../virt/kvm/, async_pf.o)
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fa708c9..89bf12c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1966,6 +1966,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_X86_ROBUST_SINGLESTEP:
case KVM_CAP_XSAVE:
case KVM_CAP_ASYNC_PF:
+   case KVM_CAP_MSIX_MMIO:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
@@ -3807,6 +3808,7 @@ static int emulator_write_emulated_onepage(unsigned long addr,
   struct kvm_vcpu *vcpu)
 {
gpa_t gpa;
+   int r;
 
gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, exception);
 
@@ -3822,14 +3824,16 @@ static int emulator_write_emulated_onepage(unsigned long addr,
 
 mmio:
trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, bytes, gpa, *(u64 *)val);
+   r = vcpu_mmio_write(vcpu, gpa, bytes, val);
/*
 * Is this MMIO handled locally?
 */
-   if (!vcpu_mmio_write(vcpu, gpa, bytes, val))
+   if (!r)
return X86EMUL_CONTINUE;
 
vcpu->mmio_needed = 1;
-   vcpu->run->exit_reason = KVM_EXIT_MMIO;
+   vcpu->run->exit_reason = (r == -ENOTSYNC) ?
+   KVM_EXIT_MSIX_ROUTING_UPDATE : KVM_EXIT_MMIO;
vcpu->run->mmio.phys_addr = vcpu->mmio_phys_addr = gpa;
vcpu->run->mmio.len = vcpu->mmio_size = bytes;
vcpu->run->mmio.is_write = vcpu->mmio_is_write = 1;
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index ea2dc1a..ad9df4b 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -161,6 +161,7 @@ struct kvm_pit_config {
 #define KVM_EXIT_NMI  16
 #define KVM_EXIT_INTERNAL_ERROR   17
 #define KVM_EXIT_OSI  18
+#define KVM_EXIT_MSIX_ROUTING_UPDATE 19
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 #define KVM_INTERNAL_ERROR_EMULATION 1
@@ -541,6 +542,7 @@ struct kvm_ppc_pvinfo {
 #define KVM_CAP_PPC_GET_PVINFO 57
 #define KVM_CAP_PPC_IRQ_LEVEL 58
 #define KVM_CAP_ASYNC_PF 59
+#define KVM_CAP_MSIX_MMIO 60
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -672,6 +674,9 @@ struct kvm_clock_data {
 #define KVM_XEN_HVM_CONFIG        _IOW(KVMIO,  0x7a, struct kvm_xen_hvm_config)
 #define KVM_SET_CLOCK             _IOW(KVMIO,  0x7b, struct kvm_clock_data)
 #define KVM_GET_CLOCK             _IOR(KVMIO,  0x7c, struct kvm_clock_data)
+/* Available with KVM_CAP_MSIX_MMIO */
+#define KVM_REGISTER_MSIX_MMIO    _IOW(KVMIO,  0x7d, struct kvm_msix_mmio_user)
+#define KVM_UNREGISTER_MSIX_MMIO  _IOW(KVMIO,  0x7e, struct kvm_msix_mmio_user)
 /* Available with KVM_CAP_PIT_STATE2 */
 #define KVM_GET_PIT2  _IOR(KVMIO,  0x9f, struct kvm_pit_state2)
 #define KVM_SET_PIT2  _IOW(KVMIO,  0xa0, struct kvm_pit_state2)
@@ -795,4 +800,20 @@ struct kvm_assigned_msix_entry {
__u16 padding[3];
 };
 
+#define KVM_MSIX_MMIO_TYPE_ASSIGNED_DEV    (1 << 0)
+
+#define KVM_MSIX_MMIO_TYPE_BASE_TABLE      (1 << 8)
+
+#define KVM_MSIX_MMIO_TYPE_DEV_MASK        0x00ff
+#define KVM_MSIX_MMIO_TYPE_BASE_MASK       0xff00
+struct kvm_msix_mmio_user {
+   __u32 dev_id;
+   __u16 type;
+   __u16 max_entries_nr;
+   __u64 base_addr;
+   __u64 base_va;
+   __u64 flags;
+   __u64 reserved[4];
+};
+
 #endif /* __LINUX_KVM_H */
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7d313e0..c10670c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -233,6 +233,27 @@ struct kvm_memslots {
KVM_PRIVATE_MEM_SLOTS];
 };
 
+#define KVM_MSIX_MMIO_MAX    32
+
+struct kvm_msix_mmio {
+   u32 dev_id;
+   u16 type;
+   u16 max_entries_nr;
+   u64 flags;
+   gpa_t table_base_addr;
+   hva_t table_base_va;
+   gpa_t pba_base_addr;
+   hva_t pba_base_va

[PATCH 3/3] KVM: Add documents for MSI-X MMIO API

2011-01-06 Thread Sheng Yang

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 Documentation/kvm/api.txt |   41 +
 1 files changed, 41 insertions(+), 0 deletions(-)

diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
index e1a9297..4978b94 100644
--- a/Documentation/kvm/api.txt
+++ b/Documentation/kvm/api.txt
@@ -1263,6 +1263,47 @@ struct kvm_assigned_msix_entry {
__u16 padding[3];
 };
 
+4.54 KVM_REGISTER_MSIX_MMIO
+
+Capability: KVM_CAP_MSIX_MMIO
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_msix_mmio_user (in)
+Returns: 0 on success, -1 on error
+
+This API indicates an MSI-X MMIO address of a guest device. Then all MMIO
+operation would be handled by kernel. When necessary(e.g. MSI data/address
+changed), KVM would exit to userspace using KVM_EXIT_MSIX_ROUTING_UPDATE to
+indicate the MMIO modification and require userspace to update IRQ routing
+table.
+
+struct kvm_msix_mmio_user {
+   __u32 dev_id;
+   __u16 type; /* Device type and MMIO address type */
+   __u16 max_entries_nr;   /* Maximum entries supported */
+   __u64 base_addr;/* Guest physical address of MMIO */
+   __u64 base_va;  /* Host virtual address of MMIO mapping */
+   __u64 flags;/* Reserved for now */
+   __u64 reserved[4];
+};
+
+Current device type can be:
+#define KVM_MSIX_MMIO_TYPE_ASSIGNED_DEV    (1 << 0)
+
+Current MMIO type can be:
+#define KVM_MSIX_MMIO_TYPE_BASE_TABLE      (1 << 8)
+
+4.55 KVM_UNREGISTER_MSIX_MMIO
+
+Capability: KVM_CAP_MSIX_MMIO
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_msix_mmio_user (in)
+Returns: 0 on success, -1 on error
+
+This API would unregister the specific MSI-X MMIO, indicated by dev_id and
+type fields of struct kvm_msix_mmio_user.
+
 5. The kvm_run structure
 
 Application code obtains a pointer to the kvm_run structure by
-- 
1.7.0.1
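
To make the parameter layout documented above concrete, here is a standalone
sketch of how userspace might fill the registration structure. It is an
illustration only: the interface comes from a patch under review, so the
sketch redefines the structure and constants locally (struct msix_mmio_user
and the MSIX_MMIO_* names mirror the proposed ones but are local to the
example), msix_mmio_fill() is a hypothetical helper, and no ioctl is issued.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of the proposed type bits and masks. */
#define MSIX_MMIO_TYPE_ASSIGNED_DEV (1u << 0)   /* device type, low byte */
#define MSIX_MMIO_TYPE_BASE_TABLE   (1u << 8)   /* MMIO type, high byte */
#define MSIX_MMIO_TYPE_DEV_MASK     0x00ffu
#define MSIX_MMIO_TYPE_BASE_MASK    0xff00u

/* Local mirror of the proposed struct kvm_msix_mmio_user. */
struct msix_mmio_user {
    uint32_t dev_id;
    uint16_t type;            /* device type | MMIO address type */
    uint16_t max_entries_nr;  /* maximum entries supported */
    uint64_t base_addr;       /* guest physical address of the table */
    uint64_t base_va;         /* host virtual address of the table mapping */
    uint64_t flags;           /* reserved, must be zero */
    uint64_t reserved[4];
};

/* Pack PCI segment, bus and devfn into one id, as the qemu-kvm patch does. */
static uint32_t calc_dev_id(uint16_t seg, uint8_t bus, uint8_t devfn)
{
    return (uint32_t)seg << 16 | (uint32_t)bus << 8 | (uint32_t)devfn;
}

/* Fill a registration request for the MSI-X table of an assigned device. */
static void msix_mmio_fill(struct msix_mmio_user *m, uint32_t dev_id,
                           uint64_t guest_addr, const void *host_table,
                           uint16_t max_entries)
{
    assert(m != NULL);
    memset(m, 0, sizeof(*m));   /* also zeroes flags and reserved[] */
    m->dev_id = dev_id;
    m->type = (uint16_t)(MSIX_MMIO_TYPE_ASSIGNED_DEV |
                         MSIX_MMIO_TYPE_BASE_TABLE);
    m->max_entries_nr = max_entries;
    m->base_addr = guest_addr;
    m->base_va = (uint64_t)(uintptr_t)host_table;
}
```

Unregistering needs only dev_id and type from the same structure, which is
why the type field carries both the device type and the MMIO type.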



[PATCH 0/4 v7] qemu-kvm: MSI-X MMIO support for assigned device

2011-01-06 Thread Sheng Yang
Update with kernel patches v7.

Sheng Yang (4):
  qemu-kvm: device assignment: Enabling MSI-X according to the entries'
mask bit
  qemu-kvm: Ioctl for MSIX MMIO support
  qemu-kvm: Header file update for MSI-X MMIO support
  qemu-kvm: MSI-X MMIO support for assigned device

 hw/device-assignment.c  |  275 --
 hw/device-assignment.h  |5 +-
 kvm/include/linux/kvm.h |   21 
 qemu-kvm.c  |   54 +
 qemu-kvm.h  |   18 +++
 5 files changed, 336 insertions(+), 37 deletions(-)



[PATCH 3/4] qemu-kvm: Header file update for MSI-X MMIO support

2011-01-06 Thread Sheng Yang

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 kvm/include/linux/kvm.h |   21 +
 1 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/kvm/include/linux/kvm.h b/kvm/include/linux/kvm.h
index e46729e..7b6d5b9 100644
--- a/kvm/include/linux/kvm.h
+++ b/kvm/include/linux/kvm.h
@@ -161,6 +161,7 @@ struct kvm_pit_config {
 #define KVM_EXIT_NMI  16
 #define KVM_EXIT_INTERNAL_ERROR   17
 #define KVM_EXIT_OSI  18
+#define KVM_EXIT_MSIX_ROUTING_UPDATE 19
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 #define KVM_INTERNAL_ERROR_EMULATION 1
@@ -530,6 +531,7 @@ struct kvm_enable_cap {
 #ifdef __KVM_HAVE_XCRS
 #define KVM_CAP_XCRS 56
 #endif
+#define KVM_CAP_MSIX_MMIO 60
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -660,6 +662,9 @@ struct kvm_clock_data {
 #define KVM_XEN_HVM_CONFIG        _IOW(KVMIO,  0x7a, struct kvm_xen_hvm_config)
 #define KVM_SET_CLOCK             _IOW(KVMIO,  0x7b, struct kvm_clock_data)
 #define KVM_GET_CLOCK             _IOR(KVMIO,  0x7c, struct kvm_clock_data)
+/* Available with KVM_CAP_MSIX_MMIO */
+#define KVM_REGISTER_MSIX_MMIO    _IOW(KVMIO, 0x7d, struct kvm_msix_mmio_user)
+#define KVM_UNREGISTER_MSIX_MMIO  _IOW(KVMIO, 0x7e, struct kvm_msix_mmio_user)
 /* Available with KVM_CAP_PIT_STATE2 */
 #define KVM_GET_PIT2  _IOR(KVMIO,  0x9f, struct kvm_pit_state2)
 #define KVM_SET_PIT2  _IOW(KVMIO,  0xa0, struct kvm_pit_state2)
@@ -781,4 +786,20 @@ struct kvm_assigned_msix_entry {
__u16 padding[3];
 };
 
+#define KVM_MSIX_MMIO_TYPE_ASSIGNED_DEV    (1 << 0)
+
+#define KVM_MSIX_MMIO_TYPE_BASE_TABLE      (1 << 8)
+
+#define KVM_MSIX_MMIO_TYPE_DEV_MASK        0x00ff
+#define KVM_MSIX_MMIO_TYPE_BASE_MASK       0xff00
+struct kvm_msix_mmio_user {
+   __u32 dev_id;
+   __u16 type;
+   __u16 max_entries_nr;
+   __u64 base_addr;
+   __u64 base_va;
+   __u64 flags;
+   __u64 reserved[4];
+};
+
 #endif /* __LINUX_KVM_H */
-- 
1.7.0.1



[PATCH 4/4] qemu-kvm: MSI-X MMIO support for assigned device

2011-01-06 Thread Sheng Yang

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 hw/device-assignment.c |   93 +--
 hw/device-assignment.h |3 ++
 qemu-kvm.c |   40 
 qemu-kvm.h |   11 ++
 4 files changed, 135 insertions(+), 12 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index f81050f..bddee2a 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -70,6 +70,11 @@ static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev,
 static uint32_t assigned_device_pci_cap_read_config(PCIDevice *pci_dev,
                                                     uint32_t address, int len);
 
+static uint32_t calc_assigned_dev_id(uint16_t seg, uint8_t bus, uint8_t devfn)
+{
+    return (uint32_t)seg << 16 | (uint32_t)bus << 8 | (uint32_t)devfn;
+}
+
 static uint32_t assigned_dev_ioport_rw(AssignedDevRegion *dev_region,
                                        uint32_t addr, int len, uint32_t *val)
 {
@@ -272,6 +277,10 @@ static void assigned_dev_iomem_map(PCIDevice *pci_dev, int region_num,
     AssignedDevRegion *region = &r_dev->v_addrs[region_num];
     PCIRegion *real_region = &r_dev->real_device.regions[region_num];
     int ret = 0;
+#ifdef KVM_CAP_MSIX_MMIO
+    int cap_mask = kvm_check_extension(kvm_state, KVM_CAP_MSIX_MMIO);
+    struct kvm_msix_mmio_user msix_mmio;
+#endif
 
     DEBUG("e_phys=%08" FMT_PCIBUS " r_virt=%p type=%d len=%08" FMT_PCIBUS " region_num=%d \n",
           e_phys, region->u.r_virtbase, type, e_size, region_num);
@@ -290,6 +299,23 @@ static void assigned_dev_iomem_map(PCIDevice *pci_dev, int region_num,
 
             cpu_register_physical_memory(e_phys + offset,
                     TARGET_PAGE_SIZE, r_dev->mmio_index);
+#ifdef KVM_CAP_MSIX_MMIO
+            if (cap_mask) {
+                r_dev->guest_msix_table_addr = e_phys + offset;
+                memset(&msix_mmio, 0, sizeof msix_mmio);
+                msix_mmio.dev_id = calc_assigned_dev_id(r_dev->h_segnr,
+                        r_dev->h_busnr, r_dev->h_devfn);
+                msix_mmio.type = KVM_MSIX_MMIO_TYPE_ASSIGNED_DEV |
+                                 KVM_MSIX_MMIO_TYPE_BASE_TABLE;
+                msix_mmio.base_addr = e_phys + offset;
+                msix_mmio.base_va = (unsigned long)r_dev->msix_table_page;
+                msix_mmio.max_entries_nr = r_dev->max_msix_entries_nr;
+                msix_mmio.flags = 0;
+                ret = kvm_register_msix_mmio(kvm_context, &msix_mmio);
+                if (ret)
+                    fprintf(stderr, "fail to register in-kernel msix_mmio!\n");
+            }
+#endif
         }
     }
 
@@ -852,11 +878,6 @@ static void free_assigned_device(AssignedDevice *dev)
     }
 }
 
-static uint32_t calc_assigned_dev_id(uint16_t seg, uint8_t bus, uint8_t devfn)
-{
-    return (uint32_t)seg << 16 | (uint32_t)bus << 8 | (uint32_t)devfn;
-}
-
 static void assign_failed_examine(AssignedDevice *dev)
 {
     char name[PATH_MAX], dir[PATH_MAX], driver[PATH_MAX] = {}, *ns;
@@ -1263,6 +1284,8 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev,
     return r;
 }
 
+static int assigned_dev_update_routing_handler(void *opaque, unsigned long addr);
+
 static void assigned_dev_update_msix(PCIDevice *pci_dev, unsigned int ctrl_pos)
 {
     struct kvm_assigned_irq assigned_irq_data;
@@ -1486,7 +1509,9 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
         msix_table_entry = pci_get_long(pci_dev->config + pos + PCI_MSIX_TABLE);
         bar_nr = msix_table_entry & PCI_MSIX_BIR;
         msix_table_entry &= ~PCI_MSIX_BIR;
-        dev->msix_table_addr = pci_region[bar_nr].base_addr + msix_table_entry;
+        dev->msix_table_addr = pci_region[bar_nr].base_addr +
+                               msix_table_entry;
+
         dev->max_msix_entries_nr = get_msix_entries_max_nr(dev);
     }
 #endif
@@ -1670,8 +1695,7 @@ static uint32_t msix_mmio_readw(void *opaque, target_phys_addr_t addr)
             (8 * (addr & 3))) & 0xffff;
 }
 
-static void msix_mmio_writel(void *opaque,
-                             target_phys_addr_t addr, uint32_t val)
+static void assigned_dev_update_routing(void *opaque, unsigned long addr)
 {
     AssignedDevice *adev = opaque;
     unsigned int offset = addr & 0xfff;
@@ -1683,10 +1707,6 @@ static void msix_mmio_writel(void *opaque,
     struct PCIDevice *pci_dev = &adev->dev;
     uint8_t cap = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX);
 
-    DEBUG("write to MSI-X entry table mmio offset 0x%lx, val 0x%x\n",
-          addr, val);
-    memcpy((void *)((char *)page + offset), &val, 4);
-
     index = offset / 16;
 
     /* Check if mask bit is being accessed */
@@ -1762,6 +1782,41 @@ static void msix_mmio_writel(void *opaque,
     adev->entry[entry_idx].u.msi.data = msg_data;
 }
 
+static int assigned_dev_update_routing_handler(void *opaque, unsigned long addr)
+{
+    AssignedDevice *adev = opaque;
+
+    if (addr >= adev->guest_msix_table_addr

[PATCH 1/4] qemu-kvm: device assignment: Enabling MSI-X according to the entries' mask bit

2011-01-06 Thread Sheng Yang
The old MSI-X enabling method assumed that the entries were written before
MSI-X was enabled, but some OSes, e.g. FreeBSD, do not follow this order. This
patch fixes that by enabling entries according to their mask bits.

Also, according to the PCI spec, the mask bit of each MSI-X table entry should
be set after reset.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
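
For reference, a standalone sketch (not the patch code itself) of the rule described above: each MSI-X table entry is 16 bytes, the vector control word sits at offset 12, and bit 0 of that word is the mask bit, so an entry counts as usable once the guest has unmasked it. The names entry_masked(), count_unmasked(), reset_table() and set_mask() are hypothetical and only mirror the helpers added below.

```c
#include <stdint.h>
#include <string.h>

#define MSIX_ENTRY_SIZE      16   /* addr_lo, addr_hi, data, vector control */
#define MSIX_VECTOR_CTRL_OFF 12   /* offset of the vector control word */
#define MSIX_CTRL_MASKBIT    1u   /* bit 0: entry is masked */

/* Nonzero if the given entry of the shadow table is masked. */
static int entry_masked(const uint8_t *table, int entry)
{
    uint32_t ctrl;

    memcpy(&ctrl, table + entry * MSIX_ENTRY_SIZE + MSIX_VECTOR_CTRL_OFF,
           sizeof(ctrl));
    return ctrl & MSIX_CTRL_MASKBIT;
}

/* Set or clear the mask bit of one entry, leaving other bits alone. */
static void set_mask(uint8_t *table, int entry, int masked)
{
    uint8_t *p = table + entry * MSIX_ENTRY_SIZE + MSIX_VECTOR_CTRL_OFF;
    uint32_t ctrl;

    memcpy(&ctrl, p, sizeof(ctrl));
    if (masked)
        ctrl |= MSIX_CTRL_MASKBIT;
    else
        ctrl &= ~MSIX_CTRL_MASKBIT;
    memcpy(p, &ctrl, sizeof(ctrl));
}

/* After reset every entry is cleared and starts out masked. */
static void reset_table(uint8_t *table, int max_entries)
{
    int i;

    memset(table, 0, (size_t)max_entries * MSIX_ENTRY_SIZE);
    for (i = 0; i < max_entries; i++)
        set_mask(table, i, 1);
}

/* Number of entries the guest has unmasked, i.e. entries worth routing. */
static int count_unmasked(const uint8_t *table, int max_entries)
{
    int i, nr = 0;

    for (i = 0; i < max_entries; i++)
        if (!entry_masked(table, i))
            nr++;
    return nr;
}
```

Counting by mask bit instead of by a nonzero data word is what lets a guest that enables MSI-X first and fills in the entries later still get its vectors routed.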
 hw/device-assignment.c |  188 +---
 hw/device-assignment.h |2 +-
 2 files changed, 162 insertions(+), 28 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 8446cd4..f81050f 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1141,15 +1141,12 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev, unsigned int ctrl_pos)
 #endif
 
 #ifdef KVM_CAP_DEVICE_MSIX
-static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
+
+#define PCI_MSIX_CTRL_MASKBIT      1ul
+static int get_msix_entries_max_nr(AssignedDevice *adev)
 {
-    AssignedDevice *adev = container_of(pci_dev, AssignedDevice, dev);
-    uint16_t entries_nr = 0, entries_max_nr;
-    int pos = 0, i, r = 0;
-    uint32_t msg_addr, msg_upper_addr, msg_data, msg_ctrl;
-    struct kvm_assigned_msix_nr msix_nr;
-    struct kvm_assigned_msix_entry msix_entry;
-    void *va = adev->msix_table_page;
+    int pos, entries_max_nr;
+    PCIDevice *pci_dev = &adev->dev;
 
     pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX);
 
@@ -1157,20 +1154,48 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
     entries_max_nr &= PCI_MSIX_TABSIZE;
     entries_max_nr += 1;
 
+    return entries_max_nr;
+}
+
+static int assigned_dev_msix_entry_masked(AssignedDevice *adev, int entry)
+{
+    uint32_t msg_ctrl;
+    void *va = adev->msix_table_page;
+
+    memcpy(&msg_ctrl, va + entry * 16 + 12, 4);
+    return (msg_ctrl & PCI_MSIX_CTRL_MASKBIT);
+}
+
+static int get_msix_valid_entries_nr(AssignedDevice *adev,
+                                     uint16_t entries_max_nr)
+{
+    void *va = adev->msix_table_page;
+    uint32_t msg_ctrl;
+    uint16_t entries_nr = 0;
+    int i;
+
     /* Get the usable entry number for allocating */
     for (i = 0; i < entries_max_nr; i++) {
         memcpy(&msg_ctrl, va + i * 16 + 12, 4);
-        memcpy(&msg_data, va + i * 16 + 8, 4);
         /* Ignore unused entry even it's unmasked */
-        if (msg_data == 0)
+        if (assigned_dev_msix_entry_masked(adev, i))
             continue;
         entries_nr ++;
     }
+    return entries_nr;
+}
+
+static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev,
+                                         uint16_t entries_nr,
+                                         uint16_t entries_max_nr)
+{
+    AssignedDevice *adev = container_of(pci_dev, AssignedDevice, dev);
+    int i, r = 0;
+    uint32_t msg_addr, msg_upper_addr, msg_data, msg_ctrl;
+    struct kvm_assigned_msix_nr msix_nr;
+    struct kvm_assigned_msix_entry msix_entry;
+    void *va = adev->msix_table_page;
 
-    if (entries_nr == 0) {
-        fprintf(stderr, "MSI-X entry number is zero!\n");
-        return -EINVAL;
-    }
     msix_nr.assigned_dev_id = calc_assigned_dev_id(adev->h_segnr, adev->h_busnr,
                                           (uint8_t)adev->h_devfn);
     msix_nr.entry_nr = entries_nr;
@@ -1182,6 +1207,8 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
     }
 
     free_dev_irq_entries(adev);
+    memset(pci_dev->msix_entry_used, 0, KVM_MAX_MSIX_PER_DEV *
+                                        sizeof(*pci_dev->msix_entry_used));
     adev->irq_entries_nr = entries_nr;
     adev->entry = calloc(entries_nr, sizeof(struct kvm_irq_routing_entry));
     if (!adev->entry) {
@@ -1195,10 +1222,10 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
         if (entries_nr >= msix_nr.entry_nr)
             break;
         memcpy(&msg_ctrl, va + i * 16 + 12, 4);
-        memcpy(&msg_data, va + i * 16 + 8, 4);
-        if (msg_data == 0)
+        if (assigned_dev_msix_entry_masked(adev, i))
             continue;
 
+        memcpy(&msg_data, va + i * 16 + 8, 4);
         memcpy(&msg_addr, va + i * 16, 4);
         memcpy(&msg_upper_addr, va + i * 16 + 4, 4);
 
@@ -1212,17 +1239,18 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
         adev->entry[entries_nr].u.msi.address_lo = msg_addr;
         adev->entry[entries_nr].u.msi.address_hi = msg_upper_addr;
         adev->entry[entries_nr].u.msi.data = msg_data;
-        DEBUG("MSI-X data 0x%x, MSI-X addr_lo 0x%x\n!", msg_data, msg_addr);
-	kvm_add_routing_entry(&adev->entry[entries_nr]);
+        DEBUG("MSI-X data 0x%x, MSI-X addr_lo 0x%x!\n", msg_data, msg_addr);
+        kvm_add_routing_entry(&adev->entry[entries_nr]);
 
         msix_entry.gsi = adev->entry[entries_nr].gsi;
         msix_entry.entry = i;
+        pci_dev->msix_entry_used[i] = 1;
         r = kvm_assign_set_msix_entry(kvm_context, &msix_entry);
         if (r) {
             fprintf(stderr, "fail to set MSI-X entry! %s\n", strerror(-r));
             break;
         }
-DEBUG(MSI-X

[PATCH] KVM: VMX: Fix 32bit Windows blue screen with EPT

2010-12-30 Thread Sheng Yang
After CR0 is changed during VMExit, the result of kvm_read_cr3() may be
different. Commit d95bfcdd7cda4dfdac9588e684bc7c75794a075e "KVM: Fetch guest
cr3 from hardware on demand" caused a blue screen in 32-bit Windows guests when
using EPT. This patch fixes it by decaching CR3 before the CR0 change, for both
the paging-to-nonpaging and the nonpaging-to-paging switch.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---

But I haven't found the exact point affected by this. Any clue?

 arch/x86/kvm/vmx.c |6 +-
 1 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index f107315..0b8cfc1 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1921,8 +1921,7 @@ static void ept_update_paging_mode_cr0(unsigned long *hw_cr0,
 					unsigned long cr0,
 					struct kvm_vcpu *vcpu)
 {
-	ulong cr3;
-
+	kvm_read_cr3(vcpu);
 	if (!(cr0 & X86_CR0_PG)) {
 		/* From paging/starting to nonpaging */
 		vmcs_write32(CPU_BASED_VM_EXEC_CONTROL,
@@ -1937,11 +1936,8 @@ static void ept_update_paging_mode_cr0(unsigned long *hw_cr0,
 			     vmcs_read32(CPU_BASED_VM_EXEC_CONTROL) &
 			     ~(CPU_BASED_CR3_LOAD_EXITING |
 			       CPU_BASED_CR3_STORE_EXITING));
-		/* Must fetch cr3 before updating cr0 */
-		cr3 = kvm_read_cr3(vcpu);
 		vcpu->arch.cr0 = cr0;
 		vmx_set_cr4(vcpu, kvm_read_cr4(vcpu));
-		vmx_set_cr3(vcpu, cr3);
 	}
 
 	if (!(cr0 & X86_CR0_WP))
-- 
1.7.0.1



Re: [PATCH] KVM: VMX: Fix 32bit Windows blue screen with EPT

2010-12-30 Thread Sheng Yang
On Thursday 30 December 2010 16:57:20 Avi Kivity wrote:
 On 12/30/2010 10:35 AM, Sheng Yang wrote:
  After CR0 is changed during VMExit, the result of kvm_read_cr3() may be
  different. Commit d95bfcdd7cda4dfdac9588e684bc7c75794a075e KVM: Fetch
  guest cr3 from hardware on demand caused 32bit Windows guest blue
  screen when using with EPT. This patch fixes it by decache CR3 before
  CR0 change, for both paging to nonpaging, and nonpaging to paging
  switch.
  
  Signed-off-by: Sheng Yangsh...@linux.intel.com
  ---
  
  But I haven't found the exactly point affected by this, any clue?
 
 Can't see it either.
 
  @@ -1921,8 +1921,7 @@ static void ept_update_paging_mode_cr0(unsigned
  long *hw_cr0,
  
  unsigned long cr0,
  struct kvm_vcpu *vcpu)

{
  
  -   ulong cr3;
  -
  +   kvm_read_cr3(vcpu);
 
 Without this line, it fails?

Yes, it seems something goes wrong during the paging-to-nonpaging switch.
 
 I think it's better to call vmx_decache_cr3() explicitly, since it
 explains what we're doing.  vmx_decache_cr3 depends on arch.cr0, and
 we're changing that here.

OK.
 
  if (!(cr0 & X86_CR0_PG)) {
  
  /* From paging/starting to nonpaging */
  vmcs_write32(CPU_BASED_VM_EXEC_CONTROL,
  
  @@ -1937,11 +1936,8 @@ static void ept_update_paging_mode_cr0(unsigned
  long *hw_cr0,
  
   vmcs_read32(CPU_BASED_VM_EXEC_CONTROL)
  
  ~(CPU_BASED_CR3_LOAD_EXITING |
  
 CPU_BASED_CR3_STORE_EXITING));
  
  -   /* Must fetch cr3 before updating cr0 */
  -   cr3 = kvm_read_cr3(vcpu);
  
  vcpu->arch.cr0 = cr0;
  vmx_set_cr4(vcpu, kvm_read_cr4(vcpu));
  
  -   vmx_set_cr3(vcpu, cr3);
 
 This is indeed bogus.  But what ensures that we'll have the correct
 GUEST_CR3 after enabling paging?

In fact, I don't understand why we need this line. All the modifications are
about reading CR3, so why do we need to set the hardware CR3 again? I think it
should be the same as when we didn't have the CR3 accessor.

--
regards
Yang, Sheng

 
  }
  
  if (!(cr0 & X86_CR0_WP))


Re: [PATCH] KVM: VMX: Fix 32bit Windows blue screen with EPT

2010-12-30 Thread Sheng Yang
On Thursday 30 December 2010 16:57:20 Avi Kivity wrote:
 On 12/30/2010 10:35 AM, Sheng Yang wrote:
  After CR0 is changed during VMExit, the result of kvm_read_cr3() may be
  different. Commit d95bfcdd7cda4dfdac9588e684bc7c75794a075e KVM: Fetch
  guest cr3 from hardware on demand caused 32bit Windows guest blue
  screen when using with EPT. This patch fixes it by decache CR3 before
  CR0 change, for both paging to nonpaging, and nonpaging to paging
  switch.
  
  Signed-off-by: Sheng Yangsh...@linux.intel.com
  ---
  
  But I haven't found the exactly point affected by this, any clue?
 
 Can't see it either.
 
  @@ -1921,8 +1921,7 @@ static void ept_update_paging_mode_cr0(unsigned
  long *hw_cr0,
  
  unsigned long cr0,
  struct kvm_vcpu *vcpu)

{
  
  -   ulong cr3;
  -
  +   kvm_read_cr3(vcpu);
 
 Without this line, it fails?
 
 I think it's better to call vmx_decache_cr3() explicitly, since it
 explains what we're doing.  vmx_decache_cr3 depends on arch.cr0, and
 we're changing that here.
 
  if (!(cr0 & X86_CR0_PG)) {
  
  /* From paging/starting to nonpaging */
  vmcs_write32(CPU_BASED_VM_EXEC_CONTROL,
  
  @@ -1937,11 +1936,8 @@ static void ept_update_paging_mode_cr0(unsigned
  long *hw_cr0,
  
   vmcs_read32(CPU_BASED_VM_EXEC_CONTROL)
  
  ~(CPU_BASED_CR3_LOAD_EXITING |
  
 CPU_BASED_CR3_STORE_EXITING));
  
  -   /* Must fetch cr3 before updating cr0 */
  -   cr3 = kvm_read_cr3(vcpu);
  
  vcpu->arch.cr0 = cr0;
  vmx_set_cr4(vcpu, kvm_read_cr4(vcpu));
  
  -   vmx_set_cr3(vcpu, cr3);
 
 This is indeed bogus.  But what ensures that we'll have the correct
 GUEST_CR3 after enabling paging?

BTW: What did you find when you added these two lines?

--
regards
Yang, Sheng

 
  }
  
  if (!(cr0 & X86_CR0_WP))


Re: [PATCH 2/2][RFC] KVM: Emulate MSI-X table and PBA in kernel

2010-12-30 Thread Sheng Yang
On Thursday 30 December 2010 16:52:58 Michael S. Tsirkin wrote:
 On Thu, Dec 30, 2010 at 04:24:10PM +0800, Sheng Yang wrote:
  On Thursday 30 December 2010 16:15:32 Michael S. Tsirkin wrote:
   On Thu, Dec 30, 2010 at 03:55:10PM +0800, Sheng Yang wrote:
On Thursday 30 December 2010 15:47:48 Michael S. Tsirkin wrote:
 On Thu, Dec 30, 2010 at 03:32:42PM +0800, Sheng Yang wrote:
  On Wednesday 29 December 2010 17:28:24 Michael S. Tsirkin wrote:
   On Wed, Dec 29, 2010 at 04:55:19PM +0800, Sheng Yang wrote:
On Wednesday 29 December 2010 16:31:35 Michael S. Tsirkin wrote:
 On Wed, Dec 29, 2010 at 03:18:13PM +0800, Sheng Yang wrote:
  On Tuesday 28 December 2010 20:26:13 Avi Kivity wrote:
   On 12/22/2010 10:44 AM, Sheng Yang wrote:
Then we can support mask bit operation of assigned
devices now.


@@ -3817,14 +3819,16 @@ static int
emulator_write_emulated_onepage(unsigned long addr,

  mmio:
trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, bytes, gpa,
*(u64 *)val);

+   r = vcpu_mmio_write(vcpu, gpa, bytes, val);

/*

 * Is this MMIO handled locally?
 */

-   if (!vcpu_mmio_write(vcpu, gpa, bytes, val))
+   if (!r)

return X86EMUL_CONTINUE;

vcpu->mmio_needed = 1;

-   vcpu->run->exit_reason = KVM_EXIT_MMIO;
+   vcpu->run->exit_reason = (r == -ENOTSYNC) ?
+   KVM_EXIT_MSIX_ROUTING_UPDATE : KVM_EXIT_MMIO;
   
   This isn't very pretty, exit_reason should be written
   in vcpu_mmio_write().  I guess we can refactor it
   later.
  
  Sure.
  
+#define KVM_MSIX_MMIO_TYPE_ASSIGNED_DEV    (1 << 0)
+
+#define KVM_MSIX_MMIO_TYPE_BASE_TABLE      (1 << 8)
+#define KVM_MSIX_MMIO_TYPE_BASE_PBA        (1 << 9)
+
+#define KVM_MSIX_MMIO_TYPE_DEV_MASK        0x00ff
+#define KVM_MSIX_MMIO_TYPE_BASE_MASK       0xff00
   
   Any explanation of these?
  
  I chose to use assigned device id instead of one specific
  table id, because every device should got at most one MSI
  MMIO(the same should applied to vfio device as well), and
  if we use specific table ID, we need way to associate
  with the device anyway, to perform mask/unmask or other
  operation. So I think it's better to use device ID here
  directly.
 
 Table id will be needed to make things work for emulated
 devices.

I suppose even emulated device should got some kind of
id(BDF)?
   
   Not that I know. Look at how irqfd is defined for example,
   or how interrupts are sent through a gsi.
   I would like to make the interface be able to support that.
   
I think that is
enough for identification, which is already there, so we
don't need to allocate another ID for the device - because
one device would got at most one MSI-X MMIO, then use BDF or
other device specific ID should be quite straightforward.
   
   So you propose allocating ids for emulated devices?
   OK. How will we map e.g. irqfds to these?
  
  I don't understand. I've checked virtio-pci.c which is using
  irqfd, and it's still a PCI device, and still have BDF, right?
  
  Also, what we want is a way to identify the MSI-X MMIO. For
  assigned device, we use BDF, then we can easily identify the
  MMIO as well as the device. For others, even they don't have
  BDF(I don't think so, because MSI-X is a part of PCI, and every
  PCI device has BDF), what you need is an ID, no matter what it
  is and how it defined. QEmu can get the allocation done, and the
  type field in this API can still tell which kind of ID/devices
  they are, then determine how to deal with them.
 
 Yes, the PCI device can be identified with e.g. BDF
 (won't work for multi-domain but then we can write an allocator
 maybe). But how will we inject these interrupts?
 We can do this now with GSI ioctl or map GSI to irqfd
 and inject with irqfd write.

I suppose it's not in the scope of this patch...
   
   This is why I suggested mapping GSI to msix.
   
But I think you can still do
this, everything is the same as before. QEmu can read from table to
get data/address pair, then program the routing table, etc.
   
   Yes, fine, but mask is the problem :)
   When qemu/irqfd injects an interrupt and it's masked,
   guest

Re: [PATCH] KVM: VMX: Fix 32bit Windows blue screen with EPT

2010-12-30 Thread Sheng Yang
On Thursday 30 December 2010 17:14:23 Avi Kivity wrote:
 On 12/30/2010 11:05 AM, Sheng Yang wrote:
if (!(cr0 & X86_CR0_PG)) {

/* From paging/starting to nonpaging */
vmcs_write32(CPU_BASED_VM_EXEC_CONTROL,
  
  @@ -1937,11 +1936,8 @@ static void
  ept_update_paging_mode_cr0(unsigned long *hw_cr0,
  
vmcs_read32(CPU_BASED_VM_EXEC_CONTROL)

~(CPU_BASED_CR3_LOAD_EXITING |

CPU_BASED_CR3_STORE_EXITING));
  
  - /* Must fetch cr3 before updating cr0 */
  - cr3 = kvm_read_cr3(vcpu);
  
vcpu->arch.cr0 = cr0;
vmx_set_cr4(vcpu, kvm_read_cr4(vcpu));
  
  - vmx_set_cr3(vcpu, cr3);

This is indeed bogus.  But what ensures that we'll have the correct
GUEST_CR3 after enabling paging?
  
  In fact I don't understand why we need this line. All modification is for
  CR3 reading, why we need to set hardware CR3 again? It should be the
  same as when we don't have CR3 accessor I think.
 
 When cr0.pg=0 we set GUEST_CR3=identity_pagetable.  We don't want
 that when we switch to paging mode.

I think kvm_mmu_reset_context() in kvm_set_cr0() should cover this?

--
regards
Yang, Sheng


[PATCH v2] KVM: VMX: Fix 32bit Windows blue screen with EPT

2010-12-30 Thread Sheng Yang
After CR0 is changed during VMExit, the result of kvm_read_cr3() may be
different. Commit d95bfcdd7cda4dfdac9588e684bc7c75794a075e "KVM: Fetch guest
cr3 from hardware on demand" caused a blue screen in 32-bit Windows guests when
using EPT. This patch fixes it by decaching CR3 before the CR0 change, for both
the paging-to-nonpaging and the nonpaging-to-paging switch.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
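
A toy model (plain C, not KVM code) of the dependency discussed in the review: the cached CR3 is refreshed from hardware only while the guest is paging, so the refresh gives a different answer depending on whether it runs before or after the software CR0 copy changes. The struct, the IDENTITY_MAP_CR3 value and every toy_* name are assumptions made up for illustration; the real logic lives in vmx_decache_cr3() and kvm_read_cr3().

```c
#include <stdint.h>

#define CR0_PG           (1ul << 31)
#define IDENTITY_MAP_CR3 0x1000ul   /* stand-in for the identity page table */

struct toy_vcpu {
    unsigned long cr0;      /* software copy of guest CR0 */
    unsigned long cr3;      /* software copy of guest CR3 */
    int cr3_avail;          /* nonzero once the software copy is current */
    unsigned long hw_cr3;   /* stand-in for the VMCS GUEST_CR3 field */
};

/*
 * Refresh the cached CR3. In this model the hardware field holds the
 * guest's own CR3 only while the guest is paging; otherwise it holds the
 * identity map and the software copy is the authoritative value.
 */
static void toy_decache_cr3(struct toy_vcpu *v)
{
    if (v->cr0 & CR0_PG)
        v->cr3 = v->hw_cr3;
    v->cr3_avail = 1;
}

/* Lazy read: fetch from "hardware" only on first use. */
static unsigned long toy_read_cr3(struct toy_vcpu *v)
{
    if (!v->cr3_avail)
        toy_decache_cr3(v);
    return v->cr3;
}

/* Switch from nonpaging to paging, optionally decaching CR3 first. */
static unsigned long toy_enable_paging(struct toy_vcpu *v, int decache_first)
{
    if (decache_first)
        toy_decache_cr3(v);     /* evaluated against the old CR0 */
    v->cr0 |= CR0_PG;
    return toy_read_cr3(v);     /* what later code will see as guest CR3 */
}
```

With the decache done first, the guest's own CR3 survives the switch; without it, the first lazy read after the CR0 update picks up the identity map value instead.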
 arch/x86/kvm/vmx.c |6 +-
 1 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index f107315..bf89ec2 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1921,8 +1921,7 @@ static void ept_update_paging_mode_cr0(unsigned long *hw_cr0,
 					unsigned long cr0,
 					struct kvm_vcpu *vcpu)
 {
-	ulong cr3;
-
+	vmx_decache_cr3(vcpu);
 	if (!(cr0 & X86_CR0_PG)) {
 		/* From paging/starting to nonpaging */
 		vmcs_write32(CPU_BASED_VM_EXEC_CONTROL,
@@ -1937,11 +1936,8 @@ static void ept_update_paging_mode_cr0(unsigned long *hw_cr0,
 			     vmcs_read32(CPU_BASED_VM_EXEC_CONTROL) &
 			     ~(CPU_BASED_CR3_LOAD_EXITING |
 			       CPU_BASED_CR3_STORE_EXITING));
-		/* Must fetch cr3 before updating cr0 */
-		cr3 = kvm_read_cr3(vcpu);
 		vcpu->arch.cr0 = cr0;
 		vmx_set_cr4(vcpu, kvm_read_cr4(vcpu));
-		vmx_set_cr3(vcpu, cr3);
 	}
 
 	if (!(cr0 & X86_CR0_WP))
-- 
1.7.0.1



Re: [PATCH 2/2][RFC] KVM: Emulate MSI-X table and PBA in kernel

2010-12-30 Thread Sheng Yang
On Thursday 30 December 2010 18:32:56 Michael S. Tsirkin wrote:
 On Thu, Dec 30, 2010 at 11:30:12AM +0200, Avi Kivity wrote:
  On 12/30/2010 09:47 AM, Michael S. Tsirkin wrote:
  I am not really suggesting this. What I say is PBA is unimplemented
  let us not commit to an interface yet.
  
  What happens to a guest that tries to use PBA?
  It's a mandatory part of MSI-X, no?
 
 Yes. Unfortunately the pending bit is in fact a communication channel
 used for function specific purposes when mask bit is set,
 and 0 when unset. The spec even seems to *require* this use:
 
 I refer to this:
 
   For MSI and MSI-X, while a vector is masked, the function is prohibited
   from sending the associated message, and the function must set the
   associated Pending bit whenever the function would otherwise send the
   message. When software unmasks a vector whose associated Pending bit is
   set, the function must schedule sending the associated message, and
   clear the Pending bit as soon as the message has been sent. Note that
   clearing the MSI-X Function Mask bit may result in many messages needing
   to be sent.
 
 
   If a masked vector has its Pending bit set, and the associated
   underlying interrupt events are somehow satisfied (usually by software
   though the exact manner is function-specific), the function must clear
   the Pending bit, to avoid sending a spurious interrupt message later
   when software unmasks the vector. However, if a subsequent interrupt
   event occurs while the vector is still masked, the function must again
   set the Pending bit.
 
 
   Software is permitted to mask one or more vectors indefinitely, and
   service their associated interrupt events strictly based on polling
   their Pending bits. A function must set and clear its Pending bits as
   necessary to support this “pure polling” mode of operation.
 
 For assigned devices, supporting this would require
 that the mask bits on the device are set if the mask bit in
 guest is set (otherwise pending bits are disabled).

For an assigned device, I think the result we should return is the IRQ_PENDING
bit of the related IRQ. It seems to fit the definition of the pending bit here
perfectly: it is set while the vector is masked, and if we don't clear it, one
interrupt is retriggered after unmask. But it's an internal flag, and using it
would require some core changes (and more needs to be considered if we want to
operate on the flag outside the core kernel).
 
 Existing code does not support PBA in assigned devices, so at least it's
 not a regression there, and the virtio spec says nothing about this so
 we should be fine.

I agree. At least it's not a regression. And in fact we haven't seen any device
driver use this. I've checked the Linux kernel code and found that nothing uses
PCI_MSIX_PBA or msix_pba_offset_reg().

I guess it's fine to get the MSI-X mask part in first, then deal with the PBA
part if necessary, though we haven't seen any driver use it so far. It won't be
worse with this patch anyway...
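
To make the quoted spec text concrete, here is a small model (illustrative only, not kernel or QEMU code) of the per-vector behaviour it describes: an event on a masked vector sets the Pending bit, unmasking a pending vector sends the message and clears the bit, and servicing the event while still masked clears the bit without sending anything. All names are hypothetical.

```c
#include <stdbool.h>

struct toy_vector {
    bool masked;    /* per-vector mask bit */
    bool pending;   /* Pending bit as it would appear in the PBA */
    int sent;       /* number of messages actually delivered */
};

/* The function wants to signal an interrupt on this vector. */
static void vector_raise(struct toy_vector *v)
{
    if (v->masked)
        v->pending = true;  /* prohibited from sending, so remember it */
    else
        v->sent++;
}

/* Software writes the mask bit. */
static void vector_set_mask(struct toy_vector *v, bool masked)
{
    v->masked = masked;
    if (!masked && v->pending) {
        v->sent++;          /* deliver the deferred message */
        v->pending = false; /* and clear Pending once it is sent */
    }
}

/* The underlying event was satisfied by polling while still masked. */
static void vector_event_serviced(struct toy_vector *v)
{
    if (v->masked)
        v->pending = false; /* avoid a spurious message on unmask */
}
```

The last helper is the "pure polling" case from the spec quote: a driver that keeps the vector masked and polls never causes a message to be sent.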

--
regards
Yang, Sheng


Re: [PATCH 2/2][RFC] KVM: Emulate MSI-X table and PBA in kernel

2010-12-29 Thread Sheng Yang
On Wednesday 29 December 2010 16:31:35 Michael S. Tsirkin wrote:
 On Wed, Dec 29, 2010 at 03:18:13PM +0800, Sheng Yang wrote:
  On Tuesday 28 December 2010 20:26:13 Avi Kivity wrote:
   On 12/22/2010 10:44 AM, Sheng Yang wrote:
Then we can support mask bit operation of assigned devices now.


@@ -3817,14 +3819,16 @@ static int
emulator_write_emulated_onepage(unsigned long addr,

  mmio:
trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, bytes, gpa, *(u64 *)val);

+   r = vcpu_mmio_write(vcpu, gpa, bytes, val);

/*

 * Is this MMIO handled locally?
 */

-   if (!vcpu_mmio_write(vcpu, gpa, bytes, val))
+   if (!r)

return X86EMUL_CONTINUE;

vcpu->mmio_needed = 1;

-   vcpu->run->exit_reason = KVM_EXIT_MMIO;
+   vcpu->run->exit_reason = (r == -ENOTSYNC) ?
+   KVM_EXIT_MSIX_ROUTING_UPDATE : KVM_EXIT_MMIO;
   
   This isn't very pretty, exit_reason should be written in
   vcpu_mmio_write().  I guess we can refactor it later.
  
  Sure.
  
+#define KVM_MSIX_MMIO_TYPE_ASSIGNED_DEV    (1 << 0)
+
+#define KVM_MSIX_MMIO_TYPE_BASE_TABLE      (1 << 8)
+#define KVM_MSIX_MMIO_TYPE_BASE_PBA        (1 << 9)
+
+#define KVM_MSIX_MMIO_TYPE_DEV_MASK        0x00ff
+#define KVM_MSIX_MMIO_TYPE_BASE_MASK       0xff00
   
   Any explanation of these?
  
  I chose to use assigned device id instead of one specific table id,
  because every device should got at most one MSI MMIO(the same should
  applied to vfio device as well), and if we use specific table ID, we
  need way to associate with the device anyway, to perform mask/unmask or
  other operation. So I think it's better to use device ID here directly.
 
 Table id will be needed to make things work for emulated devices.

I suppose even an emulated device should have some kind of ID (BDF)? I think
that is enough for identification, and it is already there, so we don't need to
allocate another ID for the device. Because one device would have at most one
MSI-X MMIO region, using the BDF or another device-specific ID should be quite
straightforward.
 
 My idea was this: we have the device id in kvm_assigned_msix_entry already.
 Just put table id and entry number in kvm_irq_routing_entry (create
 a new gsi type for this).
 The result will also work for irqfd because these are mapped to gsi.
 
  And for the table and pba address, it's due to the mapping in userspace
  may know the guest MSI-X table address and PBA address at different
  time(due to different BAR, refer to the code in assigned_dev_iomem_map()
  of qemu). So I proposed this API so that each of them can be passed to
  kernel space individually.
  
+struct kvm_msix_mmio_user {
+   __u32 dev_id;
+   __u16 type;
+   __u16 max_entries_nr;
+   __u64 base_addr;
+   __u64 base_va;
+   __u64 flags;
+   __u64 reserved[4];
+};
+


+int kvm_assigned_device_update_msix_mask_bit(struct kvm *kvm,
+		int assigned_dev_id, int entry, u32 flag)
+{
   
   Need a better name for 'flag' (and make it a bool).
   
+	int r = -EFAULT;
+	struct kvm_assigned_dev_kernel *adev;
+	int i;
+
+	if (!irqchip_in_kernel(kvm))
+		return r;
+
+	mutex_lock(&kvm->lock);
+	adev = kvm_find_assigned_dev(&kvm->arch.assigned_dev_head,
+				      assigned_dev_id);
+	if (!adev)
+		goto out;
+
+	for (i = 0; i < adev->entries_nr; i++)
+		if (adev->host_msix_entries[i].entry == entry) {
+			if (flag)
+				disable_irq_nosync(
+					adev->host_msix_entries[i].vector);
+			else
+				enable_irq(adev->host_msix_entries[i].vector);
+			r = 0;
+			break;
+		}
+out:
+	mutex_unlock(&kvm->lock);
+	return r;
+}

@@ -1988,6 +2008,12 @@ static int kvm_dev_ioctl_create_vm(void)

return r;

}
  
  #endif

+   r = kvm_register_msix_mmio_dev(kvm);
+   if (r < 0) {
+   kvm_put_kvm(kvm);
+   return r;
+   }
   
   Shouldn't this be part of individual KVM_REGISTER_MSIX_MMIO calls?
  
  In fact this MMIO device is more like global one for the VM, not for
  every devices. It should handle all MMIO from all MSI-X enabled devices,
  so I put it in the VM init/destroy process.
  
+static int msix_table_mmio_read(struct kvm_io_device *this, gpa_t
addr, int len, +void *val)
+{
+   struct

Re: [PATCH 2/2][RFC] KVM: Emulate MSI-X table and PBA in kernel

2010-12-29 Thread Sheng Yang
On Wednesday 29 December 2010 17:28:24 Michael S. Tsirkin wrote:
 On Wed, Dec 29, 2010 at 04:55:19PM +0800, Sheng Yang wrote:
  On Wednesday 29 December 2010 16:31:35 Michael S. Tsirkin wrote:
   On Wed, Dec 29, 2010 at 03:18:13PM +0800, Sheng Yang wrote:
On Tuesday 28 December 2010 20:26:13 Avi Kivity wrote:
 On 12/22/2010 10:44 AM, Sheng Yang wrote:
  Then we can support mask bit operation of assigned devices now.
  
  
  @@ -3817,14 +3819,16 @@ static int
  emulator_write_emulated_onepage(unsigned long addr,
  
mmio:
  trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, bytes, gpa, *(u64 *)val);
  
  +   r = vcpu_mmio_write(vcpu, gpa, bytes, val);
  
  /*
  
   * Is this MMIO handled locally?
   */
  
  -   if (!vcpu_mmio_write(vcpu, gpa, bytes, val))
  +   if (!r)
  
  return X86EMUL_CONTINUE;
  
  vcpu->mmio_needed = 1;
  
  -   vcpu->run->exit_reason = KVM_EXIT_MMIO;
  +   vcpu->run->exit_reason = (r == -ENOTSYNC) ?
  +   KVM_EXIT_MSIX_ROUTING_UPDATE : KVM_EXIT_MMIO;
 
 This isn't very pretty, exit_reason should be written in
 vcpu_mmio_write().  I guess we can refactor it later.

Sure.

  +#define KVM_MSIX_MMIO_TYPE_ASSIGNED_DEV    (1 << 0)
  +
  +#define KVM_MSIX_MMIO_TYPE_BASE_TABLE      (1 << 8)
  +#define KVM_MSIX_MMIO_TYPE_BASE_PBA        (1 << 9)
  +
  +#define KVM_MSIX_MMIO_TYPE_DEV_MASK        0x00ff
  +#define KVM_MSIX_MMIO_TYPE_BASE_MASK       0xff00
 
 Any explanation of these?

I chose to use assigned device id instead of one specific table id,
because every device should got at most one MSI MMIO(the same should
applied to vfio device as well), and if we use specific table ID, we
need way to associate with the device anyway, to perform mask/unmask
or other operation. So I think it's better to use device ID here
directly.
   
   Table id will be needed to make things work for emulated devices.
  
  I suppose even emulated device should got some kind of id(BDF)?
 
 Not that I know. Look at how irqfd is defined for example,
 or how interrupts are sent through a gsi.
 I would like to make the interface be able to support that.

  I think that is
  enough for identification, which is already there, so we don't need to
  allocate another ID for the device - because one device would got at
  most one MSI-X MMIO, then use BDF or other device specific ID should be
  quite straightforward.
 
 So you propose allocating ids for emulated devices?
 OK. How will we map e.g. irqfds to these?

I don't understand. I've checked virtio-pci.c, which uses irqfd, and it's still
a PCI device and still has a BDF, right?

Also, what we want is a way to identify the MSI-X MMIO. For an assigned device
we use the BDF, so we can easily identify the MMIO as well as the device. For
others, even if they don't have a BDF (which I doubt, because MSI-X is a part
of PCI, and every PCI device has a BDF), what you need is an ID, no matter what
it is or how it is defined. QEmu can do the allocation, and the type field in
this API can still tell which kind of ID/device it is, and so determine how to
deal with it.
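
For illustration, a self-contained sketch of the BDF-style ID meant here, following the segment/bus/devfn packing that calc_assigned_dev_id() uses in qemu-kvm. The helpers devfn_make() and the dev_id_*() decoders are hypothetical additions, included only to show that such an ID identifies a device without a separate table ID.

```c
#include <stdint.h>

/* Same packing as calc_assigned_dev_id(): seg[31:16] bus[15:8] devfn[7:0]. */
static uint32_t calc_dev_id(uint16_t seg, uint8_t bus, uint8_t devfn)
{
    return (uint32_t)seg << 16 | (uint32_t)bus << 8 | (uint32_t)devfn;
}

/* devfn is the usual PCI encoding: 5-bit slot, 3-bit function. */
static uint8_t devfn_make(uint8_t slot, uint8_t func)
{
    return (uint8_t)((slot & 0x1f) << 3 | (func & 0x07));
}

/* Hypothetical decoders, the inverse of calc_dev_id(). */
static uint16_t dev_id_seg(uint32_t id)
{
    return (uint16_t)(id >> 16);
}

static uint8_t dev_id_bus(uint32_t id)
{
    return (uint8_t)(id >> 8);
}

static uint8_t dev_id_slot(uint32_t id)
{
    return (uint8_t)((id >> 3) & 0x1f);
}

static uint8_t dev_id_func(uint32_t id)
{
    return (uint8_t)(id & 0x07);
}
```

Because the segment number occupies the upper 16 bits, two devices with the same bus/slot/function in different PCI domains still get distinct IDs.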

 
   My idea was this: we have the device id in kvm_assigned_msix_entry
   already. Just put table id and entry number in kvm_irq_routing_entry
   (create a new gsi type for this).
   The result will also work for irqfd because these are mapped to gsi.
   
And for the table and PBA addresses: the mapping code in userspace may
learn the guest MSI-X table address and the PBA address at different
times (they can be in different BARs, see assigned_dev_iomem_map() in
qemu). So I proposed this API so that each of them can be passed to
kernel space individually.

  +struct kvm_msix_mmio_user {
  +   __u32 dev_id;
  +   __u16 type;
  +   __u16 max_entries_nr;
  +   __u64 base_addr;
  +   __u64 base_va;
  +   __u64 flags;
  +   __u64 reserved[4];
  +};
  +
  
  
  +int kvm_assigned_device_update_msix_mask_bit(struct kvm *kvm,
  +   int assigned_dev_id, int entry, u32 flag)
  +{
 
 Need a better name for 'flag' (and make it a bool).
 
  +   int r = -EFAULT;
  +   struct kvm_assigned_dev_kernel *adev;
  +   int i;
  +
  +   if (!irqchip_in_kernel(kvm))
  +   return r;
  +
  +   mutex_lock(&kvm->lock);
  +   adev = kvm_find_assigned_dev(&kvm->arch.assigned_dev_head,
  +                                assigned_dev_id);
  +   if (!adev)
  +           goto out;
  +
  +   for (i = 0; i < adev->entries_nr; i++)
  +           if (adev->host_msix_entries[i].entry == entry) {
  +   if (flag)
  +   disable_irq_nosync

Re: [PATCH 2/2][RFC] KVM: Emulate MSI-X table and PBA in kernel

2010-12-29 Thread Sheng Yang
On Thursday 30 December 2010 15:47:48 Michael S. Tsirkin wrote:
 On Thu, Dec 30, 2010 at 03:32:42PM +0800, Sheng Yang wrote:
  On Wednesday 29 December 2010 17:28:24 Michael S. Tsirkin wrote:
   On Wed, Dec 29, 2010 at 04:55:19PM +0800, Sheng Yang wrote:
On Wednesday 29 December 2010 16:31:35 Michael S. Tsirkin wrote:
 On Wed, Dec 29, 2010 at 03:18:13PM +0800, Sheng Yang wrote:
  On Tuesday 28 December 2010 20:26:13 Avi Kivity wrote:
   On 12/22/2010 10:44 AM, Sheng Yang wrote:
Then we can support mask bit operation of assigned devices
now.


@@ -3817,14 +3819,16 @@ static int
emulator_write_emulated_onepage(unsigned long addr,

  mmio:
trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, bytes, gpa, *(u64
*)val);

+   r = vcpu_mmio_write(vcpu, gpa, bytes, val);

/*

 * Is this MMIO handled locally?
 */

-   if (!vcpu_mmio_write(vcpu, gpa, bytes, val))
+   if (!r)

return X86EMUL_CONTINUE;

vcpu->mmio_needed = 1;

-   vcpu->run->exit_reason = KVM_EXIT_MMIO;
+   vcpu->run->exit_reason = (r == -ENOTSYNC) ?
+           KVM_EXIT_MSIX_ROUTING_UPDATE : KVM_EXIT_MMIO;
   
   This isn't very pretty, exit_reason should be written in
   vcpu_mmio_write().  I guess we can refactor it later.
  
  Sure.
  
+#define KVM_MSIX_MMIO_TYPE_ASSIGNED_DEV    (1 << 0)
+
+#define KVM_MSIX_MMIO_TYPE_BASE_TABLE      (1 << 8)
+#define KVM_MSIX_MMIO_TYPE_BASE_PBA        (1 << 9)
+
+#define KVM_MSIX_MMIO_TYPE_DEV_MASK        0x00ff
+#define KVM_MSIX_MMIO_TYPE_BASE_MASK       0xff00
   
   Any explanation of these?
  
  I chose to use the assigned device id instead of a specific table
  id, because every device should have at most one MSI-X MMIO region
  (the same should apply to vfio devices as well), and if we used a
  specific table ID we would still need a way to associate it with the
  device in order to perform mask/unmask or other operations. So I
  think it's better to use the device ID here directly.
 
 Table id will be needed to make things work for emulated devices.

I suppose even an emulated device should have some kind of id (BDF)?
   
   Not that I know. Look at how irqfd is defined for example,
   or how interrupts are sent through a gsi.
   I would like to make the interface be able to support that.
   
I think that is
enough for identification, and it is already there, so we don't need
to allocate another ID for the device. Because one device would have
at most one MSI-X MMIO region, using the BDF or another device-specific
ID should be quite straightforward.
   
   So you propose allocating ids for emulated devices?
   OK. How will we map e.g. irqfds to these?
  
  I don't understand. I've checked virtio-pci.c, which uses irqfd, and
  it's still a PCI device and still has a BDF, right?
  
  Also, what we want is a way to identify the MSI-X MMIO. For an assigned
  device we use the BDF, so we can easily identify the MMIO as well as the
  device. For others, even if they don't have a BDF (which I doubt, because
  MSI-X is part of PCI and every PCI device has a BDF), all you need is
  an ID, no matter what it is or how it is defined. QEMU can do the
  allocation, and the type field in this API can still tell which
  kind of ID/device it is, and so determine how to deal with it.
 
 Yes, the PCI device can be identified with e.g. BDF
 (won't work for multi-domain but then we can write an allocator maybe).
 But how will we inject these interrupts?
 We can do this now with GSI ioctl or map GSI to irqfd
 and inject with irqfd write.

I suppose that's not in the scope of this patch... But I think you can still
do this; everything is the same as before. QEMU can read the data/address
pair from the table, then program the routing table, etc.
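
A minimal sketch of reading one data/address pair out of a table page, assuming the 16-byte entry layout the qemu-kvm code in this thread uses (address low at offset 0, address high at 4, data at 8, vector control at 12). The struct and function names are hypothetical, and the sketch reads in host byte order just as the memcpy() calls in device-assignment.c do:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MSIX_ENTRY_SIZE 16  /* bytes per MSI-X table entry */

/* Hypothetical container for the four dwords of one entry. */
struct msix_entry_fields {
    uint32_t addr_lo;   /* offset 0  */
    uint32_t addr_hi;   /* offset 4  */
    uint32_t data;      /* offset 8  */
    uint32_t ctrl;      /* offset 12, bit 0 is the mask bit */
};

/* Copy entry `index` out of a shadow table page (host byte order). */
static inline struct msix_entry_fields
msix_read_entry(const void *table, unsigned int index)
{
    const uint8_t *p = (const uint8_t *)table + (size_t)index * MSIX_ENTRY_SIZE;
    struct msix_entry_fields e;

    memcpy(&e.addr_lo, p + 0, 4);
    memcpy(&e.addr_hi, p + 4, 4);
    memcpy(&e.data,    p + 8, 4);
    memcpy(&e.ctrl,    p + 12, 4);
    return e;
}
```

The address and data fields are what would go into the address_lo, address_hi and data members of the routing entry.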

--
regards
Yang, Sheng
 
 My idea was this: we have the device id in kvm_assigned_msix_entry
 already. Just put table id and entry number in
 kvm_irq_routing_entry (create a new gsi type for this).
 The result will also work for irqfd because these are mapped to
 gsi.
 
  And for the table and PBA addresses: the mapping code in
  userspace may learn the guest MSI-X table address and the PBA address
  at different times (they can be in different BARs, see
  assigned_dev_iomem_map() in qemu). So I proposed this API so
  that each of them can be passed to kernel space individually.
  
+struct kvm_msix_mmio_user {
+   __u32 dev_id;
+   __u16 type;
+   __u16 max_entries_nr;
+   __u64 base_addr;
+   __u64 base_va;
+   __u64 flags

Re: [PATCH 2/2][RFC] KVM: Emulate MSI-X table and PBA in kernel

2010-12-28 Thread Sheng Yang
On Tuesday 28 December 2010 20:26:13 Avi Kivity wrote:
 On 12/22/2010 10:44 AM, Sheng Yang wrote:
  Then we can support mask bit operation of assigned devices now.
  
  
  @@ -3817,14 +3819,16 @@ static int
  emulator_write_emulated_onepage(unsigned long addr,
  
mmio:
  trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, bytes, gpa, *(u64 *)val);
  
  +   r = vcpu_mmio_write(vcpu, gpa, bytes, val);
  
  /*
  
   * Is this MMIO handled locally?
   */
  
  -   if (!vcpu_mmio_write(vcpu, gpa, bytes, val))
  +   if (!r)
  
  return X86EMUL_CONTINUE;
  
  vcpu->mmio_needed = 1;
  
  -   vcpu->run->exit_reason = KVM_EXIT_MMIO;
  +   vcpu->run->exit_reason = (r == -ENOTSYNC) ?
  +           KVM_EXIT_MSIX_ROUTING_UPDATE : KVM_EXIT_MMIO;
 
 This isn't very pretty, exit_reason should be written in
 vcpu_mmio_write().  I guess we can refactor it later.

Sure.
 
  +#define KVM_MSIX_MMIO_TYPE_ASSIGNED_DEV    (1 << 0)
  +
  +#define KVM_MSIX_MMIO_TYPE_BASE_TABLE      (1 << 8)
  +#define KVM_MSIX_MMIO_TYPE_BASE_PBA        (1 << 9)
  +
  +#define KVM_MSIX_MMIO_TYPE_DEV_MASK        0x00ff
  +#define KVM_MSIX_MMIO_TYPE_BASE_MASK       0xff00
 
 Any explanation of these?

I chose to use the assigned device id instead of a specific table id,
because every device should have at most one MSI-X MMIO region (the same
should apply to vfio devices as well), and if we used a specific table ID
we would still need a way to associate it with the device in order to
perform mask/unmask or other operations. So I think it's better to use the
device ID here directly.

And for the table and PBA addresses: the mapping code in userspace may
learn the guest MSI-X table address and the PBA address at different times
(they can be in different BARs, see assigned_dev_iomem_map() in qemu). So I
proposed this API so that each of them can be passed to kernel space
individually.
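
To make the split of the type field concrete, here is a small sketch. The macro values are copied from the patch; the helper functions and the validity rule (exactly one device kind plus exactly one base kind per call) are assumptions for illustration, not something the patch defines:

```c
#include <assert.h>
#include <stdint.h>

/* Values as defined in the patch under discussion. */
#define KVM_MSIX_MMIO_TYPE_ASSIGNED_DEV  (1 << 0)
#define KVM_MSIX_MMIO_TYPE_BASE_TABLE    (1 << 8)
#define KVM_MSIX_MMIO_TYPE_BASE_PBA      (1 << 9)
#define KVM_MSIX_MMIO_TYPE_DEV_MASK      0x00ff
#define KVM_MSIX_MMIO_TYPE_BASE_MASK     0xff00

/* Hypothetical helper: low byte says what kind of device dev_id names. */
static inline uint16_t msix_mmio_dev_type(uint16_t type)
{
    return type & KVM_MSIX_MMIO_TYPE_DEV_MASK;
}

/* Hypothetical helper: high byte says which base address is being passed. */
static inline uint16_t msix_mmio_base_type(uint16_t type)
{
    return type & KVM_MSIX_MMIO_TYPE_BASE_MASK;
}

/* Assumed rule: one device kind and exactly one of TABLE or PBA per call. */
static inline int msix_mmio_type_valid(uint16_t type)
{
    uint16_t dev = msix_mmio_dev_type(type);
    uint16_t base = msix_mmio_base_type(type);

    return dev == KVM_MSIX_MMIO_TYPE_ASSIGNED_DEV &&
           (base == KVM_MSIX_MMIO_TYPE_BASE_TABLE ||
            base == KVM_MSIX_MMIO_TYPE_BASE_PBA);
}
```

With this layout userspace could register the table with ASSIGNED_DEV | BASE_TABLE when the table BAR is mapped, and the PBA later with ASSIGNED_DEV | BASE_PBA.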
 
  +struct kvm_msix_mmio_user {
  +   __u32 dev_id;
  +   __u16 type;
  +   __u16 max_entries_nr;
  +   __u64 base_addr;
  +   __u64 base_va;
  +   __u64 flags;
  +   __u64 reserved[4];
  +};
  +
  
  
  +int kvm_assigned_device_update_msix_mask_bit(struct kvm *kvm,
  +   int assigned_dev_id, int entry, u32 flag)
  +{
 
 Need a better name for 'flag' (and make it a bool).
 
  +   int r = -EFAULT;
  +   struct kvm_assigned_dev_kernel *adev;
  +   int i;
  +
  +   if (!irqchip_in_kernel(kvm))
  +   return r;
  +
  +   mutex_lock(&kvm->lock);
  +   adev = kvm_find_assigned_dev(&kvm->arch.assigned_dev_head,
  +                                assigned_dev_id);
  +   if (!adev)
  +           goto out;
  +
  +   for (i = 0; i < adev->entries_nr; i++)
  +           if (adev->host_msix_entries[i].entry == entry) {
  +                   if (flag)
  +                           disable_irq_nosync(
  +                                   adev->host_msix_entries[i].vector);
  +                   else
  +                           enable_irq(adev->host_msix_entries[i].vector);
  +                   r = 0;
  +                   break;
  +           }
  +out:
  +   mutex_unlock(&kvm->lock);
  +   return r;
  +}
  
  @@ -1988,6 +2008,12 @@ static int kvm_dev_ioctl_create_vm(void)
  
  return r;
  
  }

#endif
  
  +   r = kvm_register_msix_mmio_dev(kvm);
  +   if (r < 0) {
  +           kvm_put_kvm(kvm);
  +           return r;
  +   }
 
 Shouldn't this be part of individual KVM_REGISTER_MSIX_MMIO calls?

In fact this MMIO device is more like a global one for the VM, not one per
device. It should handle all MMIO from all MSI-X enabled devices, so I put
it in the VM init/destroy process.
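
As an illustration of one VM-wide device serving every registered table, a lookup along these lines would map a guest physical address to a table index. This is only a sketch: the struct, its fields and the function name are hypothetical and not the patch's get_mmio_table_index():

```c
#include <assert.h>
#include <stdint.h>

#define PCI_MSIX_ENTRY_SIZE 16  /* bytes per MSI-X table entry */

/* Hypothetical record of one registered MSI-X table. */
struct msix_table_range {
    uint64_t base;            /* guest physical address of the table */
    uint16_t max_entries_nr;  /* number of entries in the table */
};

/*
 * Return the index of the table that fully contains [addr, addr + len),
 * or -1 if no registered table covers the access.
 */
static inline int msix_find_table(const struct msix_table_range *tables,
                                  int nr, uint64_t addr, int len)
{
    int i;

    for (i = 0; i < nr; i++) {
        uint64_t start = tables[i].base;
        uint64_t end = start +
            (uint64_t)tables[i].max_entries_nr * PCI_MSIX_ENTRY_SIZE;

        if (addr >= start && addr + (uint64_t)len <= end)
            return i;
    }
    return -1;
}
```

A negative result would mean the access belongs to some other device on the MMIO bus.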

  +static int msix_table_mmio_read(struct kvm_io_device *this, gpa_t addr,
  +                           int len, void *val)
  +{
  +   struct kvm_msix_mmio_dev *mmio_dev =
  +           container_of(this, struct kvm_msix_mmio_dev, table_dev);
  +   struct kvm_msix_mmio *mmio;
  +   int idx, ret = 0, entry, offset, r;
  +
  +   mutex_lock(&mmio_dev->lock);
  +   idx = get_mmio_table_index(mmio_dev, addr, len);
  +   if (idx < 0) {
  +           ret = -EOPNOTSUPP;
  +           goto out;
  +   }
  +   if ((addr & 0x3) || (len != 4 && len != 8))
  +           goto out;
 
 What about (addr & 4) && (len == 8)? Is it supported? It may cross entry
 boundaries.

It should not be supported, but I haven't found wording in the PCI spec for
it, so I didn't add this check.
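
For reference, a stricter check could look like the sketch below. It is only an assumption about what such a check might be, not code from the patch: it takes the offset relative to the table base rather than addr & 0xf, and it rejects only accesses that would spill into the next 16-byte entry, so an 8-byte access at offset 4 still passes while one at offset 12 does not. The helper name is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define PCI_MSIX_ENTRY_SIZE 16  /* bytes per MSI-X table entry */

/*
 * Hypothetical helper: return nonzero if a 4- or 8-byte access is dword
 * aligned and stays inside a single table entry.
 */
static inline int msix_table_access_ok(uint64_t table_base, uint64_t addr,
                                       int len)
{
    uint64_t offset;

    if (addr < table_base)
        return 0;
    if (len != 4 && len != 8)
        return 0;

    /* Offset of the access inside its 16-byte entry. */
    offset = (addr - table_base) % PCI_MSIX_ENTRY_SIZE;
    if (offset & 0x3)
        return 0;

    /* Reject anything that would run past the end of the entry. */
    return offset + (uint64_t)len <= PCI_MSIX_ENTRY_SIZE;
}
```

Whether the offset-4 case should be accepted at all is exactly the open question above; tightening the last comparison to require 8-byte alignment for 8-byte accesses would reject it too.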
 
  +   mmio = &mmio_dev->mmio[idx];
  +
  +   entry = (addr - mmio->table_base_addr) / PCI_MSIX_ENTRY_SIZE;
  +   offset = addr & 0xf;
  +   r = copy_from_user(val, (void *)(mmio->table_base_va +
  +                   entry * PCI_MSIX_ENTRY_SIZE + offset), len);
  
  
  +   if (r)
  +           goto out;
  +out:
  +   mutex_unlock(&mmio_dev->lock);
  +   return ret;
  +}
  +
  +static int msix_table_mmio_write(struct kvm_io_device *this, gpa_t addr,
  +   int len, const void *val)
  +{
  +   struct kvm_msix_mmio_dev *mmio_dev =
  +   container_of(this, struct

Re: [PATCH 0/2 v6] MSI-X mask bit support for KVM

2010-12-27 Thread Sheng Yang
On Wednesday 22 December 2010 16:44:53 Sheng Yang wrote:
 This patchset didn't include two PCI related patches which would be checked
 in through PCI subsystem.
 
 Would add the API document soon.

Avi?

BTW, there is one compile issue in the second patch, due to a last-minute
cleanup...

I will update it along with the other comments.

--
regards
Yang, Sheng

 
 Change from v5:
 Completely rewritten according to Avi's comments.
 
 Sheng Yang (2):
   KVM: Move struct kvm_io_device to kvm_host.h
   KVM: Emulate MSI-X table and PBA in kernel
 
  arch/x86/kvm/Makefile|2 +-
  arch/x86/kvm/x86.c   |8 +-
  include/linux/kvm.h  |   22 
  include/linux/kvm_host.h |   48 +
  virt/kvm/assigned-dev.c  |   30 ++
  virt/kvm/iodev.h |   25 +-
  virt/kvm/kvm_main.c  |   38 +++-
  virt/kvm/msix_mmio.c |  244 ++
  virt/kvm/msix_mmio.h |   24 +
  9 files changed, 410 insertions(+), 31 deletions(-)
  create mode 100644 virt/kvm/msix_mmio.c
  create mode 100644 virt/kvm/msix_mmio.h


[PATCH 2/2][RFC] KVM: Emulate MSI-X table and PBA in kernel

2010-12-22 Thread Sheng Yang
Then we can support mask bit operation of assigned devices now.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 arch/x86/kvm/Makefile|2 +-
 arch/x86/kvm/x86.c   |8 +-
 include/linux/kvm.h  |   22 
 include/linux/kvm_host.h |   25 +
 virt/kvm/assigned-dev.c  |   30 ++
 virt/kvm/kvm_main.c  |   38 +++-
 virt/kvm/msix_mmio.c |  244 ++
 virt/kvm/msix_mmio.h |   24 +
 8 files changed, 386 insertions(+), 7 deletions(-)
 create mode 100644 virt/kvm/msix_mmio.c
 create mode 100644 virt/kvm/msix_mmio.h

diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index f15501f..3a0d851 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -7,7 +7,7 @@ CFLAGS_vmx.o := -I.
 
 kvm-y  += $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o \
coalesced_mmio.o irq_comm.o eventfd.o \
-   assigned-dev.o)
+   assigned-dev.o msix_mmio.o)
 kvm-$(CONFIG_IOMMU_API)+= $(addprefix ../../../virt/kvm/, iommu.o)
 kvm-$(CONFIG_KVM_ASYNC_PF) += $(addprefix ../../../virt/kvm/, async_pf.o)
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ed373ba..0be5837 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1965,6 +1965,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_X86_ROBUST_SINGLESTEP:
case KVM_CAP_XSAVE:
case KVM_CAP_ASYNC_PF:
+   case KVM_CAP_MSIX_MMIO:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
@@ -3802,6 +3803,7 @@ static int emulator_write_emulated_onepage(unsigned long addr,
   struct kvm_vcpu *vcpu)
 {
gpa_t gpa;
+   int r;
 
gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, exception);
 
@@ -3817,14 +3819,16 @@ static int emulator_write_emulated_onepage(unsigned long addr,
 
 mmio:
trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, bytes, gpa, *(u64 *)val);
+   r = vcpu_mmio_write(vcpu, gpa, bytes, val);
/*
 * Is this MMIO handled locally?
 */
-   if (!vcpu_mmio_write(vcpu, gpa, bytes, val))
+   if (!r)
return X86EMUL_CONTINUE;
 
    vcpu->mmio_needed = 1;
-   vcpu->run->exit_reason = KVM_EXIT_MMIO;
+   vcpu->run->exit_reason = (r == -ENOTSYNC) ?
+           KVM_EXIT_MSIX_ROUTING_UPDATE : KVM_EXIT_MMIO;
    vcpu->run->mmio.phys_addr = vcpu->mmio_phys_addr = gpa;
    vcpu->run->mmio.len = vcpu->mmio_size = bytes;
    vcpu->run->mmio.is_write = vcpu->mmio_is_write = 1;
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index ea2dc1a..44838fe 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -161,6 +161,7 @@ struct kvm_pit_config {
 #define KVM_EXIT_NMI  16
 #define KVM_EXIT_INTERNAL_ERROR   17
 #define KVM_EXIT_OSI  18
+#define KVM_EXIT_MSIX_ROUTING_UPDATE 19
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 #define KVM_INTERNAL_ERROR_EMULATION 1
@@ -541,6 +542,7 @@ struct kvm_ppc_pvinfo {
 #define KVM_CAP_PPC_GET_PVINFO 57
 #define KVM_CAP_PPC_IRQ_LEVEL 58
 #define KVM_CAP_ASYNC_PF 59
+#define KVM_CAP_MSIX_MMIO 60
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -672,6 +674,9 @@ struct kvm_clock_data {
 #define KVM_XEN_HVM_CONFIG        _IOW(KVMIO,  0x7a, struct kvm_xen_hvm_config)
 #define KVM_SET_CLOCK             _IOW(KVMIO,  0x7b, struct kvm_clock_data)
 #define KVM_GET_CLOCK             _IOR(KVMIO,  0x7c, struct kvm_clock_data)
+/* Available with KVM_CAP_MSIX_MMIO */
+#define KVM_REGISTER_MSIX_MMIO    _IOW(KVMIO,  0x7d, struct kvm_msix_mmio_user)
+#define KVM_UNREGISTER_MSIX_MMIO  _IOW(KVMIO,  0x7e, struct kvm_msix_mmio_user)
 /* Available with KVM_CAP_PIT_STATE2 */
 #define KVM_GET_PIT2  _IOR(KVMIO,  0x9f, struct kvm_pit_state2)
 #define KVM_SET_PIT2  _IOW(KVMIO,  0xa0, struct kvm_pit_state2)
@@ -795,4 +800,21 @@ struct kvm_assigned_msix_entry {
__u16 padding[3];
 };
 
+#define KVM_MSIX_MMIO_TYPE_ASSIGNED_DEV    (1 << 0)
+
+#define KVM_MSIX_MMIO_TYPE_BASE_TABLE      (1 << 8)
+#define KVM_MSIX_MMIO_TYPE_BASE_PBA        (1 << 9)
+
+#define KVM_MSIX_MMIO_TYPE_DEV_MASK        0x00ff
+#define KVM_MSIX_MMIO_TYPE_BASE_MASK       0xff00
+struct kvm_msix_mmio_user {
+   __u32 dev_id;
+   __u16 type;
+   __u16 max_entries_nr;
+   __u64 base_addr;
+   __u64 base_va;
+   __u64 flags;
+   __u64 reserved[4];
+};
+
 #endif /* __LINUX_KVM_H */
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ac026ad..15fdd0d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -231,6 +231,27 @@ struct kvm_memslots {
KVM_PRIVATE_MEM_SLOTS];
 };
 
+#define KVM_MSIX_MMIO_MAX    32
+
+struct kvm_msix_mmio {
+   u32 dev_id;
+   u16 type;
+   u16 max_entries_nr;
+   u64 flags;
+   gpa_t table_base_addr;
+   hva_t table_base_va

[PATCH 0/2 v6] MSI-X mask bit support for KVM

2010-12-22 Thread Sheng Yang
This patchset didn't include two PCI related patches which would be checked
in through PCI subsystem.

Would add the API document soon.

Change from v5:
Completely rewritten according to Avi's comments.

Sheng Yang (2):
  KVM: Move struct kvm_io_device to kvm_host.h
  KVM: Emulate MSI-X table and PBA in kernel

 arch/x86/kvm/Makefile|2 +-
 arch/x86/kvm/x86.c   |8 +-
 include/linux/kvm.h  |   22 
 include/linux/kvm_host.h |   48 +
 virt/kvm/assigned-dev.c  |   30 ++
 virt/kvm/iodev.h |   25 +-
 virt/kvm/kvm_main.c  |   38 +++-
 virt/kvm/msix_mmio.c |  244 ++
 virt/kvm/msix_mmio.h |   24 +
 9 files changed, 410 insertions(+), 31 deletions(-)
 create mode 100644 virt/kvm/msix_mmio.c
 create mode 100644 virt/kvm/msix_mmio.h



[PATCH 1/2] KVM: Move struct kvm_io_device to kvm_host.h

2010-12-22 Thread Sheng Yang
Then it can be used by other struct in kvm_host.h

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 include/linux/kvm_host.h |   23 +++
 virt/kvm/iodev.h |   25 +
 2 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ac4e83a..ac026ad 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -98,6 +98,29 @@ int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
 int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
 #endif
 
+struct kvm_io_device;
+
+/**
+ * kvm_io_device_ops are called under kvm slots_lock.
+ * read and write handlers return 0 if the transaction has been handled,
+ * or non-zero to have it passed to the next device.
+ **/
+struct kvm_io_device_ops {
+   int (*read)(struct kvm_io_device *this,
+   gpa_t addr,
+   int len,
+   void *val);
+   int (*write)(struct kvm_io_device *this,
+gpa_t addr,
+int len,
+const void *val);
+   void (*destructor)(struct kvm_io_device *this);
+};
+
+struct kvm_io_device {
+   const struct kvm_io_device_ops *ops;
+};
+
 struct kvm_vcpu {
struct kvm *kvm;
 #ifdef CONFIG_PREEMPT_NOTIFIERS
diff --git a/virt/kvm/iodev.h b/virt/kvm/iodev.h
index 12fd3ca..d1f5651 100644
--- a/virt/kvm/iodev.h
+++ b/virt/kvm/iodev.h
@@ -17,32 +17,9 @@
 #define __KVM_IODEV_H__
 
 #include <linux/kvm_types.h>
+#include <linux/kvm_host.h>
 #include <asm/errno.h>
 
-struct kvm_io_device;
-
-/**
- * kvm_io_device_ops are called under kvm slots_lock.
- * read and write handlers return 0 if the transaction has been handled,
- * or non-zero to have it passed to the next device.
- **/
-struct kvm_io_device_ops {
-   int (*read)(struct kvm_io_device *this,
-   gpa_t addr,
-   int len,
-   void *val);
-   int (*write)(struct kvm_io_device *this,
-gpa_t addr,
-int len,
-const void *val);
-   void (*destructor)(struct kvm_io_device *this);
-};
-
-
-struct kvm_io_device {
-   const struct kvm_io_device_ops *ops;
-};
-
 static inline void kvm_iodevice_init(struct kvm_io_device *dev,
 const struct kvm_io_device_ops *ops)
 {
-- 
1.7.0.1



[PATCH 0/4 v6] MSI-X MMIO support in userspace for assigned devices

2010-12-22 Thread Sheng Yang
BTW: the first patch can be applied alone.

Sheng Yang (4):
  qemu-kvm: device assignment: Enabling MSI-X according to the entries'
mask bit
  qemu-kvm: Ioctl for MSIX MMIO support
  qemu-kvm: Header file update for MSI-X MMIO support
  qemu-kvm: MSI-X MMIO support for assigned device

 hw/device-assignment.c  |  325 +--
 hw/device-assignment.h  |9 +-
 kvm/include/linux/kvm.h |   22 +++
 qemu-kvm.c  |   50 +++
 qemu-kvm.h  |   18 +++
 5 files changed, 382 insertions(+), 42 deletions(-)



[PATCH 3/4] qemu-kvm: Header file update for MSI-X MMIO support

2010-12-22 Thread Sheng Yang

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 kvm/include/linux/kvm.h |   22 ++
 1 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/kvm/include/linux/kvm.h b/kvm/include/linux/kvm.h
index e46729e..e11d2b2 100644
--- a/kvm/include/linux/kvm.h
+++ b/kvm/include/linux/kvm.h
@@ -161,6 +161,7 @@ struct kvm_pit_config {
 #define KVM_EXIT_NMI  16
 #define KVM_EXIT_INTERNAL_ERROR   17
 #define KVM_EXIT_OSI  18
+#define KVM_EXIT_MSIX_ROUTING_UPDATE 19
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 #define KVM_INTERNAL_ERROR_EMULATION 1
@@ -530,6 +531,7 @@ struct kvm_enable_cap {
 #ifdef __KVM_HAVE_XCRS
 #define KVM_CAP_XCRS 56
 #endif
+#define KVM_CAP_MSIX_MMIO 60
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -660,6 +662,9 @@ struct kvm_clock_data {
 #define KVM_XEN_HVM_CONFIG        _IOW(KVMIO,  0x7a, struct kvm_xen_hvm_config)
 #define KVM_SET_CLOCK             _IOW(KVMIO,  0x7b, struct kvm_clock_data)
 #define KVM_GET_CLOCK             _IOR(KVMIO,  0x7c, struct kvm_clock_data)
+/* Available with KVM_CAP_MSIX_MMIO */
+#define KVM_REGISTER_MSIX_MMIO    _IOW(KVMIO, 0x7d, struct kvm_msix_mmio_user)
+#define KVM_UNREGISTER_MSIX_MMIO  _IOW(KVMIO, 0x7e, struct kvm_msix_mmio_user)
 /* Available with KVM_CAP_PIT_STATE2 */
 #define KVM_GET_PIT2  _IOR(KVMIO,  0x9f, struct kvm_pit_state2)
 #define KVM_SET_PIT2  _IOW(KVMIO,  0xa0, struct kvm_pit_state2)
@@ -781,4 +786,21 @@ struct kvm_assigned_msix_entry {
__u16 padding[3];
 };
 
+#define KVM_MSIX_MMIO_TYPE_ASSIGNED_DEV    (1 << 0)
+
+#define KVM_MSIX_MMIO_TYPE_BASE_TABLE      (1 << 8)
+#define KVM_MSIX_MMIO_TYPE_BASE_PBA        (1 << 9)
+
+#define KVM_MSIX_MMIO_TYPE_DEV_MASK        0x00ff
+#define KVM_MSIX_MMIO_TYPE_BASE_MASK       0xff00
+struct kvm_msix_mmio_user {
+   __u32 dev_id;
+   __u16 type;
+   __u16 max_entries_nr;
+   __u64 base_addr;
+   __u64 base_va;
+   __u64 flags;
+   __u64 reserved[4];
+};
+
 #endif /* __LINUX_KVM_H */
-- 
1.7.0.1



[PATCH 2/4] qemu-kvm: Ioctl for MSIX MMIO support

2010-12-22 Thread Sheng Yang

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 qemu-kvm.c |   14 ++
 qemu-kvm.h |7 +++
 2 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/qemu-kvm.c b/qemu-kvm.c
index 471306b..956b62a 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -1050,6 +1050,20 @@ int kvm_assign_set_msix_entry(kvm_context_t kvm,
 }
 #endif
 
+#ifdef KVM_CAP_MSIX_MMIO
+int kvm_register_msix_mmio(kvm_context_t kvm,
+   struct kvm_msix_mmio_user *mmio_user)
+{
+return kvm_vm_ioctl(kvm_state, KVM_REGISTER_MSIX_MMIO, mmio_user);
+}
+
+int kvm_unregister_msix_mmio(kvm_context_t kvm,
+ struct kvm_msix_mmio_user *mmio_user)
+{
+return kvm_vm_ioctl(kvm_state, KVM_UNREGISTER_MSIX_MMIO, mmio_user);
+}
+#endif
+
 #if defined(KVM_CAP_IRQFD) && defined(CONFIG_EVENTFD)
 
 #include <sys/eventfd.h>
diff --git a/qemu-kvm.h b/qemu-kvm.h
index 7e6edfb..86799e6 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -602,6 +602,13 @@ int kvm_assign_set_msix_entry(kvm_context_t kvm,
   struct kvm_assigned_msix_entry *entry);
 #endif
 
+#ifdef KVM_CAP_MSIX_MMIO
+int kvm_register_msix_mmio(kvm_context_t kvm,
+   struct kvm_msix_mmio_user *mmio_user);
+int kvm_unregister_msix_mmio(kvm_context_t kvm,
+ struct kvm_msix_mmio_user *mmio_user);
+#endif
+
 #else   /* !CONFIG_KVM */
 
 typedef struct kvm_context *kvm_context_t;
-- 
1.7.0.1



Re: [PATCH] qemu-kvm: device assignment: Enabling MSI-X according to mask bit

2010-12-20 Thread Sheng Yang
On Thursday 16 December 2010 16:21:52 Sheng Yang wrote:
 The old MSI-X enabling method assumes the entries are written before MSI-X
 is enabled, but some OSes do not obey this, e.g. FreeBSD. This patch fixes
 that.
 
 Also, according to the PCI spec, the mask bit of each MSI-X table entry
 should be set after reset.

Ping?

--
regards
Yang, Sheng

 
 Signed-off-by: Sheng Yang sh...@linux.intel.com
 ---
  hw/device-assignment.c |  188 +---
  hw/device-assignment.h |    2 +-
 
 diff --git a/hw/device-assignment.c b/hw/device-assignment.c
 index 832c236..ed0b491 100644
 --- a/hw/device-assignment.c
 +++ b/hw/device-assignment.c
 @@ -,15 +,12 @@ static void assigned_dev_update_msi(PCIDevice
 *pci_dev, unsigned int ctrl_pos) #endif
 
  #ifdef KVM_CAP_DEVICE_MSIX
 -static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
 +
 +#define PCI_MSIX_CTRL_MASKBIT1ul
 +static int get_msix_entries_max_nr(AssignedDevice *adev)
  {
 -AssignedDevice *adev = container_of(pci_dev, AssignedDevice, dev);
 -uint16_t entries_nr = 0, entries_max_nr;
 -int pos = 0, i, r = 0;
 -uint32_t msg_addr, msg_upper_addr, msg_data, msg_ctrl;
 -struct kvm_assigned_msix_nr msix_nr;
 -struct kvm_assigned_msix_entry msix_entry;
 -void *va = adev-msix_table_page;
 +int pos, entries_max_nr;
 +PCIDevice *pci_dev = adev-dev;
 
  pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX);
 
 @@ -1127,20 +1124,48 @@ static int assigned_dev_update_msix_mmio(PCIDevice
 *pci_dev) entries_max_nr = PCI_MSIX_TABSIZE;
  entries_max_nr += 1;
 
 +return entries_max_nr;
 +}
 +
 +static int assigned_dev_msix_entry_masked(AssignedDevice *adev, int entry)
 +{
 +uint32_t msg_ctrl;
 +void *va = adev-msix_table_page;
 +
 +memcpy(msg_ctrl, va + entry * 16 + 12, 4);
 +return (msg_ctrl  PCI_MSIX_CTRL_MASKBIT);
 +}
 +
 +static int get_msix_valid_entries_nr(AssignedDevice *adev,
 +  uint16_t entries_max_nr)
 +{
 +void *va = adev-msix_table_page;
 +uint32_t msg_ctrl;
 +uint16_t entries_nr = 0;
 +int i;
 +
  /* Get the usable entry number for allocating */
  for (i = 0; i  entries_max_nr; i++) {
  memcpy(msg_ctrl, va + i * 16 + 12, 4);
 -memcpy(msg_data, va + i * 16 + 8, 4);
  /* Ignore unused entry even it's unmasked */
 -if (msg_data == 0)
 +if (assigned_dev_msix_entry_masked(adev, i))
  continue;
  entries_nr ++;
  }
 +return entries_nr;
 +}
 +
 +static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev,
 + uint16_t entries_nr,
 + uint16_t entries_max_nr)
 +{
 +AssignedDevice *adev = container_of(pci_dev, AssignedDevice, dev);
 +int i, r = 0;
 +uint32_t msg_addr, msg_upper_addr, msg_data, msg_ctrl;
 +struct kvm_assigned_msix_nr msix_nr;
 +struct kvm_assigned_msix_entry msix_entry;
 +void *va = adev-msix_table_page;
 
 -if (entries_nr == 0) {
 -fprintf(stderr, MSI-X entry number is zero!\n);
 -return -EINVAL;
 -}
  msix_nr.assigned_dev_id = calc_assigned_dev_id(adev-h_segnr,
 adev-h_busnr, (uint8_t)adev-h_devfn); msix_nr.entry_nr = entries_nr;
 @@ -1152,6 +1177,8 @@ static int assigned_dev_update_msix_mmio(PCIDevice
 *pci_dev) }
 
  free_dev_irq_entries(adev);
 +memset(pci_dev-msix_entry_used, 0, KVM_MAX_MSIX_PER_DEV *
 +   
 sizeof(*pci_dev-msix_entry_used)); adev-irq_entries_nr = entries_nr;
  adev-entry = calloc(entries_nr, sizeof(struct
 kvm_irq_routing_entry)); if (!adev-entry) {
 @@ -1165,10 +1192,10 @@ static int assigned_dev_update_msix_mmio(PCIDevice
 *pci_dev) if (entries_nr = msix_nr.entry_nr)
  break;
  memcpy(msg_ctrl, va + i * 16 + 12, 4);
 -memcpy(msg_data, va + i * 16 + 8, 4);
 -if (msg_data == 0)
 +if (assigned_dev_msix_entry_masked(adev, i))
  continue;
 
 +memcpy(msg_data, va + i * 16 + 8, 4);
  memcpy(msg_addr, va + i * 16, 4);
  memcpy(msg_upper_addr, va + i * 16 + 4, 4);
 
 @@ -1182,17 +1209,18 @@ static int assigned_dev_update_msix_mmio(PCIDevice
 *pci_dev) adev-entry[entries_nr].u.msi.address_lo = msg_addr;
  adev-entry[entries_nr].u.msi.address_hi = msg_upper_addr;
  adev-entry[entries_nr].u.msi.data = msg_data;
 -DEBUG(MSI-X data 0x%x, MSI-X addr_lo 0x%x\n!, msg_data,
 msg_addr); -  kvm_add_routing_entry(adev-entry[entries_nr]);
 +DEBUG(MSI-X data 0x%x, MSI-X addr_lo 0x%x!\n, msg_data,
 msg_addr); +kvm_add_routing_entry(adev-entry[entries_nr]);
 
  msix_entry.gsi = adev-entry[entries_nr].gsi;
  msix_entry.entry = i;
 +pci_dev-msix_entry_used[i] = 1;
  r = kvm_assign_set_msix_entry(kvm_context, msix_entry);
  if (r

[PATCH] qemu-kvm: device assignment: Enabling MSI-X according to mask bit

2010-12-16 Thread Sheng Yang
The old MSI-X enabling method assumes the entries are written before MSI-X
is enabled, but some OSes do not obey this, e.g. FreeBSD. This patch fixes
that.

Also, according to the PCI spec, the mask bit of each MSI-X table entry
should be set after reset.
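
The idea can be sketched outside of qemu as follows: an entry counts as in use when its mask bit is clear, and a reset leaves every entry masked. This is a standalone illustration, not the patch itself. The function names are hypothetical, the offsets (16-byte entries, vector control at byte 12, mask in bit 0) match the ones used in the diff below, and zeroing the other fields on reset is an assumption:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MSIX_ENTRY_SIZE       16   /* bytes per MSI-X table entry */
#define MSIX_VECTOR_CTRL_OFF  12   /* vector control dword inside an entry */
#define MSIX_CTRL_MASKBIT     1u   /* bit 0 of vector control */

/* Nonzero if entry i of the shadow table page is masked. */
static inline int msix_entry_masked(const void *table, unsigned int i)
{
    uint32_t ctrl;

    memcpy(&ctrl, (const uint8_t *)table + (size_t)i * MSIX_ENTRY_SIZE +
                  MSIX_VECTOR_CTRL_OFF, sizeof(ctrl));
    return (ctrl & MSIX_CTRL_MASKBIT) != 0;
}

/* Count the entries that are unmasked and therefore need a host vector. */
static inline unsigned int msix_count_unmasked(const void *table,
                                               unsigned int max_entries)
{
    unsigned int i, nr = 0;

    for (i = 0; i < max_entries; i++)
        if (!msix_entry_masked(table, i))
            nr++;
    return nr;
}

/* Reset state: every entry masked (other fields zeroed by assumption). */
static inline void msix_table_reset(void *table, unsigned int max_entries)
{
    const uint32_t ctrl = MSIX_CTRL_MASKBIT;
    unsigned int i;

    memset(table, 0, (size_t)max_entries * MSIX_ENTRY_SIZE);
    for (i = 0; i < max_entries; i++)
        memcpy((uint8_t *)table + (size_t)i * MSIX_ENTRY_SIZE +
               MSIX_VECTOR_CTRL_OFF, &ctrl, sizeof(ctrl));
}
```

Keying on the mask bit rather than on a non-zero data field means a guest that programs entries after enabling MSI-X is still handled once it unmasks them.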

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 hw/device-assignment.c |  188 +---
 hw/device-assignment.h |2 +-
 2 files changed, 162 insertions(+), 28 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 832c236..ed0b491 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -,15 +,12 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev, unsigned int ctrl_pos)
 #endif
 
 #ifdef KVM_CAP_DEVICE_MSIX
-static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
+
+#define PCI_MSIX_CTRL_MASKBIT  1ul
+static int get_msix_entries_max_nr(AssignedDevice *adev)
 {
-    AssignedDevice *adev = container_of(pci_dev, AssignedDevice, dev);
-    uint16_t entries_nr = 0, entries_max_nr;
-    int pos = 0, i, r = 0;
-    uint32_t msg_addr, msg_upper_addr, msg_data, msg_ctrl;
-    struct kvm_assigned_msix_nr msix_nr;
-    struct kvm_assigned_msix_entry msix_entry;
-    void *va = adev->msix_table_page;
+    int pos, entries_max_nr;
+    PCIDevice *pci_dev = &adev->dev;
 
     pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX);
 
@@ -1127,20 +1124,48 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
     entries_max_nr &= PCI_MSIX_TABSIZE;
     entries_max_nr += 1;
 
+    return entries_max_nr;
+}
+
+static int assigned_dev_msix_entry_masked(AssignedDevice *adev, int entry)
+{
+    uint32_t msg_ctrl;
+    void *va = adev->msix_table_page;
+
+    memcpy(&msg_ctrl, va + entry * 16 + 12, 4);
+    return (msg_ctrl & PCI_MSIX_CTRL_MASKBIT);
+}
+
+static int get_msix_valid_entries_nr(AssignedDevice *adev,
+                                     uint16_t entries_max_nr)
+{
+    void *va = adev->msix_table_page;
+    uint32_t msg_ctrl;
+    uint16_t entries_nr = 0;
+    int i;
+
     /* Get the usable entry number for allocating */
     for (i = 0; i < entries_max_nr; i++) {
         memcpy(&msg_ctrl, va + i * 16 + 12, 4);
-        memcpy(&msg_data, va + i * 16 + 8, 4);
         /* Ignore unused entry even it's unmasked */
-        if (msg_data == 0)
+        if (assigned_dev_msix_entry_masked(adev, i))
             continue;
         entries_nr ++;
     }
+    return entries_nr;
+}
+
+static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev,
+                                         uint16_t entries_nr,
+                                         uint16_t entries_max_nr)
+{
+    AssignedDevice *adev = container_of(pci_dev, AssignedDevice, dev);
+    int i, r = 0;
+    uint32_t msg_addr, msg_upper_addr, msg_data, msg_ctrl;
+    struct kvm_assigned_msix_nr msix_nr;
+    struct kvm_assigned_msix_entry msix_entry;
+    void *va = adev->msix_table_page;
 
-    if (entries_nr == 0) {
-        fprintf(stderr, "MSI-X entry number is zero!\n");
-        return -EINVAL;
-    }
     msix_nr.assigned_dev_id = calc_assigned_dev_id(adev->h_segnr, adev->h_busnr,
                                                    (uint8_t)adev->h_devfn);
     msix_nr.entry_nr = entries_nr;
@@ -1152,6 +1177,8 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
     }
 
     free_dev_irq_entries(adev);
+    memset(pci_dev->msix_entry_used, 0, KVM_MAX_MSIX_PER_DEV *
+                                        sizeof(*pci_dev->msix_entry_used));
     adev->irq_entries_nr = entries_nr;
     adev->entry = calloc(entries_nr, sizeof(struct kvm_irq_routing_entry));
     if (!adev->entry) {
@@ -1165,10 +1192,10 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
         if (entries_nr >= msix_nr.entry_nr)
             break;
         memcpy(&msg_ctrl, va + i * 16 + 12, 4);
-        memcpy(&msg_data, va + i * 16 + 8, 4);
-        if (msg_data == 0)
+        if (assigned_dev_msix_entry_masked(adev, i))
             continue;
 
+        memcpy(&msg_data, va + i * 16 + 8, 4);
         memcpy(&msg_addr, va + i * 16, 4);
         memcpy(&msg_upper_addr, va + i * 16 + 4, 4);
 
@@ -1182,17 +1209,18 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
         adev->entry[entries_nr].u.msi.address_lo = msg_addr;
         adev->entry[entries_nr].u.msi.address_hi = msg_upper_addr;
         adev->entry[entries_nr].u.msi.data = msg_data;
-        DEBUG("MSI-X data 0x%x, MSI-X addr_lo 0x%x\n!", msg_data, msg_addr);
-       kvm_add_routing_entry(&adev->entry[entries_nr]);
+        DEBUG("MSI-X data 0x%x, MSI-X addr_lo 0x%x!\n", msg_data, msg_addr);
+        kvm_add_routing_entry(&adev->entry[entries_nr]);
 
         msix_entry.gsi = adev->entry[entries_nr].gsi;
         msix_entry.entry = i;
+        pci_dev->msix_entry_used[i] = 1;
         r = kvm_assign_set_msix_entry(kvm_context, &msix_entry);
         if (r) {
             fprintf(stderr, "fail to set MSI-X entry! %s\n", strerror(-r));
             break;
         }
-DEBUG(MSI-X
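
The loops in the patch above walk the MSI-X table in 16-byte strides and read the address, data and vector control dwords at offsets 0, 4, 8 and 12. A minimal sketch of that layout follows, assuming the table is a plain buffer in host byte order. The helper names `msix_entry_masked` and `count_unmasked_entries` are hypothetical and are not the patch's `assigned_dev_msix_entry_masked`, whose body is not shown here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Per-entry layout, mirroring the offsets used in the loops above. */
#define MSIX_ENTRY_SIZE          16
#define MSIX_ENTRY_LOWER_ADDR     0
#define MSIX_ENTRY_UPPER_ADDR     4
#define MSIX_ENTRY_DATA           8
#define MSIX_ENTRY_VECTOR_CTRL   12
#define MSIX_ENTRY_CTRL_MASKBIT  1u

/* Hypothetical helper: return 1 if entry i has its mask bit set. */
static int msix_entry_masked(const void *table, unsigned int i)
{
    uint32_t ctrl;

    assert(table != NULL);
    memcpy(&ctrl, (const uint8_t *)table + i * MSIX_ENTRY_SIZE +
                  MSIX_ENTRY_VECTOR_CTRL, sizeof(ctrl));
    return (ctrl & MSIX_ENTRY_CTRL_MASKBIT) != 0;
}

/* Hypothetical helper: count the entries that are not masked. */
static unsigned int count_unmasked_entries(const void *table,
                                           unsigned int max_entries)
{
    unsigned int i, n = 0;

    for (i = 0; i < max_entries; i++)
        if (!msix_entry_masked(table, i))
            n++;
    return n;
}
```

Keying the count on the mask bit rather than on a zero data field is the change the patch makes; the sketch only illustrates the table arithmetic.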

[PATCH] KVM: Fix OSXSAVE after migration

2010-12-07 Thread Sheng Yang
CPUID's OSXSAVE is a mirror of CR4.OSXSAVE bit. We need to update the CPUID
after migration.

Cc: sta...@kernel.org
Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 arch/x86/kvm/x86.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ed373ba..51a2bce 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5585,6 +5585,8 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
 
 	mmu_reset_needed |= kvm_read_cr4(vcpu) != sregs->cr4;
 	kvm_x86_ops->set_cr4(vcpu, sregs->cr4);
+	if (sregs->cr4 & X86_CR4_OSXSAVE)
+		update_cpuid(vcpu);
 	if (!is_long_mode(vcpu) && is_pae(vcpu)) {
 		load_pdptrs(vcpu, vcpu->arch.walk_mmu, vcpu->arch.cr3);
 		mmu_reset_needed = 1;
-- 
1.7.0.1
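
The mirroring that this patch restores after migration can be sketched in isolation. The block below is an illustration only, not the kernel's `update_cpuid()`; the helper `sync_osxsave` is hypothetical. It relies on two architectural facts: CR4.OSXSAVE is bit 18 of CR4, and CPUID.01H:ECX.OSXSAVE is bit 27.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CR4_OSXSAVE         ((uint64_t)1 << 18)  /* CR4.OSXSAVE */
#define CPUID1_ECX_OSXSAVE  ((uint32_t)1 << 27)  /* CPUID.01H:ECX.OSXSAVE */

/* Hypothetical helper: make the CPUID bit follow the CR4 bit. */
static void sync_osxsave(uint32_t *cpuid1_ecx, uint64_t cr4)
{
    assert(cpuid1_ecx != NULL);

    *cpuid1_ecx &= ~CPUID1_ECX_OSXSAVE;
    if (cr4 & CR4_OSXSAVE)
        *cpuid1_ecx |= CPUID1_ECX_OSXSAVE;
}
```

Whenever CR4 is loaded from saved state, as in the set_sregs path above, the cached CPUID value has to be recomputed the same way, otherwise the guest sees a stale OSXSAVE bit.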



[PATCH] KVM: Add reading critical region for kvm_io_bus_read/write

2010-12-06 Thread Sheng Yang
Seems we missed it.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
Do we need this, or does slots_lock already cover it?

 virt/kvm/kvm_main.c |   24 
 1 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index c4ee364..3e71b91 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2228,28 +2228,36 @@ static void kvm_io_bus_destroy(struct kvm_io_bus *bus)
 int kvm_io_bus_write(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
 		     int len, const void *val)
 {
-	int i;
+	int i, idx, r = -EOPNOTSUPP;
 	struct kvm_io_bus *bus;
 
+	idx = srcu_read_lock(&kvm->srcu);
 	bus = srcu_dereference(kvm->buses[bus_idx], &kvm->srcu);
 	for (i = 0; i < bus->dev_count; i++)
-		if (!kvm_iodevice_write(bus->devs[i], addr, len, val))
-			return 0;
-	return -EOPNOTSUPP;
+		if (!kvm_iodevice_write(bus->devs[i], addr, len, val)) {
+			r = 0;
+			break;
+		}
+	srcu_read_unlock(&kvm->srcu, idx);
+	return r;
 }
 
 /* kvm_io_bus_read - called under kvm->slots_lock */
 int kvm_io_bus_read(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
 		    int len, void *val)
 {
-	int i;
+	int i, idx, r = -EOPNOTSUPP;
 	struct kvm_io_bus *bus;
 
+	idx = srcu_read_lock(&kvm->srcu);
 	bus = srcu_dereference(kvm->buses[bus_idx], &kvm->srcu);
 	for (i = 0; i < bus->dev_count; i++)
-		if (!kvm_iodevice_read(bus->devs[i], addr, len, val))
-			return 0;
-	return -EOPNOTSUPP;
+		if (!kvm_iodevice_read(bus->devs[i], addr, len, val)) {
+			r = 0;
+			break;
+		}
+	srcu_read_unlock(&kvm->srcu, idx);
+	return r;
 }
 
 /* Caller must hold slots_lock. */
-- 
1.7.0.1



Re: [PATCH] KVM: Add reading critical region for kvm_io_bus_read/write

2010-12-06 Thread Sheng Yang
On Monday 06 December 2010 20:58:10 Avi Kivity wrote:
 On 12/06/2010 10:44 AM, Sheng Yang wrote:
  Seems we missed it.
  
  Signed-off-by: Sheng Yang sh...@linux.intel.com
  ---
  Do we need this, or slot_lock covered this?
  
   virt/kvm/kvm_main.c |   24 
   1 files changed, 16 insertions(+), 8 deletions(-)
  
  diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
  index c4ee364..3e71b91 100644
  --- a/virt/kvm/kvm_main.c
  +++ b/virt/kvm/kvm_main.c
  @@ -2228,28 +2228,36 @@ static void kvm_io_bus_destroy(struct kvm_io_bus *bus)
   int kvm_io_bus_write(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
   		     int len, const void *val)
   {
  -	int i;
  +	int i, idx, r = -EOPNOTSUPP;
   	struct kvm_io_bus *bus;
  
  +	idx = srcu_read_lock(&kvm->srcu);
   	bus = srcu_dereference(kvm->buses[bus_idx], &kvm->srcu);
   	for (i = 0; i < bus->dev_count; i++)
  -		if (!kvm_iodevice_write(bus->devs[i], addr, len, val))
  -			return 0;
  -	return -EOPNOTSUPP;
  +		if (!kvm_iodevice_write(bus->devs[i], addr, len, val)) {
  +			r = 0;
  +			break;
  +		}
  +	srcu_read_unlock(&kvm->srcu, idx);
  +	return r;
   }
 
 Isn't this already taken care of by srcu_read_lock() in
 vcpu_enter_guest(), just before calling ->handle_exit()?

Yes, I finally found it. That is a very large read-side critical section...

--
regards
Yang, Sheng
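
The shape of the proposed change, turning early returns into a single exit so that the unlock always runs, can be sketched outside the kernel. The block below is an illustration under stated assumptions: `demo_lock`, `demo_read_lock`, `demo_read_unlock`, `bus_write` and `demo_dev` are all hypothetical stand-ins, and the counter only models SRCU nesting. As noted in the reply above, the real callers already run under the srcu_read_lock() taken in vcpu_enter_guest(), so the patch itself was not needed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a read-side lock; counts active readers. */
struct demo_lock { int readers; };

static int demo_read_lock(struct demo_lock *l)
{
    return l->readers++;
}

static void demo_read_unlock(struct demo_lock *l, int idx)
{
    (void)idx;
    l->readers--;
}

/* A device handler returns 0 when it claims the address. */
typedef int (*dev_write_fn)(unsigned long addr);

/* Hypothetical device that only claims address 0x10. */
static int demo_dev(unsigned long addr)
{
    return addr == 0x10 ? 0 : -EOPNOTSUPP;
}

/* Single exit point: every path passes through the unlock. */
static int bus_write(struct demo_lock *l, dev_write_fn *devs, size_t n,
                     unsigned long addr)
{
    int idx, r = -EOPNOTSUPP;
    size_t i;

    assert(l != NULL && (devs != NULL || n == 0));
    idx = demo_read_lock(l);
    for (i = 0; i < n; i++) {
        if (!devs[i](addr)) {
            r = 0;
            break;
        }
    }
    demo_read_unlock(l, idx);
    return r;
}
```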


Re: Performance test result between virtio_pci MSI-X disable and enable

2010-12-02 Thread Sheng Yang
On Thu, Dec 2, 2010 at 5:49 PM, Michael S. Tsirkin m...@redhat.com wrote:
 On Thu, Dec 02, 2010 at 09:13:28AM +0800, Yang, Sheng wrote:
 On Wednesday 01 December 2010 22:03:58 Michael S. Tsirkin wrote:
  On Wed, Dec 01, 2010 at 04:41:38PM +0800, lidong chen wrote:
   I used SR-IOV and gave each VM 2 VFs.
   After applying the patch, I found the performance is the same.
  
   The reason is that in msix_mmio_write, addr is mostly not in the MMIO
   range.
  
   static int msix_mmio_write(struct kvm_io_device *this, gpa_t addr, int len,
                              const void *val)
   {
     struct kvm_assigned_dev_kernel *adev =
                     container_of(this, struct kvm_assigned_dev_kernel,
                                  msix_mmio_dev);
     int idx, r = 0;
     unsigned long new_val = *(unsigned long *)val;
  
     mutex_lock(&adev->kvm->lock);
     if (!msix_mmio_in_range(adev, addr, len)) {
             // return here.
             r = -EOPNOTSUPP;
             goto out;
     }
  
   i printk the value:
   addr             start           end           len
   F004C00C   F0044000  F0044030     4
  
   00:06.0 Ethernet controller: Intel Corporation Unknown device 10ed (rev
   01)
  
     Subsystem: Intel Corporation Unknown device 000c
     Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
  
   Stepping- SERR- FastB2B-
  
     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast TAbort-
  
   TAbort- MAbort- SERR- PERR-
  
     Latency: 0
     Region 0: Memory at f004 (32-bit, non-prefetchable) [size=16K]
     Region 3: Memory at f0044000 (32-bit, non-prefetchable) [size=16K]
     Capabilities: [40] MSI-X: Enable+ Mask- TabSize=3
  
             Vector table: BAR=3 offset=
             PBA: BAR=3 offset=2000
  
   00:07.0 Ethernet controller: Intel Corporation Unknown device 10ed (rev
   01)
  
     Subsystem: Intel Corporation Unknown device 000c
     Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
  
   Stepping- SERR- FastB2B-
  
     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast TAbort-
  
   TAbort- MAbort- SERR- PERR-
  
     Latency: 0
     Region 0: Memory at f0048000 (32-bit, non-prefetchable) [size=16K]
     Region 3: Memory at f004c000 (32-bit, non-prefetchable) [size=16K]
     Capabilities: [40] MSI-X: Enable+ Mask- TabSize=3
  
             Vector table: BAR=3 offset=
             PBA: BAR=3 offset=2000
  
   +static bool msix_mmio_in_range(struct kvm_assigned_dev_kernel *adev,
   +                       gpa_t addr, int len)
   +{
   + gpa_t start, end;
   +
   + BUG_ON(adev->msix_mmio_base == 0);
   + start = adev->msix_mmio_base;
   + end = adev->msix_mmio_base + PCI_MSIX_ENTRY_SIZE *
   +         adev->msix_max_entries_nr;
   + if (addr >= start && addr + len <= end)
   +         return true;
   +
   + return false;
   +}
 
  Hmm, this check looks wrong to me: there's no guarantee
  that guest uses the first N entries in the table.
  E.g. it could use a single entry, but only the last one.

 Please check the PCI spec.


 This is pretty explicit in the spec; see the last paragraph below:

 IMPLEMENTATION NOTE
 Handling MSI-X Vector Shortages

 For the case where fewer vectors are allocated to a function than desired,

You may have missed the premise here. Checking the Table Size would also
help, I think.

-- 
regards,
Yang, Sheng
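
The numbers printed in the quoted report can be run through the range check directly. The sketch below assumes a standalone version of the check (`msix_table_in_range` is a hypothetical name) with 16-byte entries and a table of 3 entries, which matches the reported start/end of F0044000/F0044030. The failing address F004C00C falls inside Region 3 of the second VF (f004c000), not the first, which would explain why the check against the first device's range rejects it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSIX_ENTRY_SIZE 16

/* Hypothetical standalone range check: is [addr, addr + len) in the table? */
static bool msix_table_in_range(uint64_t base, unsigned int nr_entries,
                                uint64_t addr, int len)
{
    uint64_t end = base + (uint64_t)MSIX_ENTRY_SIZE * nr_entries;

    assert(len > 0);
    return addr >= base && addr + (uint64_t)len <= end;
}
```

With base 0xF0044000 the reported write is out of range, while with base 0xF004C000 the same write hits the vector control dword of entry 0.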

 software-controlled aliasing as enabled by MSI-X is one approach for
 handling the situation. For example, if a function supports five queues,
 each with an associated MSI-X table entry, but only three vectors are
 allocated, the function could be designed for software still to configure
 all five table entries, assigning one or more vectors to multiple table
 entries. Software could assign the three vectors {A,B,C} to the five
 entries as ABCCC, ABBCC, ABCBA, or other similar combinations.


 Alternatively, the function could be designed for software to configure it
 (using a device-specific mechanism) to use only three queues and three
 MSI-X table entries. Software could assign the three vectors {A,B,C} to
 the five entries as ABC--, A-B-C, A--CB, or other similar combinations.



 --
 regards
 Yang, Sheng


   2010/11/30 Yang, Sheng sheng.y...@intel.com:
On Tuesday 30 November 2010 17:10:11 lidong chen wrote:
SR-IOV also hits this problem: MSI-X masking wastes a lot of CPU resources.
   
I tested KVM with SR-IOV, where the VF driver could not disable MSI-X,
so the host OS wastes a lot of CPU. The CPU usage of the host OS is 90%.
   
Then I tested Xen with SR-IOV; there are also a lot of VM exits caused by
MSI-X masking, but the CPU usage of Xen and domain0 is lower than KVM's:
60%.
   
Without SR-IOV, the CPU usage of Xen and domain0 is higher than KVM's.
   
So I think the problem is that KVM wastes more CPU resources dealing with
MSI-X masking, and we can see how xen deal 

Re: Mask bit support's API

2010-12-02 Thread Sheng Yang
On Thu, Dec 2, 2010 at 10:26 PM, Michael S. Tsirkin m...@redhat.com wrote:
 On Thu, Dec 02, 2010 at 03:56:52PM +0200, Avi Kivity wrote:
 On 12/02/2010 03:47 PM, Michael S. Tsirkin wrote:
 
   Which case?  the readl() doesn't need access to the routing table,
   just the entry.
 
 One thing that read should do is flush in the outstanding
 interrupts and flush out the mask bit writes.

 The mask bit writes are synchronous.

 wrt interrupts, we can deal with assigned devices, and can poll
 irqfds.  But we can't force vhost-net to issue an interrupt (and I
 don't think it's required).

 To clarify:

        mask write
        read

 it is safe for guest to assume no more interrupts

 whereas with a simple
        mask write

 an interrupt might be in flight and get delivered shortly afterwards.

I think it's already contained in the current patchset.


   Oh, I think there is a terminology problem, I was talking about
   kvm's irq routing table, you were talking about the msix entries.
 
   I think treating it as a cache causes more problems, since there are
   now two paths for reads (in cache and not in cache) and more things
   for writes to manage.
 
   Here's my proposed API:
 
   KVM_DEFINE_MSIX_TABLE(table_id, nr_entries, msix_base_gpa,
   pending_bitmap_base_gpa)
 
    - called when the guest enables msix
 
 I would add virtual addresses so that we can use swappable memory to
 store the state.

 Right.

Do we need synchronization between kernel and userspace? Any recommended method?

 If we do, maybe we can just keep the table there and then
 KVM_SET/GET_MSIX_ENTRY and the new exit won't be needed?

 Still need to let userspace know it needs to reprogram the irqfd
 or whatever it uses to inject the interrupt.

 Why do we need to reprogram irqfd?  I thought irqfd would map to an
 entry within the table instead of address/data as now.
 Could you clarify please?


   KVM_REMOVE_MSIX_TABLE(table_id)
 
     - called when the guest disables msix
 
   KVM_SET_MSIX_ENTRY(table_id, entry_id, contents)
 
     - called when the guest enables msix (to initialize it), or after
   live migration
 
 What is entry_id here?

 Entry within the table.

 So I think KVM_DEFINE_MSIX_TABLE should be called when msix is
 enabled (note: it can not be called at boot anyway since pa
 depends on BAR assigned by BIOS).

I don't agree. The MMIO region can be written regardless of whether MSI-X
is enabled. If you hand MMIO handling to the kernel, then let it handle all
of it. I suppose qemu still has control of the BAR? Then leaving it in the
current place should be fine.

   Michael?  I think that should work for virtio and vfio assigned
   devices?  Not sure about pending bits.
 
 Pending bits must be tracked in kernel, but I don't see
 how we can support polling mode if we don't exit to userspace
 on pending bit reads.
 
 This does mean that some reads will be fast and some will be
 slow, and it's a bit sad that we seem to be optimizing
 for specific guests, but I just can't come up with
 anything better.
 

 If the pending bits live in userspace memory, the device model can
 update them directly?

 Note that these are updated on an interrupt, so updating them
 in userspace would need get_user_page etc trickery,
 and add the overhead of atomics.

 Further I think it's important to avoid the overhead of updating them
 all the time, and only do this when an interrupt is
 masked or on pending bits read. Since userspace does not know
 when interrupts are masked, this means do update on each read.

In fact, qemu's accesses to the MMIO region should be quite rare once
everything has moved to the kernel. Using an ioctl is also fine with me.

And how would the update on each read be done?

-- 
regards,
Yang, Sheng

 So maybe just add an ioctl to get and to clear pending bits.
 Maybe set for symmetry.

 For live migration too.  But if they live in memory, no need for
 get/set, just specify the address.

 --
 error compiling committee.c: too many arguments to function


Re: [PATCH 4/6] KVM: Add kvm_get_irq_routing_entry() func

2010-11-18 Thread Sheng Yang
On Thu, Nov 18, 2010 at 5:41 PM, Michael S. Tsirkin m...@redhat.com wrote:
 On Thu, Nov 18, 2010 at 11:30:47AM +0200, Avi Kivity wrote:
 
   *entry may be stale after rcu_read_unlock().  Is this a problem?
 
 I suppose not. All MSI-X MMIO accessing would be executed without delay,
 so no re-order issue would happen. If the guest is reading and writing the
 field at the same time(from two cpus), it should got some kinds of sync
 method for itself - or it may not care what's the reading result(like the
 one after msix_mask_irq()).

 I guess so.  Michael/Alex?

 This is kvm_get_irq_routing_entry which is used for table reads,
 correct?  Actually, the pci read *is* the sync method that guests use,
 they rely on reads to flush out all previous writes.

Michael, I think the *sync* you are talking about is not the one I
meant. I was talking about the two-CPU case: one is reading and the other
is writing, and the order can't be determined unless the guest uses a lock
or some other synchronization method. You're talking about flushing out
all previous writes from a single CPU...

-- 
regards,
Yang, Sheng


Re: [PATCH 6/6] KVM: assigned dev: MSI-X mask support

2010-11-18 Thread Sheng Yang
On Thu, Nov 18, 2010 at 5:28 PM, Avi Kivity a...@redhat.com wrote:
 On 11/18/2010 03:58 AM, Sheng Yang wrote:

 On Wednesday 17 November 2010 21:58:00 Avi Kivity wrote:
   On 11/15/2010 11:15 AM, Sheng Yang wrote:
     This patch enable per-vector mask for assigned devices using MSI-X.
   
     This patch provided two new APIs: one is for guest to specific device's
     MSI-X table address in MMIO, the other is for userspace to get
     information about mask bit.
   
     All the mask bit operation are kept in kernel, in order to accelerate.
     Userspace shouldn't access the device MMIO directly for the information,
     instead it should uses provided API to do so.
   
     Signed-off-by: Sheng Yangsh...@linux.intel.com
     ---
   
       arch/x86/kvm/x86.c       |    1 +
       include/linux/kvm.h      |   32 +
       include/linux/kvm_host.h |    5 +
       virt/kvm/assigned-dev.c  |  318 +-
       4 files changed, 355 insertions(+), 1 deletions(-)
 
   Documentation?

 For we are keeping changing the API for last several versions, I'd like to
 settle down the API first. Would bring back the document after API was
 agreed.

 Maybe for APIs we should start with only the documentation patch, agree on
 that, and move on to the implementation.

Yes, I will follow that next time. I will bring back the documentation
in the next version, since Michael and I have reached agreement on the API.

   What if it's a 64-bit write on a 32-bit host?

 In fact we haven't support QWORD(64bit) accessing now. The reason is we
 haven't seen any OS is using it in this way now, so I think we can leave
 it later.

 Also seems QEmu doesn't got the way to handle 64bit MMIO.

 There's a difference, if the API doesn't support it, we can't add it later
 without changing both kernel and userspace.

Um... Which API are you talking about? I think the userspace API (set MSI-X
MMIO, and get mask bit status) is unrelated here?

 
   That's not very good.  We should do the entire thing in the kernel or in
   userspace.  We can have a new EXIT_REASON to let userspace know an msix
   entry changed, and it should read it from the kernel.

 If you look it in this way:
 1. Mask bit owned by kernel.
 2. Routing owned by userspace.
 3. Read the routing in kernel is an speed up for normal operation -
    because kernel can read from them.

 So I think the logic here is clear to understand.

 Still, it's complicated and the state is split across multiple components.

So how about removing the reading acceleration part in the patch
temporarily? Kernel owns mask bit and userspace owns others. That
should be better. I can add the reading part later when we can find an
elegant way to do so.


 But if we can modify the routing in kernel, it would be raise some sync
 issues due to both kernel and userspace own routing. So maybe the better
 solution is move the routing to kernel.

 That may work, but I don't think we can do this for vfio.

-- 
regards,
Yang, Sheng

 --
 error compiling committee.c: too many arguments to function



Re: [PATCH 4/6] KVM: Add kvm_get_irq_routing_entry() func

2010-11-18 Thread Sheng Yang
On Thu, Nov 18, 2010 at 8:33 PM, Michael S. Tsirkin m...@redhat.com wrote:
 On Thu, Nov 18, 2010 at 07:59:10PM +0800, Sheng Yang wrote:
 On Thu, Nov 18, 2010 at 5:41 PM, Michael S. Tsirkin m...@redhat.com wrote:
  On Thu, Nov 18, 2010 at 11:30:47AM +0200, Avi Kivity wrote:
  
    *entry may be stale after rcu_read_unlock().  Is this a problem?
  
  I suppose not. All MSI-X MMIO accessing would be executed without delay,
  so no re-order issue would happen. If the guest is reading and writing the
  field at the same time(from two cpus), it should got some kinds of sync
  method for itself - or it may not care what's the reading result(like the
  one after msix_mask_irq()).
 
  I guess so.  Michael/Alex?
 
  This is kvm_get_irq_routing_entry which is used for table reads,
  correct?  Actually, the pci read *is* the sync method that guests use,
  they rely on reads to flush out all previous writes.

 Michael, I think the *sync* you are talking about is not the one I
 meant. I was talking about two cpus case, one is reading and the other
 is writing, the order can't be determined if guest doesn't use lock or
 some other synchronize methods; and you're talking about to flush out
 all previous writes of the only one CPU...

 Yes, but you don't seem to flush out writes on a read, either.

... I don't understand... We emulate the write in software and it takes
effect immediately... What are we supposed to do for this flush?

-- 
regards,
Yang, Sheng


Re: [PATCH 6/6] KVM: assigned dev: MSI-X mask support

2010-11-17 Thread Sheng Yang
On Wednesday 17 November 2010 21:58:00 Avi Kivity wrote:
 On 11/15/2010 11:15 AM, Sheng Yang wrote:
  This patch enable per-vector mask for assigned devices using MSI-X.
  
  This patch provided two new APIs: one is for guest to specific device's
  MSI-X table address in MMIO, the other is for userspace to get
  information about mask bit.
  
  All the mask bit operation are kept in kernel, in order to accelerate.
  Userspace shouldn't access the device MMIO directly for the information,
  instead it should uses provided API to do so.
  
  Signed-off-by: Sheng Yangsh...@linux.intel.com
  ---
  
   arch/x86/kvm/x86.c       |    1 +
   include/linux/kvm.h      |   32 +
   include/linux/kvm_host.h |    5 +
   virt/kvm/assigned-dev.c  |  318 +-
   4 files changed, 355 insertions(+), 1 deletions(-)
 
 Documentation?

Since we have kept changing the API over the last several versions, I'd like
to settle the API first. I will bring back the documentation once the API is
agreed.
 
  +static int msix_mmio_read(struct kvm_io_device *this, gpa_t addr, int len,
  +			  void *val)
  +{
  +	struct kvm_assigned_dev_kernel *adev =
  +			container_of(this, struct kvm_assigned_dev_kernel,
  +				     msix_mmio_dev);
  +	int idx, r = 0;
  +	u32 entry[4];
  +	struct kvm_kernel_irq_routing_entry e;
  +
  +	/* TODO: Get big-endian machine work */
  +	mutex_lock(&adev->kvm->lock);
  +	if (!msix_mmio_in_range(adev, addr, len)) {
  +		r = -EOPNOTSUPP;
  +		goto out;
  +	}
  +	if ((addr & 0x3) || len != 4)
  +		goto out;
  +
  +	idx = msix_get_enabled_idx(adev, addr, len);
  +	if (idx < 0) {
  +		idx = (addr - adev->msix_mmio_base) / PCI_MSIX_ENTRY_SIZE;
  +		if ((addr % PCI_MSIX_ENTRY_SIZE) ==
  +				PCI_MSIX_ENTRY_VECTOR_CTRL)
  +			*(unsigned long *)val =
  +				test_bit(idx, adev->msix_mask_bitmap) ?
  +				PCI_MSIX_ENTRY_CTRL_MASKBIT : 0;
  +		else
  +			r = -EOPNOTSUPP;
  +		goto out;
  +	}
  +
  +	r = kvm_get_irq_routing_entry(adev->kvm,
  +			adev->guest_msix_entries[idx].vector, &e);
  +	if (r || e.type != KVM_IRQ_ROUTING_MSI) {
  +		r = -EOPNOTSUPP;
  +		goto out;
  +	}
  +	entry[0] = e.msi.address_lo;
  +	entry[1] = e.msi.address_hi;
  +	entry[2] = e.msi.data;
  +	entry[3] = test_bit(adev->guest_msix_entries[idx].entry,
  +			adev->msix_mask_bitmap);
  +	memcpy(val, &entry[addr % PCI_MSIX_ENTRY_SIZE / sizeof *entry], len);
  +
  +out:
  +	mutex_unlock(&adev->kvm->lock);
  +	return r;
  +}
  +
  +static int msix_mmio_write(struct kvm_io_device *this, gpa_t addr, int len,
  +			   const void *val)
  +{
  +	struct kvm_assigned_dev_kernel *adev =
  +			container_of(this, struct kvm_assigned_dev_kernel,
  +				     msix_mmio_dev);
  +	int idx, r = 0;
  +	unsigned long new_val = *(unsigned long *)val;
 
 What if it's a 64-bit write on a 32-bit host?

In fact we don't support QWORD (64-bit) accesses yet. The reason is that we
haven't seen any OS use them this way, so I think we can leave it for later.

Also, QEMU doesn't seem to have a way to handle 64-bit MMIO.
 
 Are we sure the trailing bytes of val are zero?
 
  +
  +   /* TODO: Get big-endian machine work */
 
 BUILD_BUG_ON(something)

Good idea!
 
  +	mutex_lock(&adev->kvm->lock);
  +	if (!msix_mmio_in_range(adev, addr, len)) {
  +		r = -EOPNOTSUPP;
  +		goto out;
  +	}
 
 Why is this needed?  Didn't the iodev check already do this?

Well, kvm_io_device_ops() hasn't got in_range callback yet...
 
  +	if ((addr & 0x3) || len != 4)
  +		goto out;
 
 What if len == 8?  I think mst said it was legal.

Since we haven't seen anyone using it this way, I think we can leave it for
later.
 
  +
  +	idx = msix_get_enabled_idx(adev, addr, len);
  +	if (idx < 0) {
  +		idx = (addr - adev->msix_mmio_base) / PCI_MSIX_ENTRY_SIZE;
  +		if (((addr % PCI_MSIX_ENTRY_SIZE) ==
  +				PCI_MSIX_ENTRY_VECTOR_CTRL)) {
  +			if (new_val & ~PCI_MSIX_ENTRY_CTRL_MASKBIT)
  +				goto out;
  +			if (new_val & PCI_MSIX_ENTRY_CTRL_MASKBIT)
  +				set_bit(idx, adev->msix_mask_bitmap);
  +			else
  +				clear_bit(idx, adev->msix_mask_bitmap);
  +			/* It's possible that we need re-enable MSI-X, so go
  +			 * back to userspace */
  +		}
  +		/* Userspace would handle other MMIO writing */
  +		r = -EOPNOTSUPP;
 
 That's not very good.  We should do the entire thing in the kernel or in
 userspace.  We can have a new EXIT_REASON to let userspace know

Re: [PATCH 4/6] KVM: Add kvm_get_irq_routing_entry() func

2010-11-17 Thread Sheng Yang
On Wednesday 17 November 2010 22:01:41 Avi Kivity wrote:
 On 11/15/2010 11:15 AM, Sheng Yang wrote:
  We need to query the entry later.
  
  +int kvm_get_irq_routing_entry(struct kvm *kvm, int gsi,
  +		struct kvm_kernel_irq_routing_entry *entry)
  +{
  +	int count = 0;
  +	struct kvm_kernel_irq_routing_entry *ei = NULL;
  +	struct kvm_irq_routing_table *irq_rt;
  +	struct hlist_node *n;
  +
  +	rcu_read_lock();
  +	irq_rt = rcu_dereference(kvm->irq_routing);
  +	if (gsi < irq_rt->nr_rt_entries)
  +		hlist_for_each_entry(ei, n, &irq_rt->map[gsi], link)
  +			count++;
  +	if (count == 1)
  +		*entry = *ei;
  +	rcu_read_unlock();
  +
  +	return (count != 1);
  +}
  +
 
 Not good form to rely on ei being valid after the loop.
 
 I guess this is only useful for msi?  Need to document it.

It may be used for other things later; it's fairly generic. Where should I
document it?
 
 *entry may be stale after rcu_read_unlock().  Is this a problem?

I suppose not. All MSI-X MMIO accesses are executed without delay, so no
reordering issue would happen. If the guest reads and writes the field at
the same time (from two CPUs), it should have some kind of synchronization
method of its own, or it may not care what the read returns (like the one
after msix_mask_irq()).

--
regards
Yang, Sheng
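
The review comment about relying on the loop cursor after the loop can be illustrated with a plain linked list. The sketch below is not the kernel's hlist code; `route_entry` and `get_single_entry` are hypothetical. It copies the entry while the cursor is known to point at a live element and keeps the same return convention as the quoted function (0 when exactly one entry exists, nonzero otherwise).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one routing entry hanging off a GSI bucket. */
struct route_entry {
    int type;
    unsigned int data;
    const struct route_entry *next;
};

/* Copy out the bucket's entry only if it is the sole one. */
static int get_single_entry(const struct route_entry *bucket,
                            struct route_entry *out)
{
    const struct route_entry *e;
    struct route_entry found = { 0, 0, NULL };
    int count = 0;

    assert(out != NULL);
    for (e = bucket; e != NULL; e = e->next) {
        if (count == 0)
            found = *e;     /* copy while the cursor is valid */
        count++;
    }
    if (count != 1)
        return 1;

    *out = found;
    out->next = NULL;       /* the copy is detached from the list */
    return 0;
}
```

The copy is still only a snapshot: as discussed above, it may be stale once the read-side critical section ends.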


Re: [PATCH 6/6] KVM: assigned dev: MSI-X mask support

2010-11-17 Thread Sheng Yang
On Thursday 18 November 2010 14:21:40 Michael S. Tsirkin wrote:
 On Thu, Nov 18, 2010 at 09:58:55AM +0800, Sheng Yang wrote:
   +static int msix_mmio_write(struct kvm_io_device *this, gpa_t addr, int len,
   +			   const void *val)
   +{
   +	struct kvm_assigned_dev_kernel *adev =
   +			container_of(this, struct kvm_assigned_dev_kernel,
   +				     msix_mmio_dev);
   +	int idx, r = 0;
   +	unsigned long new_val = *(unsigned long *)val;
   
   What if it's a 64-bit write on a 32-bit host?
  
  In fact we haven't support QWORD(64bit) accessing now. The reason is we
  haven't seen any OS is using it in this way now, so I think we can leave
  it later.
  
  Also seems QEmu doesn't got the way to handle 64bit MMIO.
 
 I think it does.  I think it simply splits these to 32-bit transactions
 and handles as such. That seems to be spec-compliant.  I wouldn't want us
 to regress.

Yes, you're right...

I think I have to add it. :shrug:

--
regards
Yang, Sheng
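
One way to accept both access sizes, following the splitting behaviour described in the quoted reply, is to decode the buffer by `len` instead of dereferencing it as `unsigned long`. The sketch below is an assumption-laden illustration, not the eventual patch: `mmio_write_split`, `write32_fn`, `write_log` and `record_write` are all hypothetical names, and the callback stands in for whatever handles a single 32-bit register write.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical callback type for one 32-bit register write. */
typedef void (*write32_fn)(uint64_t addr, uint32_t val, void *opaque);

/* Accept aligned 4- or 8-byte writes; split 8 bytes into two dwords. */
static int mmio_write_split(uint64_t addr, int len, const void *val,
                            write32_fn write32, void *opaque)
{
    uint32_t dw[2] = { 0, 0 };

    assert(val != NULL && write32 != NULL);
    if ((len != 4 && len != 8) || (addr & (uint64_t)(len - 1)))
        return -1;

    memcpy(dw, val, (size_t)len);   /* never reads past len bytes */
    write32(addr, dw[0], opaque);
    if (len == 8)
        write32(addr + 4, dw[1], opaque);
    return 0;
}

/* Hypothetical recorder used to observe the resulting transactions. */
struct write_log {
    int n;
    uint64_t addr[2];
    uint32_t val[2];
};

static void record_write(uint64_t addr, uint32_t val, void *opaque)
{
    struct write_log *log = opaque;

    if (log->n < 2) {
        log->addr[log->n] = addr;
        log->val[log->n] = val;
    }
    log->n++;
}
```

Copying exactly `len` bytes also avoids the concern raised earlier in the thread about the trailing bytes of `val` on a 64-bit host.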


Re: [PATCH 6/6] KVM: assigned dev: MSI-X mask support

2010-11-16 Thread Sheng Yang
On Wednesday 17 November 2010 03:45:22 Marcelo Tosatti wrote:
 On Mon, Nov 15, 2010 at 05:15:32PM +0800, Sheng Yang wrote:
  This patch enable per-vector mask for assigned devices using MSI-X.
  
  This patch provided two new APIs: one is for guest to specific device's
  MSI-X table address in MMIO, the other is for userspace to get
  information about mask bit.
  
  All the mask bit operation are kept in kernel, in order to accelerate.
  Userspace shouldn't access the device MMIO directly for the information,
  instead it should uses provided API to do so.
  
  Signed-off-by: Sheng Yang sh...@linux.intel.com
  ---
  
   arch/x86/kvm/x86.c       |    1 +
   include/linux/kvm.h      |   32 +
   include/linux/kvm_host.h |    5 +
   virt/kvm/assigned-dev.c  |  318 +-
   4 files changed, 355 insertions(+), 1 deletions(-)
  
  diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
  index fc29223..37602e2 100644
  --- a/arch/x86/kvm/x86.c
  +++ b/arch/x86/kvm/x86.c
  @@ -1966,6 +1966,7 @@ int kvm_dev_ioctl_check_extension(long ext)
  
  case KVM_CAP_X86_ROBUST_SINGLESTEP:
  case KVM_CAP_XSAVE:
  
  case KVM_CAP_ASYNC_PF:
  +   case KVM_CAP_MSIX_MASK:
  r = 1;
  break;
  
  case KVM_CAP_COALESCED_MMIO:
  diff --git a/include/linux/kvm.h b/include/linux/kvm.h
  index ea2dc1a..b3e5ffe 100644
  --- a/include/linux/kvm.h
  +++ b/include/linux/kvm.h
  @@ -541,6 +541,9 @@ struct kvm_ppc_pvinfo {
  
   #define KVM_CAP_PPC_GET_PVINFO 57
   #define KVM_CAP_PPC_IRQ_LEVEL 58
   #define KVM_CAP_ASYNC_PF 59
  
  +#ifdef __KVM_HAVE_MSIX
  +#define KVM_CAP_MSIX_MASK 60
  +#endif
  
   #ifdef KVM_CAP_IRQ_ROUTING
  
  @@ -672,6 +675,9 @@ struct kvm_clock_data {
  
   #define KVM_XEN_HVM_CONFIG        _IOW(KVMIO,  0x7a, struct kvm_xen_hvm_config)
   #define KVM_SET_CLOCK             _IOW(KVMIO,  0x7b, struct kvm_clock_data)
   #define KVM_GET_CLOCK             _IOR(KVMIO,  0x7c, struct kvm_clock_data)
  
  +/* Available with KVM_CAP_MSIX_MASK */
  +#define KVM_GET_MSIX_ENTRY        _IOWR(KVMIO,  0x7d, struct kvm_msix_entry)
  +#define KVM_UPDATE_MSIX_MMIO      _IOW(KVMIO,  0x7e, struct kvm_msix_mmio)
  
   /* Available with KVM_CAP_PIT_STATE2 */
   #define KVM_GET_PIT2              _IOR(KVMIO,  0x9f, struct kvm_pit_state2)
   #define KVM_SET_PIT2              _IOW(KVMIO,  0xa0, struct kvm_pit_state2)
  
  @@ -795,4 +801,30 @@ struct kvm_assigned_msix_entry {
  
  __u16 padding[3];
   
   };
  
  +#define KVM_MSIX_TYPE_ASSIGNED_DEV 1
  +
  +#define KVM_MSIX_FLAG_MASKBIT		(1 << 0)
  +#define KVM_MSIX_FLAG_QUERY_MASKBIT	(1 << 0)
  +
  +struct kvm_msix_entry {
  +   __u32 id;
  +   __u32 type;
 
 Is type really necessary? Will it ever differ from
 KVM_MSIX_TYPE_ASSIGNED_DEV?

This was Michael's suggestion. He wants it to be reusable by emulated/PV
devices, so I added the type field here.
 
  +   __u32 entry; /* The index of entry in the MSI-X table */
  +   __u32 flags;
  +   __u32 query_flags;
  +   __u32 reserved[5];
  +};
  +
  +#define KVM_MSIX_MMIO_FLAG_REGISTER	(1 << 0)
  +#define KVM_MSIX_MMIO_FLAG_UNREGISTER	(1 << 1)
  +
  +struct kvm_msix_mmio {
  +   __u32 id;
  +   __u32 type;
  +   __u64 base_addr;
  +   __u32 max_entries_nr;
  +   __u32 flags;
  +   __u32 reserved[6];
  +};
  +
  
   #endif /* __LINUX_KVM_H */
  
  diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
  index f09db87..57a437a 100644
  --- a/include/linux/kvm_host.h
  +++ b/include/linux/kvm_host.h
  @@ -501,6 +501,7 @@ struct kvm_guest_msix_entry {
  
   };
   
   #define KVM_ASSIGNED_ENABLED_IOMMU (1 << 0)
  
  +#define KVM_ASSIGNED_ENABLED_MSIX_MMIO (1 << 1)
  
   struct kvm_assigned_dev_kernel {
   
  struct kvm_irq_ack_notifier ack_notifier;
  struct work_struct interrupt_work;
  
  @@ -521,6 +522,10 @@ struct kvm_assigned_dev_kernel {
  
  struct pci_dev *dev;
  struct kvm *kvm;
  spinlock_t assigned_dev_lock;
  
  +   DECLARE_BITMAP(msix_mask_bitmap, KVM_MAX_MSIX_PER_DEV);
  +   gpa_t msix_mmio_base;
  +   struct kvm_io_device msix_mmio_dev;
  +   int msix_max_entries_nr;
  
   };
   
   struct kvm_irq_mask_notifier {
  
  diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
  index 5c6b96d..76a1f12 100644
  --- a/virt/kvm/assigned-dev.c
  +++ b/virt/kvm/assigned-dev.c
  @@ -226,12 +226,27 @@ static void kvm_free_assigned_irq(struct kvm *kvm,
  
  kvm_deassign_irq(kvm, assigned_dev, assigned_dev->irq_requested_type);
   
   }
  
  +static void unregister_msix_mmio(struct kvm *kvm,
  +struct kvm_assigned_dev_kernel *adev)
  +{
  +   if (adev->flags & KVM_ASSIGNED_ENABLED_MSIX_MMIO) {
  +   mutex_lock(&kvm->slots_lock);
  +   kvm_io_bus_unregister_dev(kvm, KVM_MMIO_BUS,
  +   &adev->msix_mmio_dev);
  +   mutex_unlock(&kvm->slots_lock);
  +   adev->flags &= ~KVM_ASSIGNED_ENABLED_MSIX_MMIO

Re: [PATCH 2/7] PCI: Add mask bit definition for MSI-X table

2010-11-15 Thread Sheng Yang
On Friday 12 November 2010 01:29:29 Jesse Barnes wrote:
 On Thu, 11 Nov 2010 15:46:55 +0800
 
 Sheng Yang sh...@linux.intel.com wrote:
  Then we can use it instead of magic number 1.
  
  Reviewed-by: Hidetoshi Seto seto.hideto...@jp.fujitsu.com
  Cc: Matthew Wilcox wi...@linux.intel.com
  Cc: Jesse Barnes jbar...@virtuousgeek.org
  Cc: linux-...@vger.kernel.org
  Signed-off-by: Sheng Yang sh...@linux.intel.com
  ---
  
   drivers/pci/msi.c        |    5 +++--
   include/linux/pci_regs.h |    1 +
   2 files changed, 4 insertions(+), 2 deletions(-)
  
  diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
  index 69b7be3..095634e 100644
  --- a/drivers/pci/msi.c
  +++ b/drivers/pci/msi.c
  @@ -158,8 +158,9 @@ static u32 __msix_mask_irq(struct msi_desc *desc, u32 flag)
  
  u32 mask_bits = desc->masked;
  unsigned offset = desc->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE +
  
  PCI_MSIX_ENTRY_VECTOR_CTRL;
  
  -   mask_bits &= ~1;
  -   mask_bits |= flag;
  +   mask_bits &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT;
  +   if (flag)
  +   mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT;
  
  writel(mask_bits, desc->mask_base + offset);
  
  return mask_bits;
  
  diff --git a/include/linux/pci_regs.h b/include/linux/pci_regs.h
  index acfc224..ff51632 100644
  --- a/include/linux/pci_regs.h
  +++ b/include/linux/pci_regs.h
  @@ -313,6 +313,7 @@
  
   #define  PCI_MSIX_ENTRY_UPPER_ADDR 4
   #define  PCI_MSIX_ENTRY_DATA   8
   #define  PCI_MSIX_ENTRY_VECTOR_CTRL    12
  
  +#define   PCI_MSIX_ENTRY_CTRL_MASKBIT  1
  
   /* CompactPCI Hotswap Register */
 
 Applied 1/7 and 2/7 to my linux-next tree, thanks.
 
 If it's easier to push them both through the kvm tree let me know; you
 can just add my acked-by in that case.

Thanks Jesse!

Avi, which way do you prefer?
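
For reference, the masking logic from 2/7 can be sketched as a standalone
function. This is only an illustration: msix_update_mask_bits() is a
hypothetical name, and the real code lives in __msix_mask_irq() and also
writes the result to the device with writel().

```c
#include <stdint.h>

/* Same value as the define added to pci_regs.h in this patch. */
#define PCI_MSIX_ENTRY_CTRL_MASKBIT 1

/*
 * Hypothetical standalone version of the patched logic: clear the mask
 * bit, then set it again only if the caller asked for masking. All other
 * bits of the vector control word are preserved.
 */
static uint32_t msix_update_mask_bits(uint32_t mask_bits, uint32_t flag)
{
	mask_bits &= ~(uint32_t)PCI_MSIX_ENTRY_CTRL_MASKBIT;
	if (flag)
		mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT;
	return mask_bits;
}
```

Compared with the old "mask_bits |= flag", any non-zero flag now sets only the
mask bit, so a caller passing a value other than 0 or 1 can no longer disturb
the other bits.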

--
regards
Yang, Sheng