Re: [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-03 Thread Al Viro
On Mon, Apr 27, 2009 at 02:33:34PM -0400, Gregory Haskins wrote:
 + /* We re-use eventfd for irqfd */
 + fd = sys_eventfd2(0, 0);
 + if (fd  0) {
 + ret = fd;
 + goto fail;
 + }
 +
 + /* We maintain a reference to eventfd for the irqfd lifetime */
 + file = eventfd_fget(fd);
 + if (IS_ERR(file)) {
 + ret = PTR_ERR(file);
 + goto fail;
 + }
 +
 + irqfd-file = file;

This is just plain wrong.  You have no promise whatsoever that caller of
that sucker won't race with e.g. dup2().  IOW, you can't assume that
file will be of the expected kind.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-03 Thread Avi Kivity

Michael S. Tsirkin wrote:

On Mon, Apr 27, 2009 at 02:33:34PM -0400, Gregory Haskins wrote:
  

This allows an eventfd to be registered as an irq source with a guest.  Any
signaling operation on the eventfd (via userspace or kernel) will inject
the registered GSI at the next available window.

Signed-off-by: Gregory Haskins ghask...@novell.com



If we ever want to use this with e.g. MSI-X emulation in guest, and want
to be stricly compliant to MSI-X, we'll need a way for guest to mask
interrupts, and for host to report that a masked interrupt is pending.
Ideally, all this will be doable with a couple of mmapped pages to avoid
vmexits/system calls.

  


We could do this in two ways:

- move msix entry emulation into the kernel
- require the device to support replacing its irqfd, and juggle it like so:
  - guest disables msi
  - replace device model fd with eventfd belonging to us
  - when the device fires its eventfd, set the irq pending bit
  - guest enables msi
   - if the pending bit is set, fire the interrupt?
   - replace device model fd with the real irqfd

I'm leaning towards the latter, though it's not an easy call.


+static void
+irqfd_inject(struct work_struct *work)
+{
+   struct _irqfd *irqfd = container_of(work, struct _irqfd, work);
+   struct kvm *kvm = irqfd-kvm;
+
+   mutex_lock(kvm-lock);
+   kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd-gsi, 1);
+   kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd-gsi, 0);
+   mutex_unlock(kvm-lock);



This will do weird stuff (deliver the irq twice) if the irq is
MSI/MSI-X. I know this was discussed already and is a temporary
shortcut, but maybe add a comment that we really want kvm_toggle_irq,
so that we won't forget?
  


If so, that's a bug.  MSI should ignore kvm_set_irq(..., 0).

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-03 Thread Davide Libenzi
On Sun, 3 May 2009, Al Viro wrote:

 On Mon, Apr 27, 2009 at 02:33:34PM -0400, Gregory Haskins wrote:
  +   /* We re-use eventfd for irqfd */
  +   fd = sys_eventfd2(0, 0);
  +   if (fd  0) {
  +   ret = fd;
  +   goto fail;
  +   }
  +
  +   /* We maintain a reference to eventfd for the irqfd lifetime */
  +   file = eventfd_fget(fd);
  +   if (IS_ERR(file)) {
  +   ret = PTR_ERR(file);
  +   goto fail;
  +   }
  +
  +   irqfd-file = file;
 
 This is just plain wrong.  You have no promise whatsoever that caller of
 that sucker won't race with e.g. dup2().  IOW, you can't assume that
 file will be of the expected kind.

The eventfd_fget() checks for the file_operations pointer, before 
returning the file*, and fails if the fd in not an eventfd. Or you have 
other concerns?



- Davide


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-03 Thread Al Viro
On Sun, May 03, 2009 at 11:07:26AM -0700, Davide Libenzi wrote:
 On Sun, 3 May 2009, Al Viro wrote:
  On Mon, Apr 27, 2009 at 02:33:34PM -0400, Gregory Haskins wrote:
   + /* We re-use eventfd for irqfd */
   + fd = sys_eventfd2(0, 0);
   + if (fd  0) {
   + ret = fd;
   + goto fail;
   + }
   +
   + /* We maintain a reference to eventfd for the irqfd lifetime */
   + file = eventfd_fget(fd);
   + if (IS_ERR(file)) {
   + ret = PTR_ERR(file);
   + goto fail;
   + }
   +
   + irqfd-file = file;
  
  This is just plain wrong.  You have no promise whatsoever that caller of
  that sucker won't race with e.g. dup2().  IOW, you can't assume that
  file will be of the expected kind.
 
 The eventfd_fget() checks for the file_operations pointer, before 
 returning the file*, and fails if the fd in not an eventfd. Or you have 
 other concerns?

OK, but... it's still wrong.  Descriptor numbers are purely for interaction
with userland; using them that way violates very general race-prevention
rules, even if you do paper over the worst of consequences with check in
eventfd_fget().

General rules:
* descriptor you've generated is fit only for return to userland;
* descriptor you've got from userland is fit only for *single*
fget() or equivalent, unless you are one of the core syscalls manipulating
the descriptor table itself (dup2, etc.)
* once file is installed in descriptor table, you'd better be past
the last failure exit; sys_close() on cleanup path is not acceptable.
That's what reserving descriptors is for.

IOW, the sane solution would be to export something that returns your
struct file *.  And stop playing with fd completely.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-03 Thread Michael S. Tsirkin
On Sun, May 03, 2009 at 07:59:40PM +0300, Avi Kivity wrote:
 Michael S. Tsirkin wrote:
 On Mon, Apr 27, 2009 at 02:33:34PM -0400, Gregory Haskins wrote:
   
 This allows an eventfd to be registered as an irq source with a guest.  Any
 signaling operation on the eventfd (via userspace or kernel) will inject
 the registered GSI at the next available window.

 Signed-off-by: Gregory Haskins ghask...@novell.com
 

 If we ever want to use this with e.g. MSI-X emulation in guest, and want
 to be stricly compliant to MSI-X, we'll need a way for guest to mask
 interrupts, and for host to report that a masked interrupt is pending.
 Ideally, all this will be doable with a couple of mmapped pages to avoid
 vmexits/system calls.

   

 We could do this in two ways:

 - move msix entry emulation into the kernel

It's not too bad IMO: MSIX is just a table with a list
of vectors, you check the mask bit on each interrupt,
if masked mark vector pending and poll until unmasked.

 - require the device to support replacing its irqfd, and juggle it like so:
   - guest disables msi
   - replace device model fd with eventfd belonging to us
   - when the device fires its eventfd, set the irq pending bit
   - guest enables msi
- if the pending bit is set, fire the interrupt?
- replace device model fd with the real irqfd

Looks like a lot of code. No?

 I'm leaning towards the latter, though it's not an easy call.

Actually there's a third option: add KVM_MASK_IRQ, KVM_UNMASK_IRQ ioctls
which will block/unblock guest from getting interrupt on this irq,
whatever the source.  Interrupts are queued in kernel while masked. A
third ioctl KVM_PENDING_IRQS will return the status for a set if IRQs.
qemu would call these ioctls when guest edits the MSIX vector control or
reads the pending bit array.

 +static void
 +irqfd_inject(struct work_struct *work)
 +{
 +   struct _irqfd *irqfd = container_of(work, struct _irqfd, work);
 +   struct kvm *kvm = irqfd-kvm;
 +
 +   mutex_lock(kvm-lock);
 +   kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd-gsi, 1);
 +   kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd-gsi, 0);
 +   mutex_unlock(kvm-lock);
 

 This will do weird stuff (deliver the irq twice) if the irq is
 MSI/MSI-X. I know this was discussed already and is a temporary
 shortcut, but maybe add a comment that we really want kvm_toggle_irq,
 so that we won't forget?
   

 If so, that's a bug.  MSI should ignore kvm_set_irq(..., 0).

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-03 Thread Avi Kivity

Michael S. Tsirkin wrote:

On Sun, May 03, 2009 at 07:59:40PM +0300, Avi Kivity wrote:
  

Michael S. Tsirkin wrote:


On Mon, Apr 27, 2009 at 02:33:34PM -0400, Gregory Haskins wrote:
  
  

This allows an eventfd to be registered as an irq source with a guest.  Any
signaling operation on the eventfd (via userspace or kernel) will inject
the registered GSI at the next available window.

Signed-off-by: Gregory Haskins ghask...@novell.com



If we ever want to use this with e.g. MSI-X emulation in guest, and want
to be stricly compliant to MSI-X, we'll need a way for guest to mask
interrupts, and for host to report that a masked interrupt is pending.
Ideally, all this will be doable with a couple of mmapped pages to avoid
vmexits/system calls.

  
  

We could do this in two ways:

- move msix entry emulation into the kernel



It's not too bad IMO: MSIX is just a table with a list
of vectors, you check the mask bit on each interrupt,
if masked mark vector pending and poll until unmasked.
  


Right, but it's more and more core, and more and more bugs.  Bugs in the 
kernel are more expensive to fix and get to users.


  

- require the device to support replacing its irqfd, and juggle it like so:
  - guest disables msi
  - replace device model fd with eventfd belonging to us
  - when the device fires its eventfd, set the irq pending bit
  - guest enables msi
   - if the pending bit is set, fire the interrupt?
   - replace device model fd with the real irqfd



Looks like a lot of code. No?
  


We'll need exactly the same code if we do it in the kernel.  The only 
addition is replacing the fd.



I'm leaning towards the latter, though it's not an easy call.



Actually there's a third option: add KVM_MASK_IRQ, KVM_UNMASK_IRQ ioctls
which will block/unblock guest from getting interrupt on this irq,
whatever the source.  Interrupts are queued in kernel while masked. A
third ioctl KVM_PENDING_IRQS will return the status for a set if IRQs.
qemu would call these ioctls when guest edits the MSIX vector control or
reads the pending bit array.
  


I think this is the best option.

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-03 Thread Michael S. Tsirkin
On Sun, May 03, 2009 at 10:17:16PM +0300, Avi Kivity wrote:
 Actually there's a third option: add KVM_MASK_IRQ, KVM_UNMASK_IRQ ioctls
 which will block/unblock guest from getting interrupt on this irq,
 whatever the source.  Interrupts are queued in kernel while masked. A
 third ioctl KVM_PENDING_IRQS will return the status for a set if IRQs.
 qemu would call these ioctls when guest edits the MSIX vector control or
 reads the pending bit array.
   

 I think this is the best option.

Sounds good.

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-03 Thread Davide Libenzi
On Sun, 3 May 2009, Al Viro wrote:

 IOW, the sane solution would be to export something that returns your
 struct file *.  And stop playing with fd completely.

This builds but it's not tested at all.

- Make all the work of the old anon_inode_getfd(), done by a new 
  anon_inode_getfile(), with anon_inode_getfd() using its services

- Make all the work done by sys_eventfd(), done by a new 
  eventfd_file_create() (which in turn uses anon_inode_getfile()), with 
  sys_eventfd() using its services

IRQfd can use eventfd_file_create(), fget(), get_unused_fd_flags() and 
fd_install() just before returning.
Is that what you had in mind?



- Davide


---
 fs/anon_inodes.c|   68 +---
 fs/eventfd.c|   44 +---
 include/linux/anon_inodes.h |3 +
 include/linux/eventfd.h |6 +++
 4 files changed, 92 insertions(+), 29 deletions(-)

Index: linux-2.6.mod/fs/anon_inodes.c
===
--- linux-2.6.mod.orig/fs/anon_inodes.c 2009-05-03 12:21:09.0 -0700
+++ linux-2.6.mod/fs/anon_inodes.c  2009-05-03 12:54:02.0 -0700
@@ -64,28 +64,24 @@ static const struct dentry_operations an
  *
  * Creates a new file by hooking it on a single inode. This is useful for files
  * that do not need to have a full-fledged inode in order to operate correctly.
- * All the files created with anon_inode_getfd() will share a single inode,
+ * All the files created with anon_inode_getfile() will share a single inode,
  * hence saving memory and avoiding code duplication for the file/inode/dentry
- * setup.  Returns new descriptor or -error.
+ * setup.  Returns the newly created file* or error.
  */
-int anon_inode_getfd(const char *name, const struct file_operations *fops,
-void *priv, int flags)
+struct file *anon_inode_getfile(const char *name,
+   const struct file_operations *fops,
+   void *priv, int flags)
 {
struct qstr this;
struct dentry *dentry;
struct file *file;
-   int error, fd;
+   int error;
 
if (IS_ERR(anon_inode_inode))
-   return -ENODEV;
+   return ERR_PTR(-ENODEV);
 
if (fops-owner  !try_module_get(fops-owner))
-   return -ENOENT;
-
-   error = get_unused_fd_flags(flags);
-   if (error  0)
-   goto err_module;
-   fd = error;
+   return ERR_PTR(-ENOENT);
 
/*
 * Link the inode to a directory entry by creating a unique name
@@ -97,7 +93,7 @@ int anon_inode_getfd(const char *name, c
this.hash = 0;
dentry = d_alloc(anon_inode_mnt-mnt_sb-s_root, this);
if (!dentry)
-   goto err_put_unused_fd;
+   goto err_module;
 
/*
 * We know the anon_inode inode count is always greater than zero,
@@ -123,16 +119,54 @@ int anon_inode_getfd(const char *name, c
file-f_version = 0;
file-private_data = priv;
 
+   return file;
+
+err_dput:
+   dput(dentry);
+err_module:
+   module_put(fops-owner);
+   return ERR_PTR(error);
+}
+EXPORT_SYMBOL_GPL(anon_inode_getfile);
+
+/**
+ * anon_inode_getfd - creates a new file instance by hooking it up to an
+ *anonymous inode, and a dentry that describe the class
+ *of the file
+ *
+ * @name:[in]name of the class of the new file
+ * @fops:[in]file operations for the new file
+ * @priv:[in]private data for the new file (will be file's 
private_data)
+ * @flags:   [in]flags
+ *
+ * Creates a new file by hooking it on a single inode. This is useful for files
+ * that do not need to have a full-fledged inode in order to operate correctly.
+ * All the files created with anon_inode_getfd() will share a single inode,
+ * hence saving memory and avoiding code duplication for the file/inode/dentry
+ * setup.  Returns new descriptor or -error.
+ */
+int anon_inode_getfd(const char *name, const struct file_operations *fops,
+void *priv, int flags)
+{
+   int error, fd;
+   struct file *file;
+
+   error = get_unused_fd_flags(flags);
+   if (error  0)
+   return error;
+   fd = error;
+
+   file = anon_inode_getfile(name, fops, priv, flags);
+   if (IS_ERR(file)) {
+   error = PTR_ERR(file);
+   goto err_put_unused_fd;
+   }
fd_install(fd, file);
 
return fd;
 
-err_dput:
-   dput(dentry);
 err_put_unused_fd:
put_unused_fd(fd);
-err_module:
-   module_put(fops-owner);
return error;
 }
 EXPORT_SYMBOL_GPL(anon_inode_getfd);
Index: linux-2.6.mod/fs/eventfd.c
===
--- linux-2.6.mod.orig/fs/eventfd.c 2009-05-03 12:21:09.0 -0700
+++ linux-2.6.mod/fs/eventfd.c  2009-05-03 

Re: [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-04-30 Thread Michael S. Tsirkin
On Mon, Apr 27, 2009 at 02:33:34PM -0400, Gregory Haskins wrote:
 This allows an eventfd to be registered as an irq source with a guest.  Any
 signaling operation on the eventfd (via userspace or kernel) will inject
 the registered GSI at the next available window.
 
 Signed-off-by: Gregory Haskins ghask...@novell.com

If we ever want to use this with e.g. MSI-X emulation in guest, and want
to be stricly compliant to MSI-X, we'll need a way for guest to mask
interrupts, and for host to report that a masked interrupt is pending.
Ideally, all this will be doable with a couple of mmapped pages to avoid
vmexits/system calls.

 +static void
 +irqfd_inject(struct work_struct *work)
 +{
 + struct _irqfd *irqfd = container_of(work, struct _irqfd, work);
 + struct kvm *kvm = irqfd-kvm;
 +
 + mutex_lock(kvm-lock);
 + kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd-gsi, 1);
 + kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd-gsi, 0);
 + mutex_unlock(kvm-lock);

This will do weird stuff (deliver the irq twice) if the irq is
MSI/MSI-X. I know this was discussed already and is a temporary
shortcut, but maybe add a comment that we really want kvm_toggle_irq,
so that we won't forget?

 +}
 +

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-04-27 Thread Gregory Haskins
This allows an eventfd to be registered as an irq source with a guest.  Any
signaling operation on the eventfd (via userspace or kernel) will inject
the registered GSI at the next available window.

Signed-off-by: Gregory Haskins ghask...@novell.com
---

 arch/x86/kvm/Makefile|2 -
 arch/x86/kvm/x86.c   |1 
 include/linux/kvm.h  |7 ++
 include/linux/kvm_host.h |4 +
 virt/kvm/irqfd.c |  158 ++
 virt/kvm/kvm_main.c  |   11 +++
 6 files changed, 182 insertions(+), 1 deletions(-)
 create mode 100644 virt/kvm/irqfd.c

diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index b43c4ef..d5fff51 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -3,7 +3,7 @@
 #
 
 common-objs = $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o \
-coalesced_mmio.o irq_comm.o)
+coalesced_mmio.o irq_comm.o irqfd.o)
 ifeq ($(CONFIG_KVM_TRACE),y)
 common-objs += $(addprefix ../../../virt/kvm/, kvm_trace.o)
 endif
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ab1fdac..0773433 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1027,6 +1027,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_REINJECT_CONTROL:
case KVM_CAP_IRQ_INJECT_STATUS:
case KVM_CAP_ASSIGN_DEV_IRQ:
+   case KVM_CAP_IRQFD:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 3db5d8d..0e594f2 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -415,6 +415,7 @@ struct kvm_trace_rec {
 #define KVM_CAP_ASSIGN_DEV_IRQ 29
 /* Another bug in KVM_SET_USER_MEMORY_REGION fixed: */
 #define KVM_CAP_JOIN_MEMORY_REGIONS_WORKS 30
+#define KVM_CAP_IRQFD 31
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -454,6 +455,11 @@ struct kvm_irq_routing {
 
 #endif
 
+struct kvm_irqfd {
+   __u32 gsi;
+   __u32 flags;
+};
+
 /*
  * ioctls for VM fds
  */
@@ -498,6 +504,7 @@ struct kvm_irq_routing {
 #define KVM_ASSIGN_SET_MSIX_ENTRY \
_IOW(KVMIO, 0x74, struct kvm_assigned_msix_entry)
 #define KVM_DEASSIGN_DEV_IRQ   _IOW(KVMIO, 0x75, struct kvm_assigned_irq)
+#define KVM_IRQFD   _IOW(KVMIO, 0x76, struct kvm_irqfd)
 
 /*
  * ioctls for vcpu fds
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 095ebb6..6a8d1c1 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -134,6 +134,7 @@ struct kvm {
struct list_head vm_list;
struct kvm_io_bus mmio_bus;
struct kvm_io_bus pio_bus;
+   struct list_head irqfds;
struct kvm_vm_stat stat;
struct kvm_arch arch;
atomic_t users_count;
@@ -524,4 +525,7 @@ static inline void kvm_free_irq_routing(struct kvm *kvm) {}
 
 #endif
 
+int kvm_irqfd(struct kvm *kvm, int gsi, int flags);
+void kvm_irqfd_release(struct kvm *kvm);
+
 #endif
diff --git a/virt/kvm/irqfd.c b/virt/kvm/irqfd.c
new file mode 100644
index 000..4fd2bea
--- /dev/null
+++ b/virt/kvm/irqfd.c
@@ -0,0 +1,158 @@
+/*
+ * irqfd: Allows an eventfd to be used to inject an interrupt to the guest
+ *
+ * Credit goes to Avi Kivity for the original idea.
+ *
+ * Copyright 2009 Novell.  All Rights Reserved.
+ *
+ * Author:
+ * Gregory Haskins ghask...@novell.com
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+#include linux/kvm_host.h
+#include linux/eventfd.h
+#include linux/workqueue.h
+#include linux/syscalls.h
+#include linux/wait.h
+#include linux/poll.h
+#include linux/file.h
+#include linux/list.h
+
+struct _irqfd {
+   struct kvm   *kvm;
+   int   gsi;
+   struct file  *file;
+   struct list_head  list;
+   poll_tablept;
+   wait_queue_head_t*wqh;
+   wait_queue_t  wait;
+   struct work_structwork;
+};
+
+static void
+irqfd_inject(struct work_struct *work)
+{
+   struct _irqfd *irqfd = container_of(work, struct _irqfd, work);
+   struct kvm *kvm = irqfd-kvm;
+
+   mutex_lock(kvm-lock);
+   kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd-gsi, 1);
+   kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd-gsi, 0);
+   mutex_unlock(kvm-lock);
+}
+
+static int
+irqfd_wakeup(wait_queue_t *wait, unsigned mode, int