Re: [RFC PATCH v2 00/19] virtual-bus

2009-06-05 Thread Gregory Haskins
Hi Rusty,

Rusty Russell wrote:
 On Fri, 5 Jun 2009 04:19:17 am Gregory Haskins wrote:
   
 Avi Kivity wrote:
 
 Gregory Haskins wrote:
 One idea is similar to signalfd() or eventfd()
   
 And thus the kvm-eventfd (irqfd/iosignalfd) interface project was born.
 ;)
 

 The lguest patch queue already has such an interface :)

Cool!  Ultimately I think it will be easier if both lguest+kvm support
the same eventfd notion so this is good you are already moving in the
same direction.

 And I have a partially complete in-kernel virtio_pci patch with the same 
 trick.
   

I thought lguest didn't use pci?  Or do you just mean that you have an
in-kernel virtio-net for lguest?

As a follow up question, I wonder if we can easily port that to vbus so
that it will work in both lguest and kvm? (note to self: push a skeleton
example today)

 I switched from kernel created eventfd to userspace passes in eventfd
 after a while though; it lets you connect multiple virtqueues to a single fd
 if you want.
   

Yeah, actually we switched to that model, too.  Aside from the
limitation you point out, there were some problems that Al Viro had
raised trying to do it in-kernel w.r.t. fd abuse.

 Combined with a minor change to allow any process with access to the lguest fd
 to queue interrupts, this allowed lguest to move to a thread-per-virtqueue
 model which was a significant speedup as well as nice code reduction.
   

Yep, that was one of my findings on venet as well so I was looking
forward to trying to get virtio-net to do the same.
 Here's the relevant kernel patch for reading.
   

Thanks Rusty!  Will take a look.
 Thanks!
 Rusty.

 lguest: use eventfds for device notification

 Currently, when a Guest wants to perform I/O it calls LHCALL_NOTIFY with
 an address: the main Launcher process returns with this address, and figures
 out what device to run.

 A far nicer model is to let processes bind an eventfd to an address: if we
 find one, we simply signal the eventfd.

 Signed-off-by: Rusty Russell ru...@rustcorp.com.au
 Cc: Davide Libenzi davi...@xmailserver.org
 ---
  drivers/lguest/Kconfig  |2 -
  drivers/lguest/core.c   |8 ++--
  drivers/lguest/lg.h |9 
  drivers/lguest/lguest_user.c|   73 
 
  include/linux/lguest_launcher.h |1 
  5 files changed, 89 insertions(+), 4 deletions(-)

 diff --git a/drivers/lguest/Kconfig b/drivers/lguest/Kconfig
 --- a/drivers/lguest/Kconfig
 +++ b/drivers/lguest/Kconfig
 @@ -1,6 +1,6 @@
  config LGUEST
  tristate "Linux hypervisor example code"
 - depends on X86_32 && EXPERIMENTAL && !X86_PAE && FUTEX
 + depends on X86_32 && EXPERIMENTAL && !X86_PAE && EVENTFD
   

Note to self:  we probably need a similar line in KVM now.

   select HVC_DRIVER
   ---help---
 This is a very simple module which allows you to run
 diff --git a/drivers/lguest/core.c b/drivers/lguest/core.c
 --- a/drivers/lguest/core.c
 +++ b/drivers/lguest/core.c
 @@ -198,9 +198,11 @@ int run_guest(struct lg_cpu *cpu, unsign
   /* It's possible the Guest did a NOTIFY hypercall to the
* Launcher, in which case we return from the read() now. */
   if (cpu->pending_notify) {
  - if (put_user(cpu->pending_notify, user))
  - return -EFAULT;
  - return sizeof(cpu->pending_notify);
  + if (!send_notify_to_eventfd(cpu)) {
  + if (put_user(cpu->pending_notify, user))
  + return -EFAULT;
  + return sizeof(cpu->pending_notify);
 + }
   }
  
   /* Check for signals */
 diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
 --- a/drivers/lguest/lg.h
 +++ b/drivers/lguest/lg.h
 @@ -82,6 +82,11 @@ struct lg_cpu {
   struct lg_cpu_arch arch;
  };
  
 +struct lg_eventfds {
 + unsigned long addr;
 + struct file *event;
 +};
 +
  /* The private info the thread maintains about the guest. */
  struct lguest
  {
 @@ -102,6 +107,9 @@ struct lguest
   unsigned int stack_pages;
   u32 tsc_khz;
  
 + unsigned int num_eventfds;
 + struct lg_eventfds *eventfds;
 +
   /* Dead? */
   const char *dead;
  };
 @@ -152,6 +160,7 @@ void setup_default_idt_entries(struct lg
  void copy_traps(const struct lg_cpu *cpu, struct desc_struct *idt,
   const unsigned long *def);
  void guest_set_clockevent(struct lg_cpu *cpu, unsigned long delta);
 +bool send_notify_to_eventfd(struct lg_cpu *cpu);
  void init_clockdev(struct lg_cpu *cpu);
  bool check_syscall_vector(struct lguest *lg);
  int init_interrupts(void);
 diff --git a/drivers/lguest/lguest_user.c b/drivers/lguest/lguest_user.c
 --- a/drivers/lguest/lguest_user.c
 +++ b/drivers/lguest/lguest_user.c
 @@ -7,6 +7,8 @@
   #include <linux/miscdevice.h>
   #include <linux/fs.h>
   #include <linux/sched.h>
 

Re: [RFC PATCH v2 00/19] virtual-bus

2009-06-05 Thread Avi Kivity

Gregory Haskins wrote:

@@ -1,6 +1,6 @@
 config LGUEST
   tristate "Linux hypervisor example code"
-   depends on X86_32 && EXPERIMENTAL && !X86_PAE && FUTEX
+   depends on X86_32 && EXPERIMENTAL && !X86_PAE && EVENTFD
  



Note to self:  we probably need a similar line in KVM now.

  


'select EVENTFD' is more appropriate.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [RFC PATCH v2 00/19] virtual-bus

2009-06-05 Thread Rusty Russell
On Fri, 5 Jun 2009 09:26:48 pm Gregory Haskins wrote:
 Hi Rusty,

 Rusty Russell wrote:
  On Fri, 5 Jun 2009 04:19:17 am Gregory Haskins wrote:
  Avi Kivity wrote:
  Gregory Haskins wrote:
  One idea is similar to signalfd() or eventfd()
 
  And thus the kvm-eventfd (irqfd/iosignalfd) interface project was
  born. ;)
 
  The lguest patch queue already has such an interface :)

 Cool!  Ultimately I think it will be easier if both lguest+kvm support
 the same eventfd notion so this is good you are already moving in the
 same direction.

Not really; lguest doesn't do PCI.

  And I have a partially complete in-kernel virtio_pci patch with the same
  trick.

 I thought lguest didn't use pci?  Or do you just mean that you have an
 in-kernel virtio-net for lguest?

No, this was for kvm.  Sorry for the confusion.

 Other than the potential rcu issues that Paul already addressed, looks
 good.  FWIW: this looks like what we are calling iosignalfd on the kvm
 land (unless I am misunderstanding).  Do you have the equivalent of
 irqfd going the other way?

Yes; lguest uses write() (offset indicates cpu #) rather than ioctls, but 
anyone can do the LHREQ_IRQ write to queue an interrupt for delivery.

So the threads just get the same /dev/lguest fd and it's simple.

Thanks!
Rusty.


Re: [RFC PATCH v2 00/19] virtual-bus

2009-06-05 Thread Gregory Haskins
Rusty Russell wrote:
 On Fri, 5 Jun 2009 09:26:48 pm Gregory Haskins wrote:
   
 Hi Rusty,

 Rusty Russell wrote:
 
 On Fri, 5 Jun 2009 04:19:17 am Gregory Haskins wrote:
   
 Avi Kivity wrote:
 
 Gregory Haskins wrote:
 One idea is similar to signalfd() or eventfd()
   
 And thus the kvm-eventfd (irqfd/iosignalfd) interface project was
 born. ;)
 
 The lguest patch queue already has such an interface :)
   
 Cool!  Ultimately I think it will be easier if both lguest+kvm support
 the same eventfd notion so this is good you are already moving in the
 same direction.
 

 Not really; lguest doesn't do PCI.
   

That's ok.  I see these eventfd interfaces as somewhat orthogonal to
PCI.  I.e. if both lguest and kvm have an eventfd mechanism for signaling
in both directions (e.g. interrupts and io), it would make it easier to
support the kind of thing I am striving for with a unified backend.
That is: one in-kernel virtio-net that works in both (or even many) HV
environments.  I see that as a higher layer abstraction than PCI, per se.
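
To sketch what that contract could look like (the structure and names below are
invented purely for illustration -- this is not vbus code), the backend only
needs to see a pair of eventfd-style handles, with a per-hypervisor connector
wiring them up:

#include <linux/eventfd.h>
#include <linux/fs.h>

/* Hypothetical hypervisor-neutral backend hooks: the device model signals
 * and is signalled through eventfds only; kvm (irqfd/iosignalfd) or lguest
 * (LHREQ_* writes) specifics live in a separate connector. */
struct hv_neutral_backend {
	struct file *kick;	/* signalled by the connector when the guest has work */
	struct file *call;	/* signalled by the backend to inject a guest interrupt */
};

static void backend_tx_complete(struct hv_neutral_backend *be)
{
	/* Identical call path no matter which hypervisor is attached. */
	eventfd_signal(be->call, 1);
}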
   
 And I have a partially complete in-kernel virtio_pci patch with the same
 trick.
   
 I thought lguest didn't use pci?  Or do you just mean that you have an
 in-kernel virtio-net for lguest?
 

 No, this was for kvm.  Sorry for the confusion.
   

Ah, sorry.  Well, if it's in any kind of shape to see the light of day,
please forward it over.  Perhaps Michael and I can craft it into a
working solution.

   
 Other than the potential rcu issues that Paul already addressed, looks
 good.  FWIW: this looks like what we are calling iosignalfd on the kvm
 land (unless I am misunderstanding).  Do you have the equivalent of
 irqfd going the other way?
 

 Yes; lguest uses write() (offset indicates cpu #) rather than ioctls, but 
 anyone can do the LHREQ_IRQ write to queue an interrupt for delivery.

 So the threads just get the same /dev/lguest fd and it's simple.
   

Ah, ok.  That's workable, too.  (This kind of detail would be buried in
the lguest connector for vbus anyway, so it doesn't have to have a
uniform eventfd_signal() interface to work.  The fd concept alone is
sufficiently flexible.)

Thanks Rusty,
-Greg





Re: [RFC PATCH v2 00/19] virtual-bus

2009-06-05 Thread Rusty Russell
On Fri, 5 Jun 2009 03:00:10 pm Paul E. McKenney wrote:
 On Fri, Jun 05, 2009 at 02:25:01PM +0930, Rusty Russell wrote:
  +   /* lg->eventfds is RCU-protected */
  +   preempt_disable();

 Suggest changing to rcu_read_lock() to match the synchronize_rcu().

Ah yes, much better.  As I was implementing it I warred with myself since
lguest aims for simplicity above all else.  But since we only ever add things
to the array, RCU probably is simpler.

  +   for (i = 0; i < cpu->lg->num_eventfds; i++) {
  +   if (cpu->lg->eventfds[i].addr == cpu->pending_notify) {
  +   eventfd_signal(cpu->lg->eventfds[i].event, 1);

 Shouldn't this be something like the following?

   p = rcu_dereference(cpu->lg->eventfds);
   if (p[i].addr == cpu->pending_notify) {
   eventfd_signal(p[i].event, 1);

Hmm, need to read num_eventfds first, too.  It doesn't matter if we get the old
->num_eventfds and the new ->eventfds, but the other way around would be bad.

Here's the inter-diff:

diff --git a/drivers/lguest/lguest_user.c b/drivers/lguest/lguest_user.c
--- a/drivers/lguest/lguest_user.c
+++ b/drivers/lguest/lguest_user.c
@@ -39,18 +39,24 @@ static int break_guest_out(struct lg_cpu
 
 bool send_notify_to_eventfd(struct lg_cpu *cpu)
 {
-   unsigned int i;
+   unsigned int i, num;
+   struct lg_eventfds *eventfds;
+
+   /* Make sure we grab the total number before accessing the array. */
+   num = cpu->lg->num_eventfds;
+   rmb();
 
	/* lg->eventfds is RCU-protected */
	rcu_read_lock();
-   for (i = 0; i < cpu->lg->num_eventfds; i++) {
-   if (cpu->lg->eventfds[i].addr == cpu->pending_notify) {
-   eventfd_signal(cpu->lg->eventfds[i].event, 1);
+   eventfds = rcu_dereference(cpu->lg->eventfds);
+   for (i = 0; i < num; i++) {
+   if (eventfds[i].addr == cpu->pending_notify) {
+   eventfd_signal(eventfds[i].event, 1);
	cpu->pending_notify = 0;
	break;
	}
	}
-   preempt_enable();
+   rcu_read_unlock();
	return cpu->pending_notify == 0;
 }
 
Thanks!
Rusty.


Re: [RFC PATCH v2 00/19] virtual-bus

2009-06-05 Thread Paul E. McKenney
On Sat, Jun 06, 2009 at 12:25:57AM +0930, Rusty Russell wrote:
 On Fri, 5 Jun 2009 03:00:10 pm Paul E. McKenney wrote:
  On Fri, Jun 05, 2009 at 02:25:01PM +0930, Rusty Russell wrote:
   + /* lg->eventfds is RCU-protected */
   + preempt_disable();
 
  Suggest changing to rcu_read_lock() to match the synchronize_rcu().
 
 Ah yes, much better.  As I was implementing it I warred with myself since
 lguest aims for simplicity above all else.  But since we only ever add things
 to the array, RCU probably is simpler.

;-)

   + for (i = 0; i < cpu->lg->num_eventfds; i++) {
   + if (cpu->lg->eventfds[i].addr == cpu->pending_notify) {
   + eventfd_signal(cpu->lg->eventfds[i].event, 1);
 
  Shouldn't this be something like the following?
 
  p = rcu_dereference(cpu->lg->eventfds);
  if (p[i].addr == cpu->pending_notify) {
  eventfd_signal(p[i].event, 1);
 
 Hmm, need to read num_eventfds first, too.  It doesn't matter if we get the old
 ->num_eventfds and the new ->eventfds, but the other way around would be bad.

Yep!!!  ;-)

 Here's the inter-diff:
 
 diff --git a/drivers/lguest/lguest_user.c b/drivers/lguest/lguest_user.c
 --- a/drivers/lguest/lguest_user.c
 +++ b/drivers/lguest/lguest_user.c
 @@ -39,18 +39,24 @@ static int break_guest_out(struct lg_cpu
 
  bool send_notify_to_eventfd(struct lg_cpu *cpu)
  {
 - unsigned int i;
 + unsigned int i, num;
 + struct lg_eventfds *eventfds;
 +
 + /* Make sure we grab the total number before accessing the array. */
  + num = cpu->lg->num_eventfds;
  + rmb();
  
   /* lg->eventfds is RCU-protected */
   rcu_read_lock();
  - for (i = 0; i < cpu->lg->num_eventfds; i++) {
  - if (cpu->lg->eventfds[i].addr == cpu->pending_notify) {
  - eventfd_signal(cpu->lg->eventfds[i].event, 1);
  + eventfds = rcu_dereference(cpu->lg->eventfds);
  + for (i = 0; i < num; i++) {
  + if (eventfds[i].addr == cpu->pending_notify) {
  + eventfd_signal(eventfds[i].event, 1);
   cpu->pending_notify = 0;
   break;
   }
   }
  - preempt_enable();
  + rcu_read_unlock();
   return cpu->pending_notify == 0;
  }

It is possible to get rid of the rmb() and wmb() as well, doing
something like the following:

struct lg_eventfds_num {
	unsigned int n;
	struct lg_eventfds a[0];
};

Then the rcu_dereference() gets you a pointer to a struct lg_eventfds_num,
which has the array and its length in guaranteed synchronization without
the need for barriers.
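
For concreteness, a sketch of how send_notify_to_eventfd() might then read,
assuming lg->eventfds is repointed at the combined structure (illustrative
only, not a tested patch):

bool send_notify_to_eventfd(struct lg_cpu *cpu)
{
	unsigned int i;
	struct lg_eventfds_num *p;

	rcu_read_lock();
	/* A single dereference yields a consistent count + array: no barriers. */
	p = rcu_dereference(cpu->lg->eventfds);
	for (i = 0; p && i < p->n; i++) {
		if (p->a[i].addr == cpu->pending_notify) {
			eventfd_signal(p->a[i].event, 1);
			cpu->pending_notify = 0;
			break;
		}
	}
	rcu_read_unlock();
	return cpu->pending_notify == 0;
}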

Does this work for you, or is there some complication that I am missing?

Thanx, Paul



Re: [RFC PATCH v2 00/19] virtual-bus

2009-06-04 Thread Gregory Haskins
Avi Kivity wrote:
 Gregory Haskins wrote:
 Avi,

 Gregory Haskins wrote:
  
 Todo:
 *) Develop some kind of hypercall registration mechanism for KVM so
 that
we can use that as an integration point instead of directly hooking
kvm hypercalls
   

 What would you like to see here?  I now remember why I removed the
 original patch I had for registration...it requires some kind of
 discovery mechanism on its own.  Note that this is hard, but I figured
 it would make the overall series simpler if I didn't go this route and
 instead just integrated with a statically allocated vector.  That being
 said, I have no problem adding this back in but figure we should discuss
 the approach so I don't go down a rat-hole ;)

   


 One idea is similar to signalfd() or eventfd().  Provide a kvm ioctl
 that takes a gsi and returns an fd.  Writes to the fd change the state
 of the line, possibly triggering an interrupt.  Another ioctl takes a
 hypercall number or pio port as well as an existing fd.  Invocations
 of the hypercall or writes to the port write to the fd (using the same
 protocol as eventfd), so the other end can respond.

 The nice thing is that this can be used by both kernel and userspace
 components, and for kernel components, hypercalls can be either
 buffered or unbuffered.


And thus the kvm-eventfd (irqfd/iosignalfd) interface project was born. ;)

(Michael FYI: so I will be pushing a vbus-v4 series at some point in the
near future that is expressed in terms of irqfd/iosignalfd, per the
conversation above.  The patches in v3 and earlier are more intrusive to
the KVM core than they will be in final form)

-Greg






Re: [RFC PATCH v2 00/19] virtual-bus

2009-06-04 Thread Rusty Russell
On Fri, 5 Jun 2009 04:19:17 am Gregory Haskins wrote:
 Avi Kivity wrote:
  Gregory Haskins wrote:
  One idea is similar to signalfd() or eventfd()

 And thus the kvm-eventfd (irqfd/iosignalfd) interface project was born.
 ;)

The lguest patch queue already has such an interface :)  And I have a
partially complete in-kernel virtio_pci patch with the same trick.

I switched from kernel created eventfd to userspace passes in eventfd
after a while though; it lets you connect multiple virtqueues to a single fd
if you want.

Combined with a minor change to allow any process with access to the lguest fd
to queue interrupts, this allowed lguest to move to a thread-per-virtqueue
model which was a significant speedup as well as nice code reduction.
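
A rough launcher-side sketch of that model (not part of the patch below; the
per-virtqueue state and service_virtqueue() are assumed, and only LHREQ_IRQ is
existing launcher ABI -- the eventfd binding itself is whatever this patch adds
to lguest_launcher.h):

#include <linux/lguest_launcher.h>	/* LHREQ_IRQ */
#include <sys/eventfd.h>
#include <stdint.h>
#include <unistd.h>

struct vq_thread {			/* hypothetical per-virtqueue state */
	int lguest_fd;			/* the shared /dev/lguest fd */
	int eventfd;			/* bound to this virtqueue's notify address */
	unsigned long irq;		/* interrupt to raise when work is done */
};

static void service_virtqueue(struct vq_thread *vq);	/* hypothetical vring work */

static void *vq_service(void *opaque)
{
	struct vq_thread *vq = opaque;
	uint64_t ticks;

	while (read(vq->eventfd, &ticks, sizeof(ticks)) == sizeof(ticks)) {
		service_virtqueue(vq);

		/* Any process holding the lguest fd may queue an interrupt. */
		unsigned long cmd[] = { LHREQ_IRQ, vq->irq };
		write(vq->lguest_fd, cmd, sizeof(cmd));
	}
	return NULL;
}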

Here's the relevant kernel patch for reading.

Thanks!
Rusty.

lguest: use eventfds for device notification

Currently, when a Guest wants to perform I/O it calls LHCALL_NOTIFY with
an address: the main Launcher process returns with this address, and figures
out what device to run.

A far nicer model is to let processes bind an eventfd to an address: if we
find one, we simply signal the eventfd.

Signed-off-by: Rusty Russell ru...@rustcorp.com.au
Cc: Davide Libenzi davi...@xmailserver.org
---
 drivers/lguest/Kconfig  |2 -
 drivers/lguest/core.c   |8 ++--
 drivers/lguest/lg.h |9 
 drivers/lguest/lguest_user.c|   73 
 include/linux/lguest_launcher.h |1 
 5 files changed, 89 insertions(+), 4 deletions(-)

diff --git a/drivers/lguest/Kconfig b/drivers/lguest/Kconfig
--- a/drivers/lguest/Kconfig
+++ b/drivers/lguest/Kconfig
@@ -1,6 +1,6 @@
 config LGUEST
	tristate "Linux hypervisor example code"
-   depends on X86_32 && EXPERIMENTAL && !X86_PAE && FUTEX
+   depends on X86_32 && EXPERIMENTAL && !X86_PAE && EVENTFD
select HVC_DRIVER
---help---
  This is a very simple module which allows you to run
diff --git a/drivers/lguest/core.c b/drivers/lguest/core.c
--- a/drivers/lguest/core.c
+++ b/drivers/lguest/core.c
@@ -198,9 +198,11 @@ int run_guest(struct lg_cpu *cpu, unsign
/* It's possible the Guest did a NOTIFY hypercall to the
 * Launcher, in which case we return from the read() now. */
	if (cpu->pending_notify) {
-   if (put_user(cpu->pending_notify, user))
-   return -EFAULT;
-   return sizeof(cpu->pending_notify);
+   if (!send_notify_to_eventfd(cpu)) {
+   if (put_user(cpu->pending_notify, user))
+   return -EFAULT;
+   return sizeof(cpu->pending_notify);
+   }
}
 
/* Check for signals */
diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -82,6 +82,11 @@ struct lg_cpu {
struct lg_cpu_arch arch;
 };
 
+struct lg_eventfds {
+   unsigned long addr;
+   struct file *event;
+};
+
 /* The private info the thread maintains about the guest. */
 struct lguest
 {
@@ -102,6 +107,9 @@ struct lguest
unsigned int stack_pages;
u32 tsc_khz;
 
+   unsigned int num_eventfds;
+   struct lg_eventfds *eventfds;
+
/* Dead? */
const char *dead;
 };
@@ -152,6 +160,7 @@ void setup_default_idt_entries(struct lg
 void copy_traps(const struct lg_cpu *cpu, struct desc_struct *idt,
const unsigned long *def);
 void guest_set_clockevent(struct lg_cpu *cpu, unsigned long delta);
+bool send_notify_to_eventfd(struct lg_cpu *cpu);
 void init_clockdev(struct lg_cpu *cpu);
 bool check_syscall_vector(struct lguest *lg);
 int init_interrupts(void);
diff --git a/drivers/lguest/lguest_user.c b/drivers/lguest/lguest_user.c
--- a/drivers/lguest/lguest_user.c
+++ b/drivers/lguest/lguest_user.c
@@ -7,6 +7,8 @@
 #include <linux/miscdevice.h>
 #include <linux/fs.h>
 #include <linux/sched.h>
+#include <linux/eventfd.h>
+#include <linux/file.h>
 #include "lg.h"
 
 /*L:055 When something happens, the Waker process needs a way to stop the
@@ -35,6 +37,70 @@ static int break_guest_out(struct lg_cpu
}
 }
 
+bool send_notify_to_eventfd(struct lg_cpu *cpu)
+{
+   unsigned int i;
+
+   /* lg->eventfds is RCU-protected */
+   preempt_disable();
+   for (i = 0; i < cpu->lg->num_eventfds; i++) {
+   if (cpu->lg->eventfds[i].addr == cpu->pending_notify) {
+   eventfd_signal(cpu->lg->eventfds[i].event, 1);
+   cpu->pending_notify = 0;
+   break;
+   }
+   }
+   preempt_enable();
+   return cpu->pending_notify == 0;
+}
+
+static int add_eventfd(struct lguest *lg, unsigned long addr, int fd)
+{
+   struct lg_eventfds *new, *old;
+
+   if (!addr)
+   return -EINVAL;
+
+   /* Replace the old array with the 

Re: [RFC PATCH v2 00/19] virtual-bus

2009-06-04 Thread Paul E. McKenney
On Fri, Jun 05, 2009 at 02:25:01PM +0930, Rusty Russell wrote:
 On Fri, 5 Jun 2009 04:19:17 am Gregory Haskins wrote:
  Avi Kivity wrote:
   Gregory Haskins wrote:
   One idea is similar to signalfd() or eventfd()
 
  And thus the kvm-eventfd (irqfd/iosignalfd) interface project was born.
  ;)
 
 The lguest patch queue already has such an interface :)  And I have a
 partially complete in-kernel virtio_pci patch with the same trick.
 
 I switched from kernel created eventfd to userspace passes in eventfd
 after a while though; it lets you connect multiple virtqueues to a single fd
 if you want.
 
 Combined with a minor change to allow any process with access to the lguest fd
 to queue interrupts, this allowed lguest to move to a thread-per-virtqueue
 model which was a significant speedup as well as nice code reduction.
 
 Here's the relevant kernel patch for reading.
 
 Thanks!
 Rusty.
 
 lguest: use eventfds for device notification
 
 Currently, when a Guest wants to perform I/O it calls LHCALL_NOTIFY with
 an address: the main Launcher process returns with this address, and figures
 out what device to run.
 
 A far nicer model is to let processes bind an eventfd to an address: if we
 find one, we simply signal the eventfd.

A couple of (probably misguided) RCU questions/suggestions interspersed.

 Signed-off-by: Rusty Russell ru...@rustcorp.com.au
 Cc: Davide Libenzi davi...@xmailserver.org
 ---
  drivers/lguest/Kconfig  |2 -
  drivers/lguest/core.c   |8 ++--
  drivers/lguest/lg.h |9 
  drivers/lguest/lguest_user.c|   73 
 
  include/linux/lguest_launcher.h |1 
  5 files changed, 89 insertions(+), 4 deletions(-)
 
 diff --git a/drivers/lguest/Kconfig b/drivers/lguest/Kconfig
 --- a/drivers/lguest/Kconfig
 +++ b/drivers/lguest/Kconfig
 @@ -1,6 +1,6 @@
  config LGUEST
   tristate "Linux hypervisor example code"
 - depends on X86_32 && EXPERIMENTAL && !X86_PAE && FUTEX
 + depends on X86_32 && EXPERIMENTAL && !X86_PAE && EVENTFD
   select HVC_DRIVER
   ---help---
 This is a very simple module which allows you to run
 diff --git a/drivers/lguest/core.c b/drivers/lguest/core.c
 --- a/drivers/lguest/core.c
 +++ b/drivers/lguest/core.c
 @@ -198,9 +198,11 @@ int run_guest(struct lg_cpu *cpu, unsign
   /* It's possible the Guest did a NOTIFY hypercall to the
* Launcher, in which case we return from the read() now. */
   if (cpu->pending_notify) {
  - if (put_user(cpu->pending_notify, user))
  - return -EFAULT;
  - return sizeof(cpu->pending_notify);
  + if (!send_notify_to_eventfd(cpu)) {
  + if (put_user(cpu->pending_notify, user))
  + return -EFAULT;
  + return sizeof(cpu->pending_notify);
 + }
   }
 
   /* Check for signals */
 diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
 --- a/drivers/lguest/lg.h
 +++ b/drivers/lguest/lg.h
 @@ -82,6 +82,11 @@ struct lg_cpu {
   struct lg_cpu_arch arch;
  };
 
 +struct lg_eventfds {
 + unsigned long addr;
 + struct file *event;
 +};
 +
  /* The private info the thread maintains about the guest. */
  struct lguest
  {
 @@ -102,6 +107,9 @@ struct lguest
   unsigned int stack_pages;
   u32 tsc_khz;
 
 + unsigned int num_eventfds;
 + struct lg_eventfds *eventfds;
 +
   /* Dead? */
   const char *dead;
  };
 @@ -152,6 +160,7 @@ void setup_default_idt_entries(struct lg
  void copy_traps(const struct lg_cpu *cpu, struct desc_struct *idt,
   const unsigned long *def);
  void guest_set_clockevent(struct lg_cpu *cpu, unsigned long delta);
 +bool send_notify_to_eventfd(struct lg_cpu *cpu);
  void init_clockdev(struct lg_cpu *cpu);
  bool check_syscall_vector(struct lguest *lg);
  int init_interrupts(void);
 diff --git a/drivers/lguest/lguest_user.c b/drivers/lguest/lguest_user.c
 --- a/drivers/lguest/lguest_user.c
 +++ b/drivers/lguest/lguest_user.c
 @@ -7,6 +7,8 @@
   #include <linux/miscdevice.h>
   #include <linux/fs.h>
   #include <linux/sched.h>
  +#include <linux/eventfd.h>
  +#include <linux/file.h>
   #include "lg.h"
 
  /*L:055 When something happens, the Waker process needs a way to stop the
 @@ -35,6 +37,70 @@ static int break_guest_out(struct lg_cpu
   }
  }
 
 +bool send_notify_to_eventfd(struct lg_cpu *cpu)
 +{
 + unsigned int i;
 +
  + /* lg->eventfds is RCU-protected */
 + preempt_disable();

Suggest changing to rcu_read_lock() to match the synchronize_rcu().

  + for (i = 0; i < cpu->lg->num_eventfds; i++) {
  + if (cpu->lg->eventfds[i].addr == cpu->pending_notify) {
  + eventfd_signal(cpu->lg->eventfds[i].event, 1);

Shouldn't this be something like the following?

 p = rcu_dereference(cpu->lg->eventfds);
if 

Re: [RFC PATCH v2 00/19] virtual-bus

2009-04-11 Thread Avi Kivity

Gregory Haskins wrote:

Avi,

Gregory Haskins wrote:
  

Todo:
*) Develop some kind of hypercall registration mechanism for KVM so that
   we can use that as an integration point instead of directly hooking
   kvm hypercalls
  



What would you like to see here?  I now remember why I removed the
original patch I had for registration...it requires some kind of
discovery mechanism on its own.  Note that this is hard, but I figured
it would make the overall series simpler if I didn't go this route and
instead just integrated with a statically allocated vector.  That being
said, I have no problem adding this back in but figure we should discuss
the approach so I don't go down a rat-hole ;)

  



One idea is similar to signalfd() or eventfd().  Provide a kvm ioctl 
that takes a gsi and returns an fd.  Writes to the fd change the state 
of the line, possibly triggering an interrupt.  Another ioctl takes a 
hypercall number or pio port as well as an existing fd.  Invocations of 
the hypercall or writes to the port write to the fd (using the same 
protocol as eventfd), so the other end can respond.


The nice thing is that this can be used by both kernel and userspace 
components, and for kernel components, hypercalls can be either buffered 
or unbuffered.
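
Sketching that flow from userspace (the ioctl names and binding struct below
are placeholders invented to make the shape of the proposal concrete -- they
are not an existing KVM ABI):

#include <linux/ioctl.h>
#include <sys/ioctl.h>
#include <sys/eventfd.h>
#include <stdint.h>
#include <unistd.h>

struct kvm_hc_bind {			/* hypothetical */
	uint32_t hc_or_port;		/* hypercall number or pio port */
	int32_t  fd;			/* eventfd written on each guest invocation */
};

#define KVM_CREATE_GSI_FD	_IOW('k', 0xe0, int)			/* hypothetical */
#define KVM_BIND_HC_FD		_IOW('k', 0xe1, struct kvm_hc_bind)	/* hypothetical */

static void example(int vm_fd)
{
	/* "takes a gsi and returns an fd": writes change the line state. */
	int gsi = 5;
	int irq_fd = ioctl(vm_fd, KVM_CREATE_GSI_FD, &gsi);
	uint64_t one = 1;
	write(irq_fd, &one, sizeof(one));	/* possibly triggers an interrupt */

	/* "takes a hypercall number or pio port as well as an existing fd":
	 * guest invocations then show up on that fd, eventfd-style. */
	struct kvm_hc_bind bind = { .hc_or_port = 42, .fd = eventfd(0, 0) };
	ioctl(vm_fd, KVM_BIND_HC_FD, &bind);

	uint64_t hits;
	read(bind.fd, &hits, sizeof(hits));	/* blocks until the guest kicks */
}

The irqfd/iosignalfd work mentioned elsewhere in this thread is the real
descendant of this idea.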



So, one thing we could do is use a string-identifier to discover
hypercall resources.  In this model, we would have one additional
hypercall registered with kvm (in addition to the mmu-ops, etc) called
KVM_HC_DYNHC or something like that.  The support for DYNHC could be
indicated in the cpuid (much like I do with the RESET, DYNIRQ, and VBUS
support today).  When hypercall providers register, they could provide a
string such as "vbus", and they would be allocated a hypercall id.
Likewise, the HC_DYNHC interface would allow a guest to query the cpuid
for the DYNHC feature, and then query the HC_DYNHC vector for a string
to hc# translation.  If the provider is not present, we return -1 for
the hc#, otherwise we return the one that was allocated.

I know how you feel about string-ids in general, but I am not quite sure
how to design this otherwise without it looking eerily similar to what I
already have (which is registering a new HC vector in kvm_para.h)
  


No need for a string ID.  Reserve a range of hypercall numbers for 
dynamic IDs.  Userspace allocates one and gives it to the device using 
its configuration space (as applies to whatever bus it is using).
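
From the guest driver's side that might look roughly like this (the config
space offset and names are invented for illustration; pci_read_config_dword()
and kvm_hypercall1() are the existing kernel helpers):

#include <linux/pci.h>
#include <asm/kvm_para.h>

#define VBUS_CFG_HCALL	0x40		/* hypothetical config-space location */

static u32 vbus_hcall_nr;		/* dynamic hypercall nr chosen by userspace */

static int vbus_pci_setup(struct pci_dev *pdev)
{
	/* Userspace allocated a number from the reserved dynamic range and
	 * stored it in the device's config space. */
	return pci_read_config_dword(pdev, VBUS_CFG_HCALL, &vbus_hcall_nr);
}

static long vbus_notify(unsigned long param)
{
	/* Guest->host kick: no discovery protocol, just use the number. */
	return kvm_hypercall1(vbus_hcall_nr, param);
}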



--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



[RFC PATCH v2 00/19] virtual-bus

2009-04-09 Thread Gregory Haskins
This is release v2.  Changes since v1:

*) Incorporated review feedback from Stephen Hemminger on vbus-enet driver
*) Added support for connecting to vbus devices from userspace
*) Added support for a virtio-vbus transport to allow virtio drivers to
   work with vbus (needs testing and backend models).

(Avi, I know I still owe you a reply re the PCI debate)

Todo:
*) Develop some kind of hypercall registration mechanism for KVM so that
   we can use that as an integration point instead of directly hooking
   kvm hypercalls
*) Beef up the userspace event channel ABI to support different event types
*) Add memory-registration support
*) Integrate with qemu PCI device model to render vbus objects as PCI
*) Develop some virtio backend devices.
*) Support ethtool_ops for venet.

---

RFC: Virtual-bus

applies to v2.6.29 (will port to git HEAD soon)

FIRST OFF: Let me state that this is _not_ a KVM or networking specific
technology.  Virtual-Bus is a mechanism for defining and deploying
software “devices” directly in a Linux kernel.  These devices are designed
to be directly accessed from a variety of environments in an arbitrarily nested
fashion.  The goal is to provide the potential for maximum IO performance by
providing the shortest and most efficient path to the bare metal kernel, and
thus the actual IO resources.  For instance, an application can be written to
run the same on baremetal as it does in guest userspace nested 10 levels deep,
all the while providing direct access to the resource, thus reducing latency
and boosting throughput.  A good way to think of this is perhaps like software
based SR-IOV that supports nesting of the pass-through.

Due to its design as an in-kernel resource, it also provides very strong
notions of protection and isolation so as to not introduce a security
compromise when compared to traditional/alternative models where such
guarantees are provided by something like userspace or hardware.

The example use-case we have provided supports a “virtual-ethernet” device
being utilized in a KVM guest environment, so comparisons to virtio-net will
be natural. However, please note that this is but one use-case, of many we have
planned for the future (such as userspace bypass and RT guest support).
The goal for right now is to describe what a virtual-bus is and why we
believe it is useful.

We intend to get this core technology merged, even if the networking
components are not accepted as is.  It should be noted that, in many ways,
virtio could be considered complementary to the technology.  We could
in fact, have implemented the virtual-ethernet using a virtio-ring, but
it would have required ABI changes that we didn't want to yet propose
without having the concept in general vetted and accepted by the community.

[Update: this release includes a virtio-vbus transport, so virtio-net and
other such drivers can now run over vbus in addition to the venet system
provided]

To cut to the chase, we recently measured our virtual-ethernet on 
v2.6.29 on two 8-core x86_64 boxes with Chelsio T3 10GE connected back
to back via cross over.  We measured bare-metal performance, as well
as a kvm guest (running the same kernel) connected to the T3 via
a linux-bridge+tap configuration with a 1500 MTU.  The results are as
follows:

Bare metal: tput = 4078Mb/s, round-trip = 25593pps (39us rtt)
Virtio-net: tput = 4003Mb/s, round-trip = 320pps (3125us rtt)
Venet: tput = 4050Mb/s, round-trip = 15255pps (65us rtt)

As you can see, all three technologies can achieve (MTU limited) line-rate,
but the virtio-net solution is severely limited on the latency front (by a
factor of 48:1)

Note that the 320pps is technically artificially low in virtio-net, caused by
a known design limitation to use a timer for tx-mitigation.  However, note that
even when removing the timer from the path the best we could achieve was
350us-450us of latency, and doing so causes the tput to drop to 1300Mb/s.
So even in this case, I think the in-kernel results present a compelling
argument for the new model presented.

[Update: Anthony Liguori is working on this userspace implementation problem
currently and has obtained significant performance gains by utilizing some of
the techniques we use in this patch set as well.  More details to come.]

When we jump to 9000 byte MTU, the situation looks similar

Bare metal: tput = 9717Mb/s, round-trip = 30396pps (33us rtt)
Virtio-net: tput = 4578Mb/s, round-trip = 249pps (4016us rtt)
Venet: tput = 5802Mb/s, round-trip = 15127pps (66us rtt)

Note that even the throughput was slightly better in this test for venet, though
neither venet nor virtio-net could achieve line-rate.  I suspect some tuning may
allow these numbers to improve, TBD.

So with that said, lets jump into the description:

Virtual-Bus: What is it?


Virtual-Bus is a kernel based IO resource container technology.  It is modeled
on a concept similar to the Linux Device-Model 

Re: [RFC PATCH v2 00/19] virtual-bus

2009-04-09 Thread Gregory Haskins
Avi,

Gregory Haskins wrote:

 Todo:
 *) Develop some kind of hypercall registration mechanism for KVM so that
we can use that as an integration point instead of directly hooking
kvm hypercalls
   

What would you like to see here?  I now remember why I removed the
original patch I had for registration...it requires some kind of
discovery mechanism on its own.  Note that this is hard, but I figured
it would make the overall series simpler if I didn't go this route and
instead just integrated with a statically allocated vector.  That being
said, I have no problem adding this back in but figure we should discuss
the approach so I don't go down a rat-hole ;)

So, one thing we could do is use a string-identifier to discover
hypercall resources.  In this model, we would have one additional
hypercall registered with kvm (in addition to the mmu-ops, etc) called
KVM_HC_DYNHC or something like that.  The support for DYNHC could be
indicated in the cpuid (much like I do with the RESET, DYNIRQ, and VBUS
support today).  When hypercall providers register, they could provide a
string such as "vbus", and they would be allocated a hypercall id.
Likewise, the HC_DYNHC interface would allow a guest to query the cpuid
for the DYNHC feature, and then query the HC_DYNHC vector for a string
to hc# translation.  If the provider is not present, we return -1 for
the hc#, otherwise we return the one that was allocated.
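
Roughly, the guest-side lookup being described could look like the following
(KVM_FEATURE_DYNHC, KVM_HC_DYNHC and the calling convention are all
hypothetical -- this only illustrates the proposal):

#include <asm/kvm_para.h>
#include <asm/page.h>
#include <linux/string.h>

#define KVM_FEATURE_DYNHC	10	/* hypothetical cpuid feature bit */
#define KVM_HC_DYNHC		100	/* hypothetical registration hypercall */

static long dynhc_lookup(const char *name)
{
	if (!kvm_para_has_feature(KVM_FEATURE_DYNHC))
		return -1;

	/* Ask the host to translate the string id into a hypercall number;
	 * the host returns -1 if no provider registered under that name. */
	return kvm_hypercall2(KVM_HC_DYNHC, __pa(name), strlen(name));
}

/* e.g.: long vbus_hc = dynhc_lookup("vbus"); */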

I know how you feel about string-ids in general, but I am not quite sure
how to design this otherwise without it looking eerily similar to what I
already have (which is registering a new HC vector in kvm_para.h)

Thoughts?

-Greg


