Re: [RFC PATCH v2 00/19] virtual-bus
Hi Rusty, Rusty Russell wrote: On Fri, 5 Jun 2009 04:19:17 am Gregory Haskins wrote: Avi Kivity wrote: Gregory Haskins wrote: One idea is similar to signalfd() or eventfd() And thus the kvm-eventfd (irqfd/iosignalfd) interface project was born. ;) The lguest patch queue already has such an interface :) Cool! Ultimately I think it will be easier if both lguest+kvm support the same eventfd notion, so it is good you are already moving in the same direction. And I have a partially complete in-kernel virtio_pci patch with the same trick. I thought lguest didn't use pci? Or do you just mean that you have an in-kernel virtio-net for lguest? As a follow-up question, I wonder if we can easily port that to vbus so that it will work in both lguest and kvm? (note to self: push a skeleton example today) I switched from "kernel created eventfd" to "userspace passes in eventfd" after a while though; it lets you connect multiple virtqueues to a single fd if you want. Yeah, actually we switched to that model, too. Aside from the limitation you point out, there were some problems that Al Viro had raised trying to do it in kernel w.r.t. fd abuse. Combined with a minor change to allow any process with access to the lguest fd to queue interrupts, this allowed lguest to move to a thread-per-virtqueue model, which was a significant speedup as well as a nice code reduction. Yep, that was one of my findings on venet as well, so I was looking forward to trying to get virtio-net to do the same. Here's the relevant kernel patch for reading. Thanks Rusty! Will take a look. Thanks! Rusty.

lguest: use eventfds for device notification

Currently, when a Guest wants to perform I/O it calls LHCALL_NOTIFY with an address: the main Launcher process returns with this address, and figures out what device to run. A far nicer model is to let processes bind an eventfd to an address: if we find one, we simply signal the eventfd.
Signed-off-by: Rusty Russell ru...@rustcorp.com.au
Cc: Davide Libenzi davi...@xmailserver.org
---
 drivers/lguest/Kconfig          |    2 -
 drivers/lguest/core.c           |    8 ++--
 drivers/lguest/lg.h             |    9
 drivers/lguest/lguest_user.c    |   73
 include/linux/lguest_launcher.h |    1
 5 files changed, 89 insertions(+), 4 deletions(-)

diff --git a/drivers/lguest/Kconfig b/drivers/lguest/Kconfig
--- a/drivers/lguest/Kconfig
+++ b/drivers/lguest/Kconfig
@@ -1,6 +1,6 @@
 config LGUEST
 	tristate "Linux hypervisor example code"
-	depends on X86_32 && EXPERIMENTAL && !X86_PAE && FUTEX
+	depends on X86_32 && EXPERIMENTAL && !X86_PAE && EVENTFD

Note to self: we probably need a similar line in KVM now.

 	select HVC_DRIVER
 	---help---
 	  This is a very simple module which allows you to run
diff --git a/drivers/lguest/core.c b/drivers/lguest/core.c
--- a/drivers/lguest/core.c
+++ b/drivers/lguest/core.c
@@ -198,9 +198,11 @@ int run_guest(struct lg_cpu *cpu, unsign
 	/* It's possible the Guest did a NOTIFY hypercall to the
 	 * Launcher, in which case we return from the read() now. */
 	if (cpu->pending_notify) {
-		if (put_user(cpu->pending_notify, user))
-			return -EFAULT;
-		return sizeof(cpu->pending_notify);
+		if (!send_notify_to_eventfd(cpu)) {
+			if (put_user(cpu->pending_notify, user))
+				return -EFAULT;
+			return sizeof(cpu->pending_notify);
+		}
 	}

 	/* Check for signals */
diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -82,6 +82,11 @@ struct lg_cpu {
 	struct lg_cpu_arch arch;
 };

+struct lg_eventfds {
+	unsigned long addr;
+	struct file *event;
+};
+
 /* The private info the thread maintains about the guest. */
 struct lguest {
@@ -102,6 +107,9 @@ struct lguest
 	unsigned int stack_pages;
 	u32 tsc_khz;

+	unsigned int num_eventfds;
+	struct lg_eventfds *eventfds;
+
 	/* Dead? */
 	const char *dead;
 };
@@ -152,6 +160,7 @@ void setup_default_idt_entries(struct lg
 void copy_traps(const struct lg_cpu *cpu, struct desc_struct *idt,
 		const unsigned long *def);
 void guest_set_clockevent(struct lg_cpu *cpu, unsigned long delta);
+bool send_notify_to_eventfd(struct lg_cpu *cpu);
 void init_clockdev(struct lg_cpu *cpu);
 bool check_syscall_vector(struct lguest *lg);
 int init_interrupts(void);
diff --git a/drivers/lguest/lguest_user.c b/drivers/lguest/lguest_user.c
--- a/drivers/lguest/lguest_user.c
+++ b/drivers/lguest/lguest_user.c
@@ -7,6 +7,8 @@
 #include <linux/miscdevice.h>
 #include <linux/fs.h>
 #include <linux/sched.h>
Re: [RFC PATCH v2 00/19] virtual-bus
Gregory Haskins wrote:

@@ -1,6 +1,6 @@
 config LGUEST
 	tristate "Linux hypervisor example code"
-	depends on X86_32 && EXPERIMENTAL && !X86_PAE && FUTEX
+	depends on X86_32 && EXPERIMENTAL && !X86_PAE && EVENTFD

Note to self: we probably need a similar line in KVM now.

'select EVENTFD' is more appropriate. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v2 00/19] virtual-bus
On Fri, 5 Jun 2009 09:26:48 pm Gregory Haskins wrote: Hi Rusty, Rusty Russell wrote: On Fri, 5 Jun 2009 04:19:17 am Gregory Haskins wrote: Avi Kivity wrote: Gregory Haskins wrote: One idea is similar to signalfd() or eventfd() And thus the kvm-eventfd (irqfd/iosignalfd) interface project was born. ;) The lguest patch queue already has such an interface :) Cool! Ultimately I think it will be easier if both lguest+kvm support the same eventfd notion so this is good you are already moving in the same direction. Not really; lguest doesn't do PCI. And I have a partially complete in-kernel virtio_pci patch with the same trick. I thought lguest didn't use pci? Or do you just mean that you have an in-kernel virtio-net for lguest? No, this was for kvm. Sorry for the confusion. Other than the potential rcu issues that Paul already addressed, looks good. FWIW: this looks like what we are calling iosignalfd in kvm land (unless I am misunderstanding). Do you have the equivalent of irqfd going the other way? Yes; lguest uses write() (offset indicates cpu #) rather than ioctls, but anyone can do the LHREQ_IRQ write to queue an interrupt for delivery. So the threads just get the same /dev/lguest fd and it's simple. Thanks! Rusty.
Re: [RFC PATCH v2 00/19] virtual-bus
Rusty Russell wrote: On Fri, 5 Jun 2009 09:26:48 pm Gregory Haskins wrote: Hi Rusty, Rusty Russell wrote: On Fri, 5 Jun 2009 04:19:17 am Gregory Haskins wrote: Avi Kivity wrote: Gregory Haskins wrote: One idea is similar to signalfd() or eventfd() And thus the kvm-eventfd (irqfd/iosignalfd) interface project was born. ;) The lguest patch queue already has such an interface :) Cool! Ultimately I think it will be easier if both lguest+kvm support the same eventfd notion so this is good you are already moving in the same direction. Not really; lguest doesn't do PCI. That's ok. I see these eventfd interfaces as somewhat orthogonal to PCI. I.e. if both lguest and kvm have an eventfd mechanism for signaling in both directions (e.g. interrupts and io), it would make it easier to support the kind of thing I am striving for with a unified backend. That is: one in-kernel virtio-net that works in both (or even many) HV environments. I see that as a higher-layer abstraction than PCI, per se. And I have a partially complete in-kernel virtio_pci patch with the same trick. I thought lguest didn't use pci? Or do you just mean that you have an in-kernel virtio-net for lguest? No, this was for kvm. Sorry for the confusion. Ah, sorry. Well, if it's in any kind of shape to see the light of day, please forward it over. Perhaps Michael and I can craft it into a working solution. Other than the potential rcu issues that Paul already addressed, looks good. FWIW: this looks like what we are calling iosignalfd in kvm land (unless I am misunderstanding). Do you have the equivalent of irqfd going the other way? Yes; lguest uses write() (offset indicates cpu #) rather than ioctls, but anyone can do the LHREQ_IRQ write to queue an interrupt for delivery. So the threads just get the same /dev/lguest fd and it's simple. Ah, ok. That's workable, too.
(This kind of detail would be buried in the lguest connector for vbus anyway, so it doesn't have to have a uniform eventfd_signal() interface to work. The fd concept alone is sufficiently flexible). Thanks Rusty, -Greg
Re: [RFC PATCH v2 00/19] virtual-bus
On Fri, 5 Jun 2009 03:00:10 pm Paul E. McKenney wrote: On Fri, Jun 05, 2009 at 02:25:01PM +0930, Rusty Russell wrote:

+	/* lg->eventfds is RCU-protected */
+	preempt_disable();

Suggest changing to rcu_read_lock() to match the synchronize_rcu().

Ah yes, much better. As I was implementing it I warred with myself since lguest aims for simplicity above all else. But since we only ever add things to the array, RCU probably is simpler.

+	for (i = 0; i < cpu->lg->num_eventfds; i++) {
+		if (cpu->lg->eventfds[i].addr == cpu->pending_notify) {
+			eventfd_signal(cpu->lg->eventfds[i].event, 1);

Shouldn't this be something like the following?

	p = rcu_dereference(cpu->lg->eventfds);
	if (p[i].addr == cpu->pending_notify) {
		eventfd_signal(p[i].event, 1);

Hmm, need to read num_eventfds first, too. It doesn't matter if we get the old ->num_eventfds and the new ->eventfds, but the other way around would be bad. Here's the inter-diff:

diff --git a/drivers/lguest/lguest_user.c b/drivers/lguest/lguest_user.c
--- a/drivers/lguest/lguest_user.c
+++ b/drivers/lguest/lguest_user.c
@@ -39,18 +39,24 @@ static int break_guest_out(struct lg_cpu
 bool send_notify_to_eventfd(struct lg_cpu *cpu)
 {
-	unsigned int i;
+	unsigned int i, num;
+	struct lg_eventfds *eventfds;
+
+	/* Make sure we grab the total number before accessing the array. */
+	num = cpu->lg->num_eventfds;
+	rmb();

 	/* lg->eventfds is RCU-protected */
 	rcu_read_lock();
-	for (i = 0; i < cpu->lg->num_eventfds; i++) {
-		if (cpu->lg->eventfds[i].addr == cpu->pending_notify) {
-			eventfd_signal(cpu->lg->eventfds[i].event, 1);
+	eventfds = rcu_dereference(cpu->lg->eventfds);
+	for (i = 0; i < num; i++) {
+		if (eventfds[i].addr == cpu->pending_notify) {
+			eventfd_signal(eventfds[i].event, 1);
 			cpu->pending_notify = 0;
 			break;
 		}
 	}
-	preempt_enable();
+	rcu_read_unlock();
 	return cpu->pending_notify == 0;
 }

Thanks! Rusty.
Re: [RFC PATCH v2 00/19] virtual-bus
On Sat, Jun 06, 2009 at 12:25:57AM +0930, Rusty Russell wrote: On Fri, 5 Jun 2009 03:00:10 pm Paul E. McKenney wrote: On Fri, Jun 05, 2009 at 02:25:01PM +0930, Rusty Russell wrote:

+	/* lg->eventfds is RCU-protected */
+	preempt_disable();

Suggest changing to rcu_read_lock() to match the synchronize_rcu().

Ah yes, much better. As I was implementing it I warred with myself since lguest aims for simplicity above all else. But since we only ever add things to the array, RCU probably is simpler.

;-)

+	for (i = 0; i < cpu->lg->num_eventfds; i++) {
+		if (cpu->lg->eventfds[i].addr == cpu->pending_notify) {
+			eventfd_signal(cpu->lg->eventfds[i].event, 1);

Shouldn't this be something like the following?

	p = rcu_dereference(cpu->lg->eventfds);
	if (p[i].addr == cpu->pending_notify) {
		eventfd_signal(p[i].event, 1);

Hmm, need to read num_eventfds first, too. It doesn't matter if we get the old ->num_eventfds and the new ->eventfds, but the other way around would be bad.

Yep!!! ;-)

Here's the inter-diff:

diff --git a/drivers/lguest/lguest_user.c b/drivers/lguest/lguest_user.c
--- a/drivers/lguest/lguest_user.c
+++ b/drivers/lguest/lguest_user.c
@@ -39,18 +39,24 @@ static int break_guest_out(struct lg_cpu
 bool send_notify_to_eventfd(struct lg_cpu *cpu)
 {
-	unsigned int i;
+	unsigned int i, num;
+	struct lg_eventfds *eventfds;
+
+	/* Make sure we grab the total number before accessing the array. */
+	num = cpu->lg->num_eventfds;
+	rmb();

 	/* lg->eventfds is RCU-protected */
 	rcu_read_lock();
-	for (i = 0; i < cpu->lg->num_eventfds; i++) {
-		if (cpu->lg->eventfds[i].addr == cpu->pending_notify) {
-			eventfd_signal(cpu->lg->eventfds[i].event, 1);
+	eventfds = rcu_dereference(cpu->lg->eventfds);
+	for (i = 0; i < num; i++) {
+		if (eventfds[i].addr == cpu->pending_notify) {
+			eventfd_signal(eventfds[i].event, 1);
 			cpu->pending_notify = 0;
 			break;
 		}
 	}
-	preempt_enable();
+	rcu_read_unlock();
 	return cpu->pending_notify == 0;
 }

It is possible to get rid of the rmb() and wmb() as well, doing something like the following:

	struct lg_eventfds_num {
		unsigned int n;
		struct lg_eventfds a[0];
	};

Then the rcu_dereference() gets you a pointer to a struct lg_eventfds_num, which has the array and its length in guaranteed synchronization without the need for barriers. Does this work for you, or is there some complication that I am missing? Thanx, Paul
Re: [RFC PATCH v2 00/19] virtual-bus
Avi Kivity wrote: Gregory Haskins wrote: Avi, Gregory Haskins wrote: Todo: *) Develop some kind of hypercall registration mechanism for KVM so that we can use that as an integration point instead of directly hooking kvm hypercalls What would you like to see here? I now remember why I removed the original patch I had for registration...it requires some kind of discovery mechanism on its own. Note that this is hard, but I figured it would make the overall series simpler if I didn't go this route and instead just integrated with a statically allocated vector. That being said, I have no problem adding this back in but figure we should discuss the approach so I don't go down a rat-hole ;) One idea is similar to signalfd() or eventfd(). Provide a kvm ioctl that takes a gsi and returns an fd. Writes to the fd change the state of the line, possibly triggering an interrupt. Another ioctl takes a hypercall number or pio port as well as an existing fd. Invocations of the hypercall or writes to the port write to the fd (using the same protocol as eventfd), so the other end can respond. The nice thing is that this can be used by both kernel and userspace components, and for kernel components, hypercalls can be either buffered or unbuffered. And thus the kvm-eventfd (irqfd/iosignalfd) interface project was born. ;) (Michael FYI: so I will be pushing a vbus-v4 series at some point in the near future that is expressed in terms of irqfd/iosignalfd, per the conversation above. The patches in v3 and earlier are more intrusive to the KVM core than they will be in final form) -Greg
Re: [RFC PATCH v2 00/19] virtual-bus
On Fri, 5 Jun 2009 04:19:17 am Gregory Haskins wrote: Avi Kivity wrote: Gregory Haskins wrote: One idea is similar to signalfd() or eventfd() And thus the kvm-eventfd (irqfd/iosignalfd) interface project was born. ;) The lguest patch queue already has such an interface :) And I have a partially complete in-kernel virtio_pci patch with the same trick. I switched from kernel created eventfd to userspace passes in eventfd after a while though; it lets you connect multiple virtqueues to a single fd if you want. Combined with a minor change to allow any process with access to the lguest fd to queue interrupts, this allowed lguest to move to a thread-per-virtqueue model which was a significant speedup as well as nice code reduction. Here's the relevant kernel patch for reading. Thanks! Rusty. lguest: use eventfds for device notification Currently, when a Guest wants to perform I/O it calls LHCALL_NOTIFY with an address: the main Launcher process returns with this address, and figures out what device to run. A far nicer model is to let processes bind an eventfd to an address: if we find one, we simply signal the eventfd. 
Signed-off-by: Rusty Russell ru...@rustcorp.com.au
Cc: Davide Libenzi davi...@xmailserver.org
---
 drivers/lguest/Kconfig          |    2 -
 drivers/lguest/core.c           |    8 ++--
 drivers/lguest/lg.h             |    9
 drivers/lguest/lguest_user.c    |   73
 include/linux/lguest_launcher.h |    1
 5 files changed, 89 insertions(+), 4 deletions(-)

diff --git a/drivers/lguest/Kconfig b/drivers/lguest/Kconfig
--- a/drivers/lguest/Kconfig
+++ b/drivers/lguest/Kconfig
@@ -1,6 +1,6 @@
 config LGUEST
 	tristate "Linux hypervisor example code"
-	depends on X86_32 && EXPERIMENTAL && !X86_PAE && FUTEX
+	depends on X86_32 && EXPERIMENTAL && !X86_PAE && EVENTFD
 	select HVC_DRIVER
 	---help---
 	  This is a very simple module which allows you to run
diff --git a/drivers/lguest/core.c b/drivers/lguest/core.c
--- a/drivers/lguest/core.c
+++ b/drivers/lguest/core.c
@@ -198,9 +198,11 @@ int run_guest(struct lg_cpu *cpu, unsign
 	/* It's possible the Guest did a NOTIFY hypercall to the
 	 * Launcher, in which case we return from the read() now. */
 	if (cpu->pending_notify) {
-		if (put_user(cpu->pending_notify, user))
-			return -EFAULT;
-		return sizeof(cpu->pending_notify);
+		if (!send_notify_to_eventfd(cpu)) {
+			if (put_user(cpu->pending_notify, user))
+				return -EFAULT;
+			return sizeof(cpu->pending_notify);
+		}
 	}

 	/* Check for signals */
diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -82,6 +82,11 @@ struct lg_cpu {
 	struct lg_cpu_arch arch;
 };

+struct lg_eventfds {
+	unsigned long addr;
+	struct file *event;
+};
+
 /* The private info the thread maintains about the guest. */
 struct lguest {
@@ -102,6 +107,9 @@ struct lguest
 	unsigned int stack_pages;
 	u32 tsc_khz;

+	unsigned int num_eventfds;
+	struct lg_eventfds *eventfds;
+
 	/* Dead? */
 	const char *dead;
 };
@@ -152,6 +160,7 @@ void setup_default_idt_entries(struct lg
 void copy_traps(const struct lg_cpu *cpu, struct desc_struct *idt,
 		const unsigned long *def);
 void guest_set_clockevent(struct lg_cpu *cpu, unsigned long delta);
+bool send_notify_to_eventfd(struct lg_cpu *cpu);
 void init_clockdev(struct lg_cpu *cpu);
 bool check_syscall_vector(struct lguest *lg);
 int init_interrupts(void);
diff --git a/drivers/lguest/lguest_user.c b/drivers/lguest/lguest_user.c
--- a/drivers/lguest/lguest_user.c
+++ b/drivers/lguest/lguest_user.c
@@ -7,6 +7,8 @@
 #include <linux/miscdevice.h>
 #include <linux/fs.h>
 #include <linux/sched.h>
+#include <linux/eventfd.h>
+#include <linux/file.h>
 #include "lg.h"

 /*L:055 When something happens, the Waker process needs a way to stop the
@@ -35,6 +37,70 @@ static int break_guest_out(struct lg_cpu
 	}
 }

+bool send_notify_to_eventfd(struct lg_cpu *cpu)
+{
+	unsigned int i;
+
+	/* lg->eventfds is RCU-protected */
+	preempt_disable();
+	for (i = 0; i < cpu->lg->num_eventfds; i++) {
+		if (cpu->lg->eventfds[i].addr == cpu->pending_notify) {
+			eventfd_signal(cpu->lg->eventfds[i].event, 1);
+			cpu->pending_notify = 0;
+			break;
+		}
+	}
+	preempt_enable();
+	return cpu->pending_notify == 0;
+}
+
+static int add_eventfd(struct lguest *lg, unsigned long addr, int fd)
+{
+	struct lg_eventfds *new, *old;
+
+	if (!addr)
+		return -EINVAL;
+
+	/* Replace the old array with the
Re: [RFC PATCH v2 00/19] virtual-bus
On Fri, Jun 05, 2009 at 02:25:01PM +0930, Rusty Russell wrote: On Fri, 5 Jun 2009 04:19:17 am Gregory Haskins wrote: Avi Kivity wrote: Gregory Haskins wrote: One idea is similar to signalfd() or eventfd() And thus the kvm-eventfd (irqfd/iosignalfd) interface project was born. ;) The lguest patch queue already has such an interface :) And I have a partially complete in-kernel virtio_pci patch with the same trick. I switched from kernel created eventfd to userspace passes in eventfd after a while though; it lets you connect multiple virtqueues to a single fd if you want. Combined with a minor change to allow any process with access to the lguest fd to queue interrupts, this allowed lguest to move to a thread-per-virtqueue model which was a significant speedup as well as nice code reduction. Here's the relevant kernel patch for reading. Thanks! Rusty. lguest: use eventfds for device notification Currently, when a Guest wants to perform I/O it calls LHCALL_NOTIFY with an address: the main Launcher process returns with this address, and figures out what device to run. A far nicer model is to let processes bind an eventfd to an address: if we find one, we simply signal the eventfd. A couple of (probably misguided) RCU questions/suggestions interspersed. 
Signed-off-by: Rusty Russell ru...@rustcorp.com.au
Cc: Davide Libenzi davi...@xmailserver.org
---
 drivers/lguest/Kconfig          |    2 -
 drivers/lguest/core.c           |    8 ++--
 drivers/lguest/lg.h             |    9
 drivers/lguest/lguest_user.c    |   73
 include/linux/lguest_launcher.h |    1
 5 files changed, 89 insertions(+), 4 deletions(-)

diff --git a/drivers/lguest/Kconfig b/drivers/lguest/Kconfig
--- a/drivers/lguest/Kconfig
+++ b/drivers/lguest/Kconfig
@@ -1,6 +1,6 @@
 config LGUEST
 	tristate "Linux hypervisor example code"
-	depends on X86_32 && EXPERIMENTAL && !X86_PAE && FUTEX
+	depends on X86_32 && EXPERIMENTAL && !X86_PAE && EVENTFD
 	select HVC_DRIVER
 	---help---
 	  This is a very simple module which allows you to run
diff --git a/drivers/lguest/core.c b/drivers/lguest/core.c
--- a/drivers/lguest/core.c
+++ b/drivers/lguest/core.c
@@ -198,9 +198,11 @@ int run_guest(struct lg_cpu *cpu, unsign
 	/* It's possible the Guest did a NOTIFY hypercall to the
 	 * Launcher, in which case we return from the read() now. */
 	if (cpu->pending_notify) {
-		if (put_user(cpu->pending_notify, user))
-			return -EFAULT;
-		return sizeof(cpu->pending_notify);
+		if (!send_notify_to_eventfd(cpu)) {
+			if (put_user(cpu->pending_notify, user))
+				return -EFAULT;
+			return sizeof(cpu->pending_notify);
+		}
 	}

 	/* Check for signals */
diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -82,6 +82,11 @@ struct lg_cpu {
 	struct lg_cpu_arch arch;
 };

+struct lg_eventfds {
+	unsigned long addr;
+	struct file *event;
+};
+
 /* The private info the thread maintains about the guest. */
 struct lguest {
@@ -102,6 +107,9 @@ struct lguest
 	unsigned int stack_pages;
 	u32 tsc_khz;

+	unsigned int num_eventfds;
+	struct lg_eventfds *eventfds;
+
 	/* Dead? */
 	const char *dead;
 };
@@ -152,6 +160,7 @@ void setup_default_idt_entries(struct lg
 void copy_traps(const struct lg_cpu *cpu, struct desc_struct *idt,
 		const unsigned long *def);
 void guest_set_clockevent(struct lg_cpu *cpu, unsigned long delta);
+bool send_notify_to_eventfd(struct lg_cpu *cpu);
 void init_clockdev(struct lg_cpu *cpu);
 bool check_syscall_vector(struct lguest *lg);
 int init_interrupts(void);
diff --git a/drivers/lguest/lguest_user.c b/drivers/lguest/lguest_user.c
--- a/drivers/lguest/lguest_user.c
+++ b/drivers/lguest/lguest_user.c
@@ -7,6 +7,8 @@
 #include <linux/miscdevice.h>
 #include <linux/fs.h>
 #include <linux/sched.h>
+#include <linux/eventfd.h>
+#include <linux/file.h>
 #include "lg.h"

 /*L:055 When something happens, the Waker process needs a way to stop the
@@ -35,6 +37,70 @@ static int break_guest_out(struct lg_cpu
 	}
 }

+bool send_notify_to_eventfd(struct lg_cpu *cpu)
+{
+	unsigned int i;
+
+	/* lg->eventfds is RCU-protected */
+	preempt_disable();

Suggest changing to rcu_read_lock() to match the synchronize_rcu().

+	for (i = 0; i < cpu->lg->num_eventfds; i++) {
+		if (cpu->lg->eventfds[i].addr == cpu->pending_notify) {
+			eventfd_signal(cpu->lg->eventfds[i].event, 1);

Shouldn't this be something like the following? p = rcu_dereference(cpu->lg->eventfds); if
Re: [RFC PATCH v2 00/19] virtual-bus
Gregory Haskins wrote: Avi, Gregory Haskins wrote: Todo: *) Develop some kind of hypercall registration mechanism for KVM so that we can use that as an integration point instead of directly hooking kvm hypercalls What would you like to see here? I now remember why I removed the original patch I had for registration...it requires some kind of discovery mechanism on its own. Note that this is hard, but I figured it would make the overall series simpler if I didn't go this route and instead just integrated with a statically allocated vector. That being said, I have no problem adding this back in but figure we should discuss the approach so I don't go down a rat-hole ;) One idea is similar to signalfd() or eventfd(). Provide a kvm ioctl that takes a gsi and returns an fd. Writes to the fd change the state of the line, possibly triggering an interrupt. Another ioctl takes a hypercall number or pio port as well as an existing fd. Invocations of the hypercall or writes to the port write to the fd (using the same protocol as eventfd), so the other end can respond. The nice thing is that this can be used by both kernel and userspace components, and for kernel components, hypercalls can be either buffered or unbuffered. So, one thing we could do is use a string-identifier to discover hypercall resources. In this model, we would have one additional hypercall registered with kvm (in addition to the mmu-ops, etc) called KVM_HC_DYNHC or something like that. The support for DYNHC could be indicated in the cpuid (much like I do with the RESET, DYNIRQ, and VBUS support today). When hypercall providers register, they could provide a string such as "vbus", and they would be allocated a hypercall id. Likewise, the HC_DYNHC interface would allow a guest to query the cpuid for the DYNHC feature, and then query the HC_DYNHC vector for a string to hc# translation. If the provider is not present, we return -1 for the hc#, otherwise we return the one that was allocated.
I know how you feel about string-ids in general, but I am not quite sure how to design this otherwise without it looking eerily similar to what I already have (which is registering a new HC vector in kvm_para.h) No need for a string ID. Reserve a range of hypercall numbers for dynamic IDs. Userspace allocates one and gives it to the device using its configuration space (as applies to whatever bus it is using). -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
[RFC PATCH v2 00/19] virtual-bus
This is release v2. Changes since v1:

*) Incorporated review feedback from Stephen Hemminger on vbus-enet driver
*) Added support for connecting to vbus devices from userspace
*) Added support for a virtio-vbus transport to allow virtio drivers to work with vbus (needs testing and backend models).

(Avi, I know I still owe you a reply re the PCI debate)

Todo:

*) Develop some kind of hypercall registration mechanism for KVM so that we can use that as an integration point instead of directly hooking kvm hypercalls
*) Beef up the userspace event channel ABI to support different event types
*) Add memory-registration support
*) Integrate with qemu PCI device model to render vbus objects as PCI
*) Develop some virtio backend devices.
*) Support ethtool_ops for venet.

---

RFC: Virtual-bus

applies to v2.6.29 (will port to git HEAD soon)

FIRST OFF: Let me state that this is _not_ a KVM or networking specific technology. Virtual-Bus is a mechanism for defining and deploying software “devices” directly in a Linux kernel. These devices are designed to be directly accessed from a variety of environments in an arbitrarily nested fashion. The goal is to provide the potential for maximum IO performance by providing the shortest and most efficient path to the bare metal kernel, and thus the actual IO resources. For instance, an application can be written to run the same on baremetal as it does in guest userspace nested 10 levels deep, all the while providing direct access to the resource, thus reducing latency and boosting throughput. A good way to think of this is perhaps like software based SR-IOV that supports nesting of the pass-through. Due to its design as an in-kernel resource, it also provides very strong notions of protection and isolation so as to not introduce a security compromise when compared to traditional/alternative models where such guarantees are provided by something like userspace or hardware.
The example use-case we have provided supports a “virtual-ethernet” device being utilized in a KVM guest environment, so comparisons to virtio-net will be natural. However, please note that this is but one use-case, of many we have planned for the future (such as userspace bypass and RT guest support). The goal for right now is to describe what a virtual-bus is and why we believe it is useful. We intend to get this core technology merged, even if the networking components are not accepted as is. It should be noted that, in many ways, virtio could be considered complementary to the technology. We could, in fact, have implemented the virtual-ethernet using a virtio-ring, but it would have required ABI changes that we didn't want to yet propose without having the concept in general vetted and accepted by the community. [Update: this release includes a virtio-vbus transport, so virtio-net and other such drivers can now run over vbus in addition to the venet system provided] To cut to the chase, we recently measured our virtual-ethernet on v2.6.29 on two 8-core x86_64 boxes with Chelsio T3 10GE connected back to back via cross over. We measured bare-metal performance, as well as a kvm guest (running the same kernel) connected to the T3 via a linux-bridge+tap configuration with a 1500 MTU. The results are as follows:

Bare metal: tput = 4078Mb/s, round-trip = 25593pps (39us rtt)
Virtio-net: tput = 4003Mb/s, round-trip = 320pps (3125us rtt)
Venet:      tput = 4050Mb/s, round-trip = 15255pps (65us rtt)

As you can see, all three technologies can achieve (MTU limited) line-rate, but the virtio-net solution is severely limited on the latency front (by a factor of 48:1). Note that the 320pps is technically artificially low in virtio-net, caused by a known design limitation to use a timer for tx-mitigation. However, note that even when removing the timer from the path the best we could achieve was 350us-450us of latency, and doing so causes the tput to drop to 1300Mb/s.
So even in this case, I think the in-kernel results present a compelling argument for the new model presented. [Update: Anthony Liguori is working on this userspace implementation problem currently and has obtained significant performance gains by utilizing some of the techniques we use in this patch set as well. More details to come.] When we jump to 9000 byte MTU, the situation looks similar:

Bare metal: tput = 9717Mb/s, round-trip = 30396pps (33us rtt)
Virtio-net: tput = 4578Mb/s, round-trip = 249pps (4016us rtt)
Venet:      tput = 5802Mb/s, round-trip = 15127pps (66us rtt)

Note that even the throughput was slightly better in this test for venet, though neither venet nor virtio-net could achieve line-rate. I suspect some tuning may allow these numbers to improve, TBD. So with that said, let's jump into the description:

Virtual-Bus: What is it?

Virtual-Bus is a kernel-based IO resource container technology. It is modeled on a concept similar to the Linux Device-Model
Re: [RFC PATCH v2 00/19] virtual-bus
Avi, Gregory Haskins wrote: Todo: *) Develop some kind of hypercall registration mechanism for KVM so that we can use that as an integration point instead of directly hooking kvm hypercalls What would you like to see here? I now remember why I removed the original patch I had for registration...it requires some kind of discovery mechanism on its own. Note that this is hard, but I figured it would make the overall series simpler if I didn't go this route and instead just integrated with a statically allocated vector. That being said, I have no problem adding this back in but figure we should discuss the approach so I don't go down a rat-hole ;) So, one thing we could do is use a string-identifier to discover hypercall resources. In this model, we would have one additional hypercall registered with kvm (in addition to the mmu-ops, etc) called KVM_HC_DYNHC or something like that. The support for DYNHC could be indicated in the cpuid (much like I do with the RESET, DYNIRQ, and VBUS support today). When hypercall providers register, they could provide a string such as "vbus", and they would be allocated a hypercall id. Likewise, the HC_DYNHC interface would allow a guest to query the cpuid for the DYNHC feature, and then query the HC_DYNHC vector for a string to hc# translation. If the provider is not present, we return -1 for the hc#, otherwise we return the one that was allocated. I know how you feel about string-ids in general, but I am not quite sure how to design this otherwise without it looking eerily similar to what I already have (which is registering a new HC vector in kvm_para.h) Thoughts? -Greg