[PATCH 1/4] kvm tools: Add ioeventfd support
ioeventfd is a way provided by KVM to receive notifications about reads
and writes to PIO and MMIO areas within the guest. Such notifications
are useful if all we need to know is that a specific area of the memory
has been changed, and we don't need a heavyweight exit to happen.

The implementation uses epoll to scale to a large number of ioeventfds.

Signed-off-by: Sasha Levin <levinsasha...@gmail.com>
---
 tools/kvm/Makefile                |    1 +
 tools/kvm/include/kvm/ioeventfd.h |   27
 tools/kvm/ioeventfd.c             |  127 +
 tools/kvm/kvm-run.c               |    4 +
 4 files changed, 159 insertions(+), 0 deletions(-)
 create mode 100644 tools/kvm/include/kvm/ioeventfd.h
 create mode 100644 tools/kvm/ioeventfd.c

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index 2ebc86c..e7ceb5c 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -48,6 +48,7 @@ OBJS += irq.o
 OBJS	+= rbtree.o
 OBJS	+= util/rbtree-interval.o
 OBJS	+= virtio/9p.o
+OBJS	+= ioeventfd.o

 FLAGS_BFD=$(CFLAGS) -lbfd

diff --git a/tools/kvm/include/kvm/ioeventfd.h b/tools/kvm/include/kvm/ioeventfd.h
new file mode 100644
index 000..fa57b41
--- /dev/null
+++ b/tools/kvm/include/kvm/ioeventfd.h
@@ -0,0 +1,27 @@
+#ifndef KVM__IOEVENTFD_H
+#define KVM__IOEVENTFD_H
+
+#include <linux/types.h>
+#include <linux/list.h>
+#include <sys/eventfd.h>
+
+struct kvm;
+
+struct ioevent {
+	u64			start;
+	u8			len;
+	void			(*ioevent_callback_fn)(struct kvm *kvm, void *ptr);
+	struct kvm		*kvm;
+	void			*ptr;
+	int			event_fd;
+	u64			datamatch;
+
+	struct list_head	list_used;
+};
+
+void ioeventfd__init(void);
+void ioeventfd__start(void);
+void ioeventfd__add_event(struct kvm *kvm, struct ioevent *ioevent);
+void ioeventfd__del_event(struct kvm *kvm, u64 start, u64 datamatch);
+
+#endif

diff --git a/tools/kvm/ioeventfd.c b/tools/kvm/ioeventfd.c
new file mode 100644
index 000..532
--- /dev/null
+++ b/tools/kvm/ioeventfd.c
@@ -0,0 +1,127 @@
+#include <sys/epoll.h>
+#include <sys/ioctl.h>
+#include <pthread.h>
+#include <unistd.h>
+#include <stdio.h>
+#include <signal.h>
+
+#include <linux/kernel.h>
+#include <linux/kvm.h>
+#include <linux/types.h>
+
+#include "kvm/ioeventfd.h"
+#include "kvm/kvm.h"
+#include "kvm/util.h"
+
+#define IOEVENTFD_MAX_EVENTS	20
+
+static struct epoll_event events[IOEVENTFD_MAX_EVENTS];
+static int epoll_fd;
+static LIST_HEAD(used_ioevents);
+
+void ioeventfd__init(void)
+{
+	epoll_fd = epoll_create(IOEVENTFD_MAX_EVENTS);
+	if (epoll_fd < 0)
+		die("Failed creating epoll fd");
+}
+
+void ioeventfd__add_event(struct kvm *kvm, struct ioevent *ioevent)
+{
+	struct kvm_ioeventfd kvm_ioevent;
+	struct epoll_event epoll_event;
+	struct ioevent *new_ioevent;
+	int event;
+
+	new_ioevent = malloc(sizeof(*new_ioevent));
+	if (new_ioevent == NULL)
+		die("Failed allocating memory for new ioevent");
+
+	*new_ioevent = *ioevent;
+	event = new_ioevent->event_fd;
+
+	kvm_ioevent = (struct kvm_ioeventfd) {
+		.addr		= ioevent->start,
+		.len		= ioevent->len,
+		.datamatch	= ioevent->datamatch,
+		.fd		= event,
+		.flags		= KVM_IOEVENTFD_FLAG_PIO | KVM_IOEVENTFD_FLAG_DATAMATCH,
+	};
+
+	if (ioctl(kvm->vm_fd, KVM_IOEVENTFD, &kvm_ioevent) != 0)
+		die("Failed creating new ioeventfd");
+
+	epoll_event = (struct epoll_event) {
+		.events		= EPOLLIN,
+		.data.ptr	= new_ioevent,
+	};
+
+	if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, event, &epoll_event) != 0)
+		die("Failed assigning new event to the epoll fd");
+
+	list_add_tail(&new_ioevent->list_used, &used_ioevents);
+}
+
+void ioeventfd__del_event(struct kvm *kvm, u64 start, u64 datamatch)
+{
+	struct kvm_ioeventfd kvm_ioevent;
+	struct ioevent *ioevent;
+	u8 found = 0;
+
+	list_for_each_entry(ioevent, &used_ioevents, list_used) {
+		if (ioevent->start == start) {
+			found = 1;
+			break;
+		}
+	}
+
+	if (found == 0 || ioevent == NULL)
+		return;
+
+	kvm_ioevent = (struct kvm_ioeventfd) {
+		.addr		= ioevent->start,
+		.len		= ioevent->len,
+		.datamatch	= ioevent->datamatch,
+		.flags		= KVM_IOEVENTFD_FLAG_PIO
+				| KVM_IOEVENTFD_FLAG_DEASSIGN
+				| KVM_IOEVENTFD_FLAG_DATAMATCH,
+	};
+
+	ioctl(kvm->vm_fd, KVM_IOEVENTFD, &kvm_ioevent);
Re: [PATCH 1/4] kvm tools: Add ioeventfd support
On Fri, May 27, 2011 at 11:36 AM, Sasha Levin <levinsasha...@gmail.com> wrote:
> ioeventfd is a way provided by KVM to receive notifications about
> reads and writes to PIO and MMIO areas within the guest. Such
> notifications are useful if all we need to know is that a specific
> area of the memory has been changed, and we don't need a heavyweight
> exit to happen.
>
> The implementation uses epoll to scale to a large number of
> ioeventfds.
>
> Signed-off-by: Sasha Levin <levinsasha...@gmail.com>
> ---
>  tools/kvm/Makefile                |    1 +
>  tools/kvm/include/kvm/ioeventfd.h |   27
>  tools/kvm/ioeventfd.c             |  127 +
>  tools/kvm/kvm-run.c               |    4 +
>  4 files changed, 159 insertions(+), 0 deletions(-)
>  create mode 100644 tools/kvm/include/kvm/ioeventfd.h
>  create mode 100644 tools/kvm/ioeventfd.c

Did you run any benchmarks?

Stefan
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/4] kvm tools: Add ioeventfd support
On Fri, May 27, 2011 at 1:36 PM, Sasha Levin <levinsasha...@gmail.com> wrote:
> +void ioeventfd__start(void)
> +{
> +	pthread_t thread;
> +
> +	pthread_create(&thread, NULL, ioeventfd__thread, NULL);

Please be more careful with error handling. If an API call can fail,
there's almost never any reason to silently ignore it.

			Pekka
Re: [PATCH 1/4] kvm tools: Add ioeventfd support
* Sasha Levin <levinsasha...@gmail.com> wrote:

> ioeventfd is a way provided by KVM to receive notifications about
> reads and writes to PIO and MMIO areas within the guest. Such
> notifications are useful if all we need to know is that a specific
> area of the memory has been changed, and we don't need a heavyweight
> exit to happen.
>
> The implementation uses epoll to scale to a large number of
> ioeventfds.

Nice! :-)

> +struct ioevent {
> +	u64			start;
> +	u8			len;

If that's an mmio address then it might be worth naming it
ioevent->mmio_addr, ioevent->mmio_end.

> +	void			(*ioevent_callback_fn)(struct kvm *kvm, void *ptr);

Please only 'fn', we already know this is an ioevent.

> +	struct kvm		*kvm;
> +	void			*ptr;

What is the purpose of the pointer? AFAICS it is the private data of
the callback function. In such cases please name them in a harmonizing
fashion, such as:

	void		(*fn)(struct kvm *kvm, void *data);
	struct kvm	*fn_kvm;
	void		*fn_data;

Also, will tools/kvm/ ever run with multiple 'struct kvm' instances
present?

A sidenote: i think 'struct kvm *kvm' was a naming mistake - it's way
too aspecific, it tells us nothing. What is a 'kvm'? A much better name
would be 'struct machine *machine', hm? Even if everyone agrees this
would be a separate patch, obviously.

Also, can ioevent->kvm *ever* be different from the kvm that the
mmio-event receiving vcpu thread is associated with? If not then the
fn_kvm field is really superfluous - we get the machine from the mmio
handler and can pass it down to the callback function.

> +	int			event_fd;

'fd'

> +	u64			datamatch;

What's a datamatch? 'cookie'? 'key'?

> +
> +	struct list_head	list_used;

Just 'list' is enough i think - it's obvious that ioevent->list is a
list of ioevents, right?

> +	kvm_ioevent = (struct kvm_ioeventfd) {
> +		.addr		= ioevent->start,
> +		.len		= ioevent->len,

Do you see how confusing the start/len naming is? Here you are
assigning a 'start' field to an 'addr' and the reviewer is kept
wondering whether that's right. If it was ->mmio_addr then it would be
a lot more obvious what is going on.

> +static void *ioeventfd__thread(void *param)
> +{
> +	for (;;) {
> +		int nfds, i;
> +
> +		nfds = epoll_wait(epoll_fd, events, IOEVENTFD_MAX_EVENTS, -1);
> +		for (i = 0; i < nfds; i++) {
> +			u64 tmp;
> +			struct ioevent *ioevent;
> +
> +			ioevent = events[i].data.ptr;
> +
> +			if (read(ioevent->event_fd, &tmp, sizeof(tmp)) < 0)
> +				die("Failed reading event");
> +
> +			ioevent->ioevent_callback_fn(ioevent->kvm, ioevent->ptr);
> +		}
> +	}
> +
> +	return NULL;
> +}
> +
> +void ioeventfd__start(void)
> +{
> +	pthread_t thread;
> +
> +	pthread_create(&thread, NULL, ioeventfd__thread, NULL);
> +}

Shouldn't this use the thread pool, so that we know about each and
every worker thread we have started, in one central place?

(This might have relevance, see the big-reader-lock mail i sent
earlier today.)

Thanks,

	Ingo
Re: [PATCH 1/4] kvm tools: Add ioeventfd support
On Fri, 2011-05-27 at 11:47 +0100, Stefan Hajnoczi wrote:
> On Fri, May 27, 2011 at 11:36 AM, Sasha Levin <levinsasha...@gmail.com> wrote:
> > ioeventfd is a way provided by KVM to receive notifications about
> > reads and writes to PIO and MMIO areas within the guest. Such
> > notifications are useful if all we need to know is that a specific
> > area of the memory has been changed, and we don't need a
> > heavyweight exit to happen.
> >
> > The implementation uses epoll to scale to a large number of
> > ioeventfds.
> >
> > Signed-off-by: Sasha Levin <levinsasha...@gmail.com>
> > ---
> >  tools/kvm/Makefile                |    1 +
> >  tools/kvm/include/kvm/ioeventfd.h |   27
> >  tools/kvm/ioeventfd.c             |  127 +
> >  tools/kvm/kvm-run.c               |    4 +
> >  4 files changed, 159 insertions(+), 0 deletions(-)
> >  create mode 100644 tools/kvm/include/kvm/ioeventfd.h
> >  create mode 100644 tools/kvm/ioeventfd.c
>
> Did you run any benchmarks?

Yes, they showed nice improvements - I'll post them with a V2 of the
patch.

-- 

Sasha.
Re: [PATCH 1/4] kvm tools: Add ioeventfd support
On Fri, 2011-05-27 at 12:54 +0200, Ingo Molnar wrote:
> * Sasha Levin <levinsasha...@gmail.com> wrote:
>
> > ioeventfd is a way provided by KVM to receive notifications about
> > reads and writes to PIO and MMIO areas within the guest. Such
> > notifications are useful if all we need to know is that a specific
> > area of the memory has been changed, and we don't need a
> > heavyweight exit to happen.
> >
> > The implementation uses epoll to scale to a large number of
> > ioeventfds.
>
> Nice! :-)
>
> > +struct ioevent {
> > +	u64			start;
> > +	u8			len;
>
> If that's an mmio address then it might be worth naming it
> ioevent->mmio_addr, ioevent->mmio_end.
>
> > +	void			(*ioevent_callback_fn)(struct kvm *kvm, void *ptr);
>
> Please only 'fn', we already know this is an ioevent.
>
> > +	struct kvm		*kvm;
> > +	void			*ptr;
>
> What is the purpose of the pointer? AFAICS it is the private data of
> the callback function. In such cases please name them in a
> harmonizing fashion, such as:
>
> 	void		(*fn)(struct kvm *kvm, void *data);
> 	struct kvm	*fn_kvm;
> 	void		*fn_data;
>
> Also, will tools/kvm/ ever run with multiple 'struct kvm' instances
> present?

I'm not sure. We pass it around to all our functions instead of using
a global, so I assumed we might have several guests under one process.

> A sidenote: i think 'struct kvm *kvm' was a naming mistake - it's
> way too aspecific, it tells us nothing. What is a 'kvm'? A much
> better name would be 'struct machine *machine', hm? Even if everyone
> agrees this would be a separate patch, obviously.
>
> Also, can ioevent->kvm *ever* be different from the kvm that the
> mmio-event receiving vcpu thread is associated with? If not then the
> fn_kvm field is really superfluous - we get the machine from the
> mmio handler and can pass it down to the callback function.
>
> > +	int			event_fd;
>
> 'fd'
>
> > +	u64			datamatch;
>
> What's a datamatch? 'cookie'? 'key'?

The kernel-side ioeventfd matches the value written to the PIO port
and signals the event only if both values match. It's named this way
in the kernel code so I wanted to be consistent.

> > +
> > +	struct list_head	list_used;
>
> Just 'list' is enough i think - it's obvious that ioevent->list is a
> list of ioevents, right?

We might have a list of free ioevents if we ever decide to scale it
beyond the max 20 event limit we currently have, so I would rather be
specific with the list names.

> > +	kvm_ioevent = (struct kvm_ioeventfd) {
> > +		.addr		= ioevent->start,
> > +		.len		= ioevent->len,
>
> Do you see how confusing the start/len naming is? Here you are
> assigning a 'start' field to an 'addr' and the reviewer is kept
> wondering whether that's right. If it was ->mmio_addr then it would
> be a lot more obvious what is going on.

Yes, I'll rename them to addr/len to match with KVM naming.

> > +static void *ioeventfd__thread(void *param)
> > +{
> > +	for (;;) {
> > +		int nfds, i;
> > +
> > +		nfds = epoll_wait(epoll_fd, events, IOEVENTFD_MAX_EVENTS, -1);
> > +		for (i = 0; i < nfds; i++) {
> > +			u64 tmp;
> > +			struct ioevent *ioevent;
> > +
> > +			ioevent = events[i].data.ptr;
> > +
> > +			if (read(ioevent->event_fd, &tmp, sizeof(tmp)) < 0)
> > +				die("Failed reading event");
> > +
> > +			ioevent->ioevent_callback_fn(ioevent->kvm, ioevent->ptr);
> > +		}
> > +	}
> > +
> > +	return NULL;
> > +}
> > +
> > +void ioeventfd__start(void)
> > +{
> > +	pthread_t thread;
> > +
> > +	pthread_create(&thread, NULL, ioeventfd__thread, NULL);
> > +}
>
> Shouldn't this use the thread pool, so that we know about each and
> every worker thread we have started, in one central place?

Our thread pool currently responds to events - it runs a callback if
it has received a notification to do so. It doesn't manage threads
which have to run all the time like in this case. Though once we
return from epoll_wait() here we do minimal work before sending the IO
event into the thread pool.

> (This might have relevance, see the big-reader-lock mail i sent
> earlier today.)
>
> Thanks,
>
> 	Ingo

-- 

Sasha.
Re: [PATCH 1/4] kvm tools: Add ioeventfd support
Hi Ingo,

On Fri, May 27, 2011 at 1:54 PM, Ingo Molnar <mi...@elte.hu> wrote:
> A sidenote: i think 'struct kvm *kvm' was a naming mistake - it's way
> too aspecific, it tells us nothing. What is a 'kvm'?

Why, an instance of a kernel virtual machine, of course! It was the
very first thing I wrote for this project!

On Fri, May 27, 2011 at 1:54 PM, Ingo Molnar <mi...@elte.hu> wrote:
> A much better name would be 'struct machine *machine', hm? Even if
> everyone agrees this would be a separate patch, obviously.

Well, I don't really think 'struct machine' is that much better. The
obvious benefit of 'struct kvm' is that it's the same name that's used
by Qemu and libkvm and maps nicely to '/dev/kvm'.

If you really, really wanna change it, I could use some more
convincing or bribes of some sort.

			Pekka
Re: [PATCH 1/4] kvm tools: Add ioeventfd support
* Pekka Enberg <penb...@kernel.org> wrote:

> Hi Ingo,
>
> On Fri, May 27, 2011 at 1:54 PM, Ingo Molnar <mi...@elte.hu> wrote:
> > A sidenote: i think 'struct kvm *kvm' was a naming mistake - it's
> > way too aspecific, it tells us nothing. What is a 'kvm'?
>
> Why, an instance of a kernel virtual machine, of course! It was the
> very first thing I wrote for this project!

Oh, that is way too funny: i never thought of 'KVM' as a 'kernel
virtual machine' :-/ It's the name of a cool kernel subsystem - not
the name of one instance of a virtual machine.

> On Fri, May 27, 2011 at 1:54 PM, Ingo Molnar <mi...@elte.hu> wrote:
> > A much better name would be 'struct machine *machine', hm? Even if
> > everyone agrees this would be a separate patch, obviously.
>
> Well, I don't really think 'struct machine' is that much better. The
> obvious benefit of 'struct kvm' is that it's the same name that's
> used by Qemu and libkvm and maps nicely to '/dev/kvm'.

To me /dev/kvm is what interfaces to 'KVM' - where 'KVM' is the magic,
axiomatic name for the aforementioned cool kernel subsystem! :-)

> If you really, really wanna change it, I could use some more
> convincing or bribes of some sort.

No, i guess the naming is just fine, i just need to rewire 5 years
worth of neural pathways ;-) I'll save the bribes for a worthier
goal! :-)

Thanks,

	Ingo