Re: TODO list for qemu+KVM networking performance v2
Rusty Russell wrote:

On Fri, 5 Jun 2009 02:13:20 am Michael S. Tsirkin wrote: I put up a copy at http://www.linux-kvm.org/page/Networking_Performance as well, and intend to dump updates there from time to time.

Hi Michael,

Sorry for the delay. I'm weaning myself off my virtio work, but virtio_net performance is an issue which still needs lots of love. BTW, a non-wiki page on the wiki? You should probably rename it to MST_Networking_Performance or allow editing :)

- skbs in flight are kept in a send queue linked list, so that we can flush them when the device is removed [ mst: optimization idea: the virtqueue already tracks posted buffers. Add a flush/purge operation (flush_buf(), which does a get_buf() but for unused buffers) and use that instead? ] Interesting idea, but not really an optimization.

- skb is reformatted to scatter-gather format [ mst: idea to try: this does a copy for the skb head, which might be costly, especially for small/linear packets. Try to avoid this? Might need to tweak the virtio interface. ] There's no copy here that I can see?

- network driver adds the packet buffer to the TX ring

- network driver does a kick which causes a VM exit [ mst: any way to mitigate the number of VM exits here? Possibly could be done on the host side as well. ] [ markmc: All of our efforts there have been on the host side; I think that's preferable to trying anything on the guest side. ]

The current theoretical hole is that the host suppresses notifications using the VIRTIO_AVAIL_F_NO_NOTIFY flag, but we can get a number of notifications in before it gets to that suppression. You can use a counter to improve this: you only notify when the two are equal, and increment when you notify. That way you suppress further notifications even if the other side takes ages to wake up. In practice, this shouldn't be played with until we have full aio (or an equivalent in kernel) for the other side: host xmit tends to be too fast at the moment and we get a notification per packet anyway.

The Xen ring has had this exact optimization for ages. imho we should have it too, regardless of aio. It reduces the number of vmexits/spurious wakeups and it is very simple to implement.

- Full queue: we keep a single extra skb around: if we fail to transmit, we queue it [ mst: idea to try: what does it do to performance if we queue more packets? ] Bad idea!! We already have two queues; this is a third. We should either stop the queue before it gets full, or fix TX_BUSY handling. I've been arguing on netdev for the latter (see the thread "[PATCH 2/4] virtio_net: return NETDEV_TX_BUSY instead of queueing an extra skb."). [ markmc: the queue might soon be going away: 200905292346.04815.ru...@rustcorp.com.au ] Ah, yep, that one. http://archive.netbsd.se/?ml=linux-netdeva=2009-05m=10788575

- We get each buffer from the host as it is completed and free it

- TX interrupts are only enabled when the queue is stopped, and when it is originally created (we disable them on completion) [ mst: idea: the second part is probably unintentional. todo: we probably should disable interrupts when the device is created. ] Yep, minor wart.

- We poll for buffer completions: 1. Before each TX. 2. On a timer tasklet (unless 3 is supported). 3. When the host sends us an interrupt telling us that the queue is empty. [ mst: idea to try: instead of empty, enable send interrupts on xmit when the buffer is almost full (e.g. at least half empty): we are running out of buffers, so it's important to free them ASAP. Can be done from the host or from the guest. ] [ Rusty proposing that we don't need (2) or (3) if the skbs are orphaned before start_xmit(). See subj "net: skb_orphan on dev_hard_start_xmit". ] [ rusty also seems to be suggesting that disabling VIRTIO_F_NOTIFY_ON_EMPTY on the host should help the case where the host out-paces the guest ] Yes, that's more fruitful.

- Each skb has a 128 byte buffer at head and a single page for data. Only full pages are passed to virtio buffers. [ mst: for large packets, managing the 128 byte head buffers is wasted effort. Try allocating skbs on the rcv path when needed. ] [ mst: to clarify the previous suggestion: I am talking about merging here. We currently allocate skbs and pages for them. If a packet spans multiple pages, we discard the extra skbs. Instead, let's allocate pages but not skbs. Allocate and fill skbs on the receive path. ] Yep. There's another issue here, which is alignment: packets which get placed into pages are misaligned (that 14 byte ethernet header). We should add a feature to allow the host to say "I've skipped this many bytes at the front".
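The counter scheme described above (notify only when the two counters are equal, then increment) can be sketched as follows. This is an illustrative model only: the struct and function names are invented for the sketch, not taken from the virtio headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical counters shared between guest and host. */
struct notify_state {
    uint32_t guest_sent; /* bumped by the guest each time it notifies */
    uint32_t host_seen;  /* set by the host to guest_sent when it wakes */
};

/* Guest side: notify only if the host has caught up; otherwise a wakeup
 * is already pending and another notification would be redundant. */
static bool should_notify(struct notify_state *s)
{
    if (s->guest_sent != s->host_seen)
        return false;    /* host hasn't processed the last kick yet */
    s->guest_sent++;     /* record the kick we are about to send */
    return true;
}

/* Host side, on wakeup: acknowledge all kicks seen so far. */
static void host_wakeup(struct notify_state *s)
{
    s->host_seen = s->guest_sent;
}
```

However slowly the host wakes up, the guest ends up sending one notification per host wakeup rather than one per packet, which is the suppression property being discussed.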
Re: TODO list for qemu+KVM networking performance v2
On Wed, 10 Jun 2009 03:56:31 pm Dor Laor wrote: Rusty Russell wrote: The current theoretical hole is that the host suppresses notifications using the VIRTIO_AVAIL_F_NO_NOTIFY flag, but we can get a number of notifications in before it gets to that suppression. You can use a counter to improve this: you only notify when the two are equal, and increment when you notify. That way you suppress further notifications even if the other side takes ages to wake up. In practice, this shouldn't be played with until we have full aio (or an equivalent in kernel) for the other side: host xmit tends to be too fast at the moment and we get a notification per packet anyway.

The Xen ring has had this exact optimization for ages. imho we should have it too, regardless of aio. It reduces the number of vmexits/spurious wakeups and it is very simple to implement.

But look at the number of wakeups received vs notifications sent: I just don't see any benefit there at the moment. As I said, improving the host code might change that significantly. And implementing it the other way is v. v. hard given the nature of interrupts (shared and coalesced).

Thanks,
Rusty.
--
To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TODO list for qemu+KVM networking performance v2
On Thu, Jun 11, 2009 at 12:09:33AM +0930, Rusty Russell wrote: But look at the number of wakeups received vs notifications sent: I just don't see any benefit there at the moment. As I said, improving the host code might change that significantly. And implementing it the other way is v. v. hard given the nature of interrupts (shared and coalesced).

I agree it's not such a simple thing to implement race-free, so I do buy the argument that we shouldn't unless it gives a performance benefit. But I don't understand how aio will make implementing it easier - or are you merely saying that it will make it worthwhile?

--
MST
Re: TODO list for qemu+KVM networking performance v2
Michael S. Tsirkin wrote: But I don't understand how aio will make implementing it easier - or are you merely saying that it will make it worthwhile?

If you have aio, then the NIC and the guest proceed in parallel. If the guest is faster (likely), then when it sends the next packet it will see that interrupts are disabled and not notify again. Once aio completes we can recheck the queue; if it's empty we re-enable notifications. If there's still stuff in it we submit it with notifications disabled.

--
error compiling committee.c: too many arguments to function
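Avi's recheck loop can be sketched as a toy model. All names below are invented for illustration; this is not qemu's actual API, just the control flow he describes: keep notifications off while work remains, re-enable them only once the ring drains.

```c
#include <stdbool.h>

/* Toy model of the host TX path: while an aio transmit is in flight,
 * guest notifications stay disabled; on completion we recheck the ring
 * and only re-enable notifications once it is empty. */
struct txq {
    int pending;         /* packets the guest has queued */
    bool notify_enabled; /* whether the guest should kick us */
};

static void submit_aio(struct txq *q)
{
    q->pending--;        /* pretend one packet was handed to the NIC */
}

/* Called when an aio write completes. */
static void on_aio_complete(struct txq *q)
{
    if (q->pending == 0) {
        q->notify_enabled = true;  /* ring drained: ask for a kick again */
    } else {
        q->notify_enabled = false; /* more work queued: keep polling */
        submit_aio(q);
    }
}
```

In this model a fast guest can keep refilling `pending` without ever kicking, because it observes `notify_enabled == false` the whole time the host is busy.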
Re: TODO list for qemu+KVM networking performance v2
On Wed, Jun 10, 2009 at 06:18:01PM +0300, Avi Kivity wrote: If you have aio, then the NIC and the guest proceed in parallel. If the guest is faster (likely), then when it sends the next packet it will see that interrupts are disabled and not notify again. Once aio completes we can recheck the queue; if it's empty we re-enable notifications. If there's still stuff in it we submit it with notifications disabled.

So you are saying that with aio we won't need this optimization at all? I guess it's late in the day, and my mind is fuzzy...

--
MST
Re: TODO list for qemu+KVM networking performance v2
Michael S. Tsirkin wrote: So you are saying that with aio we won't need this optimization at all? I guess it's late in the day, and my mind is fuzzy...

No, I'm saying with aio the optimization becomes worthwhile. But I joined late in the thread so we may be talking about different things.

--
error compiling committee.c: too many arguments to function
Re: TODO list for qemu+KVM networking performance v2
On Wed, Jun 10, 2009 at 07:08:49PM +0300, Avi Kivity wrote: No, I'm saying with aio the optimization becomes worthwhile. But I joined late in the thread so we may be talking about different things.

Oh, I see that. What Rusty's saying is that it's not as trivial as it seems, and I agree. And at some point it seemed like he was saying it's easier to implement with aio, but that probably was just my misunderstanding.

--
MST
Re: TODO list for qemu+KVM networking performance v2
On Fri, 5 Jun 2009 02:13:20 am Michael S. Tsirkin wrote: I put up a copy at http://www.linux-kvm.org/page/Networking_Performance as well, and intend to dump updates there from time to time.

Hi Michael,

Sorry for the delay. I'm weaning myself off my virtio work, but virtio_net performance is an issue which still needs lots of love. BTW, a non-wiki page on the wiki? You should probably rename it to MST_Networking_Performance or allow editing :)

- skbs in flight are kept in a send queue linked list, so that we can flush them when the device is removed [ mst: optimization idea: the virtqueue already tracks posted buffers. Add a flush/purge operation (flush_buf(), which does a get_buf() but for unused buffers) and use that instead? ] Interesting idea, but not really an optimization.

- skb is reformatted to scatter-gather format [ mst: idea to try: this does a copy for the skb head, which might be costly, especially for small/linear packets. Try to avoid this? Might need to tweak the virtio interface. ] There's no copy here that I can see?

- network driver adds the packet buffer to the TX ring

- network driver does a kick which causes a VM exit [ mst: any way to mitigate the number of VM exits here? Possibly could be done on the host side as well. ] [ markmc: All of our efforts there have been on the host side; I think that's preferable to trying anything on the guest side. ]

The current theoretical hole is that the host suppresses notifications using the VIRTIO_AVAIL_F_NO_NOTIFY flag, but we can get a number of notifications in before it gets to that suppression. You can use a counter to improve this: you only notify when the two are equal, and increment when you notify. That way you suppress further notifications even if the other side takes ages to wake up. In practice, this shouldn't be played with until we have full aio (or an equivalent in kernel) for the other side: host xmit tends to be too fast at the moment and we get a notification per packet anyway.

- Full queue: we keep a single extra skb around: if we fail to transmit, we queue it [ mst: idea to try: what does it do to performance if we queue more packets? ] Bad idea!! We already have two queues; this is a third. We should either stop the queue before it gets full, or fix TX_BUSY handling. I've been arguing on netdev for the latter (see the thread "[PATCH 2/4] virtio_net: return NETDEV_TX_BUSY instead of queueing an extra skb."). [ markmc: the queue might soon be going away: 200905292346.04815.ru...@rustcorp.com.au ] Ah, yep, that one. http://archive.netbsd.se/?ml=linux-netdeva=2009-05m=10788575

- We get each buffer from the host as it is completed and free it

- TX interrupts are only enabled when the queue is stopped, and when it is originally created (we disable them on completion) [ mst: idea: the second part is probably unintentional. todo: we probably should disable interrupts when the device is created. ] Yep, minor wart.

- We poll for buffer completions: 1. Before each TX. 2. On a timer tasklet (unless 3 is supported). 3. When the host sends us an interrupt telling us that the queue is empty. [ mst: idea to try: instead of empty, enable send interrupts on xmit when the buffer is almost full (e.g. at least half empty): we are running out of buffers, so it's important to free them ASAP. Can be done from the host or from the guest. ] [ Rusty proposing that we don't need (2) or (3) if the skbs are orphaned before start_xmit(). See subj "net: skb_orphan on dev_hard_start_xmit". ] [ rusty also seems to be suggesting that disabling VIRTIO_F_NOTIFY_ON_EMPTY on the host should help the case where the host out-paces the guest ] Yes, that's more fruitful.

- Each skb has a 128 byte buffer at head and a single page for data. Only full pages are passed to virtio buffers. [ mst: for large packets, managing the 128 byte head buffers is wasted effort. Try allocating skbs on the rcv path when needed. ] [ mst: to clarify the previous suggestion: I am talking about merging here. We currently allocate skbs and pages for them. If a packet spans multiple pages, we discard the extra skbs. Instead, let's allocate pages but not skbs. Allocate and fill skbs on the receive path. ] Yep. There's another issue here, which is alignment: packets which get placed into pages are misaligned (that 14 byte ethernet header). We should add a feature to allow the host to say "I've skipped this many bytes at the front".

- Buffers are replenished after a packet is received, when the number of buffers becomes low (below 1/2 of max). This serves to reduce the number of kicks (VM exits) for RX.
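Rusty's "stop the queue before it gets full" alternative can be sketched with a toy ring. The names are modeled loosely on virtio_net but simplified; `MAX_FRAGS` stands in for the real worst-case descriptor count of one packet. The point is that the driver stops the netdev queue while there is still room for one more worst-case packet, so start_xmit never has to reject anything.

```c
#include <stdbool.h>

#define RING_SIZE 8
#define MAX_FRAGS 3  /* worst-case descriptors one packet may need */

struct toy_ring {
    int free;            /* free descriptors */
    bool queue_stopped;  /* netif_stop_queue() analogue */
};

/* Returns true if the packet was accepted. Under the early-stop policy
 * the core never calls us with a stopped queue, so rejection is a bug. */
static bool toy_start_xmit(struct toy_ring *r, int ndesc)
{
    if (r->queue_stopped || r->free < ndesc)
        return false;            /* should not happen under this policy */
    r->free -= ndesc;
    if (r->free < MAX_FRAGS)
        r->queue_stopped = true; /* stop while one worst case still fits */
    return true;
}

/* TX completion: reclaim descriptors and restart the queue if possible. */
static void toy_tx_done(struct toy_ring *r, int ndesc)
{
    r->free += ndesc;
    if (r->free >= MAX_FRAGS)
        r->queue_stopped = false;
}
```

This removes the need for the extra queued skb entirely, which is the alternative to fixing TX_BUSY handling that the message argues for.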
Re: TODO list for qemu+KVM networking performance v2
Michael S. Tsirkin wrote:

As I'm new to qemu/kvm, to figure out how networking performance can be improved, I went over the code and took some notes. As I did this, I tried to record ideas from recent discussions and ideas that came up on improving performance. Thus this list. This includes a partial overview of networking code in a virtual environment, with a focus on performance: I'm only interested in sending and receiving packets, ignoring configuration etc. I have likely missed a ton of clever ideas and older discussions, and probably misunderstood some code. Please pipe up with corrections, additions, etc. And please don't take offence if I didn't attribute an idea correctly - most of them are marked mst but I don't claim they are original. Just let me know. And there are a couple of trivial questions on the code - I'll add answers here as they become available. I put up a copy at http://www.linux-kvm.org/page/Networking_Performance as well, and intend to dump updates there from time to time.

Hi Michael,

Not sure if you have seen this, but I've already started to work on the code for in-kernel devices and have a (currently non-virtio based) proof-of-concept network device which you can use for comparative data. You can find details here: http://lkml.org/lkml/2009/4/21/408

snip

(Will look at your list later, to see if I can add anything)

---

Short term plans: I plan to start out with trying out the following ideas: save a copy in qemu on the RX side in case of a single nic in a vlan; implement a virtio-host kernel module.

*detail on the virtio-host-net kernel module project*

virtio-host-net is a simple character device which gets memory layout information from qemu, and uses this to convert between virtio descriptors and skbs. The skbs are then passed to/from a raw socket (or we could bind virtio-host to a physical device like a raw socket does - TBD). Interrupts will be reported to eventfd descriptors, and the device will poll eventfd descriptors to get kicks from the guest.

I currently have a virtio transport for vbus implemented, but it still needs a virtio-net device-model backend written. If you are interested, we can work on this together to implement your idea. It's on my todo list for vbus anyway, but I am currently distracted with the irqfd/iosignalfd projects which are prereqs for vbus to be considered for merge.

Basically vbus is a framework for declaring in-kernel devices (not kvm specific, per se) with a full security/containment model, a hot-pluggable configuration engine, and a dynamically loadable device-model. The framework takes care of the details of signal-path and memory routing for you so that something like a virtio-net model can be implemented once and work in a variety of environments such as kvm, lguest, etc.

Interested?

-Greg
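The eventfd kick path mst proposes can be sketched in userspace with the real eventfd(2) API. Here both ends live in one process for illustration, whereas the proposed module would wait on the fd from kernel context; the helper names are invented.

```c
#include <stdint.h>
#include <unistd.h>
#include <sys/eventfd.h>

/* Sender side (qemu/kvm): one write to the eventfd counts as one kick. */
static int send_kick(int efd)
{
    uint64_t one = 1;
    return write(efd, &one, sizeof(one)) == sizeof(one) ? 0 : -1;
}

/* Receiver side (the in-kernel device): read returns the number of kicks
 * accumulated since the last read and resets the counter - so back-to-back
 * kicks coalesce into a single wakeup, which is exactly the VM-exit
 * mitigation property being discussed. */
static uint64_t drain_kicks(int efd)
{
    uint64_t n = 0;
    if (read(efd, &n, sizeof(n)) != sizeof(n))
        return 0;
    return n;
}
```

An eventfd created with `eventfd(0, EFD_NONBLOCK)` gives the receiver a non-blocking poll; a blocking fd would instead be parked in a wait loop.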
Re: TODO list for qemu+KVM networking performance v2
On Thu, Jun 04, 2009 at 01:16:05PM -0400, Gregory Haskins wrote: Not sure if you have seen this, but I've already started to work on the code for in-kernel devices and have a (currently non-virtio based) proof-of-concept network device which you can use for comparative data. You can find details here: http://lkml.org/lkml/2009/4/21/408

Thanks.

I currently have a virtio transport for vbus implemented, but it still needs a virtio-net device-model backend written.

You mean a virtio-ring implementation? I intended to basically start by reusing the code from Documentation/lguest/lguest.c - isn't this all there is to it?

If you are interested, we can work on this together to implement your idea. It's on my todo list for vbus anyway, but I am currently distracted with the irqfd/iosignalfd projects which are prereqs for vbus to be considered for merge. Basically vbus is a framework for declaring in-kernel devices (not kvm specific, per se) with a full security/containment model, a hot-pluggable configuration engine, and a dynamically loadable device-model. The framework takes care of the details of signal-path and memory routing for you so that something like a virtio-net model can be implemented once and work in a variety of environments such as kvm, lguest, etc. Interested? -Greg

It seems that a character device with a couple of ioctls would be simpler for an initial prototype.

--
MST
Re: TODO list for qemu+KVM networking performance v2
On Thu, Jun 04, 2009 at 01:50:20PM -0400, Gregory Haskins wrote: Suit yourself, but I suspect that by the time you build the prototype you will either end up re-solving all the same problems anyway, or have diminished functionality (or both).

/me goes to look at vbus patches.

--
MST
Re: TODO list for qemu+KVM networking performance v2
Michael S. Tsirkin wrote: You mean a virtio-ring implementation?

Right.

I intended to basically start by reusing the code from Documentation/lguest/lguest.c - isn't this all there is to it?

Not sure. I reused the ring code already in the kernel.

It seems that a character device with a couple of ioctls would be simpler for an initial prototype.

Suit yourself, but I suspect that by the time you build the prototype you will either end up re-solving all the same problems anyway, or have diminished functionality (or both). It's actually very simple to declare a new virtio-vbus device, but the choice is yours. I can crank out a skeleton for you, if you like.

-Greg