Re: [PATCH 0/5] Add vhost-blk support
On 07/17/2012 12:21 PM, Asias He wrote:
> On 07/17/2012 04:52 PM, Paolo Bonzini wrote:
>> Il 17/07/2012 10:29, Asias He ha scritto:
>>> So, vhost-blk at least saves ~6 syscalls for us in each request.
>>
>> Are they really 6? If I/O is coalesced by a factor of 3, for example
>> (i.e. each exit processes 3 requests), it's really 2 syscalls per request.
>
> Well. I am counting the number of syscalls in one notify and response
> process. Sure the IO can be coalesced.

Note that Asias is using very fast "disks" (FusionIO & Ramdisk).
- This might affect the level of coalescing both ways, depending on the scenario and algorithm.
- This also means that the 5%-15% gain will probably be lower in real life.

Ronen.

> Also, is there anything we can improve? Perhaps we can modify epoll and
> ask it to clear the eventfd for us (would save 2 reads)? Or io_getevents
> (would save 1)?

I guess you mean qemu here. Yes, in theory, qemu's block layer can be
improved to achieve performance similar to vhost-blk or kvm tool's
userspace virtio-blk. But I think it makes no sense to block one solution
because of another solution that exists only in theory: "we can do
something similar in qemu".

> It depends. Like vhost-scsi, vhost-blk has the problem of a crippled
> feature set: no support for block device formats, non-raw protocols,
> etc. This makes it different from vhost-net.

Data-plane qemu also has this crippled feature set problem, no? Do users
always choose a block device format like qcow2? What if they prefer a raw
image or a raw block device?

> So it begs the question, is it going to be used in production, or just a
> useful reference tool?

This should be decided by the users; I cannot speak for them. What is
wrong with adding one option that users can decide on?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/5] Add vhost-blk support
On 07/17/2012 09:02 PM, Paolo Bonzini wrote:
> Il 17/07/2012 14:48, Michael S. Tsirkin ha scritto:
>> On Tue, Jul 17, 2012 at 01:03:39PM +0100, Stefan Hajnoczi wrote:
>>> On Tue, Jul 17, 2012 at 12:54 PM, Michael S. Tsirkin wrote:
>>>>> Knowing the answer to that is important before anyone can say
>>>>> whether this approach is good or not.
>>>>>
>>>>> Stefan
>>>>
>>>> Why is it?
>>>
>>> Because there might be a fix to kvmtool which closes the gap. It
>>> would be embarrassing if vhost-blk was pushed just because no one
>>> looked into what is actually going on.
>>
>> Embarrassing to whom? Is someone working on an optimization that makes
>> the work in question redundant, with posting just around the corner?
>> Then maybe the thing to do is just wait a bit.
>
> Of course there is work going on to make QEMU perform better.
> Not sure about lkvm.

Of course for lkvm also.

>>> And on the flipside, hard evidence of an overhead that cannot be
>>> resolved could be good reason to do more vhost devices in the future.
>>
>> How can one have hard evidence of an overhead that cannot be resolved?
>
> Since we do have two completely independent userspaces (lkvm and
> data-plane QEMU), you can build up some compelling evidence of an
> overhead that cannot be resolved in user space.

This does not build the hard evidence either. How can one prove that
userspace lkvm and data-plane QEMU cannot be improved further? The same
for vhost-blk.

>>> Either way, it's useful to do this before going further.
>>
>> I think each work should be discussed on its own merits. Maybe
>> vhost-blk is just well written. So? What is your conclusion?
>
> If it's just that vhost-blk is written well, my conclusion is that lkvm
> people should look into improving their virtio-blk userspace. We take
> hints from each other all the time, for example virtio-scsi will have
> unlocked kick in 3.6.
>
> Why can't vhost-* just get into staging, and we call it a day?

OK. I'm fine with staging.
--
Asias
Re: [PATCH 0/5] Add vhost-blk support
On Wed, Jul 18, 2012 at 9:12 AM, Asias He wrote:
> On 07/17/2012 07:11 PM, Stefan Hajnoczi wrote:
>> On Tue, Jul 17, 2012 at 10:21 AM, Asias He wrote:
>>> On 07/17/2012 04:52 PM, Paolo Bonzini wrote:
>>>> Il 17/07/2012 10:29, Asias He ha scritto:
>>>>> So, vhost-blk at least saves ~6 syscalls for us in each request.
>>>>
>>>> Are they really 6? If I/O is coalesced by a factor of 3, for example
>>>> (i.e. each exit processes 3 requests), it's really 2 syscalls per
>>>> request.
>>>
>>> Well. I am counting the number of syscalls in one notify and response
>>> process. Sure the IO can be coalesced.
>>
>> Linux AIO also supports batching in io_submit() and io_getevents().
>> Depending on the request pattern in the vring when you process it, you
>> should be able to do better than 1 set of syscalls per host I/O
>> request.
>>
>> Are you taking advantage of that at the moment in your userspace
>> benchmark?
>
> OK. I know about the batching in io_submit() and io_getevents(). There
> was a patch for the kvm tool a long time ago. Now, neither vhost-blk
> nor the kvm tool is taking advantage of that atm. There are issues,
> e.g.: how many requests do we want to batch? Does this batching hurt
> latency?

I didn't mean introducing a delay so that multiple requests can be
batched. I was just thinking of the simple case: when there are a lot of
parallel requests the chance increases that a single vring interrupt
provides several I/O requests. In that case it's easy for the virtio-blk
implementation to issue them all in one io_submit(2) call. The same is
true for io_getevents(2), there might be several completed host I/O
requests.

The reason I mentioned this was because the actual syscall pattern per
request might not require 1 io_submit(2)/io_getevents(2) if you are
processing a lot of requests in parallel.

The only way to know why kvmtool is slower is by profiling...
Stefan
Re: [PATCH 0/5] Add vhost-blk support
On 07/17/2012 07:11 PM, Stefan Hajnoczi wrote:
> On Tue, Jul 17, 2012 at 10:21 AM, Asias He wrote:
>> On 07/17/2012 04:52 PM, Paolo Bonzini wrote:
>>> Il 17/07/2012 10:29, Asias He ha scritto:
>>>> So, vhost-blk at least saves ~6 syscalls for us in each request.
>>>
>>> Are they really 6? If I/O is coalesced by a factor of 3, for example
>>> (i.e. each exit processes 3 requests), it's really 2 syscalls per
>>> request.
>>
>> Well. I am counting the number of syscalls in one notify and response
>> process. Sure the IO can be coalesced.
>
> Linux AIO also supports batching in io_submit() and io_getevents().
> Depending on the request pattern in the vring when you process it, you
> should be able to do better than 1 set of syscalls per host I/O
> request.
>
> Are you taking advantage of that at the moment in your userspace
> benchmark?

OK. I know about the batching in io_submit() and io_getevents(). There
was a patch for the kvm tool a long time ago. Now, neither vhost-blk nor
the kvm tool is taking advantage of that atm. There are issues, e.g.:
how many requests do we want to batch? Does this batching hurt latency?

--
Asias
Re: [PATCH 0/5] Add vhost-blk support
On Tue, Jul 17, 2012 at 03:02:45PM +0200, Paolo Bonzini wrote:
> Il 17/07/2012 14:48, Michael S. Tsirkin ha scritto:
>> On Tue, Jul 17, 2012 at 01:03:39PM +0100, Stefan Hajnoczi wrote:
>>> On Tue, Jul 17, 2012 at 12:54 PM, Michael S. Tsirkin wrote:
>>>>> Knowing the answer to that is important before anyone can say
>>>>> whether this approach is good or not.
>>>>>
>>>>> Stefan
>>>>
>>>> Why is it?
>>>
>>> Because there might be a fix to kvmtool which closes the gap. It
>>> would be embarrassing if vhost-blk was pushed just because no one
>>> looked into what is actually going on.
>>
>> Embarrassing to whom? Is someone working on an optimization that makes
>> the work in question redundant, with posting just around the corner?
>> Then maybe the thing to do is just wait a bit.
>
> Of course there is work going on to make QEMU perform better.
> Not sure about lkvm.
>
>>> And on the flipside, hard evidence of an overhead that cannot be
>>> resolved could be good reason to do more vhost devices in the future.
>>
>> How can one have hard evidence of an overhead that cannot be resolved?
>
> Since we do have two completely independent userspaces (lkvm and
> data-plane QEMU), you can build up some compelling evidence of an
> overhead that cannot be resolved in user space.

OK, so what you are saying is: benchmark against data-plane QEMU?
I agree actually. Asias, any data?

>>> Either way, it's useful to do this before going further.
>>
>> I think each work should be discussed on its own merits. Maybe
>> vhost-blk is just well written. So? What is your conclusion?
>
> If it's just that vhost-blk is written well, my conclusion is that lkvm
> people should look into improving their virtio-blk userspace. We take
> hints from each other all the time, for example virtio-scsi will have
> unlocked kick in 3.6.
>
> Why can't vhost-* just get into staging, and we call it a day?
>
> Paolo

Staging is not a destination.
Even if we put a driver in staging it won't stay there indefinitely if
qemu-kvm is not using it, something that doesn't seem to be addressed for
vhost-blk yet.

--
MST
Re: [PATCH 0/5] Add vhost-blk support
Il 17/07/2012 14:48, Michael S. Tsirkin ha scritto:
> On Tue, Jul 17, 2012 at 01:03:39PM +0100, Stefan Hajnoczi wrote:
>> On Tue, Jul 17, 2012 at 12:54 PM, Michael S. Tsirkin wrote:
>>>> Knowing the answer to that is important before anyone can say
>>>> whether this approach is good or not.
>>>>
>>>> Stefan
>>>
>>> Why is it?
>>
>> Because there might be a fix to kvmtool which closes the gap. It
>> would be embarrassing if vhost-blk was pushed just because no one
>> looked into what is actually going on.
>
> Embarrassing to whom? Is someone working on an optimization that makes
> the work in question redundant, with posting just around the corner?
> Then maybe the thing to do is just wait a bit.

Of course there is work going on to make QEMU perform better.
Not sure about lkvm.

>> And on the flipside, hard evidence of an overhead that cannot be
>> resolved could be good reason to do more vhost devices in the future.
>
> How can one have hard evidence of an overhead that cannot be resolved?

Since we do have two completely independent userspaces (lkvm and
data-plane QEMU), you can build up some compelling evidence of an
overhead that cannot be resolved in user space.

>> Either way, it's useful to do this before going further.
>
> I think each work should be discussed on its own merits. Maybe
> vhost-blk is just well written. So? What is your conclusion?

If it's just that vhost-blk is written well, my conclusion is that lkvm
people should look into improving their virtio-blk userspace. We take
hints from each other all the time, for example virtio-scsi will have
unlocked kick in 3.6.

Why can't vhost-* just get into staging, and we call it a day?

Paolo
Re: [PATCH 0/5] Add vhost-blk support
On Tue, Jul 17, 2012 at 01:03:39PM +0100, Stefan Hajnoczi wrote:
> On Tue, Jul 17, 2012 at 12:54 PM, Michael S. Tsirkin wrote:
>>> Knowing the answer to that is important before anyone can say whether
>>> this approach is good or not.
>>>
>>> Stefan
>>
>> Why is it?
>
> Because there might be a fix to kvmtool which closes the gap. It would
> be embarrassing if vhost-blk was pushed just because no one looked into
> what is actually going on.

Embarrassing to whom? Is someone working on an optimization that makes
the work in question redundant, with posting just around the corner? Then
maybe the thing to do is just wait a bit.

Or are there no specific plans even, just a vague "sometime in the future
someone might take Asias' patches and profile them to see if anything
sticks out"? Then maybe not making everyone wait years for this will
serve users better.

> And on the flipside, hard evidence of an overhead that cannot be
> resolved could be good reason to do more vhost devices in the future.

How can one have hard evidence of an overhead that cannot be resolved?
Any problem can be resolved in an infinite number of ways.

> Either way, it's useful to do this before going further.
>
> Stefan

I think each work should be discussed on its own merits. Maybe vhost-blk
is just well written. So? What is your conclusion?

--
MST
Re: [PATCH 0/5] Add vhost-blk support
On Tue, Jul 17, 2012 at 12:54 PM, Michael S. Tsirkin wrote:
>> Knowing the answer to that is important before anyone can say whether
>> this approach is good or not.
>>
>> Stefan
>
> Why is it?

Because there might be a fix to kvmtool which closes the gap. It would be
embarrassing if vhost-blk was pushed just because no one looked into what
is actually going on.

And on the flipside, hard evidence of an overhead that cannot be resolved
could be good reason to do more vhost devices in the future.

Either way, it's useful to do this before going further.

Stefan
Re: [PATCH 0/5] Add vhost-blk support
On Tue, Jul 17, 2012 at 12:42:13PM +0100, Stefan Hajnoczi wrote:
> On Tue, Jul 17, 2012 at 12:26 PM, Michael S. Tsirkin wrote:
>> On Tue, Jul 17, 2012 at 12:11:15PM +0100, Stefan Hajnoczi wrote:
>>> On Tue, Jul 17, 2012 at 10:21 AM, Asias He wrote:
>>>> On 07/17/2012 04:52 PM, Paolo Bonzini wrote:
>>>>> Il 17/07/2012 10:29, Asias He ha scritto:
>>>>>> So, vhost-blk at least saves ~6 syscalls for us in each request.
>>>>>
>>>>> Are they really 6? If I/O is coalesced by a factor of 3, for
>>>>> example (i.e. each exit processes 3 requests), it's really 2
>>>>> syscalls per request.
>>>>
>>>> Well. I am counting the number of syscalls in one notify and
>>>> response process. Sure the IO can be coalesced.
>>>
>>> Linux AIO also supports batching in io_submit() and io_getevents().
>>> Depending on the request pattern in the vring when you process it,
>>> you should be able to do better than 1 set of syscalls per host I/O
>>> request.
>>>
>>> Are you taking advantage of that at the moment in your userspace
>>> benchmark?
>>>
>>> Stefan
>>
>> Injecting an interrupt directly from the kernel bypasses two context
>> switches. Yes, some workloads can coalesce interrupts efficiently but
>> others can't. It is not really hard to speculate more.
>>
>> Personally I don't understand where all this speculation leads us.
>> Are you guys disputing the measurements posted? If not, wouldn't it be
>> better if discussion focused on the amount of extra code versus
>> measured gain?
>
> 5-15% is nice. But what causes the performance advantage?

Well, check the number of interrupts. If it's high then that is part of it.

> Knowing the answer to that is important before anyone can say whether
> this approach is good or not.
>
> Stefan

Why is it?
--
MST
Re: [PATCH 0/5] Add vhost-blk support
On Tue, Jul 17, 2012 at 12:42 PM, Stefan Hajnoczi wrote:
> On Tue, Jul 17, 2012 at 12:26 PM, Michael S. Tsirkin wrote:
>> On Tue, Jul 17, 2012 at 12:11:15PM +0100, Stefan Hajnoczi wrote:
>>> On Tue, Jul 17, 2012 at 10:21 AM, Asias He wrote:
>>>> On 07/17/2012 04:52 PM, Paolo Bonzini wrote:
>>>>> Il 17/07/2012 10:29, Asias He ha scritto:
>>>>>> So, vhost-blk at least saves ~6 syscalls for us in each request.
>>>>>
>>>>> Are they really 6? If I/O is coalesced by a factor of 3, for
>>>>> example (i.e. each exit processes 3 requests), it's really 2
>>>>> syscalls per request.
>>>>
>>>> Well. I am counting the number of syscalls in one notify and
>>>> response process. Sure the IO can be coalesced.
>>>
>>> Linux AIO also supports batching in io_submit() and io_getevents().
>>> Depending on the request pattern in the vring when you process it,
>>> you should be able to do better than 1 set of syscalls per host I/O
>>> request.
>>>
>>> Are you taking advantage of that at the moment in your userspace
>>> benchmark?
>>>
>>> Stefan
>>
>> Injecting an interrupt directly from the kernel bypasses two context
>> switches. Yes, some workloads can coalesce interrupts efficiently but
>> others can't. It is not really hard to speculate more.
>>
>> Personally I don't understand where all this speculation leads us.
>> Are you guys disputing the measurements posted? If not, wouldn't it be
>> better if discussion focused on the amount of extra code versus
>> measured gain?
>
> 5-15% is nice. But what causes the performance advantage?

To be clear, I suggest posting profiling results and explaining the
improvement over kvmtool.

Stefan
Re: [PATCH 0/5] Add vhost-blk support
On Tue, Jul 17, 2012 at 12:26 PM, Michael S. Tsirkin wrote:
> On Tue, Jul 17, 2012 at 12:11:15PM +0100, Stefan Hajnoczi wrote:
>> On Tue, Jul 17, 2012 at 10:21 AM, Asias He wrote:
>>> On 07/17/2012 04:52 PM, Paolo Bonzini wrote:
>>>> Il 17/07/2012 10:29, Asias He ha scritto:
>>>>> So, vhost-blk at least saves ~6 syscalls for us in each request.
>>>>
>>>> Are they really 6? If I/O is coalesced by a factor of 3, for example
>>>> (i.e. each exit processes 3 requests), it's really 2 syscalls per
>>>> request.
>>>
>>> Well. I am counting the number of syscalls in one notify and response
>>> process. Sure the IO can be coalesced.
>>
>> Linux AIO also supports batching in io_submit() and io_getevents().
>> Depending on the request pattern in the vring when you process it, you
>> should be able to do better than 1 set of syscalls per host I/O
>> request.
>>
>> Are you taking advantage of that at the moment in your userspace
>> benchmark?
>>
>> Stefan
>
> Injecting an interrupt directly from the kernel bypasses two context
> switches. Yes, some workloads can coalesce interrupts efficiently but
> others can't. It is not really hard to speculate more.
>
> Personally I don't understand where all this speculation leads us.
> Are you guys disputing the measurements posted? If not, wouldn't it be
> better if discussion focused on the amount of extra code versus
> measured gain?

5-15% is nice. But what causes the performance advantage? Knowing the
answer to that is important before anyone can say whether this approach
is good or not.

Stefan
Re: [PATCH 0/5] Add vhost-blk support
On Tue, Jul 17, 2012 at 9:29 AM, Asias He wrote:
> On 07/16/2012 07:58 PM, Stefan Hajnoczi wrote:
>> Does the vhost-blk implementation do anything fundamentally different
>> from userspace? Where is the overhead that userspace virtio-blk has?
>
> Currently, no. But we could play with bio directly in vhost-blk, as
> Christoph suggested, which could make the IO path from guest to host's
> real storage even shorter in vhost-blk.

Wait :). My point is that writing new code without systematically
investigating performance means that we're essentially throwing random
things and seeing what sticks.

Adding bio mode would make vhost-blk and kvmtool more different. It'll
probably make vhost-blk slightly faster but harder to compare against
kvmtool. It's easier to start profiling before making that change.

The reason I said "special-purpose kernel module" is because kvmtool
could be suffering from a bottleneck that can be fixed. Other userspace
applications would also benefit from that fix - it would be generally
useful. Adding a vhost-blk kernel module works around this but only
benefits KVM specifically.

Stefan
Re: [PATCH 0/5] Add vhost-blk support
On Tue, Jul 17, 2012 at 12:11:15PM +0100, Stefan Hajnoczi wrote:
> On Tue, Jul 17, 2012 at 10:21 AM, Asias He wrote:
>> On 07/17/2012 04:52 PM, Paolo Bonzini wrote:
>>> Il 17/07/2012 10:29, Asias He ha scritto:
>>>> So, vhost-blk at least saves ~6 syscalls for us in each request.
>>>
>>> Are they really 6? If I/O is coalesced by a factor of 3, for example
>>> (i.e. each exit processes 3 requests), it's really 2 syscalls per
>>> request.
>>
>> Well. I am counting the number of syscalls in one notify and response
>> process. Sure the IO can be coalesced.
>
> Linux AIO also supports batching in io_submit() and io_getevents().
> Depending on the request pattern in the vring when you process it, you
> should be able to do better than 1 set of syscalls per host I/O
> request.
>
> Are you taking advantage of that at the moment in your userspace
> benchmark?
>
> Stefan

Injecting an interrupt directly from the kernel bypasses two context
switches. Yes, some workloads can coalesce interrupts efficiently but
others can't. It is not really hard to speculate more.

Personally I don't understand where all this speculation leads us.
Are you guys disputing the measurements posted? If not, wouldn't it be
better if discussion focused on the amount of extra code versus measured
gain?

--
MST
Re: [PATCH 0/5] Add vhost-blk support
On Tue, Jul 17, 2012 at 10:21 AM, Asias He wrote:
> On 07/17/2012 04:52 PM, Paolo Bonzini wrote:
>> Il 17/07/2012 10:29, Asias He ha scritto:
>>> So, vhost-blk at least saves ~6 syscalls for us in each request.
>>
>> Are they really 6? If I/O is coalesced by a factor of 3, for example
>> (i.e. each exit processes 3 requests), it's really 2 syscalls per
>> request.
>
> Well. I am counting the number of syscalls in one notify and response
> process. Sure the IO can be coalesced.

Linux AIO also supports batching in io_submit() and io_getevents().
Depending on the request pattern in the vring when you process it, you
should be able to do better than 1 set of syscalls per host I/O request.

Are you taking advantage of that at the moment in your userspace
benchmark?

Stefan
Re: [PATCH 0/5] Add vhost-blk support
On Tue, Jul 17, 2012 at 12:56:31PM +0200, Paolo Bonzini wrote:
> Il 17/07/2012 12:49, Michael S. Tsirkin ha scritto:
>>> Ok, that would make more sense. One difference between vhost-blk and
>>> vhost-net is that for vhost-blk there are also management actions
>>> that would trigger the switch, for example a live snapshot. So a
>>> prerequisite for vhost-blk would be that it is possible to disable it
>>> on the fly while the VM is running, as soon as all in-flight I/O is
>>> completed.
>>
>> It applies for vhost-net too. For example if you bring the link down,
>> we switch to userspace. So vhost-net supports this switch on the fly.
>
> Cool.
>
>>> (Note that, however, this is not possible for vhost-scsi, because it
>>> really exposes different hardware to the guest. It must not happen
>>> that a kernel upgrade or downgrade toggles between userspace SCSI and
>>> vhost-scsi, for example).
>>
>> I would say this is not a prerequisite for merging in qemu. It might
>> be a required feature for production but it is also solvable at the
>> management level.
>
> I'm thinking of the level interrupts here. You cannot make a change in
> the guest and have it make completely unrelated changes to the hardware
> that the guest sees.

Absolutely. So the right thing for vhost-scsi might be to just support
level (the equivalent of the "force" flag in vhost-net). We don't in
vhost-net because it is triggered by some old guests, but all virtio-scsi
guests use MSI, so level is just a spec compatibility issue. We might
also gain kernel support for level at some point.

>>>>> having to support the API; having to handle transition from one
>>>>> more thing when something better comes out.
>>>>
>>>> Well this is true for any code. If the limited feature set which
>>>> vhost-blk can accelerate is something many people use, then
>>>> accelerating by 5-15% might outweigh support costs.
>>>
>>> It is definitely what people use if they are interested in
>>> performance.
>>
>> In that case it seems to me we should stop using the feature set as an
>> argument and focus on whether the extra code is worth the 5-15% gain.
>> No one seems to have commented on that so everyone on the list thinks
>> that aspect is OK?
>
> I would like to see a breakdown of _where_ the 5-15% lies, something
> like http://www.linux-kvm.org/page/Virtio/Block/Latency.

Yes, but I think it's also nice to have. It's hard to argue IMO that
virtio as a kernel interface cuts out some overhead.

>> Kernel merge window is coming up and I would like to see whether any
>> of vhost-blk / vhost-scsi is going to be actually used by userspace.
>> I guess we could tag it for staging but it would be nice to avoid
>> that.
>
> Staging would be fine by me for both vhost-blk and vhost-scsi.
>
> Paolo

The reason I say staging is because there seems to be a deadlock where
userspace waits for the kernel to merge a driver and the kernel does not
want to commit to an ABI that will then go unused. So even if it gets
tagged as staging it would only make sense for it to stay there for one
cycle, and then either get removed if no userspace materializes or lose
the staging tag if it does.

--
MST
Re: [PATCH 0/5] Add vhost-blk support
Il 17/07/2012 12:49, Michael S. Tsirkin ha scritto:
>> Ok, that would make more sense. One difference between vhost-blk and
>> vhost-net is that for vhost-blk there are also management actions that
>> would trigger the switch, for example a live snapshot. So a
>> prerequisite for vhost-blk would be that it is possible to disable it
>> on the fly while the VM is running, as soon as all in-flight I/O is
>> completed.
>
> It applies for vhost-net too. For example if you bring the link down,
> we switch to userspace. So vhost-net supports this switch on the fly.

Cool.

>> (Note that, however, this is not possible for vhost-scsi, because it
>> really exposes different hardware to the guest. It must not happen
>> that a kernel upgrade or downgrade toggles between userspace SCSI and
>> vhost-scsi, for example).
>
> I would say this is not a prerequisite for merging in qemu. It might be
> a required feature for production but it is also solvable at the
> management level.

I'm thinking of the level interrupts here. You cannot make a change in
the guest and have it make completely unrelated changes to the hardware
that the guest sees.

>>>> having to support the API; having to handle transition from one more
>>>> thing when something better comes out.
>>>
>>> Well this is true for any code. If the limited feature set which
>>> vhost-blk can accelerate is something many people use, then
>>> accelerating by 5-15% might outweigh support costs.
>>
>> It is definitely what people use if they are interested in
>> performance.
>
> In that case it seems to me we should stop using the feature set as an
> argument and focus on whether the extra code is worth the 5-15% gain.
> No one seems to have commented on that so everyone on the list thinks
> that aspect is OK?

I would like to see a breakdown of _where_ the 5-15% lies, something
like http://www.linux-kvm.org/page/Virtio/Block/Latency.

> Kernel merge window is coming up and I would like to see whether any of
> vhost-blk / vhost-scsi is going to be actually used by userspace.
> I guess we could tag it for staging but it would be nice to avoid that.

Staging would be fine by me for both vhost-blk and vhost-scsi.

Paolo
Re: [PATCH 0/5] Add vhost-blk support
On Tue, Jul 17, 2012 at 12:14:33PM +0200, Paolo Bonzini wrote: > Il 17/07/2012 11:45, Michael S. Tsirkin ha scritto: > >> So it begs the question, is it going to be used in production, or just a > >> useful reference tool? > > > > Sticking to raw already makes virtio-blk faster, doesn't it? > > In that vhost-blk looks to me like just another optimization option. > > Ideally I think user just should not care where do we handle virtio: > > in-kernel or in userspace. One can imagine it being enabled/disabled > > automatically if none of the features unsupported by it are used. > > Ok, that would make more sense. One difference between vhost-blk and > vhost-net is that for vhost-blk there are also management actions that > would trigger the switch, for example a live snapshot. > So a prerequisite for vhost-blk would be that it is possible to disable > it on the fly while the VM is running, as soon as all in-flight I/O is > completed. It applies for vhost-net too. For example if you bring link down, we switch to userspace. So vhost-net supports this switch on the fly. > (Note that, however, this is not possible for vhost-scsi, > because it > really exposes different hardware to the guest. It must not happen that > a kernel upgrade or downgrade toggles between userspace SCSI and > vhost-scsi, for example). I would say this is not a prerequisite for merging in qemu. It might be a required feature for production but it is also solvable at the management level. Imagine an "enable-live-snapshots" flag in libvirt, on by default. Can only be changed while guest is down. If you turn it off, you get a bit more speed since vhost-blk/vhost-scsi gets enabled. Also pls note that a backend can support live snapshots. If it does libvirt thinkably could detect that and enable vhost-scsi even with enable-live-snapshots on. > >> having to > >> support the API; having to handle transition from one more thing when > >> something better comes out. > > > > Well this is true for any code. 
If the limited featureset which > > vhost-blk can accelerate is something many people use, then accelerating > > by 5-15% might outweigh support costs. > > It is definitely what people use if they are interested in performance. > > Paolo In that case it seems to me we should stop using the featureset as an argument and focus on whether the extra code is worth the 5-15% gain. No one seems to have commented on that, so does everyone on the list think that aspect is OK? Any explicit ACKs? The kernel merge window is coming up and I would like to see whether any of vhost-blk / vhost-scsi is going to be actually used by userspace. I guess we could tag it for staging, but it would be nice to avoid that. -- MST -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/5] Add vhost-blk support
Il 17/07/2012 11:45, Michael S. Tsirkin ha scritto: >> So it begs the question, is it going to be used in production, or just a >> useful reference tool? > > Sticking to raw already makes virtio-blk faster, doesn't it? > In that sense, vhost-blk looks to me like just another optimization option. > Ideally I think the user just should not care where we handle virtio: > in-kernel or in userspace. One can imagine it being enabled/disabled > automatically if none of the features unsupported by it are used. Ok, that would make more sense. One difference between vhost-blk and vhost-net is that for vhost-blk there are also management actions that would trigger the switch, for example a live snapshot. So a prerequisite for vhost-blk would be that it is possible to disable it on the fly while the VM is running, as soon as all in-flight I/O is completed. (Note that, however, this is not possible for vhost-scsi, because it really exposes different hardware to the guest. It must not happen that a kernel upgrade or downgrade toggles between userspace SCSI and vhost-scsi, for example). >> having to >> support the API; having to handle transition from one more thing when >> something better comes out. > Well this is true for any code. If the limited featureset which > vhost-blk can accelerate is something many people use, then accelerating > by 5-15% might outweigh support costs. It is definitely what people use if they are interested in performance. Paolo
Re: [PATCH 0/5] Add vhost-blk support
On Tue, Jul 17, 2012 at 11:32:45AM +0200, Paolo Bonzini wrote: > Il 17/07/2012 11:21, Asias He ha scritto: > >> It depends. Like vhost-scsi, vhost-blk has the problem of a crippled > >> feature set: no support for block device formats, non-raw protocols, > >> etc. This makes it different from vhost-net. > > > > Data-plane qemu also has this crippled feature set problem, no? > > Yes, but that is just a proof of concept. We can implement a separate > I/O thread within the QEMU block layer, and add fast paths that resemble > data-path QEMU, without limiting the feature set. > > > Do users always choose to use a block device format like qcow2? What > > if they prefer a raw image or a raw block device? > > If they do, the code should hit fast paths and be fast. But it should > be automatic, without the need for extra knobs. aio=thread vs. > aio=native is already one knob too much IMHO. Well, one extra knob at the qemu level is harmless IMO since the complexity can be handled by libvirt. For vhost-net, libvirt already enables vhost automatically depending on the backend used, and I imagine a similar thing can happen here. > >> So it begs the question, is it going to be used in production, or just a > >> useful reference tool? > > > > This should be decided by the user; I cannot speak for them. What is wrong > > with adding one option for the user, which they can decide on? > > Having to explain the relative benefits to the user; This can just be done automatically by libvirt. > having to > support the API; having to handle transition from one more thing when > something better comes out. > > Paolo Well this is true for any code. If the limited featureset which vhost-blk can accelerate is something many people use, then accelerating by 5-15% might outweigh support costs.
-- MST
Re: [PATCH 0/5] Add vhost-blk support
On Tue, Jul 17, 2012 at 10:52:10AM +0200, Paolo Bonzini wrote: > Il 17/07/2012 10:29, Asias He ha scritto: > > So, vhost-blk at least saves ~6 syscalls for us in each request. > > Are they really 6? If I/O is coalesced by a factor of 3, for example > (i.e. each exit processes 3 requests), it's really 2 syscalls per request. > > Also, is there anything we can improve? Perhaps we can modify epoll and > ask it to clear the eventfd for us (would save 2 reads)? Or > io_getevents (would save 1)? > > > I guess you mean qemu here. Yes, in theory, qemu's block layer can be > > improved to achieve similar performance as vhost-blk or kvm tool's > > userspace virtio-blk has. But I think it makes no sense to prevent one > > solution because there is another in-theory solution called: we can do > > something similar in qemu. > > It depends. Like vhost-scsi, vhost-blk has the problem of a crippled > feature set: no support for block device formats, non-raw protocols, > etc. This makes it different from vhost-net. Well, vhost-net is also more limited than virtio-net: no support for userspace networking, no support for level-triggered interrupts, no support for legacy qemu vlans, no way to trace the datapath in userspace, and only virtio is supported. None of these is fundamental, but this is how our implementation currently behaves, so from the user's point of view that's how it is. There are also fundamental limitations - e.g. it's Linux-only, a special module needs to be loaded, and the user needs to get an fd to the char device ... The way we addressed this is by making it seamless for the user: basically, if your setup matches what vhost-net can accelerate, it gets enabled; if not, userspace is used. Most of the logic is in libvirt. > So it begs the question, is it going to be used in production, or just a > useful reference tool? > > Paolo Sticking to raw already makes virtio-blk faster, doesn't it? In that sense, vhost-blk looks to me like just another optimization option.
Ideally I think the user just should not care where we handle virtio: in-kernel or in userspace. One can imagine it being enabled/disabled automatically if none of the features unsupported by it are used. For example, currently you specify vhost=on for the tap backend, and then if you try to set up something it does not support, like level-triggered interrupts, it gets disabled and userspace virtio is used. -- MST
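The vhost=on behaviour described above can be illustrated with a qemu command line (a hedged sketch: the image file, interface name, and memory size are placeholders, not values from this thread):

```shell
# Request in-kernel vhost-net for the tap backend. If the guest ends up
# using something vhost-net cannot handle (e.g. level-triggered interrupts),
# qemu quietly falls back to the userspace virtio implementation.
qemu-system-x86_64 -m 1024 \
    -drive file=guest.img,if=virtio \
    -netdev tap,id=hn0,ifname=tap0,vhost=on \
    -device virtio-net-pci,netdev=hn0
```

The point MST is making is that the user asks for acceleration with a single flag (or libvirt asks on their behalf) and never has to know whether the kernel or userspace path is actually in use.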
Re: [PATCH 0/5] Add vhost-blk support
Il 17/07/2012 11:21, Asias He ha scritto: >> It depends. Like vhost-scsi, vhost-blk has the problem of a crippled >> feature set: no support for block device formats, non-raw protocols, >> etc. This makes it different from vhost-net. > > Data-plane qemu also has this crippled feature set problem, no? Yes, but that is just a proof of concept. We can implement a separate I/O thread within the QEMU block layer, and add fast paths that resemble data-path QEMU, without limiting the feature set. > Do users always choose to use a block device format like qcow2? What > if they prefer a raw image or a raw block device? If they do, the code should hit fast paths and be fast. But it should be automatic, without the need for extra knobs. aio=thread vs. aio=native is already one knob too much IMHO. >> So it begs the question, is it going to be used in production, or just a >> useful reference tool? > > This should be decided by the user; I cannot speak for them. What is wrong > with adding one option for the user, which they can decide on? Having to explain the relative benefits to the user; having to support the API; having to handle transition from one more thing when something better comes out. Paolo
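For context, the knob Paolo calls "one too many" looks like this on the qemu command line (a sketch; the image path is a placeholder, and native AIO generally requires cache=none so that O_DIRECT can be used):

```shell
# Thread-pool AIO (the default):
qemu-system-x86_64 -drive file=guest.img,if=virtio,aio=threads

# Linux native (kernel) AIO, paired with cache=none for O_DIRECT:
qemu-system-x86_64 -drive file=guest.img,if=virtio,cache=none,aio=native
```

His argument is that vhost-blk would add yet another such switch, when ideally the fast path would be chosen automatically.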
Re: [PATCH 0/5] Add vhost-blk support
On 07/17/2012 04:52 PM, Paolo Bonzini wrote: Il 17/07/2012 10:29, Asias He ha scritto: So, vhost-blk at least saves ~6 syscalls for us in each request. Are they really 6? If I/O is coalesced by a factor of 3, for example (i.e. each exit processes 3 requests), it's really 2 syscalls per request. Well, I am counting the number of syscalls in one notify-and-response cycle. Sure, the IO can be coalesced. Also, is there anything we can improve? Perhaps we can modify epoll and ask it to clear the eventfd for us (would save 2 reads)? Or io_getevents (would save 1)? I guess you mean qemu here. Yes, in theory, qemu's block layer can be improved to achieve similar performance as vhost-blk or kvm tool's userspace virtio-blk has. But I think it makes no sense to prevent one solution because there is another in-theory solution called: we can do something similar in qemu. It depends. Like vhost-scsi, vhost-blk has the problem of a crippled feature set: no support for block device formats, non-raw protocols, etc. This makes it different from vhost-net. Data-plane qemu also has this crippled feature set problem, no? Do users always choose to use a block device format like qcow2? What if they prefer a raw image or a raw block device? So it begs the question, is it going to be used in production, or just a useful reference tool? This should be decided by the user; I cannot speak for them. What is wrong with adding one option for the user, which they can decide on? -- Asias
Re: [PATCH 0/5] Add vhost-blk support
Il 17/07/2012 10:29, Asias He ha scritto: > So, vhost-blk at least saves ~6 syscalls for us in each request. Are they really 6? If I/O is coalesced by a factor of 3, for example (i.e. each exit processes 3 requests), it's really 2 syscalls per request. Also, is there anything we can improve? Perhaps we can modify epoll and ask it to clear the eventfd for us (would save 2 reads)? Or io_getevents (would save 1)? > I guess you mean qemu here. Yes, in theory, qemu's block layer can be > improved to achieve similar performance as vhost-blk or kvm tool's > userspace virtio-blk has. But I think it makes no sense to prevent one > solution because there is another in-theory solution called: we can do > something similar in qemu. It depends. Like vhost-scsi, vhost-blk has the problem of a crippled feature set: no support for block device formats, non-raw protocols, etc. This makes it different from vhost-net. So it begs the question, is it going to be used in production, or just a useful reference tool? Paolo
Re: [PATCH 0/5] Add vhost-blk support
On 07/16/2012 07:58 PM, Stefan Hajnoczi wrote: On Thu, Jul 12, 2012 at 4:35 PM, Asias He wrote: This patchset adds vhost-blk support. vhost-blk is an in-kernel virtio-blk device accelerator. Compared to the userspace virtio-blk implementation, vhost-blk gives about 5% to 15% performance improvement. Why is it 5-15% faster? vhost-blk and the userspace virtio-blk you benchmarked should be doing basically the same thing: 1. An eventfd file descriptor is signalled when the vring has new requests available from the guest. 2. A thread wakes up and processes the virtqueue. 3. Linux AIO is used to issue host I/O. 4. An interrupt is injected into the guest. Yes, this is how both of them work, though there are some differences in the details. E.g. in vhost-blk we use vhost's work infrastructure to handle the requests, while in kvm tool we use a dedicated thread; in vhost-blk we use irqfd to inject interrupts, while in kvm tool we use an ioctl. Does the vhost-blk implementation do anything fundamentally different from userspace? Where is the overhead that userspace virtio-blk has? Currently, no. But we could play with the bio directly in vhost-blk, as Christoph suggested, which could make the IO path from the guest to the host's real storage even shorter in vhost-blk. I've been trying my best to reduce the overhead of virtio-blk on the kvm tool side. I do not see any significant overhead out there. Compared to vhost-blk, the overhead we have in userspace virtio-blk is the syscalls. In each IO request, we have:

epoll_wait() & read(): wait for the eventfd with which the guest notifies us
io_submit(): submit the aio
read(): read the aio completion eventfd
io_getevents(): reap the aio completion results
ioctl(): trigger the interrupt

So, vhost-blk saves at least ~6 syscalls for us in each request. I'm asking because it would be beneficial to fix the overhead (especially if that could speed up all userspace applications) instead of adding a special-purpose kernel module to work around the overhead.
I guess you mean qemu here. Yes, in theory, qemu's block layer can be improved to achieve similar performance as vhost-blk or kvm tool's userspace virtio-blk has. But I think it makes no sense to prevent one solution because there is another in-theory solution called: we can do something similar in qemu. What do you mean by special-purpose here? Do we need a general-purpose kernel module? Is vhost-net a special-purpose kernel module? Is xen-blkback a special-purpose kernel module? And I think vhost-blk is beneficial to qemu too, as well as to any other kvm host-side implementation. -- Asias
Re: [PATCH 0/5] Add vhost-blk support
On Thu, Jul 12, 2012 at 4:35 PM, Asias He wrote: > This patchset adds vhost-blk support. vhost-blk is an in-kernel virtio-blk > device accelerator. Compared to the userspace virtio-blk implementation, vhost-blk > gives about 5% to 15% performance improvement. Why is it 5-15% faster? vhost-blk and the userspace virtio-blk you benchmarked should be doing basically the same thing: 1. An eventfd file descriptor is signalled when the vring has new requests available from the guest. 2. A thread wakes up and processes the virtqueue. 3. Linux AIO is used to issue host I/O. 4. An interrupt is injected into the guest. Does the vhost-blk implementation do anything fundamentally different from userspace? Where is the overhead that userspace virtio-blk has? I'm asking because it would be beneficial to fix the overhead (especially if that could speed up all userspace applications) instead of adding a special-purpose kernel module to work around the overhead. Stefan
Re: [PATCH 0/5] Add vhost-blk support
Hello Jeff, On 07/13/2012 12:06 AM, Jeff Moyer wrote: Asias He writes: Hi folks, This patchset adds vhost-blk support. vhost-blk is an in-kernel virtio-blk device accelerator. Compared to the userspace virtio-blk implementation, vhost-blk gives about 5% to 15% performance improvement. Asias He (5): aio: Export symbols and struct kiocb_batch for in kernel aio usage eventfd: Export symbol eventfd_file_create() vhost: Make vhost a separate module vhost-net: Use VHOST_NET_FEATURES for vhost-net vhost-blk: Add vhost-blk support I only saw patches 0 and 1. Where are the other 4? If the answer is, "not on lkml," then please resend them, CC'ing lkml. I did send all of the patches 0-5 to lkml, but I somehow messed up the thread. Will CC you next time. I'd like to be able to see the usage of the aio routines. OK. It'd be nice if you could review. Thanks. -- Asias
Re: [PATCH 0/5] Add vhost-blk support
Asias He writes: > Hi folks, > > This patchset adds vhost-blk support. vhost-blk is an in-kernel virtio-blk > device accelerator. Compared to the userspace virtio-blk implementation, vhost-blk > gives about 5% to 15% performance improvement. > > Asias He (5): > aio: Export symbols and struct kiocb_batch for in kernel aio usage > eventfd: Export symbol eventfd_file_create() > vhost: Make vhost a separate module > vhost-net: Use VHOST_NET_FEATURES for vhost-net > vhost-blk: Add vhost-blk support I only saw patches 0 and 1. Where are the other 4? If the answer is, "not on lkml," then please resend them, CC'ing lkml. I'd like to be able to see the usage of the aio routines. Cheers, Jeff
[PATCH 0/5] Add vhost-blk support
Hi folks, This patchset adds vhost-blk support. vhost-blk is an in-kernel virtio-blk device accelerator. Compared to the userspace virtio-blk implementation, vhost-blk gives about 5% to 15% performance improvement.

Asias He (5):
  aio: Export symbols and struct kiocb_batch for in kernel aio usage
  eventfd: Export symbol eventfd_file_create()
  vhost: Make vhost a separate module
  vhost-net: Use VHOST_NET_FEATURES for vhost-net
  vhost-blk: Add vhost-blk support

 drivers/vhost/Kconfig  |  20 +-
 drivers/vhost/Makefile |   6 +-
 drivers/vhost/blk.c    | 600
 drivers/vhost/net.c    |   4 +-
 drivers/vhost/test.c   |   4 +-
 drivers/vhost/vhost.c  |  48
 drivers/vhost/vhost.h  |  18 +-
 fs/aio.c               |  37 ++-
 fs/eventfd.c           |   1 +
 include/linux/aio.h    |  21 ++
 include/linux/vhost.h  |   1 +
 11 files changed, 729 insertions(+), 31 deletions(-)
 create mode 100644 drivers/vhost/blk.c

-- 1.7.10.4