Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On Wed, Aug 19, 2009 at 01:36:14AM -0400, Gregory Haskins wrote: So where is the problem here? If virtio net in guest could be improved instead, everyone would benefit. So if I whip up a virtio-net backend for vbus with a PCI compliant connector, you are happy? I'm currently worried about the venet versus virtio-net guest situation; if you drop it and switch to virtio net instead, that issue's resolved. I don't have an opinion on vbus versus pci, and I only speak for myself. -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On Wed, Aug 19, 2009 at 11:37:16PM +0300, Avi Kivity wrote: On 08/19/2009 09:26 PM, Gregory Haskins wrote: This is for things like the setup of queue-pairs, and the transport of door-bells, and ib-verbs. I am not on the team doing that work, so I am not an expert in this area. What I do know is having a flexible and low-latency signal-path was deemed a key requirement. That's not a full bypass, then. AFAIK kernel bypass has userspace talking directly to the device. Like I said, I am not an expert on the details here. I only work on the vbus plumbing. FWIW, the work is derivative from the Xen-IB project http://www.openib.org/archives/nov2006sc/xen-ib-presentation.pdf There were issues with getting Xen-IB to map well into the Xen model. Vbus was specifically designed to address some of those shortcomings. Well, I'm not an InfiniBand expert, but from what I understand VMM bypass means avoiding the call to the VMM entirely by exposing hardware registers directly to the guest. The original IB VMM bypass work predates SR-IOV (i.e., it does not assume that the adapter has multiple hardware register windows for multiple devices). The way it worked was to split all device operations into 'privileged' and 'non-privileged'. Privileged operations such as mapping and pinning memory went through the hypervisor. Non-privileged operations such as reading or writing previously mapped memory went directly to the adapter. Nowadays with SR-IOV devices, VMM bypass usually means bypassing the hypervisor completely. Cheers, Muli -- Muli Ben-Yehuda | m...@il.ibm.com | +972-4-8281080 Manager, Virtualization and Systems Architecture Master Inventor, IBM Haifa Research Laboratory
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
Avi Kivity wrote: On 08/18/2009 05:46 PM, Gregory Haskins wrote: Can you explain how vbus achieves RDMA? I also don't see the connection to real time guests. Both of these are still in development. Trying to stay true to the release early and often mantra, the core vbus technology is being pushed now so it can be reviewed. Stay tuned for these other developments. Hopefully you can outline how it works. AFAICT, RDMA and kernel bypass will need device assignment. If you're bypassing the call into the host kernel, it doesn't really matter how that call is made, does it? This is for things like the setup of queue-pairs, and the transport of door-bells, and ib-verbs. I am not on the team doing that work, so I am not an expert in this area. What I do know is having a flexible and low-latency signal-path was deemed a key requirement. For real-time, a big part of it is relaying the guest scheduler state to the host, but in a smart way. For instance, the cpu priority for each vcpu is in a shared table. When the priority is raised, we can simply update the table without taking a VMEXIT. When it is lowered, we need to inform the host of the change in case the underlying task needs to reschedule. This is where the really fast call() type mechanism is important. It's also about having the priority flow end-to-end, and having the vcpu interrupt state affect the task priority (e.g. pending interrupts affect the vcpu task prio), etc. I can go on and on (as you know ;), but will wait till this work is more concrete and proven. I also designed it in such a way that we could, in theory, write one set of (Linux-based) backends and have them work across a variety of environments (such as containers/VMs like KVM, lguest, and OpenVZ, but also physical systems like blade enclosures and clusters, or even applications running on the host). Sorry, I'm still confused. Why would OpenVZ need vbus? It's just an example.
The point is that I abstracted what I think are the key points of fast-io, memory routing, signal routing, etc., so that it will work in a variety of (ideally, _any_) environments. There may not be _performance_ motivations for certain classes of VMs because they already have decent support, but they may want a connector anyway to gain some of the new features available in vbus. And looking forward, the idea is that we have commoditized the backend so we don't need to redo this each time a new container comes along. I'll wait until a concrete example shows up as I still don't understand. Ok. One point of contention is that this is all managementy stuff and should be kept out of the host kernel. Exposing shared memory, interrupts, and guest hypercalls can all be easily done from userspace (as virtio demonstrates). True, some devices need kernel acceleration, but that's no reason to put everything into the host kernel. See my last reply to Anthony. My two points here are that: a) having it in-kernel makes it a complete subsystem, which perhaps has diminished value in kvm, but adds value in most other places that we are looking to use vbus. It's not a complete system unless you want users to administer VMs using echo and cat and configfs. Some userspace support will always be necessary. Well, more specifically, it doesn't require a userspace app to hang around. For instance, you can set up your devices with udev scripts, or whatever. But that is kind of a silly argument, since the kernel always needs userspace around to give it something interesting, right? ;) Basically, what it comes down to is both vbus and vhost need configuration/management. Vbus does it with sysfs/configfs, and vhost does it with ioctls. I ultimately decided to go with sysfs/configfs because, at least at the time I looked, it seemed like the blessed way to do user-kernel interfaces. b) the in-kernel code is being overstated as complex.
We are not talking about your typical virt thing, like an emulated ICH/PCI chipset. It's really a simple list of devices with a handful of attributes. They are managed using established Linux interfaces, like sysfs/configfs. They need to be connected to the real world somehow. What about security? Can any user create a container and devices and link them to real interfaces? If not, do you need to run the VM as root? Today it has to be root as a result of weak mode support in configfs, so you have me there. I am looking for help patching this limitation, though. Also, venet-tap uses a bridge, which of course is not as slick as a raw socket w.r.t. perms. virtio and vhost-net solve these issues. Does vbus? The code may be simple to you. But the question is whether it's necessary, not whether it's simple or complex. Exposing devices as PCI is an important issue for me, as I have to consider non-Linux guests. That's your prerogative, but
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On 8/19/2009 at 1:48 AM, in message 4a8b9241.20...@redhat.com, Avi Kivity a...@redhat.com wrote: On 08/19/2009 08:36 AM, Gregory Haskins wrote: If virtio net in guest could be improved instead, everyone would benefit. So if I whip up a virtio-net backend for vbus with a PCI compliant connector, you are happy? This doesn't improve virtio-net in any way. And why not? (Did you notice I said PCI compliant, i.e. over virtio-pci) I am doing this, and I wish more people would join. Instead, you change the ABI in an incompatible way. Only by choice of my particular connector. The ABI is a function of the connector design. So one such model is to terminate the connector in qemu, and surface the resulting objects as PCI devices. I choose not to use this particular design for the connector that I am pushing upstream because I am of the opinion that I can do better by terminating it in the guest directly as a PV-optimized bus. However, both connectors can theoretically coexist peacefully. virtio already supports this model; see lguest and s390. Transporting virtio over vbus and vbus over something else doesn't gain anything over directly transporting virtio over that something else. This is not what I am advocating. Kind Regards, -Greg
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On 08/19/2009 09:28 AM, Gregory Haskins wrote: Avi Kivity wrote: On 08/18/2009 05:46 PM, Gregory Haskins wrote: Can you explain how vbus achieves RDMA? I also don't see the connection to real time guests. Both of these are still in development. Trying to stay true to the release early and often mantra, the core vbus technology is being pushed now so it can be reviewed. Stay tuned for these other developments. Hopefully you can outline how it works. AFAICT, RDMA and kernel bypass will need device assignment. If you're bypassing the call into the host kernel, it doesn't really matter how that call is made, does it? This is for things like the setup of queue-pairs, and the transport of door-bells, and ib-verbs. I am not on the team doing that work, so I am not an expert in this area. What I do know is having a flexible and low-latency signal-path was deemed a key requirement. That's not a full bypass, then. AFAIK kernel bypass has userspace talking directly to the device. Given that both virtio and vbus can use ioeventfds, I don't see how one can perform better than the other. For real-time, a big part of it is relaying the guest scheduler state to the host, but in a smart way. For instance, the cpu priority for each vcpu is in a shared table. When the priority is raised, we can simply update the table without taking a VMEXIT. When it is lowered, we need to inform the host of the change in case the underlying task needs to reschedule. This is best done using cr8/tpr so you don't have to exit at all. See also my vtpr support for Windows, which does this in software, generally avoiding the exit even when lowering priority. This is where the really fast call() type mechanism is important. It's also about having the priority flow end-to-end, and having the vcpu interrupt state affect the task priority (e.g. pending interrupts affect the vcpu task prio), etc. I can go on and on (as you know ;), but will wait till this work is more concrete and proven.
Generally cpu state shouldn't flow through a device but rather through MSRs, hypercalls, and cpu registers. Basically, what it comes down to is both vbus and vhost need configuration/management. Vbus does it with sysfs/configfs, and vhost does it with ioctls. I ultimately decided to go with sysfs/configfs because, at least at the time I looked, it seemed like the blessed way to do user-kernel interfaces. I really dislike that trend but that's an unrelated discussion. They need to be connected to the real world somehow. What about security? Can any user create a container and devices and link them to real interfaces? If not, do you need to run the VM as root? Today it has to be root as a result of weak mode support in configfs, so you have me there. I am looking for help patching this limitation, though. Well, do you plan to address this before submission for inclusion? I hope everyone agrees that it's an important issue for me and that I have to consider non-Linux guests. I also hope that you're considering non-Linux guests since they have considerable market share. I didn't mean non-Linux guests are not important. I was disagreeing with your assertion that it only works if it's PCI. There are numerous examples of IHV/ISV bridge implementations deployed in Windows, no? I don't know. If vbus is exposed as a PCI bridge, how is this different? Technically it would work, but given you're not interested in Windows, who would write a driver? Given I'm not the gateway to inclusion of vbus/venet, you don't need to ask me anything. I'm still free to give my opinion. Agreed, and I didn't mean to suggest otherwise. It's not clear if you are wearing the kvm-maintainer hat or the lkml-community-member hat at times, so it's important to make that distinction. Otherwise, it's not clear whether this is an edict from my superior or input from my peer. ;) When I wear a hat, it is a Red Hat. However, I am bareheaded most often.
(that is, look at the contents of my message, not who wrote it or his role). With virtio, the number is 1 (or less if you amortize). Set up the ring entries and kick. Again, I am just talking about basic PCI here, not the things we build on top. Whatever that means, it isn't interesting. Performance is measured for the whole stack. The point is: the things we build on top have costs associated with them, and I aim to minimize it. For instance, to do a call() kind of interface, you generally need to pre-setup some per-cpu mappings so that you can just do a single iowrite32() to kick the call off. Those per-cpu mappings have a cost if you want them to be high-performance, so my argument is that you ideally want to limit the number of times you have to do this. My current design reduces this to once. Do you mean minimizing the setup cost? Seriously? There's no such
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On 08/19/2009 09:40 AM, Gregory Haskins wrote: So if I whip up a virtio-net backend for vbus with a PCI compliant connector, you are happy? This doesn't improve virtio-net in any way. And why not? (Did you notice I said PCI compliant, i.e. over virtio-pci) Because virtio-net will have gained nothing that it didn't have before. virtio already supports this model; see lguest and s390. Transporting virtio over vbus and vbus over something else doesn't gain anything over directly transporting virtio over that something else. This is not what I am advocating. What are you advocating? As far as I can tell your virtio-vbus connector plus the vbus-kvm connector is just that. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On 8/19/2009 at 3:13 AM, in message 4a8ba635.9010...@redhat.com, Avi Kivity a...@redhat.com wrote: On 08/19/2009 09:40 AM, Gregory Haskins wrote: So if I whip up a virtio-net backend for vbus with a PCI compliant connector, you are happy? This doesn't improve virtio-net in any way. And why not? (Did you notice I said PCI compliant, i.e. over virtio-pci) Because virtio-net will have gained nothing that it didn't have before. ?? *) ABI is virtio-pci compatible, as you like *) fast-path is in-kernel, as we all like *) model is in vbus so it would work in all environments that vbus supports. virtio already supports this model; see lguest and s390. Transporting virtio over vbus and vbus over something else doesn't gain anything over directly transporting virtio over that something else. This is not what I am advocating. What are you advocating? As far as I can tell your virtio-vbus connector plus the vbus-kvm connector is just that. I wouldn't classify it anything like that, no. It's just virtio over vbus. -Greg
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On 08/19/2009 02:40 PM, Gregory Haskins wrote: So if I whip up a virtio-net backend for vbus with a PCI compliant connector, you are happy? This doesn't improve virtio-net in any way. And why not? (Did you notice I said PCI compliant, i.e. over virtio-pci) Because virtio-net will have gained nothing that it didn't have before. ?? *) ABI is virtio-pci compatible, as you like That's not a gain, that's staying in the same place. *) fast-path is in-kernel, as we all like That's not a gain as we have vhost-net (sure, in development, but your proposed backend isn't even there yet). *) model is in vbus so it would work in all environments that vbus supports. The ABI can be virtio-pci compatible or it can be vbus-compatible. How can it be both? The ABIs are different. Note that if you had submitted a virtio-net backend I'd have asked you to strip away all the management / bus layers and we'd have ended up with vhost-net. virtio already supports this model; see lguest and s390. Transporting virtio over vbus and vbus over something else doesn't gain anything over directly transporting virtio over that something else. This is not what I am advocating. What are you advocating? As far as I can tell your virtio-vbus connector plus the vbus-kvm connector is just that. I wouldn't classify it anything like that, no. It's just virtio over vbus. We're in a loop. Doesn't virtio over vbus need a virtio-vbus connector? And doesn't vbus need a connector to talk to the hypervisor? -- error compiling committee.c: too many arguments to function
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
Avi Kivity wrote: On 08/19/2009 02:40 PM, Gregory Haskins wrote: So if I whip up a virtio-net backend for vbus with a PCI compliant connector, you are happy? This doesn't improve virtio-net in any way. And why not? (Did you notice I said PCI compliant, i.e. over virtio-pci) Because virtio-net will have gained nothing that it didn't have before. ?? *) ABI is virtio-pci compatible, as you like That's not a gain, that's staying in the same place. *) fast-path is in-kernel, as we all like That's not a gain as we have vhost-net (sure, in development, but your proposed backend isn't even there yet). *) model is in vbus so it would work in all environments that vbus supports. The ABI can be virtio-pci compatible or it can be vbus-compatible. How can it be both? The ABIs are different. Note that if you had submitted a virtio-net backend I'd have asked you to strip away all the management / bus layers and we'd have ended up with vhost-net. Sigh... virtio already supports this model; see lguest and s390. Transporting virtio over vbus and vbus over something else doesn't gain anything over directly transporting virtio over that something else. This is not what I am advocating. What are you advocating? As far as I can tell your virtio-vbus connector plus the vbus-kvm connector is just that. I wouldn't classify it anything like that, no. It's just virtio over vbus. We're in a loop. Doesn't virtio over vbus need a virtio-vbus connector? And doesn't vbus need a connector to talk to the hypervisor? No, it doesn't work like that. There is only one connector. Kind Regards, -Greg
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
Avi Kivity wrote: On 08/19/2009 07:27 AM, Gregory Haskins wrote: This thread started because I asked you about your technical arguments why we'd want vbus instead of virtio. (You mean vbus vs pci, right? virtio works fine, is untouched, and is out of scope here.) I guess he meant venet vs virtio-net. Without venet, vbus is currently userless. Right, and I do believe I answered your questions. Do you feel as though this was not a satisfactory response? Others and I have shown you it's wrong. No, you have shown me that you disagree. I'm sorry, but do not assume they are the same. Case in point: you also said that threading the ethernet model was wrong when I proposed it, and later conceded when I showed you the numbers that you were wrong. I don't say this to be a jerk. I am wrong myself all the time too. I only say it to highlight that perhaps we just don't (yet) see each other's POV. Therefore, do not be so quick to put a wrong label on something, especially when the line of questioning/debate indicates to me that there are still fundamental issues in understanding exactly how things work. There's no inherent performance problem in pci. The vbus approach has inherent problems (the biggest of which is compatibility, the second manageability). Trying to be backwards compatible in all dimensions is not a design goal, as already stated. Where are the management problems? Your answer above now basically boils down to: because I want it so, why don't you leave me alone. Well, with all due respect, please do not put words in my mouth. This is not what I am saying at all. What I *am* saying is: fact: this thread is about Linux guest drivers to support vbus. fact: these drivers do not touch kvm code. fact: these drivers do not force kvm to alter its operation in any way. fact: these drivers do not alter ABIs that KVM currently supports. Therefore, all this talk about abandoning, supporting, and changing things in KVM is premature, irrelevant, and/or FUD.
No one proposed such changes, so I am highlighting this fact to bring the thread back on topic. That KVM talk is merely a distraction at this point in time. s/kvm/kvm stack/. virtio/pci is part of the kvm stack, even if it is not part of kvm itself. If vbus/venet were to be merged, users and developers would have to choose one or the other. That's the fragmentation I'm worried about. And you can prefix that with fact: as well. Noted. We all love faster code and better management interfaces, and tons of your prior patches got accepted by Avi. This time you didn't even _try_ to improve virtio. I'm sorry, but you are mistaken: http://lkml.indiana.edu/hypermail/linux/kernel/0904.2/02443.html That does nothing to improve virtio. I'm sorry, but that's just plain false. Existing guests (Linux and Windows) which support virtio will cease to work if the host moves to vbus-virtio. Sigh... please re-read the fact section. And even if this work is accepted upstream as it is, how you configure the host and guest is just that: a configuration. If your guest and host both speak vbus, use it. If they don't, don't use it. Simple as that. Saying anything else is just more FUD, and I can say the same thing about a variety of other configuration options currently available. Existing hosts (running virtio-pci) won't be able to talk to newer guests running virtio-vbus. The patch doesn't improve performance without the entire vbus stack in the host kernel and a vbus-virtio-net-host host kernel driver. <rewind years=2>Existing hosts (running realtek emulation) won't be able to talk to newer guests running virtio-net. Virtio-net doesn't do anything to improve realtek emulation without the entire virtio stack in the host.</rewind> You gotta start somewhere. Your argument buys you nothing other than backwards compat, which I've already stated is not a specific goal here. I am not against modprobe vbus-pcibridge, and I am sure there are users out there that do not object to this either.
Perhaps if you posted everything needed to make vbus-virtio work and perform, we could compare that to vhost-net and you'll see another reason why vhost-net is the better approach. Yet you must recognize that an alternative outcome is that we can look at issues outside of virtio-net on KVM and perhaps you will see vbus is a better approach. You are also wrong to say that I didn't try to avoid creating a downstream effort first. I believe the public record of the mailing lists will back me up that I tried politely pushing this directly through kvm first. It was only after Avi recently informed me that they would be building their own version of an in-kernel backend in lieu of working with me to adapt vbus to their needs that I decided to put my own project together. There's no way we can adapt vbus to our needs. Really? Did you ever bother to ask
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
Hi Nicholas, Nicholas A. Bellinger wrote: On Wed, 2009-08-19 at 10:11 +0300, Avi Kivity wrote: On 08/19/2009 09:28 AM, Gregory Haskins wrote: Avi Kivity wrote: SNIP Basically, what it comes down to is both vbus and vhost need configuration/management. Vbus does it with sysfs/configfs, and vhost does it with ioctls. I ultimately decided to go with sysfs/configfs because, at least at the time I looked, it seemed like the blessed way to do user-kernel interfaces. I really dislike that trend but that's an unrelated discussion. They need to be connected to the real world somehow. What about security? Can any user create a container and devices and link them to real interfaces? If not, do you need to run the VM as root? Today it has to be root as a result of weak mode support in configfs, so you have me there. I am looking for help patching this limitation, though. Well, do you plan to address this before submission for inclusion? Greetings Avi and Co, I have been following this thread, and although I cannot say that I am intimately familiar with all of the virtualization considerations involved to really add anything useful to that side of the discussion, I think you guys are doing a good job of explaining the technical issues for the non-virtualization wizards following this thread. :-) Anyways, I was wondering if you might be interested in sharing your concerns w.r.t. configfs (configfs maintainer CC'ed) at some point..?
So for those tuning in, the reference here is the use of configfs for the management of this component of AlacrityVM, called virtual-bus: http://developer.novell.com/wiki/index.php/Virtual-bus As you may recall, I have been using configfs extensively for the 3.x generic target core infrastructure and iSCSI fabric modules living in lio-core-2.6.git/drivers/target/target_core_configfs.c and lio-core-2.6.git/drivers/lio-core/iscsi_target_config.c, and have found it to be extraordinarily useful for the purposes of implementing a complex kernel-level target mode stack that is expected to manage massive amounts of metadata, allow for real-time configuration, share data structures (e.g. SCSI Target Ports) between other kernel fabric modules, and manage the entire set of fabrics using only interpreted userspace code. I concur. Configfs provided me a very natural model to express resource-containers and their respective virtual-device objects. Using the 1 1:1 mapped TCM Virtual HBA+FILEIO LUNs - iSCSI Target Endpoints inside of a KVM Guest (from the results in May posted with IOMMU-aware 10 Gb on modern Nehalem hardware, see http://linux-iscsi.org/index.php/KVM-LIO-Target), we have been able to dump the entire running target fabric configfs hierarchy to a single struct file on a KVM Guest root device using python code on the order of ~30 seconds for those 1 active iSCSI endpoints. In configfs terms, this means: *) 7 configfs groups (directories), ~50 configfs attributes (files) per Virtual HBA+FILEIO LUN *) 15 configfs groups (directories), ~60 configfs attributes (files) per iSCSI fabric Endpoint Which comes out to a total of ~22 groups and ~110 attributes of active configfs objects living in the configfs_dir_cache that are being dumped inside of the single KVM guest instance, including symlinks between the fabric modules to establish the SCSI ports containing the complete set of SPC-4 and RFC-3720 features, et al.
Also on the kernel - user API interaction compatibility side, I have found the 3.x configfs-enabled code advantageous over the LIO 2.9 code (which used an ioctl for everything) because it allows us to do backwards compat for future versions without using any userspace C code, which IMHO makes maintaining userspace packages for complex kernel stacks with massive amounts of metadata and real-time configuration considerations much easier. No longer having ioctl compatibility issues between LIO versions as the structures passed via ioctl change, and being able to do backwards compat with small amounts of interpreted code against configfs layout changes, has really made maintaining the kernel - user API that much easier for me. Anyways, I thought these might be useful to the discussion as it relates to potential uses of configfs on the KVM Host or other projects where it really makes sense, and/or to improve the upstream implementation so that other users (like myself) can benefit from improvements to configfs. Many thanks for your most valuable of time, Thank you for the explanation of your setup. Configfs mostly works for the vbus project as is. As Avi pointed out, I currently have a limitation w.r.t. perms. Forgive me if what I am about to say is overly simplistic. It's been quite a few months since I worked on the configfs portion of the code, so my details may be fuzzy. What it boiled down to is I need a way to better manage perms (and to be able to do it cross
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On Wed, 2009-08-19 at 14:39 -0400, Gregory Haskins wrote: Hi Nicholas Nicholas A. Bellinger wrote: On Wed, 2009-08-19 at 10:11 +0300, Avi Kivity wrote: On 08/19/2009 09:28 AM, Gregory Haskins wrote: Avi Kivity wrote: SNIP Basically, what it comes down to is both vbus and vhost need configuration/management. Vbus does it with sysfs/configfs, and vhost does it with ioctls. I ultimately decided to go with sysfs/configfs because, at least that the time I looked, it seemed like the blessed way to do user-kernel interfaces. I really dislike that trend but that's an unrelated discussion. They need to be connected to the real world somehow. What about security? can any user create a container and devices and link them to real interfaces? If not, do you need to run the VM as root? Today it has to be root as a result of weak mode support in configfs, so you have me there. I am looking for help patching this limitation, though. Well, do you plan to address this before submission for inclusion? Greetings Avi and Co, I have been following this thread, and although I cannot say that I am intimately fimilar with all of the virtualization considerations involved to really add anything use to that side of the discussion, I think you guys are doing a good job of explaining the technical issues for the non virtualization wizards following this thread. :-) Anyways, I was wondering if you might be interesting in sharing your concerns wrt to configfs (conigfs maintainer CC'ed), at some point..? 
So for those tuning in, the reference here is the use of configfs for the management of this component of AlacrityVM, called virtual-bus http://developer.novell.com/wiki/index.php/Virtual-bus As you may recall, I have been using configfs extensively for the 3.x generic target core infrastructure and iSCSI fabric modules living in lio-core-2.6.git/drivers/target/target_core_configfs.c and lio-core-2.6.git/drivers/lio-core/iscsi_target_config.c, and have found it to be extraordinarly useful for the purposes of a implementing a complex kernel level target mode stack that is expected to manage massive amounts of metadata, allow for real-time configuration, share data structures (eg: SCSI Target Ports) between other kernel fabric modules and manage the entire set of fabrics using only intrepetered userspace code. I concur. Configfs provided me a very natural model to express resource-containers and their respective virtual-device objects. Using the 1 1:1 mapped TCM Virtual HBA+FILEIO LUNs - iSCSI Target Endpoints inside of a KVM Guest (from the results in May posted with IOMMU aware 10 Gb on modern Nahelem hardware, see http://linux-iscsi.org/index.php/KVM-LIO-Target), we have been able to dump the entire running target fabric configfs hierarchy to a single struct file on a KVM Guest root device using python code on the order of ~30 seconds for those 1 active iSCSI endpoints. In configfs terms, this means: *) 7 configfs groups (directories), ~50 configfs attributes (files) per Virtual HBA+FILEIO LUN *) 15 configfs groups (directories), ~60 configfs attributes (files per iSCSI fabric Endpoint Which comes out to a total of ~22 groups and ~110 attributes active configfs objects living in the configfs_dir_cache that are being dumped inside of the single KVM guest instances, including symlinks between the fabric modules to establish the SCSI ports containing complete set of SPC-4 and RFC-3720 features, et al. 
Also on the kernel - user API interaction compatibility side, I have found the 3.x configfs-enabled code advantageous over the LIO 2.9 code (which used an ioctl for everything) because it allows us to do backwards compat for future versions without using any userspace C code, which IMHO makes maintaining userspace packages for complex kernel stacks with massive amounts of metadata + real-time configuration considerations much easier. No longer having ioctl compatibility issues between LIO versions as the structures passed via ioctl change, and being able to do backwards compat with small amounts of interpreted code against configfs layout changes, really has made maintaining the kernel - user API that much easier for me. Anyways, I thought these points might be useful to the discussion as it relates to potential uses of configfs on the KVM Host or other projects where it really makes sense, and/or to improve the upstream implementation so that other users (like myself) can benefit from improvements to configfs. Many thanks for your most valuable of time, Thank you for the explanation of your setup. Configfs mostly works for the vbus project as is. As Avi pointed out, I currently have a limitation w.r.t. perms. Forgive me if what I am about to say is overly simplistic. It's been quite a few months since I
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
* Avi Kivity a...@redhat.com wrote: IIRC we reuse the PCI IDs for non-PCI. You already know how I feel about this gem. The earth keeps rotating despite the widespread use of PCI IDs. Btw., PCI IDs are a great way to arbitrate interfaces planet-wide, in an OS-neutral, depoliticized and well-established way. It's a bit like CPUID for CPUs, just on a much larger scope. Ingo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On 08/18/2009 04:08 AM, Anthony Liguori wrote: I believe strongly that we should avoid putting things in the kernel unless they absolutely have to be. I'm definitely interested in playing with vhost to see if there are ways to put even less in the kernel. In particular, I think it would be a big win to avoid knowledge of slots in the kernel by doing ring translation in userspace. This implies a userspace transition in the fast path. This may or may not be acceptable. I think this is going to be a very interesting experiment and will ultimately determine whether my intuition about the cost of dropping to userspace is right or wrong. I believe with a perfectly scaling qemu this should be feasible. Currently qemu is far from scaling perfectly, but inefficient userspace is not a reason to put things into the kernel. Having a translated ring is also a nice solution for migration - userspace can mark the pages dirty while translating the receive ring. Still, in-kernel translation is simple enough that I think we should keep it. -- error compiling committee.c: too many arguments to function
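Avi's suggestion above - keeping the kernel ignorant of memory slots by doing ring translation in userspace - amounts to userspace resolving each guest-physical descriptor address through a slot table. A rough toy sketch follows; the struct and function names here are invented for illustration and are not the qemu/kvm slot API:

```c
#include <stddef.h>
#include <stdint.h>

/* Toy sketch of userspace ring translation: each memory slot maps a
 * guest-physical range onto a host userspace mapping, and descriptor
 * addresses pulled from the ring are resolved through the table.
 * All names are invented; this is not actual qemu/kvm code. */
struct mem_slot {
    uint64_t gpa_start;   /* guest-physical base of the slot */
    uint64_t len;         /* slot length in bytes */
    uint8_t *hva_base;    /* host userspace mapping of the slot */
};

static void *gpa_to_hva(const struct mem_slot *slots, size_t n, uint64_t gpa)
{
    for (size_t i = 0; i < n; i++) {
        if (gpa >= slots[i].gpa_start &&
            gpa - slots[i].gpa_start < slots[i].len)
            return slots[i].hva_base + (gpa - slots[i].gpa_start);
    }
    return NULL;  /* address not backed by guest RAM */
}
```

As the message notes, doing this in userspace also gives migration a natural hook: the translator sees every page the receive ring touches and can mark it dirty as a side effect.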
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On 08/17/2009 10:33 PM, Gregory Haskins wrote: There is a secondary question of venet (a vbus native device) versus virtio-net (a virtio native device that works with PCI or VBUS). If this contention is really around venet vs virtio-net, I may possibly concede and retract its submission to mainline. I've been pushing it to date because people are using it and I don't see any reason that the driver couldn't be upstream. That's probably the cause of much confusion. The primary kvm pain point is now networking, so in any vbus discussion we're concentrating on that aspect. Also, are you willing to help virtio to become faster? Yes, that is not a problem. Note that virtio in general, and virtio-net/venet in particular are not the primary goal here, however. Improved 802.x and block IO are just positive side-effects of the effort. I started with 802.x networking just to demonstrate the IO layer capabilities, and to test it. It ended up being so good in contrast to existing facilities that developers in the vbus community started using it for production development. Ultimately, I created vbus to address areas of performance that have not yet been addressed in things like KVM. Areas such as real-time guests, or RDMA (host bypass) interfaces. Can you explain how vbus achieves RDMA? I also don't see the connection to real time guests. I also designed it in such a way that we could, in theory, write one set of (linux-based) backends, and have them work across a variety of environments (such as containers/VMs like KVM, lguest, openvz, but also physical systems like blade enclosures and clusters, or even applications running on the host). Sorry, I'm still confused. Why would openvz need vbus? It already has zero-copy networking since it's a shared kernel. Shared memory should also work seamlessly, you just need to expose the shared memory object on a shared part of the namespace. And of course, anything in the kernel is already shared. 
Or do you have arguments why that is impossible to do so and why the only possible solution is vbus? Avi says no such arguments were offered so far. Not for lack of trying. I think my points have just been missed every time I try to describe them. ;) Basically I write a message very similar to this one, and the next conversation starts back from square one. But I digress, let me try again.. Noting that this discussion is really about the layer *below* virtio, not virtio itself (e.g. PCI vs vbus). Lets start with a little background: -- Background -- So on one level, we have the resource-container technology called vbus. It lets you create a container on the host, fill it with virtual devices, and assign that container to some context (such as a KVM guest). These devices are LKMs, and each device has a very simple verb namespace consisting of a synchronous call() method, and a shm() method for establishing async channels. The async channels are just shared-memory with a signal path (e.g. interrupts and hypercalls), which the device+driver can use to overlay things like rings (virtqueues, IOQs), or other shared-memory based constructs of their choosing (such as a shared table). The signal path is designed to minimize enter/exits and reduce spurious signals in a unified way (see shm-signal patch). call() can be used both for config-space like details, as well as fast-path messaging that require synchronous behavior (such as guest scheduler updates). All of this is managed via sysfs/configfs. One point of contention is that this is all managementy stuff and should be kept out of the host kernel. Exposing shared memory, interrupts, and guest hypercalls can all be easily done from userspace (as virtio demonstrates). True, some devices need kernel acceleration, but that's no reason to put everything into the host kernel. On the guest, we have a vbus-proxy which is how the guest gets access to devices assigned to its container. 
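The two-verb namespace described above - a synchronous call() plus a shm() for registering async shared-memory channels - can be modeled as a small ops table. The following is a toy C sketch only; the structs, signatures, and the echo device are invented for illustration and are not the actual vbus API:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model of the vbus device verb namespace: call() for synchronous
 * config-space or fast sync messages, shm() for establishing an async
 * channel backed by shared memory plus a signal path. All names and
 * signatures here are invented. */
struct toy_shm {
    void *base;            /* shared-memory region (rings, tables, ...) */
    size_t len;
    void (*signal)(void);  /* signal path (interrupt/hypercall in reality) */
};

struct toy_device {
    const char *type;      /* e.g. "venet" */
    int id;
    int (*call)(struct toy_device *dev, uint32_t func, void *data, size_t len);
    int (*shm)(struct toy_device *dev, int chan, struct toy_shm *shm);
};

/* A trivial device whose call() verb just echoes the function number back,
 * standing in for a real config-space access. */
static int echo_call(struct toy_device *dev, uint32_t func,
                     void *data, size_t len)
{
    (void)dev;
    if (len < sizeof(func))
        return -1;
    memcpy(data, &func, sizeof(func));
    return 0;
}
```

The point of the sketch is how small the surface is: a device is little more than an (id, type) pair plus these two verbs, which is what the thread means by the bus itself being trivial.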
(as an aside, virtio devices can be populated in the container, and then surfaced up to the virtio-bus via that virtio-vbus patch I mentioned). There is a thing called a vbus-connector which is the guest specific part. Its job is to connect the vbus-proxy in the guest to the vbus container on the host. How it does its job is specific to the connector implementation, but its role is to transport messages between the guest and the host (such as for call() and shm() invocations) and to handle things like discovery and hotswap. virtio has an exact parallel here (virtio-pci and friends). Out of all this, I think the biggest contention point is the design of the vbus-connector that I use in AlacrityVM (Avi, correct me if I am wrong and you object to other aspects as well). I suspect that if I had designed the vbus-connector to surface vbus devices as PCI devices via QEMU, the patches would potentially have been pulled in a while ago. Exposing devices as PCI is an important
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On Mon, Aug 17, 2009 at 03:33:30PM -0400, Gregory Haskins wrote: There is a secondary question of venet (a vbus native device) versus virtio-net (a virtio native device that works with PCI or VBUS). If this contention is really around venet vs virtio-net, I may possibly concede and retract its submission to mainline. For me yes, venet+ioq competing with virtio+virtqueue. I've been pushing it to date because people are using it and I don't see any reason that the driver couldn't be upstream. If virtio is just as fast, they can just use it without knowing it. Clearly, that's better since we support virtio anyway ... -- Issues -- Out of all this, I think the biggest contention point is the design of the vbus-connector that I use in AlacrityVM (Avi, correct me if I am wrong and you object to other aspects as well). I suspect that if I had designed the vbus-connector to surface vbus devices as PCI devices via QEMU, the patches would potentially have been pulled in a while ago. There are, of course, reasons why vbus does *not* render as PCI, so this is the meat of your question, I believe. At a high level, PCI was designed for software-to-hardware interaction, so it makes assumptions about that relationship that do not necessarily apply to virtualization. I'm not hung up on PCI, myself. An idea that might help you get Avi on-board: do setup in userspace, over PCI. Negotiate hypercall support (e.g. with a PCI capability) and then switch to that for fastpath. Hmm? As another example, the connector design coalesces *all* shm-signals into a single interrupt (by prio) that uses the same context-switch mitigation techniques that help boost things like networking. This effectively means we can detect and optimize out ack/eoi cycles from the APIC as the IO load increases (which is when you need it most). PCI has no such concept. Could you elaborate on this one for me? How does context-switch mitigation work? 
In addition, the signals and interrupts are priority aware, which is useful for things like 802.1p networking where you may establish 8-tx and 8-rx queues for your virtio-net device. x86 APIC really has no usable equivalent, so PCI is stuck here. By the way, multiqueue support in virtio would be very nice to have, and seems mostly unrelated to vbus. -- MST
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On 08/18/2009 12:53 PM, Michael S. Tsirkin wrote: I'm not hung up on PCI, myself. An idea that might help you get Avi on-board: do setup in userspace, over PCI. Negotiate hypercall support (e.g. with a PCI capability) and then switch to that for fastpath. Hmm? Hypercalls don't nest well. When a nested guest issues a hypercall, you have to assume it is destined to the enclosing guest, so you can't assign a hypercall-capable device to a nested guest. mmio and pio don't have this problem since the host can use the address to locate the destination.
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On 08/18/2009 01:09 PM, Michael S. Tsirkin wrote: mmio and pio don't have this problem since the host can use the address to locate the destination. So userspace could map hypercall to address during setup and tell the host kernel? Suppose a nested guest has two devices. One a virtual device backed by its host (our guest), and one a virtual device backed by us (the real host), and assigned by the guest to the nested guest. If both devices use hypercalls, there is no way to distinguish between them.
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On Tue, Aug 18, 2009 at 01:13:57PM +0300, Avi Kivity wrote: On 08/18/2009 01:09 PM, Michael S. Tsirkin wrote: mmio and pio don't have this problem since the host can use the address to locate the destination. So userspace could map hypercall to address during setup and tell the host kernel? Suppose a nested guest has two devices. One a virtual device backed by its host (our guest), and one a virtual device backed by us (the real host), and assigned by the guest to the nested guest. If both devices use hypercalls, there is no way to distinguish between them. Not sure I understand. What I had in mind is that devices would have to either use different hypercalls and map hypercall to address during setup, or pass address with each hypercall. We get the hypercall, translate the address as if it was pio access, and know the destination?
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On 08/18/2009 01:28 PM, Michael S. Tsirkin wrote: Suppose a nested guest has two devices. One a virtual device backed by its host (our guest), and one a virtual device backed by us (the real host), and assigned by the guest to the nested guest. If both devices use hypercalls, there is no way to distinguish between them. Not sure I understand. What I had in mind is that devices would have to either use different hypercalls and map hypercall to address during setup, or pass address with each hypercall. We get the hypercall, translate the address as if it was pio access, and know the destination? There are no different hypercalls. There's just one hypercall instruction, and there's no standard on how it's used. If a nested guest issues a hypercall instruction, you have no idea if it's calling a Hyper-V hypercall or a vbus/virtio kick. You could have a protocol where you register the hypercall instruction's address with its recipient, but it quickly becomes a tangled mess. And for what? pio and hypercalls have the same performance characteristics.
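Avi's routing argument above can be made concrete: a pio exit carries the port address that was hit, so the host can look up which backend owns it, whereas a hypercall exit says only "some guest executed the hypercall instruction", with no address to route on. A toy dispatch table illustrates the pio side; the names here are invented and this is not kvm code:

```c
#include <stddef.h>
#include <stdint.h>

/* Toy illustration of address-based routing: every pio exit comes with a
 * port number, so ownership is a simple table lookup. A hypercall vmexit
 * has no equivalent cookie to key this lookup on. Names are invented. */
struct pio_handler {
    uint16_t port;   /* io port claimed at device setup time */
    int owner;       /* which backend/device owns this port */
};

static int route_pio(const struct pio_handler *tbl, size_t n, uint16_t port)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].port == port)
            return tbl[i].owner;
    return -1;       /* unclaimed port */
}
```

This is exactly what is missing in the nested-guest case: two hypercall-using devices produce indistinguishable exits, while two pio-using devices are separated for free by their port addresses.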
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On 08/18/2009 02:07 PM, Michael S. Tsirkin wrote: On Tue, Aug 18, 2009 at 01:45:05PM +0300, Avi Kivity wrote: On 08/18/2009 01:28 PM, Michael S. Tsirkin wrote: Suppose a nested guest has two devices. One a virtual device backed by its host (our guest), and one a virtual device backed by us (the real host), and assigned by the guest to the nested guest. If both devices use hypercalls, there is no way to distinguish between them. Not sure I understand. What I had in mind is that devices would have to either use different hypercalls and map hypercall to address during setup, or pass address with each hypercall. We get the hypercall, translate the address as if it was pio access, and know the destination? There are no different hypercalls. There's just one hypercall instruction, and there's no standard on how it's used. If a nested guest issues a hypercall instruction, you have no idea if it's calling a Hyper-V hypercall or a vbus/virtio kick. userspace will know which it is, because hypercall capability in the device has been activated, and can tell kernel, using something similar to iosignalfd. No? The host kernel sees a hypercall vmexit. How does it know if it's a nested-guest-to-guest hypercall or a nested-guest-to-host hypercall? The two are equally valid at the same time. You could have a protocol where you register the hypercall instruction's address with its recipient, but it quickly becomes a tangled mess. I really thought we could pass the io address in a register as an input parameter. Is there a way to do this in a secure manner? Hmm. Doesn't kvm use hypercalls now? How does this work with nesting? For example, in this code in arch/x86/kvm/x86.c:

	switch (nr) {
	case KVM_HC_VAPIC_POLL_IRQ:
		ret = 0;
		break;
	case KVM_HC_MMU_OP:
		r = kvm_pv_mmu_op(vcpu, a0, hc_gpa(vcpu, a1, a2), &ret);
		break;
	default:
		ret = -KVM_ENOSYS;
		break;
	}

how do we know that it's the guest and not the nested guest performing the hypercall? 
The host knows whether the guest or the nested guest is running. If the guest is running, it's a guest-to-host hypercall. If the nested guest is running, it's a nested-guest-to-guest hypercall. We don't have nested-guest-to-host hypercalls (and couldn't unless we get agreement on a protocol from all hypervisor vendors).
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On Tue, Aug 18, 2009 at 02:15:57PM +0300, Avi Kivity wrote: On 08/18/2009 02:07 PM, Michael S. Tsirkin wrote: On Tue, Aug 18, 2009 at 01:45:05PM +0300, Avi Kivity wrote: On 08/18/2009 01:28 PM, Michael S. Tsirkin wrote: Suppose a nested guest has two devices. One a virtual device backed by its host (our guest), and one a virtual device backed by us (the real host), and assigned by the guest to the nested guest. If both devices use hypercalls, there is no way to distinguish between them. Not sure I understand. What I had in mind is that devices would have to either use different hypercalls and map hypercall to address during setup, or pass address with each hypercall. We get the hypercall, translate the address as if it was pio access, and know the destination? There are no different hypercalls. There's just one hypercall instruction, and there's no standard on how it's used. If a nested guest issues a hypercall instruction, you have no idea if it's calling a Hyper-V hypercall or a vbus/virtio kick. userspace will know which it is, because hypercall capability in the device has been activated, and can tell kernel, using something similar to iosignalfd. No? The host kernel sees a hypercall vmexit. How does it know if it's a nested-guest-to-guest hypercall or a nested-guest-to-host hypercall? The two are equally valid at the same time. Here is how this can work - it is similar to MSI if you like:
- by default, the device uses pio kicks
- the nested guest driver can enable hypercall capability in the device, probably with a pci config cycle
- guest userspace (the hypervisor running in the guest) will see this request and perform a pci config cycle on the real device, telling it to which nested guest this device is assigned
- host userspace (the hypervisor running in the host) will see this. 
it now knows both which guest the hypercalls will be for, and that the device in question is an emulated one, and can set up kvm appropriately. You could have a protocol where you register the hypercall instruction's address with its recipient, but it quickly becomes a tangled mess. I really thought we could pass the io address in a register as an input parameter. Is there a way to do this in a secure manner? Hmm. Doesn't kvm use hypercalls now? How does this work with nesting? For example, in this code in arch/x86/kvm/x86.c:

	switch (nr) {
	case KVM_HC_VAPIC_POLL_IRQ:
		ret = 0;
		break;
	case KVM_HC_MMU_OP:
		r = kvm_pv_mmu_op(vcpu, a0, hc_gpa(vcpu, a1, a2), &ret);
		break;
	default:
		ret = -KVM_ENOSYS;
		break;
	}

how do we know that it's the guest and not the nested guest performing the hypercall? The host knows whether the guest or the nested guest is running. If the guest is running, it's a guest-to-host hypercall. If the nested guest is running, it's a nested-guest-to-guest hypercall. We don't have nested-guest-to-host hypercalls (and couldn't unless we get agreement on a protocol from all hypervisor vendors). Not necessarily. What I am saying is we could make this protocol part of the guest paravirt driver. The guest that loads the driver and enables the capability has to agree to the protocol. If it doesn't want to, it does not have to use that driver.
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On 08/18/2009 02:49 PM, Michael S. Tsirkin wrote: The host kernel sees a hypercall vmexit. How does it know if it's a nested-guest-to-guest hypercall or a nested-guest-to-host hypercall? The two are equally valid at the same time. Here is how this can work - it is similar to MSI if you like: - by default, the device uses pio kicks - nested guest driver can enable hypercall capability in the device, probably with pci config cycle - guest userspace (hypervisor running in guest) will see this request and perform pci config cycle on the real device, telling it to which nested guest this device is assigned So far so good. - host userspace (hypervisor running in host) will see this. it now knows both which guest the hypercalls will be for, and that the device in question is an emulated one, and can set up kvm appropriately No it doesn't. The fact that one device uses hypercalls doesn't mean all hypercalls are for that device. Hypercalls are a shared resource, and there's no way to tell for a given hypercall what device it is associated with (if any). The host knows whether the guest or the nested guest is running. If the guest is running, it's a guest-to-host hypercall. If the nested guest is running, it's a nested-guest-to-guest hypercall. We don't have nested-guest-to-host hypercalls (and couldn't unless we get agreement on a protocol from all hypervisor vendors). Not necessarily. What I am saying is we could make this protocol part of the guest paravirt driver. The guest that loads the driver and enables the capability has to agree to the protocol. If it doesn't want to, it does not have to use that driver. It would only work for kvm-on-kvm.
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
Anthony Liguori wrote: Gregory Haskins wrote: Note: No one has ever proposed to change the virtio-ABI. virtio-pci is part of the virtio ABI. You are proposing changing that. I'm sorry, but I respectfully disagree with you here. virtio has an ABI...I am not modifying that. virtio-pci has an ABI...I am not modifying that either. The subsystem in question is virtio-vbus, and is a completely standalone addition to the virtio ecosystem. By your argument, virtio and virtio-pci should fuse together, and virtio-lguest and virtio-s390 should go away because they diverge from the virtio-pci ABI, right? I seriously doubt you would agree with that statement. The fact is, the design of virtio not only permits modular replacement of its transport ABI, it encourages it. So how is virtio-vbus any different from the other three? I understand that it means you need to load a new driver in the guest, and I am ok with that. virtio-pci was once a non-upstream driver too and required someone to explicitly load it, wasn't it? You gotta crawl before you can walk... You cannot add new kernel modules to guests and expect them to remain supported. ??? Of course you can. How is this different from any other driver? So there is value in reusing existing ABIs Well, I won't argue with you on that one. There is certainly value there. My contention is that sometimes the liability of that ABI is greater than its value, and that's when it's time to evaluate the design decisions that lead to re-use vs re-design. I think the reason vbus gets better performance for networking today is that vbus' backends are in the kernel while virtio's backends are currently in userspace. Well, with all due respect, you also said initially when I announced vbus that in-kernel doesn't matter, and tried to make virtio-net run as fast as venet from userspace ;) Given that we never saw those userspace patches from you that in fact equaled my performance, I assume you were wrong about that statement. 
Perhaps you were wrong about other things too? I'm wrong about a lot of things :-) I haven't yet been convinced that I'm wrong here though. One of the gray areas here is what constitutes an in-kernel backend. tun/tap is a sort of an in-kernel backend. Userspace is still involved in all of the paths. vhost seems to be an intermediate step between tun/tap and vbus. The fast paths avoid userspace completely. Many of the slow paths involve userspace still (like migration apparently). With vbus, userspace is avoided entirely. In some ways, you could argue that slirp and vbus are opposite ends of the virtual I/O spectrum. I believe strongly that we should avoid putting things in the kernel unless they absolutely have to be. I would generally agree with you on that. Particularly in the case of kvm, having slow-path bus-management code in-kernel is not strictly necessary because KVM has qemu in userspace. The issue here is that vbus is designed to be a generic solution to in-kernel virtual-IO. It will support (via abstraction of key subsystems) a variety of environments that may or may not be similar in facilities to KVM, and therefore it represents the least-common-denominator as far as what external dependencies it requires. The bottom line is this: despite the tendency for people to jump at don't put much in the kernel!, the fact is that a bus designed for software to software (such as vbus) is almost laughably trivial. Its essentially a list of objects that have an int (dev-id) and char* (dev-type) attribute. All the extra goo that you see me setting up in something like the kvm-connector needs to be done for fast-path _anyway_, so transporting the verbs to query this simple list is not really a big deal. If we were talking about full ICH emulation for a PCI bus, I would agree with you. In the case of vbus, I think its overstated. I'm definitely interested in playing with vhost to see if there are ways to put even less in the kernel. 
In particular, I think it would be a big win to avoid knowledge of slots in the kernel by doing ring translation in userspace. Ultimately I think that would not be a very good proposition. Ring translation is actually not that hard, and that would definitely be a measurable latency source to try and do as you propose. But, I will not discourage you from trying if that is what you want to do. This implies a userspace transition in the fast path. This may or may not be acceptable. I think this is going to be a very interesting experiment and will ultimately determine whether my intuition about the cost of dropping to userspace is right or wrong. I can already tell you it's wrong, just based on the fact that even extra kthread switches can hurt from my own experience playing in this area... Conversely, I am not afraid of requiring a new driver to optimize the general PV interface. In the long term, this will reduce the amount of
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
Avi Kivity wrote: On 08/17/2009 10:33 PM, Gregory Haskins wrote: There is a secondary question of venet (a vbus native device) versus virtio-net (a virtio native device that works with PCI or VBUS). If this contention is really around venet vs virtio-net, I may possibly concede and retract its submission to mainline. I've been pushing it to date because people are using it and I don't see any reason that the driver couldn't be upstream. That's probably the cause of much confusion. The primary kvm pain point is now networking, so in any vbus discussion we're concentrating on that aspect. Also, are you willing to help virtio to become faster? Yes, that is not a problem. Note that virtio in general, and virtio-net/venet in particular are not the primary goal here, however. Improved 802.x and block IO are just positive side-effects of the effort. I started with 802.x networking just to demonstrate the IO layer capabilities, and to test it. It ended up being so good in contrast to existing facilities that developers in the vbus community started using it for production development. Ultimately, I created vbus to address areas of performance that have not yet been addressed in things like KVM. Areas such as real-time guests, or RDMA (host bypass) interfaces. Can you explain how vbus achieves RDMA? I also don't see the connection to real time guests. Both of these are still in development. Trying to stay true to the release early and often mantra, the core vbus technology is being pushed now so it can be reviewed. Stay tuned for these other developments. I also designed it in such a way that we could, in theory, write one set of (linux-based) backends, and have them work across a variety of environments (such as containers/VMs like KVM, lguest, openvz, but also physical systems like blade enclosures and clusters, or even applications running on the host). Sorry, I'm still confused. Why would openvz need vbus? It's just an example. 
The point is that I abstracted what I think are the key points of fast-io, memory routing, signal routing, etc, so that it will work in a variety of (ideally, _any_) environments. There may not be _performance_ motivations for certain classes of VMs because they already have decent support, but they may want a connector anyway to gain some of the new features available in vbus. And looking forward, the idea is that we have commoditized the backend so we don't need to redo this each time a new container comes along. It already has zero-copy networking since it's a shared kernel. Shared memory should also work seamlessly, you just need to expose the shared memory object on a shared part of the namespace. And of course, anything in the kernel is already shared. Or do you have arguments why that is impossible to do so and why the only possible solution is vbus? Avi says no such arguments were offered so far. Not for lack of trying. I think my points have just been missed every time I try to describe them. ;) Basically I write a message very similar to this one, and the next conversation starts back from square one. But I digress, let me try again.. Noting that this discussion is really about the layer *below* virtio, not virtio itself (e.g. PCI vs vbus). Lets start with a little background: -- Background -- So on one level, we have the resource-container technology called vbus. It lets you create a container on the host, fill it with virtual devices, and assign that container to some context (such as a KVM guest). These devices are LKMs, and each device has a very simple verb namespace consisting of a synchronous call() method, and a shm() method for establishing async channels. The async channels are just shared-memory with a signal path (e.g. interrupts and hypercalls), which the device+driver can use to overlay things like rings (virtqueues, IOQs), or other shared-memory based constructs of their choosing (such as a shared table). 
The signal path is designed to minimize enter/exits and reduce spurious signals in a unified way (see shm-signal patch). call() can be used both for config-space-like details, as well as fast-path messaging that requires synchronous behavior (such as guest scheduler updates). All of this is managed via sysfs/configfs. One point of contention is that this is all managementy stuff and should be kept out of the host kernel. Exposing shared memory, interrupts, and guest hypercalls can all be easily done from userspace (as virtio demonstrates). True, some devices need kernel acceleration, but that's no reason to put everything into the host kernel. See my last reply to Anthony. My two points here are that: a) having it in-kernel makes it a complete subsystem, which perhaps has diminished value in kvm, but adds value in most other places that we are looking to use vbus. b) the in-kernel code is being overstated as complex.
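The two-verb device model sketched in the background above (a synchronous call() plus shm() for registering async shared-memory channels) can be pictured roughly as follows. This is a hypothetical C sketch: every name, structure, and signature here is invented for illustration and is not the actual vbus API.

```c
/* Hypothetical sketch of the call()/shm() verb namespace described above.
 * Not the real vbus interface; names and layouts are illustrative only. */
#include <assert.h>
#include <stddef.h>

struct shm_region {
    void  *ptr;     /* shared memory that a ring or table is overlaid on */
    size_t len;
    int    signal;  /* token standing in for the interrupt/hypercall path */
};

struct vbus_device_ops {
    /* synchronous verb: config-space details or fast-path messages */
    int (*call)(void *priv, unsigned verb, void *data, size_t len);
    /* establish an async shared-memory channel identified by id */
    int (*shm)(void *priv, int id, struct shm_region *shm);
};

/* Toy device: verb 1 reports a made-up ABI version via call(). */
enum { TOY_VERB_VERSION = 1 };

static int toy_call(void *priv, unsigned verb, void *data, size_t len)
{
    (void)priv;
    if (verb == TOY_VERB_VERSION && len >= sizeof(int)) {
        *(int *)data = 42;   /* pretend device ABI version */
        return 0;
    }
    return -1;
}

static int toy_shm(void *priv, int id, struct shm_region *shm)
{
    (void)priv; (void)id;
    return (shm && shm->ptr) ? 0 : -1;  /* accept any backed region */
}

static const struct vbus_device_ops toy_ops = { toy_call, toy_shm };

/* A driver "probe" would query the device and wire up its channels. */
int toy_probe(void)
{
    int version = 0;
    static char ring[64];
    struct shm_region r = { ring, sizeof(ring), 0 };

    if (toy_ops.call(NULL, TOY_VERB_VERSION, &version, sizeof(version)))
        return -1;
    if (toy_ops.shm(NULL, 0, &r))
        return -1;
    return version;
}
```

The point of the sketch is only how small the verb surface is: everything else (rings, signal coalescing, routing) is layered on top of the shared memory that shm() registers.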
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
Avi Kivity wrote: On 08/18/2009 04:16 PM, Gregory Haskins wrote: The issue here is that vbus is designed to be a generic solution to in-kernel virtual-IO. It will support (via abstraction of key subsystems) a variety of environments that may or may not be similar in facilities to KVM, and therefore it represents the least-common-denominator as far as what external dependencies it requires. Maybe it will be easier to evaluate it in the context of these other environments. It's difficult to assess this without an example. When they are ready, I will cross-post the announcement to KVM. The bottom line is this: despite the tendency for people to jump at "don't put much in the kernel!", the fact is that a bus designed for software to software (such as vbus) is almost laughably trivial. It's essentially a list of objects that have an int (dev-id) and char* (dev-type) attribute. All the extra goo that you see me setting up in something like the kvm-connector needs to be done for fast-path _anyway_, so transporting the verbs to query this simple list is not really a big deal. It's not laughably trivial when you try to support the full feature set of kvm (for example, live migration will require dirty memory tracking, and exporting all state stored in the kernel to userspace). Doesn't vhost suffer from the same issue? If not, could I also apply the same technique to support live-migration in vbus? Note that I didn't really want to go that route. As you know, I tried pushing this straight through kvm first, starting earlier this year, but I was met with reluctance to even bother truly understanding what I was proposing, comments like "tell me your ideas so I can steal them", and Oh come on, I wrote "steal" as a convenient shorthand for "cross-pollinate your ideas into our code according to the letter and spirit of the GNU General Public License". Is that supposed to make me feel better about working with you?
I mean, writing, testing, polishing patches for LKML-type submission is time-consuming. If all you are going to do is take those ideas and rewrite them yourself, why should I go through that effort? And it's not like that was the first time you have said that to me. Since we're all trying to improve Linux we may as well cooperate. Well, I don't think anyone can say that I haven't been trying. sorry, we are going to reinvent our own instead. No. Adopting venet/vbus would mean reinventing something that already existed. But yet, it doesn't. Continuing to support virtio/pci is not reinventing anything. No one asked you to do otherwise. This isn't exactly going to motivate someone to continue pushing these ideas within that community. I was made to feel (purposely?) unwelcome at times. So I can either roll over and die, or start my own project. You haven't convinced me that your ideas are worth the effort of abandoning virtio/pci or maintaining both venet/vbus and virtio/pci. With all due respect, I didn't ask you to do anything, especially not abandon something you are happy with. All I did was push guest drivers to LKML. The code in question is independent of KVM, and it's proven to improve the experience of using Linux as a platform. There are people interested in using them (by virtue of the number of people that have signed up for the AlacrityVM list, and have mailed me privately about this work). So where is the problem here? I'm sorry if that made you feel unwelcome. There's no reason to interpret disagreement as malice though. Ok. Kind Regards, -Greg
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On Tue, Aug 18, 2009 at 11:39:25AM -0400, Gregory Haskins wrote: Michael S. Tsirkin wrote: On Mon, Aug 17, 2009 at 03:33:30PM -0400, Gregory Haskins wrote: There is a secondary question of venet (a vbus native device) versus virtio-net (a virtio native device that works with PCI or VBUS). If this contention is really around venet vs virtio-net, I may possibly concede and retract its submission to mainline. For me yes, venet+ioq competing with virtio+virtqueue. I've been pushing it to date because people are using it and I don't see any reason that the driver couldn't be upstream. If virtio is just as fast, they can just use it without knowing it. Clearly, that's better since we support virtio anyway ... More specifically: kvm can support whatever it wants. I am not asking kvm to support venet. If we (the alacrityvm community) decide to keep maintaining venet, _we_ will support it, and I have no problem with that. As of right now, we are doing some interesting things with it in the lab and it's certainly more flexible for us as a platform since we maintain the ABI and feature set. So for now, I do not think it's a big deal if they both co-exist, and it has no bearing on KVM upstream. As someone who extended them recently, both ABI and feature set with virtio are pretty flexible. What's the problem? Will every single contributor now push a driver with an incompatible ABI upstream because this way he maintains both ABI and feature set? Oh well ... -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On 08/18/2009 06:51 PM, Gregory Haskins wrote: It's not laughably trivial when you try to support the full feature set of kvm (for example, live migration will require dirty memory tracking, and exporting all state stored in the kernel to userspace). Doesn't vhost suffer from the same issue? If not, could I also apply the same technique to support live-migration in vbus? It does. There are two possible solutions to that: dropping the entire protocol to userspace, or the one I prefer, proxying the ring and eventfds in userspace but otherwise letting vhost-net run normally. This way userspace gets to see descriptors and mark the pages as dirty. Both these approaches rely on vhost-net being an accelerator to a userspace-based component, but maybe you can adapt venet to use something similar. Oh come on, I wrote "steal" as a convenient shorthand for "cross-pollinate your ideas into our code according to the letter and spirit of the GNU General Public License". Is that supposed to make me feel better about working with you? I mean, writing, testing, polishing patches for LKML-type submission is time-consuming. If all you are going to do is take those ideas and rewrite them yourself, why should I go through that effort? If you're posting your ideas for everyone to read in the form of code, why not post them in the form of design ideas as well? In any case you've given up any secrets. In the worst case you've lost nothing, in the best case you may get some hopefully constructive criticism and maybe improvements. I'm perfectly happy picking up ideas from competing projects (and I have) and seeing my ideas picked up in competing projects (which I also have). Really, isn't that the point of open source? Share code, but also share ideas? And it's not like that was the first time you have said that to me. And I meant it every time. Haven't you just asked how vhost-net plans to do live migration? Since we're all trying to improve Linux we may as well cooperate.
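The ring-proxying idea described above hinges on one property: if userspace forwards each descriptor, it sees which guest pages the device touched and can log them for migration. A minimal sketch of that bookkeeping, with all structures invented for illustration (real KVM dirty logging uses its own bitmap ioctls):

```c
/* Toy dirty-page log for a userspace ring proxy. Invented structures;
 * this only illustrates the bookkeeping, not any real vhost/kvm API. */
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define NPAGES     1024

struct dirty_log {
    uint8_t bitmap[NPAGES / 8];   /* one bit per guest page frame */
};

struct desc {
    uint64_t gpa;   /* guest-physical address the device wrote to */
    uint32_t len;
};

static void mark_dirty(struct dirty_log *log, uint64_t gpa)
{
    uint64_t pfn = gpa >> PAGE_SHIFT;
    log->bitmap[pfn / 8] |= (uint8_t)(1u << (pfn % 8));
}

/* Called by the proxy as each completed descriptor is forwarded on. */
void proxy_complete(struct dirty_log *log, const struct desc *d)
{
    uint64_t gpa = d->gpa;
    uint64_t end = d->gpa + d->len;

    for (; gpa < end; gpa += 1ull << PAGE_SHIFT)
        mark_dirty(log, gpa);
    mark_dirty(log, end - 1);   /* cover a partial final page */
}

int page_is_dirty(const struct dirty_log *log, uint64_t pfn)
{
    return (log->bitmap[pfn / 8] >> (pfn % 8)) & 1;
}
```

The migration code would then periodically harvest and clear this bitmap, resending only the dirtied pages, which is exactly the information an in-kernel fast path hides unless it is proxied or paused during migration.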
Well, I don't think anyone can say that I haven't been trying. I'd be obliged if you reveal some of your secret sauce then (only the parts you plan to GPL anyway of course). sorry, we are going to reinvent our own instead. No. Adopting venet/vbus would mean reinventing something that already existed. But yet, it doesn't. We'll need to do the agree-to-disagree thing again here. Continuing to support virtio/pci is not reinventing anything. No one asked you to do otherwise. Right, and I'm not keen on supporting both. See why I want to stick to virtio/pci as long as I possibly can? You haven't convinced me that your ideas are worth the effort of abandoning virtio/pci or maintaining both venet/vbus and virtio/pci. With all due respect, I didn't ask you to do anything, especially not abandon something you are happy with. All I did was push guest drivers to LKML. The code in question is independent of KVM, and it's proven to improve the experience of using Linux as a platform. There are people interested in using them (by virtue of the number of people that have signed up for the AlacrityVM list, and have mailed me privately about this work). So where is the problem here? I'm unhappy with the duplication of effort and potential fragmentation of the developer and user communities, that's all. I'd rather see the work going into vbus/venet going into virtio. I think it's a legitimate concern. -- error compiling committee.c: too many arguments to function
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On Tue, Aug 18, 2009 at 11:51:59AM -0400, Gregory Haskins wrote: It's not laughably trivial when you try to support the full feature set of kvm (for example, live migration will require dirty memory tracking, and exporting all state stored in the kernel to userspace). Doesn't vhost suffer from the same issue? If not, could I also apply the same technique to support live-migration in vbus? vhost does this by switching to userspace for the duration of live migration. venet could do this I guess, but you'd need to write a userspace implementation. vhost just reuses existing userspace virtio. With all due respect, I didn't ask you to do anything, especially not abandon something you are happy with. All I did was push guest drivers to LKML. The code in question is independent of KVM, and it's proven to improve the experience of using Linux as a platform. There are people interested in using them (by virtue of the number of people that have signed up for the AlacrityVM list, and have mailed me privately about this work). So where is the problem here? If virtio net in guest could be improved instead, everyone would benefit. I am doing this, and I wish more people would join. Instead, you change ABI in an incompatible way. So now, there's no single place to work on kvm networking performance. Now, it would all be understandable if the reason was e.g. better performance. But you say yourself it isn't. See the problem? -- MST
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On 08/18/2009 05:46 PM, Gregory Haskins wrote: Can you explain how vbus achieves RDMA? I also don't see the connection to real-time guests. Both of these are still in development. Trying to stay true to the "release early and often" mantra, the core vbus technology is being pushed now so it can be reviewed. Stay tuned for these other developments. Hopefully you can outline how it works. AFAICT, RDMA and kernel bypass will need device assignment. If you're bypassing the call into the host kernel, it doesn't really matter how that call is made, does it? I also designed it in such a way that we could, in theory, write one set of (Linux-based) backends and have them work across a variety of environments (such as containers/VMs like KVM, lguest, openvz, but also physical systems like blade enclosures and clusters, or even applications running on the host). Sorry, I'm still confused. Why would openvz need vbus? It's just an example. The point is that I abstracted what I think are the key points of fast-io, memory routing, signal routing, etc, so that it will work in a variety of (ideally, _any_) environments. There may not be _performance_ motivations for certain classes of VMs because they already have decent support, but they may want a connector anyway to gain some of the new features available in vbus. And looking forward, the idea is that we have commoditized the backend so we don't need to redo this each time a new container comes along. I'll wait until a concrete example shows up as I still don't understand. One point of contention is that this is all managementy stuff and should be kept out of the host kernel. Exposing shared memory, interrupts, and guest hypercalls can all be easily done from userspace (as virtio demonstrates). True, some devices need kernel acceleration, but that's no reason to put everything into the host kernel. See my last reply to Anthony.
My two points here are that: a) having it in-kernel makes it a complete subsystem, which perhaps has diminished value in kvm, but adds value in most other places that we are looking to use vbus. It's not a complete system unless you want users to administer VMs using echo and cat and configfs. Some userspace support will always be necessary. b) the in-kernel code is being overstated as complex. We are not talking about your typical virt thing, like an emulated ICH/PCI chipset. It's really a simple list of devices with a handful of attributes. They are managed using established Linux interfaces, like sysfs/configfs. They need to be connected to the real world somehow. What about security? Can any user create a container and devices and link them to real interfaces? If not, do you need to run the VM as root? virtio and vhost-net solve these issues. Does vbus? The code may be simple to you. But the question is whether it's necessary, not whether it's simple or complex. Exposing devices as PCI is an important issue for me, as I have to consider non-Linux guests. That's your prerogative, but obviously not everyone agrees with you. I hope everyone agrees that it's an important issue for me and that I have to consider non-Linux guests. I also hope that you're considering non-Linux guests since they have considerable market share. Getting non-Linux guests to work is my problem if you choose not to be part of the vbus community. I won't be writing those drivers in any case. Another issue is the host kernel management code which I believe is superfluous. In your opinion, right? Yes, this is why I wrote "I believe". Given that, why spread to a new model? Note: I haven't asked you to (at least, not since April with the vbus-v3 release). Spreading to a new model is currently the role of the AlacrityVM project, since we disagree on the utility of a new model. Given I'm not the gateway to inclusion of vbus/venet, you don't need to ask me anything. I'm still free to give my opinion.
A) hardware can only generate byte/word-sized requests at a time because that is all the pcb-etch and silicon support. So hardware is usually expressed in terms of some number of registers. No, hardware happily DMAs to and from main memory. Yes, now walk me through how you set up DMA to do something like a call when you do not know addresses a priori. Hint: count the number of MMIO/PIOs you need. If the number is 1, you've lost. With virtio, the number is 1 (or less if you amortize). Set up the ring entries and kick. Some hardware of course uses mmio registers extensively, but not virtio hardware. With the recent MSI support no registers are touched in the fast path. Note we are not talking about virtio here. Just raw PCI and why I advocate vbus over it. There's no such thing as raw PCI. Every PCI device has a protocol. The protocol virtio chose is optimized for virtualization.
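The "set up the ring entries and kick" argument above reduces to a simple counting exercise: descriptor writes are plain memory stores, and only the doorbell costs an exit, so one kick amortizes over a whole batch. A toy model of that pattern (invented names; a real virtqueue has avail/used rings, indices, and memory barriers this sketch omits):

```c
/* Toy model of "publish descriptors in memory, kick once".
 * Illustrative only; real virtio rings are more involved. */
#include <assert.h>

#define RING_SIZE 8

struct ring {
    int desc[RING_SIZE]; /* stand-in for descriptor entries           */
    int avail;           /* producer index, visible to the "host"     */
    int kicks;           /* doorbell writes, i.e. exits in a real VM  */
};

static void ring_add(struct ring *r, int buf)
{
    r->desc[r->avail % RING_SIZE] = buf;  /* plain memory write, no exit */
    r->avail++;
}

static void ring_kick(struct ring *r)
{
    r->kicks++;  /* in hardware: one MMIO/PIO write to a doorbell */
}

/* Queue n buffers, then notify once: n buffers per single exit. */
int batch_and_kick(struct ring *r, int n)
{
    for (int i = 0; i < n; i++)
        ring_add(r, i);
    ring_kick(r);
    return r->kicks;
}
```

This is the sense in which the exit count is "1 or less if you amortize": the cost per buffer shrinks as the batch grows, regardless of how the doorbell itself is transported.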
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On Tuesday 18 August 2009, Gregory Haskins wrote: Avi Kivity wrote: On 08/17/2009 10:33 PM, Gregory Haskins wrote: One point of contention is that this is all managementy stuff and should be kept out of the host kernel. Exposing shared memory, interrupts, and guest hypercalls can all be easily done from userspace (as virtio demonstrates). True, some devices need kernel acceleration, but that's no reason to put everything into the host kernel. See my last reply to Anthony. My two points here are that: a) having it in-kernel makes it a complete subsystem, which perhaps has diminished value in kvm, but adds value in most other places that we are looking to use vbus. b) the in-kernel code is being overstated as complex. We are not talking about your typical virt thing, like an emulated ICH/PCI chipset. It's really a simple list of devices with a handful of attributes. They are managed using established Linux interfaces, like sysfs/configfs. IMHO the complexity of the code is not so much of a problem. What I see as a problem is the complexity of a kernel/user-space interface that manages the devices with global state. One of the greatest features of Michael's vhost driver is that all the state is associated with open file descriptors that either exist already or belong to the vhost_net misc device. When a process dies, all the file descriptors get closed and the whole state is cleaned up implicitly. AFAICT, you can't do that with the vbus host model. What performance oriented items have been left unaddressed? Well, the interrupt model to name one. The performance aspects of your interrupt model are independent of the vbus proxy, or at least they should be. Let's assume for now that your event notification mechanism gives significant performance improvements (which we can't measure independently right now).
I don't see a reason why we could not get the same performance out of a paravirtual interrupt controller that uses the same method, and it would be straightforward to implement one and use that together with all the existing emulated PCI devices and virtio devices including vhost_net. Arnd
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On 08/18/2009 09:20 PM, Arnd Bergmann wrote: Well, the interrupt model to name one. The performance aspects of your interrupt model are independent of the vbus proxy, or at least they should be. Let's assume for now that your event notification mechanism gives significant performance improvements (which we can't measure independently right now). I don't see a reason why we could not get the same performance out of a paravirtual interrupt controller that uses the same method, and it would be straightforward to implement one and use that together with all the existing emulated PCI devices and virtio devices including vhost_net. Interesting. You could even configure those vectors using the standard MSI configuration mechanism; simply replace the address/data pair with something meaningful to the paravirt interrupt controller. I'd have to see really hard numbers to be tempted to merge something like this though. We've merged paravirt mmu, for example, and now it underperforms both hardware two-level paging and software shadow paging. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
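The suggestion above, keeping the standard MSI configuration mechanism but letting a paravirt interrupt controller reinterpret the address/data pair, can be sketched as follows. Everything here is a made-up model for illustration (the struct names, the data-as-vector encoding, and the delivery function are all assumptions, not a real KVM or Linux interface):

```c
/* Sketch: MSI address/data reinterpreted by a paravirt irqchip.
 * Invented model; real MSI routes a bus write to the APIC instead. */
#include <assert.h>
#include <stdint.h>

#define NVEC 256

struct pv_irqchip {
    int pending[NVEC];   /* per-vector pending counts */
};

struct msi_msg {
    uint64_t address;    /* guest programs this via standard MSI config */
    uint32_t data;       /* here: reinterpreted as a software vector id */
};

/* "Deliver" an MSI: instead of a physical bus transaction, the
 * hypervisor-side chip bumps the software vector the data selects. */
static int pv_deliver(struct pv_irqchip *chip, const struct msi_msg *msg)
{
    uint32_t vec = msg->data % NVEC;
    chip->pending[vec]++;
    return (int)vec;
}
```

The appeal of the scheme is that guests keep using their existing MSI configuration path unchanged; only the consumer of the (address, data) pair differs, which is why it could coexist with all the existing emulated PCI and virtio devices.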
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
Ingo Molnar wrote: * Gregory Haskins gregory.hask...@gmail.com wrote: You haven't convinced me that your ideas are worth the effort of abandoning virtio/pci or maintaining both venet/vbus and virtio/pci. With all due respect, I didn't ask you to do anything, especially not abandon something you are happy with. All I did was push guest drivers to LKML. The code in question is independent of KVM, and it's proven to improve the experience of using Linux as a platform. There are people interested in using them (by virtue of the number of people that have signed up for the AlacrityVM list, and have mailed me privately about this work). This thread started because I asked you about your technical arguments why we'd want vbus instead of virtio. (You mean vbus vs pci, right? virtio works fine, is untouched, and is out-of-scope here) Right, and I do believe I answered your questions. Do you feel as though this was not a satisfactory response? Your answer above now basically boils down to: because I want it so, why don't you leave me alone. Well, with all due respect, please do not put words in my mouth. This is not what I am saying at all. What I *am* saying is:
fact: this thread is about Linux guest drivers to support vbus
fact: these drivers do not touch kvm code.
fact: these drivers do not force kvm to alter its operation in any way.
fact: these drivers do not alter ABIs that KVM currently supports.
Therefore, all this talk about abandoning, supporting, and changing things in KVM is premature, irrelevant, and/or FUD. No one proposed such changes, so I am highlighting this fact to bring the thread back on topic. That KVM talk is merely a distraction at this point in time. What you are doing here is in essence to fork KVM. No, that is incorrect. What I am doing here is a downstream development point for the integration of KVM and vbus. It's akin to kvm.git or tip.git to develop a subsystem intended for eventual inclusion upstream.
If and when the code goes upstream in a manner acceptable to all parties involved, and AlacrityVM exceeds its utility as a separate project, I will _gladly_ dissolve it and migrate to use upstream KVM instead. As stated on the project wiki: "It is a goal of AlacrityVM to work towards upstream acceptance of the project on a timeline that suits the community. In the meantime, this wiki will serve as the central coordination point for development and discussion of the technology" (citation: http://developer.novell.com/wiki/index.php/AlacrityVM) And I meant it when I said it. Until then, the project is a much more efficient way for us (the vbus developers) to work together than pointing people at my patch series posted to k...@vger. I tried that way first. It sucked, and didn't work. Users were having trouble patching the various pieces, building, etc. Now I can offer a complete solution from a central point, with all the proper pieces in place to play around with it. Ultimately, it is up to upstream to decide if this is to become merged or remain out-of-tree forever as a fork. Not me. I will continue to make every effort to find common ground with my goals coincident with the blessing of upstream, as I have been from the beginning. Now I have a more official forum to do it in. regardless of the technical counter-arguments given against such a fork and regardless of the ample opportunity given to you to demonstrate the technical advantages of your code. (in which case KVM would happily migrate to your code) In an ideal world, perhaps. Avi and I currently have a fundamental disagreement about the best way to do PV. He sees the world through PCI glasses, and I don't. Despite attempts on both sides to rectify this disagreement, we currently do not see eye to eye on every front. This doesn't mean he is right, and I am wrong per se. It just means we disagree. Period. Avi is a sharp guy, and I respect him.
But upstream KVM doesn't have a corner on correct ;) The community as a whole will ultimately decide if my ideas live or die, wouldn't you agree? Avi can correct me if I am wrong, but what we _do_ agree on is that core KVM doesn't need to be directly involved in this vbus (or vhost) discussion, per se. It just wants to have the hooks to support various PV solutions (such as irqfd/ioeventfd), and vbus is one such solution. We all love faster code and better management interfaces and tons of your prior patches got accepted by Avi. This time you didn't even _try_ to improve virtio. I'm sorry, but you are mistaken: http://lkml.indiana.edu/hypermail/linux/kernel/0904.2/02443.html It's not like you posted a lot of virtio patches which were not applied. You didn't even try and you need to try _much_ harder than that before forking a project. I really do not think you are in a position to say when someone can or cannot fork a project, so please do not try to lecture on that. Perhaps you could offer
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On 08/19/2009 07:27 AM, Gregory Haskins wrote: This thread started because I asked you about your technical arguments why we'd want vbus instead of virtio. (You mean vbus vs pci, right? virtio works fine, is untouched, and is out-of-scope here) I guess he meant venet vs virtio-net. Without venet vbus is currently userless. Right, and I do believe I answered your questions. Do you feel as though this was not a satisfactory response? Others and I have shown you it's wrong. There's no inherent performance problem in pci. The vbus approach has inherent problems (the biggest of which is compatibility, the second manageability). Your answer above now basically boils down to: because I want it so, why don't you leave me alone. Well, with all due respect, please do not put words in my mouth. This is not what I am saying at all. What I *am* saying is:
fact: this thread is about Linux guest drivers to support vbus
fact: these drivers do not touch kvm code.
fact: these drivers do not force kvm to alter its operation in any way.
fact: these drivers do not alter ABIs that KVM currently supports.
Therefore, all this talk about abandoning, supporting, and changing things in KVM is premature, irrelevant, and/or FUD. No one proposed such changes, so I am highlighting this fact to bring the thread back on topic. That KVM talk is merely a distraction at this point in time. s/kvm/kvm stack/. virtio/pci is part of the kvm stack, even if it is not part of kvm itself. If vbus/venet were to be merged, users and developers would have to choose one or the other. That's the fragmentation I'm worried about. And you can prefix that with fact: as well. We all love faster code and better management interfaces and tons of your prior patches got accepted by Avi. This time you didn't even _try_ to improve virtio. I'm sorry, but you are mistaken: http://lkml.indiana.edu/hypermail/linux/kernel/0904.2/02443.html That does nothing to improve virtio.
Existing guests (Linux and Windows) which support virtio will cease to work if the host moves to vbus-virtio. Existing hosts (running virtio-pci) won't be able to talk to newer guests running virtio-vbus. The patch doesn't improve performance without the entire vbus stack in the host kernel and a vbus-virtio-net-host host kernel driver. Perhaps if you posted everything needed to make vbus-virtio work and perform we could compare that to vhost-net and you'll see another reason why vhost-net is the better approach. You are also wrong to say that I didn't try to avoid creating a downstream effort first. I believe the public record of the mailing lists will back me up that I tried politely pushing this directly through kvm first. It was only after Avi recently informed me that they would be building their own version of an in-kernel backend in lieu of working with me to adapt vbus to their needs that I decided to put my own project together. There's no way we can adapt vbus to our needs. Don't you think we'd have preferred that rather than writing our own? The current virtio-net issues are hurting us. Our needs are compatibility, performance, and manageability. vbus fails all three, your impressive venet numbers notwithstanding. What should I have done otherwise, in your opinion? You could come up with uses where vbus truly is superior to virtio/pci/whatever (not words about etch constraints). Showing some of those non-virt uses, for example. The fact that your only user duplicates existing functionality doesn't help. And fragmentation matters quite a bit. To Linux users, developers, administrators, packagers it's a big deal whether two overlapping pieces of functionality for the same thing exist within the same kernel. So the only thing that could be construed as overlapping here is venet vs virtio-net. If I dropped the contentious venet and focused on making a virtio-net backend that we can all re-use, do you see that as a path of compromise here?
That's a step in the right direction. I certainly don't want that. Instead we (at great expense and work) try to reach the best technical solution. This is all I want, as well. Note whenever I mention migration, large guests, or Windows you say these are not your design requirements. The best technical solution will have to consider those. If the community wants this then why can't you convince one of the most prominent representatives of that community, the KVM developers? It's a chicken-and-egg situation at times. Perhaps the KVM developers do not have the motivation or time to properly consider such a proposal _until_ the community presents its demand. I've spent quite a lot of time arguing with you, no doubt influenced by the fact that you can write a lot faster than I can read. Furthermore, 99% of your work is KVM. Actually, no. Almost none of it is. I think there are about 2-3 patches
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
Michael S. Tsirkin wrote: On Tue, Aug 18, 2009 at 11:51:59AM -0400, Gregory Haskins wrote: It's not laughably trivial when you try to support the full feature set of kvm (for example, live migration will require dirty memory tracking, and exporting all state stored in the kernel to userspace). Doesn't vhost suffer from the same issue? If not, could I also apply the same technique to support live-migration in vbus? vhost does this by switching to userspace for the duration of live migration. venet could do this I guess, but you'd need to write a userspace implementation. vhost just reuses existing userspace virtio. With all due respect, I didn't ask you to do anything, especially not abandon something you are happy with. All I did was push guest drivers to LKML. The code in question is independent of KVM, and it's proven to improve the experience of using Linux as a platform. There are people interested in using them (by virtue of the number of people that have signed up for the AlacrityVM list, and have mailed me privately about this work). So where is the problem here? If virtio net in guest could be improved instead, everyone would benefit. So if I whip up a virtio-net backend for vbus with a PCI-compliant connector, you are happy? I am doing this, and I wish more people would join. Instead, you change ABI in an incompatible way. Only by choice of my particular connector. The ABI is a function of the connector design. So one such model is to terminate the connector in qemu, and surface the resulting objects as PCI devices. I choose not to use this particular design for my connector that I am pushing upstream because I am of the opinion that I can do better by terminating it in the guest directly as a PV-optimized bus. However, both connectors can theoretically coexist peacefully. The advantage that this would give us is that one in-kernel virtio-net model could be surfaced to all vbus users (pci, or otherwise), which will hopefully be growing over time.
This would have gained vbus a virtio-net backend, and it would have saved you from re-inventing the various abstractions and management interfaces that vbus has in place. So now, there's no single place to work on kvm networking performance. Now, it would all be understandable if the reason was e.g. better performance. But you say yourself it isn't. Actually, I really didn't say that. As far as I know, your patch hasn't been performance-proven, but I just gave you the benefit of the doubt. What I said was that for a limited type of benchmark, it *may* get similar numbers if you implemented vhost optimally. For others (for instance, when we can start to take advantage of priority, or scaling the number of interfaces) it may not, since my proposed connector was designed to optimize this over raw PCI facilities. But I digress. Please post results when you have numbers, as I had to give up my 10GE rig in the lab. I suspect you will have performance issues until you at least address GSO, but you may already be there by now. Kind Regards, -Greg
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
Ingo Molnar wrote: * Gregory Haskins ghask...@novell.com wrote: This will generally be used for hypervisors to publish any host-side virtual devices up to a guest. The guest will have the opportunity to consume any devices present on the vbus-proxy as if they were platform devices, similar to existing buses like PCI. Signed-off-by: Gregory Haskins ghask...@novell.com --- MAINTAINERS |6 ++ arch/x86/Kconfig|2 + drivers/Makefile|1 drivers/vbus/Kconfig| 14 drivers/vbus/Makefile |3 + drivers/vbus/bus-proxy.c| 152 +++ include/linux/vbus_driver.h | 73 + 7 files changed, 251 insertions(+), 0 deletions(-) create mode 100644 drivers/vbus/Kconfig create mode 100644 drivers/vbus/Makefile create mode 100644 drivers/vbus/bus-proxy.c create mode 100644 include/linux/vbus_driver.h Is there a consensus on this with the KVM folks? (i've added the KVM list to the Cc:) Hi Ingo, Avi can correct me if I am wrong, but the agreement that he and I came to a few months ago was something to the effect of: kvm will be neutral towards various external IO subsystems, and instead provide various hooks (see irqfd, ioeventfd) to permit these IO subsystems to interface with kvm. AlacrityVM is one of the first projects to take advantage of that interface. AlacrityVM is kvm-core + vbus-core + vbus-kvm-connector + vbus-enhanced qemu + guest drivers. This thread is part of the guest-drivers portion. Note that it is specific to alacrityvm, not kvm, which is why the kvm list was not included in the conversation (also an agreement with Avi: http://lkml.org/lkml/2009/8/6/231). Kind Regards, -Greg
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
Ingo Molnar wrote: I think the reason vbus gets better performance for networking today is that vbus' backends are in the kernel while virtio's backends are currently in userspace. Since Michael has a functioning in-kernel backend for virtio-net now, I suspect we're weeks (maybe days) away from performance results. My expectation is that vhost + virtio-net will be as good as venet + vbus. If that's the case, then I don't see any reason to adopt vbus unless Greg thinks there are other compelling features over virtio. Keeping virtio's backend in user-space was rather stupid IMHO. I don't think it's quite so clear. There's nothing about vhost_net that would prevent a userspace application from using it as a higher performance replacement for tun/tap. The fact that we can avoid userspace for most of the fast paths is nice but that's really an issue of vhost_net vs. tun/tap. From the kernel's perspective, a KVM guest is just a userspace process. Having new userspace interfaces that are only useful to KVM guests would be a bad thing. Regards, Anthony Liguori -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
Anthony Liguori wrote: Ingo Molnar wrote: * Gregory Haskins ghask...@novell.com wrote: This will generally be used for hypervisors to publish any host-side virtual devices up to a guest. The guest will have the opportunity to consume any devices present on the vbus-proxy as if they were platform devices, similar to existing buses like PCI. Signed-off-by: Gregory Haskins ghask...@novell.com --- MAINTAINERS |6 ++ arch/x86/Kconfig|2 + drivers/Makefile|1 drivers/vbus/Kconfig| 14 drivers/vbus/Makefile |3 + drivers/vbus/bus-proxy.c| 152 +++ include/linux/vbus_driver.h | 73 + 7 files changed, 251 insertions(+), 0 deletions(-) create mode 100644 drivers/vbus/Kconfig create mode 100644 drivers/vbus/Makefile create mode 100644 drivers/vbus/bus-proxy.c create mode 100644 include/linux/vbus_driver.h Is there a consensus on this with the KVM folks? (i've added the KVM list to the Cc:) I'll let Avi comment about it from a KVM perspective but from a QEMU perspective, I don't think we want to support two paravirtual IO frameworks. I'd like to see them converge. Since there's an install base of guests today with virtio drivers, there really ought to be a compelling reason to change the virtio ABI in a non-backwards compatible way. Note: No one has ever proposed to change the virtio-ABI. In fact, this thread in question doesn't even touch virtio, and even the patches that I have previously posted to add virtio-capability do it in a backwards-compatible way. Case in point: Take an upstream kernel and you can modprobe the vbus-pcibridge in and virtio devices will work over that transport unmodified. See http://lkml.org/lkml/2009/8/6/244 for details. Note that I have tentatively dropped the virtio-vbus patch from the queue due to lack of interest, but I can resurrect it if need be. This means convergence really ought to be adding features to virtio. virtio is a device model. vbus is a bus model and a host backend facility.
Adding features to virtio would be orthogonal to some kind of convergence goal. virtio can run unmodified or add new features within its own namespace independent of vbus, as it pleases. vbus will simply transport those changes. On paper, I don't think vbus really has any features over virtio. Again, do not confuse vbus with virtio. They are different layers of the stack. vbus does things in different ways (paravirtual bus vs. pci for discovery) but I think we're happy with how virtio does things today. That's fine. KVM can stick with virtio-pci if it wants. AlacrityVM will support virtio-pci and vbus (with possible convergence with virtio-vbus). If at some point KVM thinks vbus is interesting, I will gladly work with getting it integrated into upstream KVM as well. Until then, they can happily coexist without issue between the two projects. I think the reason vbus gets better performance for networking today is that vbus' backends are in the kernel while virtio's backends are currently in userspace. Well, with all due respect, you also said initially when I announced vbus that in-kernel doesn't matter, and tried to make virtio-net run as fast as venet from userspace ;) Given that we never saw those userspace patches from you that in fact equaled my performance, I assume you were wrong about that statement. Perhaps you were wrong about other things too? Since Michael has a functioning in-kernel backend for virtio-net now, I suspect we're weeks (maybe days) away from performance results. My expectation is that vhost + virtio-net will be as good as venet + vbus. This is not entirely impossible, at least for certain simple benchmarks like singleton throughput and latency. But if you think that this somehow invalidates vbus as a concept, you have missed the point entirely. vbus is about creating flexible (e.g. cross-hypervisor, and even physical system or userspace application) in-kernel IO containers with Linux.
The guest interface represents what I believe to be the ideal interface for ease of use, yet maximum performance for software-to-software interaction. This means very low latency and high-throughput for both synchronous and asynchronous IO, minimizing enters/exits, reducing enter/exit cost, prioritization, parallel computation, etc. The things that we (the alacrityvm community) have coming down the pipeline for high-performance virtualization require that these issues be addressed. venet was originally crafted just to validate the approach and test the vbus interface. It ended up being so much faster than virtio-net that people in the vbus community started coding against its ABI. Therefore, I decided to support it formally and indefinitely. If I can get consensus on virtio-vbus going forward, it will probably be the last vbus-specific driver
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
Avi Kivity wrote: On 08/15/2009 01:32 PM, Ingo Molnar wrote: This will generally be used for hypervisors to publish any host-side virtual devices up to a guest. The guest will have the opportunity to consume any devices present on the vbus-proxy as if they were platform devices, similar to existing buses like PCI. Is there a consensus on this with the KVM folks? (i've added the KVM list to the Cc:) My opinion is that this is a duplication of effort and we'd be better off if everyone contributed to enhancing virtio, which already has widely deployed guest drivers and non-Linux guest support. It may have merit if it is proven that it is technically superior to virtio (and I don't mean some benchmark in some point in time; I mean design wise). So far I haven't seen any indications that it is. The design is very different, so hopefully I can start to convince you why it might be interesting. Kind Regards, -Greg
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
* Anthony Liguori anth...@codemonkey.ws wrote: Ingo Molnar wrote: I think the reason vbus gets better performance for networking today is that vbus' backends are in the kernel while virtio's backends are currently in userspace. Since Michael has a functioning in-kernel backend for virtio-net now, I suspect we're weeks (maybe days) away from performance results. My expectation is that vhost + virtio-net will be as good as venet + vbus. If that's the case, then I don't see any reason to adopt vbus unless Greg thinks there are other compelling features over virtio. Keeping virtio's backend in user-space was rather stupid IMHO. I don't think it's quite so clear. In such a narrow quote it's not so clear indeed - that's why I qualified it with: Having the _option_ to piggyback to user-space (for flexibility, extensibility, etc.) is OK, but not having kernel acceleration is bad. Ingo
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On 08/17/2009 05:16 PM, Gregory Haskins wrote: My opinion is that this is a duplication of effort and we'd be better off if everyone contributed to enhancing virtio, which already has widely deployed guest drivers and non-Linux guest support. It may have merit if it is proven that it is technically superior to virtio (and I don't mean some benchmark in some point in time; I mean design wise). So far I haven't seen any indications that it is. The design is very different, so hopefully I can start to convince you why it might be interesting. We've been through this before I believe. If you can point out specific differences that make venet outperform virtio-net I'll be glad to hear (and steal) them though. -- error compiling committee.c: too many arguments to function
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
Ingo Molnar wrote: * Gregory Haskins gregory.hask...@gmail.com wrote: Ingo Molnar wrote: * Gregory Haskins ghask...@novell.com wrote: This will generally be used for hypervisors to publish any host-side virtual devices up to a guest. The guest will have the opportunity to consume any devices present on the vbus-proxy as if they were platform devices, similar to existing buses like PCI. Signed-off-by: Gregory Haskins ghask...@novell.com --- MAINTAINERS |6 ++ arch/x86/Kconfig|2 + drivers/Makefile|1 drivers/vbus/Kconfig| 14 drivers/vbus/Makefile |3 + drivers/vbus/bus-proxy.c| 152 +++ include/linux/vbus_driver.h | 73 + 7 files changed, 251 insertions(+), 0 deletions(-) create mode 100644 drivers/vbus/Kconfig create mode 100644 drivers/vbus/Makefile create mode 100644 drivers/vbus/bus-proxy.c create mode 100644 include/linux/vbus_driver.h Is there a consensus on this with the KVM folks? (i've added the KVM list to the Cc:) Hi Ingo, Avi can correct me if I am wrong, but the agreement that he and I came to a few months ago was something to the effect of: kvm will be neutral towards various external IO subsystems, and instead provide various hooks (see irqfd, ioeventfd) to permit these IO subsystems to interface with kvm. AlacrityVM is one of the first projects to take advantage of that interface. AlacrityVM is kvm-core + vbus-core + vbus-kvm-connector + vbus-enhanced qemu + guest drivers. This thread is part of the guest-drivers portion. Note that it is specific to alacrityvm, not kvm, which is why the kvm list was not included in the conversation (also an agreement with Avi: http://lkml.org/lkml/2009/8/6/231). Well my own opinion is that the fracturing of the Linux internal driver space into diverging pieces of duplicate functionality (absent compelling technical reasons) is harmful. 
[Adding Michael Tsirkin] Hi Ingo, 1) First off, let me state that I have made every effort to propose this as a solution to integrate with KVM, the most recent of which is April: http://lkml.org/lkml/2009/4/21/408 If you read through the various vbus related threads on LKML/KVM posted this year, I think you will see that I made numerous polite offerings to work with people on finding a common solution here, including Michael. In the end, Michael decided to go a different route, using some of the ideas proposed in vbus + venet-tap to create vhost-net. This is fine, and I respect his decision. But do not try to pin fracturing on me, because I tried everything to avoid it. :) Since I still disagree with the fundamental approach of how KVM IO works, I am continuing my effort in the downstream project AlacrityVM which will hopefully serve to build a better understanding of what it is I am doing with the vbus technology, and a point to maintain the subsystem. 2) There *are* technical reasons for this change (and IMHO, they are compelling), many of which have already been previously discussed (including my last reply to Anthony) so I won't rehash them here. 3) Even if there really is some duplication here, I disagree with you that it is somehow harmful to the Linux community per se. Case in point, look at the graphs posted on the AlacrityVM wiki: http://developer.novell.com/wiki/index.php/AlacrityVM Prior to my effort, KVM was humming along at the status quo and I came along with a closer eye and almost doubled the throughput and cut latency by 78%. Given an apparent disagreement with aspects of my approach, Michael went off and created a counterexample that was motivated by my performance findings. Therefore, even if Avi ultimately accepts Michael's vhost approach instead of mine, Linux as a hypervisor platform has been significantly _improved_ by a little friendly competition, not somehow damaged by it.
4) Lastly, these patches are almost entirely just stand-alone Linux drivers that do not affect KVM if KVM doesn't wish to acknowledge them. It's just like any of the other numerous drivers that are accepted upstream into Linux every day. The only maintained subsystem that is technically touched by this series is netdev, and David Miller already approved of the relevant patch's inclusion: http://lkml.org/lkml/2009/8/3/505 So with all due respect, where is the problem? The patches are all professionally developed according to the Linux coding standards, pass checkpatch, are GPL'ed, and work with a freely available platform which you can download today (http://git.kernel.org/?p=linux/kernel/git/ghaskins/alacrityvm/linux-2.6.git;a=summary) Kind Regards, -Greg
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
* Avi Kivity a...@redhat.com wrote: I don't have any technical objections to vbus/venet (I had in the past re interrupts but I believe you've addressed them), and it appears to perform very well. However I still think we should address virtio's shortcomings (as Michael is doing) rather than create a competitor. We have enough external competition, we don't need in-tree competitors. I do have strong technical objections: distributions really want to standardize on as few Linux internal virtualization APIs as possible, so splintering it just because /bin/cp is easy to do is bad. If virtio pulls even with vbus's performance and vbus has no advantages over virtio I do NAK vbus on that basis. Let's stop the silliness before it starts hurting users. Coming up with something better is good, but doing an incompatible, duplicative framework just for NIH reasons is stupid and should be resisted. People don't get to add a new sys_read_v2() without strong technical arguments either - the same holds for our Linux internal driver abstractions, APIs and ABIs. Ingo
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On 08/17/2009 06:05 PM, Gregory Haskins wrote: Hi Ingo, 1) First off, let me state that I have made every effort to propose this as a solution to integrate with KVM, the most recent of which is April: http://lkml.org/lkml/2009/4/21/408 If you read through the various vbus related threads on LKML/KVM posted this year, I think you will see that I made numerous polite offerings to work with people on finding a common solution here, including Michael. In the end, Michael decided to go a different route, using some of the ideas proposed in vbus + venet-tap to create vhost-net. This is fine, and I respect his decision. But do not try to pin fracturing on me, because I tried everything to avoid it. :) Given your post, there are only three possible ways to continue kvm guest driver development:
- develop virtio/vhost, drop vbus/venet
- develop vbus/venet, drop virtio
- develop both
Developing both fractures the community. Dropping virtio invalidates the installed base and Windows effort. There were no strong technical reasons shown in favor of the remaining option. Since I still disagree with the fundamental approach of how KVM IO works, What's that? Prior to my effort, KVM was humming along at the status quo and I came along with a closer eye and almost doubled the throughput and cut latency by 78%. Given an apparent disagreement with aspects of my approach, Michael went off and created a counterexample that was motivated by my performance findings. Oh, virtio-net performance was a thorn in our side for a long time. I agree that venet was an additional spur. Therefore, even if Avi ultimately accepts Michael's vhost approach instead of mine, Linux as a hypervisor platform has been significantly _improved_ by a little friendly competition, not somehow damaged by it. Certainly, and irqfd/ioeventfd are a net win in any case. 4) Lastly, these patches are almost entirely just stand-alone Linux drivers that do not affect KVM if KVM doesn't wish to acknowledge them.
It's just like any of the other numerous drivers that are accepted upstream into Linux every day. The only maintained subsystem that is technically touched by this series is netdev, and David Miller already approved of the relevant patch's inclusion: http://lkml.org/lkml/2009/8/3/505 So with all due respect, where is the problem? The patches are all professionally developed according to the Linux coding standards, pass checkpatch, are GPL'ed, and work with a freely available platform which you can download today (http://git.kernel.org/?p=linux/kernel/git/ghaskins/alacrityvm/linux-2.6.git;a=summary) As I mentioned before, I have no technical objections to the patches, I just wish the effort could be concentrated in one direction. -- error compiling committee.c: too many arguments to function
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On 08/17/2009 06:09 PM, Gregory Haskins wrote: We've been through this before I believe. If you can point out specific differences that make venet outperform virtio-net I'll be glad to hear (and steal) them though. You sure know how to convince someone to collaborate with you, eh? If I've offended you, I apologize. Unfortunately, I've answered that question numerous times, but it apparently falls on deaf ears. Well, I'm sorry, I truly don't think I've had that question answered with specificity. I'm really interested in it (out of a selfish desire to improve virtio), but the only comment I recall from you was to the effect that the virtio rings were better than ioq in terms of cache placement. -- error compiling committee.c: too many arguments to function
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On Mon, Aug 17, 2009 at 10:14:56AM -0400, Gregory Haskins wrote: Case in point: Take an upstream kernel and you can modprobe the vbus-pcibridge in and virtio devices will work over that transport unmodified. See http://lkml.org/lkml/2009/8/6/244 for details. The modprobe you are talking about would need to be done in the guest kernel, correct? OTOH, Michael's patch is purely targeted at improving virtio-net on kvm, and it's likewise constrained by various limitations of that decision (such as its reliance on the PCI model, and the kvm memory scheme). vhost is actually not related to PCI in any way. It simply leaves all setup for userspace to do. And the memory scheme was intentionally separated from kvm so that it can easily support e.g. lguest. -- MST
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
Ingo Molnar wrote: * Gregory Haskins gregory.hask...@gmail.com wrote: Hi Ingo, 1) First off, let me state that I have made every effort to propose this as a solution to integrate with KVM, the most recent of which is April: http://lkml.org/lkml/2009/4/21/408 If you read through the various vbus related threads on LKML/KVM posted this year, I think you will see that I made numerous polite offerings to work with people on finding a common solution here, including Michael. In the end, Michael decided to go a different route, using some of the ideas proposed in vbus + venet-tap to create vhost-net. This is fine, and I respect his decision. But do not try to pin fracturing on me, because I tried everything to avoid it. :) That's good. So if virtio is fixed to be as fast as vbus, and if there are no other technical advantages of vbus over virtio you'll be glad to drop vbus and stand behind virtio? To reiterate: vbus and virtio are not mutually exclusive. The virtio device model rides happily on top of the vbus bus model. This is primarily a question of the virtio-pci adapter, vs virtio-vbus. For more details, see this post: http://lkml.org/lkml/2009/8/6/244 There is a secondary question of venet (a vbus native device) versus virtio-net (a virtio native device that works with PCI or VBUS). If this contention is really around venet vs virtio-net, I may possibly concede and retract its submission to mainline. I've been pushing it to date because people are using it and I don't see any reason that the driver couldn't be upstream. Also, are you willing to help virtio to become faster? Yes, that is not a problem. Note that virtio in general, and virtio-net/venet in particular are not the primary goal here, however. Improved 802.x and block IO are just positive side-effects of the effort. I started with 802.x networking just to demonstrate the IO layer capabilities, and to test it.
It ended up being so good in contrast to existing facilities that developers in the vbus community started using it for production development. Ultimately, I created vbus to address areas of performance that have not yet been addressed in things like KVM. Areas such as real-time guests, or RDMA (host bypass) interfaces. I also designed it in such a way that we could, in theory, write one set of (linux-based) backends, and have them work across a variety of environments (such as containers/VMs like KVM, lguest, openvz, but also physical systems like blade enclosures and clusters, or even applications running on the host). Or do you have arguments why that is impossible to do so and why the only possible solution is vbus? Avi says no such arguments were offered so far. Not for lack of trying. I think my points have just been missed every time I try to describe them. ;) Basically I write a message very similar to this one, and the next conversation starts back from square one. But I digress, let me try again. Noting that this discussion is really about the layer *below* virtio, not virtio itself (e.g. PCI vs vbus). Let's start with a little background: -- Background -- So on one level, we have the resource-container technology called vbus. It lets you create a container on the host, fill it with virtual devices, and assign that container to some context (such as a KVM guest). These devices are LKMs, and each device has a very simple verb namespace consisting of a synchronous call() method, and a shm() method for establishing async channels. The async channels are just shared-memory with a signal path (e.g. interrupts and hypercalls), which the device+driver can use to overlay things like rings (virtqueues, IOQs), or other shared-memory based constructs of their choosing (such as a shared table). The signal path is designed to minimize enter/exits and reduce spurious signals in a unified way (see shm-signal patch).
call() can be used both for config-space like details, as well as fast-path messaging that require synchronous behavior (such as guest scheduler updates). All of this is managed via sysfs/configfs. On the guest, we have a vbus-proxy which is how the guest gets access to devices assigned to its container. (as an aside, virtio devices can be populated in the container, and then surfaced up to the virtio-bus via that virtio-vbus patch I mentioned). There is a thing called a vbus-connector which is the guest specific part. Its job is to connect the vbus-proxy in the guest, to the vbus container on the host. How it does its job is specific to the connector implementation, but its role is to transport messages between the guest and the host (such as for call() and shm() invocations) and to handle things like discovery and hotswap. -- Issues -- Out of all this, I think the biggest contention point is the design of the vbus-connector that I use in AlacrityVM (Avi, correct me if I am wrong and you object to other aspects as well). I suspect that if I had
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
Ingo Molnar wrote: * Gregory Haskins gregory.hask...@gmail.com wrote: Avi Kivity wrote: On 08/17/2009 05:16 PM, Gregory Haskins wrote: My opinion is that this is a duplication of effort and we'd be better off if everyone contributed to enhancing virtio, which already has widely deployed guest drivers and non-Linux guest support. It may have merit if it is proven that it is technically superior to virtio (and I don't mean some benchmark in some point in time; I mean design wise). So far I haven't seen any indications that it is. The design is very different, so hopefully I can start to convince you why it might be interesting. We've been through this before I believe. If you can point out specific differences that make venet outperform virtio-net I'll be glad to hear (and steal) them though. You sure know how to convince someone to collaborate with you, eh? Unfortunately, I've answered that question numerous times, but it apparently falls on deaf ears. I'm trying to find the relevant discussion. The link you gave in the previous mail: http://lkml.org/lkml/2009/4/21/408 does not offer any design analysis of vbus versus virtio, and why the only fix to virtio is vbus. It offers a comparison and a blanket statement that vbus is superior but no arguments. (If you've already explained in a past thread then please give me a URL to that reply if possible, or forward me that prior reply. Thanks!) Sorry, it was a series of long threads from quite a while back. I will see if I can find some references, but it might be easier to just start fresh (see the last reply I sent). Kind Regards, -Greg
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
Michael S. Tsirkin wrote: On Mon, Aug 17, 2009 at 10:14:56AM -0400, Gregory Haskins wrote: Case in point: Take an upstream kernel and you can modprobe the vbus-pcibridge in and virtio devices will work over that transport unmodified. See http://lkml.org/lkml/2009/8/6/244 for details. The modprobe you are talking about would need to be done in the guest kernel, correct? Yes, and your point is? unmodified (pardon the pseudo pun) modifies virtio, not guest. It means you can take an off-the-shelf kernel with off-the-shelf virtio (ala distro-kernel) and modprobe vbus-pcibridge and get alacrityvm acceleration. It is not a design goal of mine to forbid the loading of a new driver, so I am ok with that requirement. OTOH, Michael's patch is purely targeted at improving virtio-net on kvm, and it's likewise constrained by various limitations of that decision (such as its reliance on the PCI model, and the kvm memory scheme). vhost is actually not related to PCI in any way. It simply leaves all setup for userspace to do. And the memory scheme was intentionally separated from kvm so that it can easily support e.g. lguest. I think you have missed my point. I mean that vhost requires a separate bus-model (ala qemu-pci). And no, your memory scheme is not separated, at least, not very well. It still assumes memory-regions and copy_to_user(), which is very kvm-esque. Vbus has people using things like userspace containers (no regions), and physical hardware (dma controllers, so no regions or copy_to_user) so your scheme quickly falls apart once you get away from KVM. Don't get me wrong: That design may have its place. Perhaps you only care about fixing KVM, which is a perfectly acceptable strategy. It's just not a strategy that I think is the best approach. Essentially you are promoting the proliferation of competing backends, and I am trying to unify them (which is ironic, given that this thread started with concerns that I was fragmenting things ;).
The bottom line is, you have a simpler solution that is more finely targeted at KVM and virtio-networking. It fixes probably a lot of problems with the existing implementation, but it still has limitations. OTOH, what I am promoting is more complex, but more flexible. That is the tradeoff. You can't have both ;) So do not for one second think that what you implemented is equivalent, because they are not. In fact, I believe I warned you about this potential problem when you decided to implement your own version. I think I said something to the effect of you will either have a subset of functionality, or you will ultimately reinvent what I did. Right now you are in the subset phase. Perhaps someday you will be in the complete-reinvent phase. Why you wanted to go that route when I had already worked through the issues is something perhaps only you will ever know, but I'm sure you had your reasons. But do note you could have saved yourself grief by reusing my already implemented and tested variant, as I politely offered to work with you on making it meet your needs. Kind Regards -Greg
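Greg's objection above centers on vhost's assumption of memory-regions plus copy_to_user(). As a rough userspace sketch (hypothetical structure and function names, not the actual vhost code), a region-based scheme reduces every guest access to a slot lookup plus an offset, which is exactly what a transport without registered regions cannot provide:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a region-based memory scheme (not actual vhost
 * code): the host registers guest-physical ranges along with their
 * host-virtual mappings, and every access becomes a table lookup plus an
 * offset before the copy_to_user()-style transfer.  Transports without
 * such regions (userspace containers, DMA-capable hardware) have nothing
 * to register here, which is the objection raised above. */
struct mem_slot {
    uint64_t guest_phys;   /* start of the guest-physical range */
    uint64_t len;          /* length of the range in bytes */
    void    *host_virt;    /* where the host mapped it */
};

/* Translate a guest-physical address to a host-virtual pointer,
 * or NULL if no registered slot covers it. */
static void *gpa_to_hva(const struct mem_slot *slots, size_t n, uint64_t gpa)
{
    for (size_t i = 0; i < n; i++) {
        if (gpa >= slots[i].guest_phys &&
            gpa - slots[i].guest_phys < slots[i].len)
            return (char *)slots[i].host_virt + (gpa - slots[i].guest_phys);
    }
    return NULL;
}
```

The lookup either succeeds (KVM-style regions exist) or the whole scheme has no answer, which is the crux of the regions-vs-no-regions argument.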
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
Gregory Haskins wrote: Note: No one has ever proposed to change the virtio-ABI. virtio-pci is part of the virtio ABI. You are proposing changing that. You cannot add new kernel modules to guests and expect them to remain supported. So there is value in reusing existing ABIs I think the reason vbus gets better performance for networking today is that vbus' backends are in the kernel while virtio's backends are currently in userspace. Well, with all due respect, you also said initially when I announced vbus that in-kernel doesn't matter, and tried to make virtio-net run as fast as venet from userspace ;) Given that we never saw those userspace patches from you that in fact equaled my performance, I assume you were wrong about that statement. Perhaps you were wrong about other things too? I'm wrong about a lot of things :-) I haven't yet been convinced that I'm wrong here though. One of the gray areas here is what constitutes an in-kernel backend. tun/tap is a sort of an in-kernel backend. Userspace is still involved in all of the paths. vhost seems to be an intermediate step between tun/tap and vbus. The fast paths avoid userspace completely. Many of the slow paths involve userspace still (like migration apparently). With vbus, userspace is avoided entirely. In some ways, you could argue that slirp and vbus are opposite ends of the virtual I/O spectrum. I believe strongly that we should avoid putting things in the kernel unless they absolutely have to be. I'm definitely interested in playing with vhost to see if there are ways to put even less in the kernel. In particular, I think it would be a big win to avoid knowledge of slots in the kernel by doing ring translation in userspace. This implies a userspace transition in the fast path. This may or may not be acceptable. I think this is going to be a very interesting experiment and will ultimately determine whether my intuition about the cost of dropping to userspace is right or wrong. 
Conversely, I am not afraid of requiring a new driver to optimize the general PV interface. In the long term, this will reduce the amount of reimplementing the same code over and over, reduce system overhead, and it adds new features not previously available (for instance, coalescing and prioritizing interrupts). I think you have a lot of ideas and I don't know that we've been able to really understand your vision. Do you have any plans on writing a paper about vbus that goes into some of your thoughts in detail? If that's the case, then I don't see any reason to adopt vbus unless Greg thinks there are other compelling features over virtio. Aside from the fact that this is another confusion of the vbus/virtio relationship...yes, of course there are compelling features (IMHO) or I wouldn't be expending effort ;) They are at least compelling enough to put in AlacrityVM. This whole AlacrityVM thing is really hitting this nail with a sledgehammer. While the kernel needs to be very careful about what it pulls in, as long as you're willing to commit to ABI compatibility, we can pull code into QEMU to support vbus. Then you can just offer vbus host and guest drivers instead of forking the kernel. If upstream KVM doesn't want them, that's KVM's decision and I am fine with that. Simply never apply my qemu patches to qemu-kvm.git, and KVM will be blissfully unaware if vbus is present. As I mentioned before, if you submit patches to upstream QEMU, we'll apply them (after appropriate review). As I said previously, we want to avoid user confusion as much as possible. Maybe this means limiting it to -device or a separate machine type. I'm not sure, but that's something we can discuss on qemu-devel. I do hope that I can convince the KVM community otherwise, however. :) Regards, Anthony Liguori -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
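Anthony's experiment above — keeping knowledge of memory slots out of the kernel by doing ring translation in userspace — can be sketched as follows. This is an illustrative userspace model with hypothetical types, and it simplifies by assuming guest RAM is mmap'd contiguously at one base address; real slot handling is more involved:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Sketch of "ring translation in userspace": the kernel side never learns
 * about guest memory layout.  Userspace walks the ring, turns each
 * guest-physical descriptor into a host iovec, and hands only iovecs
 * across the boundary.  The cost is the userspace transition on the fast
 * path, which is the open question in the discussion above. */
struct guest_desc {
    uint64_t gpa;   /* guest-physical address of the buffer */
    uint32_t len;   /* buffer length in bytes */
};

/* Translate n guest descriptors into iovecs the kernel can consume.
 * Assumes guest RAM is mapped contiguously at guest_base. */
static void translate_ring(void *guest_base,
                           const struct guest_desc *descs, size_t n,
                           struct iovec *iov)
{
    for (size_t i = 0; i < n; i++) {
        iov[i].iov_base = (char *)guest_base + descs[i].gpa;
        iov[i].iov_len  = descs[i].len;
    }
}
```

The design choice at stake: with this split the kernel consumes plain iovecs and stays slot-agnostic, but every ring refill bounces through userspace; vhost instead keeps the slot table in the kernel to avoid that bounce.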
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
* Anthony Liguori anth...@codemonkey.ws wrote: Ingo Molnar wrote: * Gregory Haskins ghask...@novell.com wrote: This will generally be used for hypervisors to publish any host-side virtual devices up to a guest. The guest will have the opportunity to consume any devices present on the vbus-proxy as if they were platform devices, similar to existing buses like PCI. Signed-off-by: Gregory Haskins ghask...@novell.com ---
 MAINTAINERS                 |    6 ++
 arch/x86/Kconfig            |    2 +
 drivers/Makefile            |    1
 drivers/vbus/Kconfig        |   14
 drivers/vbus/Makefile       |    3 +
 drivers/vbus/bus-proxy.c    |  152 +++
 include/linux/vbus_driver.h |   73 +
 7 files changed, 251 insertions(+), 0 deletions(-)
 create mode 100644 drivers/vbus/Kconfig
 create mode 100644 drivers/vbus/Makefile
 create mode 100644 drivers/vbus/bus-proxy.c
 create mode 100644 include/linux/vbus_driver.h
Is there a consensus on this with the KVM folks? (i've added the KVM list to the Cc:) I'll let Avi comment about it from a KVM perspective but from a QEMU perspective, I don't think we want to support two paravirtual IO frameworks. I'd like to see them converge. Since there's an install base of guests today with virtio drivers, there really ought to be a compelling reason to change the virtio ABI in a non-backwards compatible way. This means convergence really ought to be adding features to virtio. I agree. While different paravirt drivers are inevitable for things that are externally constrained (say support different hypervisors), doing different _Linux internal_ paravirt drivers looks plain stupid and counter-productive. It splits testing and development. So either the vbus code replaces virtio (for technical merits such as performance and other details), or virtio is enhanced with the vbus performance enhancements. On paper, I don't think vbus really has any features over virtio. vbus does things in different ways (paravirtual bus vs. pci for discovery) but I think we're happy with how virtio does things today. 
I think the reason vbus gets better performance for networking today is that vbus' backends are in the kernel while virtio's backends are currently in userspace. Since Michael has a functioning in-kernel backend for virtio-net now, I suspect we're weeks (maybe days) away from performance results. My expectation is that vhost + virtio-net will be as good as venet + vbus. If that's the case, then I don't see any reason to adopt vbus unless Greg thinks there are other compelling features over virtio. Keeping virtio's backend in user-space was rather stupid IMHO. Having the _option_ to piggyback to user-space (for flexibility, extensibility, etc.) is OK, but not having kernel acceleration is bad. Ingo
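The discovery question running through this thread (paravirtual bus vs. PCI) ultimately comes down to how published devices get matched to guest drivers. A toy model of bus-based probing — illustrative only, with invented names, not the actual drivers/vbus/bus-proxy.c — might look like:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model of bus-based discovery (not the actual vbus-proxy code):
 * the host publishes devices with type strings on a virtual bus, and a
 * guest driver registers for a type; the bus binds them at probe time.
 * With virtio-pci the same matching is keyed off PCI vendor/device IDs
 * instead of strings -- the mechanism differs, the concept does not. */
struct toy_device {
    const char *type;     /* e.g. "venet" or "virtio-net" */
    int         probed;   /* set once a driver claims the device */
};

struct toy_driver {
    const char *type;                    /* device type this driver handles */
    void (*probe)(struct toy_device *);  /* called on a successful match */
};

/* Walk the published device list and bind every device matching drv. */
static void bus_match(struct toy_device *devs, size_t ndev,
                      const struct toy_driver *drv)
{
    for (size_t i = 0; i < ndev; i++) {
        if (strcmp(devs[i].type, drv->type) == 0)
            drv->probe(&devs[i]);
    }
}

static void venet_probe(struct toy_device *dev)
{
    dev->probed = 1; /* a real driver would set up its rings here */
}
```

Both camps agree on this shape; the dispute is whether the enumeration side should be a new paravirtual bus or reuse the PCI machinery guests already ship with.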
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On 08/15/2009 01:32 PM, Ingo Molnar wrote: This will generally be used for hypervisors to publish any host-side virtual devices up to a guest. The guest will have the opportunity to consume any devices present on the vbus-proxy as if they were platform devices, similar to existing buses like PCI. Is there a consensus on this with the KVM folks? (i've added the KVM list to the Cc:) My opinion is that this is a duplication of effort and we'd be better off if everyone contributed to enhancing virtio, which already has widely deployed guest drivers and non-Linux guest support. It may have merit if it is proven that it is technically superior to virtio (and I don't mean some benchmark in some point in time; I mean design wise). So far I haven't seen any indications that it is. -- error compiling committee.c: too many arguments to function
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
* Gregory Haskins ghask...@novell.com wrote: This will generally be used for hypervisors to publish any host-side virtual devices up to a guest. The guest will have the opportunity to consume any devices present on the vbus-proxy as if they were platform devices, similar to existing buses like PCI. Signed-off-by: Gregory Haskins ghask...@novell.com ---
 MAINTAINERS                 |    6 ++
 arch/x86/Kconfig            |    2 +
 drivers/Makefile            |    1
 drivers/vbus/Kconfig        |   14
 drivers/vbus/Makefile       |    3 +
 drivers/vbus/bus-proxy.c    |  152 +++
 include/linux/vbus_driver.h |   73 +
 7 files changed, 251 insertions(+), 0 deletions(-)
 create mode 100644 drivers/vbus/Kconfig
 create mode 100644 drivers/vbus/Makefile
 create mode 100644 drivers/vbus/bus-proxy.c
 create mode 100644 include/linux/vbus_driver.h
Is there a consensus on this with the KVM folks? (i've added the KVM list to the Cc:) Ingo
Re: [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
Ingo Molnar wrote: * Gregory Haskins ghask...@novell.com wrote: This will generally be used for hypervisors to publish any host-side virtual devices up to a guest. The guest will have the opportunity to consume any devices present on the vbus-proxy as if they were platform devices, similar to existing buses like PCI. Signed-off-by: Gregory Haskins ghask...@novell.com ---
 MAINTAINERS                 |    6 ++
 arch/x86/Kconfig            |    2 +
 drivers/Makefile            |    1
 drivers/vbus/Kconfig        |   14
 drivers/vbus/Makefile       |    3 +
 drivers/vbus/bus-proxy.c    |  152 +++
 include/linux/vbus_driver.h |   73 +
 7 files changed, 251 insertions(+), 0 deletions(-)
 create mode 100644 drivers/vbus/Kconfig
 create mode 100644 drivers/vbus/Makefile
 create mode 100644 drivers/vbus/bus-proxy.c
 create mode 100644 include/linux/vbus_driver.h
Is there a consensus on this with the KVM folks? (i've added the KVM list to the Cc:) I'll let Avi comment about it from a KVM perspective but from a QEMU perspective, I don't think we want to support two paravirtual IO frameworks. I'd like to see them converge. Since there's an install base of guests today with virtio drivers, there really ought to be a compelling reason to change the virtio ABI in a non-backwards compatible way. This means convergence really ought to be adding features to virtio. On paper, I don't think vbus really has any features over virtio. vbus does things in different ways (paravirtual bus vs. pci for discovery) but I think we're happy with how virtio does things today. I think the reason vbus gets better performance for networking today is that vbus' backends are in the kernel while virtio's backends are currently in userspace. Since Michael has a functioning in-kernel backend for virtio-net now, I suspect we're weeks (maybe days) away from performance results. My expectation is that vhost + virtio-net will be as good as venet + vbus. If that's the case, then I don't see any reason to adopt vbus unless Greg thinks there are other compelling features over virtio. 
Regards, Anthony Liguori