Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
Rusty Russell wrote: On Friday 07 November 2008 18:17:54 Zhao, Yu wrote: Greg KH wrote: On Thu, Nov 06, 2008 at 04:40:21PM -0600, Anthony Liguori wrote: Greg KH wrote: On Thu, Nov 06, 2008 at 10:47:41AM -0700, Matthew Wilcox wrote: I don't think we really know what the One True Usage model is for VF devices. Chris Wright has some ideas, I have some ideas and Yu Zhao has some ideas. I bet there's other people who have other ideas too. I'd love to hear those ideas. We've been talking about avoiding hardware passthrough entirely and just backing a virtio-net backend driver by a dedicated VF in the host. That avoids a huge amount of guest-facing complexity, lets migration Just Work, and should give the same level of performance. This can be commonly used not only with VFs -- devices that have multiple DMA queues (e.g., Intel VMDq, Neterion Xframe) and even traditional devices can also take advantage of this. CC Rusty Russell in case he has more comments. Yes, even dumb devices could use this mechanism if you wanted to bind an entire device solely to one guest. We don't have network infrastructure for this today, but my thought was to do something in dev_alloc_skb and dev_kfree_skb et al. Is there any discussion about this on netdev? Any prototype available? If not, I'd like to create one and evaluate the performance of the virtio-net solution against hardware passthrough. Thanks, Yu -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
Anthony Liguori wrote: I don't think it's established that PV/VF will have less latency than using virtio-net. virtio-net requires a world switch to send a group of packets. The cost of this (if it stays in kernel) is only a few thousand cycles on the most modern processors. Using VT-d means that for every DMA fetch that misses in the IOTLB, you potentially have to do four memory fetches to main memory. There will be additional packet latency using VT-d compared to native, it's just not known how much at this time. If the IOTLB has intermediate TLB entries like the processor, we're talking just one or two fetches. That's a lot less than the cacheline bouncing that virtio and kvm interrupt injection incur right now. -- error compiling committee.c: too many arguments to function
Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
Avi Kivity wrote: Anthony Liguori wrote: I don't think it's established that PV/VF will have less latency than using virtio-net. virtio-net requires a world switch to send a group of packets. The cost of this (if it stays in kernel) is only a few thousand cycles on the most modern processors. Using VT-d means that for every DMA fetch that misses in the IOTLB, you potentially have to do four memory fetches to main memory. There will be additional packet latency using VT-d compared to native, it's just not known how much at this time. If the IOTLB has intermediate TLB entries like the processor, we're talking just one or two fetches. That's a lot less than the cacheline bouncing that virtio and kvm interrupt injection incur right now. The PCI SIG Address Translation Service (ATS) specifies a way that uses an Address Translation Cache (ATC) in the Endpoint to reduce the latency. The Linux kernel support for the ATS capability will come soon. Thanks, Yu
Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
On Fri, Nov 14, 2008 at 03:56:19PM +0800, Zhao, Yu wrote: Hi Greg KH, I updated PF driver to use latest SR-IOV API in the patch set v6, and attached it. Please take a look and please let us know if you have any comments. Is this driver already upstream? If so, can you just send the diff that adds the SR-IOV changes to it? Otherwise it's a bit hard to just pick out those pieces, right? thanks, greg k-h
RE: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
It's not upstream yet. However, if you grep through for CONFIG_PCI_IOV you'll see all the relevant code in those sections. - Greg (Rose that is)
Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
On Fri, Nov 14, 2008 at 09:48:15AM -0800, Rose, Gregory V wrote: It's not upstream yet. However, if you grep through for CONFIG_PCI_IOV you'll see all the relevant code in those sections. Wouldn't it make more sense for the IOV code to be reworked to not require #ifdefs in a driver? There seems to be a bit too much #ifdef code in this driver right now :( What is the status of submitting it upstream and getting netdev review of it? thanks, greg k-h
RE: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
We have been waiting for the kernel IOV work to be in place upstream completely before we submitted the drivers. Jeff Garzik won't take driver changes that have no user. So as the kernel work completes, we'll submit the driver(s). We have been talking about putting out the changes as RFC. If that makes sense we can do that. Cheers, John --- Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety., Benjamin Franklin 1755
Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
A: No. Q: Should I include quotations after my reply? On Fri, Nov 14, 2008 at 11:49:52AM -0700, Ronciak, John wrote: We have been waiting for the kernel IOV work to be in place upstream completely before we submitted the drivers. Jeff Garzik won't take driver changes that have no user. So as the kernel work completes, we'll submit the driver(s). We have been talking about putting out the changes as RFC. If that makes sense we can do that. That would make sense, as I had to ask multiple times if a driver was actually using the IOV code that we could review to see if the api was sane for it. good luck, greg k-h
Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
On Sat, Nov 08, 2008 at 02:48:25AM +0800, Greg KH wrote: On Fri, Nov 07, 2008 at 11:17:40PM +0800, Yu Zhao wrote: While we are arguing about what the software model for SR-IOV should be, let me ask two simple questions first: 1, What does SR-IOV look like? 2, Why do we need to support it? I don't think we need to worry about those questions, as we can see what the SR-IOV interface looks like by looking at the PCI spec, and we know Linux needs to support it, as Linux needs to support everything :) (note, community members who cannot see the PCI specs at this point in time, please know that we are working on resolving these issues, hopefully we will have some good news within a month or so.) Thanks for doing this! As you know the Linux kernel is the base of various virtual machine monitors such as KVM, Xen, OpenVZ and VServer. We need SR-IOV support in the kernel because mostly it helps high-end users (IT departments, HPC, etc.) to share limited hardware resources among hundreds or even thousands of virtual machines and hence reduce the cost. How can we make these virtual machine monitors utilize the advantages of SR-IOV without spending too much effort while maintaining architectural correctness? I believe making the VF resemble a normal PCI device (struct pci_dev) as closely as possible is the best way in the current situation, because this is not only what the hardware designers expect us to do but also the usage model that KVM, Xen and other VMMs have already supported. But would such an api really take advantage of the new IOV interfaces that are exposed by the new device type? SR-IOV is a very straightforward capability -- it can only reside in the Physical Function's (the real device's) config space and controls the allocation of Virtual Functions through several registers. What we can do in the PCI layer is to make the SR-IOV device spawn VFs upon user request, and register them with the PCI core. 
The functionality of SR-IOV devices (both the PF and VF) can vary over a wide range, and their drivers (like normal PCI device drivers) are responsible for handling device-specific stuff. So it looks like we can get all the work done in the PCI layer with only two interfaces: one for the PF driver to register itself as an SR-IOV capable driver, expose a sysfs (or ioctl) interface to receive user requests, and allocate a 'pci_dev' for each VF; another to clean everything up when the PF driver unregisters itself (e.g., when the driver is removed or the device goes into power-saving mode). I agree that the API in the SR-IOV patch is arguable and the concerns such as the lack of a PF driver, etc. are also valid. But I personally think these are not essential problems for me and other SR-IOV driver developers. How can the lack of a PF driver not be a valid concern at this point in time? Without such a driver written, how can we know that the SR-IOV interface as created is sufficient, or that it even works properly? Here's what I see we need to have before we can evaluate if the IOV core PCI patches are acceptable: - a driver that uses this interface - a PF driver that uses this interface. Without those, we can't determine if the infrastructure provided by the IOV core even is sufficient, right? Yes, using a PF driver to evaluate the SR-IOV core is necessary. And only the PF driver can use the interface, since the VF shouldn't have the SR-IOV capability in its config space according to the spec. Regards, Yu Rumor has it that both of the above things are floating around; can someone please post them to the linux-pci list so that we can see how this all works together? 
thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe linux-pci in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
Andi Kleen wrote: Anthony Liguori [EMAIL PROTECTED] writes: What we would rather do in KVM, is have the VFs appear in the host as standard network devices. We would then like to back our existing PV driver to this VF directly bypassing the host networking stack. A key feature here is being able to fill the VF's receive queue with guest memory instead of host kernel memory so that you can get zero-copy receive traffic. This will perform just as well as doing passthrough (at least) and avoid all that ugliness of dealing with SR-IOV in the guest. But you shift a lot of ugliness into the host network stack again. Not sure that is a good trade off. Also it would always require context switches and I believe one of the reasons for the PV/VF model is very low latency IO and having heavyweight switches to the host and back would be against that. I don't think it's established that PV/VF will have less latency than using virtio-net. virtio-net requires a world switch to send a group of packets. The cost of this (if it stays in kernel) is only a few thousand cycles on the most modern processors. Using VT-d means that for every DMA fetch that misses in the IOTLB, you potentially have to do four memory fetches to main memory. There will be additional packet latency using VT-d compared to native, it's just not known how much at this time. Regards, Anthony Liguori -Andi
Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
On Sun, Nov 09, 2008 at 09:37:20PM +0200, Avi Kivity wrote: Greg KH wrote: On Sun, Nov 09, 2008 at 02:44:06PM +0200, Avi Kivity wrote: Greg KH wrote: It's that second part that I'm worried about. How is that going to happen? Do you have any patches that show this kind of assignment? For kvm, this is in 2.6.28-rc. Where? I just looked and couldn't find anything, but odds are I was looking in the wrong place :( arch/x86/kvm/vtd.c: iommu integration (allows assigning the device's memory resources) That file is not in 2.6.28-rc4 :( virt/kvm/irq*: interrupt redirection (allows assigning the device's interrupt resources) I only see virt/kvm/irq_comm.c in 2.6.28-rc4. the rest (pci config space, pio redirection) are in userspace. So you don't need these pci core changes at all? Note there are two ways to assign a device to a guest: - run the VF driver in the guest: this has the advantage of best performance, but requires pinning all guest memory, makes live migration a tricky proposition, and ties the guest to the underlying hardware. Is this what you would prefer for kvm? It's not my personal preference, but it is a supported configuration. For some use cases it is the only one that makes sense. Again, VF-in-guest and VF-in-host both have their places. And since Linux can be both guest and host, it's best if the VF driver knows nothing about SR-IOV; it's just a pci driver. The PF driver should emulate anything that SR-IOV does not provide (like missing pci config space). Yes, we need both. thanks, greg k-h
Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
Greg KH wrote: It's that second part that I'm worried about. How is that going to happen? Do you have any patches that show this kind of assignment? For kvm, this is in 2.6.28-rc. Note there are two ways to assign a device to a guest: - run the VF driver in the guest: this has the advantage of best performance, but requires pinning all guest memory, makes live migration a tricky proposition, and ties the guest to the underlying hardware. - run the VF driver in the host, and use virtio to connect the guest to the host: allows paging the guest and allows straightforward live migration, but reduces performance, and hides any features not exposed by virtio from the guest. -- error compiling committee.c: too many arguments to function
Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
Muli Ben-Yehuda wrote: We've been talking about avoiding hardware passthrough entirely and just backing a virtio-net backend driver by a dedicated VF in the host. That avoids a huge amount of guest-facing complexity, lets migration Just Work, and should give the same level of performance. I don't believe that it will, and every benchmark I've seen or have done so far shows a significant performance gap between virtio and direct assignment, even on 1G ethernet. I am willing however to reserve judgement until someone implements your suggestion and actually measures it, preferably on 10G ethernet. Right now virtio copies data, and has other inefficiencies. With a dedicated VF, we can eliminate the copies. CPU utilization and latency will be worse. If we can limit the slowdowns to an acceptable amount, the simplicity and other advantages of VF-in-host may outweigh the performance degradation. No doubt device assignment---and SR-IOV in particular---are complex, but I hardly think ignoring it as you seem to propose is the right approach. I agree. We should hedge our bets and support both models. -- error compiling committee.c: too many arguments to function
Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
Andi Kleen wrote: Anthony Liguori [EMAIL PROTECTED] writes: What we would rather do in KVM, is have the VFs appear in the host as standard network devices. We would then like to back our existing PV driver to this VF directly bypassing the host networking stack. A key feature here is being able to fill the VF's receive queue with guest memory instead of host kernel memory so that you can get zero-copy receive traffic. This will perform just as well as doing passthrough (at least) and avoid all that ugliness of dealing with SR-IOV in the guest. But you shift a lot of ugliness into the host network stack again. Not sure that is a good trade off. The net effect will be positive. We will finally have aio networking from userspace (can send process memory without resorting to sendfile()), and we'll be able to assign a queue to a process (which will enable all sorts of interesting high performance things; basically VJ channels without kernel involvement). Also it would always require context switches and I believe one of the reasons for the PV/VF model is very low latency IO and having heavyweight switches to the host and back would be against that. It's true that latency would suffer (or alternatively cpu consumption would increase). -- error compiling committee.c: too many arguments to function
Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
Greg KH wrote: We've been talking about avoiding hardware passthrough entirely and just backing a virtio-net backend driver by a dedicated VF in the host. That avoids a huge amount of guest-facing complexity, lets migration Just Work, and should give the same level of performance. Does that involve this patch set? Or a different type of interface. So long as the VF is exposed as a standalone PCI device, it's the same interface. In fact you can take a random PCI card and expose it to a guest this way; it doesn't have to be SR-IOV. Of course, with a standard PCI card you won't get much sharing (a quad port NIC will be good for four guests). We'll need other changes in the network stack, but these are orthogonal to SR-IOV. -- error compiling committee.c: too many arguments to function
Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
Matthew Wilcox wrote: What we would rather do in KVM, is have the VFs appear in the host as standard network devices. We would then like to back our existing PV driver to this VF directly bypassing the host networking stack. A key feature here is being able to fill the VF's receive queue with guest memory instead of host kernel memory so that you can get zero-copy receive traffic. This will perform just as well as doing passthrough (at least) and avoid all that ugliness of dealing with SR-IOV in the guest. This argues for ignoring the SR-IOV mess completely. It does, but VF-in-host is not the only model that we want to support. It's just the most appealing. There will definitely be people who want to run VF-in-guest. -- error compiling committee.c: too many arguments to function
Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
On Sun, Nov 09, 2008 at 02:44:06PM +0200, Avi Kivity wrote: Greg KH wrote: It's that second part that I'm worried about. How is that going to happen? Do you have any patches that show this kind of assignment? For kvm, this is in 2.6.28-rc. Where? I just looked and couldn't find anything, but odds are I was looking in the wrong place :( Note there are two ways to assign a device to a guest: - run the VF driver in the guest: this has the advantage of best performance, but requires pinning all guest memory, makes live migration a tricky proposition, and ties the guest to the underlying hardware. Is this what you would prefer for kvm? - run the VF driver in the host, and use virtio to connect the guest to the host: allows paging the guest and allows straightforward live migration, but reduces performance, and hides any features not exposed by virtio from the guest. thanks, greg k-h
Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
Greg KH wrote: On Sun, Nov 09, 2008 at 02:44:06PM +0200, Avi Kivity wrote: Greg KH wrote: It's that second part that I'm worried about. How is that going to happen? Do you have any patches that show this kind of assignment? For kvm, this is in 2.6.28-rc. Where? I just looked and couldn't find anything, but odds are I was looking in the wrong place :( arch/x86/kvm/vtd.c: iommu integration (allows assigning the device's memory resources) virt/kvm/irq*: interrupt redirection (allows assigning the device's interrupt resources) the rest (pci config space, pio redirection) are in userspace. Note there are two ways to assign a device to a guest: - run the VF driver in the guest: this has the advantage of best performance, but requires pinning all guest memory, makes live migration a tricky proposition, and ties the guest to the underlying hardware. Is this what you would prefer for kvm? It's not my personal preference, but it is a supported configuration. For some use cases it is the only one that makes sense. Again, VF-in-guest and VF-in-host both have their places. And since Linux can be both guest and host, it's best if the VF driver knows nothing about SR-IOV; it's just a pci driver. The PF driver should emulate anything that SR-IOV does not provide (like missing pci config space). -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
RE: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
On Fri, Nov 07, 2008 at 11:17:40PM +0800, Yu Zhao wrote: While we are arguing about what the software model for SR-IOV should be, let me ask two simple questions first: 1, What does SR-IOV look like? 2, Why do we need to support it? I don't think we need to worry about those questions, as we can see what the SR-IOV interface looks like by looking at the PCI spec, and we know Linux needs to support it, as Linux needs to support everything :) (note, community members who cannot see the PCI specs at this point in time, please know that we are working on resolving these issues, hopefully we will have some good news within a month or so.) As you know the Linux kernel is the base of various virtual machine monitors such as KVM, Xen, OpenVZ and VServer. We need SR-IOV support in the kernel because mostly it helps high-end users (IT departments, HPC, etc.) to share limited hardware resources among hundreds or even thousands of virtual machines and hence reduce the cost. How can we make these virtual machine monitors utilize the advantages of SR-IOV without spending too much effort while maintaining architectural correctness? I believe making the VF resemble a normal PCI device (struct pci_dev) as closely as possible is the best way in the current situation, because this is not only what the hardware designers expect us to do but also the usage model that KVM, Xen and other VMMs have already supported. But would such an api really take advantage of the new IOV interfaces that are exposed by the new device type? I agree with what Yu says. The idea is to have hardware capabilities to virtualize a PCI device in a way that those virtual devices can represent full PCI devices. The advantage of that is that those virtual devices can then be used like any other standard PCI device, meaning we can use existing OS tools, configuration mechanisms etc. to start working with them. 
Also, when using a virtualization-based system, e.g. Xen or KVM, we do not need to introduce new mechanisms to make use of SR-IOV, because we can handle VFs as full PCI devices. A virtual PCI device in hardware (a VF) can be as powerful or complex as you like, or it can be very simple. But the big advantage of SR-IOV is that hardware presents a complete PCI device to the OS - as opposed to some resources, or queues, that need specific new configuration and assignment mechanisms in order to use them with a guest OS (like, for example, VMDq or similar technologies). Anna
RE: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
Ditto. Taking the netdev interface as an example - a queue pair is a great way to scale across cpu cores in a single OS image, but it is just not a good way to share a device across multiple OS images. The best unit of virtualization is a VF that is implemented as a complete netdev pci device (not a subset of a pci device). 
This way, native netdev device drivers can work for direct hw access to a VF as is, and most/all Linux networking features (including VMQ) will work in a guest. Also, guest migration for netdev interfaces (both direct and virtual) can be supported via native Linux mechanism (bonding driver), while Dom0 can retain veto power over any guest direct interface operation it deems privileged (vlan, mac address, promisc mode, bandwidth allocation between VFs, etc.). Leonid
Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
On Thu, Nov 06, 2008 at 04:40:21PM -0600, Anthony Liguori wrote: We've been talking about avoiding hardware passthrough entirely and just backing a virtio-net backend driver by a dedicated VF in the host. That avoids a huge amount of guest-facing complexity, lets migration Just Work, and should give the same level of performance. I don't believe that it will, and every benchmark I've seen or have done so far shows a significant performance gap between virtio and direct assignment, even on 1G ethernet. I am willing however to reserve judgement until someone implements your suggestion and actually measures it, preferably on 10G ethernet. No doubt device assignment---and SR-IOV in particular---are complex, but I hardly think ignoring it as you seem to propose is the right approach. Cheers, Muli -- The First Workshop on I/O Virtualization (WIOV '08) Dec 2008, San Diego, CA, http://www.usenix.org/wiov08/ - SYSTOR 2009---The Israeli Experimental Systems Conference http://www.haifa.il.ibm.com/conferences/systor2009/
Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
Anthony Liguori [EMAIL PROTECTED] writes: What we would rather do in KVM, is have the VFs appear in the host as standard network devices. We would then like to back our existing PV driver to this VF directly bypassing the host networking stack. A key feature here is being able to fill the VF's receive queue with guest memory instead of host kernel memory so that you can get zero-copy receive traffic. This will perform just as well as doing passthrough (at least) and avoid all that ugliness of dealing with SR-IOV in the guest. But you shift a lot of ugliness into the host network stack again. Not sure that is a good trade off. Also it would always require context switches and I believe one of the reasons for the PV/VF model is very low latency IO and having heavyweight switches to the host and back would be against that. -Andi -- [EMAIL PROTECTED]
Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
While we are arguing about what the software model for SR-IOV should be, let me ask two simple questions first: 1. What does SR-IOV look like? 2. Why do we need to support it? I'm sure people have different understandings from their own viewpoints. No one is wrong, but please don't make things complicated, and don't ignore user requirements. The PCI SIG and hardware vendors created this technology intending to let hardware resources in one PCI device be shared by different software instances -- I guess all of us agree on that. No doubt the PF is a real function in the PCI device, but is the VF different? No, it also has its own Bus, Device and Function numbers, PCI configuration space and Memory Space (MMIO). In more detail, it can respond to and initiate PCI Transaction Layer Protocol packets, which means it can do everything a PF can at the PCI level. From these observable behaviors, we can conclude that the PCI SIG models a VF as a normal PCI device function, even though it's not standalone. As you know, the Linux kernel is the base of various virtual machine monitors such as KVM, Xen, OpenVZ and VServer. We need SR-IOV support in the kernel mostly because it helps high-end users (IT departments, HPC, etc.) share limited hardware resources among hundreds or even thousands of virtual machines and hence reduce cost. How can we make these virtual machine monitors take advantage of SR-IOV without spending too much effort while remaining architecturally correct? I believe making a VF as close as possible to a normal PCI device (struct pci_dev) is the best way in the current situation, because this is not only what the hardware designers expect us to do but also the usage model that KVM, Xen and other VMMs already support. I agree that the API in the SR-IOV patch is arguable, and concerns such as the lack of a PF driver are also valid. But I personally think these are not essential problems for me and other SR-IOV driver developers.
People can refine things, but they don't want to recreate things in a totally different way, especially when that way doesn't bring them obvious benefits. As I see it, we are now reaching a point where a decision must be made. I know this is a difficult thing in an open and free community, but fortunately we have a lot of talented and experienced people here. So let's make it happen, and keep our loyal users happy! Thanks, Yu
Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
Anthony Liguori wrote: Matthew Wilcox wrote: [Anna, can you fix your word-wrapping please? Your lines appear to be infinitely long which is most unpleasant to reply to] On Thu, Nov 06, 2008 at 05:38:16PM +, Fischer, Anna wrote: Where would the VF drivers have to be associated? On the pci_dev level or on a higher one? A VF appears to the Linux OS as a standard (full, additional) PCI device. The driver is associated in the same way as for a normal PCI device. Ideally, you would use SR-IOV devices on a virtualized system, for example, using Xen. A VF can then be assigned to a guest domain as a full PCI device. It's not clear that's the right solution. If the VF devices are _only_ going to be used by the guest, then arguably, we don't want to create pci_devs for them in the host. (I think it _is_ the right answer, but I want to make it clear there's multiple opinions on this). The VFs shouldn't be limited to being used by the guest. Yes, a VF driver running in the host is supported :-) SR-IOV is actually an incredibly painful thing. You need to have a VF driver in the guest, do hardware passthrough, have a PV driver stub in the guest that's hypervisor specific (a VF is not usable on its own), have a device specific backend in the VMM, and if you want to do live migration, have another PV driver in the guest that you can do teaming with. Just a mess. Actually, it's not such a mess. A VF driver can be a plain PCI device driver and doesn't require any backend in the VMM, or hypervisor-specific knowledge, if the hardware is properly designed. In that case the PF driver controls hardware resource allocation for the VFs, and a VF driver can work without any communication with the PF driver or the VMM. What we would rather do in KVM, is have the VFs appear in the host as standard network devices. We would then like to back our existing PV driver to this VF directly bypassing the host networking stack.
A key feature here is being able to fill the VF's receive queue with guest memory instead of host kernel memory so that you can get zero-copy receive traffic. This will perform just as well as doing passthrough (at least) and avoid all that ugliness of dealing with SR-IOV in the guest. If the hardware supports both SR-IOV and an IOMMU, I wouldn't suggest people do so, because they will get better performance by directly assigning the VF to the guest. However, lots of low-end machines don't have SR-IOV and IOMMU support. They may have a multi-queue NIC, which uses a built-in L2 switch to dispatch packets to different DMA queues according to MAC address. They definitely can benefit a lot if there is software support for hooking a DMA queue into the virtio-net backend as you suggested. This eliminates all of the mess of various drivers in the guest and all the associated baggage of doing hardware passthrough. So IMHO, having VFs be usable in the host is absolutely critical because I think it's the only reasonable usage model. Please don't worry, we have taken this usage model as well as the container model into account when designing the SR-IOV framework for the kernel. Regards, Anthony Liguori
Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
On Fri, Nov 07, 2008 at 11:17:40PM +0800, Yu Zhao wrote: While we are arguing about what the software model for SR-IOV should be, let me ask two simple questions first: 1. What does SR-IOV look like? 2. Why do we need to support it? I don't think we need to worry about those questions, as we can see what the SR-IOV interface looks like by looking at the PCI spec, and we know Linux needs to support it, as Linux needs to support everything :) (note, community members that cannot see the PCI specs at this point in time, please know that we are working on resolving these issues, hopefully we will have some good news within a month or so.) As you know, the Linux kernel is the base of various virtual machine monitors such as KVM, Xen, OpenVZ and VServer. We need SR-IOV support in the kernel mostly because it helps high-end users (IT departments, HPC, etc.) share limited hardware resources among hundreds or even thousands of virtual machines and hence reduce cost. How can we make these virtual machine monitors take advantage of SR-IOV without spending too much effort while remaining architecturally correct? I believe making a VF as close as possible to a normal PCI device (struct pci_dev) is the best way in the current situation, because this is not only what the hardware designers expect us to do but also the usage model that KVM, Xen and other VMMs already support. But would such an api really take advantage of the new IOV interfaces that are exposed by the new device type? I agree that the API in the SR-IOV patch is arguable, and concerns such as the lack of a PF driver are also valid. But I personally think these are not essential problems for me and other SR-IOV driver developers. How can the lack of a PF driver not be a valid concern at this point in time? Without such a driver written, how can we know that the SR-IOV interface as created is sufficient, or that it even works properly?
Here's what I see we need to have before we can evaluate if the IOV core PCI patches are acceptable: - a driver that uses this interface - a PF driver that uses this interface. Without those, we can't determine if the infrastructure provided by the IOV core is even sufficient, right? Rumor has it that both of the above things are floating around; can someone please post them to the linux-pci list so that we can see how this all works together? thanks, greg k-h
Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
Greetings (from a new lurker to the list), To your question Greg, yes and sort of ;-). I have started taking a look at these patches with a strong interest in understanding how they work. I've built a kernel with them and tried out a few things with real SR-IOV hardware. -- Lance Hartmann --- On Wed, 11/5/08, Greg KH [EMAIL PROTECTED] wrote: Are there any actual users of this API around yet? How was it tested as there is no hardware to test on? Which drivers are going to have to be rewritten to take advantage of this new interface? thanks, greg k-h ___ Virtualization mailing list [EMAIL PROTECTED] https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
On Thu, Nov 06, 2008 at 07:40:12AM -0800, H L wrote: Greetings (from a new lurker to the list), Welcome! To your question Greg, yes and sort of ;-). I have started taking a look at these patches with a strong interest in understanding how they work. I've built a kernel with them and tried out a few things with real SR-IOV hardware. Did you have to modify individual drivers to take advantage of this code? It looks like the core code will run on this type of hardware, but there seems to be no real advantage until a driver is modified to use it, right? Or am I missing some great advantage to having this code without modified drivers? thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
I have not modified any existing drivers, but instead I threw together a bare-bones module enabling me to make a call to pci_iov_register() and then poke at an SR-IOV adapter's /sys entries for which no driver was loaded. It appears from my perusal thus far that drivers using these new SR-IOV patches will require modification; i.e. the driver associated with the Physical Function (PF) will be required to make the pci_iov_register() call along with the requisite notify() function. Essentially this suggests to me a model where the PF driver performs any global actions or setup on behalf of the VFs before enabling them, after which VF drivers could be associated. I have so far only seen Yu Zhao's 7-patch set. I've not yet looked at his subsequently tendered 15-patch set, so I don't know what has changed. The hardware/firmware implementation for any given SR-IOV compatible device will determine the extent of the differences required between a PF driver and a VF driver. -- Lance Hartmann --- On Thu, 11/6/08, Greg KH [EMAIL PROTECTED] wrote: Date: Thursday, November 6, 2008, 9:43 AM On Thu, Nov 06, 2008 at 07:40:12AM -0800, H L wrote: Greetings (from a new lurker to the list), Welcome! To your question Greg, yes and sort of ;-). I have started taking a look at these patches with a strong interest in understanding how they work. I've built a kernel with them and tried out a few things with real SR-IOV hardware. Did you have to modify individual drivers to take advantage of this code? It looks like the core code will run on this type of hardware, but there seems to be no real advantage until a driver is modified to use it, right? Or am I missing some great advantage to having this code without modified drivers? thanks, greg k-h
Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
A: No. Q: Should I include quotations after my reply? On Thu, Nov 06, 2008 at 08:41:53AM -0800, H L wrote: I have not modified any existing drivers, but instead I threw together a bare-bones module enabling me to make a call to pci_iov_register() and then poke at an SR-IOV adapter's /sys entries for which no driver was loaded. It appears from my perusal thus far that drivers using these new SR-IOV patches will require modification; i.e. the driver associated with the Physical Function (PF) will be required to make the pci_iov_register() call along with the requisite notify() function. Essentially this suggests to me a model for the PF driver to perform any global actions or setup on behalf of VFs before enabling them after which VF drivers could be associated. Where would the VF drivers have to be associated? On the pci_dev level or on a higher one? Will all drivers that want to bind to a VF device need to be rewritten? I have so far only seen Yu Zhao's 7-patch set. I've not yet looked at his subsequently tendered 15-patch set so I don't know what has changed.The hardware/firmware implementation for any given SR-IOV compatible device, will determine the extent of differences required between a PF driver and a VF driver. Yeah, that's what I'm worried/curious about. Without seeing the code for such a driver, how can we properly evaluate if this infrastructure is the correct one and the proper way to do all of this? thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
On Thu, Nov 06, 2008 at 08:41:53AM -0800, H L wrote: I have not modified any existing drivers, but instead I threw together a bare-bones module enabling me to make a call to pci_iov_register() and then poke at an SR-IOV adapter's /sys entries for which no driver was loaded. It appears from my perusal thus far that drivers using these new SR-IOV patches will require modification; i.e. the driver associated with the Physical Function (PF) will be required to make the pci_iov_register() call along with the requisite notify() function. Essentially this suggests to me a model for the PF driver to perform any global actions or setup on behalf of VFs before enabling them after which VF drivers could be associated. Where would the VF drivers have to be associated? On the pci_dev level or on a higher one? A VF appears to the Linux OS as a standard (full, additional) PCI device. The driver is associated in the same way as for a normal PCI device. Ideally, you would use SR-IOV devices on a virtualized system, for example, using Xen. A VF can then be assigned to a guest domain as a full PCI device. Will all drivers that want to bind to a VF device need to be rewritten? Currently, any vendor providing a SR-IOV device needs to provide a PF driver and a VF driver that runs on their hardware. A VF driver does not necessarily need to know much about SR-IOV but just run on the presented PCI device. You might want to have a communication channel between PF and VF driver though, for various reasons, if such a channel is not provided in hardware. I have so far only seen Yu Zhao's 7-patch set. I've not yet looked at his subsequently tendered 15-patch set so I don't know what has changed.The hardware/firmware implementation for any given SR-IOV compatible device, will determine the extent of differences required between a PF driver and a VF driver. Yeah, that's what I'm worried/curious about. 
Without seeing the code for such a driver, how can we properly evaluate if this infrastructure is the correct one and the proper way to do all of this? Yu's API allows a PF driver to register with the Linux PCI code and use it to activate VFs and allocate their resources. The PF driver needs to be modified to work with that API. While you can argue about what that API is supposed to look like, it is clear that such an API is required in some form. The PF driver needs to know when VFs are active, as it might want to allocate further (device-specific) resources to VFs or initiate further (device-specific) configurations. While probably a lot of SR-IOV specific code has to be in the PF driver, there is also support required from the Linux PCI subsystem, which is to some extent provided by Yu's patches. Anna
Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
On Thu, Nov 06, 2008 at 08:49:19AM -0800, Greg KH wrote: On Thu, Nov 06, 2008 at 08:41:53AM -0800, H L wrote: I have not modified any existing drivers, but instead I threw together a bare-bones module enabling me to make a call to pci_iov_register() and then poke at an SR-IOV adapter's /sys entries for which no driver was loaded. It appears from my perusal thus far that drivers using these new SR-IOV patches will require modification; i.e. the driver associated with the Physical Function (PF) will be required to make the pci_iov_register() call along with the requisite notify() function. Essentially this suggests to me a model for the PF driver to perform any global actions or setup on behalf of VFs before enabling them after which VF drivers could be associated. Where would the VF drivers have to be associated? On the pci_dev level or on a higher one? Will all drivers that want to bind to a VF device need to be rewritten? The current model being implemented by my colleagues has separate drivers for the PF (aka native) and VF devices. I don't personally believe this is the correct path, but I'm reserving judgement until I see some code. I don't think we really know what the One True Usage model is for VF devices. Chris Wright has some ideas, I have some ideas and Yu Zhao has some ideas. I bet there's other people who have other ideas too. -- Matthew Wilcox Intel Open Source Technology Centre Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
On Thu, Nov 06, 2008 at 10:47:41AM -0700, Matthew Wilcox wrote: On Thu, Nov 06, 2008 at 08:49:19AM -0800, Greg KH wrote: On Thu, Nov 06, 2008 at 08:41:53AM -0800, H L wrote: I have not modified any existing drivers, but instead I threw together a bare-bones module enabling me to make a call to pci_iov_register() and then poke at an SR-IOV adapter's /sys entries for which no driver was loaded. It appears from my perusal thus far that drivers using these new SR-IOV patches will require modification; i.e. the driver associated with the Physical Function (PF) will be required to make the pci_iov_register() call along with the requisite notify() function. Essentially this suggests to me a model for the PF driver to perform any global actions or setup on behalf of VFs before enabling them after which VF drivers could be associated. Where would the VF drivers have to be associated? On the pci_dev level or on a higher one? Will all drivers that want to bind to a VF device need to be rewritten? The current model being implemented by my colleagues has separate drivers for the PF (aka native) and VF devices. I don't personally believe this is the correct path, but I'm reserving judgement until I see some code. Hm, I would like to see that code before we can properly evaluate this interface. Especially as they are all tightly tied together. I don't think we really know what the One True Usage model is for VF devices. Chris Wright has some ideas, I have some ideas and Yu Zhao has some ideas. I bet there's other people who have other ideas too. I'd love to hear those ideas. Rumor has it, there is some Xen code floating around to support this already, is that true? thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
On Thu, Nov 06, 2008 at 05:38:16PM +, Fischer, Anna wrote: On Thu, Nov 06, 2008 at 08:41:53AM -0800, H L wrote: I have not modified any existing drivers, but instead I threw together a bare-bones module enabling me to make a call to pci_iov_register() and then poke at an SR-IOV adapter's /sys entries for which no driver was loaded. It appears from my perusal thus far that drivers using these new SR-IOV patches will require modification; i.e. the driver associated with the Physical Function (PF) will be required to make the pci_iov_register() call along with the requisite notify() function. Essentially this suggests to me a model for the PF driver to perform any global actions or setup on behalf of VFs before enabling them after which VF drivers could be associated. Where would the VF drivers have to be associated? On the pci_dev level or on a higher one? A VF appears to the Linux OS as a standard (full, additional) PCI device. The driver is associated in the same way as for a normal PCI device. Ideally, you would use SR-IOV devices on a virtualized system, for example, using Xen. A VF can then be assigned to a guest domain as a full PCI device. It's that second part that I'm worried about. How is that going to happen? Do you have any patches that show this kind of assignment? Will all drivers that want to bind to a VF device need to be rewritten? Currently, any vendor providing a SR-IOV device needs to provide a PF driver and a VF driver that runs on their hardware. Are there any such drivers available yet? A VF driver does not necessarily need to know much about SR-IOV but just run on the presented PCI device. You might want to have a communication channel between PF and VF driver though, for various reasons, if such a channel is not provided in hardware. Agreed, but what does that channel look like in Linux? I have some ideas of what I think it should look like, but if people already have code, I'd love to see that as well. 
I have so far only seen Yu Zhao's 7-patch set. I've not yet looked at his subsequently tendered 15-patch set so I don't know what has changed.The hardware/firmware implementation for any given SR-IOV compatible device, will determine the extent of differences required between a PF driver and a VF driver. Yeah, that's what I'm worried/curious about. Without seeing the code for such a driver, how can we properly evaluate if this infrastructure is the correct one and the proper way to do all of this? Yu's API allows a PF driver to register with the Linux PCI code and use it to activate VFs and allocate their resources. The PF driver needs to be modified to work with that API. While you can argue about how that API is supposed to look like, it is clear that such an API is required in some form. I totally agree, I'm arguing about what that API looks like :) I want to see some code... thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
--- On Thu, 11/6/08, Greg KH [EMAIL PROTECTED] wrote: On Thu, Nov 06, 2008 at 08:41:53AM -0800, H L wrote: I have not modified any existing drivers, but instead I threw together a bare-bones module enabling me to make a call to pci_iov_register() and then poke at an SR-IOV adapter's /sys entries for which no driver was loaded. It appears from my perusal thus far that drivers using these new SR-IOV patches will require modification; i.e. the driver associated with the Physical Function (PF) will be required to make the pci_iov_register() call along with the requisite notify() function. Essentially this suggests to me a model for the PF driver to perform any global actions or setup on behalf of VFs before enabling them after which VF drivers could be associated. Where would the VF drivers have to be associated? On the pci_dev level or on a higher one? I have not yet fully grokked Yu Zhao's model to answer this. That said, I would *hope* to find it on the pci_dev level. Will all drivers that want to bind to a VF device need to be rewritten? Not necessarily, or perhaps minimally; it depends on the hardware/firmware and the actions the driver wants to take. An example here might assist. Let's just say someone has created, oh, I don't know, maybe an SR-IOV NIC. Now, for 'general' I/O operations to pass network traffic back and forth there would ideally be no difference in the actions and therefore behavior of a PF driver and a VF driver. But what do you do in the instance a VF wants to change link speed? As that physical characteristic affects all VFs, how do you handle that? This is where the hardware/firmware implementation part comes into play. If a VF driver performs some actions to initiate the change in link speed, the logic in the adapter could be anything like: 1. Acknowledge the request as if it were really done, but effectively ignore it.
The Independent Hardware Vendor (IHV) might dictate that if you want to change any global characteristics of an adapter, you may only do so via the PF driver. Granted, this, depending on the device class, may just not be acceptable. 2. Acknowledge the request and then trigger an interrupt to the PF driver to have it assist. The PF driver might then just set the new link speed, or it could result in the PF driver communicating by some mechanism to all of the VF driver instances that this change of link speed was requested. 3. Acknowledge the request and perform internal PF and VF communication of this event within the logic of the card (e.g. to vote on whether or not to perform this action) with interrupts and associated status delivered to all PF and VF drivers. The list goes on. I have so far only seen Yu Zhao's 7-patch set. I've not yet looked at his subsequently tendered 15-patch set so I don't know what has changed. The hardware/firmware implementation for any given SR-IOV compatible device will determine the extent of the differences required between a PF driver and a VF driver. Yeah, that's what I'm worried/curious about. Without seeing the code for such a driver, how can we properly evaluate if this infrastructure is the correct one and the proper way to do all of this? As the example above demonstrates, that's a tough question to answer. Ideally, in my view, there would be only one driver written per SR-IOV device, and it would contain the logic to do the right things based on whether it's running as a PF or a VF, with that determination easily accomplished by testing for the existence of the SR-IOV extended capability. Then, in an effort to minimize (if not eliminate) the complexities of driver-to-driver actions for fielding global events, contain as much of the logic as possible within the adapter. Minimizing the effort required of the device driver writers, in my opinion, paves the way to greater adoption of this technology.
Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[Anna, can you fix your word-wrapping please? Your lines appear to be infinitely long which is most unpleasant to reply to] On Thu, Nov 06, 2008 at 05:38:16PM +, Fischer, Anna wrote: Where would the VF drivers have to be associated? On the pci_dev level or on a higher one? A VF appears to the Linux OS as a standard (full, additional) PCI device. The driver is associated in the same way as for a normal PCI device. Ideally, you would use SR-IOV devices on a virtualized system, for example, using Xen. A VF can then be assigned to a guest domain as a full PCI device. It's not clear that's the right solution. If the VF devices are _only_ going to be used by the guest, then arguably, we don't want to create pci_devs for them in the host. (I think it _is_ the right answer, but I want to make it clear there's multiple opinions on this). Will all drivers that want to bind to a VF device need to be rewritten? Currently, any vendor providing a SR-IOV device needs to provide a PF driver and a VF driver that runs on their hardware. A VF driver does not necessarily need to know much about SR-IOV but just run on the presented PCI device. You might want to have a communication channel between PF and VF driver though, for various reasons, if such a channel is not provided in hardware. That is one model. Another model is to provide one driver that can handle both PF and VF devices. A third model is to provide, say, a Windows VF driver and a Xen PF driver and only support Windows-on-Xen. (This last would probably be an exercise in foot-shooting, but nevertheless, I've heard it mooted). Yeah, that's what I'm worried/curious about. Without seeing the code for such a driver, how can we properly evaluate if this infrastructure is the correct one and the proper way to do all of this? Yu's API allows a PF driver to register with the Linux PCI code and use it to activate VFs and allocate their resources. The PF driver needs to be modified to work with that API.
While you can argue about what that API is supposed to look like, it is clear that such an API is required in some form. The PF driver needs to know when VFs are active, as it might want to allocate further (device-specific) resources to VFs or initiate further (device-specific) configurations. While probably a lot of SR-IOV specific code has to be in the PF driver, there is also support required from the Linux PCI subsystem, which is to some extent provided by Yu's patches. Everyone agrees that some support is necessary. The question is exactly what it looks like. I must confess to not having reviewed this latest patch series yet -- I'm a little burned out on patch review. -- Matthew Wilcox Intel Open Source Technology Centre Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step.
Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
On Thu, Nov 06, 2008 at 10:05:39AM -0800, H L wrote: --- On Thu, 11/6/08, Greg KH [EMAIL PROTECTED] wrote: On Thu, Nov 06, 2008 at 08:41:53AM -0800, H L wrote: I have not modified any existing drivers, but instead I threw together a bare-bones module enabling me to make a call to pci_iov_register() and then poke at an SR-IOV adapter's /sys entries for which no driver was loaded. It appears from my perusal thus far that drivers using these new SR-IOV patches will require modification; i.e. the driver associated with the Physical Function (PF) will be required to make the pci_iov_register() call along with the requisite notify() function. Essentially this suggests to me a model for the PF driver to perform any global actions or setup on behalf of VFs before enabling them after which VF drivers could be associated. Where would the VF drivers have to be associated? On the pci_dev level or on a higher one? I have not yet fully grocked Yu Zhao's model to answer this. That said, I would *hope* to find it on the pci_dev level. Me too. Will all drivers that want to bind to a VF device need to be rewritten? Not necessarily, or perhaps minimally; depends on hardware/firmware and actions the driver wants to take. An example here might assist. Let's just say someone has created, oh, I don't know, maybe an SR-IOV NIC. Now, for 'general' I/O operations to pass network traffic back and forth there would ideally be no difference in the actions and therefore behavior of a PF driver and a VF driver. But, what do you do in the instance a VF wants to change link-speed? As that physical characteristic affects all VFs, how do you handle that? This is where the hardware/firmware implementation part comes to play. If a VF driver performs some actions to initiate the change in link speed, the logic in the adapter could be anything like: snip Yes, I agree that all of this needs to be done, somehow. It's that somehow that I am interested in trying to see how it works out. 
I have so far only seen Yu Zhao's 7-patch set. I've not yet looked at his subsequently tendered 15-patch set so I don't know what has changed. The hardware/firmware implementation for any given SR-IOV compatible device will determine the extent of differences required between a PF driver and a VF driver. Yeah, that's what I'm worried/curious about. Without seeing the code for such a driver, how can we properly evaluate if this infrastructure is the correct one and the proper way to do all of this? As the example above demonstrates, that's a tough question to answer. Ideally, in my view, there would only be one driver written per SR-IOV device and it would contain the logic to do the right things based on whether it's running as a PF or VF, with that determination easily accomplished by testing the existence of the SR-IOV extended capability. Then, in an effort to minimize (if not eliminate) the complexities of driver-to-driver actions for fielding global events, contain as much of the logic as is possible within the adapter. Minimizing the efforts required of the device driver writers, in my opinion, paves the way to greater adoption of this technology. Yes, making things easier is the key here. Perhaps some of this could be hidden with a new bus type for these kinds of devices? Or a virtual bus of pci devices that the original SR-IOV device creates that correspond to the individual virtual PCI devices? If that were the case, then it might be a lot easier in the end. thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
Subject: Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support On Thu, Nov 06, 2008 at 05:38:16PM +, Fischer, Anna wrote: On Thu, Nov 06, 2008 at 08:41:53AM -0800, H L wrote: I have not modified any existing drivers, but instead I threw together a bare-bones module enabling me to make a call to pci_iov_register() and then poke at an SR-IOV adapter's /sys entries for which no driver was loaded. It appears from my perusal thus far that drivers using these new SR-IOV patches will require modification; i.e. the driver associated with the Physical Function (PF) will be required to make the pci_iov_register() call along with the requisite notify() function. Essentially this suggests to me a model for the PF driver to perform any global actions or setup on behalf of VFs before enabling them after which VF drivers could be associated. Where would the VF drivers have to be associated? On the pci_dev level or on a higher one? A VF appears to the Linux OS as a standard (full, additional) PCI device. The driver is associated in the same way as for a normal PCI device. Ideally, you would use SR-IOV devices on a virtualized system, for example, using Xen. A VF can then be assigned to a guest domain as a full PCI device. It's that second part that I'm worried about. How is that going to happen? Do you have any patches that show this kind of assignment? That depends on your setup. Using Xen, you could assign the VF to a guest domain like any other PCI device, e.g. using PCI pass-through. For VMware, KVM, there are standard ways to do that, too. I currently don't see why SR-IOV devices would need any specific, non-standard mechanism for device assignment. Will all drivers that want to bind to a VF device need to be rewritten? Currently, any vendor providing a SR-IOV device needs to provide a PF driver and a VF driver that runs on their hardware. Are there any such drivers available yet? I don't know. 
A VF driver does not necessarily need to know much about SR-IOV but just run on the presented PCI device. You might want to have a communication channel between PF and VF driver though, for various reasons, if such a channel is not provided in hardware. Agreed, but what does that channel look like in Linux? I have some ideas of what I think it should look like, but if people already have code, I'd love to see that as well. At this point I would guess that this code is vendor specific, as are the drivers. The issue I see is that most likely drivers will run in different environments, for example, in Xen the PF driver runs in a driver domain while a VF driver runs in a guest VM. So a communication channel would need to be either Xen specific, or vendor specific. Also, a guest using the VF might run Windows while the PF might be controlled under Linux. Anna
Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
On Thu, Nov 06, 2008 at 09:53:08AM -0800, Greg KH wrote: On Thu, Nov 06, 2008 at 10:47:41AM -0700, Matthew Wilcox wrote: On Thu, Nov 06, 2008 at 08:49:19AM -0800, Greg KH wrote: On Thu, Nov 06, 2008 at 08:41:53AM -0800, H L wrote: I have not modified any existing drivers, but instead I threw together a bare-bones module enabling me to make a call to pci_iov_register() and then poke at an SR-IOV adapter's /sys entries for which no driver was loaded. It appears from my perusal thus far that drivers using these new SR-IOV patches will require modification; i.e. the driver associated with the Physical Function (PF) will be required to make the pci_iov_register() call along with the requisite notify() function. Essentially this suggests to me a model for the PF driver to perform any global actions or setup on behalf of VFs before enabling them after which VF drivers could be associated. Where would the VF drivers have to be associated? On the pci_dev level or on a higher one? Will all drivers that want to bind to a VF device need to be rewritten? The current model being implemented by my colleagues has separate drivers for the PF (aka native) and VF devices. I don't personally believe this is the correct path, but I'm reserving judgement until I see some code. Hm, I would like to see that code before we can properly evaluate this interface. Especially as they are all tightly tied together. I don't think we really know what the One True Usage model is for VF devices. Chris Wright has some ideas, I have some ideas and Yu Zhao has some ideas. I bet there's other people who have other ideas too. I'd love to hear those ideas. Rumor has it, there is some Xen code floating around to support this already, is that true? Xen patches were posted to xen-devel by Yu Zhao on the 29th of September [1]. Unfortunately the only responses that I can find are a) that the patches were mangled and b) they seem to include changes (by others) that have been merged into Linux. 
I have confirmed that both of these concerns are valid. I have not yet examined the difference, if any, in the approach taken by Yu to SR-IOV in Linux and Xen. Unfortunately comparison is less than trivial due to the gaping gap in kernel versions between Linux-Xen (2.6.18.8) and Linux itself. One approach that I was considering in order to familiarise myself with the code was to backport the v6 Linux patches (this thread) to Linux-Xen. I made a start on that, but again due to kernel version differences it is non-trivial. [1] http://lists.xensource.com/archives/html/xen-devel/2008-09/msg00923.html -- Simon Horman VA Linux Systems Japan K.K., Sydney, Australia Satellite Office H: www.vergenet.net/~horms/ W: www.valinux.co.jp/en
Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
Matthew Wilcox wrote: [Anna, can you fix your word-wrapping please? Your lines appear to be infinitely long which is most unpleasant to reply to] On Thu, Nov 06, 2008 at 05:38:16PM +, Fischer, Anna wrote: Where would the VF drivers have to be associated? On the pci_dev level or on a higher one? A VF appears to the Linux OS as a standard (full, additional) PCI device. The driver is associated in the same way as for a normal PCI device. Ideally, you would use SR-IOV devices on a virtualized system, for example, using Xen. A VF can then be assigned to a guest domain as a full PCI device. It's not clear that's the right solution. If the VF devices are _only_ going to be used by the guest, then arguably, we don't want to create pci_devs for them in the host. (I think it _is_ the right answer, but I want to make it clear there's multiple opinions on this). The VFs shouldn't be limited to being used by the guest. SR-IOV is actually an incredibly painful thing. You need to have a VF driver in the guest, do hardware pass-through, have a PV driver stub in the guest that's hypervisor specific (a VF is not usable on its own), have a device specific backend in the VMM, and if you want to do live migration, have another PV driver in the guest that you can do teaming with. Just a mess. What we would rather do in KVM, is have the VFs appear in the host as standard network devices. We would then like to back our existing PV driver to this VF directly, bypassing the host networking stack. A key feature here is being able to fill the VF's receive queue with guest memory instead of host kernel memory so that you can get zero-copy receive traffic. This will perform just as well as doing passthrough (at least) and avoid all that ugliness of dealing with SR-IOV in the guest. This eliminates all of the mess of various drivers in the guest and all the associated baggage of doing hardware passthrough.
So IMHO, having VFs be usable in the host is absolutely critical because I think it's the only reasonable usage model. Regards, Anthony Liguori
Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
Greg KH wrote: On Thu, Nov 06, 2008 at 10:47:41AM -0700, Matthew Wilcox wrote: I don't think we really know what the One True Usage model is for VF devices. Chris Wright has some ideas, I have some ideas and Yu Zhao has some ideas. I bet there's other people who have other ideas too. I'd love to hear those ideas. We've been talking about avoiding hardware passthrough entirely and just backing a virtio-net backend driver by a dedicated VF in the host. That avoids a huge amount of guest-facing complexity, lets migration Just Work, and should give the same level of performance. Regards, Anthony Liguori Rumor has it, there is some Xen code floating around to support this already, is that true? thanks, greg k-h
Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
On Thu, Nov 06, 2008 at 04:38:40PM -0600, Anthony Liguori wrote: It's not clear that's the right solution. If the VF devices are _only_ going to be used by the guest, then arguably, we don't want to create pci_devs for them in the host. (I think it _is_ the right answer, but I want to make it clear there's multiple opinions on this). The VFs shouldn't be limited to being used by the guest. SR-IOV is actually an incredibly painful thing. You need to have a VF driver in the guest, do hardware pass-through, have a PV driver stub in the guest that's hypervisor specific (a VF is not usable on its own), have a device specific backend in the VMM, and if you want to do live migration, have another PV driver in the guest that you can do teaming with. Just a mess. Not to mention that you basically have to statically allocate them up front. What we would rather do in KVM, is have the VFs appear in the host as standard network devices. We would then like to back our existing PV driver to this VF directly, bypassing the host networking stack. A key feature here is being able to fill the VF's receive queue with guest memory instead of host kernel memory so that you can get zero-copy receive traffic. This will perform just as well as doing passthrough (at least) and avoid all that ugliness of dealing with SR-IOV in the guest. This argues for ignoring the SR-IOV mess completely. Just have the host driver expose multiple 'ethN' devices. This eliminates all of the mess of various drivers in the guest and all the associated baggage of doing hardware passthrough. So IMHO, having VFs be usable in the host is absolutely critical because I think it's the only reasonable usage model.
Regards, Anthony Liguori -- Matthew Wilcox Intel Open Source Technology Centre Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step.
Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
* Greg KH ([EMAIL PROTECTED]) wrote: On Thu, Nov 06, 2008 at 10:47:41AM -0700, Matthew Wilcox wrote: On Thu, Nov 06, 2008 at 08:49:19AM -0800, Greg KH wrote: On Thu, Nov 06, 2008 at 08:41:53AM -0800, H L wrote: I have not modified any existing drivers, but instead I threw together a bare-bones module enabling me to make a call to pci_iov_register() and then poke at an SR-IOV adapter's /sys entries for which no driver was loaded. It appears from my perusal thus far that drivers using these new SR-IOV patches will require modification; i.e. the driver associated with the Physical Function (PF) will be required to make the pci_iov_register() call along with the requisite notify() function. Essentially this suggests to me a model for the PF driver to perform any global actions or setup on behalf of VFs before enabling them after which VF drivers could be associated. Where would the VF drivers have to be associated? On the pci_dev level or on a higher one? Will all drivers that want to bind to a VF device need to be rewritten? The current model being implemented by my colleagues has separate drivers for the PF (aka native) and VF devices. I don't personally believe this is the correct path, but I'm reserving judgement until I see some code. Hm, I would like to see that code before we can properly evaluate this interface. Especially as they are all tightly tied together. I don't think we really know what the One True Usage model is for VF devices. Chris Wright has some ideas, I have some ideas and Yu Zhao has some ideas. I bet there's other people who have other ideas too. I'd love to hear those ideas. First there's the question of how to represent the VF on the host. Ideally (IMO) this would show up as a normal interface so that normal tools can configure the interface. This is not exactly how the first round of patches were designed. 
Second there's the question of reserving the BDF on the host such that we don't have two drivers (one in the host and one in a guest) trying to drive the same device (an issue that shows up for device assignment as well as VF assignment). Third there's the question of whether the VF can be used in the host at all. Fourth there's the question of whether the VF and PF drivers are the same or separate. The typical usecase is assigning the VF to the guest directly, so there's only enough functionality in the host side to allocate a VF, configure it, and assign it (and propagate AER). This is with separate PF and VF driver. As Anthony mentioned, we are interested in allowing the host to use the VF. This could be useful for containers as well as dedicating a VF (a set of device resources) to a guest w/out passing it through. thanks, -chris
RE: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
What we would rather do in KVM, is have the VFs appear in the host as standard network devices. We would then like to back our existing PV driver to this VF directly, bypassing the host networking stack. A key feature here is being able to fill the VF's receive queue with guest memory instead of host kernel memory so that you can get zero-copy receive traffic. This will perform just as well as doing passthrough (at least) and avoid all that ugliness of dealing with SR-IOV in the guest. Anthony: This is already addressed by the VMDq solution (or so-called netchannel2), right? Qing He is debugging the KVM side patch and is pretty much close to the end. For this single purpose, we don't need SR-IOV. BTW at least the Intel SR-IOV NIC also supports VMDq, so you can achieve this by simply using the native VMDq-enabled driver here, plus the work we are debugging now. Thx, eddie
RE: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
On 11/6/2008 2:38:40 PM, Anthony Liguori wrote: Matthew Wilcox wrote: [Anna, can you fix your word-wrapping please? Your lines appear to be infinitely long which is most unpleasant to reply to] On Thu, Nov 06, 2008 at 05:38:16PM +, Fischer, Anna wrote: Where would the VF drivers have to be associated? On the pci_dev level or on a higher one? A VF appears to the Linux OS as a standard (full, additional) PCI device. The driver is associated in the same way as for a normal PCI device. Ideally, you would use SR-IOV devices on a virtualized system, for example, using Xen. A VF can then be assigned to a guest domain as a full PCI device. It's not clear that's the right solution. If the VF devices are _only_ going to be used by the guest, then arguably, we don't want to create pci_devs for them in the host. (I think it _is_ the right answer, but I want to make it clear there's multiple opinions on this). The VFs shouldn't be limited to being used by the guest. SR-IOV is actually an incredibly painful thing. You need to have a VF driver in the guest, do hardware pass-through, have a PV driver stub in the guest that's hypervisor specific (a VF is not usable on its own), have a device specific backend in the VMM, and if you want to do live migration, have another PV driver in the guest that you can do teaming with. Just a mess. Actually a PV driver stub in the guest _was_ correct; I admit that I stated so at a virt mini summit more than a half year ago ;-). But things have changed, and such a stub is no longer required (at least in our implementation). The major benefit of VF drivers now is that they are VMM-agnostic. What we would rather do in KVM, is have the VFs appear in the host as standard network devices. We would then like to back our existing PV driver to this VF directly, bypassing the host networking stack.
A key feature here is being able to fill the VF's receive queue with guest memory instead of host kernel memory so that you can get zero-copy receive traffic. This will perform just as well as doing passthrough (at least) and avoid all that ugliness of dealing with SR-IOV in the guest. This eliminates all of the mess of various drivers in the guest and all the associated baggage of doing hardware passthrough. So IMHO, having VFs be usable in the host is absolutely critical because I think it's the only reasonable usage model. As Eddie said, VMDq is better for this model, and the feature is already available today. It is much simpler because it was designed for such purposes. It does not require hardware pass-through (e.g. VT-d) or VFs as a PCI device, either. Regards, Anthony Liguori Jun Nakajima | Intel Open Source Technology Center
Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
Greg KH wrote: On Wed, Oct 22, 2008 at 04:38:09PM +0800, Yu Zhao wrote: Greetings, Following patches are intended to support SR-IOV capability in the Linux kernel. With these patches, people can turn a PCI device with the capability into multiple ones from a software perspective, which will benefit KVM and achieve other purposes such as QoS, security, etc. Are there any actual users of this API around yet? How was it tested as there is no hardware to test on? Which drivers are going to have to be rewritten to take advantage of this new interface? Yes, the API is used by Intel, HP, NextIO and some other anonymous companies as they raise questions and send me feedback. I haven't seen their work but I guess some of the drivers using the SR-IOV API are going to be released soon. My test was done with the Intel 82576 Gigabit Ethernet Controller. The product brief is at http://download.intel.com/design/network/ProdBrf/320025.pdf and the spec is available at http://download.intel.com/design/network/datashts/82576_Datasheet_v2p1.pdf Regards, Yu
Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
Greg KH wrote: On Thu, Nov 06, 2008 at 10:05:39AM -0800, H L wrote: --- On Thu, 11/6/08, Greg KH [EMAIL PROTECTED] wrote: On Thu, Nov 06, 2008 at 08:41:53AM -0800, H L wrote: I have not modified any existing drivers, but instead I threw together a bare-bones module enabling me to make a call to pci_iov_register() and then poke at an SR-IOV adapter's /sys entries for which no driver was loaded. It appears from my perusal thus far that drivers using these new SR-IOV patches will require modification; i.e. the driver associated with the Physical Function (PF) will be required to make the pci_iov_register() call along with the requisite notify() function. Essentially this suggests to me a model for the PF driver to perform any global actions or setup on behalf of VFs before enabling them, after which VF drivers could be associated. Where would the VF drivers have to be associated? On the pci_dev level or on a higher one? I have not yet fully grokked Yu Zhao's model to answer this. That said, I would *hope* to find it on the pci_dev level. Me too. A VF is a kind of lightweight PCI device, and it's represented by a struct pci_dev. The VF driver binds to the pci_dev and works in the same way as other drivers. Will all drivers that want to bind to a VF device need to be rewritten? Not necessarily, or perhaps minimally; depends on hardware/firmware and actions the driver wants to take. An example here might assist. Let's just say someone has created, oh, I don't know, maybe an SR-IOV NIC. Now, for 'general' I/O operations to pass network traffic back and forth there would ideally be no difference in the actions and therefore behavior of a PF driver and a VF driver. But, what do you do in the instance a VF wants to change link-speed? As that physical characteristic affects all VFs, how do you handle that? This is where the hardware/firmware implementation part comes into play.
If a VF driver performs some actions to initiate the change in link speed, the logic in the adapter could be anything like: snip Yes, I agree that all of this needs to be done, somehow. It's that somehow that I am interested in trying to see how it works out. This is the device-specific part. A VF driver is free to do what it wants with device-specific registers and resources, and wouldn't concern us as long as it behaves as a PCI device driver. I have so far only seen Yu Zhao's 7-patch set. I've not yet looked at his subsequently tendered 15-patch set so I don't know what has changed. The hardware/firmware implementation for any given SR-IOV compatible device will determine the extent of differences required between a PF driver and a VF driver. Yeah, that's what I'm worried/curious about. Without seeing the code for such a driver, how can we properly evaluate if this infrastructure is the correct one and the proper way to do all of this? As the example above demonstrates, that's a tough question to answer. Ideally, in my view, there would only be one driver written per SR-IOV device and it would contain the logic to do the right things based on whether it's running as a PF or VF, with that determination easily accomplished by testing the existence of the SR-IOV extended capability. Then, in an effort to minimize (if not eliminate) the complexities of driver-to-driver actions for fielding global events, contain as much of the logic as is possible within the adapter. Minimizing the efforts required of the device driver writers, in my opinion, paves the way to greater adoption of this technology. Yes, making things easier is the key here. Perhaps some of this could be hidden with a new bus type for these kinds of devices? Or a virtual bus of pci devices that the original SR-IOV device creates that correspond to the individual virtual PCI devices? If that were the case, then it might be a lot easier in the end.
The PCI SIG only defines SR-IOV at the PCI level; we can't predict what hardware vendors will implement at the device-specific logic level. An example of an SR-IOV NIC: the PF may not have network functionality at all, it only controls the VFs. Because people only want to use VFs in virtual machines, they don't need network functionality in the environment (e.g. hypervisor) where the PF resides. Thanks, Yu
Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
On Fri, Nov 07, 2008 at 01:18:52PM +0800, Zhao, Yu wrote: Greg KH wrote: On Wed, Oct 22, 2008 at 04:38:09PM +0800, Yu Zhao wrote: Greetings, Following patches are intended to support SR-IOV capability in the Linux kernel. With these patches, people can turn a PCI device with the capability into multiple ones from a software perspective, which will benefit KVM and achieve other purposes such as QoS, security, etc. Are there any actual users of this API around yet? How was it tested as there is no hardware to test on? Which drivers are going to have to be rewritten to take advantage of this new interface? Yes, the API is used by Intel, HP, NextIO and some other anonymous companies as they raise questions and send me feedback. I haven't seen their work but I guess some of the drivers using the SR-IOV API are going to be released soon. Well, we can't merge infrastructure without seeing the users of that infrastructure, right? My test was done with the Intel 82576 Gigabit Ethernet Controller. The product brief is at http://download.intel.com/design/network/ProdBrf/320025.pdf and the spec is available at http://download.intel.com/design/network/datashts/82576_Datasheet_v2p1.pdf Cool, do you have that driver we can see? How does it interact and handle the kvm and xen issues that have been posted? thanks, greg k-h
Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
On Thu, Nov 06, 2008 at 03:54:06PM -0800, Chris Wright wrote: * Greg KH ([EMAIL PROTECTED]) wrote: On Thu, Nov 06, 2008 at 10:47:41AM -0700, Matthew Wilcox wrote: On Thu, Nov 06, 2008 at 08:49:19AM -0800, Greg KH wrote: On Thu, Nov 06, 2008 at 08:41:53AM -0800, H L wrote: I have not modified any existing drivers, but instead I threw together a bare-bones module enabling me to make a call to pci_iov_register() and then poke at an SR-IOV adapter's /sys entries for which no driver was loaded. It appears from my perusal thus far that drivers using these new SR-IOV patches will require modification; i.e. the driver associated with the Physical Function (PF) will be required to make the pci_iov_register() call along with the requisite notify() function. Essentially this suggests to me a model for the PF driver to perform any global actions or setup on behalf of VFs before enabling them after which VF drivers could be associated. Where would the VF drivers have to be associated? On the pci_dev level or on a higher one? Will all drivers that want to bind to a VF device need to be rewritten? The current model being implemented by my colleagues has separate drivers for the PF (aka native) and VF devices. I don't personally believe this is the correct path, but I'm reserving judgement until I see some code. Hm, I would like to see that code before we can properly evaluate this interface. Especially as they are all tightly tied together. I don't think we really know what the One True Usage model is for VF devices. Chris Wright has some ideas, I have some ideas and Yu Zhao has some ideas. I bet there's other people who have other ideas too. I'd love to hear those ideas. First there's the question of how to represent the VF on the host. Ideally (IMO) this would show up as a normal interface so that normal tools can configure the interface. This is not exactly how the first round of patches were designed. 
Second there's the question of reserving the BDF on the host such that we don't have two drivers (one in the host and one in a guest) trying to drive the same device (an issue that shows up for device assignment as well as VF assignment). Third there's the question of whether the VF can be used in the host at all. Fourth there's the question of whether the VF and PF drivers are the same or separate. The typical usecase is assigning the VF to the guest directly, so there's only enough functionality in the host side to allocate a VF, configure it, and assign it (and propagate AER). This is with separate PF and VF driver. As Anthony mentioned, we are interested in allowing the host to use the VF. This could be useful for containers as well as dedicating a VF (a set of device resources) to a guest w/out passing it through. All of this looks great. So, with all of these questions, how does the current code pertain to these issues? It seems like we have a long way to go... thanks, greg k-h
Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
On Thu, Nov 06, 2008 at 04:40:21PM -0600, Anthony Liguori wrote: Greg KH wrote: On Thu, Nov 06, 2008 at 10:47:41AM -0700, Matthew Wilcox wrote: I don't think we really know what the One True Usage model is for VF devices. Chris Wright has some ideas, I have some ideas and Yu Zhao has some ideas. I bet there's other people who have other ideas too. I'd love to hear those ideas. We've been talking about avoiding hardware passthrough entirely and just backing a virtio-net backend driver by a dedicated VF in the host. That avoids a huge amount of guest-facing complexity, lets migration Just Work, and should give the same level of performance. Does that involve this patch set? Or a different type of interface. thanks, greg k-h
Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
On Thu, Nov 06, 2008 at 03:58:54PM -0700, Matthew Wilcox wrote: What we would rather do in KVM, is have the VFs appear in the host as standard network devices. We would then like to back our existing PV driver to this VF directly bypassing the host networking stack. A key feature here is being able to fill the VF's receive queue with guest memory instead of host kernel memory so that you can get zero-copy receive traffic. This will perform just as well as doing passthrough (at least) and avoid all that ugliness of dealing with SR-IOV in the guest. This argues for ignoring the SR-IOV mess completely. Just have the host driver expose multiple 'ethN' devices. That would work, but do we want to do that for every different type of driver? thanks, greg k-h
RE: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
Zhao, Yu wrote: Chris Wright wrote: * Greg KH ([EMAIL PROTECTED]) wrote: On Thu, Nov 06, 2008 at 10:47:41AM -0700, Matthew Wilcox wrote: On Thu, Nov 06, 2008 at 08:49:19AM -0800, Greg KH wrote: On Thu, Nov 06, 2008 at 08:41:53AM -0800, H L wrote: I have not modified any existing drivers, but instead I threw together a bare-bones module enabling me to make a call to pci_iov_register() and then poke at an SR-IOV adapter's /sys entries for which no driver was loaded. It appears from my perusal thus far that drivers using these new SR-IOV patches will require modification; i.e. the driver associated with the Physical Function (PF) will be required to make the pci_iov_register() call along with the requisite notify() function. Essentially this suggests to me a model where the PF driver performs any global actions or setup on behalf of VFs before enabling them, after which VF drivers could be associated. Where would the VF drivers have to be associated? On the pci_dev level or on a higher one? Will all drivers that want to bind to a VF device need to be rewritten? The current model being implemented by my colleagues has separate drivers for the PF (aka native) and VF devices. I don't personally believe this is the correct path, but I'm reserving judgement until I see some code. Hm, I would like to see that code before we can properly evaluate this interface. Especially as they are all tightly tied together. I don't think we really know what the One True Usage model is for VF devices. Chris Wright has some ideas, I have some ideas and Yu Zhao has some ideas. 
I bet there's other people who have other ideas too. I'd love to hear those ideas. First there's the question of how to represent the VF on the host. Ideally (IMO) this would show up as a normal interface so that normal tools can configure the interface. This is not exactly how the first round of patches was designed. Whether the VF shows up as a normal interface is decided by the VF driver. The VF is represented by a 'pci_dev' at the PCI level, so the VF driver can be loaded as a normal PCI device driver. The software representation (eth, framebuffer, etc.) created by the VF driver is not controlled by the SR-IOV framework. So you can definitely use normal tools to configure the VF if its driver supports that :-) Second there's the question of reserving the BDF on the host such that we don't have two drivers (one in the host and one in a guest) trying to drive the same device (an issue that shows up for device assignment as well as VF assignment). If we don't reserve a BDF for the device, it can't work in either the host or the guest. Without a BDF, we can't access the device's config space, and the device can't do DMA. Did I miss your point? Third there's the question of whether the VF can be used in the host at all. Why not? My VFs work well in the host as normal PCI devices :-) Fourth there's the question of whether the VF and PF drivers are the same or separate. As I mentioned in another email in this thread, we can't predict how a hardware vendor will design its SR-IOV device -- the PCI SIG doesn't define device-specific logic. So I think the answer to this question is up to the device driver developers. If the PF and VF in an SR-IOV device have similar logic, then they can share a driver. Otherwise, e.g., if the PF has no real functionality at all -- it only has registers to control internal resource allocation for the VFs -- then the drivers should be separate, right? Right, this really depends upon the functionality behind a VF. 
If a VF is done as a subset of the netdev interface (for example, a queue pair), then a split VF/PF driver model and a proprietary communication channel are in order. If each VF is done as a complete netdev interface (like in our 10GbE IOV controllers), then the PF and VF drivers could be the same. Each VF can be independently driven by such a native netdev driver; this includes the ability to run a native driver in a guest in passthru mode. A PF driver in a privileged domain doesn't even have to be present. The typical use case is assigning the VF to the guest directly, so there's only enough functionality on the host side to allocate a VF, configure it, and assign it (and propagate AER). This is with separate PF and VF drivers. As Anthony mentioned, we are interested in allowing the host to use the VF. This could be useful for containers as well as dedicating a VF (a set of device resources) to a guest w/out passing it through.
Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
Greg KH wrote: On Thu, Nov 06, 2008 at 04:40:21PM -0600, Anthony Liguori wrote: Greg KH wrote: On Thu, Nov 06, 2008 at 10:47:41AM -0700, Matthew Wilcox wrote: I don't think we really know what the One True Usage model is for VF devices. Chris Wright has some ideas, I have some ideas and Yu Zhao has some ideas. I bet there's other people who have other ideas too. I'd love to hear those ideas. We've been talking about avoiding hardware passthrough entirely and just backing a virtio-net backend driver by a dedicated VF in the host. That avoids a huge amount of guest-facing complexity, lets migration Just Work, and should give the same level of performance. This can be commonly used not only with VFs -- devices that have multiple DMA queues (e.g., Intel VMDq, Neterion Xframe) and even traditional devices can also take advantage of this. CC Rusty Russell in case he has more comments. Does that involve this patch set? Or a different type of interface? I think that is a different type of interface. We need to hook the DMA interface in the device driver to the virtio-net backend so the hardware (normal device, VF, VMDq, etc.) can DMA data to/from the virtio-net backend. Regards, Yu
Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
On Wed, Oct 22, 2008 at 04:38:09PM +0800, Yu Zhao wrote: Greetings, The following patches are intended to support the SR-IOV capability in the Linux kernel. With these patches, people can turn a PCI device with the capability into multiple devices from a software perspective, which will benefit KVM and serve other purposes such as QoS and security. Are there any actual users of this API around yet? How was it tested, given that there is no hardware to test on? Which drivers are going to have to be rewritten to take advantage of this new interface? thanks, greg k-h