Re: kvm PCI assignment VFIO ramblings

2011-08-30 Thread Joerg Roedel
On Fri, Aug 26, 2011 at 12:04:22PM -0600, Alex Williamson wrote: On Thu, 2011-08-25 at 20:05 +0200, Joerg Roedel wrote: If we really expect segment numbers that need the full 16 bit then this would be the way to go. Otherwise I would prefer returning the group-id directly and partition the

Re: kvm PCI assignment VFIO ramblings

2011-08-30 Thread Joerg Roedel
On Sun, Aug 28, 2011 at 05:04:32PM +0300, Avi Kivity wrote: On 08/28/2011 04:56 PM, Joerg Roedel wrote: This can't be secured by a lock, because it introduces a potential A-B--B-A lock problem when two processes try to take each other's mm. It could probably be solved by a task->real_mm pointer,
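
For readers unfamiliar with the A-B/B-A problem mentioned above: it arises when two parties each hold one lock and wait for the other's. The sketch below is only a user-space illustration in Python, not kernel code and not the fix proposed in the thread. It shows the generic remedy of taking both locks in one agreed order. The names `lock_both`, `unlock_both` and `run`, and the use of `id()` as the ordering key, are assumptions made for this example.

```python
import threading

def lock_both(first, second):
    """Acquire two distinct locks in a globally consistent order.

    Sorting by id() means every caller takes the same lock first,
    so two threads can never each hold one lock and wait for the other.
    """
    ordered = sorted((first, second), key=id)
    for lock in ordered:
        lock.acquire()
    return ordered

def unlock_both(ordered):
    """Release the locks in the reverse of the acquisition order."""
    for lock in reversed(ordered):
        lock.release()

def run(rounds=500):
    """Two workers each want 'their' lock plus the other's, in opposite roles."""
    lock_a, lock_b = threading.Lock(), threading.Lock()
    state = {"count": 0}

    def worker(mine, other):
        for _ in range(rounds):
            held = lock_both(mine, other)
            try:
                state["count"] += 1  # protected by both locks
            finally:
                unlock_both(held)

    threads = [
        threading.Thread(target=worker, args=(lock_a, lock_b)),
        threading.Thread(target=worker, args=(lock_b, lock_a)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["count"]
```

If each worker simply took `mine` and then `other`, the two threads could deadlock. With the shared ordering they always finish.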

Re: kvm PCI assignment VFIO ramblings

2011-08-29 Thread David Gibson
On Fri, Aug 26, 2011 at 01:17:05PM -0700, Aaron Fabbri wrote: [snip] Yes. In essence, I'd rather not have to run any other admin processes. Doing things programmatically, on the fly, from each process, is the cleanest model right now. The persistent group model doesn't necessarily prevent

Re: kvm PCI assignment VFIO ramblings

2011-08-28 Thread Avi Kivity
On 08/26/2011 12:24 PM, Roedel, Joerg wrote: As I see it there are two options: (a) make subsequent accesses from userspace or the guest result in either a SIGBUS that userspace must either deal with or die, or (b) replace the mapping with a dummy RO mapping containing 0xff, with any

Re: kvm PCI assignment VFIO ramblings

2011-08-28 Thread Joerg Roedel
On Sun, Aug 28, 2011 at 04:14:00PM +0300, Avi Kivity wrote: On 08/26/2011 12:24 PM, Roedel, Joerg wrote: The biggest problem with this approach is that it has to happen in the context of the given process. Linux can't really modify an mm which belongs to another context in a safe way.

Re: kvm PCI assignment VFIO ramblings

2011-08-28 Thread Avi Kivity
On 08/28/2011 04:56 PM, Joerg Roedel wrote: On Sun, Aug 28, 2011 at 04:14:00PM +0300, Avi Kivity wrote: On 08/26/2011 12:24 PM, Roedel, Joerg wrote: The biggest problem with this approach is that it has to happen in the context of the given process. Linux can't really modify an mm which

Re: kvm PCI assignment VFIO ramblings

2011-08-26 Thread Roedel, Joerg
On Fri, Aug 26, 2011 at 12:24:23AM -0400, David Gibson wrote: On Thu, Aug 25, 2011 at 08:25:45AM -0500, Alexander Graf wrote: On 25.08.2011, at 07:31, Roedel, Joerg wrote: For mmio we could stop the guest and replace the mmio region with a region that is filled with 0xff, no? Sure,

Re: kvm PCI assignment VFIO ramblings

2011-08-26 Thread Roedel, Joerg
On Fri, Aug 26, 2011 at 12:20:00AM -0400, David Gibson wrote: On Wed, Aug 24, 2011 at 01:03:32PM +0200, Roedel, Joerg wrote: On Wed, Aug 24, 2011 at 05:33:00AM -0400, David Gibson wrote: On Wed, Aug 24, 2011 at 11:14:26AM +0200, Roedel, Joerg wrote: I don't see a reason to make this

Re: kvm PCI assignment VFIO ramblings

2011-08-26 Thread Alexander Graf
On 26.08.2011, at 04:33, Roedel, Joerg wrote: On Fri, Aug 26, 2011 at 12:20:00AM -0400, David Gibson wrote: On Wed, Aug 24, 2011 at 01:03:32PM +0200, Roedel, Joerg wrote: On Wed, Aug 24, 2011 at 05:33:00AM -0400, David Gibson wrote: On Wed, Aug 24, 2011 at 11:14:26AM +0200, Roedel, Joerg

Re: kvm PCI assignment VFIO ramblings

2011-08-26 Thread Joerg Roedel
On Fri, Aug 26, 2011 at 09:07:35AM -0500, Alexander Graf wrote: On 26.08.2011, at 04:33, Roedel, Joerg wrote: The reason is that you mean the usability for the programmer and I mean it for the actual user of qemu :) No, we mean the actual user of qemu. The reason being that making a

Re: kvm PCI assignment VFIO ramblings

2011-08-26 Thread Alexander Graf
On 26.08.2011, at 10:24, Joerg Roedel wrote: On Fri, Aug 26, 2011 at 09:07:35AM -0500, Alexander Graf wrote: On 26.08.2011, at 04:33, Roedel, Joerg wrote: The reason is that you mean the usability for the programmer and I mean it for the actual user of qemu :) No, we mean the actual

Re: kvm PCI assignment VFIO ramblings

2011-08-26 Thread Aaron Fabbri
On 8/26/11 7:07 AM, Alexander Graf ag...@suse.de wrote: snip Forget the KVM case for a moment and think of a user space device driver. I as a user am not root. But I as a user when having access to /dev/vfioX want to be able to access the device and manage it - and only it. The admin of

Re: kvm PCI assignment VFIO ramblings

2011-08-26 Thread Aaron Fabbri
On 8/26/11 12:35 PM, Chris Wright chr...@sous-sol.org wrote: * Aaron Fabbri (aafab...@cisco.com) wrote: On 8/26/11 7:07 AM, Alexander Graf ag...@suse.de wrote: Forget the KVM case for a moment and think of a user space device driver. I as a user am not root. But I as a user when having

Re: kvm PCI assignment VFIO ramblings

2011-08-26 Thread Chris Wright
* Aaron Fabbri (aafab...@cisco.com) wrote: On 8/26/11 7:07 AM, Alexander Graf ag...@suse.de wrote: Forget the KVM case for a moment and think of a user space device driver. I as a user am not root. But I as a user when having access to /dev/vfioX want to be able to access the device and

Re: kvm PCI assignment VFIO ramblings

2011-08-26 Thread Chris Wright
* Aaron Fabbri (aafab...@cisco.com) wrote: On 8/26/11 12:35 PM, Chris Wright chr...@sous-sol.org wrote: * Aaron Fabbri (aafab...@cisco.com) wrote: Each process will open vfio devices on the fly, and they need to be able to share IOMMU resources. How do you share IOMMU resources w/

Re: kvm PCI assignment VFIO ramblings

2011-08-25 Thread Roedel, Joerg
Hi Alex, On Wed, Aug 24, 2011 at 05:13:49PM -0400, Alex Williamson wrote: Is this roughly what you're thinking of for the iommu_group component? Adding a dev_to_group iommu ops callback lets us consolidate the sysfs support in the iommu base. Would AMD-Vi do something similar (or exactly

Re: kvm PCI assignment VFIO ramblings

2011-08-25 Thread Roedel, Joerg
On Wed, Aug 24, 2011 at 10:56:13AM -0400, Alex Williamson wrote: On Wed, 2011-08-24 at 10:43 +0200, Joerg Roedel wrote: A side-note: Might it be better to expose assigned devices in a guest on a separate bus? This will make it easier to emulate an IOMMU for the guest inside qemu. I think

Re: kvm PCI assignment VFIO ramblings

2011-08-25 Thread Roedel, Joerg
On Wed, Aug 24, 2011 at 11:07:46AM -0400, Alex Williamson wrote: On Wed, 2011-08-24 at 10:52 +0200, Roedel, Joerg wrote: On Tue, Aug 23, 2011 at 01:08:29PM -0400, Alex Williamson wrote: On Tue, 2011-08-23 at 15:14 +0200, Roedel, Joerg wrote: Handling it through fds is a good idea.

Re: kvm PCI assignment VFIO ramblings

2011-08-25 Thread Alexander Graf
On 25.08.2011, at 07:31, Roedel, Joerg wrote: On Wed, Aug 24, 2011 at 11:07:46AM -0400, Alex Williamson wrote: On Wed, 2011-08-24 at 10:52 +0200, Roedel, Joerg wrote: [...] We need to try the polite method of attempting to hot unplug the device from qemu first, which the current vfio

Re: kvm PCI assignment VFIO ramblings

2011-08-25 Thread Don Dutile
On 08/25/2011 06:54 AM, Roedel, Joerg wrote: Hi Alex, On Wed, Aug 24, 2011 at 05:13:49PM -0400, Alex Williamson wrote: Is this roughly what you're thinking of for the iommu_group component? Adding a dev_to_group iommu ops callback lets us consolidate the sysfs support in the iommu base.

Re: kvm PCI assignment VFIO ramblings

2011-08-25 Thread Roedel, Joerg
On Thu, Aug 25, 2011 at 11:38:09AM -0400, Don Dutile wrote: On 08/25/2011 06:54 AM, Roedel, Joerg wrote: We need to solve this differently. ARM is starting to use the iommu-api too and this definitely does not work there. One possible solution might be to make the iommu-ops per-bus. When

Re: kvm PCI assignment VFIO ramblings

2011-08-25 Thread Alex Williamson
On Thu, 2011-08-25 at 12:54 +0200, Roedel, Joerg wrote: Hi Alex, On Wed, Aug 24, 2011 at 05:13:49PM -0400, Alex Williamson wrote: Is this roughly what you're thinking of for the iommu_group component? Adding a dev_to_group iommu ops callback lets us consolidate the sysfs support in the

Re: kvm PCI assignment VFIO ramblings

2011-08-25 Thread Joerg Roedel
On Thu, Aug 25, 2011 at 11:20:30AM -0600, Alex Williamson wrote: On Thu, 2011-08-25 at 12:54 +0200, Roedel, Joerg wrote: We need to solve this differently. ARM is starting to use the iommu-api too and this definitely does not work there. One possible solution might be to make the iommu-ops

Re: kvm PCI assignment VFIO ramblings

2011-08-25 Thread David Gibson
On Wed, Aug 24, 2011 at 01:03:32PM +0200, Roedel, Joerg wrote: On Wed, Aug 24, 2011 at 05:33:00AM -0400, David Gibson wrote: On Wed, Aug 24, 2011 at 11:14:26AM +0200, Roedel, Joerg wrote: I don't see a reason to make this meta-grouping static. It would harm flexibility on x86. I think

Re: kvm PCI assignment VFIO ramblings

2011-08-25 Thread David Gibson
On Thu, Aug 25, 2011 at 08:25:45AM -0500, Alexander Graf wrote: On 25.08.2011, at 07:31, Roedel, Joerg wrote: On Wed, Aug 24, 2011 at 11:07:46AM -0400, Alex Williamson wrote: On Wed, 2011-08-24 at 10:52 +0200, Roedel, Joerg wrote: [...] We need to try the polite method of

Re: kvm PCI assignment VFIO ramblings

2011-08-24 Thread Joerg Roedel
On Tue, Aug 23, 2011 at 03:30:06PM -0400, Alex Williamson wrote: On Tue, 2011-08-23 at 07:01 +1000, Benjamin Herrenschmidt wrote: Could be tho in what form ? returning sysfs paths ? I'm at a loss there, please suggest. I think we need an ioctl that returns some kind of array of devices

Re: kvm PCI assignment VFIO ramblings

2011-08-24 Thread Roedel, Joerg
On Tue, Aug 23, 2011 at 01:08:29PM -0400, Alex Williamson wrote: On Tue, 2011-08-23 at 15:14 +0200, Roedel, Joerg wrote: Handling it through fds is a good idea. This makes sure that everything belongs to one process. I am not really sure yet if we go the way to just bind plain groups

Re: kvm PCI assignment VFIO ramblings

2011-08-24 Thread Roedel, Joerg
On Tue, Aug 23, 2011 at 07:35:37PM -0400, Benjamin Herrenschmidt wrote: On Tue, 2011-08-23 at 15:18 +0200, Roedel, Joerg wrote: Hmm, good idea. But as far as I know the hotplug-event needs to be in the guest _before_ the device is actually unplugged (so that the guest can unbind its driver

Re: kvm PCI assignment VFIO ramblings

2011-08-24 Thread Joerg Roedel
On Tue, Aug 23, 2011 at 01:33:14PM -0400, Aaron Fabbri wrote: On 8/23/11 10:01 AM, Alex Williamson alex.william...@redhat.com wrote: The iommu domain would probably be allocated when the first device is bound to vfio. As each device is bound, it gets attached to the group. DMAs are done

Re: kvm PCI assignment VFIO ramblings

2011-08-24 Thread Roedel, Joerg
On Tue, Aug 23, 2011 at 12:54:27PM -0400, aafabbri wrote: On 8/23/11 4:04 AM, Joerg Roedel joerg.roe...@amd.com wrote: That makes uiommu basically the same as the meta-groups, right? Yes, functionality seems the same, thus my suggestion to keep uiommu explicit. Is there some need for

Re: kvm PCI assignment VFIO ramblings

2011-08-24 Thread David Gibson
On Wed, Aug 24, 2011 at 11:14:26AM +0200, Roedel, Joerg wrote: On Tue, Aug 23, 2011 at 12:54:27PM -0400, aafabbri wrote: On 8/23/11 4:04 AM, Joerg Roedel joerg.roe...@amd.com wrote: That makes uiommu basically the same as the meta-groups, right? Yes, functionality seems the same,

Re: kvm PCI assignment VFIO ramblings

2011-08-24 Thread Roedel, Joerg
On Wed, Aug 24, 2011 at 05:33:00AM -0400, David Gibson wrote: On Wed, Aug 24, 2011 at 11:14:26AM +0200, Roedel, Joerg wrote: I don't see a reason to make this meta-grouping static. It would harm flexibility on x86. I think it makes things easier on power but there are options on that

Re: kvm PCI assignment VFIO ramblings

2011-08-24 Thread Alex Williamson
On Wed, 2011-08-24 at 09:51 +1000, Benjamin Herrenschmidt wrote: For us the most simple and logical approach (which is also what pHyp uses and what Linux handles well) is really to expose a given PCI host bridge per group to the guest. Believe it or not, it makes things easier :-)

Re: kvm PCI assignment VFIO ramblings

2011-08-24 Thread Alex Williamson
On Wed, 2011-08-24 at 10:43 +0200, Joerg Roedel wrote: On Tue, Aug 23, 2011 at 03:30:06PM -0400, Alex Williamson wrote: On Tue, 2011-08-23 at 07:01 +1000, Benjamin Herrenschmidt wrote: Could be tho in what form ? returning sysfs paths ? I'm at a loss there, please suggest. I think

Re: kvm PCI assignment VFIO ramblings

2011-08-24 Thread Alex Williamson
On Wed, 2011-08-24 at 10:52 +0200, Roedel, Joerg wrote: On Tue, Aug 23, 2011 at 01:08:29PM -0400, Alex Williamson wrote: On Tue, 2011-08-23 at 15:14 +0200, Roedel, Joerg wrote: Handling it through fds is a good idea. This makes sure that everything belongs to one process. I am not

Re: kvm PCI assignment VFIO ramblings

2011-08-24 Thread Alex Williamson
Joerg, Is this roughly what you're thinking of for the iommu_group component? Adding a dev_to_group iommu ops callback lets us consolidate the sysfs support in the iommu base. Would AMD-Vi do something similar (or exactly the same) for group #s? Thanks, Alex Signed-off-by: Alex Williamson

Re: kvm PCI assignment VFIO ramblings

2011-08-23 Thread Benjamin Herrenschmidt
On Mon, 2011-08-22 at 17:52 -0700, aafabbri wrote: I'm not following you. You have to enforce group/iommu domain assignment whether you have the existing uiommu API, or if you change it to your proposed ioctl(inherit_iommu) API. The only change needed to VFIO here should be to make

Re: kvm PCI assignment VFIO ramblings

2011-08-23 Thread Joerg Roedel
On Mon, Aug 22, 2011 at 08:52:18PM -0400, aafabbri wrote: You have to enforce group/iommu domain assignment whether you have the existing uiommu API, or if you change it to your proposed ioctl(inherit_iommu) API. The only change needed to VFIO here should be to make uiommu fd assignment

Re: kvm PCI assignment VFIO ramblings

2011-08-23 Thread Joerg Roedel
On Tue, Aug 23, 2011 at 02:54:43AM -0400, Benjamin Herrenschmidt wrote: Possibly, the question that interest me the most is what interface will KVM end up using. I'm also not terribly fan with the (perceived) discrepancy between using uiommu to create groups but using the group fd to actually

Re: kvm PCI assignment VFIO ramblings

2011-08-23 Thread Roedel, Joerg
On Mon, Aug 22, 2011 at 05:03:53PM -0400, Benjamin Herrenschmidt wrote: I am in favour of /dev/vfio/$GROUP. If multiple devices should be assigned to a guest, there can also be an ioctl to bind a group to an address-space of another group (certainly needs some care to not allow that both

Re: kvm PCI assignment VFIO ramblings

2011-08-23 Thread Roedel, Joerg
On Mon, Aug 22, 2011 at 03:17:00PM -0400, Alex Williamson wrote: On Mon, 2011-08-22 at 19:25 +0200, Joerg Roedel wrote: I am in favour of /dev/vfio/$GROUP. If multiple devices should be assigned to a guest, there can also be an ioctl to bind a group to an address-space of another group

Re: kvm PCI assignment VFIO ramblings

2011-08-23 Thread Alex Williamson
On Tue, 2011-08-23 at 12:38 +1000, David Gibson wrote: On Mon, Aug 22, 2011 at 09:45:48AM -0600, Alex Williamson wrote: On Mon, 2011-08-22 at 15:55 +1000, David Gibson wrote: On Sat, Aug 20, 2011 at 09:51:39AM -0700, Alex Williamson wrote: We had an extremely productive VFIO BoF on

Re: kvm PCI assignment VFIO ramblings

2011-08-23 Thread aafabbri
On 8/23/11 4:04 AM, Joerg Roedel joerg.roe...@amd.com wrote: On Mon, Aug 22, 2011 at 08:52:18PM -0400, aafabbri wrote: You have to enforce group/iommu domain assignment whether you have the existing uiommu API, or if you change it to your proposed ioctl(inherit_iommu) API. The only

Re: kvm PCI assignment VFIO ramblings

2011-08-23 Thread Alex Williamson
On Tue, 2011-08-23 at 16:54 +1000, Benjamin Herrenschmidt wrote: On Mon, 2011-08-22 at 17:52 -0700, aafabbri wrote: I'm not following you. You have to enforce group/iommu domain assignment whether you have the existing uiommu API, or if you change it to your proposed

Re: kvm PCI assignment VFIO ramblings

2011-08-23 Thread Alex Williamson
On Tue, 2011-08-23 at 15:14 +0200, Roedel, Joerg wrote: On Mon, Aug 22, 2011 at 03:17:00PM -0400, Alex Williamson wrote: On Mon, 2011-08-22 at 19:25 +0200, Joerg Roedel wrote: I am in favour of /dev/vfio/$GROUP. If multiple devices should be assigned to a guest, there can also be an

Re: kvm PCI assignment VFIO ramblings

2011-08-23 Thread Aaron Fabbri
On 8/23/11 10:01 AM, Alex Williamson alex.william...@redhat.com wrote: On Tue, 2011-08-23 at 16:54 +1000, Benjamin Herrenschmidt wrote: On Mon, 2011-08-22 at 17:52 -0700, aafabbri wrote: I'm not following you. You have to enforce group/iommu domain assignment whether you have the

Re: kvm PCI assignment VFIO ramblings

2011-08-23 Thread Alex Williamson
On Tue, 2011-08-23 at 10:33 -0700, Aaron Fabbri wrote: On 8/23/11 10:01 AM, Alex Williamson alex.william...@redhat.com wrote: On Tue, 2011-08-23 at 16:54 +1000, Benjamin Herrenschmidt wrote: On Mon, 2011-08-22 at 17:52 -0700, aafabbri wrote: I'm not following you. You have to

Re: kvm PCI assignment VFIO ramblings

2011-08-23 Thread Alex Williamson
On Tue, 2011-08-23 at 07:01 +1000, Benjamin Herrenschmidt wrote: On Mon, 2011-08-22 at 09:45 -0600, Alex Williamson wrote: Yes, that's the idea. An open question I have towards the configuration side is whether we might add iommu driver specific options to the groups. For instance on

Re: kvm PCI assignment VFIO ramblings

2011-08-23 Thread Benjamin Herrenschmidt
On Tue, 2011-08-23 at 15:18 +0200, Roedel, Joerg wrote: On Mon, Aug 22, 2011 at 05:03:53PM -0400, Benjamin Herrenschmidt wrote: I am in favour of /dev/vfio/$GROUP. If multiple devices should be assigned to a guest, there can also be an ioctl to bind a group to an address-space of

Re: kvm PCI assignment VFIO ramblings

2011-08-23 Thread Benjamin Herrenschmidt
On Tue, 2011-08-23 at 10:23 -0600, Alex Williamson wrote: Yeah. Joerg's idea of binding groups internally (pass the fd of one group to another via ioctl) is one option. The tricky part will be implementing it to support hot unplug of any group from the supergroup. I believe Ben had a

Re: kvm PCI assignment VFIO ramblings

2011-08-23 Thread Benjamin Herrenschmidt
For us the most simple and logical approach (which is also what pHyp uses and what Linux handles well) is really to expose a given PCI host bridge per group to the guest. Believe it or not, it makes things easier :-) I'm all for easier. Why does exposing the bridge use less bus

Re: kvm PCI assignment VFIO ramblings

2011-08-23 Thread Alexander Graf
On 23.08.2011, at 18:41, Benjamin Herrenschmidt wrote: On Tue, 2011-08-23 at 10:23 -0600, Alex Williamson wrote: Yeah. Joerg's idea of binding groups internally (pass the fd of one group to another via ioctl) is one option. The tricky part will be implementing it to support hot unplug of

Re: kvm PCI assignment VFIO ramblings

2011-08-23 Thread Alexander Graf
On 23.08.2011, at 18:51, Benjamin Herrenschmidt wrote: For us the most simple and logical approach (which is also what pHyp uses and what Linux handles well) is really to expose a given PCI host bridge per group to the guest. Believe it or not, it makes things easier :-) I'm all for

Re: kvm PCI assignment VFIO ramblings

2011-08-22 Thread Avi Kivity
On 08/20/2011 07:51 PM, Alex Williamson wrote: We need to address both the description and enforcement of device groups. Groups are formed any time the iommu does not have resolution between a set of devices. On x86, this typically happens when a PCI-to-PCI bridge exists between the set of

Re: kvm PCI assignment VFIO ramblings

2011-08-22 Thread Joerg Roedel
On Mon, Aug 22, 2011 at 02:30:26AM -0400, Avi Kivity wrote: On 08/20/2011 07:51 PM, Alex Williamson wrote: We need to address both the description and enforcement of device groups. Groups are formed any time the iommu does not have resolution between a set of devices. On x86, this

Re: kvm PCI assignment VFIO ramblings

2011-08-22 Thread Avi Kivity
On 08/22/2011 01:46 PM, Joerg Roedel wrote: $ readlink /sys/devices/pci:00/:00:19.0/iommu_group ../../../path/to/device/which/represents/the/resource/constraint (the pci-to-pci bridge on x86, or whatever node represents partitionable endpoints on power) That does not work. The
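
The `readlink` idea quoted above can be sketched from user space as follows. This is a minimal illustration that assumes each device directory in sysfs carries an `iommu_group` symlink whose final path component identifies the group. The function name `iommu_group_of` is hypothetical, and the caller supplies the device directory, so no particular sysfs path is assumed.

```python
import os

def iommu_group_of(device_dir):
    """Return the last path component of <device_dir>/iommu_group.

    Returns None when the device directory has no such symlink,
    for example when no IOMMU grouping information is exposed.
    """
    link = os.path.join(device_dir, "iommu_group")
    try:
        target = os.readlink(link)
    except OSError:
        # Missing entry, or the entry is not a symlink.
        return None
    return os.path.basename(os.path.normpath(target))
```

Two devices whose links resolve to the same target would then be treated as members of one group. The objection raised in the reply still applies to any scheme where the link target must be a visible device node.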

Re: kvm PCI assignment VFIO ramblings

2011-08-22 Thread Roedel, Joerg
On Mon, Aug 22, 2011 at 06:51:35AM -0400, Avi Kivity wrote: On 08/22/2011 01:46 PM, Joerg Roedel wrote: That does not work. The bridge in question may not even be visible as a PCI device, so you can't link to it. This is the case on a few PCIe cards which only have a PCIx chip and a

Re: kvm PCI assignment VFIO ramblings

2011-08-22 Thread Avi Kivity
On 08/22/2011 03:36 PM, Roedel, Joerg wrote: On Mon, Aug 22, 2011 at 06:51:35AM -0400, Avi Kivity wrote: On 08/22/2011 01:46 PM, Joerg Roedel wrote: That does not work. The bridge in question may not even be visible as a PCI device, so you can't link to it. This is the case on a few

Re: kvm PCI assignment VFIO ramblings

2011-08-22 Thread Roedel, Joerg
On Mon, Aug 22, 2011 at 08:42:35AM -0400, Avi Kivity wrote: On 08/22/2011 03:36 PM, Roedel, Joerg wrote: On the AMD IOMMU side this information is stored in the IVRS ACPI table. Not sure about the VT-d side, though. I see. There is no sysfs node representing it? No. It also doesn't exist

Re: kvm PCI assignment VFIO ramblings

2011-08-22 Thread Avi Kivity
On 08/22/2011 03:55 PM, Roedel, Joerg wrote: On Mon, Aug 22, 2011 at 08:42:35AM -0400, Avi Kivity wrote: On 08/22/2011 03:36 PM, Roedel, Joerg wrote: On the AMD IOMMU side this information is stored in the IVRS ACPI table. Not sure about the VT-d side, though. I see. There is no

Re: kvm PCI assignment VFIO ramblings

2011-08-22 Thread Roedel, Joerg
On Mon, Aug 22, 2011 at 09:06:07AM -0400, Avi Kivity wrote: On 08/22/2011 03:55 PM, Roedel, Joerg wrote: Well, I don't think it's really meaningless, but we need some way to communicate the information about device groups to userspace. I mean the contents of the group descriptor. There

Re: kvm PCI assignment VFIO ramblings

2011-08-22 Thread Avi Kivity
On 08/22/2011 04:15 PM, Roedel, Joerg wrote: On Mon, Aug 22, 2011 at 09:06:07AM -0400, Avi Kivity wrote: On 08/22/2011 03:55 PM, Roedel, Joerg wrote: Well, I don't think it's really meaningless, but we need some way to communicate the information about device groups to userspace. I

Re: kvm PCI assignment VFIO ramblings

2011-08-22 Thread Roedel, Joerg
On Mon, Aug 22, 2011 at 09:17:41AM -0400, Avi Kivity wrote: On 08/22/2011 04:15 PM, Roedel, Joerg wrote: On Mon, Aug 22, 2011 at 09:06:07AM -0400, Avi Kivity wrote: On 08/22/2011 03:55 PM, Roedel, Joerg wrote: Well, I don't think it's really meaningless, but we need some way to

Re: kvm PCI assignment VFIO ramblings

2011-08-22 Thread Alex Williamson
On Mon, 2011-08-22 at 15:55 +1000, David Gibson wrote: On Sat, Aug 20, 2011 at 09:51:39AM -0700, Alex Williamson wrote: We had an extremely productive VFIO BoF on Monday. Here's my attempt to capture the plan that I think we agreed to: We need to address both the description and

Re: kvm PCI assignment VFIO ramblings

2011-08-22 Thread Joerg Roedel
On Sat, Aug 20, 2011 at 12:51:39PM -0400, Alex Williamson wrote: We had an extremely productive VFIO BoF on Monday. Here's my attempt to capture the plan that I think we agreed to: We need to address both the description and enforcement of device groups. Groups are formed any time the

Re: kvm PCI assignment VFIO ramblings

2011-08-22 Thread Alex Williamson
On Mon, 2011-08-22 at 19:25 +0200, Joerg Roedel wrote: On Sat, Aug 20, 2011 at 12:51:39PM -0400, Alex Williamson wrote: We had an extremely productive VFIO BoF on Monday. Here's my attempt to capture the plan that I think we agreed to: We need to address both the description and

Re: kvm PCI assignment VFIO ramblings

2011-08-22 Thread Benjamin Herrenschmidt
On Mon, 2011-08-22 at 13:29 -0700, aafabbri wrote: Each device fd would then support a similar set of ioctls and mapping (mmio/pio/config) interface as current vfio, except for the obvious domain and dma ioctls superseded by the group fd. Another valid model might be that

Re: kvm PCI assignment VFIO ramblings

2011-08-22 Thread Benjamin Herrenschmidt
On Mon, 2011-08-22 at 09:30 +0300, Avi Kivity wrote: On 08/20/2011 07:51 PM, Alex Williamson wrote: We need to address both the description and enforcement of device groups. Groups are formed any time the iommu does not have resolution between a set of devices. On x86, this typically

Re: kvm PCI assignment VFIO ramblings

2011-08-22 Thread Benjamin Herrenschmidt
On Mon, 2011-08-22 at 09:45 -0600, Alex Williamson wrote: Yes, that's the idea. An open question I have towards the configuration side is whether we might add iommu driver specific options to the groups. For instance on x86 where we typically have B:D.F granularity, should we have an option

Re: kvm PCI assignment VFIO ramblings

2011-08-22 Thread Benjamin Herrenschmidt
I am in favour of /dev/vfio/$GROUP. If multiple devices should be assigned to a guest, there can also be an ioctl to bind a group to an address-space of another group (certainly needs some care to not allow that both groups belong to different processes). Btw, a problem we haven't talked

Re: kvm PCI assignment VFIO ramblings

2011-08-22 Thread Benjamin Herrenschmidt
I wouldn't use uiommu for that. Any particular reason besides saving a file descriptor? We use it today, and it seems like a cleaner API than what you propose changing it to. Well for one, we are back to square one vs. grouping constraints. .../... If we in singleton-group land were

Re: kvm PCI assignment VFIO ramblings

2011-08-22 Thread aafabbri
On 8/20/11 9:51 AM, Alex Williamson alex.william...@redhat.com wrote: We had an extremely productive VFIO BoF on Monday. Here's my attempt to capture the plan that I think we agreed to: We need to address both the description and enforcement of device groups. Groups are formed any time

Re: kvm PCI assignment VFIO ramblings

2011-08-22 Thread aafabbri
On 8/22/11 1:49 PM, Benjamin Herrenschmidt b...@kernel.crashing.org wrote: On Mon, 2011-08-22 at 13:29 -0700, aafabbri wrote: Each device fd would then support a similar set of ioctls and mapping (mmio/pio/config) interface as current vfio, except for the obvious domain and dma ioctls

Re: kvm PCI assignment VFIO ramblings

2011-08-22 Thread aafabbri
On 8/22/11 2:49 PM, Benjamin Herrenschmidt b...@kernel.crashing.org wrote: I wouldn't use uiommu for that. Any particular reason besides saving a file descriptor? We use it today, and it seems like a cleaner API than what you propose changing it to. Well for one, we are back to

Re: kvm PCI assignment VFIO ramblings

2011-08-22 Thread David Gibson
On Mon, Aug 22, 2011 at 09:45:48AM -0600, Alex Williamson wrote: On Mon, 2011-08-22 at 15:55 +1000, David Gibson wrote: On Sat, Aug 20, 2011 at 09:51:39AM -0700, Alex Williamson wrote: We had an extremely productive VFIO BoF on Monday. Here's my attempt to capture the plan that I think

Re: kvm PCI assignment VFIO ramblings

2011-08-21 Thread David Gibson
On Sat, Aug 20, 2011 at 09:51:39AM -0700, Alex Williamson wrote: We had an extremely productive VFIO BoF on Monday. Here's my attempt to capture the plan that I think we agreed to: We need to address both the description and enforcement of device groups. Groups are formed any time the

Re: kvm PCI assignment VFIO ramblings

2011-08-20 Thread Alex Williamson
We had an extremely productive VFIO BoF on Monday. Here's my attempt to capture the plan that I think we agreed to: We need to address both the description and enforcement of device groups. Groups are formed any time the iommu does not have resolution between a set of devices. On x86, this

Re: kvm PCI assignment VFIO ramblings

2011-08-09 Thread Alex Williamson
On Mon, 2011-08-08 at 11:28 +0300, Avi Kivity wrote: On 08/03/2011 05:04 AM, David Gibson wrote: I still don't understand the distinction you're making. We're saying the group is owned by a given user or guest in the sense that no-one else may use anything in the group (including host

Re: kvm PCI assignment VFIO ramblings

2011-08-09 Thread Benjamin Herrenschmidt
Mostly correct, yes. x86 isn't immune to the group problem, it shows up for us any time there's a PCIe-to-PCI bridge in the device hierarchy. We lose resolution of devices behind the bridge. As you state though, I think of this as only a constraint on what we're able to do with those

Re: kvm PCI assignment VFIO ramblings

2011-08-08 Thread David Gibson
On Fri, Aug 05, 2011 at 09:10:09AM -0600, Alex Williamson wrote: On Fri, 2011-08-05 at 20:42 +1000, Benjamin Herrenschmidt wrote: Right. In fact to try to clarify the problem for everybody, I think we can distinguish two different classes of constraints that can influence the grouping of

Re: kvm PCI assignment VFIO ramblings

2011-08-08 Thread Avi Kivity
On 08/03/2011 05:04 AM, David Gibson wrote: I still don't understand the distinction you're making. We're saying the group is owned by a given user or guest in the sense that no-one else may use anything in the group (including host drivers). At that point none, some or all of the devices in

Re: kvm PCI assignment VFIO ramblings

2011-08-05 Thread Benjamin Herrenschmidt
On Thu, 2011-08-04 at 12:41 +0200, Joerg Roedel wrote: On Mon, Aug 01, 2011 at 02:27:36PM -0600, Alex Williamson wrote: It's not clear to me how we could skip it. With VT-d, we'd have to implement an emulated interrupt remapper and hope that the guest picks unused indexes in the host

Re: kvm PCI assignment VFIO ramblings

2011-08-05 Thread Benjamin Herrenschmidt
On Thu, 2011-08-04 at 12:27 +0200, Joerg Roedel wrote: Hi Ben, thanks for your detailed introduction to the requirements for POWER. It's good to know that the granularity problem is not x86-only. I'm happy to see your reply :-) I had the feeling I was a bit alone here... On Sat, Jul 30,

Re: kvm PCI assignment VFIO ramblings

2011-08-05 Thread Joerg Roedel
On Fri, Aug 05, 2011 at 08:26:11PM +1000, Benjamin Herrenschmidt wrote: On Thu, 2011-08-04 at 12:41 +0200, Joerg Roedel wrote: On Mon, Aug 01, 2011 at 02:27:36PM -0600, Alex Williamson wrote: It's not clear to me how we could skip it. With VT-d, we'd have to implement an emulated

Re: kvm PCI assignment VFIO ramblings

2011-08-05 Thread Joerg Roedel
On Fri, Aug 05, 2011 at 08:42:38PM +1000, Benjamin Herrenschmidt wrote: Right. In fact to try to clarify the problem for everybody, I think we can distinguish two different classes of constraints that can influence the grouping of devices: 1- Hard constraints. These are typically devices

Re: kvm PCI assignment VFIO ramblings

2011-08-05 Thread Alex Williamson
On Fri, 2011-08-05 at 20:42 +1000, Benjamin Herrenschmidt wrote: Right. In fact to try to clarify the problem for everybody, I think we can distinguish two different classes of constraints that can influence the grouping of devices: 1- Hard constraints. These are typically devices using the

Re: kvm PCI assignment VFIO ramblings

2011-08-05 Thread Benjamin Herrenschmidt
On Fri, 2011-08-05 at 15:44 +0200, Joerg Roedel wrote: On Fri, Aug 05, 2011 at 08:42:38PM +1000, Benjamin Herrenschmidt wrote: Right. In fact to try to clarify the problem for everybody, I think we can distinguish two different classes of constraints that can influence the grouping of

Re: kvm PCI assignment VFIO ramblings

2011-08-04 Thread Joerg Roedel
On Sat, Jul 30, 2011 at 12:20:08PM -0600, Alex Williamson wrote: On Sat, 2011-07-30 at 09:58 +1000, Benjamin Herrenschmidt wrote: - The -minimum- granularity of pass-through is not always a single device and not always under SW control But IMHO, we need to preserve the granularity of

Re: kvm PCI assignment VFIO ramblings

2011-08-04 Thread Joerg Roedel
Hi Ben, thanks for your detailed introduction to the requirements for POWER. It's good to know that the granularity problem is not x86-only. On Sat, Jul 30, 2011 at 09:58:53AM +1000, Benjamin Herrenschmidt wrote: In IBM POWER land, we call this a partitionable endpoint (the term endpoint here

Re: kvm PCI assignment VFIO ramblings

2011-08-04 Thread Joerg Roedel
On Mon, Aug 01, 2011 at 02:27:36PM -0600, Alex Williamson wrote: It's not clear to me how we could skip it. With VT-d, we'd have to implement an emulated interrupt remapper and hope that the guest picks unused indexes in the host interrupt remapping table before it could do anything useful

Re: kvm PCI assignment VFIO ramblings

2011-08-03 Thread David Gibson
On Tue, Aug 02, 2011 at 09:44:49PM -0600, Alex Williamson wrote: On Wed, 2011-08-03 at 12:04 +1000, David Gibson wrote: On Tue, Aug 02, 2011 at 12:35:19PM -0600, Alex Williamson wrote: On Tue, 2011-08-02 at 12:14 -0600, Alex Williamson wrote: On Tue, 2011-08-02 at 18:28 +1000, David

Re: kvm PCI assignment VFIO ramblings

2011-08-02 Thread Avi Kivity
On 08/01/2011 11:27 PM, Alex Williamson wrote: On Sun, 2011-07-31 at 17:09 +0300, Avi Kivity wrote: On 07/30/2011 02:58 AM, Benjamin Herrenschmidt wrote: Due to our paravirt nature, we don't need to masquerade the MSI-X table for example. At all. If the guest configures crap into it,

Re: kvm PCI assignment VFIO ramblings

2011-08-02 Thread Avi Kivity
On 08/02/2011 04:27 AM, Benjamin Herrenschmidt wrote: I have a feeling you'll be getting the same capabilities sooner or later, or you won't be able to make use of S/R IOV VFs. I'm not sure why you mean. We can do SR/IOV just fine (well, with some limitations due to constraints with how

Re: kvm PCI assignment VFIO ramblings

2011-08-02 Thread David Gibson
On Sat, Jul 30, 2011 at 12:20:08PM -0600, Alex Williamson wrote: On Sat, 2011-07-30 at 09:58 +1000, Benjamin Herrenschmidt wrote: [snip] On x86, the USB controllers don't typically live behind a PCIe-to-PCI bridge, so don't suffer the source identifier problem, but they do often share an

Re: kvm PCI assignment VFIO ramblings

2011-08-02 Thread Benjamin Herrenschmidt
On Tue, 2011-08-02 at 12:12 +0300, Avi Kivity wrote: On 08/02/2011 04:27 AM, Benjamin Herrenschmidt wrote: I have a feeling you'll be getting the same capabilities sooner or later, or you won't be able to make use of S/R IOV VFs. I'm not sure why you mean. We can do SR/IOV just

Re: kvm PCI assignment VFIO ramblings

2011-08-02 Thread Avi Kivity
On 08/02/2011 03:58 PM, Benjamin Herrenschmidt wrote: What you mean 2-level is two passes through two trees (ie 6 or 8 levels right ?). (16 or 25) 25 levels ? You mean 25 loads to get to a translation ? And you get any kind of performance out of that ? :-) Aggressive partial

Re: kvm PCI assignment VFIO ramblings

2011-08-02 Thread Alex Williamson
On Tue, 2011-08-02 at 11:27 +1000, Benjamin Herrenschmidt wrote: It's a shared address space. With a basic configuration on p7ioc for example we have MMIO going from 3G to 4G (PCI side addresses). BARs contain the normal PCI address there. But that 1G is divided in 128 segments of equal size
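
The arithmetic behind the p7ioc example above can be worked through in a few lines. This is a sketch under stated assumptions: "3G to 4G" is read as 3 GiB to 4 GiB, the 128 segments are numbered from 0 starting at the window base, and the names `MMIO_BASE`, `SEGMENT_SIZE` and `segment_of` are invented for illustration rather than taken from any driver.

```python
MMIO_BASE = 3 << 30          # start of the PCI-side MMIO window (3 GiB)
MMIO_SIZE = 1 << 30          # window size (1 GiB)
NUM_SEGMENTS = 128           # equal-sized segments, per the message above
SEGMENT_SIZE = MMIO_SIZE // NUM_SEGMENTS  # 8 MiB per segment

def segment_of(pci_addr):
    """Map a PCI-side MMIO address to its segment index (0..127)."""
    if not MMIO_BASE <= pci_addr < MMIO_BASE + MMIO_SIZE:
        raise ValueError("address outside the 3G-4G MMIO window")
    return (pci_addr - MMIO_BASE) // SEGMENT_SIZE
```

Under these assumptions each segment covers 8 MiB, so a BAR larger than that would span more than one segment.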

Re: kvm PCI assignment VFIO ramblings

2011-08-02 Thread Alex Williamson
On Tue, 2011-08-02 at 22:58 +1000, Benjamin Herrenschmidt wrote: Don't worry, it took me a while to get my head around the HW :-) SR-IOV VFs will generally not have limitations like that no, but on the other hand, they -will- still require 1 VF = 1 group, ie, you won't be able to take a

Re: kvm PCI assignment VFIO ramblings

2011-08-02 Thread Alex Williamson
On Tue, 2011-08-02 at 18:28 +1000, David Gibson wrote: On Sat, Jul 30, 2011 at 12:20:08PM -0600, Alex Williamson wrote: On Sat, 2011-07-30 at 09:58 +1000, Benjamin Herrenschmidt wrote: [snip] On x86, the USB controllers don't typically live behind a PCIe-to-PCI bridge, so don't suffer the

Re: kvm PCI assignment VFIO ramblings

2011-08-02 Thread Alex Williamson
On Tue, 2011-08-02 at 12:14 -0600, Alex Williamson wrote: On Tue, 2011-08-02 at 18:28 +1000, David Gibson wrote: On Sat, Jul 30, 2011 at 12:20:08PM -0600, Alex Williamson wrote: On Sat, 2011-07-30 at 09:58 +1000, Benjamin Herrenschmidt wrote: [snip] On x86, the USB controllers don't
