Re: RFC: vfio / device assignment -- layout of device fd files
On Thu, Sep 01, 2011 at 03:26:01PM -0500, Scott Wood wrote: On 09/01/2011 03:00 PM, Michael S. Tsirkin wrote: That's a very rich interface, and easy to get wrong. AFAIK the only reason vfio uses read/write for PCI was to avoid inventing a custom interface. But if you do, it looks like a set of ioctls would be much easier? You can even fit the existing uio infrastructure if you like. How would it be easier than producing/parsing a static data structure? What would it look like? For example, for a property X, instead of adding a structure with identifier X, implement ioctl GET_X. Userspace calls this ioctl, an error implies the property is not present. Here's another idea: all the information is likely already available in sysfs. The only major thing that is likely available elsewhere is PCI config space, and that was not new to this proposal. Most other material is specifically related to the vfio/dtio interface (e.g. offsets into the file handle, arguments to the get irq fd ioctl, mapping of dtio regions/interrupts to device tree nodes), and could not be useful to more than just vfio. For example resources are already there in sysfs. A way to query where the device is in sysfs would give you *a ton* of information, including the type etc, For PCI, the user has domain/bus/dev/fn which should be sufficient to find that, if desired. For device-tree devices, there's a device tree path provided for each region/interrupt. -Scott So you are saying the user already has sysfs path? I thought the point was to pass all info through a single fd. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RFC: vfio / device assignment -- layout of device fd files
On 09/02/2011 10:57 AM, Michael S. Tsirkin wrote: On Thu, Sep 01, 2011 at 03:26:01PM -0500, Scott Wood wrote: On 09/01/2011 03:00 PM, Michael S. Tsirkin wrote: That's a very rich interface, and easy to get wrong. AFAIK the only reason vfio uses read/write for PCI was to avoid inventing a custom interface. But if you do, it looks like a set of ioctls would be much easier? You can even fit the existing uio infrastructure if you like. How would it be easier than producing/parsing a static data structure? What would it look like? For example, for a property X, instead of adding a structure with identifier X, implement ioctl GET_X. Userspace calls this ioctl, an error implies the property is not present. Why is this better? Here's another idea: all the information is likely already available in sysfs. The only major thing that is likely available elsewhere is PCI config space, and that was not new to this proposal. Most other material is specifically related to the vfio/dtio interface (e.g. offsets into the file handle, arguments to the get irq fd ioctl, mapping of dtio regions/interrupts to device tree nodes), and could not be useful to more than just vfio. For example resources are already there in sysfs. Their relationship to vfio regions, and their offsets in the fd, is not. A way to query where the device is in sysfs would give you *a ton* of information, including the type etc, For PCI, the user has domain/bus/dev/fn which should be sufficient to find that, if desired. For device-tree devices, there's a device tree path provided for each region/interrupt. -Scott So you are saying the user already has sysfs path? I thought the point was to pass all info through a single fd. No, as I understand it the point is to provide access through that fd, as well as information that is specific to that fd. Whether any particular extra bit of information gets included there is a question of convenience -- which should not be ignored in interface design. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RFC: vfio / device assignment -- layout of device fd files
On Mon, Aug 29, 2011 at 04:51:20PM +, Yoder Stuart-B08248 wrote: Alex Graf, Scott Wood, and I met last week to try to flesh out some details as to how vfio could work for non-PCI devices, like we have in embedded systems. This most likely will require a different kernel driver than vfio-- for now we are calling it dtio (for device tree I/O) as there is no way to discover these devices except from the device tree. But the dtio driver would use the same architecture and interfaces as vfio. For devices on a system bus and represented in a device tree we have some different requirements than PCI for what is exposed in the device fd file. A device may have multiple address regions, multiple interrupts, a variable length device tree path, whether a region is mmapable, etc. With existing vfio, the device fd file layout is something like: 0xF Config space offset ... 0x6 ROM offset 0x5 BAR 5 offset 0x4 BAR 4 offset 0x3 BAR 3 offset 0x2 BAR 2 offset 0x1 BAR 1 offset 0x0 BAR 0 offset We have an alternate proposal that we think is more flexible, extensible, and will accommodate both PCI and system bus type devices (and others). Instead of config space fixed at 0xf, we would propose a header and multiple 'device info' records at offset 0x0 that would encode everything that user space needs to know about the device: That's a very rich interface, and easy to get wrong. AFAIK the only reason vfio uses read/write for PCI was to avoid inventing a custom interface. But if you do, it looks like a set of ioctls would be much easier? You can even fit the existing uio infrastructure if you like. There may be other more complex device or bus types that need their own special encodings, and this format would allow the definition of new records to define devices. Two other types that come to mind are Serial Rapid I/O busses commonly used in our networking SoCs and the FSL DPAA portals which are very strange devices that may require their own unique interface exposed to user space. In short, when user space opens up a device fd it needs some information about what this device is, and this proposal tries to address that. Regards, Stuart Yoder Here's another idea: all the information is likely already available in sysfs. A way to query where the device is in sysfs would give you *a ton* of information, including the type etc, if the info is not there you will be able to add it in a way that is useful to more than just vfio, and you won't need to extend the interface each time you find you need a new piece of info. -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RFC: vfio / device assignment -- layout of device fd files
On 09/01/2011 03:00 PM, Michael S. Tsirkin wrote: That's a very rich interface, and easy to get wrong. AFAIK the only reason vfio uses read/write for PCI was to avoid inventing a custom interface. But if you do, it looks like a set of ioctls would be much easier? You can even fit the existing uio infrastructure if you like. How would it be easier than producing/parsing a static data structure? What would it look like? Here's another idea: all the information is likely already available in sysfs. The only major thing that is likely available elsewhere is PCI config space, and that was not new to this proposal. Most other material is specifically related to the vfio/dtio interface (e.g. offsets into the file handle, arguments to the get irq fd ioctl, mapping of dtio regions/interrupts to device tree nodes), and could not be useful to more than just vfio. A way to query where the device is in sysfs would give you *a ton* of information, including the type etc, For PCI, the user has domain/bus/dev/fn which should be sufficient to find that, if desired. For device-tree devices, there's a device tree path provided for each region/interrupt. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RFC: vfio / device assignment -- layout of device fd files
On 08/29/2011 11:55 PM, Alex Williamson wrote: Thanks for the example. Is it always the case that you need a path and an index? If so, why are they separate sub-regions instead of combined into a DT_INFO sub-region? You'll always need both. DTPATH was put into a separate subrecord because it is of variable length. Putting it at the end of DTINDEX would inhibit DTINDEX's extensibility. On Mon, 2011-08-29 at 18:14 -0500, Scott Wood wrote: On 08/29/2011 05:46 PM, Alex Williamson wrote: I could imagine a platform device described by ACPI that might want to differentiate. The physical device doesn't get moved of course, but guest drivers might care how the device is described if we need to rebuild those ACPI tables. ACPI might even be a good place to leverage these data structures... /me ducks. ACPI info could be another subrecord type, but in the device tree system-bus case we generally don't have this information at the generic infrastructure level. Drivers are expected to know how their devices' regions should be mapped. The device tree tells them how they're mapped, right? Or maybe more precisely, the device tree tells them where they're mapped and it doesn't really matter whether they're 32bit or 64bit because they can't be moved. The device tree says where they're mapped. The concept of 32/64bit doesn't really apply to a fixed mapping -- or if not fixed, there may be arbitrary constraints, and the OS should just treat it as fixed. Maybe this is sub-region material. It just feels wrong to enumerate a region and not be able to include any basic properties beyond offset and size in a common field. offset, size, and whether you can use mmap() are the only things we could think of that were universal. The rest can go in subrecords. Put their instance numbers outside of the BAR region? We have a fixed REGION space on PCI, so we could just define BAR0 == instance 0, BAR1 == instance 1... ROM == instance 6, CONFIG == instance 0xF (or 7). Seems more awkward than just having each region say what it is. What do you do to fill in the gaps? You don't, instance numbers would just be non-contiguous. OK, maybe I'm not understanding what you mean by instance numbers -- I thought you were talking about the order in which they appear. If you mean to put an index field in the common REGION struct, I'd prefer not to. It would lack meaning without the context of some particular encoding scheme. Better to put it in a typed subrecord where we can be explicit about what it means. The reason I cringe at PCI_INFO sub-regions is that all the info is already available in PCI config space BAR index isn't. I'm neutral on whether it makes sense to include BAR type. We also have MSI/X interrupts on PCI. Do we need to describe those via info records or parse PCI config space and know that vfio-pci device fds support the VFIO_PCI_DEVICE_SET_MSI_FDS ioctl? I'm not very familiar with MSI-X, but if it's workable without it then probably no need. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RFC: vfio / device assignment -- layout of device fd files
Alex Graf, Scott Wood, and I met last week to try to flesh out some details as to how vfio could work for non-PCI devices, like we have in embedded systems. This most likely will require a different kernel driver than vfio-- for now we are calling it dtio (for device tree I/O) as there is no way to discover these devices except from the device tree. But the dtio driver would use the same architecture and interfaces as vfio. For devices on a system bus and represented in a device tree we have some different requirements than PCI for what is exposed in the device fd file. A device may have multiple address regions, multiple interrupts, a variable length device tree path, whether a region is mmapable, etc. With existing vfio, the device fd file layout is something like: 0xF Config space offset ... 0x6 ROM offset 0x5 BAR 5 offset 0x4 BAR 4 offset 0x3 BAR 3 offset 0x2 BAR 2 offset 0x1 BAR 1 offset 0x0 BAR 0 offset We have an alternate proposal that we think is more flexible, extensible, and will accommodate both PCI and system bus type devices (and others). Instead of config space fixed at 0xf, we would propose a header and multiple 'device info' records at offset 0x0 that would encode everything that user space needs to know about the device: 0x0 +-+-+ | magic | version | u64 // magic u64 identifies the type of | vfio| | // passthru I/O, plus version # | dtio| | // vfio - PCI devices +-+-+ // dtio - device tree devices | flags | u32 // encodes any flags (TBD) +---+ | dev info record N| |type | u32 // type of record |rec_len| u32 // length in bytes of record | | (including record header) |flags | u32 // type specific flags |...content... | // record content, which could +---+ // include sub-records | dev info record N+1 | +---+ | dev info record N+2 | +---+ ... The device info records following the file header have the following record types each with content encoded in a record specific way: REGION - describes an addressable address range for the device DTPATH - describes the device tree path for the device DTINDEX - describes the index into the related device tree property (reg,ranges,interrupts,interrupt-map) INTERRUPT - describes an interrupt for the device PCI_CONFIG_SPACE - describes config space for the device PCI_INFO - domain:bus:device:func PCI_BAR_INFO - information about the BAR for a device For a device tree type device the file may look like: 0x0+---+ |header | +---+ | type = REGION | | rec_len | | flags = | u32 // region specific flags | is_mmapable | | offset | u64 // seek offset to region from | |from beginning | len | u64 // length of region | addr| u64 // phys addr of region | | +---+ \ type = DTPATH \ // a sub-region | rec_len| | flags | | dev tree path | char[] // device tree path +---+ \ type = DTINDEX \ // a sub-region | rec_len| | flags | | prop_type | u32 // REG, RANGES | prop_index | u32 // index into resource list +---+ | type = INTERRUPT | | rec_len | | flags| u32 | ioctl_handle | u32 // argument to ioctl to get interrupts | | +---+ \ type = DTPATH \ // a sub-region | rec_len | | flags | | dev tree path | char[] // device tree path +---+ \ type = DTINDEX \ // a sub-region | rec_len | | flags | | prop_type | u32 // INTERRUPT,INTERRUPT_MAP | prop_index| u32 // index PCI devices would have a PCI specific encoding. Instead of config space and the mappable BAR regions being at specific predetermined offsets, the device info records
Re: [Qemu-devel] RFC: vfio / device assignment -- layout of device fd files
On 08/29/2011 11:51 AM, Yoder Stuart-B08248 wrote: Alex Graf, Scott Wood, and I met last week to try to flesh out some details as to how vfio could work for non-PCI devices, like we have in embedded systems. This most likely will require a different kernel driver than vfio-- for now we are calling it dtio (for device tree I/O) as there is no way to discover these devices except from the device tree. But the dtio driver would use the same architecture and interfaces as vfio. For devices on a system bus and represented in a device tree we have some different requirements than PCI for what is exposed in the device fd file. A device may have multiple address regions, multiple interrupts, a variable length device tree path, whether a region is mmapable, etc. With existing vfio, the device fd file layout is something like: 0xF Config space offset ... 0x6 ROM offset 0x5 BAR 5 offset 0x4 BAR 4 offset 0x3 BAR 3 offset 0x2 BAR 2 offset 0x1 BAR 1 offset 0x0 BAR 0 offset We have an alternate proposal that we think is more flexible, extensible, and will accommodate both PCI and system bus type devices (and others). Instead of config space fixed at 0xf, we would propose a header and multiple 'device info' records at offset 0x0 that would encode everything that user space needs to know about the device: Why not just use an ioctl with a proper struct? The config space is weird for PCI access because it's mirroring a well known binary blob. It's not something to replicate if you're inventing something new. Regards, Anthony Liguori -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] RFC: vfio / device assignment -- layout of device fd files
On 08/29/2011 02:04 PM, Anthony Liguori wrote: On 08/29/2011 11:51 AM, Yoder Stuart-B08248 wrote: Instead of config space fixed at 0xf, we would propose a header and multiple 'device info' records at offset 0x0 that would encode everything that user space needs to know about the device: Why not just use an ioctl with a proper struct? This is more extensible than a struct -- both in features, and in the number of each type of resource that you can have, length of strings you can have, etc. The config space is weird for PCI access because it's mirroring a well known binary blob. It's not something to replicate if you're inventing something new. There's no intent to replicate config space in general -- config space is provided as-is. There's little overlap between config space and the extra information provided. Length can be had from config space, but only by modifying it. Physical address sort-of overlaps, though bus addresess could be different from CPU physical addresses[1]. In both cases, it'd be nice to stay consistent with device-tree regions. BAR type is overlap, but doesn't seem too unreasonable to me. -Scott [1] The user is probably less likely to care about the physical address at all in the PCI case, but consistency is nice. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RFC: vfio / device assignment -- layout of device fd files
On Mon, 2011-08-29 at 16:51 +, Yoder Stuart-B08248 wrote: Alex Graf, Scott Wood, and I met last week to try to flesh out some details as to how vfio could work for non-PCI devices, like we have in embedded systems. This most likely will require a different kernel driver than vfio-- for now we are calling it dtio (for device tree I/O) as there is no way to discover these devices except from the device tree. But the dtio driver would use the same architecture and interfaces as vfio. Why is this a different kernel driver? The difference will primarily be in what bus types vfio registers drivers and the set of device types the device fds support. The group and iommu interfaces will be shared. This sounds more like vfio .config options (CONFIG_VFIO_PCI, CONFIG_VFIO_DT). For devices on a system bus and represented in a device tree we have some different requirements than PCI for what is exposed in the device fd file. A device may have multiple address regions, multiple interrupts, a variable length device tree path, whether a region is mmapable, etc. With existing vfio, the device fd file layout is something like: 0xF Config space offset ... 0x6 ROM offset 0x5 BAR 5 offset 0x4 BAR 4 offset 0x3 BAR 3 offset 0x2 BAR 2 offset 0x1 BAR 1 offset 0x0 BAR 0 offset We have an alternate proposal that we think is more flexible, extensible, and will accommodate both PCI and system bus type devices (and others). Instead of config space fixed at 0xf, we would propose a header and multiple 'device info' records at offset 0x0 that would encode everything that user space needs to know about the device: 0x0 +-+-+ | magic | version | u64 // magic u64 identifies the type of | vfio| | // passthru I/O, plus version # | dtio| | // vfio - PCI devices +-+-+ // dtio - device tree devices Maybe magic = pci, dt, ... | flags | u32 // encodes any flags (TBD) +---+ | dev info record N| |type | u32 // type of record |rec_len| u32 // length in bytes of record | | (including record header) |flags | u32 // type specific flags |...content... | // record content, which could +---+ // include sub-records | dev info record N+1 | +---+ | dev info record N+2 | +---+ ... The device info records following the file header have the following record types each with content encoded in a record specific way: REGION - describes an addressable address range for the device DTPATH - describes the device tree path for the device DTINDEX - describes the index into the related device tree property (reg,ranges,interrupts,interrupt-map) I don't quite understand if these are physical or virtual. INTERRUPT - describes an interrupt for the device PCI_CONFIG_SPACE - describes config space for the device I would have expected this to be a REGION with a property of PCI_CONFIG_SPACE. PCI_INFO - domain:bus:device:func Not entirely sure we need this. How are you imagining we get from a group fd to a device fd (wondering if you're only including this for enumeration)? I'm currently coding it as a VFIO_GROUP_GET_DEVICE_FD ioctl that takes a char* parameter that contains the dev_name() for the device requested. The list of devices under each group can be found by read()ing the group fd. If we keep this, we should make the interfaces similar, in fact, maybe this is how we describe the capabilities of the iommu too, reading a table from the iommu fd. PCI_BAR_INFO - information about the BAR for a device For a device tree type device the file may look like: 0x0+---+ |header | +---+ | type = REGION | | rec_len | | flags = | u32 // region specific flags | is_mmapable | | offset | u64 // seek offset to region from | |from beginning | len | u64 // length of region | addr| u64 // phys addr of region Would we only need to expose phys addr for 1:1 mapping requirements? I'm not sure why we'd care to expose this otherwise. | | +---+ \ type = DTPATH \ // a sub-region | rec_len| | flags | | dev tree path
Re: RFC: vfio / device assignment -- layout of device fd files
On 08/29/2011 02:51 PM, Alex Williamson wrote: On Mon, 2011-08-29 at 16:51 +, Yoder Stuart-B08248 wrote: The device info records following the file header have the following record types each with content encoded in a record specific way: REGION - describes an addressable address range for the device DTPATH - describes the device tree path for the device DTINDEX - describes the index into the related device tree property (reg,ranges,interrupts,interrupt-map) I don't quite understand if these are physical or virtual. If what are physical or virtual? INTERRUPT - describes an interrupt for the device PCI_CONFIG_SPACE - describes config space for the device I would have expected this to be a REGION with a property of PCI_CONFIG_SPACE. Could be, if physical address is made optional. PCI_INFO - domain:bus:device:func Not entirely sure we need this. How are you imagining we get from a group fd to a device fd (wondering if you're only including this for enumeration)? I'm currently coding it as a VFIO_GROUP_GET_DEVICE_FD ioctl that takes a char* parameter that contains the dev_name() for the device requested. The list of devices under each group can be found by read()ing the group fd. Seems like it would be nice to keep this information around, rather than require the user to pass it around separately. Shouldn't cost much. If we keep this, we should make the interfaces similar, Yes. PCI_BAR_INFO - information about the BAR for a device For a device tree type device the file may look like: 0x0+---+ |header | +---+ | type = REGION | | rec_len | | flags = | u32 // region specific flags | is_mmapable | | offset | u64 // seek offset to region from | |from beginning | len | u64 // length of region | addr| u64 // phys addr of region Would we only need to expose phys addr for 1:1 mapping requirements? I'm not sure why we'd care to expose this otherwise. It's more important for non-PCI, where it avoids the need for userspace to parse the device tree to find the guest address (we'll usually want 1:1), or to consolidate pages shared by multiple regions. It could be nice for debugging, as well. Seems like something that's better to have and not need, than the other way around. It would need to be optional, though, if we want to be able to describe things without a physical address like config space. | | +---+ \ type = DTPATH \ // a sub-region | rec_len| | flags | | dev tree path | char[] // device tree path +---+ \ type = DTINDEX \ // a sub-region | rec_len| | flags | | prop_type | u32 // REG, RANGES | prop_index | u32 // index into resource list +---+ | type = INTERRUPT | | rec_len | | flags| u32 | ioctl_handle | u32 // argument to ioctl to get interrupts Is this a dynamic ioctl number or just a u32 parameter to an ioctl like VFIO_DEVICE_GET_IRQ_FD (ie. an instance number)? It's a parameter to VFIO_DEVICE_GET_IRQ_FD. +---+ | type = PCI_INFO | | rec_len | | flags = 0x0 | | dom:bus:dev:func| u32 // pci device info +---+ | type = REGION | | rec_len | | flags = | | is_mmapable | | offset | u64 // seek offset to region from | |from beginning | len | u64 // length of region | addr| u64 // physical addr of region +---+ \ type = PCI_BAR_INFO\ | rec_len| | flags | | bar_type | // pio | | // prefetable mmio | | // non-prefetchable mmmio | bar_index | // index of bar in device Aren't a lot of these typical region attributes? Wondering if we should just make them part of the REGION flags or we'll have a growing number of sub-regions describing common things. It'd be nice to keep the base region record common among PCI/DT/whatever.
Re: RFC: vfio / device assignment -- layout of device fd files
On Mon, 2011-08-29 at 16:58 -0500, Scott Wood wrote: On 08/29/2011 02:51 PM, Alex Williamson wrote: On Mon, 2011-08-29 at 16:51 +, Yoder Stuart-B08248 wrote: The device info records following the file header have the following record types each with content encoded in a record specific way: REGION - describes an addressable address range for the device DTPATH - describes the device tree path for the device DTINDEX - describes the index into the related device tree property (reg,ranges,interrupts,interrupt-map) I don't quite understand if these are physical or virtual. If what are physical or virtual? Can you give an example of a path vs an index? I don't understand enough about these to ask a useful question about what they're describing. INTERRUPT - describes an interrupt for the device PCI_CONFIG_SPACE - describes config space for the device I would have expected this to be a REGION with a property of PCI_CONFIG_SPACE. Could be, if physical address is made optional. Or physical address is also a property, aka sub-region. PCI_INFO - domain:bus:device:func Not entirely sure we need this. How are you imagining we get from a group fd to a device fd (wondering if you're only including this for enumeration)? I'm currently coding it as a VFIO_GROUP_GET_DEVICE_FD ioctl that takes a char* parameter that contains the dev_name() for the device requested. The list of devices under each group can be found by read()ing the group fd. Seems like it would be nice to keep this information around, rather than require the user to pass it around separately. Shouldn't cost much. Seems redundant. The user had to know the d:b:d.f to get the device fd in the first place. At best it's a sanity check. If we keep this, we should make the interfaces similar, Yes. PCI_BAR_INFO - information about the BAR for a device For a device tree type device the file may look like: 0x0+---+ |header | +---+ | type = REGION | | rec_len | | flags = | u32 // region specific flags | is_mmapable | | offset | u64 // seek offset to region from | |from beginning | len | u64 // length of region | addr| u64 // phys addr of region Would we only need to expose phys addr for 1:1 mapping requirements? I'm not sure why we'd care to expose this otherwise. It's more important for non-PCI, where it avoids the need for userspace to parse the device tree to find the guest address (we'll usually want 1:1), or to consolidate pages shared by multiple regions. It could be nice for debugging, as well. So the device tree path is ripped straight from the system, so it's the actual 1:1, matching physical hardware, path. Seems like something that's better to have and not need, than the other way around. It would need to be optional, though, if we want to be able to describe things without a physical address like config space. sub-region? | | +---+ \ type = DTPATH \ // a sub-region | rec_len| | flags | | dev tree path | char[] // device tree path +---+ \ type = DTINDEX \ // a sub-region | rec_len| | flags | | prop_type | u32 // REG, RANGES | prop_index | u32 // index into resource list +---+ | type = INTERRUPT | | rec_len | | flags| u32 | ioctl_handle | u32 // argument to ioctl to get interrupts Is this a dynamic ioctl number or just a u32 parameter to an ioctl like VFIO_DEVICE_GET_IRQ_FD (ie. an instance number)? It's a parameter to VFIO_DEVICE_GET_IRQ_FD. +---+ | type = PCI_INFO | | rec_len | | flags = 0x0 | | dom:bus:dev:func| u32 // pci device info +---+ | type = REGION | | rec_len | | flags = | | is_mmapable | | offset | u64 // seek offset to region from | |from beginning | len | u64 // length of region | addr| u64 // physical addr of region +---+ \
Re: RFC: vfio / device assignment -- layout of device fd files
On 08/29/2011 05:46 PM, Alex Williamson wrote: On Mon, 2011-08-29 at 16:58 -0500, Scott Wood wrote: On 08/29/2011 02:51 PM, Alex Williamson wrote: On Mon, 2011-08-29 at 16:51 +, Yoder Stuart-B08248 wrote: The device info records following the file header have the following record types each with content encoded in a record specific way: REGION - describes an addressable address range for the device DTPATH - describes the device tree path for the device DTINDEX - describes the index into the related device tree property (reg,ranges,interrupts,interrupt-map) I don't quite understand if these are physical or virtual. If what are physical or virtual? Can you give an example of a path vs an index? I don't understand enough about these to ask a useful question about what they're describing. You'd have both path and index. Example, for this tree: / { ... foo { ... bar { reg = 0x1000 64 0x1800 64; ranges = 0 0x2 0x1; ... child { reg = 0x100 0x100; ... }; }; }; }; There would be 4 regions if you bind to /foo/bar: // this is 64 bytes at 0x1000 DTPATH /foo/bar DTINDEX prop_type=REG prop_index=0 // this is 64 bytes at 0x1800 DTPATH /foo/bar DTINDEX prop_type=REG prop_index=1 // this is 16K at 0x2 DTPATH /foo/bar DTINDEX prop_type=RANGES prop_index=0 // this is 256 bytes at 0x20100 DTPATH /foo/bar/child DTINDEX prop_type=REG prop_index=0 Both ranges and the child reg are needed, since ranges could be a simple ranges; that passes everything with no translation, and child nodes could be absent-but-implied in some other cases (such as when they represent PCI devices which can be probed -- we still need to map the ranges that correspond to PCI controller windows). INTERRUPT - describes an interrupt for the device PCI_CONFIG_SPACE - describes config space for the device I would have expected this to be a REGION with a property of PCI_CONFIG_SPACE. Could be, if physical address is made optional. Or physical address is also a property, aka sub-region. A subrecord of REGION is fine with me. Would we only need to expose phys addr for 1:1 mapping requirements? I'm not sure why we'd care to expose this otherwise. It's more important for non-PCI, where it avoids the need for userspace to parse the device tree to find the guest address (we'll usually want 1:1), or to consolidate pages shared by multiple regions. It could be nice for debugging, as well. So the device tree path is ripped straight from the system, so it's the actual 1:1, matching physical hardware, path. Yes. Even for non-PCI we need to know if the region is pio/mmio32/mmio64/prefetchable/etc. Outside of PCI, what standardized form would you put such information in? Where would the kernel get this information? What does mmio32/mmio64 mean in this context? I could imagine a platform device described by ACPI that might want to differentiate. The physical device doesn't get moved of course, but guest drivers might care how the device is described if we need to rebuild those ACPI tables. ACPI might even be a good place to leverage these data structures... /me ducks. ACPI info could be another subrecord type, but in the device tree system-bus case we generally don't have this information at the generic infrastructure level. Drivers are expected to know how their devices' regions should be mapped. BAR index could really just translate to a REGION instance number. How would that work if you make non-BAR things (such as config space) into regions? Put their instance numbers outside of the BAR region? We have a fixed REGION space on PCI, so we could just define BAR0 == instance 0, BAR1 == instance 1... ROM == instance 6, CONFIG == instance 0xF (or 7). Seems more awkward than just having each region say what it is. What do you do to fill in the gaps? -Scott -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RFC: vfio / device assignment -- layout of device fd files
On Mon, 2011-08-29 at 18:14 -0500, Scott Wood wrote: On 08/29/2011 05:46 PM, Alex Williamson wrote: On Mon, 2011-08-29 at 16:58 -0500, Scott Wood wrote: On 08/29/2011 02:51 PM, Alex Williamson wrote: On Mon, 2011-08-29 at 16:51 +, Yoder Stuart-B08248 wrote: The device info records following the file header have the following record types each with content encoded in a record specific way: REGION - describes an addressable address range for the device DTPATH - describes the device tree path for the device DTINDEX - describes the index into the related device tree property (reg,ranges,interrupts,interrupt-map) I don't quite understand if these are physical or virtual. If what are physical or virtual? Can you give an example of a path vs an index? I don't understand enough about these to ask a useful question about what they're describing. You'd have both path and index. Example, for this tree: / { ... foo { ... bar { reg = 0x1000 64 0x1800 64; ranges = 0 0x2 0x1; ... child { reg = 0x100 0x100; ... }; }; }; }; There would be 4 regions if you bind to /foo/bar: // this is 64 bytes at 0x1000 DTPATH /foo/bar DTINDEX prop_type=REG prop_index=0 // this is 64 bytes at 0x1800 DTPATH /foo/bar DTINDEX prop_type=REG prop_index=1 // this is 16K at 0x2 DTPATH /foo/bar DTINDEX prop_type=RANGES prop_index=0 // this is 256 bytes at 0x20100 DTPATH /foo/bar/child DTINDEX prop_type=REG prop_index=0 Both ranges and the child reg are needed, since ranges could be a simple ranges; that passes everything with no translation, and child nodes could be absent-but-implied in some other cases (such as when they represent PCI devices which can be probed -- we still need to map the ranges that correspond to PCI controller windows). Thanks for the example. Is it always the case that you need a path and an index? If so, why are they separate sub-regions instead of combined into a DT_INFO sub-region? INTERRUPT - describes an interrupt for the device PCI_CONFIG_SPACE - describes config space for the device I would have expected this to be a REGION with a property of PCI_CONFIG_SPACE. Could be, if physical address is made optional. Or physical address is also a property, aka sub-region. A subrecord of REGION is fine with me. Would we only need to expose phys addr for 1:1 mapping requirements? I'm not sure why we'd care to expose this otherwise. It's more important for non-PCI, where it avoids the need for userspace to parse the device tree to find the guest address (we'll usually want 1:1), or to consolidate pages shared by multiple regions. It could be nice for debugging, as well. So the device tree path is ripped straight from the system, so it's the actual 1:1, matching physical hardware, path. Yes. Even for non-PCI we need to know if the region is pio/mmio32/mmio64/prefetchable/etc. Outside of PCI, what standardized form would you put such information in? Where would the kernel get this information? What does mmio32/mmio64 mean in this context? I could imagine a platform device described by ACPI that might want to differentiate. The physical device doesn't get moved of course, but guest drivers might care how the device is described if we need to rebuild those ACPI tables. ACPI might even be a good place to leverage these data structures... /me ducks. ACPI info could be another subrecord type, but in the device tree system-bus case we generally don't have this information at the generic infrastructure level. Drivers are expected to know how their devices' regions should be mapped. The device tree tells them how they're mapped, right? Or maybe more precisely, the device tree tells them where they're mapped and it doesn't really matter whether they're 32bit or 64bit because they can't be moved. Maybe this is sub-region material. It just feels wrong to enumerate a region and not be able to include any basic properties beyond offset and size in a common field. For PCI, we can also describe the properties via config space, so sub-regions could still be optional. BAR index could really just translate to a REGION instance number. How would that work if you make non-BAR things (such as config space) into regions? Put their instance numbers outside of the BAR region? We have a fixed REGION space on PCI, so we could just define BAR0 == instance 0, BAR1 == instance 1... ROM == instance 6, CONFIG == instance 0xF (or 7). Seems more awkward than just having each region say what it is. What do you do to fill in the gaps? You don't, instance numbers would just be non-contiguous. The