Re: RFC: vfio / device assignment -- layout of device fd files

2011-09-02 Thread Michael S. Tsirkin
On Thu, Sep 01, 2011 at 03:26:01PM -0500, Scott Wood wrote:
 On 09/01/2011 03:00 PM, Michael S. Tsirkin wrote:
  That's a very rich interface, and easy to get wrong.
  AFAIK the only reason vfio uses read/write for PCI was to avoid inventing
  a custom interface. But if you do, it looks like a set of ioctls would
  be much easier? You can even fit the existing uio infrastructure if you 
  like.
 
 How would it be easier than producing/parsing a static data structure?
 What would it look like?

For example, for a property X, instead of adding a structure
with identifier X, implement ioctl GET_X. Userspace
calls this ioctl, an error implies the property
is not present.


  Here's another idea:  all the information is likely already available
  in sysfs.
 
 The only major thing that is likely available elsewhere is PCI config
 space, and that was not new to this proposal.
 Most other material is specifically related to the vfio/dtio interface
 (e.g. offsets into the file handle, arguments to the get irq fd ioctl,
 mapping of dtio regions/interrupts to device tree nodes), and could not
 be useful to more than just vfio.

For example resources are already there in sysfs.

  A way to query where the device is in sysfs
  would give you *a ton* of information, including the type etc,
 
 For PCI, the user has domain/bus/dev/fn which should be sufficient to
 find that, if desired.  For device-tree devices, there's a device tree
 path provided for each region/interrupt.
 
 -Scott

So you are saying the user already has sysfs path?
I thought the point was to pass all info through
a single fd.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: vfio / device assignment -- layout of device fd files

2011-09-02 Thread Scott Wood
On 09/02/2011 10:57 AM, Michael S. Tsirkin wrote:
 On Thu, Sep 01, 2011 at 03:26:01PM -0500, Scott Wood wrote:
 On 09/01/2011 03:00 PM, Michael S. Tsirkin wrote:
 That's a very rich interface, and easy to get wrong.
 AFAIK the only reason vfio uses read/write for PCI was to avoid inventing
 a custom interface. But if you do, it looks like a set of ioctls would
 be much easier? You can even fit the existing uio infrastructure if you 
 like.

 How would it be easier than producing/parsing a static data structure?
 What would it look like?
 
 For example, for a property X, instead of adding a structure
 with identifier X, implement ioctl GET_X. Userspace
 calls this ioctl, an error implies the property
 is not present.

Why is this better?

 Here's another idea:  all the information is likely already available
 in sysfs.

 The only major thing that is likely available elsewhere is PCI config
 space, and that was not new to this proposal.
 Most other material is specifically related to the vfio/dtio interface
 (e.g. offsets into the file handle, arguments to the get irq fd ioctl,
 mapping of dtio regions/interrupts to device tree nodes), and could not
 be useful to more than just vfio.
 
 For example resources are already there in sysfs.

Their relationship to vfio regions, and their offsets in the fd, is not.

 A way to query where the device is in sysfs
 would give you *a ton* of information, including the type etc,

 For PCI, the user has domain/bus/dev/fn which should be sufficient to
 find that, if desired.  For device-tree devices, there's a device tree
 path provided for each region/interrupt.

 -Scott
 
 So you are saying the user already has sysfs path?
 I thought the point was to pass all info through
 a single fd.

No, as I understand it the point is to provide access through that fd,
as well as information that is specific to that fd.  Whether any
particular extra bit of information gets included there is a question of
convenience -- which should not be ignored in interface design.

-Scott

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: vfio / device assignment -- layout of device fd files

2011-09-01 Thread Michael S. Tsirkin
On Mon, Aug 29, 2011 at 04:51:20PM +, Yoder Stuart-B08248 wrote:
 Alex Graf, Scott Wood, and I met last week to try to flesh out
 some details as to how vfio could work for non-PCI devices,
 like we have in embedded systems.   This most likely will
 require a different kernel driver than vfio-- for now we are
 calling it dtio (for device tree I/O) as there is no way
 to discover these devices except from the device tree.   But
 the dtio driver would use the same architecture and interfaces
 as vfio.
 
 For devices on a system bus and represented in a device
 tree we have some different requirements than PCI for what
 is exposed in the device fd file.  A device may have multiple
 address regions, multiple interrupts, a variable length device
 tree path, whether a region is mmapable, etc.
 
 With existing vfio, the device fd file layout is something
 like:
   0xF Config space offset
   ...
   0x6 ROM offset
   0x5 BAR 5 offset
   0x4 BAR 4 offset
   0x3 BAR 3 offset
   0x2 BAR 2 offset
   0x1 BAR 1 offset
   0x0 BAR 0 offset
 
 We have an alternate proposal that we think is more flexible,
 extensible, and will accommodate both PCI and system bus
 type devices (and others).
 
 Instead of config space fixed at 0xf, we would propose
 a header and multiple 'device info' records at offset 0x0 that would
 encode everything that user space needs to know about
 the device:
 

That's a very rich interface, and easy to get wrong.
AFAIK the only reason vfio uses read/write for PCI was to avoid inventing
a custom interface. But if you do, it looks like a set of ioctls would
be much easier? You can even fit the existing uio infrastructure if you like.

 There may be other more complex device or bus types that
 need their own special encodings, and this format would
 allow the definition of new records to define devices.  Two
 other types that come to mind are Serial Rapid I/O busses
 commonly used in our networking SoCs and the FSL DPAA
 portals which are very strange devices that may require
 their own unique interface exposed to user space.
 
 In short, when user space opens up a device fd it needs
 some information about what this device is, and this
 proposal tries to address that.
 
 Regards,
 Stuart Yoder

Here's another idea:  all the information is likely already available
in sysfs. A way to query where the device is in sysfs
would give you *a ton* of information, including the type etc,
if the info is not there you will be able to add it
in a way that is useful to more than just vfio,
and you won't need to extend the interface each time
you find you need a new piece of info.


-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: vfio / device assignment -- layout of device fd files

2011-09-01 Thread Scott Wood
On 09/01/2011 03:00 PM, Michael S. Tsirkin wrote:
 That's a very rich interface, and easy to get wrong.
 AFAIK the only reason vfio uses read/write for PCI was to avoid inventing
 a custom interface. But if you do, it looks like a set of ioctls would
 be much easier? You can even fit the existing uio infrastructure if you like.

How would it be easier than producing/parsing a static data structure?
What would it look like?

 Here's another idea:  all the information is likely already available
 in sysfs.

The only major thing that is likely available elsewhere is PCI config
space, and that was not new to this proposal.

Most other material is specifically related to the vfio/dtio interface
(e.g. offsets into the file handle, arguments to the get irq fd ioctl,
mapping of dtio regions/interrupts to device tree nodes), and could not
be useful to more than just vfio.

 A way to query where the device is in sysfs
 would give you *a ton* of information, including the type etc,

For PCI, the user has domain/bus/dev/fn which should be sufficient to
find that, if desired.  For device-tree devices, there's a device tree
path provided for each region/interrupt.

-Scott

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: vfio / device assignment -- layout of device fd files

2011-08-30 Thread Scott Wood
On 08/29/2011 11:55 PM, Alex Williamson wrote:
 Thanks for the example.  Is it always the case that you need a path and
 an index?  If so, why are they separate sub-regions instead of combined
 into a DT_INFO sub-region?

You'll always need both.  DTPATH was put into a separate subrecord
because it is of variable length.  Putting it at the end of DTINDEX
would inhibit DTINDEX's extensibility.

 On Mon, 2011-08-29 at 18:14 -0500, Scott Wood wrote:
 On 08/29/2011 05:46 PM, Alex Williamson wrote:
 I could imagine a platform device described by ACPI that might want to
 differentiate.  The physical device doesn't get moved of course, but
 guest drivers might care how the device is described if we need to
 rebuild those ACPI tables.  ACPI might even be a good place to leverage
 these data structures... /me ducks.

 ACPI info could be another subrecord type, but in the device tree
 system-bus case we generally don't have this information at the generic
 infrastructure level.  Drivers are expected to know how their devices'
 regions should be mapped.
 
 The device tree tells them how they're mapped, right?
 Or maybe more precisely, the device tree tells them where they're mapped and 
 it
 doesn't really matter whether they're 32bit or 64bit because they can't
 be moved.

The device tree says where they're mapped.  The concept of 32/64bit
doesn't really apply to a fixed mapping -- or if not fixed, there may be
arbitrary constraints, and the OS should just treat it as fixed.

 Maybe this is sub-region material.  It just feels wrong to enumerate a
 region and not be able to include any basic properties beyond offset and
 size in a common field.

offset, size, and whether you can use mmap() are the only things we
could think of that were universal.  The rest can go in subrecords.

 Put their instance numbers outside of the BAR region?  We have a fixed
 REGION space on PCI, so we could just define BAR0 == instance 0, BAR1 ==
 instance 1... ROM == instance 6, CONFIG == instance 0xF (or 7).

 Seems more awkward than just having each region say what it is.  What do
 you do to fill in the gaps?
 
 You don't, instance numbers would just be non-contiguous.

OK, maybe I'm not understanding what you mean by instance numbers -- I
thought you were talking about the order in which they appear.  If you
mean to put an index field in the common REGION struct, I'd prefer not
to.  It would lack meaning without the context of some particular
encoding scheme.  Better to put it in a typed subrecord where we can be
explicit about what it means.

 The reason I cringe at PCI_INFO sub-regions is that all the info is already 
 available
 in PCI config space

BAR index isn't.  I'm neutral on whether it makes sense to include BAR type.

 We also have MSI/X
 interrupts on PCI.  Do we need to describe those via info records or
 parse PCI config space and know that vfio-pci device fds support the
 VFIO_PCI_DEVICE_SET_MSI_FDS ioctl?

I'm not very familiar with MSI-X, but if it's workable without it then
probably no need.

-Scott

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RFC: vfio / device assignment -- layout of device fd files

2011-08-29 Thread Yoder Stuart-B08248
Alex Graf, Scott Wood, and I met last week to try to flesh out
some details as to how vfio could work for non-PCI devices,
like we have in embedded systems.   This most likely will
require a different kernel driver than vfio-- for now we are
calling it dtio (for device tree I/O) as there is no way
to discover these devices except from the device tree.   But
the dtio driver would use the same architecture and interfaces
as vfio.

For devices on a system bus and represented in a device
tree we have some different requirements than PCI for what
is exposed in the device fd file.  A device may have multiple
address regions, multiple interrupts, a variable length device
tree path, whether a region is mmapable, etc.

With existing vfio, the device fd file layout is something
like:
  0xF Config space offset
  ...
  0x6 ROM offset
  0x5 BAR 5 offset
  0x4 BAR 4 offset
  0x3 BAR 3 offset
  0x2 BAR 2 offset
  0x1 BAR 1 offset
  0x0 BAR 0 offset

We have an alternate proposal that we think is more flexible,
extensible, and will accommodate both PCI and system bus
type devices (and others).

Instead of config space fixed at 0xf, we would propose
a header and multiple 'device info' records at offset 0x0 that would
encode everything that user space needs to know about
the device:

  0x0  +-+-+
   | magic   |   version   | u64   // magic u64 identifies the type of
   |   vfio| |   // passthru I/O, plus version #
   |   dtio| |   //   vfio - PCI devices
   +-+-+   //   dtio - device tree devices
   | flags | u32   // encodes any flags (TBD)
   +---+
   |  dev info record N|
   |type   | u32   // type of record
   |rec_len| u32   // length in bytes of record
   |   |  (including record header)
   |flags  | u32   // type specific flags
   |...content...  |   // record content, which could
   +---+   // include sub-records
   |  dev info record N+1  |
   +---+
   |  dev info record N+2  |
   +---+
   ...

The device info records following the file header have the following
record types each with content encoded in a record specific way:

 REGION  - describes an addressable address range for the device
 DTPATH - describes the device tree path for the device
 DTINDEX - describes the index into the related device tree
   property (reg,ranges,interrupts,interrupt-map)
 INTERRUPT - describes an interrupt for the device
 PCI_CONFIG_SPACE - describes config space for the device
 PCI_INFO - domain:bus:device:func
 PCI_BAR_INFO - information about the BAR for a device

For a device tree type device the file may look like:

 0x0+---+
|header |  
+---+
|   type = REGION   |  
|   rec_len |  
|   flags = | u32 // region specific flags
|   is_mmapable | 
|   offset  | u64 // seek offset to region from
|   |from beginning
|   len | u64 // length of region
|   addr| u64 // phys addr of region
|   |  
+---+
 \   type = DTPATH  \  // a sub-region
  |   rec_len|  
  |   flags  |  
  |   dev tree path  | char[] // device tree path
+---+
 \   type = DTINDEX \  // a sub-region
  |   rec_len|  
  |   flags  |  
  |   prop_type  | u32  // REG, RANGES
  |   prop_index | u32  // index  into resource list
+---+
|  type = INTERRUPT |  
|  rec_len  |  
|  flags| u32 
|  ioctl_handle | u32 // argument to ioctl to get interrupts
|   |  
+---+
 \   type = DTPATH \  // a sub-region
  |   rec_len   |  
  |   flags |  
  |   dev tree path |  char[] // device tree path
+---+
  \   type = DTINDEX   \  // a sub-region 
  |   rec_len   |  
  |   flags |  
  |   prop_type | u32 // INTERRUPT,INTERRUPT_MAP
  |   prop_index| u32 // index


PCI devices would have a PCI specific encoding.  Instead of
config space and the mappable BAR regions being at specific
predetermined offsets, the device info records 

Re: [Qemu-devel] RFC: vfio / device assignment -- layout of device fd files

2011-08-29 Thread Anthony Liguori

On 08/29/2011 11:51 AM, Yoder Stuart-B08248 wrote:

Alex Graf, Scott Wood, and I met last week to try to flesh out
some details as to how vfio could work for non-PCI devices,
like we have in embedded systems.   This most likely will
require a different kernel driver than vfio-- for now we are
calling it dtio (for device tree I/O) as there is no way
to discover these devices except from the device tree.   But
the dtio driver would use the same architecture and interfaces
as vfio.

For devices on a system bus and represented in a device
tree we have some different requirements than PCI for what
is exposed in the device fd file.  A device may have multiple
address regions, multiple interrupts, a variable length device
tree path, whether a region is mmapable, etc.

With existing vfio, the device fd file layout is something
like:
   0xF Config space offset
   ...
   0x6 ROM offset
   0x5 BAR 5 offset
   0x4 BAR 4 offset
   0x3 BAR 3 offset
   0x2 BAR 2 offset
   0x1 BAR 1 offset
   0x0 BAR 0 offset

We have an alternate proposal that we think is more flexible,
extensible, and will accommodate both PCI and system bus
type devices (and others).

Instead of config space fixed at 0xf, we would propose
a header and multiple 'device info' records at offset 0x0 that would
encode everything that user space needs to know about
the device:


Why not just use an ioctl with a proper struct?

The config space is weird for PCI access because it's mirroring a well 
known binary blob.  It's not something to replicate if you're inventing 
something new.


Regards,

Anthony Liguori
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] RFC: vfio / device assignment -- layout of device fd files

2011-08-29 Thread Scott Wood
On 08/29/2011 02:04 PM, Anthony Liguori wrote:
 On 08/29/2011 11:51 AM, Yoder Stuart-B08248 wrote:
 Instead of config space fixed at 0xf, we would propose
 a header and multiple 'device info' records at offset 0x0 that would
 encode everything that user space needs to know about
 the device:
 
 Why not just use an ioctl with a proper struct?

This is more extensible than a struct -- both in features, and in the
number of each type of resource that you can have, length of strings you
can have, etc.

 The config space is weird for PCI access because it's mirroring a well
 known binary blob.  It's not something to replicate if you're inventing
 something new.

There's no intent to replicate config space in general -- config space
is provided as-is.  There's little overlap between config space and the
extra information provided.  Length can be had from config space, but
only by modifying it.  Physical address sort-of overlaps, though bus
addresess could be different from CPU physical addresses[1].  In both
cases, it'd be nice to stay consistent with device-tree regions.

BAR type is overlap, but doesn't seem too unreasonable to me.

-Scott

[1] The user is probably less likely to care about the physical address
at all in the PCI case, but consistency is nice.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: vfio / device assignment -- layout of device fd files

2011-08-29 Thread Alex Williamson
On Mon, 2011-08-29 at 16:51 +, Yoder Stuart-B08248 wrote:
 Alex Graf, Scott Wood, and I met last week to try to flesh out
 some details as to how vfio could work for non-PCI devices,
 like we have in embedded systems.   This most likely will
 require a different kernel driver than vfio-- for now we are
 calling it dtio (for device tree I/O) as there is no way
 to discover these devices except from the device tree.   But
 the dtio driver would use the same architecture and interfaces
 as vfio.

Why is this a different kernel driver?  The difference will primarily be
in what bus types vfio registers drivers and the set of device types the
device fds support.  The group and iommu interfaces will be shared.
This sounds more like vfio .config options (CONFIG_VFIO_PCI,
CONFIG_VFIO_DT).

 For devices on a system bus and represented in a device
 tree we have some different requirements than PCI for what
 is exposed in the device fd file.  A device may have multiple
 address regions, multiple interrupts, a variable length device
 tree path, whether a region is mmapable, etc.
 
 With existing vfio, the device fd file layout is something
 like:
   0xF Config space offset
   ...
   0x6 ROM offset
   0x5 BAR 5 offset
   0x4 BAR 4 offset
   0x3 BAR 3 offset
   0x2 BAR 2 offset
   0x1 BAR 1 offset
   0x0 BAR 0 offset
 
 We have an alternate proposal that we think is more flexible,
 extensible, and will accommodate both PCI and system bus
 type devices (and others).
 
 Instead of config space fixed at 0xf, we would propose
 a header and multiple 'device info' records at offset 0x0 that would
 encode everything that user space needs to know about
 the device:
 
   0x0  +-+-+
| magic   |   version   | u64   // magic u64 identifies the type of
|   vfio| |   // passthru I/O, plus version #
|   dtio| |   //   vfio - PCI devices
+-+-+   //   dtio - device tree devices

Maybe magic = pci, dt, ...

| flags | u32   // encodes any flags (TBD)
+---+
|  dev info record N|
|type   | u32   // type of record
|rec_len| u32   // length in bytes of record
|   |  (including record header)
|flags  | u32   // type specific flags
|...content...  |   // record content, which could
+---+   // include sub-records
|  dev info record N+1  |
+---+
|  dev info record N+2  |
+---+
...
 
 The device info records following the file header have the following
 record types each with content encoded in a record specific way:
 
  REGION  - describes an addressable address range for the device
  DTPATH - describes the device tree path for the device
  DTINDEX - describes the index into the related device tree
property (reg,ranges,interrupts,interrupt-map)

I don't quite understand if these are physical or virtual.

  INTERRUPT - describes an interrupt for the device
  PCI_CONFIG_SPACE - describes config space for the device

I would have expected this to be a REGION with a property of
PCI_CONFIG_SPACE.

  PCI_INFO - domain:bus:device:func

Not entirely sure we need this.  How are you imagining we get from a
group fd to a device fd (wondering if you're only including this for
enumeration)?  I'm currently coding it as a VFIO_GROUP_GET_DEVICE_FD
ioctl that takes a char* parameter that contains the dev_name() for the
device requested.  The list of devices under each group can be found by
read()ing the group fd.  If we keep this, we should make the interfaces
similar, in fact, maybe this is how we describe the capabilities of the
iommu too, reading a table from the iommu fd.

  PCI_BAR_INFO - information about the BAR for a device
 
 For a device tree type device the file may look like:
 
  0x0+---+
 |header |  
 +---+
 |   type = REGION   |  
 |   rec_len |  
 |   flags = | u32 // region specific flags
 |   is_mmapable | 
 |   offset  | u64 // seek offset to region from
 |   |from beginning
 |   len | u64 // length of region
 |   addr| u64 // phys addr of region

Would we only need to expose phys addr for 1:1 mapping requirements?
I'm not sure why we'd care to expose this otherwise.

 |   |  
 +---+
  \   type = DTPATH  \  // a sub-region
   |   rec_len|  
   |   flags  |  
   |   dev tree path 

Re: RFC: vfio / device assignment -- layout of device fd files

2011-08-29 Thread Scott Wood
On 08/29/2011 02:51 PM, Alex Williamson wrote:
 On Mon, 2011-08-29 at 16:51 +, Yoder Stuart-B08248 wrote:
 The device info records following the file header have the following
 record types each with content encoded in a record specific way:

  REGION  - describes an addressable address range for the device
  DTPATH - describes the device tree path for the device
  DTINDEX - describes the index into the related device tree
property (reg,ranges,interrupts,interrupt-map)
 
 I don't quite understand if these are physical or virtual.

If what are physical or virtual?

  INTERRUPT - describes an interrupt for the device
  PCI_CONFIG_SPACE - describes config space for the device
 
 I would have expected this to be a REGION with a property of
 PCI_CONFIG_SPACE.

Could be, if physical address is made optional.

  PCI_INFO - domain:bus:device:func
 
 Not entirely sure we need this.  How are you imagining we get from a
 group fd to a device fd (wondering if you're only including this for
 enumeration)?  I'm currently coding it as a VFIO_GROUP_GET_DEVICE_FD
 ioctl that takes a char* parameter that contains the dev_name() for the
 device requested.  The list of devices under each group can be found by
 read()ing the group fd.  

Seems like it would be nice to keep this information around, rather than
require the user to pass it around separately.  Shouldn't cost much.

 If we keep this, we should make the interfaces similar, 

Yes.

  PCI_BAR_INFO - information about the BAR for a device

 For a device tree type device the file may look like:

  0x0+---+
 |header |  
 +---+
 |   type = REGION   |  
 |   rec_len |  
 |   flags = | u32 // region specific flags
 |   is_mmapable | 
 |   offset  | u64 // seek offset to region from
 |   |from beginning
 |   len | u64 // length of region
 |   addr| u64 // phys addr of region
 
 Would we only need to expose phys addr for 1:1 mapping requirements?
 I'm not sure why we'd care to expose this otherwise.

It's more important for non-PCI, where it avoids the need for userspace
to parse the device tree to find the guest address (we'll usually want
1:1), or to consolidate pages shared by multiple regions.  It could be
nice for debugging, as well.

Seems like something that's better to have and not need, than the other
way around.  It would need to be optional, though, if we want to be able
to describe things without a physical address like config space.

 |   |  
 +---+
  \   type = DTPATH  \  // a sub-region
   |   rec_len|  
   |   flags  |  
   |   dev tree path  | char[] // device tree path
 +---+
  \   type = DTINDEX \  // a sub-region
   |   rec_len|  
   |   flags  |  
   |   prop_type  | u32  // REG, RANGES
   |   prop_index | u32  // index  into resource list
 +---+
 |  type = INTERRUPT |  
 |  rec_len  |  
 |  flags| u32 
 |  ioctl_handle | u32 // argument to ioctl to get interrupts
 
 Is this a dynamic ioctl number or just a u32 parameter to an ioctl like
 VFIO_DEVICE_GET_IRQ_FD (ie. an instance number)?

It's a parameter to VFIO_DEVICE_GET_IRQ_FD.

 +---+
 |   type = PCI_INFO |  
 |   rec_len |  
 |   flags = 0x0 |  
 |   dom:bus:dev:func| u32 // pci device info
 +---+
 |   type = REGION   |  
 |   rec_len |  
 |   flags = |  
 |   is_mmapable |  
 |   offset  | u64 // seek offset to region from
 |   |from beginning
 |   len | u64 // length of region
 |   addr| u64 // physical addr of region
 +---+
  \   type = PCI_BAR_INFO\  
   |   rec_len|  
   |   flags  |  
   |   bar_type   |  // pio
   |  |  // prefetable mmio
   |  |  // non-prefetchable mmmio
   |   bar_index  |  // index of bar in device
 
 Aren't a lot of these typical region attributes?  Wondering if we should
 just make them part of the REGION flags or we'll have a growing number
 of sub-regions describing common things.

It'd be nice to keep the base region record common among
PCI/DT/whatever.


Re: RFC: vfio / device assignment -- layout of device fd files

2011-08-29 Thread Alex Williamson
On Mon, 2011-08-29 at 16:58 -0500, Scott Wood wrote:
 On 08/29/2011 02:51 PM, Alex Williamson wrote:
  On Mon, 2011-08-29 at 16:51 +, Yoder Stuart-B08248 wrote:
  The device info records following the file header have the following
  record types each with content encoded in a record specific way:
 
   REGION  - describes an addressable address range for the device
   DTPATH - describes the device tree path for the device
   DTINDEX - describes the index into the related device tree
 property (reg,ranges,interrupts,interrupt-map)
  
  I don't quite understand if these are physical or virtual.
 
 If what are physical or virtual?

Can you give an example of a path vs an index?  I don't understand
enough about these to ask a useful question about what they're
describing.

   INTERRUPT - describes an interrupt for the device
   PCI_CONFIG_SPACE - describes config space for the device
  
  I would have expected this to be a REGION with a property of
  PCI_CONFIG_SPACE.
 
 Could be, if physical address is made optional.

Or physical address is also a property, aka sub-region.

   PCI_INFO - domain:bus:device:func
  
  Not entirely sure we need this.  How are you imagining we get from a
  group fd to a device fd (wondering if you're only including this for
  enumeration)?  I'm currently coding it as a VFIO_GROUP_GET_DEVICE_FD
  ioctl that takes a char* parameter that contains the dev_name() for the
  device requested.  The list of devices under each group can be found by
  read()ing the group fd.  
 
 Seems like it would be nice to keep this information around, rather than
 require the user to pass it around separately.  Shouldn't cost much.

Seems redundant.  The user had to know the d:b:d.f to get the device fd
in the first place.  At best it's a sanity check.

  If we keep this, we should make the interfaces similar, 
 
 Yes.
 
   PCI_BAR_INFO - information about the BAR for a device
 
  For a device tree type device the file may look like:
 
   0x0+---+
  |header |  
  +---+
  |   type = REGION   |  
  |   rec_len |  
  |   flags = | u32 // region specific flags
  |   is_mmapable | 
  |   offset  | u64 // seek offset to region from
  |   |from beginning
  |   len | u64 // length of region
  |   addr| u64 // phys addr of region
  
  Would we only need to expose phys addr for 1:1 mapping requirements?
  I'm not sure why we'd care to expose this otherwise.
 
 It's more important for non-PCI, where it avoids the need for userspace
 to parse the device tree to find the guest address (we'll usually want
 1:1), or to consolidate pages shared by multiple regions.  It could be
 nice for debugging, as well.

So the device tree path is ripped straight from the system, so it's the
actual 1:1, matching physical hardware, path.

 Seems like something that's better to have and not need, than the other
 way around.  It would need to be optional, though, if we want to be able
 to describe things without a physical address like config space.

sub-region?

  |   |  
  +---+
   \   type = DTPATH  \  // a sub-region
|   rec_len|  
|   flags  |  
|   dev tree path  | char[] // device tree path
  +---+
   \   type = DTINDEX \  // a sub-region
|   rec_len|  
|   flags  |  
|   prop_type  | u32  // REG, RANGES
|   prop_index | u32  // index  into resource list
  +---+
  |  type = INTERRUPT |  
  |  rec_len  |  
  |  flags| u32 
  |  ioctl_handle | u32 // argument to ioctl to get 
  interrupts
  
  Is this a dynamic ioctl number or just a u32 parameter to an ioctl like
  VFIO_DEVICE_GET_IRQ_FD (ie. an instance number)?
 
 It's a parameter to VFIO_DEVICE_GET_IRQ_FD.
 
  +---+
  |   type = PCI_INFO |  
  |   rec_len |  
  |   flags = 0x0 |  
  |   dom:bus:dev:func| u32 // pci device info
  +---+
  |   type = REGION   |  
  |   rec_len |  
  |   flags = |  
  |   is_mmapable |  
  |   offset  | u64 // seek offset to region from
  |   |from beginning
  |   len | u64 // length of region
  |   addr| u64 // physical addr of region
  +---+
   \   

Re: RFC: vfio / device assignment -- layout of device fd files

2011-08-29 Thread Scott Wood
On 08/29/2011 05:46 PM, Alex Williamson wrote:
 On Mon, 2011-08-29 at 16:58 -0500, Scott Wood wrote:
 On 08/29/2011 02:51 PM, Alex Williamson wrote:
 On Mon, 2011-08-29 at 16:51 +, Yoder Stuart-B08248 wrote:
 The device info records following the file header have the following
 record types each with content encoded in a record specific way:

  REGION  - describes an addressable address range for the device
  DTPATH - describes the device tree path for the device
  DTINDEX - describes the index into the related device tree
property (reg,ranges,interrupts,interrupt-map)

 I don't quite understand if these are physical or virtual.

 If what are physical or virtual?
 
 Can you give an example of a path vs an index?  I don't understand
 enough about these to ask a useful question about what they're
 describing.

You'd have both path and index.

Example, for this tree:

/ {
...
foo {
...
bar {
reg = 0x1000 64 0x1800 64;
ranges = 0 0x2 0x1;
...

child {
reg = 0x100 0x100;
...
};
};
};
};

There would be 4 regions if you bind to /foo/bar:

// this is 64 bytes at 0x1000
DTPATH /foo/bar
DTINDEX prop_type=REG prop_index=0

// this is 64 bytes at 0x1800
DTPATH /foo/bar
DTINDEX prop_type=REG prop_index=1

// this is 16K at 0x2
DTPATH /foo/bar
DTINDEX prop_type=RANGES prop_index=0

// this is 256 bytes at 0x20100
DTPATH /foo/bar/child
DTINDEX prop_type=REG prop_index=0

Both ranges and the child reg are needed, since ranges could be a simple
ranges; that passes everything with no translation, and child nodes
could be absent-but-implied in some other cases (such as when they
represent PCI devices which can be probed -- we still need to map the
ranges that correspond to PCI controller windows).

  INTERRUPT - describes an interrupt for the device
  PCI_CONFIG_SPACE - describes config space for the device

 I would have expected this to be a REGION with a property of
 PCI_CONFIG_SPACE.

 Could be, if physical address is made optional.
 
 Or physical address is also a property, aka sub-region.

A subrecord of REGION is fine with me.

 Would we only need to expose phys addr for 1:1 mapping requirements?
 I'm not sure why we'd care to expose this otherwise.

 It's more important for non-PCI, where it avoids the need for userspace
 to parse the device tree to find the guest address (we'll usually want
 1:1), or to consolidate pages shared by multiple regions.  It could be
 nice for debugging, as well.
 
 So the device tree path is ripped straight from the system, so it's the
 actual 1:1, matching physical hardware, path.

Yes.

 Even for non-PCI we need to
 know if the region is pio/mmio32/mmio64/prefetchable/etc.

 Outside of PCI, what standardized form would you put such information
 in?  Where would the kernel get this information?  What does
 mmio32/mmio64 mean in this context?
 
 I could imagine a platform device described by ACPI that might want to
 differentiate.  The physical device doesn't get moved of course, but
 guest drivers might care how the device is described if we need to
 rebuild those ACPI tables.  ACPI might even be a good place to leverage
 these data structures... /me ducks.

ACPI info could be another subrecord type, but in the device tree
system-bus case we generally don't have this information at the generic
infrastructure level.  Drivers are expected to know how their devices'
regions should be mapped.

 BAR index could really just translate to a REGION instance number.

 How would that work if you make non-BAR things (such as config space)
 into regions?
 
 Put their instance numbers outside of the BAR region?  We have a fixed
 REGION space on PCI, so we could just define BAR0 == instance 0, BAR1 ==
 instance 1... ROM == instance 6, CONFIG == instance 0xF (or 7).

Seems more awkward than just having each region say what it is.  What do
you do to fill in the gaps?

-Scott

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: vfio / device assignment -- layout of device fd files

2011-08-29 Thread Alex Williamson
On Mon, 2011-08-29 at 18:14 -0500, Scott Wood wrote:
 On 08/29/2011 05:46 PM, Alex Williamson wrote:
  On Mon, 2011-08-29 at 16:58 -0500, Scott Wood wrote:
  On 08/29/2011 02:51 PM, Alex Williamson wrote:
  On Mon, 2011-08-29 at 16:51 +, Yoder Stuart-B08248 wrote:
  The device info records following the file header have the following
  record types each with content encoded in a record specific way:
 
   REGION  - describes an addressable address range for the device
   DTPATH - describes the device tree path for the device
   DTINDEX - describes the index into the related device tree
 property (reg,ranges,interrupts,interrupt-map)
 
  I don't quite understand if these are physical or virtual.
 
  If what are physical or virtual?
  
  Can you give an example of a path vs an index?  I don't understand
  enough about these to ask a useful question about what they're
  describing.
 
 You'd have both path and index.
 
 Example, for this tree:
 
 / {
   ...
   foo {
   ...
   bar {
   reg = 0x1000 64 0x1800 64;
   ranges = 0 0x2 0x1;
   ...
 
   child {
   reg = 0x100 0x100;
   ...
   };
   };
   };
 };
 
 There would be 4 regions if you bind to /foo/bar:
 
 // this is 64 bytes at 0x1000
 DTPATH /foo/bar
 DTINDEX prop_type=REG prop_index=0
 
 // this is 64 bytes at 0x1800
 DTPATH /foo/bar
 DTINDEX prop_type=REG prop_index=1
 
 // this is 16K at 0x2
 DTPATH /foo/bar
 DTINDEX prop_type=RANGES prop_index=0
 
 // this is 256 bytes at 0x20100
 DTPATH /foo/bar/child
 DTINDEX prop_type=REG prop_index=0
 
 Both ranges and the child reg are needed, since ranges could be a simple
 ranges; that passes everything with no translation, and child nodes
 could be absent-but-implied in some other cases (such as when they
 represent PCI devices which can be probed -- we still need to map the
 ranges that correspond to PCI controller windows).

Thanks for the example.  Is it always the case that you need a path and
an index?  If so, why are they separate sub-regions instead of combined
into a DT_INFO sub-region?

   INTERRUPT - describes an interrupt for the device
   PCI_CONFIG_SPACE - describes config space for the device
 
  I would have expected this to be a REGION with a property of
  PCI_CONFIG_SPACE.
 
  Could be, if physical address is made optional.
  
  Or physical address is also a property, aka sub-region.
 
 A subrecord of REGION is fine with me.
 
  Would we only need to expose phys addr for 1:1 mapping requirements?
  I'm not sure why we'd care to expose this otherwise.
 
  It's more important for non-PCI, where it avoids the need for userspace
  to parse the device tree to find the guest address (we'll usually want
  1:1), or to consolidate pages shared by multiple regions.  It could be
  nice for debugging, as well.
  
  So the device tree path is ripped straight from the system, so it's the
  actual 1:1, matching physical hardware, path.
 
 Yes.
 
  Even for non-PCI we need to
  know if the region is pio/mmio32/mmio64/prefetchable/etc.
 
  Outside of PCI, what standardized form would you put such information
  in?  Where would the kernel get this information?  What does
  mmio32/mmio64 mean in this context?
  
  I could imagine a platform device described by ACPI that might want to
  differentiate.  The physical device doesn't get moved of course, but
  guest drivers might care how the device is described if we need to
  rebuild those ACPI tables.  ACPI might even be a good place to leverage
  these data structures... /me ducks.
 
 ACPI info could be another subrecord type, but in the device tree
 system-bus case we generally don't have this information at the generic
 infrastructure level.  Drivers are expected to know how their devices'
 regions should be mapped.

The device tree tells them how they're mapped, right?  Or maybe more
precisely, the device tree tells them where they're mapped and it
doesn't really matter whether they're 32bit or 64bit because they can't
be moved.

Maybe this is sub-region material.  It just feels wrong to enumerate a
region and not be able to include any basic properties beyond offset and
size in a common field.  For PCI, we can also describe the properties
via config space, so sub-regions could still be optional.

  BAR index could really just translate to a REGION instance number.
 
  How would that work if you make non-BAR things (such as config space)
  into regions?
  
  Put their instance numbers outside of the BAR region?  We have a fixed
  REGION space on PCI, so we could just define BAR0 == instance 0, BAR1 ==
  instance 1... ROM == instance 6, CONFIG == instance 0xF (or 7).
 
 Seems more awkward than just having each region say what it is.  What do
 you do to fill in the gaps?

You don't, instance numbers would just be non-contiguous.  The