RE: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-08 Thread Leonid Grossman


 -----Original Message-----
 From: Fischer, Anna [mailto:[EMAIL PROTECTED]]
 Sent: Saturday, November 08, 2008 3:10 AM
 To: Greg KH; Yu Zhao
 Cc: Matthew Wilcox; Anthony Liguori; H L; [EMAIL PROTECTED]; [EMAIL PROTECTED]; Chiang, Alexander; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]; kvm@vger.kernel.org; [EMAIL PROTECTED]; [EMAIL PROTECTED]; Leonid Grossman; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
 Subject: RE: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
 


  But would such an api really take advantage of the new IOV interfaces
  that are exposed by the new device type?
 
 I agree with what Yu says. The idea is to have hardware capabilities to
 virtualize a PCI device in a way that those virtual devices can
 represent full PCI devices. The advantage of that is that those virtual
 devices can then be used like any other standard PCI device, meaning we
 can use existing OS tools, configuration mechanisms, etc. to start
 working with them. Also, when using a virtualization-based system, e.g.
 Xen or KVM, we do not need to introduce new mechanisms to make use of
 SR-IOV, because we can handle VFs as full PCI devices.
 
 A virtual PCI device in hardware (a VF) can be as powerful or complex as
 you like, or it can be very simple. But the big advantage of SR-IOV is
 that the hardware presents a complete PCI device to the OS - as opposed
 to some resources, or queues, that need specific new configuration and
 assignment mechanisms in order to be used with a guest OS (like, for
 example, VMDq or similar technologies).
 
 Anna


Ditto.
Taking the netdev interface as an example - a queue pair is a great way
to scale across CPU cores in a single OS image, but it is just not a good
way to share a device across multiple OS images.
The best unit of virtualization is a VF that is implemented as a complete
netdev PCI device (not a subset of a PCI device).
This way, native netdev device drivers can work for direct hw access to a
VF as is, and most/all Linux networking features (including VMQ) will
work in a guest.
Also, guest migration for netdev interfaces (both direct and virtual) can
be supported via a native Linux mechanism (the bonding driver), while
Dom0 retains veto power over any guest direct-interface operation it
deems privileged (VLAN, MAC address, promiscuous mode, bandwidth
allocation between VFs, etc.).
 
Leonid


RE: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-06 Thread Leonid Grossman


 -----Original Message-----
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Zhao, Yu
 Sent: Thursday, November 06, 2008 11:06 PM
 To: Chris Wright
 Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]; Matthew Wilcox; Greg KH; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]; kvm@vger.kernel.org; [EMAIL PROTECTED]; [EMAIL PROTECTED]
 Subject: Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
 
 Chris Wright wrote:
  * Greg KH ([EMAIL PROTECTED]) wrote:
  On Thu, Nov 06, 2008 at 10:47:41AM -0700, Matthew Wilcox wrote:
  On Thu, Nov 06, 2008 at 08:49:19AM -0800, Greg KH wrote:
  On Thu, Nov 06, 2008 at 08:41:53AM -0800, H L wrote:
  I have not modified any existing drivers, but instead I threw together
  a bare-bones module enabling me to make a call to pci_iov_register()
  and then poke at an SR-IOV adapter's /sys entries for which no driver
  was loaded.
 
  It appears from my perusal thus far that drivers using these new
  SR-IOV patches will require modification; i.e. the driver associated
  with the Physical Function (PF) will be required to make the
  pci_iov_register() call along with the requisite notify() function.
  Essentially this suggests to me a model in which the PF driver performs
  any global actions or setup on behalf of VFs before enabling them,
  after which VF drivers could be associated.
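
For illustration only, a PF driver following that model might look
roughly like the sketch below. pci_iov_register() comes from the patch
set under discussion and its prototype is not quoted here, so the
signature, the notify() callback shape, and pci_iov_unregister() are all
assumptions made for the sake of the example.

#include <linux/module.h>
#include <linux/pci.h>

/* Assumed callback shape: the IOV core notifies the PF driver about VF
 * events (e.g. enable/disable); the real notify() contract may differ. */
static int foo_pf_notify(struct pci_dev *pf_dev, unsigned int event)
{
        return 0;
}

static int foo_pf_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
        int err;

        err = pci_enable_device(pdev);
        if (err)
                return err;

        /* Global setup done on behalf of the VFs goes here, before the
         * VFs are enabled and VF drivers can bind to them. */

        /* Assumed call: register this PF with the SR-IOV core. */
        return pci_iov_register(pdev, foo_pf_notify);
}

static void foo_pf_remove(struct pci_dev *pdev)
{
        pci_iov_unregister(pdev);       /* assumed counterpart */
        pci_disable_device(pdev);
}
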
  Where would the VF drivers have to be associated?  On the pci_dev
  level or on a higher one?
 
  Will all drivers that want to bind to a VF device need to be
  rewritten?
  The current model being implemented by my colleagues has separate
  drivers for the PF (aka native) and VF devices.  I don't personally
  believe this is the correct path, but I'm reserving judgement until I
  see some code.
  Hm, I would like to see that code before we can properly evaluate this
  interface.  Especially as they are all tightly tied together.
 
  I don't think we really know what the One True Usage model is for VF
  devices.  Chris Wright has some ideas, I have some ideas and Yu Zhao
  has some ideas.  I bet there's other people who have other ideas too.
  I'd love to hear those ideas.
 
  First there's the question of how to represent the VF on the host.
  Ideally (IMO) this would show up as a normal interface so that normal
  tools can configure the interface.  This is not exactly how the first
  round of patches were designed.
 
 Whether the VF shows up as a normal interface is decided by the VF
 driver. A VF is represented by a 'pci_dev' at the PCI level, so the VF
 driver can be loaded as a normal PCI device driver.
 
 The software representation (eth, framebuffer, etc.) created by the VF
 driver is not controlled by the SR-IOV framework.
 
 So you definitely can use normal tools to configure the VF if its
 driver supports that :-)
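
To make that concrete, a VF driver can be nothing more than a standard
pci_driver keyed on whatever vendor/device ID the vendor assigns to its
virtual functions. Below is a minimal sketch; the 0xabcd/0x1515 IDs and
the foo_vf_* names are placeholders, not a real device or driver.

#include <linux/module.h>
#include <linux/pci.h>

/* Placeholder IDs; a real VF advertises whatever vendor/device ID its
 * hardware vendor chose for the virtual function. */
static const struct pci_device_id foo_vf_ids[] = {
        { PCI_DEVICE(0xabcd, 0x1515) },
        { }
};
MODULE_DEVICE_TABLE(pci, foo_vf_ids);

static int foo_vf_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
        /* The VF arrives here as an ordinary struct pci_dev; whether it
         * becomes an eth interface, a framebuffer, etc. is entirely up
         * to this driver. */
        return pci_enable_device(pdev);
}

static void foo_vf_remove(struct pci_dev *pdev)
{
        pci_disable_device(pdev);
}

static struct pci_driver foo_vf_driver = {
        .name     = "foo_vf",
        .id_table = foo_vf_ids,
        .probe    = foo_vf_probe,
        .remove   = foo_vf_remove,
};

static int __init foo_vf_init(void)
{
        return pci_register_driver(&foo_vf_driver);
}

static void __exit foo_vf_exit(void)
{
        pci_unregister_driver(&foo_vf_driver);
}

module_init(foo_vf_init);
module_exit(foo_vf_exit);
MODULE_LICENSE("GPL");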
 
 
  Second there's the question of reserving the BDF on the host such that
  we don't have two drivers (one in the host and one in a guest) trying
  to drive the same device (an issue that shows up for device assignment
  as well as VF assignment).
 
 If we don't reserve a BDF for the device, it can't work in either the
 host or the guest.
 
 Without a BDF, we can't access the device's config space, and the
 device can't do DMA either.
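
To illustrate why (a hypothetical helper, not from the patches): every
config-space accessor in the kernel is routed by the device's
bus/device/function, so without a BDF there is nothing for calls like
these to address, and the function can't act as a DMA initiator either.

#include <linux/pci.h>

/* Reads the vendor/device ID of a VF through its config space;
 * pci_read_config_word() is addressed via the pci_dev's BDF. */
static void foo_show_vf_ids(struct pci_dev *vf)
{
        u16 vendor, device;

        pci_read_config_word(vf, PCI_VENDOR_ID, &vendor);
        pci_read_config_word(vf, PCI_DEVICE_ID, &device);

        dev_info(&vf->dev, "VF %04x:%04x at %s\n",
                 vendor, device, pci_name(vf));
}
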
 
 Did I miss your point?
 
 
  Third there's the question of whether the VF can be used in the host
  at all.
 
 Why not? My VFs work well in the host as normal PCI devices :-)
 
 
  Fourth there's the question of whether the VF and PF drivers are the
  same or separate.
 
 As I mentioned in another email in this thread, we can't predict how
 hardware vendors will design their SR-IOV devices. The PCI SIG doesn't
 define device-specific logic.
 
 So I think the answer to this question is up to the device driver
 developers. If the PF and VFs in an SR-IOV device have similar logic,
 then they can share a driver. Otherwise, e.g., if the PF doesn't have
 real functionality at all -- it only has registers to control internal
 resource allocation for the VFs -- then the drivers should be separate,
 right?
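
For the "similar logic" case, sharing a driver can be as simple as
listing both IDs in one id_table and branching in probe. A sketch with
made-up numbers - the 0xabcd vendor ID and the 0x1000/0x1001 PF/VF
device IDs are assumptions, not a real part:

#include <linux/pci.h>

#define FOO_VENDOR_ID   0xabcd  /* placeholder vendor ID */
#define FOO_PF_DEVID    0x1000  /* placeholder PF device ID */
#define FOO_VF_DEVID    0x1001  /* placeholder VF device ID */

static const struct pci_device_id foo_ids[] = {
        { PCI_DEVICE(FOO_VENDOR_ID, FOO_PF_DEVID) },
        { PCI_DEVICE(FOO_VENDOR_ID, FOO_VF_DEVID) },
        { }
};

static int foo_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
        int err = pci_enable_device(pdev);

        if (err)
                return err;

        if (id->device == FOO_PF_DEVID) {
                /* PF-only duties: internal resource allocation for the
                 * VFs, enabling them, and so on. */
        } else {
                /* VF path: plain device bring-up, no IOV management. */
        }
        return 0;
}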


Right, this really depends upon the functionality behind a VF. If a VF is
done as a subset of a netdev interface (for example, a queue pair), then
a split VF/PF driver model and a proprietary communication channel are in
order.

If each VF is done as a complete netdev interface (like in our 10GbE IOV
controllers), then the PF and VF drivers could be the same. Each VF can
be independently driven by such a native netdev driver; this includes the
ability to run a native driver in a guest in passthrough mode.
A PF driver in a privileged domain doesn't even have to be present.

 
 
  The typical usecase is assigning the VF to the guest directly, so
  there's only enough functionality in the host side to allocate a VF,
  configure it, and assign it (and propagate AER).  This is with
  separate PF and VF driver.
 
  As Anthony mentioned, we are interested in allowing the host to use
  the VF.  This could be useful for containers as well as