Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-17 Thread Yu Zhao

Rusty Russell wrote:

On Friday 07 November 2008 18:17:54 Zhao, Yu wrote:
 Greg KH wrote:
  On Thu, Nov 06, 2008 at 04:40:21PM -0600, Anthony Liguori wrote:
  Greg KH wrote:
  On Thu, Nov 06, 2008 at 10:47:41AM -0700, Matthew Wilcox wrote:
  I don't think we really know what the One True Usage model is for VF
  devices. Chris Wright has some ideas, I have some ideas and Yu Zhao
  has some ideas. I bet there's other people who have other ideas too.
 
  I'd love to hear those ideas.
 
  We've been talking about avoiding hardware passthrough entirely and
  just backing a virtio-net backend driver by a dedicated VF in the
   host. That avoids a huge amount of guest-facing complexity, lets
  migration Just Work, and should give the same level of performance.

 This can be commonly used not only with VFs -- devices that have multiple
 DMA queues (e.g., Intel VMDq, Neterion Xframe) and even traditional
 devices can also take advantage of this.

 CC'ing Rusty Russell in case he has more comments.

Yes, even dumb devices could use this mechanism if you wanted to bind an 
entire device solely to one guest.


We don't have network infrastructure for this today, but my thought was 
to do something in dev_alloc_skb and dev_kfree_skb et al.
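
For illustration, one minimal shape such a hook could take (nothing like
this exists today; the struct, field and function names below are made up
purely for the sake of the example):

        /* hypothetical per-device allocator so that a backend (e.g. a
         * virtio-net backend bound to a VF) can hand out receive buffers
         * whose data pages come from guest memory rather than host memory */
        struct skb_alloc_ops {
                struct sk_buff *(*alloc_rx_skb)(struct net_device *dev,
                                                unsigned int length);
                void (*free_rx_skb)(struct net_device *dev,
                                    struct sk_buff *skb);
        };

        /* drivers would call this instead of dev_alloc_skb(); with no ops
         * installed it falls back to the normal allocation path */
        static inline struct sk_buff *dev_alloc_rx_skb(struct net_device *dev,
                                                       unsigned int length)
        {
                if (dev->skb_alloc_ops && dev->skb_alloc_ops->alloc_rx_skb)
                        return dev->skb_alloc_ops->alloc_rx_skb(dev, length);
                return netdev_alloc_skb(dev, length);
        }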


Is there any discussion about this on netdev? Is any prototype
available? If not, I'd like to create one and evaluate the performance
of the virtio-net solution against hardware passthrough.

Thanks,
Yu


Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-16 Thread Avi Kivity

Anthony Liguori wrote:
I don't think it's established that PV/VF will have less latency than 
using virtio-net.  virtio-net requires a world switch to send a group 
of packets.  The cost of this (if it stays in kernel) is only a few 
thousand cycles on the most modern processors.


Using VT-d means that for every DMA fetch that misses in the IOTLB, 
you potentially have to do four memory fetches to main memory.  There 
will be additional packet latency using VT-d compared to native, it's 
just not known how much at this time.


If the IOTLB has intermediate TLB entries like the processor, we're 
talking just one or two fetches.  That's a lot less than the cacheline 
bouncing that virtio and kvm interrupt injection incur right now.
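
To put rough numbers on that (these are assumptions, not measurements): at
roughly 60-100 ns per access to main memory, a four-level walk on an IOTLB
miss costs on the order of 250-400 ns per translation, while a hit in an
intermediate paging-structure cache that leaves only one or two fetches is
closer to 60-200 ns.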


--
error compiling committee.c: too many arguments to function



Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-16 Thread Zhao, Yu

Avi Kivity wrote:

Anthony Liguori wrote:
I don't think it's established that PV/VF will have less latency than 
using virtio-net.  virtio-net requires a world switch to send a group 
of packets.  The cost of this (if it stays in kernel) is only a few 
thousand cycles on the most modern processors.


Using VT-d means that for every DMA fetch that misses in the IOTLB, 
you potentially have to do four memory fetches to main memory.  There 
will be additional packet latency using VT-d compared to native, it's 
just not known how much at this time.


If the IOTLB has intermediate TLB entries like the processor, we're 
talking just one or two fetches.  That's a lot less than the cacheline 
bouncing that virtio and kvm interrupt injection incur right now.




The PCI-SIG Address Translation Services (ATS) specification defines a way
to use an Address Translation Cache (ATC) in the endpoint to reduce this
latency.


Linux kernel support for the ATS capability will come soon.

Thanks,
Yu


Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-14 Thread Greg KH
On Fri, Nov 14, 2008 at 03:56:19PM +0800, Zhao, Yu wrote:
 Hi Greg KH,

 I updated PF driver to use latest SR-IOV API in the patch set v6, and 
 attached it. Please take a look and please let us know if you have any 
 comments.

Is this driver already upstream?  If so, can you just send the diff that
adds the SR-IOV changes to it?  Otherwise it's a bit hard to just pick
out those pieces, right?

thanks,

greg k-h


RE: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-14 Thread Rose, Gregory V
It's not upstream yet.  However, if you grep through for CONFIG_PCI_IOV you'll 
see all the relevant code in those sections.

- Greg (Rose that is)

-Original Message-
From: Greg KH [mailto:[EMAIL PROTECTED]
Sent: Friday, November 14, 2008 9:40 AM
To: Zhao, Yu
Cc: Rose, Gregory V; Dong, Eddie; kvm@vger.kernel.org; Barnes, Jesse; Ronciak, 
John; Nakajima, Jun; Yu, Wilfred; Li, Xin B; Li, Susie
Subject: Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

On Fri, Nov 14, 2008 at 03:56:19PM +0800, Zhao, Yu wrote:
 Hi Greg KH,

 I updated PF driver to use latest SR-IOV API in the patch set v6, and
 attached it. Please take a look and please let us know if you have any
 comments.

Is this driver already upstream?  If so, can you just send the diff that
adds the SR-IOV changes to it?  Otherwise it's a bit hard to just pick
out those pieces, right?

thanks,

greg k-h


Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-14 Thread Greg KH
On Fri, Nov 14, 2008 at 09:48:15AM -0800, Rose, Gregory V wrote:
 It's not upstream yet.  However, if you grep through for
 CONFIG_PCI_IOV you'll see all the relevant code in those sections.

Wouldn't it make more sense for the IOV code to be reworked to not
require #ifdefs in a driver?  There seems to be a bit too much #ifdef
code in this driver right now :(

What is the status of submitting it upstream and getting netdev review
of it?

thanks,

greg k-h


RE: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-14 Thread Ronciak, John
We have been waiting for the kernel IOV work to be in place upstream completely 
before we submitted the drivers.  Jeff Garzik won't take driver changes that 
have no user.  So as the kernel work completes, we'll submit the driver(s).

We have been talking about putting out the changes as RFC.  If that makes sense
we can do that.

Cheers,
John
---
"Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety." - Benjamin Franklin, 1755
 

-Original Message-
From: Greg KH [mailto:[EMAIL PROTECTED] 
Sent: Friday, November 14, 2008 10:39 AM
To: Rose, Gregory V
Cc: Zhao, Yu; Dong, Eddie; kvm@vger.kernel.org; Barnes, Jesse; 
Ronciak, John; Nakajima, Jun; Yu, Wilfred; Li, Xin B; Li, Susie
Subject: Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

On Fri, Nov 14, 2008 at 09:48:15AM -0800, Rose, Gregory V wrote:
 It's not upstream yet.  However, if you grep through for
 CONFIG_PCI_IOV you'll see all the relevant code in those sections.

Wouldn't it make more sense for the IOV code to be reworked to not
require #ifdefs in a driver?  There seems to be a bit too much #ifdef
code in this driver right now :(

What is the status of submitting it upstream and getting netdev review
of it?

thanks,

greg k-h


Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-14 Thread Greg KH
A: No.
Q: Should I include quotations after my reply?

On Fri, Nov 14, 2008 at 11:49:52AM -0700, Ronciak, John wrote:
 We have been waiting for the kernel IOV work to be in place upstream
 completely before we submitted the drivers.  Jeff Garzik won't take
 driver changes that have no user.  So as the kernel work completes,
 we'll submit the driver(s).
 
 We have been talking about putting out the changes as RFC.  If that
 makes sense we can do that.

That would make sense, as I had to ask multiple times whether there was a
driver actually using the IOV code that we could review, to see if the API
was sane for it.

good luck,

greg k-h


Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-13 Thread Yu Zhao
On Sat, Nov 08, 2008 at 02:48:25AM +0800, Greg KH wrote:
 On Fri, Nov 07, 2008 at 11:17:40PM +0800, Yu Zhao wrote:
  While we are arguing about what the software model for SR-IOV should be,
  let me ask two simple questions first:
 
  1. What does SR-IOV look like?
  2. Why do we need to support it?
 
 I don't think we need to worry about those questions, as we can see what
 the SR-IOV interface looks like by looking at the PCI spec, and we know
 Linux needs to support it, as Linux needs to support everything :)
 
 (note, community members that can not see the PCI specs at this point in
 time, please know that we are working on resolving these issues,
 hopefully we will have some good news within a month or so.)

Thanks for doing this!

 
  As you know the Linux kernel is the base of various virtual machine
  monitors such as KVM, Xen, OpenVZ and VServer. We need SR-IOV support in
  the kernel mostly because it helps high-end users (IT departments, HPC,
  etc.) share limited hardware resources among hundreds or even thousands
  of virtual machines and hence reduce the cost. How can we let these
  virtual machine monitors take advantage of SR-IOV without spending too
  much effort while remaining architecturally correct? I believe making a
  VF appear as close as possible to a normal PCI device (struct pci_dev)
  is the best way in the current situation, because this is not only what
  the hardware designers expect us to do but also the usage model that
  KVM, Xen and other VMMs have already supported.
 
 But would such an api really take advantage of the new IOV interfaces
 that are exposed by the new device type?

SR-IOV is a very straightforward capability -- it resides only in the
Physical Function's (i.e., the real device's) config space and controls
the allocation of Virtual Functions through several registers. What we
can do in the PCI layer is make an SR-IOV device spawn VFs upon user
request and register those VFs with the PCI core. The functionality of an
SR-IOV device (both the PF and the VFs) can vary over a wide range, and
their drivers (just like normal PCI device drivers) are responsible for
handling the device-specific parts.

So it looks like we can get all the work done in the PCI layer with only
two interfaces: one for the PF driver to register itself as an SR-IOV
capable driver, expose the sysfs (or ioctl) interface that receives user
requests, and allocate a 'pci_dev' for each VF; and another to clean
everything up when the PF driver unregisters itself (e.g., when the
driver is removed or the device goes into a power-saving mode).
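
To make that concrete, the two interfaces could look roughly like the
following (this is only a sketch of the shape described above; the names
and signatures are not the actual declarations in the patch set):

        /* PF driver tells the PCI core it is SR-IOV capable; 'notify' is
         * invoked around VF enable/disable, e.g. when the user writes the
         * requested number of VFs to a sysfs attribute of the PF */
        typedef int (*pci_iov_notify_fn)(struct pci_dev *pf, int nr_virtfn);

        int pci_iov_register(struct pci_dev *pf, pci_iov_notify_fn notify);

        /* undo everything: remove the VF pci_devs and release resources,
         * called when the PF driver is unbound or the device suspends */
        void pci_iov_unregister(struct pci_dev *pf);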

 
  I agree that the API in the SR-IOV patch is arguable and the concerns,
  such as the lack of a PF driver, are also valid. But I personally think
  these issues are not essential problems for me and other SR-IOV driver
  developers.
 
 How can the lack of a PF driver not be a valid concern at this point in
 time?  Without such a driver written, how can we know that the SR-IOV
 interface as created is sufficient, or that it even works properly?
 
 Here's what I see we need to have before we can evaluate if the IOV core
 PCI patches are acceptable:
   - a driver that uses this interface
   - a PF driver that uses this interface.
 
 Without those, we can't determine if the infrastructure provided by the
 IOV core even is sufficient, right?

Yes, using a PF driver to evaluate the SR-IOV core is necessary. And only
the PF driver can use the interface since the VF shouldn't have the SR-IOV
capability in its config space according to the spec.
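
That follows directly from where the capability lives: a PF driver can
probe for the SR-IOV extended capability in its own config space, while a
VF driver will simply never find one. A rough sketch (the 0x10 capability
ID is taken from the SR-IOV spec; mainline has no PCI_EXT_CAP_ID_* define
for it yet):

        /* in the PF driver's probe(): only the PF carries the SR-IOV
         * extended capability, so only the PF driver gets past this */
        int pos = pci_find_ext_capability(pdev, 0x10 /* SR-IOV */);
        if (!pos)
                return -ENODEV; /* not an SR-IOV physical function */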

Regards,
Yu

 Rumor has it that there is both of the above things floating around, can
 someone please post them to the linux-pci list so that we can see how
 this all works together?
 
 thanks,
 
 greg k-h


Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-12 Thread Anthony Liguori

Andi Kleen wrote:

Anthony Liguori [EMAIL PROTECTED] writes:

What we would rather do in KVM, is have the VFs appear in the host as
standard network devices.  We would then like to back our existing PV
driver to this VF directly bypassing the host networking stack.  A key
feature here is being able to fill the VF's receive queue with guest
memory instead of host kernel memory so that you can get zero-copy
receive traffic.  This will perform just as well as doing passthrough
(at least) and avoid all that ugliness of dealing with SR-IOV in the
guest.


But you shift a lot of ugliness into the host network stack again.
Not sure that is a good trade off.

Also it would always require context switches and I believe one
of the reasons for the PV/VF model is very low latency IO and having
heavyweight switches to the host and back would be against that.


I don't think it's established that PV/VF will have less latency than 
using virtio-net.  virtio-net requires a world switch to send a group of 
packets.  The cost of this (if it stays in kernel) is only a few 
thousand cycles on the most modern processors.


Using VT-d means that for every DMA fetch that misses in the IOTLB, you 
potentially have to do four memory fetches to main memory.  There will 
be additional packet latency using VT-d compared to native, it's just 
not known how much at this time.


Regards,

Anthony Liguori



-Andi





Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-10 Thread Greg KH
On Sun, Nov 09, 2008 at 09:37:20PM +0200, Avi Kivity wrote:
 Greg KH wrote:
 On Sun, Nov 09, 2008 at 02:44:06PM +0200, Avi Kivity wrote:
   
 Greg KH wrote:
 
 It's that second part that I'm worried about.  How is that going to
 happen?  Do you have any patches that show this kind of assignment?

 
 For kvm, this is in 2.6.28-rc.
 

 Where?  I just looked and couldn't find anything, but odds are I was
 looking in the wrong place :(

   

 arch/x86/kvm/vtd.c: iommu integration (allows assigning the device's memory 
 resources)

That file is not in 2.6.28-rc4 :(


 virt/kvm/irq*: interrupt redirection (allows assigning the device's 
 interrupt resources)

I only see virt/kvm/irq_comm.c in 2.6.28-rc4.

 the rest (pci config space, pio redirection) are in userspace.

So you don't need these pci core changes at all?

 Note there are two ways to assign a device to a guest:

 - run the VF driver in the guest: this has the advantage of best 
 performance, but requires pinning all guest memory, makes live migration 
 a tricky proposition, and ties the guest to the underlying hardware.

 Is this what you would prefer for kvm?


 It's not my personal preference, but it is a supported configuration.  For 
 some use cases it is the only one that makes sense.

 Again, VF-in-guest and VF-in-host both have their places.  And since Linux 
 can be both guest and host, it's best if the VF driver knows nothing about 
 SR-IOV; it's just a pci driver.  The PF driver should emulate anything that 
 SR-IOV does not provide (like missing pci config space).

Yes, we need both.

thanks,

greg k-h


Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-09 Thread Avi Kivity

Greg KH wrote:

It's that second part that I'm worried about.  How is that going to
happen?  Do you have any patches that show this kind of assignment?

  


For kvm, this is in 2.6.28-rc.

Note there are two ways to assign a device to a guest:

- run the VF driver in the guest: this has the advantage of best 
performance, but requires pinning all guest memory, makes live migration 
a tricky proposition, and ties the guest to the underlying hardware.
- run the VF driver in the host, and use virtio to connect the guest to 
the host: allows paging the guest and allows straightforward live 
migration, but reduces performance, and hides any features not exposed 
by virtio from the guest.



--
error compiling committee.c: too many arguments to function



Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-09 Thread Avi Kivity

Muli Ben-Yehuda wrote:

We've been talking about avoiding hardware passthrough entirely and
just backing a virtio-net backend driver by a dedicated VF in the
host.  That avoids a huge amount of guest-facing complexity, lets
migration Just Work, and should give the same level of performance.



I don't believe that it will, and every benchmark I've seen or have
done so far shows a significant performance gap between virtio and
direct assignment, even on 1G ethernet. I am willing however to
reserve judgement until someone implements your suggestion and
actually measures it, preferably on 10G ethernet.
  


Right now virtio copies data, and has other inefficiencies.  With a 
dedicated VF, we can eliminate the copies.


CPU utilization and latency will be worse.  If we can limit the 
slowdowns to an acceptable amount, the simplicity and other advantages 
of VF-in-host may outweigh the performance degradation.



No doubt device assignment---and SR-IOV in particular---are complex,
but I hardly think ignoring it as you seem to propose is the right
approach.


I agree.  We should hedge our bets and support both models.

--
error compiling committee.c: too many arguments to function



Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-09 Thread Avi Kivity

Andi Kleen wrote:

Anthony Liguori [EMAIL PROTECTED] writes:
  

What we would rather do in KVM, is have the VFs appear in the host as
standard network devices.  We would then like to back our existing PV
driver to this VF directly bypassing the host networking stack.  A key
feature here is being able to fill the VF's receive queue with guest
memory instead of host kernel memory so that you can get zero-copy
receive traffic.  This will perform just as well as doing passthrough
(at least) and avoid all that ugliness of dealing with SR-IOV in the
guest.



But you shift a lot of ugliness into the host network stack again.
Not sure that is a good trade off.
  


The net effect will be positive.  We will finally have aio networking 
from userspace (can send process memory without resorting to 
sendfile()), and we'll be able to assign a queue to a process (which 
will enable all sorts of interesting high performance things; basically 
VJ channels without kernel involvement).



Also it would always require context switches and I believe one
of the reasons for the PV/VF model is very low latency IO and having
heavyweight switches to the host and back would be against that.
  


It's true that latency would suffer (or alternatively cpu consumption 
would increase).


--
error compiling committee.c: too many arguments to function



Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-09 Thread Avi Kivity

Greg KH wrote:

We've been talking about avoiding hardware passthrough entirely and
just backing a virtio-net backend driver by a dedicated VF in the
host.  That avoids a huge amount of guest-facing complexity, lets
migration Just Work, and should give the same level of performance.



Does that involve this patch set?  Or a different type of interface.
  


So long as the VF is exposed as a standalone PCI device, it's the same 
interface.  In fact you can take a random PCI card and expose it to a 
guest this way; it doesn't have to be SR-IOV.  Of course, with a 
standard PCI card you won't get much sharing (a quad port NIC will be 
good for four guests).


We'll need other changes in the network stack, but these are orthogonal 
to SR-IOV.


--
error compiling committee.c: too many arguments to function



Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-09 Thread Avi Kivity

Matthew Wilcox wrote:
What we would rather do in KVM, is have the VFs appear in the host as 
standard network devices.  We would then like to back our existing PV 
driver to this VF directly bypassing the host networking stack.  A key 
feature here is being able to fill the VF's receive queue with guest 
memory instead of host kernel memory so that you can get zero-copy 
receive traffic.  This will perform just as well as doing passthrough 
(at least) and avoid all that ugliness of dealing with SR-IOV in the guest.



This argues for ignoring the SR-IOV mess completely.


It does, but VF-in-host is not the only model that we want to support.  
It's just the most appealing.


There will definitely be people who want to run VF-in-guest.

--
error compiling committee.c: too many arguments to function



Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-09 Thread Greg KH
On Sun, Nov 09, 2008 at 02:44:06PM +0200, Avi Kivity wrote:
 Greg KH wrote:
 It's that second part that I'm worried about.  How is that going to
 happen?  Do you have any patches that show this kind of assignment?

   

 For kvm, this is in 2.6.28-rc.

Where?  I just looked and couldn't find anything, but odds are I was
looking in the wrong place :(

 Note there are two ways to assign a device to a guest:

 - run the VF driver in the guest: this has the advantage of best 
 performance, but requires pinning all guest memory, makes live migration a 
 tricky proposition, and ties the guest to the underlying hardware.

Is this what you would prefer for kvm?

 - run the VF driver in the host, and use virtio to connect the guest to the 
 host: allows paging the guest and allows straightforward live migration, 
 but reduces performance, and hides any features not exposed by virtio from 
 the guest.

thanks,

greg k-h


Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-09 Thread Avi Kivity

Greg KH wrote:

On Sun, Nov 09, 2008 at 02:44:06PM +0200, Avi Kivity wrote:
  

Greg KH wrote:


It's that second part that I'm worried about.  How is that going to
happen?  Do you have any patches that show this kind of assignment?

  
  

For kvm, this is in 2.6.28-rc.



Where?  I just looked and couldn't find anything, but odds are I was
looking in the wrong place :(

  


arch/x86/kvm/vtd.c: iommu integration (allows assigning the device's 
memory resources)
virt/kvm/irq*: interrupt redirection (allows assigning the device's 
interrupt resources)


the rest (pci config space, pio redirection) are in userspace.


Note there are two ways to assign a device to a guest:

- run the VF driver in the guest: this has the advantage of best 
performance, but requires pinning all guest memory, makes live migration a 
tricky proposition, and ties the guest to the underlying hardware.



Is this what you would prefer for kvm?

  


It's not my personal preference, but it is a supported configuration.  
For some use cases it is the only one that makes sense.


Again, VF-in-guest and VF-in-host both have their places.  And since 
Linux can be both guest and host, it's best if the VF driver knows 
nothing about SR-IOV; it's just a pci driver.  The PF driver should 
emulate anything that SR-IOV does not provide (like missing pci config 
space).


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



RE: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-08 Thread Fischer, Anna
 Subject: Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
 Importance: High

 On Fri, Nov 07, 2008 at 11:17:40PM +0800, Yu Zhao wrote:
  While we are arguing about what the software model for SR-IOV should
  be, let me ask two simple questions first:
 
  1. What does SR-IOV look like?
  2. Why do we need to support it?
 
 I don't think we need to worry about those questions, as we can see what
 the SR-IOV interface looks like by looking at the PCI spec, and we know
 Linux needs to support it, as Linux needs to support everything :)
 
 (note, community members that can not see the PCI specs at this point in
 time, please know that we are working on resolving these issues,
 hopefully we will have some good news within a month or so.)
 
  As you know the Linux kernel is the base of various virtual machine
  monitors such as KVM, Xen, OpenVZ and VServer. We need SR-IOV support
  in the kernel mostly because it helps high-end users (IT departments,
  HPC, etc.) share limited hardware resources among hundreds or even
  thousands of virtual machines and hence reduce the cost. How can we let
  these virtual machine monitors take advantage of SR-IOV without
  spending too much effort while remaining architecturally correct? I
  believe making a VF appear as close as possible to a normal PCI device
  (struct pci_dev) is the best way in the current situation, because this
  is not only what the hardware designers expect us to do but also the
  usage model that KVM, Xen and other VMMs have already supported.
 
 But would such an api really take advantage of the new IOV interfaces
 that are exposed by the new device type?

I agree with what Yu says. The idea is to have hardware capabilities to
virtualize a PCI device in a way that those virtual devices can represent
full PCI devices. The advantage is that those virtual devices can then be
used like any other standard PCI device, meaning we can use existing
OS tools, configuration mechanisms, etc. to start working with them. Also,
when using a virtualization-based system, e.g. Xen or KVM, we do not need
to introduce new mechanisms to make use of SR-IOV, because we can handle
VFs as full PCI devices.

A virtual PCI device in hardware (a VF) can be as powerful or complex as
you like, or it can be very simple. But the big advantage of SR-IOV is
that the hardware presents a complete PCI device to the OS - as opposed to
some resources, or queues, that need specific new configuration and
assignment mechanisms in order to use them with a guest OS (like, for
example, VMDq or similar technologies).
Anna


RE: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-08 Thread Leonid Grossman


 -Original Message-
 From: Fischer, Anna [mailto:[EMAIL PROTECTED]
 Sent: Saturday, November 08, 2008 3:10 AM
 To: Greg KH; Yu Zhao
 Cc: Matthew Wilcox; Anthony Liguori; H L; [EMAIL PROTECTED];
 [EMAIL PROTECTED]; Chiang, Alexander;
[EMAIL PROTECTED];
 [EMAIL PROTECTED]; [EMAIL PROTECTED];
[EMAIL PROTECTED];
 [EMAIL PROTECTED]; kvm@vger.kernel.org;
 [EMAIL PROTECTED]; [EMAIL PROTECTED]; Leonid Grossman;
 [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
 Subject: RE: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
 


  But would such an api really take advantage of the new IOV interfaces
  that are exposed by the new device type?
 
 I agree with what Yu says. The idea is to have hardware capabilities to
 virtualize a PCI device in a way that those virtual devices can
 represent full PCI devices. The advantage is that those virtual devices
 can then be used like any other standard PCI device, meaning we can use
 existing OS tools, configuration mechanisms, etc. to start working with
 them. Also, when using a virtualization-based system, e.g. Xen or KVM,
 we do not need to introduce new mechanisms to make use of SR-IOV,
 because we can handle VFs as full PCI devices.
 
 A virtual PCI device in hardware (a VF) can be as powerful or complex as
 you like, or it can be very simple. But the big advantage of SR-IOV is
 that the hardware presents a complete PCI device to the OS - as opposed
 to some resources, or queues, that need specific new configuration and
 assignment mechanisms in order to use them with a guest OS (like, for
 example, VMDq or similar technologies).
 
 Anna


Ditto.
Taking the netdev interface as an example - a queue pair is a great way
to scale across cpu cores in a single OS image, but it is just not a good
way to share a device across multiple OS images.
The best unit of virtualization is a VF that is implemented as a
complete netdev pci device (not a subset of a pci device).
This way, native netdev device drivers can work for direct hw access to
a VF as is, and most/all Linux networking features (including VMQ)
will work in a guest.
Also, guest migration for netdev interfaces (both direct and virtual)
can be supported via a native Linux mechanism (the bonding driver), while
Dom0 can retain veto power over any guest direct interface operation it
deems privileged (vlan, mac address, promisc mode, bandwidth allocation
between VFs, etc.).
 
Leonid


Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-08 Thread Muli Ben-Yehuda
On Thu, Nov 06, 2008 at 04:40:21PM -0600, Anthony Liguori wrote:

 We've been talking about avoiding hardware passthrough entirely and
 just backing a virtio-net backend driver by a dedicated VF in the
 host.  That avoids a huge amount of guest-facing complexity, lets
 migration Just Work, and should give the same level of performance.

I don't believe that it will, and every benchmark I've seen or have
done so far shows a significant performance gap between virtio and
direct assignment, even on 1G ethernet. I am willing however to
reserve judgement until someone implements your suggestion and
actually measures it, preferably on 10G ethernet.

No doubt device assignment---and SR-IOV in particular---are complex,
but I hardly think ignoring it as you seem to propose is the right
approach.

Cheers,
Muli
-- 
The First Workshop on I/O Virtualization (WIOV '08)
Dec 2008, San Diego, CA, http://www.usenix.org/wiov08/
   -
SYSTOR 2009---The Israeli Experimental Systems Conference
http://www.haifa.il.ibm.com/conferences/systor2009/


Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-07 Thread Andi Kleen
Anthony Liguori [EMAIL PROTECTED] writes:

 What we would rather do in KVM, is have the VFs appear in the host as
 standard network devices.  We would then like to back our existing PV
 driver to this VF directly bypassing the host networking stack.  A key
 feature here is being able to fill the VF's receive queue with guest
 memory instead of host kernel memory so that you can get zero-copy
 receive traffic.  This will perform just as well as doing passthrough
 (at least) and avoid all that ugliness of dealing with SR-IOV in the
 guest.

But you shift a lot of ugliness into the host network stack again.
Not sure that is a good trade off.

Also it would always require context switches and I believe one
of the reasons for the PV/VF model is very low latency IO and having
heavyweight switches to the host and back would be against that.

-Andi

-- 
[EMAIL PROTECTED]


Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-07 Thread Yu Zhao
While we are arguing about what the software model for SR-IOV should be,
let me ask two simple questions first:


1. What does SR-IOV look like?
2. Why do we need to support it?

I'm sure people have different understandings from their own view
points. No one is wrong, but please don't make things complicated and
don't ignore user requirements.


The PCI SIG and hardware vendors created this capability to let the
hardware resources in one PCI device be shared by different software
instances -- I guess all of us agree with this. No doubt the PF is a real
function in the PCI device, but is the VF different? No, it also has its
own Bus, Device and Function numbers, its own PCI configuration space and
its own Memory Space (MMIO). In more detail, it can respond to and
initiate PCI Transaction Layer Protocol packets, which means it can do
everything a PF can at the PCI level. From these obvious behaviors, we
can conclude that the PCI SIG models a VF as a normal PCI device
function, even though it is not standalone.
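
The spec even defines how each VF gets its own Routing ID (bus, device,
function): it is derived from the PF's Routing ID plus the First VF Offset
and VF Stride fields of the PF's SR-IOV capability, roughly as below (a
simplified sketch that ignores the case where the offset pushes VFs onto a
higher bus number):

        /* Routing ID of the n-th VF (n starts at 1), per the SR-IOV spec */
        static u16 vf_routing_id(u16 pf_rid, u16 first_vf_offset,
                                 u16 vf_stride, int n)
        {
                return pf_rid + first_vf_offset + (n - 1) * vf_stride;
        }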


As you know the Linux kernel is the base of various virtual machine
monitors such as KVM, Xen, OpenVZ and VServer. We need SR-IOV support in
the kernel mostly because it helps high-end users (IT departments, HPC,
etc.) share limited hardware resources among hundreds or even thousands
of virtual machines and hence reduce the cost. How can we let these
virtual machine monitors take advantage of SR-IOV without spending too
much effort while remaining architecturally correct? I believe making a
VF appear as close as possible to a normal PCI device (struct pci_dev)
is the best way in the current situation, because this is not only what
the hardware designers expect us to do but also the usage model that
KVM, Xen and other VMMs have already supported.


I agree that the API in the SR-IOV patch is arguable and the concerns,
such as the lack of a PF driver, are also valid. But I personally think
these issues are not essential problems for me and other SR-IOV driver
developers. People are happy to refine things, but they don't want to
recreate them in a totally different way, especially when that way
doesn't bring them obvious benefits.


As I can see we are now reaching a point where a decision must be made.
I know this is a difficult thing in an open and free community, but
fortunately we have a lot of talented and experienced people here.
So let's make it happen, and keep our loyal users happy!


Thanks,
Yu


Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-07 Thread Yu Zhao

Anthony Liguori wrote:

Matthew Wilcox wrote:

[Anna, can you fix your word-wrapping please?  Your lines appear to be
infinitely long which is most unpleasant to reply to]

On Thu, Nov 06, 2008 at 05:38:16PM +, Fischer, Anna wrote:
 

Where would the VF drivers have to be associated?  On the pci_dev
level or on a higher one?
  

A VF appears to the Linux OS as a standard (full, additional) PCI
device. The driver is associated in the same way as for a normal PCI
device. Ideally, you would use SR-IOV devices on a virtualized system,
for example, using Xen. A VF can then be assigned to a guest domain as
a full PCI device.



It's not clear thats the right solution.  If the VF devices are _only_
going to be used by the guest, then arguably, we don't want to create
pci_devs for them in the host.  (I think it _is_ the right answer, but I
want to make it clear there's multiple opinions on this).
  


The VFs shouldn't be limited to being used by the guest.


Yes, running a VF driver in the host is supported :-)



SR-IOV is actually an incredibly painful thing.  You need to have a VF 
driver in the guest, do hardware pass through, have a PV driver stub in 
the guest that's hypervisor specific (a VF is not usable on it's own), 
have a device specific backend in the VMM, and if you want to do live 
migration, have another PV driver in the guest that you can do teaming 
with.  Just a mess.


Actually it's not such a mess. The VF driver can be a plain PCI device
driver that doesn't require any backend in the VMM, or hypervisor-specific
knowledge, if the hardware is properly designed. In this case the PF
driver controls hardware resource allocation for the VFs, and a VF driver
can work without any communication with the PF driver or the VMM.
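
In other words, a VF driver can be nothing more than an ordinary
pci_driver; a skeleton along these lines (the vendor/device IDs are
placeholders) binds to the VF like to any other PCI device:

        static const struct pci_device_id my_vf_ids[] = {
                { PCI_DEVICE(0x1234, 0x5678) }, /* placeholder VF device ID */
                { }
        };
        MODULE_DEVICE_TABLE(pci, my_vf_ids);

        static int my_vf_probe(struct pci_dev *pdev,
                               const struct pci_device_id *id)
        {
                int err = pci_enable_device(pdev);
                if (err)
                        return err;
                /* map BARs, set up DMA, register a netdev, etc. --
                 * nothing here needs to know that the device is a VF */
                return 0;
        }

        static void my_vf_remove(struct pci_dev *pdev)
        {
                pci_disable_device(pdev);
        }

        static struct pci_driver my_vf_driver = {
                .name     = "my_vf",
                .id_table = my_vf_ids,
                .probe    = my_vf_probe,
                .remove   = my_vf_remove,
        };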




What we would rather do in KVM, is have the VFs appear in the host as 
standard network devices.  We would then like to back our existing PV 
driver to this VF directly bypassing the host networking stack.  A key 
feature here is being able to fill the VF's receive queue with guest 
memory instead of host kernel memory so that you can get zero-copy 
receive traffic.  This will perform just as well as doing passthrough 
(at least) and avoid all that ugliness of dealing with SR-IOV in the guest.


If the hardware supports both SR-IOV and an IOMMU, I wouldn't suggest
people do so, because they will get better performance by directly
assigning the VF to the guest.


However, lots of low-end machines don't have SR-IOV and IOMMU support.
They may have a multi-queue NIC, which uses a built-in L2 switch to
dispatch packets to different DMA queues according to MAC address. They
definitely can benefit a lot if there is software support for hooking
such a DMA queue up to the virtio-net backend as you suggested.




This eliminates all of the mess of various drivers in the guest and all 
the associated baggage of doing hardware passthrough.


So IMHO, having VFs be usable in the host is absolutely critical because 
I think it's the only reasonable usage model.


Please don't worry, we have taken this usage model, as well as the
container model, into account when designing the SR-IOV framework for
the kernel.




Regards,

Anthony Liguori


Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-07 Thread Greg KH
On Fri, Nov 07, 2008 at 11:17:40PM +0800, Yu Zhao wrote:
 While we are arguing about what the software model for SR-IOV should be,
 let me ask two simple questions first:

 1. What does SR-IOV look like?
 2. Why do we need to support it?

I don't think we need to worry about those questions, as we can see what
the SR-IOV interface looks like by looking at the PCI spec, and we know
Linux needs to support it, as Linux needs to support everything :)

(note, community members that can not see the PCI specs at this point in
time, please know that we are working on resolving these issues,
hopefully we will have some good news within a month or so.)

 As you know the Linux kernel is the base of various virtual machine
 monitors such as KVM, Xen, OpenVZ and VServer. We need SR-IOV support in
 the kernel mostly because it helps high-end users (IT departments, HPC,
 etc.) share limited hardware resources among hundreds or even thousands
 of virtual machines and hence reduce the cost. How can we let these
 virtual machine monitors take advantage of SR-IOV without spending too
 much effort while remaining architecturally correct? I believe making a
 VF appear as close as possible to a normal PCI device (struct pci_dev)
 is the best way in the current situation, because this is not only what
 the hardware designers expect us to do but also the usage model that
 KVM, Xen and other VMMs have already supported.

But would such an api really take advantage of the new IOV interfaces
that are exposed by the new device type?

 I agree that the API in the SR-IOV patch is arguable and the concerns,
 such as the lack of a PF driver, are also valid. But I personally think
 these issues are not essential problems for me and other SR-IOV driver
 developers.

How can the lack of a PF driver not be a valid concern at this point in
time?  Without such a driver written, how can we know that the SR-IOV
interface as created is sufficient, or that it even works properly?

Here's what I see we need to have before we can evaluate if the IOV core
PCI patches are acceptable:
  - a driver that uses this interface
  - a PF driver that uses this interface.

Without those, we can't determine if the infrastructure provided by the
IOV core even is sufficient, right?

Rumor has it that there is both of the above things floating around, can
someone please post them to the linux-pci list so that we can see how
this all works together?

thanks,

greg k-h


Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-06 Thread H L

Greetings (from a new lurker to the list),

To your question Greg, yes and sort of ;-).  I have started taking a look 
at these patches with a strong interest in understanding how they work.  I've 
built a kernel with them and tried out a few things with real SR-IOV hardware.

--
Lance Hartmann




--- On Wed, 11/5/08, Greg KH [EMAIL PROTECTED] wrote:
 
 Is there any actual users of this API around yet?  How was it tested as
 there is no hardware to test on?  Which drivers are going to have to be
 rewritten to take advantage of this new interface?
 
 thanks,
 
 greg k-h


  



Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-06 Thread Greg KH
On Thu, Nov 06, 2008 at 07:40:12AM -0800, H L wrote:
 
 Greetings (from a new lurker to the list),

Welcome!

 To your question Greg, yes and sort of ;-).  I have started taking
 a look at these patches with a strong interest in understanding how
 they work.  I've built a kernel with them and tried out a few things
 with real SR-IOV hardware.

Did you have to modify individual drivers to take advantage of this
code?  It looks like the core code will run on this type of hardware,
but there seems to be no real advantage until a driver is modified to
use it, right?

Or am I missing some great advantage to having this code without
modified drivers?

thanks,

greg k-h


Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-06 Thread H L
I have not modified any existing drivers, but instead I threw together a 
bare-bones module enabling me to make a call to pci_iov_register() and then 
poke at an SR-IOV adapter's /sys entries for which no driver was loaded.

It appears from my perusal thus far that drivers using these new SR-IOV patches 
will require modification; i.e. the driver associated with the Physical 
Function (PF) will be required to make the pci_iov_register() call along with 
the requisite notify() function.  Essentially this suggests to me a model for 
the PF driver to perform any global actions or setup on behalf of VFs before 
enabling them after which VF drivers could be associated.
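
For reference, the bare-bones experiment described above boils down to
something like the following (the pci_iov_register() and notify()
prototypes here are guesses; the real ones are whatever the patch set
defines):

        /* called by the IOV core around VF enable/disable -- a real PF
         * driver would partition queues, program filters, etc. here */
        static int dummy_notify(struct pci_dev *pf, int nr_virtfn)
        {
                dev_info(&pf->dev, "IOV notify: %d VF(s)\n", nr_virtfn);
                return 0;
        }

        static int dummy_pf_probe(struct pci_dev *pdev,
                                  const struct pci_device_id *id)
        {
                int err = pci_enable_device(pdev);
                if (err)
                        return err;
                /* register as SR-IOV capable; the VF-related /sys entries
                 * then show up under the PF once the user enables VFs */
                return pci_iov_register(pdev, dummy_notify);
        }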

I have so far only seen Yu Zhao's 7-patch set.  I've not yet looked at his
subsequently tendered 15-patch set, so I don't know what has changed.  The
hardware/firmware implementation for any given SR-IOV compatible device will
determine the extent of the differences required between a PF driver and a VF
driver.

--
Lance Hartmann


--- On Thu, 11/6/08, Greg KH [EMAIL PROTECTED] wrote:


 Date: Thursday, November 6, 2008, 9:43 AM
 On Thu, Nov 06, 2008 at 07:40:12AM -0800, H L wrote:
  
  Greetings (from a new lurker to the list),
 
 Welcome!
 
  To your question Greg, yes and sort of ;-).  I have started taking
  a look at these patches with a strong interest in understanding how
  they work.  I've built a kernel with them and tried out a few things
  with real SR-IOV hardware.
 
 Did you have to modify individual drivers to take advantage of this
 code?  It looks like the core code will run on this type of hardware,
 but there seems to be no real advantage until a driver is modified to
 use it, right?
 
 Or am I missing some great advantage to having this code without
 modified drivers?
 
 thanks,
 
 greg k-h


  



Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-06 Thread Greg KH

A: No.
Q: Should I include quotations after my reply?

On Thu, Nov 06, 2008 at 08:41:53AM -0800, H L wrote:
 I have not modified any existing drivers, but instead I threw together
 a bare-bones module enabling me to make a call to pci_iov_register()
 and then poke at an SR-IOV adapter's /sys entries for which no driver
 was loaded.
 
 It appears from my perusal thus far that drivers using these new
 SR-IOV patches will require modification; i.e. the driver associated
 with the Physical Function (PF) will be required to make the
 pci_iov_register() call along with the requisite notify() function.
 Essentially this suggests to me a model for the PF driver to perform
 any global actions or setup on behalf of VFs before enabling them
 after which VF drivers could be associated.

Where would the VF drivers have to be associated?  On the pci_dev
level or on a higher one?

Will all drivers that want to bind to a VF device need to be
rewritten?

 I have so far only seen Yu Zhao's 7-patch set.  I've not yet looked
 at his subsequently tendered 15-patch set so I don't know what has
  changed. The hardware/firmware implementation for any given SR-IOV
 compatible device, will determine the extent of differences required
 between a PF driver and a VF driver.

Yeah, that's what I'm worried/curious about.  Without seeing the code
for such a driver, how can we properly evaluate if this infrastructure
is the correct one and the proper way to do all of this?

thanks,

greg k-h


RE: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-06 Thread Fischer, Anna
 On Thu, Nov 06, 2008 at 08:41:53AM -0800, H L wrote:
  I have not modified any existing drivers, but instead I threw
 together
  a bare-bones module enabling me to make a call to pci_iov_register()
  and then poke at an SR-IOV adapter's /sys entries for which no driver
  was loaded.
 
  It appears from my perusal thus far that drivers using these new
  SR-IOV patches will require modification; i.e. the driver associated
  with the Physical Function (PF) will be required to make the
  pci_iov_register() call along with the requisite notify() function.
  Essentially this suggests to me a model for the PF driver to perform
  any global actions or setup on behalf of VFs before enabling them
  after which VF drivers could be associated.

 Where would the VF drivers have to be associated?  On the pci_dev
 level or on a higher one?

A VF appears to the Linux OS as a standard (full, additional) PCI device. The 
driver is associated in the same way as for a normal PCI device. Ideally, you 
would use SR-IOV devices on a virtualized system, for example, using Xen. A VF 
can then be assigned to a guest domain as a full PCI device.

 Will all drivers that want to bind to a VF device need to be
 rewritten?

Currently, any vendor providing an SR-IOV device needs to provide a PF driver
and a VF driver that run on their hardware. A VF driver does not necessarily
need to know much about SR-IOV; it can just run on the presented PCI device.
You might want to have a communication channel between the PF and VF drivers
though, for various reasons, if such a channel is not provided in hardware.
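
What such a channel looks like is entirely device specific; a common
pattern (purely illustrative here, not something defined by the SR-IOV
spec or by these patches) is a small mailbox in shared registers or DMA
memory:

        /* hypothetical message a VF driver posts to the PF driver when it
         * needs a privileged, device-wide operation performed on its
         * behalf (set a MAC address, join a VLAN, query link state, ...) */
        struct vf_pf_msg {
                u16 vf_id;      /* which VF is asking */
                u16 opcode;     /* e.g. SET_MAC, SET_VLAN, GET_LINK */
                u32 args[4];    /* opcode-specific arguments */
        };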

  I have so far only seen Yu Zhao's 7-patch set.  I've not yet looked
  at his subsequently tendered 15-patch set so I don't know what has
   changed. The hardware/firmware implementation for any given SR-IOV
  compatible device, will determine the extent of differences required
  between a PF driver and a VF driver.

 Yeah, that's what I'm worried/curious about.  Without seeing the code
 for such a driver, how can we properly evaluate if this infrastructure
 is the correct one and the proper way to do all of this?

Yu's API allows a PF driver to register with the Linux PCI code and use it to
activate VFs and allocate their resources. The PF driver needs to be modified
to work with that API. While you can argue about how that API is supposed to
look, it is clear that such an API is required in some form. The PF driver
needs to know when VFs are active, as it might want to allocate further
(device-specific) resources to VFs or initiate further (device-specific)
configurations. While probably a lot of SR-IOV specific code has to be in the
PF driver, there is also support required from the Linux PCI subsystem, which
is to some extent provided by Yu's patches.

Anna


Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-06 Thread Matthew Wilcox
On Thu, Nov 06, 2008 at 08:49:19AM -0800, Greg KH wrote:
 On Thu, Nov 06, 2008 at 08:41:53AM -0800, H L wrote:
  I have not modified any existing drivers, but instead I threw together
  a bare-bones module enabling me to make a call to pci_iov_register()
  and then poke at an SR-IOV adapter's /sys entries for which no driver
  was loaded.
  
  It appears from my perusal thus far that drivers using these new
  SR-IOV patches will require modification; i.e. the driver associated
  with the Physical Function (PF) will be required to make the
  pci_iov_register() call along with the requisite notify() function.
  Essentially this suggests to me a model for the PF driver to perform
  any global actions or setup on behalf of VFs before enabling them
  after which VF drivers could be associated.
 
 Where would the VF drivers have to be associated?  On the pci_dev
 level or on a higher one?
 
 Will all drivers that want to bind to a VF device need to be
 rewritten?

The current model being implemented by my colleagues has separate
drivers for the PF (aka native) and VF devices.  I don't personally
believe this is the correct path, but I'm reserving judgement until I
see some code.

I don't think we really know what the One True Usage model is for VF
devices.  Chris Wright has some ideas, I have some ideas and Yu Zhao has
some ideas.  I bet there's other people who have other ideas too.

-- 
Matthew Wilcox  Intel Open Source Technology Centre
Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step.


Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-06 Thread Greg KH
On Thu, Nov 06, 2008 at 10:47:41AM -0700, Matthew Wilcox wrote:
 On Thu, Nov 06, 2008 at 08:49:19AM -0800, Greg KH wrote:
  On Thu, Nov 06, 2008 at 08:41:53AM -0800, H L wrote:
   I have not modified any existing drivers, but instead I threw together
   a bare-bones module enabling me to make a call to pci_iov_register()
   and then poke at an SR-IOV adapter's /sys entries for which no driver
   was loaded.
   
   It appears from my perusal thus far that drivers using these new
   SR-IOV patches will require modification; i.e. the driver associated
   with the Physical Function (PF) will be required to make the
   pci_iov_register() call along with the requisite notify() function.
   Essentially this suggests to me a model for the PF driver to perform
   any global actions or setup on behalf of VFs before enabling them
   after which VF drivers could be associated.
  
  Where would the VF drivers have to be associated?  On the pci_dev
  level or on a higher one?
  
  Will all drivers that want to bind to a VF device need to be
  rewritten?
 
 The current model being implemented by my colleagues has separate
 drivers for the PF (aka native) and VF devices.  I don't personally
 believe this is the correct path, but I'm reserving judgement until I
 see some code.

Hm, I would like to see that code before we can properly evaluate this
interface.  Especially as they are all tightly tied together.

 I don't think we really know what the One True Usage model is for VF
 devices.  Chris Wright has some ideas, I have some ideas and Yu Zhao has
 some ideas.  I bet there's other people who have other ideas too.

I'd love to hear those ideas.

Rumor has it, there is some Xen code floating around to support this
already, is that true?

thanks,

greg k-h


Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-06 Thread Greg KH
On Thu, Nov 06, 2008 at 05:38:16PM +, Fischer, Anna wrote:
  On Thu, Nov 06, 2008 at 08:41:53AM -0800, H L wrote:
   I have not modified any existing drivers, but instead I threw
  together
   a bare-bones module enabling me to make a call to pci_iov_register()
   and then poke at an SR-IOV adapter's /sys entries for which no driver
   was loaded.
  
   It appears from my perusal thus far that drivers using these new
   SR-IOV patches will require modification; i.e. the driver associated
   with the Physical Function (PF) will be required to make the
   pci_iov_register() call along with the requisite notify() function.
   Essentially this suggests to me a model for the PF driver to perform
   any global actions or setup on behalf of VFs before enabling them
   after which VF drivers could be associated.
 
  Where would the VF drivers have to be associated?  On the pci_dev
  level or on a higher one?
 
 A VF appears to the Linux OS as a standard (full, additional) PCI
 device. The driver is associated in the same way as for a normal PCI
 device. Ideally, you would use SR-IOV devices on a virtualized system,
 for example, using Xen. A VF can then be assigned to a guest domain as
 a full PCI device.

It's that second part that I'm worried about.  How is that going to
happen?  Do you have any patches that show this kind of assignment?

  Will all drivers that want to bind to a VF device need to be
  rewritten?
 
 Currently, any vendor providing a SR-IOV device needs to provide a PF
 driver and a VF driver that runs on their hardware.

Are there any such drivers available yet?

 A VF driver does not necessarily need to know much about SR-IOV but
 just run on the presented PCI device. You might want to have a
 communication channel between PF and VF driver though, for various
 reasons, if such a channel is not provided in hardware.

Agreed, but what does that channel look like in Linux?

I have some ideas of what I think it should look like, but if people
already have code, I'd love to see that as well.

   I have so far only seen Yu Zhao's 7-patch set.  I've not yet looked
   at his subsequently tendered 15-patch set so I don't know what has
   changed.  The hardware/firmware implementation for any given SR-IOV
   compatible device will determine the extent of differences required
   between a PF driver and a VF driver.
 
  Yeah, that's what I'm worried/curious about.  Without seeing the code
  for such a driver, how can we properly evaluate if this infrastructure
  is the correct one and the proper way to do all of this?
 
 Yu's API allows a PF driver to register with the Linux PCI code and
 use it to activate VFs and allocate their resources. The PF driver
 needs to be modified to work with that API. While you can argue about
 what that API is supposed to look like, it is clear that such an API is
 required in some form.

I totally agree, I'm arguing about what that API looks like :)

I want to see some code...

thanks,

greg k-h


Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-06 Thread H L

--- On Thu, 11/6/08, Greg KH [EMAIL PROTECTED] wrote:
 
 On Thu, Nov 06, 2008 at 08:41:53AM -0800, H L wrote:
  I have not modified any existing drivers, but instead I threw together
  a bare-bones module enabling me to make a call to pci_iov_register()
  and then poke at an SR-IOV adapter's /sys entries for which no driver
  was loaded.
  
  It appears from my perusal thus far that drivers using these new
  SR-IOV patches will require modification; i.e. the driver associated
  with the Physical Function (PF) will be required to make the
  pci_iov_register() call along with the requisite notify() function.
  Essentially this suggests to me a model for the PF driver to perform
  any global actions or setup on behalf of VFs before enabling them
  after which VF drivers could be associated.
 
 Where would the VF drivers have to be associated?  On the pci_dev
 level or on a higher one?


I have not yet fully grokked Yu Zhao's model to answer this.  That said, I 
would *hope* to find it on the pci_dev level.


 Will all drivers that want to bind to a VF
 device need to be
 rewritten?

Not necessarily, or perhaps minimally; depends on hardware/firmware and actions 
the driver wants to take.  An example here might assist.  Let's just say 
someone has created, oh, I don't know, maybe an SR-IOV NIC.  Now, for 'general' 
I/O operations to pass network traffic back and forth there would ideally be no 
difference in the actions and therefore behavior of a PF driver and a VF 
driver.  But, what do you do in the instance a VF wants to change link-speed?  
As that physical characteristic affects all VFs, how do you handle that?  This 
is where the hardware/firmware implementation part comes to play.  If a VF 
driver performs some actions to initiate the change in link speed, the logic in 
the adapter could be anything like:

1.  Acknowledge the request as if it were really done, but effectively ignore 
it.  The Independent Hardware Vendor (IHV) might dictate that if you want to 
change any global characteristics of an adapter, you may only do so via the 
PF driver.  Granted, this, depending on the device class, may just not be 
acceptable.

2.  Acknowledge the request and then trigger an interrupt to the PF driver to 
have it assist.  The PF driver might then just set the new link-speed, or it 
could result in a PF driver communicating by some mechanism to all of the VF 
driver instances that this change of link-speed was requested.

3.  Acknowledge the request and perform inner PF and VF communication of this 
event within the logic of the card (e.g. to vote on whether or not to perform 
this action) with interrupts and associated status delivered to all PF and VF 
drivers.

The list goes on.

 
  I have so far only seen Yu Zhao's 7-patch set.  I've not yet looked
  at his subsequently tendered 15-patch set so I don't know what has
  changed.  The hardware/firmware implementation for any given SR-IOV
  compatible device will determine the extent of differences required
  between a PF driver and a VF driver.
 
 Yeah, that's what I'm worried/curious about. 
 Without seeing the code
 for such a driver, how can we properly evaluate if this
 infrastructure
 is the correct one and the proper way to do all of this?


As the example above demonstrates, that's a tough question to answer.  Ideally, 
in my view, there would only be one driver written per SR-IOV device and it 
would contain the logic to do the right things based on whether it's running 
as a PF or VF, with that determination easily accomplished by testing the 
existence of the SR-IOV extended capability.  Then, in an effort to minimize 
(if not eliminate) the complexities of driver-to-driver actions for fielding 
global events, contain as much of the logic as is possible within the 
adapter.  Minimizing the effort required of device driver writers, in my 
opinion, paves the way to greater adoption of this technology.
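
To make option 2 above concrete, here is a rough sketch of how such a
VF-to-PF request path could look with a hardware mailbox.  Every register
offset, bit layout and foo_* name below is invented for illustration; a real
SR-IOV NIC would define its own mechanism, if any.

#include <linux/types.h>
#include <linux/io.h>
#include <linux/interrupt.h>

#define VF_MBX_REQ              0x0100  /* hypothetical VF->PF doorbell register */
#define PF_MBX_STATUS           0x0200  /* hypothetical PF-side mailbox register */
#define REQ_SET_LINK_SPEED      0x01

struct foo_pf {
        void __iomem *hw_addr;
};

/* PF side: program the shared link registers; details are device specific. */
static void foo_pf_set_link_speed(struct foo_pf *pf, u32 speed)
{
}

/* VF driver side: ask the PF to change link speed instead of touching the
 * shared PHY state directly. */
static void foo_vf_request_link_speed(void __iomem *vf_bar0, u32 speed)
{
        iowrite32(REQ_SET_LINK_SPEED | (speed << 8), vf_bar0 + VF_MBX_REQ);
}

/* PF driver side: the mailbox interrupt handler fields the request and
 * decides whether to apply it globally, defer it, or reject it. */
static irqreturn_t foo_pf_mbx_irq(int irq, void *data)
{
        struct foo_pf *pf = data;
        u32 req = ioread32(pf->hw_addr + PF_MBX_STATUS);

        if ((req & 0xff) == REQ_SET_LINK_SPEED)
                foo_pf_set_link_speed(pf, req >> 8);    /* or veto it */

        return IRQ_HANDLED;
}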



  



Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-06 Thread Matthew Wilcox

[Anna, can you fix your word-wrapping please?  Your lines appear to be
infinitely long which is most unpleasant to reply to]

On Thu, Nov 06, 2008 at 05:38:16PM +, Fischer, Anna wrote:
  Where would the VF drivers have to be associated?  On the pci_dev
  level or on a higher one?
 
 A VF appears to the Linux OS as a standard (full, additional) PCI
 device. The driver is associated in the same way as for a normal PCI
 device. Ideally, you would use SR-IOV devices on a virtualized system,
 for example, using Xen. A VF can then be assigned to a guest domain as
 a full PCI device.

It's not clear that's the right solution.  If the VF devices are _only_
going to be used by the guest, then arguably, we don't want to create
pci_devs for them in the host.  (I think it _is_ the right answer, but I
want to make it clear there's multiple opinions on this).

  Will all drivers that want to bind to a VF device need to be
  rewritten?
 
 Currently, any vendor providing a SR-IOV device needs to provide a PF
 driver and a VF driver that runs on their hardware. A VF driver does not
 necessarily need to know much about SR-IOV but just run on the presented
 PCI device. You might want to have a communication channel between PF
 and VF driver though, for various reasons, if such a channel is not
 provided in hardware.

That is one model.  Another model is to provide one driver that can
handle both PF and VF devices.  A third model is to provide, say, a
Windows VF driver and a Xen PF driver and only support Windows-on-Xen.
(This last would probably be an exercise in foot-shooting, but
nevertheless, I've heard it mooted).

  Yeah, that's what I'm worried/curious about.  Without seeing the code
  for such a driver, how can we properly evaluate if this infrastructure
  is the correct one and the proper way to do all of this?
 
 Yu's API allows a PF driver to register with the Linux PCI code and use
 it to activate VFs and allocate their resources. The PF driver needs to
 be modified to work with that API. While you can argue about what that API
 is supposed to look like, it is clear that such an API is required in some
 form. The PF driver needs to know when VFs are active as it might want to
 allocate further (device-specific) resources to VFs or initiate further
 (device-specific) configurations. While probably a lot of SR-IOV specific
 code has to be in the PF driver, there is also support required from
 the Linux PCI subsystem, which is to some extent provided by Yu's patches.

Everyone agrees that some support is necessary.  The question is exactly
what it looks like.  I must confess to not having reviewed this latest
patch series yet -- I'm a little burned out on patch review.

-- 
Matthew Wilcox  Intel Open Source Technology Centre
Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step.


Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-06 Thread Greg KH
On Thu, Nov 06, 2008 at 10:05:39AM -0800, H L wrote:
 
 --- On Thu, 11/6/08, Greg KH [EMAIL PROTECTED] wrote:
  
  On Thu, Nov 06, 2008 at 08:41:53AM -0800, H L wrote:
   I have not modified any existing drivers, but instead I threw together
   a bare-bones module enabling me to make a call to pci_iov_register()
   and then poke at an SR-IOV adapter's /sys entries for which no driver
   was loaded.
   
   It appears from my perusal thus far that drivers using these new
   SR-IOV patches will require modification; i.e. the driver associated
   with the Physical Function (PF) will be required to make the
   pci_iov_register() call along with the requisite notify() function.
   Essentially this suggests to me a model for the PF driver to perform
   any global actions or setup on behalf of VFs before enabling them
   after which VF drivers could be associated.
  
  Where would the VF drivers have to be associated?  On the pci_dev
  level or on a higher one?
 
 
 I have not yet fully grokked Yu Zhao's model to answer this.  That
 said, I would *hope* to find it on the pci_dev level.

Me too.

  Will all drivers that want to bind to a VF
  device need to be
  rewritten?
 
 Not necessarily, or perhaps minimally; depends on hardware/firmware
 and actions the driver wants to take.  An example here might assist.
 Let's just say someone has created, oh, I don't know, maybe an SR-IOV
 NIC.  Now, for 'general' I/O operations to pass network traffic back
 and forth there would ideally be no difference in the actions and
 therefore behavior of a PF driver and a VF driver.  But, what do you
 do in the instance a VF wants to change link-speed?  As that physical
 characteristic affects all VFs, how do you handle that?  This is where
 the hardware/firmware implementation part comes to play.  If a VF
 driver performs some actions to initiate the change in link speed, the
 logic in the adapter could be anything like:

snip

Yes, I agree that all of this needs to be done, somehow.

It's that somehow that I am interested in trying to see how it works
out.

  
   I have so far only seen Yu Zhao's 7-patch set.  I've not yet looked
   at his subsequently tendered 15-patch set so I don't know what has
   changed.  The hardware/firmware implementation for any given SR-IOV
   compatible device will determine the extent of differences required
   between a PF driver and a VF driver.
  
  Yeah, that's what I'm worried/curious about. 
  Without seeing the code
  for such a driver, how can we properly evaluate if this
  infrastructure
  is the correct one and the proper way to do all of this?
 
 
 As the example above demonstrates, that's a tough question to answer.
 Ideally, in my view, there would only be one driver written per SR-IOV
 device and it would contain the logic to do the right things based
 on whether it's running as a PF or VF with that determination easily
 accomplished by testing the existence of the SR-IOV extended
 capability.  Then, in an effort to minimize (if not eliminate) the
 complexities of driver-to-driver actions for fielding global events,
 contain as much of the logic as is possible within the adapter.
 Minimizing the efforts required for the device driver writers in my
 opinion paves the way to greater adoption of this technology.

Yes, making things easier is the key here.

Perhaps some of this could be hidden with a new bus type for these kinds
of devices?  Or a virtual bus of pci devices that the original SR-IOV
device creates that correspond to the individual virtual PCI devices?
If that were the case, then it might be a lot easier in the end.

thanks,

greg k-h
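
For what it's worth, a minimal sketch of the "virtual bus" idea might start
out as below.  None of this exists in the posted patches, and the matching
policy is deliberately left as a stub; it is only meant to show how little
core code such a bus would need.

#include <linux/device.h>
#include <linux/module.h>

/* A virtual bus the PF driver could populate with one device per VF.  How
 * VF devices and VF drivers get matched (vendor/device ID, a PF-assigned
 * name, ...) is an open question; this stub matches everything. */
static int sriov_bus_match(struct device *dev, struct device_driver *drv)
{
        return 1;
}

static struct bus_type sriov_bus_type = {
        .name   = "sriov",
        .match  = sriov_bus_match,
};

static int __init sriov_bus_init(void)
{
        return bus_register(&sriov_bus_type);
}

static void __exit sriov_bus_exit(void)
{
        bus_unregister(&sriov_bus_type);
}

module_init(sriov_bus_init);
module_exit(sriov_bus_exit);
MODULE_LICENSE("GPL");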


RE: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-06 Thread Fischer, Anna
 Subject: Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

 On Thu, Nov 06, 2008 at 05:38:16PM +, Fischer, Anna wrote:
   On Thu, Nov 06, 2008 at 08:41:53AM -0800, H L wrote:
I have not modified any existing drivers, but instead I threw
   together
a bare-bones module enabling me to make a call to
 pci_iov_register()
and then poke at an SR-IOV adapter's /sys entries for which no
 driver
was loaded.
   
It appears from my perusal thus far that drivers using these new
SR-IOV patches will require modification; i.e. the driver
 associated
with the Physical Function (PF) will be required to make the
pci_iov_register() call along with the requisite notify()
 function.
Essentially this suggests to me a model for the PF driver to
 perform
any global actions or setup on behalf of VFs before enabling
 them
after which VF drivers could be associated.
  
   Where would the VF drivers have to be associated?  On the pci_dev
   level or on a higher one?
 
  A VF appears to the Linux OS as a standard (full, additional) PCI
  device. The driver is associated in the same way as for a normal PCI
  device. Ideally, you would use SR-IOV devices on a virtualized
 system,
  for example, using Xen. A VF can then be assigned to a guest domain
 as
  a full PCI device.

 It's that second part that I'm worried about.  How is that going to
 happen?  Do you have any patches that show this kind of assignment?

That depends on your setup. Using Xen, you could assign the VF to a guest 
domain like any other PCI device, e.g. using PCI pass-through. For VMware, KVM, 
there are standard ways to do that, too. I currently don't see why SR-IOV 
devices would need any specific, non-standard mechanism for device assignment.


   Will all drivers that want to bind to a VF device need to be
   rewritten?
 
  Currently, any vendor providing a SR-IOV device needs to provide a PF
  driver and a VF driver that runs on their hardware.

 Are there any such drivers available yet?

I don't know.


  A VF driver does not necessarily need to know much about SR-IOV but
  just run on the presented PCI device. You might want to have a
  communication channel between PF and VF driver though, for various
  reasons, if such a channel is not provided in hardware.

 Agreed, but what does that channel look like in Linux?

 I have some ideas of what I think it should look like, but if people
 already have code, I'd love to see that as well.

At this point I would guess that this code is vendor specific, as are the 
drivers. The issue I see is that most likely drivers will run in different 
environments, for example, in Xen the PF driver runs in a driver domain while a 
VF driver runs in a guest VM. So a communication channel would need to be 
either Xen specific, or vendor specific. Also, a guest using the VF might run 
Windows while the PF might be controlled under Linux.

Anna


Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-06 Thread Simon Horman
On Thu, Nov 06, 2008 at 09:53:08AM -0800, Greg KH wrote:
 On Thu, Nov 06, 2008 at 10:47:41AM -0700, Matthew Wilcox wrote:
  On Thu, Nov 06, 2008 at 08:49:19AM -0800, Greg KH wrote:
   On Thu, Nov 06, 2008 at 08:41:53AM -0800, H L wrote:
I have not modified any existing drivers, but instead I threw together
a bare-bones module enabling me to make a call to pci_iov_register()
and then poke at an SR-IOV adapter's /sys entries for which no driver
was loaded.

It appears from my perusal thus far that drivers using these new
SR-IOV patches will require modification; i.e. the driver associated
with the Physical Function (PF) will be required to make the
pci_iov_register() call along with the requisite notify() function.
Essentially this suggests to me a model for the PF driver to perform
any global actions or setup on behalf of VFs before enabling them
after which VF drivers could be associated.
   
   Where would the VF drivers have to be associated?  On the pci_dev
   level or on a higher one?
   
   Will all drivers that want to bind to a VF device need to be
   rewritten?
  
  The current model being implemented by my colleagues has separate
  drivers for the PF (aka native) and VF devices.  I don't personally
  believe this is the correct path, but I'm reserving judgement until I
  see some code.
 
 Hm, I would like to see that code before we can properly evaluate this
 interface.  Especially as they are all tightly tied together.
 
  I don't think we really know what the One True Usage model is for VF
  devices.  Chris Wright has some ideas, I have some ideas and Yu Zhao has
  some ideas.  I bet there's other people who have other ideas too.
 
 I'd love to hear those ideas.
 
 Rumor has it, there is some Xen code floating around to support this
 already, is that true?

Xen patches were posted to xen-devel by Yu Zhao on the 29th of September [1].
Unfortunately the only responses that I can find are a) that the patches
were mangled and b) they seem to include changes (by others) that have
been merged into Linux. I have confirmed that both of these concerns
are valid.

I have not yet examined the difference, if any, in the approach taken by Yu
to SR-IOV in Linux and Xen. Unfortunately comparison is less than trivial
due to the gaping gap in kernel versions between Linux-Xen (2.6.18.8) and
Linux itself.

One approach that I was considering in order to familiarise myself with the
code was to backport the v6 Linux patches (this thread) to Linux-Xen. I made a
start on that, but again due to kernel version differences it is non-trivial.

[1] http://lists.xensource.com/archives/html/xen-devel/2008-09/msg00923.html

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/ W: www.valinux.co.jp/en



Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-06 Thread Anthony Liguori

Matthew Wilcox wrote:

[Anna, can you fix your word-wrapping please?  Your lines appear to be
infinitely long which is most unpleasant to reply to]

On Thu, Nov 06, 2008 at 05:38:16PM +, Fischer, Anna wrote:
  

Where would the VF drivers have to be associated?  On the pci_dev
level or on a higher one?
  

A VF appears to the Linux OS as a standard (full, additional) PCI
device. The driver is associated in the same way as for a normal PCI
device. Ideally, you would use SR-IOV devices on a virtualized system,
for example, using Xen. A VF can then be assigned to a guest domain as
a full PCI device.



It's not clear that's the right solution.  If the VF devices are _only_
going to be used by the guest, then arguably, we don't want to create
pci_devs for them in the host.  (I think it _is_ the right answer, but I
want to make it clear there's multiple opinions on this).
  


The VFs shouldn't be limited to being used by the guest.

SR-IOV is actually an incredibly painful thing.  You need to have a VF 
driver in the guest, do hardware pass through, have a PV driver stub in 
the guest that's hypervisor specific (a VF is not usable on its own), 
have a device specific backend in the VMM, and if you want to do live 
migration, have another PV driver in the guest that you can do teaming 
with.  Just a mess.


What we would rather do in KVM, is have the VFs appear in the host as 
standard network devices.  We would then like to back our existing PV 
driver to this VF directly bypassing the host networking stack.  A key 
feature here is being able to fill the VF's receive queue with guest 
memory instead of host kernel memory so that you can get zero-copy 
receive traffic.  This will perform just as well as doing passthrough 
(at least) and avoid all that ugliness of dealing with SR-IOV in the guest.


This eliminates all of the mess of various drivers in the guest and all 
the associated baggage of doing hardware passthrough.


So IMHO, having VFs be usable in the host is absolutely critical because 
I think it's the only reasonable usage model.


Regards,

Anthony Liguori


Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-06 Thread Anthony Liguori

Greg KH wrote:

On Thu, Nov 06, 2008 at 10:47:41AM -0700, Matthew Wilcox wrote:
  

I don't think we really know what the One True Usage model is for VF
devices.  Chris Wright has some ideas, I have some ideas and Yu Zhao has
some ideas.  I bet there's other people who have other ideas too.



I'd love to hear those ideas.
  


We've been talking about avoiding hardware passthrough entirely and just 
backing a virtio-net backend driver by a dedicated VF in the host.  That 
avoids a huge amount of guest-facing complexity, lets migration Just 
Work, and should give the same level of performance.


Regards,

Anthony Liguori


Rumor has it, there is some Xen code floating around to support this
already, is that true?

thanks,

greg k-h


Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-06 Thread Matthew Wilcox
On Thu, Nov 06, 2008 at 04:38:40PM -0600, Anthony Liguori wrote:
 It's not clear that's the right solution.  If the VF devices are _only_
 going to be used by the guest, then arguably, we don't want to create
 pci_devs for them in the host.  (I think it _is_ the right answer, but I
 want to make it clear there's multiple opinions on this).
 
 The VFs shouldn't be limited to being used by the guest.
 
 SR-IOV is actually an incredibly painful thing.  You need to have a VF 
 driver in the guest, do hardware pass through, have a PV driver stub in 
 the guest that's hypervisor specific (a VF is not usable on its own), 
 have a device specific backend in the VMM, and if you want to do live 
 migration, have another PV driver in the guest that you can do teaming 
 with.  Just a mess.

Not to mention that you basically have to statically allocate them up
front.

 What we would rather do in KVM, is have the VFs appear in the host as 
 standard network devices.  We would then like to back our existing PV 
 driver to this VF directly bypassing the host networking stack.  A key 
 feature here is being able to fill the VF's receive queue with guest 
 memory instead of host kernel memory so that you can get zero-copy 
 receive traffic.  This will perform just as well as doing passthrough 
 (at least) and avoid all that ugliness of dealing with SR-IOV in the guest.

This argues for ignoring the SR-IOV mess completely.  Just have the
host driver expose multiple 'ethN' devices.

 This eliminates all of the mess of various drivers in the guest and all 
 the associated baggage of doing hardware passthrough.
 
 So IMHO, having VFs be usable in the host is absolutely critical because 
 I think it's the only reasonable usage model.
 
 Regards,
 
 Anthony Liguori

-- 
Matthew Wilcox  Intel Open Source Technology Centre
Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step.


Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-06 Thread Chris Wright
* Greg KH ([EMAIL PROTECTED]) wrote:
 On Thu, Nov 06, 2008 at 10:47:41AM -0700, Matthew Wilcox wrote:
  On Thu, Nov 06, 2008 at 08:49:19AM -0800, Greg KH wrote:
   On Thu, Nov 06, 2008 at 08:41:53AM -0800, H L wrote:
I have not modified any existing drivers, but instead I threw together
a bare-bones module enabling me to make a call to pci_iov_register()
and then poke at an SR-IOV adapter's /sys entries for which no driver
was loaded.

It appears from my perusal thus far that drivers using these new
SR-IOV patches will require modification; i.e. the driver associated
with the Physical Function (PF) will be required to make the
pci_iov_register() call along with the requisite notify() function.
Essentially this suggests to me a model for the PF driver to perform
any global actions or setup on behalf of VFs before enabling them
after which VF drivers could be associated.
   
   Where would the VF drivers have to be associated?  On the pci_dev
   level or on a higher one?
   
   Will all drivers that want to bind to a VF device need to be
   rewritten?
  
  The current model being implemented by my colleagues has separate
  drivers for the PF (aka native) and VF devices.  I don't personally
  believe this is the correct path, but I'm reserving judgement until I
  see some code.
 
 Hm, I would like to see that code before we can properly evaluate this
 interface.  Especially as they are all tightly tied together.
 
  I don't think we really know what the One True Usage model is for VF
  devices.  Chris Wright has some ideas, I have some ideas and Yu Zhao has
  some ideas.  I bet there's other people who have other ideas too.
 
 I'd love to hear those ideas.

First there's the question of how to represent the VF on the host.
Ideally (IMO) this would show up as a normal interface so that normal tools
can configure the interface.  This is not exactly how the first round of
patches were designed.

Second there's the question of reserving the BDF on the host such that
we don't have two drivers (one in the host and one in a guest) trying to
drive the same device (an issue that shows up for device assignment as
well as VF assignment).

Third there's the question of whether the VF can be used in the host at
all.

Fourth there's the question of whether the VF and PF drivers are the
same or separate.

The typical usecase is assigning the VF to the guest directly, so
there's only enough functionality in the host side to allocate a VF,
configure it, and assign it (and propagate AER).  This is with separate
PF and VF driver.

As Anthony mentioned, we are interested in allowing the host to use the
VF.  This could be useful for containers as well as dedicating a VF (a
set of device resources) to a guest w/out passing it through.

thanks,
-chris


RE: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-06 Thread Dong, Eddie

 What we would rather do in KVM, is have the VFs appear in
 the host as standard network devices.  We would then like
 to back our existing PV driver to this VF directly
 bypassing the host networking stack.  A key feature here
 is being able to fill the VF's receive queue with guest
 memory instead of host kernel memory so that you can get
 zero-copy  
 receive traffic.  This will perform just as well as doing
 passthrough (at least) and avoid all that ugliness of
 dealing with SR-IOV in the guest. 
 

Anthony:
This is already addressed by the VMDq solution (or so-called netchannel2),
right? Qing He is debugging the KVM side patch and is pretty close to the end.

For this single purpose, we don't need SR-IOV. BTW, at least the Intel
SR-IOV NIC also supports VMDq, so you can achieve this by simply using the
native VMDq-enabled driver here, plus the work we are debugging now.

Thx, eddie


RE: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-06 Thread Nakajima, Jun
On 11/6/2008 2:38:40 PM, Anthony Liguori wrote:
 Matthew Wilcox wrote:
  [Anna, can you fix your word-wrapping please?  Your lines appear to
  be infinitely long which is most unpleasant to reply to]
 
  On Thu, Nov 06, 2008 at 05:38:16PM +, Fischer, Anna wrote:
 
Where would the VF drivers have to be associated?  On the pci_dev
level or on a higher one?
   
   A VF appears to the Linux OS as a standard (full, additional) PCI
   device. The driver is associated in the same way as for a normal
   PCI device. Ideally, you would use SR-IOV devices on a virtualized
   system, for example, using Xen. A VF can then be assigned to a
   guest domain as a full PCI device.
  
 
  It's not clear that's the right solution.  If the VF devices are
  _only_ going to be used by the guest, then arguably, we don't want
  to create pci_devs for them in the host.  (I think it _is_ the right
  answer, but I want to make it clear there's multiple opinions on this).
 

 The VFs shouldn't be limited to being used by the guest.

 SR-IOV is actually an incredibly painful thing.  You need to have a VF
 driver in the guest, do hardware pass through, have a PV driver stub
 in the guest that's hypervisor specific (a VF is not usable on its
 own), have a device specific backend in the VMM, and if you want to do
 live migration, have another PV driver in the guest that you can do
 teaming with.  Just a mess.

Actually a PV driver stub in the guest _was_ correct; I admit that I stated 
so at a virt mini summit more than half a year ago ;-). But things have 
changed, and such a stub is no longer required (at least in our 
implementation). The major benefit of VF drivers now is that they are 
VMM-agnostic.


 What we would rather do in KVM, is have the VFs appear in the host as
 standard network devices.  We would then like to back our existing PV
 driver to this VF directly bypassing the host networking stack.  A key
 feature here is being able to fill the VF's receive queue with guest
 memory instead of host kernel memory so that you can get zero-copy
 receive traffic.  This will perform just as well as doing passthrough
 (at
 least) and avoid all that ugliness of dealing with SR-IOV in the guest.

 This eliminates all of the mess of various drivers in the guest and
 all the associated baggage of doing hardware passthrough.

 So IMHO, having VFs be usable in the host is absolutely critical
 because I think it's the only reasonable usage model.

As Eddie said, VMDq is better for this model, and the feature is already 
available today. It is much simpler because it was designed for such purposes. 
It does not require hardware pass-through (e.g. VT-d) or VFs as a PCI device, 
either.


 Regards,

 Anthony Liguori
Jun Nakajima | Intel Open Source Technology Center


Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-06 Thread Zhao, Yu

Greg KH wrote:

On Wed, Oct 22, 2008 at 04:38:09PM +0800, Yu Zhao wrote:

Greetings,

Following patches are intended to support SR-IOV capability in the
Linux kernel. With these patches, people can turn a PCI device with
the capability into multiple ones from software perspective, which
will benefit KVM and achieve other purposes such as QoS, security,
and etc.


Is there any actual users of this API around yet?  How was it tested as
there is no hardware to test on?  Which drivers are going to have to be
rewritten to take advantage of this new interface?


Yes, the API is used by Intel, HP, NextIO and some other anonymous 
companies as they raise questions and send me feedback. I haven't seen 
their work, but I guess some of the drivers using the SR-IOV API are going 
to be released soon.


My test was done with Intel 82576 Gigabit Ethernet Controller. The 
product brief is at 
http://download.intel.com/design/network/ProdBrf/320025.pdf and the spec 
is available at 
http://download.intel.com/design/network/datashts/82576_Datasheet_v2p1.pdf


Regards,
Yu


Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-06 Thread Zhao, Yu

Greg KH wrote:

On Thu, Nov 06, 2008 at 10:05:39AM -0800, H L wrote:

--- On Thu, 11/6/08, Greg KH [EMAIL PROTECTED] wrote:

 On Thu, Nov 06, 2008 at 08:41:53AM -0800, H L wrote:
  I have not modified any existing drivers, but instead I threw together
  a bare-bones module enabling me to make a call to pci_iov_register()
  and then poke at an SR-IOV adapter's /sys entries for which no driver
  was loaded.

  It appears from my perusal thus far that drivers using these new
  SR-IOV patches will require modification; i.e. the driver associated
  with the Physical Function (PF) will be required to make the
  pci_iov_register() call along with the requisite notify() function.
  Essentially this suggests to me a model for the PF driver to perform
  any global actions or setup on behalf of VFs before enabling them
  after which VF drivers could be associated.

 Where would the VF drivers have to be associated?  On the pci_dev
 level or on a higher one?

I have not yet fully grokked Yu Zhao's model to answer this.  That
said, I would *hope* to find it on the pci_dev level.


Me too.


A VF is a kind of lightweight PCI device, and it's represented by a struct 
pci_dev. The VF driver binds to the pci_dev and works in the same way as 
other drivers do.





Will all drivers that want to bind to a VF device need to be
rewritten?

Not necessarily, or perhaps minimally; depends on hardware/firmware
and actions the driver wants to take.  An example here might assist.
Let's just say someone has created, oh, I don't know, maybe an SR-IOV
NIC.  Now, for 'general' I/O operations to pass network traffic back
and forth there would ideally be no difference in the actions and
therefore behavior of a PF driver and a VF driver.  But, what do you
do in the instance a VF wants to change link-speed?  As that physical
characteristic affects all VFs, how do you handle that?  This is where
the hardware/firmware implementation part comes to play.  If a VF
driver performs some actions to initiate the change in link speed, the
logic in the adapter could be anything like:


snip

Yes, I agree that all of this needs to be done, somehow.

It's that somehow that I am interested in trying to see how it works
out.


This is the device-specific part. The VF driver is free to do what it wants 
with device-specific registers and resources, and that doesn't concern us 
as long as it behaves as a PCI device driver.





 I have so far only seen Yu Zhao's 7-patch set.  I've not yet looked
 at his subsequently tendered 15-patch set so I don't know what has
 changed.  The hardware/firmware implementation for any given SR-IOV
 compatible device will determine the extent of differences required
 between a PF driver and a VF driver.

Yeah, that's what I'm worried/curious about.  Without seeing the code
for such a driver, how can we properly evaluate if this infrastructure
is the correct one and the proper way to do all of this?

As the example above demonstrates, that's a tough question to answer.
Ideally, in my view, there would only be one driver written per SR-IOV
device and it would contain the logic to do the right things based
on whether it's running as a PF or VF with that determination easily
accomplished by testing the existence of the SR-IOV extended
capability.  Then, in an effort to minimize (if not eliminate) the
complexities of driver-to-driver actions for fielding global events,
contain as much of the logic as is possible within the adapter.
Minimizing the efforts required for the device driver writers in my
opinion paves the way to greater adoption of this technology.


Yes, making things easier is the key here.

Perhaps some of this could be hidden with a new bus type for these kinds
of devices?  Or a virtual bus of pci devices that the original SR-IOV
device creates that correspond to the individual virtual PCI devices?
If that were the case, then it might be a lot easier in the end.


The PCI SIG only defines SR-IOV at the PCI level; we can't predict what the 
hardware vendors will implement at the device-specific logic level.


An example of an SR-IOV NIC: the PF may not have network functionality at 
all; it only controls the VFs. Because people only want to use VFs in 
virtual machines, they don't need network functionality in the environment 
(e.g. the hypervisor) where the PF resides.


Thanks,
Yu
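
To illustrate the point that a VF binds at the pci_dev level like any other
PCI device, a bare-bones VF network driver could be structured as below.
The device ID, the foo_vf names and the omitted netdev operations are
placeholders only; no real VF driver had been posted at the time of this
thread.

#include <linux/module.h>
#include <linux/pci.h>
#include <linux/etherdevice.h>

/* Placeholder ID table: a real VF driver lists the VF device ID(s) its
 * hardware exposes. */
static const struct pci_device_id foo_vf_ids[] = {
        { PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x10ca) },    /* illustrative only */
        { }
};
MODULE_DEVICE_TABLE(pci, foo_vf_ids);

static int foo_vf_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
        struct net_device *netdev;
        int err;

        err = pci_enable_device(pdev);
        if (err)
                return err;

        netdev = alloc_etherdev(0);     /* private data omitted for brevity */
        if (!netdev)
                return -ENOMEM;

        SET_NETDEV_DEV(netdev, &pdev->dev);
        /* a real driver must set up its tx/rx handlers and MAC address here,
         * before register_netdev() makes the interface visible */

        err = register_netdev(netdev);  /* shows up as a normal ethN */
        if (err) {
                free_netdev(netdev);
                return err;
        }

        pci_set_drvdata(pdev, netdev);
        return 0;
}

static struct pci_driver foo_vf_driver = {
        .name           = "foo_vf",
        .id_table       = foo_vf_ids,
        .probe          = foo_vf_probe,
};

static int __init foo_vf_init(void)
{
        return pci_register_driver(&foo_vf_driver);
}
module_init(foo_vf_init);
MODULE_LICENSE("GPL");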


Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-06 Thread Greg KH
On Fri, Nov 07, 2008 at 01:18:52PM +0800, Zhao, Yu wrote:
 Greg KH wrote:
 On Wed, Oct 22, 2008 at 04:38:09PM +0800, Yu Zhao wrote:
 Greetings,

 Following patches are intended to support SR-IOV capability in the
 Linux kernel. With these patches, people can turn a PCI device with
 the capability into multiple ones from software perspective, which
 will benefit KVM and achieve other purposes such as QoS, security,
 and etc.
 Is there any actual users of this API around yet?  How was it tested as
 there is no hardware to test on?  Which drivers are going to have to be
 rewritten to take advantage of this new interface?

 Yes, the API is used by Intel, HP, NextIO and some other anonymous 
 companies as they raise questions and send me feedback. I haven't seen their 
 work, but I guess some of the drivers using the SR-IOV API are going to be 
 released soon.

Well, we can't merge infrastructure without seeing the users of that
infrastructure, right?

 My test was done with Intel 82576 Gigabit Ethernet Controller. The product 
 brief is at http://download.intel.com/design/network/ProdBrf/320025.pdf and 
 the spec is available at 
 http://download.intel.com/design/network/datashts/82576_Datasheet_v2p1.pdf

Cool, do you have that driver we can see?

How does it interact and handle the kvm and xen issues that have been
posted?

thanks,

greg k-h


Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-06 Thread Greg KH
On Thu, Nov 06, 2008 at 03:54:06PM -0800, Chris Wright wrote:
 * Greg KH ([EMAIL PROTECTED]) wrote:
  On Thu, Nov 06, 2008 at 10:47:41AM -0700, Matthew Wilcox wrote:
   On Thu, Nov 06, 2008 at 08:49:19AM -0800, Greg KH wrote:
On Thu, Nov 06, 2008 at 08:41:53AM -0800, H L wrote:
 I have not modified any existing drivers, but instead I threw together
 a bare-bones module enabling me to make a call to pci_iov_register()
 and then poke at an SR-IOV adapter's /sys entries for which no driver
 was loaded.
 
 It appears from my perusal thus far that drivers using these new
 SR-IOV patches will require modification; i.e. the driver associated
 with the Physical Function (PF) will be required to make the
 pci_iov_register() call along with the requisite notify() function.
 Essentially this suggests to me a model for the PF driver to perform
 any global actions or setup on behalf of VFs before enabling them
 after which VF drivers could be associated.

Where would the VF drivers have to be associated?  On the pci_dev
level or on a higher one?

Will all drivers that want to bind to a VF device need to be
rewritten?
   
   The current model being implemented by my colleagues has separate
   drivers for the PF (aka native) and VF devices.  I don't personally
   believe this is the correct path, but I'm reserving judgement until I
   see some code.
  
  Hm, I would like to see that code before we can properly evaluate this
  interface.  Especially as they are all tightly tied together.
  
   I don't think we really know what the One True Usage model is for VF
   devices.  Chris Wright has some ideas, I have some ideas and Yu Zhao has
   some ideas.  I bet there's other people who have other ideas too.
  
  I'd love to hear those ideas.
 
 First there's the question of how to represent the VF on the host.
 Ideally (IMO) this would show up as a normal interface so that normal tools
 can configure the interface.  This is not exactly how the first round of
 patches were designed.
 
 Second there's the question of reserving the BDF on the host such that
 we don't have two drivers (one in the host and one in a guest) trying to
 drive the same device (an issue that shows up for device assignment as
 well as VF assignment).
 
 Third there's the question of whether the VF can be used in the host at
 all.
 
 Fourth there's the question of whether the VF and PF drivers are the
 same or separate.
 
 The typical usecase is assigning the VF to the guest directly, so
 there's only enough functionality in the host side to allocate a VF,
 configure it, and assign it (and propagate AER).  This is with separate
 PF and VF driver.
 
 As Anthony mentioned, we are interested in allowing the host to use the
 VF.  This could be useful for containers as well as dedicating a VF (a
 set of device resources) to a guest w/out passing it through.

All of this looks great.  So, with all of these questions, how does the
current code pertain to these issues?  It seems like we have a long way
to go...

thanks,

greg k-h


Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-06 Thread Greg KH
On Thu, Nov 06, 2008 at 04:40:21PM -0600, Anthony Liguori wrote:
 Greg KH wrote:
 On Thu, Nov 06, 2008 at 10:47:41AM -0700, Matthew Wilcox wrote:
   
 I don't think we really know what the One True Usage model is for VF
 devices.  Chris Wright has some ideas, I have some ideas and Yu Zhao has
 some ideas.  I bet there's other people who have other ideas too.
 

 I'd love to hear those ideas.
   

 We've been talking about avoiding hardware passthrough entirely and
 just backing a virtio-net backend driver by a dedicated VF in the
 host.  That avoids a huge amount of guest-facing complexity, lets
 migration Just Work, and should give the same level of performance.

Does that involve this patch set?  Or a different type of interface.

thanks,

greg k-h


Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-06 Thread Greg KH
On Thu, Nov 06, 2008 at 03:58:54PM -0700, Matthew Wilcox wrote:
  What we would rather do in KVM, is have the VFs appear in the host as 
  standard network devices.  We would then like to back our existing PV 
  driver to this VF directly bypassing the host networking stack.  A key 
  feature here is being able to fill the VF's receive queue with guest 
  memory instead of host kernel memory so that you can get zero-copy 
  receive traffic.  This will perform just as well as doing passthrough 
  (at least) and avoid all that ugliness of dealing with SR-IOV in the guest.
 
 This argues for ignoring the SR-IOV mess completely.  Just have the
 host driver expose multiple 'ethN' devices.

That would work, but do we want to do that for every different type of
driver?

thanks,

greg k-h
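
As a sketch of the "expose multiple ethN devices" approach quoted above, the
host (PF) driver would simply register one net device per hardware queue
pair or VF-equivalent resource, with no SR-IOV plumbing visible outside the
driver.  Names and error handling are simplified and are not taken from any
posted driver.

#include <linux/pci.h>
#include <linux/etherdevice.h>

/* Register 'count' net devices backed by a single PCI function; each one
 * would be wired to its own queue pair (or VF resource) internally. */
static int foo_create_netdevs(struct pci_dev *pdev, int count)
{
        int i, err;

        for (i = 0; i < count; i++) {
                struct net_device *nd = alloc_etherdev(0);

                if (!nd)
                        return -ENOMEM;         /* unwind of earlier netdevs omitted */

                SET_NETDEV_DEV(nd, &pdev->dev);
                /* per-queue tx/rx handlers would be attached here */

                err = register_netdev(nd);      /* each shows up as ethN */
                if (err) {
                        free_netdev(nd);
                        return err;
                }
        }
        return 0;
}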


RE: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-06 Thread Leonid Grossman


 -----Original Message-----
 From: Zhao, Yu
 Sent: Thursday, November 06, 2008 11:06 PM
 To: Chris Wright
 Cc: Matthew Wilcox; Greg KH; kvm@vger.kernel.org (other addresses redacted)
 Subject: Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
 
 Chris Wright wrote:
  * Greg KH ([EMAIL PROTECTED]) wrote:
  On Thu, Nov 06, 2008 at 10:47:41AM -0700, Matthew Wilcox wrote:
  On Thu, Nov 06, 2008 at 08:49:19AM -0800, Greg KH wrote:
  On Thu, Nov 06, 2008 at 08:41:53AM -0800, H L wrote:
  I have not modified any existing drivers, but instead I threw
 together
  a bare-bones module enabling me to make a call to
pci_iov_register()
  and then poke at an SR-IOV adapter's /sys entries for which no
 driver
  was loaded.
 
  It appears from my perusal thus far that drivers using these new
  SR-IOV patches will require modification; i.e. the driver
associated
  with the Physical Function (PF) will be required to make the
  pci_iov_register() call along with the requisite notify()
function.
  Essentially this suggests to me a model for the PF driver to
perform
  any global actions or setup on behalf of VFs before enabling
them
  after which VF drivers could be associated.
  Where would the VF drivers have to be associated?  On the
pci_dev
  level or on a higher one?
 
  Will all drivers that want to bind to a VF device need to be
  rewritten?
  The current model being implemented by my colleagues has separate
  drivers for the PF (aka native) and VF devices.  I don't
personally
  believe this is the correct path, but I'm reserving judgement
until I
  see some code.
  Hm, I would like to see that code before we can properly evaluate
this
  interface.  Especially as they are all tightly tied together.
 
  I don't think we really know what the One True Usage model is for
VF
  devices.  Chris Wright has some ideas, I have some ideas and Yu
Zhao
 has
  some ideas.  I bet there's other people who have other ideas too.
  I'd love to hear those ideas.
 
  First there's the question of how to represent the VF on the host.
  Ideally (IMO) this would show up as a normal interface so that
normal
 tools
  can configure the interface.  This is not exactly how the first
round of
  patches were designed.
 
 Whether the VF shows up as a normal interface is decided by the VF
 driver. A VF is represented by a 'pci_dev' at the PCI level, so the VF
 driver can be loaded as a normal PCI device driver.
 
 The software representation (eth, framebuffer, etc.) created by the VF
 driver is not controlled by the SR-IOV framework.
 
 So you can definitely use normal tools to configure the VF if its
 driver supports that :-)
 
 
  Second there's the question of reserving the BDF on the host such
that
  we don't have two drivers (one in the host and one in a guest)
trying to
  drive the same device (an issue that shows up for device assignment
as
  well as VF assignment).
 
 If we don't reserve a BDF for the device, it can't work in either the
 host or the guest.
 
 Without a BDF, we can't access the config space of the device, and the
 device can't do DMA either.
 
 Did I miss your point?
 
 
  Third there's the question of whether the VF can be used in the host
at
  all.
 
 Why can't? My VFs work well in the host as normal PCI devices :-)
 
 
  Fourth there's the question of whether the VF and PF drivers are the
  same or separate.
 
 As I mentioned in another email in this thread, we can't predict how
 hardware vendors will design their SR-IOV devices; the PCI SIG doesn't
 define the device-specific logic.
 
 So I think the answer to this question is up to the device driver
 developers. If the PF and VFs in an SR-IOV device have similar logic,
 then they can share a driver. Otherwise -- e.g., if the PF doesn't have
 real functionality at all and only has registers to control internal
 resource allocation for the VFs -- then the drivers should be separate,
 right?


Right, this really depends upon the functionality behind a VF. If a VF is
done as a subset of a netdev interface (for example, a queue pair), then a
split VF/PF driver model and a proprietary communication channel are in
order.

If each VF is done as a complete netdev interface (like in our 10GbE IOV
controllers), then the PF and VF drivers could be the same. Each VF can be
independently driven by such a native netdev driver; this includes the
ability to run a native driver in a guest in passthru mode.
A PF driver in a privileged domain doesn't even have to be present.

 
 
  The typical usecase is assigning the VF to the guest directly, so
  there's only enough functionality in the host side to allocate a VF,
  configure it, and assign it (and propagate AER).  This is with
separate
  PF and VF driver.
 
  As Anthony mentioned, we are interested in allowing the host to use the
  VF.  This could be useful for containers as well as dedicating a VF (a
  set of device resources) to a guest w/out passing it through.

Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-06 Thread Zhao, Yu

Greg KH wrote:

On Thu, Nov 06, 2008 at 04:40:21PM -0600, Anthony Liguori wrote:

Greg KH wrote:

On Thu, Nov 06, 2008 at 10:47:41AM -0700, Matthew Wilcox wrote:
  

I don't think we really know what the One True Usage model is for VF
devices.  Chris Wright has some ideas, I have some ideas and Yu Zhao has
some ideas.  I bet there's other people who have other ideas too.


I'd love to hear those ideas.
  

We've been talking about avoiding hardware passthrough entirely and
just backing a virtio-net backend driver by a dedicated VF in the
host.  That avoids a huge amount of guest-facing complexity, lets
migration Just Work, and should give the same level of performance.


This can be commonly used not only with VFs -- devices that have multiple 
DMA queues (e.g., Intel VMDq, Neterion Xframe) and even traditional 
devices can also take advantage of this.


CC Rusty Russell in case he has more comments.



Does that involve this patch set?  Or a different type of interface.


I think that is a different type of interface. We need to hook the DMA 
interface in the device driver to the virtio-net backend so the hardware 
(normal device, VF, VMDq, etc.) can DMA data to/from the virtio-net backend.


Regards,
Yu


Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-05 Thread Greg KH
On Wed, Oct 22, 2008 at 04:38:09PM +0800, Yu Zhao wrote:
 Greetings,
 
 Following patches are intended to support SR-IOV capability in the
 Linux kernel. With these patches, people can turn a PCI device with
 the capability into multiple ones from software perspective, which
 will benefit KVM and achieve other purposes such as QoS, security,
 and etc.

Is there any actual users of this API around yet?  How was it tested as
there is no hardware to test on?  Which drivers are going to have to be
rewritten to take advantage of this new interface?

thanks,

greg k-h