Re: [PATCH v5 2/3] virtio_pci: Use the DMA API for virtqueues when possible

2014-09-17 Thread Ira W. Snyder
On Tue, Sep 16, 2014 at 10:22:27PM -0700, Andy Lutomirski wrote:
 On non-PPC systems, virtio_pci should use the DMA API.  This fixes
 virtio_pci on Xen.  On PPC, using the DMA API would break things, so
 we need to preserve the old behavior.
 
 The big comment in this patch explains the considerations in more
 detail.
 
 Signed-off-by: Andy Lutomirski l...@amacapital.net
 ---
  drivers/virtio/virtio_pci.c | 90 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------
  1 file changed, 81 insertions(+), 9 deletions(-)
 
 diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
 index a1f299fa4626..8ddb0a641878 100644
 --- a/drivers/virtio/virtio_pci.c
 +++ b/drivers/virtio/virtio_pci.c
 @@ -80,8 +80,10 @@ struct virtio_pci_vq_info
   /* the number of entries in the queue */
   int num;
  
 - /* the virtual address of the ring queue */
 - void *queue;
 + /* the ring queue */
 + void *queue;   /* virtual address */
 + dma_addr_t queue_dma_addr;  /* bus address */
 + bool use_dma_api;   /* are we using the DMA API? */
  
   /* the list node for the virtqueues list */
   struct list_head node;
 @@ -388,6 +390,50 @@ static int vp_request_intx(struct virtio_device *vdev)
   return err;
  }
  
 +static bool vp_use_dma_api(void)
 +{
 + /*
 +  * Due to limitations of the DMA API, we only have two choices:
 +  * use the DMA API (e.g. set up IOMMU mappings or apply Xen's
 +  * physical-to-machine translation) or use direct physical
 +  * addressing.  Furthermore, there's no sensible way yet for the
 +  * PCI bus code to tell us whether we're supposed to act like a
 +  * normal PCI device (and use the DMA API) or to do something
 +  * else.  So we're stuck with heuristics here.
 +  *
 +  * In general, we would prefer to use the DMA API, since we
 +  * might be driving a physical device, and such devices *must*
 +  * use the DMA API if there is an IOMMU involved.
 +  *
 +  * On x86, there are no physically-mapped emulated virtio PCI
 +  * devices that live behind an IOMMU.  On ARM, there don't seem
 +  * to be any hypervisors that use virtio_pci (as opposed to
 +  * virtio_mmio) that also emulate an IOMMU.  So using the DMI

Hi,

I noticed a typo here. It should say DMA not DMI. Just thought I'd
point it out.

Ira

 +  * API is safe.
 +  *
 +  * On PowerPC, it's the other way around.  There usually is an
 +  * IOMMU between us and the virtio PCI device, but the device is
 +  * probably emulated and ignores the IOMMU.  Unfortunately, we
 +  * can't tell whether we're talking to an emulated device or to
 +  * a physical device that really lives behind the IOMMU.  That
 +  * means that we're stuck with ignoring the DMA API.
 +  */
 +
 +#ifdef CONFIG_PPC
 + return false;
 +#else
 + /*
 +  * Minor optimization: if the platform promises to have physical
 +  * PCI DMA, we turn off DMA mapping in virtio_ring.  If the
 +  * platform's DMA API implementation is well optimized, this
 +  * should have almost no effect, but we already have a branch in
 +  * the vring code, and we can avoid any further indirection with
 +  * very little effort.
 +  */
 + return !PCI_DMA_BUS_IS_PHYS;
 +#endif
 +}
 +
  static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
 void (*callback)(struct virtqueue *vq),
 const char *name,
 @@ -416,21 +462,30 @@ static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
  
   info->num = num;
   info->msix_vector = msix_vec;
 + info->use_dma_api = vp_use_dma_api();
  
 - size = PAGE_ALIGN(vring_size(num, VIRTIO_PCI_VRING_ALIGN));
 - info->queue = alloc_pages_exact(size, GFP_KERNEL|__GFP_ZERO);
 + size = vring_size(num, VIRTIO_PCI_VRING_ALIGN);
 + if (info->use_dma_api) {
 + info->queue = dma_zalloc_coherent(vdev->dev.parent, size,
 +   &info->queue_dma_addr,
 +   GFP_KERNEL);
 + } else {
 + info->queue = alloc_pages_exact(PAGE_ALIGN(size),
 + GFP_KERNEL|__GFP_ZERO);
 + info->queue_dma_addr = virt_to_phys(info->queue);
 + }
   if (info->queue == NULL) {
   err = -ENOMEM;
   goto out_info;
   }
  
   /* activate the queue */
 - iowrite32(virt_to_phys(info->queue) >> VIRTIO_PCI_QUEUE_ADDR_SHIFT,
 + iowrite32(info->queue_dma_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT,
   vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
 
   /* create the vring */
   vq = vring_new_virtqueue(index, info->num, VIRTIO_PCI_VRING_ALIGN, vdev,
 -  true, false, info->queue,
 +  true, info->use_dma_api, info->queue,
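
The quoted hunk is cut off here. Presumably the teardown path is changed symmetrically; a rough sketch of what the matching free in the del_vq path likely looks like, reusing the fields added above (not checked against the actual patch):

	size = vring_size(info->num, VIRTIO_PCI_VRING_ALIGN);
	if (info->use_dma_api)
		dma_free_coherent(vdev->dev.parent, size,
				  info->queue, info->queue_dma_addr);
	else
		free_pages_exact(info->queue, PAGE_ALIGN(size));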
  

Re: [RFC]vhost/vhost-net backend for PCI cards

2013-02-27 Thread Ira W. Snyder
On Wed, Feb 27, 2013 at 03:50:54AM -0800, Nikhil Rao wrote:
 
 We are implementing a driver for a PCIe card that runs Linux. This card
 needs virtual network/disk/console devices, so we have reused the
 virtio devices on the card and provided a host backend that interacts
 with the virtio devices through the card's driver. 
 
 this approach is very much like what was proposed on this thread
 http://permalink.gmane.org/gmane.linux.ports.sh.devel/10379
 
 We will be posting the driver soon, so perhaps I am jumping the gun with my
 question below about replacing our backend with vhost.
 
 It is possible for vhost (along with vhost-net in the case of
 virtio-net) to serve as the backend. The copy between virtio buffers and
 skbs happens in the tun/tap driver which means tun/tap may need to use a
 HW DMA engine (the card has one) for copy across the bus to get close to
 the full PCIe bandwidth.
 
 tun/tap was probably never designed for this use case, but reusing vhost
 does simplify our backend, since it is now only involved in setup, and it
 potentially has a performance/memory footprint advantage due to avoiding
 context switches and intermediate buffer copies. This idea can be
 generalized to other cards as well.
 
 Comments/suggestions?
 
 Thanks,
 Nikhil
 

Hi Nikhil,

I don't have any code to offer, but may be able to provide some
suggestions. I work on a system which has a single (x86) host computer,
and many PowerPC data processing boards, which are connected via PCI.
This sounds similar to your hardware.

Our system was developed before vhost existed. I built a (fairly dumb)
network driver that just transfers packets over PCI using the PowerPC
DMA controller. It works; however, I think a more generic virtio solution
will work better. A virtio solution will also allow other types of
devices besides a network interface.

I have done some studying of rproc/rpmsg/vhost/vringh, and may have some
suggestions about those pieces of kernel functionality.

A HW DMA engine is absolutely needed to get good performance over the
PCI bus. I don't have experience with PCIe.

You may want to investigate rproc/rpmsg to help do virtio device
discovery.

When dealing with virtio, it may be helpful to think of your PCIe card
as the host. In virtio nomenclature, the host is in charge of
copying data. Your HW DMA engine needs to be controlled by the host.
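
To make "controlled by the host" concrete, here is a minimal sketch of pushing one buffer across the bus with the Linux dmaengine API; the helper name, channel, and addresses are illustrative and not taken from any of the drivers discussed here:

	#include <linux/dmaengine.h>

	/*
	 * Hypothetical helper: copy 'len' bytes from bus address 'src' to bus
	 * address 'dst' using a DMA channel owned by the host-side driver.
	 * Completion callbacks and error recovery are elided.
	 */
	static int card_dma_copy(struct dma_chan *chan,
				 dma_addr_t dst, dma_addr_t src, size_t len)
	{
		struct dma_device *dma = chan->device;
		struct dma_async_tx_descriptor *tx;
		dma_cookie_t cookie;

		tx = dma->device_prep_dma_memcpy(chan, dst, src, len,
						 DMA_PREP_INTERRUPT);
		if (!tx)
			return -ENOMEM;

		cookie = dmaengine_submit(tx);
		if (dma_submit_error(cookie))
			return -EIO;

		/* The DMA engine, not the CPU, moves the data across the bus. */
		dma_async_issue_pending(chan);
		return 0;
	}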

Your main computer (the computer the PCIe card plugs into) will be a
virtio guest and will run the virtio-net/virtio-console/etc. drivers.

Several vendors have contacted me privately to ask for the code for my
(dumb) network-over-PCI driver. A generic solution to this problem will
definitely find a userbase.

I look forward to porting the code to run on my PowerPC PCI boards when
it becomes available. I am able to help review code as well.

Good luck!
Ira


Re: [RFC]vhost/vhost-net backend for PCI cards

2013-02-27 Thread Ira W. Snyder
On Wed, Feb 27, 2013 at 04:58:20AM -0800, Nikhil Rao wrote:
 On Wed, 2013-02-27 at 11:17 -0800, Ira W. Snyder wrote:
 
  Hi Nikhil,
  
  I don't have any code to offer, but may be able to provide some
  suggestions. I work on a system which has a single (x86) host computer,
  and many PowerPC data processing boards, which are connected via PCI.
  This sounds similar to your hardware.
  
  Our system was developed before vhost existed. I built a (fairly dumb)
  network driver that just transfers packets over PCI using the PowerPC
  DMA controller. It works, however I think a more generic virtio solution
  will work better. A virtio solution will also allow other types of
  devices besides a network interface.
  
  I have done some studying of rproc/rpmsg/vhost/vringh, and may have some
  suggestions about those pieces of kernel functionality.
  
  A HW DMA engine is absolutely needed to get good performance over the
  PCI bus. I don't have experience with PCIe.
  
  You may want to investigate rproc/rpmsg to help do virtio device
  discovery.
 
  When dealing with virtio, it may be helpful to think of your PCIe card
  as the host. 
 
 We wanted to support a host-based disk; using virtio-blk on the card
 seemed to be a good way to do this, given that the card runs Linux.
 
 Also, from a performance perspective, which would be better: [virtio-net
 on the card / backend on the host CPU] vs. [virtio-net on the
 host CPU / backend on the card], given that the host CPU is more powerful
 than the card CPU?
 

I never considered using virtio-blk, so I don't have any input about it.

I don't know much about virtio performance either. The experts on this
list will have to send their input.

  In virtio nomenclature, the host is in charge of copying data. 
  Your HW DMA engine needs to be controlled by the host.
 
 In our case, the host driver controls the HW DMA engine (a subset of the
 card DMA engine channels are under host control). But you may be
 referring to the case where the host doesn't have access to the card's
 DMA engine.
 

That's right. On our PowerPC cards, the DMA hardware can be controlled
by either the PowerPC processor, or the PCI host system, but not both.
We need the DMA for various operations on the PowerPC card itself, so
the PCI host system cannot be used to control the DMA hardware.

This is also true for other vendors who contacted me privately.

Your PCIe card seems to have better features than any similar systems
I've worked with.

I skimmed the code being used with rpmsg/rproc/virtio on the ARM DSPs on
the OMAP platform. They use the DSP as the virtio host (it copies
memory, performing the same function as vhost) and the OMAP as the
virtio guest (it runs virtio-net/etc.).

Among all virtio guest drivers (virtio-net/virtio-console/etc.), I
think virtio-blk behaves differently from the rest. All of the others
work better with DMA hardware when the PCI card is the virtio host.

You might want to contact Ohad Ben-Cohen for his advice. He did a lot of
work on drivers/rpmsg and drivers/remoteproc. He works with the OMAP
hardware.

Ira

  
  Your main computer (the computer the PCIe card plugs into) will be a
  virtio guest and will run the virtio-net/virtio-console/etc. drivers.
  
  Several vendors have contacted me privately to ask for the code for my
  (dumb) network-over-PCI driver. A generic solution to this problem will
  definitely find a userbase.
  
  I look forward to porting the code to run on my PowerPC PCI boards when
  it becomes available. I am able to help review code as well.
  
  Good luck!
  Ira
 
 


Re: [PATCH 00/02] virtio: Virtio platform driver

2011-03-16 Thread Ira W. Snyder
On Wed, Mar 16, 2011 at 02:17:15PM +0900, Magnus Damm wrote:
 Hi Rusty,
 
 On Wed, Mar 16, 2011 at 12:46 PM, Rusty Russell ru...@rustcorp.com.au wrote:
  On Thu, 10 Mar 2011 16:05:41 +0900, Magnus Damm magnus.d...@gmail.com wrote:
  virtio: Virtio platform driver
 
  [PATCH 01/02] virtio: Break out lguest virtio code to virtio_lguest.c
  [PATCH 02/02] virtio: Add virtio platform driver
 
  I have no problem with these patches, but it's just churn until we see
  your actual drivers.
 
 Well, actually this platform driver is used together with already
 existing drivers, so there are no new virtio drivers to wait for.
 
 The drivers that have been tested are so far:
 
 CONFIG_VIRTIO_CONSOLE=y
 CONFIG_VIRTIO_NET=y
 
 At this point there are four different pieces of code working together:
 
 1) Virtio platform driver patches (for guest)
 2) SH4AL-DSP guest kernel patch
 3) ARM UIO driver patches (for host)
 4) User space backing code for ARM based on lguest.c
 
 These patches in this mail thread are 1), and I decided to brush up
 that portion and submit upstream because it's the part that is easiest
 to break out. I intend to post the rest bit by bit over time, but if
 someone is interested then I can post everything at once too.
 

I'm very interested in the full series of patches. I want to do
something similar to talk between two Linux kernels (x86 and PowerPC)
connected by a PCI bus.

Thanks,
Ira

  The S/390 devs might be interested, as their bus is very similar too...
 
 The lguest device code is very similar as well, perhaps it's worth
 refactoring that a bit to build on top of the platform driver. Not
 sure if you see that as a move in the right direction though.
 
 Thanks for your feedback!
 
 / magnus


Re: virtio over PCI

2010-03-03 Thread Ira W. Snyder
On Wed, Mar 03, 2010 at 05:09:48PM +1100, Michael Ellerman wrote:
 Hi guys,
 
 I was looking around at virtio over PCI stuff and noticed you had
 started some work on a driver. The last I can find via google is v2 from
 mid last year; is that as far as it got?
 
 http://lkml.org/lkml/2009/2/23/353
 

Yep, that is pretty much as far as I got. It was more-or-less rejected
because I hooked two instances of virtio-net together, rather than
having a proper backend and using virtio-net as the frontend.

I got started on writing a backend, which was never posted to LKML
because I never finished it. Feel free to take the code and use it to
start your own project. Note that vhost-net exists now, and is an
in-kernel backend for virtio-net. It *may* be possible to use this,
rather than writing a userspace backend as I started to do.
http://www.mmarray.org/~iws/virtio-phys/

I also got started with the alacrityvm project, developing a driver for
their virtualization framework. That project is nowhere near finished.
The virtualization folks basically told GHaskins (alacrityvm author)
that alacrityvm wouldn't ever make it to mainline Linux.
http://www.mmarray.org/~iws/vbus/

Unfortunately, I've been pulled onto other projects for the time being.
However, I'd really like to be able to use a virtio-over-PCI style
driver, rather than relying on my own custom (slow, unoptimized) network
driver (PCINet).

If you get something mostly working (and mostly agreed upon by the
virtualization guys), I will make the time to test it and get it cleaned
up. I've had 10+ people email me privately about this kind of driver
now. It is an area where Linux is sorely lacking.

I'm happy to provide any help I can, including testing on an
MPC8349EA-based system. I would suggest talking to the virtualization
mailing list before you get too deep in the project. They sometimes have
good advice. I've added them to the CC list, so maybe they can comment.
https://lists.linux-foundation.org/mailman/listinfo/virtualization

Good luck, and let me know if I can help.
Ira


Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-25 Thread Ira W. Snyder
On Thu, Aug 27, 2009 at 07:07:50PM +0300, Michael S. Tsirkin wrote:
 What it is: vhost net is a character device that can be used to reduce
 the number of system calls involved in virtio networking.
 Existing virtio net code is used in the guest without modification.
 
 There's similarity with vringfd, with some differences and reduced scope
 - uses eventfd for signalling
 - structures can be moved around in memory at any time (good for migration)
 - support memory table and not just an offset (needed for kvm)
 
 common virtio related code has been put in a separate file vhost.c and
 can be made into a separate module if/when more backends appear.  I used
 Rusty's lguest.c as the source for developing this part : this supplied
 me with witty comments I wouldn't be able to write myself.
 
 What it is not: vhost net is not a bus, and not a generic new system
 call. No assumptions are made on how guest performs hypercalls.
 Userspace hypervisors are supported as well as kvm.
 
 How it works: Basically, we connect virtio frontend (configured by
 userspace) to a backend. The backend could be a network device, or a
 tun-like device. In this version I only support raw socket as a backend,
 which can be bound to e.g. SR IOV, or to macvlan device.  Backend is
 also configured by userspace, including vlan/mac etc.
 
 Status:
 This works for me, and I haven't seen any crashes.
 I have done some light benchmarking (with v4), compared to userspace, I
 see improved latency (as I save up to 4 system calls per packet) but not
 bandwidth/CPU (as TSO and interrupt mitigation are not supported).  For
 ping benchmark (where there's no TSO) throughput is also improved.
 
 Features that I plan to look at in the future:
 - tap support
 - TSO
 - interrupt mitigation
 - zero copy
 
 Acked-by: Arnd Bergmann a...@arndb.de
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 
 ---
  MAINTAINERS|   10 +
  arch/x86/kvm/Kconfig   |1 +
  drivers/Makefile   |1 +
  drivers/vhost/Kconfig  |   11 +
  drivers/vhost/Makefile |2 +
  drivers/vhost/net.c|  475 ++
  drivers/vhost/vhost.c  |  688 
  drivers/vhost/vhost.h  |  122 
  include/linux/Kbuild   |1 +
  include/linux/miscdevice.h |1 +
  include/linux/vhost.h  |  101 +++
  11 files changed, 1413 insertions(+), 0 deletions(-)
  create mode 100644 drivers/vhost/Kconfig
  create mode 100644 drivers/vhost/Makefile
  create mode 100644 drivers/vhost/net.c
  create mode 100644 drivers/vhost/vhost.c
  create mode 100644 drivers/vhost/vhost.h
  create mode 100644 include/linux/vhost.h
 
 diff --git a/MAINTAINERS b/MAINTAINERS
 index b1114cf..de4587f 100644
 --- a/MAINTAINERS
 +++ b/MAINTAINERS
 @@ -5431,6 +5431,16 @@ S: Maintained
  F:   Documentation/filesystems/vfat.txt
  F:   fs/fat/
  
 +VIRTIO HOST (VHOST)
 +P:   Michael S. Tsirkin
 +M:   m...@redhat.com
 +L:   k...@vger.kernel.org
 +L:   virtualizat...@lists.osdl.org
 +L:   net...@vger.kernel.org
 +S:   Maintained
 +F:   drivers/vhost/
 +F:   include/linux/vhost.h
 +
  VIA RHINE NETWORK DRIVER
  M:   Roger Luethi r...@hellgate.ch
  S:   Maintained
 diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
 index b84e571..94f44d9 100644
 --- a/arch/x86/kvm/Kconfig
 +++ b/arch/x86/kvm/Kconfig
 @@ -64,6 +64,7 @@ config KVM_AMD
  
  # OK, it's a little counter-intuitive to do this, but it puts it neatly under
  # the virtualization menu.
 +source drivers/vhost/Kconfig
  source drivers/lguest/Kconfig
  source drivers/virtio/Kconfig
  
 diff --git a/drivers/Makefile b/drivers/Makefile
 index bc4205d..1551ae1 100644
 --- a/drivers/Makefile
 +++ b/drivers/Makefile
 @@ -105,6 +105,7 @@ obj-$(CONFIG_HID) += hid/
  obj-$(CONFIG_PPC_PS3)+= ps3/
  obj-$(CONFIG_OF) += of/
  obj-$(CONFIG_SSB)+= ssb/
 +obj-$(CONFIG_VHOST_NET)  += vhost/
  obj-$(CONFIG_VIRTIO) += virtio/
  obj-$(CONFIG_VLYNQ)  += vlynq/
  obj-$(CONFIG_STAGING)+= staging/
 diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
 new file mode 100644
 index 000..d955406
 --- /dev/null
 +++ b/drivers/vhost/Kconfig
 @@ -0,0 +1,11 @@
 +config VHOST_NET
 + tristate "Host kernel accelerator for virtio net"
 + depends on NET && EVENTFD
 + ---help---
 +   This kernel module can be loaded in host kernel to accelerate
 +   guest networking with virtio_net. Not to be confused with virtio_net
 +   module itself which needs to be loaded in guest kernel.
 +
 +   To compile this driver as a module, choose M here: the module will
 +   be called vhost_net.
 +
 diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
 new file mode 100644
 index 000..72dd020
 --- /dev/null
 +++ b/drivers/vhost/Makefile
 @@ -0,0 +1,2 @@
 +obj-$(CONFIG_VHOST_NET) += vhost_net.o
 +vhost_net-y := vhost.o net.o
 diff --git 

Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-24 Thread Ira W. Snyder
On Thu, Sep 24, 2009 at 10:18:28AM +0300, Avi Kivity wrote:
 On 09/24/2009 12:15 AM, Gregory Haskins wrote:
 
  There are various aspects about designing high-performance virtual
  devices such as providing the shortest paths possible between the
  physical resources and the consumers.  Conversely, we also need to
  ensure that we meet proper isolation/protection guarantees at the same
  time.  What this means is there are various aspects to any
  high-performance PV design that require to be placed in-kernel to
  maximize the performance yet properly isolate the guest.
 
  For instance, you are required to have your signal-path (interrupts and
  hypercalls), your memory-path (gpa translation), and
  addressing/isolation model in-kernel to maximize performance.
 
 
  Exactly.  That's what vhost puts into the kernel and nothing more.
   
  Actually, no.  Generally, _KVM_ puts those things into the kernel, and
  vhost consumes them.  Without KVM (or something equivalent), vhost is
  incomplete.  One of my goals with vbus is to generalize the something
  equivalent part here.
 
 
 I don't really see how vhost and vbus are different here.  vhost expects 
 signalling to happen through a couple of eventfds and requires someone 
 to supply them and implement kernel support (if needed).  vbus requires 
 someone to write a connector to provide the signalling implementation.  
 Neither will work out-of-the-box when implementing virtio-net over 
 falling dominos, for example.
 
  Vbus accomplishes its in-kernel isolation model by providing a
  container concept, where objects are placed into this container by
  userspace.  The host kernel enforces isolation/protection by using a
  namespace to identify objects that is only relevant within a specific
  container's context (namely, a u32 dev-id).  The guest addresses the
  objects by its dev-id, and the kernel ensures that the guest can't
  access objects outside of its dev-id namespace.
 
 
  vhost manages to accomplish this without any kernel support.
   
  No, vhost manages to accomplish this because of KVMs kernel support
  (ioeventfd, etc).   Without a KVM-like in-kernel support, vhost is a
  merely a kind of tuntap-like clone signalled by eventfds.
 
 
 Without a vbus-connector-falling-dominos, vbus-venet can't do anything 
 either.  Both vhost and vbus need an interface, vhost's is just narrower 
 since it doesn't do configuration or enumeration.
 
  This goes directly to my rebuttal of your claim that vbus places too
  much in the kernel.  I state that, one way or the other, address decode
  and isolation _must_ be in the kernel for performance.  Vbus does this
  with a devid/container scheme.  vhost+virtio-pci+kvm does it with
  pci+pio+ioeventfd.
 
 
 vbus doesn't do kvm guest address decoding for the fast path.  It's 
 still done by ioeventfd.
 
The guest
  simply has no access to any vhost resources other than the guest->host
  doorbell, which is handed to the guest outside vhost (so it's somebody
  else's problem, in userspace).
   
  You mean _controlled_ by userspace, right?  Obviously, the other side of
  the kernel still needs to be programmed (ioeventfd, etc).  Otherwise,
  vhost would be pointless: e.g. just use vanilla tuntap if you don't need
  fast in-kernel decoding.
 
 
 Yes (though for something like level-triggered interrupts we're probably 
 keeping it in userspace, enjoying the benefits of vhost data path while 
 paying more for signalling).
 
  All that is required is a way to transport a message with a devid
  attribute as an address (such as DEVCALL(devid)) and the framework
  provides the rest of the decode+execute function.
 
 
  vhost avoids that.
   
  No, it doesn't avoid it.  It just doesn't specify how its done, and
  relies on something else to do it on its behalf.
 
 
 That someone else can be in userspace, apart from the actual fast path.
 
  Conversely, vbus specifies how its done, but not how to transport the
  verb across the wire.  That is the role of the vbus-connector abstraction.
 
 
 So again, vbus does everything in the kernel (since it's so easy and 
 cheap) but expects a vbus-connector.  vhost does configuration in 
 userspace (since it's so clunky and fragile) but expects a couple of 
 eventfds.
 
  Contrast this to vhost+virtio-pci (called simply vhost from here).
 
 
  It's the wrong name.  vhost implements only the data path.
   
  Understood, but vhost+virtio-pci is what I am contrasting, and I use
  vhost for short from that point on because I am too lazy to type the
  whole name over and over ;)
 
 
 If you #define A A+B+C don't expect intelligent conversation afterwards.
 
  It is not immune to requiring in-kernel addressing support either, but
  rather it just does it differently (and its not as you might expect via
  qemu).
 
  Vhost relies on QEMU to render PCI objects to the guest, which the guest
  assigns resources (such 

Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-22 Thread Ira W. Snyder
On Tue, Sep 22, 2009 at 12:43:36PM +0300, Avi Kivity wrote:
 On 09/22/2009 12:43 AM, Ira W. Snyder wrote:
 
  Sure, virtio-ira and he is on his own to make a bus-model under that, or
  virtio-vbus + vbus-ira-connector to use the vbus framework.  Either
  model can work, I agree.
 
   
  Yes, I'm having to create my own bus model, a-la lguest, virtio-pci, and
  virtio-s390. It isn't especially easy. I can steal lots of code from the
  lguest bus model, but sometimes it is good to generalize, especially
 after the fourth implementation or so. I think this is what GHaskins tried
  to do.
 
 
 Yes.  vbus is more finely layered so there is less code duplication.
 
 The virtio layering was more or less dictated by Xen which doesn't have 
 shared memory (it uses grant references instead).  As a matter of fact 
 lguest, kvm/pci, and kvm/s390 all have shared memory, as you do, so that 
 part is duplicated.  It's probably possible to add a virtio-shmem.ko 
 library that people who do have shared memory can reuse.
 

Seems like a nice benefit of vbus.

  I've given it some thought, and I think that running vhost-net (or
  similar) on the ppc boards, with virtio-net on the x86 crate server will
  work. The virtio-ring abstraction is almost good enough to work for this
  situation, but I had to re-invent it to work with my boards.
 
  I've exposed a 16K region of memory as PCI BAR1 from my ppc board.
  Remember that this is the host system. I used each 4K block as a
  device descriptor which contains:
 
  1) the type of device, config space, etc. for virtio
  2) the desc table (virtio memory descriptors, see virtio-ring)
  3) the avail table (available entries in the desc table)
 
 
 Won't access from x86 be slow to this memory? (On the other hand, if you
 change it to main memory, access from ppc will be slow... really depends
 on how your system is tuned.)
 

Writes across the bus are fast, reads across the bus are slow. These are
just the descriptor tables for memory buffers, not the physical memory
buffers themselves.

These only need to be written by the guest (x86), and read by the host
(ppc). The host never changes the tables, so we can cache a copy in the
guest, for a fast detach_buf() implementation (see virtio-ring, which
I'm copying the design from).

The only accesses are writes across the PCI bus. There is never a need
to do a read (except for slow-path configuration).
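
To make the layout concrete, here is a rough sketch of one 4K descriptor block as a C struct; the names, sizes, and offsets are illustrative guesses based on the description above, not code from the posted driver:

	#define VOP_MAX_VQS	3	/* three virtqueues per device, as above */

	struct vop_vq_desc {
		__le32 num;		/* number of ring entries */
		__le32 desc_offset;	/* offset of the vring_desc table in this block */
		__le32 avail_offset;	/* offset of the avail ring in this block */
	};

	/* One 4K block inside the 16K PCI BAR1 window. */
	struct vop_device_desc {
		__le32 type;		/* virtio device ID (net, console, ...) */
		__le32 features;	/* feature bits offered by the host */
		__le32 status;		/* written by the guest during probe */
		u8     config[128];	/* virtio config space */
		struct vop_vq_desc vq[VOP_MAX_VQS];
		/* the desc and avail tables occupy the rest of the 4K block */
	};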

  Parts 2 and 3 are repeated three times, to allow for a maximum of three
  virtqueues per device. This is good enough for all current drivers.
 
 
 The plan is to switch to multiqueue soon.  Will not affect you if your 
 boards are uniprocessor or small smp.
 

Everything I have is UP. I don't need extreme performance, either.
40MB/sec is the minimum I need to reach, though I'd like to have some
headroom.

For reference, using the CPU to handle data transfers, I get ~2MB/sec
transfers. Using the DMA engine, I've hit about 60MB/sec with my
crossed-wires virtio-net.

  I've gotten plenty of email about this from lots of interested
  developers. There are people who would like this kind of system to just
  work, while having to write just some glue for their device, just like a
network driver. My hunch is that most people have created some proprietary mess
  that basically works, and left it at that.
 
 
 So long as you keep the system-dependent features hookable or 
 configurable, it should work.
 
  So, here is a desperate cry for help. I'd like to make this work, and
  I'd really like to see it in mainline. I'm trying to give back to the
  community from which I've taken plenty.
 
 
 Not sure who you're crying for help to.  Once you get this working, post 
 patches.  If the patches are reasonably clean and don't impact 
 performance for the main use case, and if you can show the need, I 
 expect they'll be merged.
 

In the spirit of post early and often, I'm making my code available,
that's all. I'm asking anyone interested for some review, before I have
to re-code this for about the fifth time now. I'm trying to avoid
Haskins' situation, where he's invented and debugged a lot of new code,
and then been told to do it completely differently.

Yes, the code I posted is only compile-tested, because quite a lot of
code (kernel and userspace) must be working before anything works at
all. I hate to design the whole thing, then be told that something
fundamental about it is wrong, and have to completely re-write it.

Thanks for the comments,
Ira

 -- 
 error compiling committee.c: too many arguments to function
 


Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-21 Thread Ira W. Snyder
On Wed, Sep 16, 2009 at 11:11:57PM -0400, Gregory Haskins wrote:
 Avi Kivity wrote:
  On 09/16/2009 10:22 PM, Gregory Haskins wrote:
  Avi Kivity wrote:

  On 09/16/2009 05:10 PM, Gregory Haskins wrote:
  
  If kvm can do it, others can.
 
   
  The problem is that you seem to either hand-wave over details like
  this,
  or you give details that are pretty much exactly what vbus does
  already.
 My point is that I've already sat down and thought about these
  issues
  and solved them in a freely available GPL'ed software package.
 
 
  In the kernel.  IMO that's the wrong place for it.
   
  3) in-kernel: You can do something like virtio-net to vhost to
  potentially meet some of the requirements, but not all.
 
  In order to fully meet (3), you would need to do some of that stuff you
  mentioned in the last reply with muxing device-nr/reg-nr.  In addition,
  we need to have a facility for mapping eventfds and establishing a
  signaling mechanism (like PIO+qid), etc. KVM does this with
  IRQFD/IOEVENTFD, but we dont have KVM in this case so it needs to be
  invented.
 
  
  irqfd/eventfd is the abstraction layer, it doesn't need to be reabstracted.
 
 Not per se, but it needs to be interfaced.  How do I register that
 eventfd with the fastpath in Ira's rig? How do I signal the eventfd
 (x86->ppc, and ppc->x86)?
 

Sorry to reply so late to this thread, I've been on vacation for the
past week. If you'd like to continue in another thread, please start it
and CC me.

On the PPC, I've got a hardware doorbell register which generates 30
distinguishable interrupts over the PCI bus. I have outbound and inbound
registers, which can be used to signal the other side.

I assume it isn't too much code to signal an eventfd in an interrupt
handler. I haven't gotten to this point in the code yet.
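
For what it's worth, it really is only a few lines; a minimal sketch of an interrupt handler that maps doorbell bits to eventfds (the register layout and the board_priv structure are made up for illustration):

	#include <linux/interrupt.h>
	#include <linux/eventfd.h>
	#include <linux/io.h>

	/* Hypothetical per-board state: one eventfd context per doorbell bit. */
	struct board_priv {
		void __iomem		*doorbell;	/* inbound doorbell register */
		struct eventfd_ctx	*evt[30];
	};

	static irqreturn_t board_doorbell_irq(int irq, void *data)
	{
		struct board_priv *priv = data;
		u32 bits = ioread32(priv->doorbell);
		int i;

		if (!bits)
			return IRQ_NONE;

		/* Ack the doorbells we saw, then kick the matching eventfds. */
		iowrite32(bits, priv->doorbell);
		for (i = 0; i < 30; i++)
			if ((bits & (1 << i)) && priv->evt[i])
				eventfd_signal(priv->evt[i], 1);

		return IRQ_HANDLED;
	}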

 To take it to the next level, how do I organize that mechanism so that
 it works for more than one IO-stream (e.g. address the various queues
 within ethernet or a different device like the console)?  KVM has
 IOEVENTFD and IRQFD managed with MSI and PIO.  This new rig does not
 have the luxury of an established IO paradigm.
 
 Is vbus the only way to implement a solution?  No.  But it is _a_ way,
 and its one that was specifically designed to solve this very problem
 (as well as others).
 
 (As an aside, note that you generally will want an abstraction on top of
 irqfd/eventfd like shm-signal or virtqueues to do shared-memory based
 event mitigation, but I digress.  That is a separate topic).
 
  
  To meet performance, this stuff has to be in kernel and there has to be
  a way to manage it.
  
  and management belongs in userspace.
 
 vbus does not dictate where the management must be.  It's an extensible
 framework, governed by what you plug into it (ala connectors and devices).
 
 For instance, the vbus-kvm connector in alacrityvm chooses to put DEVADD
 and DEVDROP hotswap events into the interrupt stream, because they are
 simple and we already needed the interrupt stream anyway for fast-path.
 
 As another example: venet chose to put ->call(MACQUERY) config-space
 into its call namespace because it's simple, and we already need
 ->calls() for fastpath.  It therefore exports an attribute to sysfs that
 allows the management app to set it.
 
 I could likewise have designed the connector or device-model differently
 as to keep the mac-address and hotswap-events somewhere else (QEMU/PCI
 userspace) but this seems silly to me when they are so trivial, so I didn't.
 
  
  Since vbus was designed to do exactly that, this is
  what I would advocate.  You could also reinvent these concepts and put
  your own mux and mapping code in place, in addition to all the other
  stuff that vbus does.  But I am not clear why anyone would want to.
 
  
  Maybe they like their backward compatibility and Windows support.
 
 This is really not relevant to this thread, since we are talking about
 Ira's hardware.  But if you must bring this up, then I will reiterate
 that you just design the connector to interface with QEMU+PCI and you
 have that too if that was important to you.
 
 But on that topic: Since you could consider KVM a motherboard
 manufacturer of sorts (it just happens to be virtual hardware), I don't
 know why KVM seems to consider itself the only motherboard manufacturer
 in the world that has to make everything look legacy.  If a company like
 ASUS wants to add some cutting edge IO controller/bus, they simply do
 it.  Pretty much every product release may contain a different array of
 devices, many of which are not backwards compatible with any prior
 silicon.  The guy/gal installing Windows on that system may see a ? in
 device-manager until they load a driver that supports the new chip, and
 subsequently it works.  It is certainly not a requirement to make said
 chip somehow work with existing drivers/facilities on bare metal, per
 se.  Why should virtual systems be different?
 
 So, yeah, the 

Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-03 Thread Ira W. Snyder
On Thu, Aug 27, 2009 at 07:07:50PM +0300, Michael S. Tsirkin wrote:
 What it is: vhost net is a character device that can be used to reduce
 the number of system calls involved in virtio networking.
 Existing virtio net code is used in the guest without modification.
 
 There's similarity with vringfd, with some differences and reduced scope
 - uses eventfd for signalling
 - structures can be moved around in memory at any time (good for migration)
 - support memory table and not just an offset (needed for kvm)
 
 common virtio related code has been put in a separate file vhost.c and
 can be made into a separate module if/when more backends appear.  I used
 Rusty's lguest.c as the source for developing this part : this supplied
 me with witty comments I wouldn't be able to write myself.
 
 What it is not: vhost net is not a bus, and not a generic new system
 call. No assumptions are made on how guest performs hypercalls.
 Userspace hypervisors are supported as well as kvm.
 
 How it works: Basically, we connect virtio frontend (configured by
 userspace) to a backend. The backend could be a network device, or a
 tun-like device. In this version I only support raw socket as a backend,
 which can be bound to e.g. SR IOV, or to macvlan device.  Backend is
 also configured by userspace, including vlan/mac etc.
 
 Status:
 This works for me, and I haven't seen any crashes.
 I have done some light benchmarking (with v4), compared to userspace, I
 see improved latency (as I save up to 4 system calls per packet) but not
 bandwidth/CPU (as TSO and interrupt mitigation are not supported).  For
 ping benchmark (where there's no TSO) throughput is also improved.
 
 Features that I plan to look at in the future:
 - tap support
 - TSO
 - interrupt mitigation
 - zero copy
 

Hello Michael,

I've started looking at vhost with the intention of using it over PCI to
connect physical machines together.

The part that I am struggling with the most is figuring out which parts
of the rings are in the host's memory, and which parts are in the
guest's memory.

If I understand everything correctly, the rings are all userspace
addresses, which means that they can be moved around in physical memory,
and get pushed out to swap. AFAIK, this is impossible to handle when
connecting two physical systems; you'd need the rings available in IO
memory (PCI memory), so you can ioreadXX() them instead. To the best of
my knowledge, I shouldn't be using copy_to_user() on an __iomem address.
Also, having them migrate around in memory would be a bad thing.
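
To make the contrast concrete: with the rings exposed through a PCI BAR instead of userspace memory, the driver on the other side of the bus would touch them roughly like this (a sketch under those assumptions, not vhost code; the pci_vq structure is invented):

	#include <linux/io.h>
	#include <linux/virtio_ring.h>

	/* Hypothetical: the ring lives in a BAR mapped with pci_iomap(). */
	struct pci_vq {
		struct vring_avail __iomem *avail;
		u16 last_avail_idx;
		unsigned int num;
	};

	/* Fetch the next available descriptor head, reading it over the PCI bus. */
	static int pci_vq_next_avail(struct pci_vq *vq)
	{
		u16 avail_idx = ioread16(&vq->avail->idx);

		if (vq->last_avail_idx == avail_idx)
			return -EAGAIN;		/* nothing new from the other side */

		return ioread16(&vq->avail->ring[vq->last_avail_idx++ % vq->num]);
	}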

Also, I'm having trouble figuring out how the packet contents are
actually copied from one system to the other. Could you point this out
for me?

Is there somewhere I can find the userspace code (kvm, qemu, lguest,
etc.) needed for interacting with the vhost misc device so I can
get a better idea of how userspace is supposed to work? (Features
negotiation, etc.)

Thanks,
Ira
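
For reference, the userspace side as it eventually went into mainline boils down to a short ioctl sequence; a rough, error-handling-free sketch, using ioctl names from the merged driver (they may differ slightly in this patch revision, and the memory-table and ring-address setup is elided):

	#include <fcntl.h>
	#include <sys/ioctl.h>
	#include <sys/eventfd.h>
	#include <linux/vhost.h>

	/* Bind an already-created raw packet socket as the vhost-net backend. */
	int vhost_net_setup(int raw_sock_fd)
	{
		int vhost = open("/dev/vhost-net", O_RDWR);
		struct vhost_vring_file kick    = { .index = 0, .fd = eventfd(0, 0) };
		struct vhost_vring_file call    = { .index = 0, .fd = eventfd(0, 0) };
		struct vhost_vring_file backend = { .index = 0, .fd = raw_sock_fd };

		ioctl(vhost, VHOST_SET_OWNER, NULL);		/* bind device to this process */
		/* ... VHOST_SET_MEM_TABLE, VHOST_SET_VRING_NUM/ADDR/BASE go here ... */
		ioctl(vhost, VHOST_SET_VRING_KICK, &kick);	/* guest->host notification */
		ioctl(vhost, VHOST_SET_VRING_CALL, &call);	/* host->guest interrupt */
		ioctl(vhost, VHOST_NET_SET_BACKEND, &backend);
		return vhost;
	}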

 Acked-by: Arnd Bergmann a...@arndb.de
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 
 ---
  MAINTAINERS|   10 +
  arch/x86/kvm/Kconfig   |1 +
  drivers/Makefile   |1 +
  drivers/vhost/Kconfig  |   11 +
  drivers/vhost/Makefile |2 +
  drivers/vhost/net.c|  475 ++
  drivers/vhost/vhost.c  |  688 
  drivers/vhost/vhost.h  |  122 
  include/linux/Kbuild   |1 +
  include/linux/miscdevice.h |1 +
  include/linux/vhost.h  |  101 +++
  11 files changed, 1413 insertions(+), 0 deletions(-)
  create mode 100644 drivers/vhost/Kconfig
  create mode 100644 drivers/vhost/Makefile
  create mode 100644 drivers/vhost/net.c
  create mode 100644 drivers/vhost/vhost.c
  create mode 100644 drivers/vhost/vhost.h
  create mode 100644 include/linux/vhost.h
 
 diff --git a/MAINTAINERS b/MAINTAINERS
 index b1114cf..de4587f 100644
 --- a/MAINTAINERS
 +++ b/MAINTAINERS
 @@ -5431,6 +5431,16 @@ S: Maintained
  F:   Documentation/filesystems/vfat.txt
  F:   fs/fat/
  
 +VIRTIO HOST (VHOST)
 +P:   Michael S. Tsirkin
 +M:   m...@redhat.com
 +L:   k...@vger.kernel.org
 +L:   virtualizat...@lists.osdl.org
 +L:   net...@vger.kernel.org
 +S:   Maintained
 +F:   drivers/vhost/
 +F:   include/linux/vhost.h
 +
  VIA RHINE NETWORK DRIVER
  M:   Roger Luethi r...@hellgate.ch
  S:   Maintained
 diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
 index b84e571..94f44d9 100644
 --- a/arch/x86/kvm/Kconfig
 +++ b/arch/x86/kvm/Kconfig
 @@ -64,6 +64,7 @@ config KVM_AMD
  
  # OK, it's a little counter-intuitive to do this, but it puts it neatly under
  # the virtualization menu.
 +source drivers/vhost/Kconfig
  source drivers/lguest/Kconfig
  source drivers/virtio/Kconfig
  
 diff --git a/drivers/Makefile b/drivers/Makefile
 index bc4205d..1551ae1 100644
 --- a/drivers/Makefile
 +++ b/drivers/Makefile
 @@ -105,6 +105,7 @@ obj-$(CONFIG_HID) += hid/
  obj-$(CONFIG_PPC_PS3)   

Re: [PATCH 2/2] vhost_net: a kernel-level virtio server

2009-08-13 Thread Ira W. Snyder
On Wed, Aug 12, 2009 at 08:31:04PM +0300, Michael S. Tsirkin wrote:
 On Wed, Aug 12, 2009 at 10:19:22AM -0700, Ira W. Snyder wrote:

[ snip out code ]

   
   We discussed this before, and I still think this could be directly derived
   from struct virtqueue, in the same way that vring_virtqueue is derived 
   from
   struct virtqueue. That would make it possible for simple device drivers
   to use the same driver in both host and guest, similar to how Ira Snyder
   used virtqueues to make virtio_net run between two hosts running the
   same code [1].
   
   Ideally, I guess you should be able to even make virtio_net work in the
   host if you do that, but that could bring other complexities.
  
  I have no comments about the vhost code itself, I haven't reviewed it.
  
  It might be interesting to try using a virtio-net in the host kernel to
  communicate with the virtio-net running in the guest kernel. The lack of
  a management interface is the biggest problem you will face (setting MAC
  addresses, negotiating features, etc. doesn't work intuitively).
 
 That was one of the reasons I decided to move most of code out to
 userspace. My kernel driver only handles datapath,
 it's much smaller than virtio net.
 
  Getting
  the network interfaces talking is relatively easy.
  
  Ira
 
 Tried this, but
 - guest memory isn't pinned, so copy_to_user
   to access it, errors need to be handled in a sane way
 - used/available roles are reversed
 - kick/interrupt roles are reversed
 
 So most of the code then looks like
 
   if (host) {
   } else {
   }
   return
 
 
 The only common part is walking the descriptor list,
 but that's like 10 lines of code.
 
 At which point it's better to keep host/guest code separate, IMO.
 

Ok, that makes sense. Let me see if I understand the concept of the
driver. Here's a picture of what makes sense to me:

guest system
---------------------------------
| userspace applications        |
---------------------------------
| kernel network stack          |
---------------------------------
| virtio-net                    |
---------------------------------
| transport (virtio-ring, etc.) |
---------------------------------
               |
               |
---------------------------------
| transport (virtio-ring, etc.) |
---------------------------------
| some driver (maybe vhost?)    | <-- [1]
---------------------------------
| kernel network stack          |
---------------------------------
host system

From the host's network stack, packets can be forwarded out to the
physical network, or be consumed by a normal userspace application on
the host. Just as if this were any other network interface.

In my patch, [1] was the virtio-net driver, completely unmodified.

So, does this patch accomplish the above diagram? If so, why the
copy_to_user(), etc? Maybe I'm confusing this with my system, where the
guest is another physical system, separated by the PCI bus.

Ira


Re: [PATCH 2/2] vhost_net: a kernel-level virtio server

2009-08-13 Thread Ira W. Snyder
On Wed, Aug 12, 2009 at 07:03:22PM +0200, Arnd Bergmann wrote:
 On Monday 10 August 2009, Michael S. Tsirkin wrote:
 
  +struct workqueue_struct *vhost_workqueue;
 
 [nitpicking] This could be static. 
 
  +/* The virtqueue structure describes a queue attached to a device. */
  +struct vhost_virtqueue {
  +   struct vhost_dev *dev;
  +
  +   /* The actual ring of buffers. */
  +   struct mutex mutex;
  +   unsigned int num;
  +   struct vring_desc __user *desc;
  +   struct vring_avail __user *avail;
  +   struct vring_used __user *used;
  +   struct file *kick;
  +   struct file *call;
  +   struct file *error;
  +   struct eventfd_ctx *call_ctx;
  +   struct eventfd_ctx *error_ctx;
  +
  +   struct vhost_poll poll;
  +
  +   /* The routine to call when the Guest pings us, or timeout. */
  +   work_func_t handle_kick;
  +
  +   /* Last available index we saw. */
  +   u16 last_avail_idx;
  +
  +   /* Last index we used. */
  +   u16 last_used_idx;
  +
  +   /* Outstanding buffers */
  +   unsigned int inflight;
  +
  +   /* Is this blocked? */
  +   bool blocked;
  +
  +   struct iovec iov[VHOST_NET_MAX_SG];
  +
  +} cacheline_aligned;
 
 We discussed this before, and I still think this could be directly derived
 from struct virtqueue, in the same way that vring_virtqueue is derived from
 struct virtqueue. That would make it possible for simple device drivers
 to use the same driver in both host and guest, similar to how Ira Snyder
 used virtqueues to make virtio_net run between two hosts running the
 same code [1].
 
 Ideally, I guess you should be able to even make virtio_net work in the
 host if you do that, but that could bring other complexities.
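
What "directly derived from struct virtqueue" would look like in practice, sketched in the same style as vring_virtqueue (the names here are illustrative, not taken from either patch set):

	#include <linux/kernel.h>
	#include <linux/virtio.h>

	struct vhost_virtqueue {
		struct virtqueue vq;	/* generic part, usable by common drivers */

		/* host-side state follows (userspace ring pointers, eventfds, ...) */
	};

	#define to_vhost_vq(_vq) container_of(_vq, struct vhost_virtqueue, vq)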

I have no comments about the vhost code itself, I haven't reviewed it.

It might be interesting to try using a virtio-net in the host kernel to
communicate with the virtio-net running in the guest kernel. The lack of
a management interface is the biggest problem you will face (setting MAC
addresses, negotiating features, etc. doesn't work intuitively). Getting
the network interfaces talking is relatively easy.

Ira