Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-10-03 Thread Avi Kivity
On 10/01/2009 09:24 PM, Gregory Haskins wrote:
>
>> Virtualization is about not doing that.  Sometimes it's necessary (when
>> you have made unfixable design mistakes), but just to replace a bus,
>> with no advantages to the guest that has to be changed (other
>> hypervisors or hypervisorless deployment scenarios aren't).
>>  
> The problem is that your continued assertion that there is no advantage
> to the guest is a completely unsubstantiated claim.  As it stands right
> now, I have a public git tree that, to my knowledge, is the fastest KVM
> PV networking implementation around.  It also has capabilities that are
> demonstrably not found elsewhere, such as the ability to render generic
> shared-memory interconnects (scheduling, timers), interrupt-priority
> (qos), and interrupt-coalescing (exit-ratio reduction).  I designed each
> of these capabilities after carefully analyzing where KVM was coming up
> short.
>
> Those are facts.
>
> I can't easily prove which of my new features alone is what makes it
> special, because I don't have unit tests for each part that break it
> down.  What I _can_ state is that it's the fastest and most feature-rich
> KVM-PV tree that I am aware of, and others may download and test it
> themselves to verify my claims.
>

If you wish to introduce a feature which has downsides (and to me, vbus 
has downsides) then you must prove it is necessary on its own merits.  
venet is pretty cool but I need proof before I believe its performance 
is due to vbus and not to venet-host.

> The disproof, on the other hand, would be in a counter example that
> still meets all the performance and feature criteria under all the same
> conditions while maintaining the existing ABI.  To my knowledge, this
> doesn't exist.
>

mst is working on it and we should have it soon.

> Therefore, if you believe my work is irrelevant, show me a git tree that
> accomplishes the same feats in a binary compatible way, and I'll rethink
> my position.  Until then, complaining about lack of binary compatibility
> is pointless since it is not an insurmountable proposition, and the one
> and only available solution declares it a required casualty.
>

Fine, let's defer it until vhost-net is up and running.

>> Well, Xen requires pre-translation (since the guest has to give the host
>> (which is just another guest) permissions to access the data).
>>  
> Actually I am not sure that it does require pre-translation.  You might
> be able to use the memctx->copy_to/copy_from scheme in post-translation
> as well, since those would be able to communicate with something like the
> xen kernel.  But I suppose either method would result in extra exits, so
> there is no distinct benefit to using vbus there; as you say below, "they're
> just different".
>
> The biggest difference is that my proposed model gets around the notion
> that the entire guest address space can be represented by an arbitrary
> pointer.  For instance, the copy_to/copy_from routines take a GPA, but
> may use something indirect like a DMA controller to access that GPA.  On
> the other hand, virtio fully expects a viable pointer to come out of the
> interface iiuc.  This is in part what makes vbus more adaptable to non-virt.
>

No, virtio doesn't expect a pointer (this is what makes Xen possible).  
vhost does; but nothing prevents an interested party from adapting it.

>>> An interesting thing here is that you don't even need a fancy
>>> multi-homed setup to see the effects of my exit-ratio reduction work:
>>> even single port configurations suffer from the phenomenon since many
>>> devices have multiple signal-flows (e.g. network adapters tend to have
>>> at least 3 flows: rx-ready, tx-complete, and control-events (link-state,
>>> etc.).  What's worse is that the flows are often indirectly related (for
>>> instance, many host adapters will free tx skbs during rx operations, so
>>> you tend to get bursts of tx-completes at the same time as rx-ready).  If
>>> the flows map 1:1 with IDT vectors, they will suffer the same problem.
>>>
>>>
>> You can simply use the same vector for both rx and tx and poll both at
>> every interrupt.
>>  
> Yes, but that has its own problems: e.g. additional exits or at least
> additional overhead figuring out what happens each time.

If you're just coalescing tx and rx, it's an additional memory read 
(which you have anyway in the vbus interrupt queue).
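
For illustration, a minimal sketch of that shared-vector approach (the
device struct and the ring helpers are placeholders, not taken from any
real driver):

#include <linux/interrupt.h>
#include <linux/netdevice.h>

/* Placeholder device; the field and helper names are illustrative only. */
struct shared_irq_dev {
	struct napi_struct napi;
	void *rxq, *txq;
};

static bool ring_has_work(void *ring);	/* e.g. compare avail/used indices */
static void reclaim_tx(void *txq);	/* free completed tx skbs */

/* One vector for both flows: each interrupt costs an extra memory read
 * to peek at the other ring, but avoids a second injection. */
static irqreturn_t shared_vector_isr(int irq, void *data)
{
	struct shared_irq_dev *dev = data;
	irqreturn_t ret = IRQ_NONE;

	if (ring_has_work(dev->rxq)) {
		napi_schedule(&dev->napi);	/* rx handled in NAPI poll */
		ret = IRQ_HANDLED;
	}
	if (ring_has_work(dev->txq)) {
		reclaim_tx(dev->txq);		/* tx-complete piggybacks here */
		ret = IRQ_HANDLED;
	}
	return ret;
}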

> This is even
> more important as we scale out to MQ which may have dozens of queue
> pairs.  You really want finer grained signal-path decode if you want
> peak performance.
>

MQ definitely wants per-queue or per-queue-pair vectors, and it 
definitely doesn't want all interrupts to be serviced by a single 
interrupt queue (you could/should make the queue per-vcpu).

>>> It's important to note here that we are actually looking at the interrupt
>>> rate, not the exit rate (which is usually a multiple of the interrupt
>>> rate, since you have to 

Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-10-01 Thread Gregory Haskins
Avi Kivity wrote:
> On 09/30/2009 10:04 PM, Gregory Haskins wrote:
> 
> 
>>> A 2.6.27 guest, or Windows guest with the existing virtio drivers,
>>> won't work
>>> over vbus.
>>>  
>> Binary compatibility with existing virtio drivers, while nice to have,
>> is not a specific requirement nor goal.  We will simply load an updated
>> KMP/MSI into those guests and they will work again.  As previously
>> discussed, this is how more or less any system works today.  It's like
>> we are removing an old adapter card and adding a new one to "uprev the
>> silicon".
>>
> 
> Virtualization is about not doing that.  Sometimes it's necessary (when
> you have made unfixable design mistakes), but just to replace a bus,
> with no advantages to the guest that has to be changed (other
> hypervisors or hypervisorless deployment scenarios aren't).

The problem is that your continued assertion that there is no advantage
to the guest is a completely unsubstantiated claim.  As it stands right
now, I have a public git tree that, to my knowledge, is the fastest KVM
PV networking implementation around.  It also has capabilities that are
demonstrably not found elsewhere, such as the ability to render generic
shared-memory interconnects (scheduling, timers), interrupt-priority
(qos), and interrupt-coalescing (exit-ratio reduction).  I designed each
of these capabilities after carefully analyzing where KVM was coming up
short.

Those are facts.

I can't easily prove which of my new features alone is what makes it
special, because I don't have unit tests for each part that break it
down.  What I _can_ state is that it's the fastest and most feature-rich
KVM-PV tree that I am aware of, and others may download and test it
themselves to verify my claims.

The disproof, on the other hand, would be in a counter example that
still meets all the performance and feature criteria under all the same
conditions while maintaining the existing ABI.  To my knowledge, this
doesn't exist.

Therefore, if you believe my work is irrelevant, show me a git tree that
accomplishes the same feats in a binary compatible way, and I'll rethink
my position.  Until then, complaining about lack of binary compatibility
is pointless since it is not an insurmountable proposition, and the one
and only available solution declares it a required casualty.

> 
>>>   Further, non-shmem virtio can't work over vbus.
>>>  
>> Actually I misspoke earlier when I said virtio works over non-shmem.
>> Thinking about it some more, both virtio and vbus fundamentally require
>> shared-memory, since sharing their metadata concurrently on both sides
>> is their raison d'être.
>>
>> The difference is that virtio utilizes a pre-translation/mapping (via
>> ->add_buf) from the guest side.  OTOH, vbus uses a post translation
>> scheme (via memctx) from the host-side.  If anything, vbus is actually
>> more flexible because it doesn't assume the entire guest address space
>> is directly mappable.
>>
>> In summary, your statement is incorrect (though it is my fault for
>> putting that idea in your head).
>>
> 
> Well, Xen requires pre-translation (since the guest has to give the host
> (which is just another guest) permissions to access the data).

Actually I am not sure that it does require pre-translation.  You might
be able to use the memctx->copy_to/copy_from scheme in post-translation
as well, since those would be able to communicate with something like the
xen kernel.  But I suppose either method would result in extra exits, so
there is no distinct benefit to using vbus there; as you say below, "they're
just different".

The biggest difference is that my proposed model gets around the notion
that the entire guest address space can be represented by an arbitrary
pointer.  For instance, the copy_to/copy_from routines take a GPA, but
may use something indirect like a DMA controller to access that GPA.  On
the other hand, virtio fully expects a viable pointer to come out of the
interface iiuc.  This is in part what makes vbus more adaptable to non-virt.
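
As a rough illustration of that distinction (this is not the actual vbus
memctx API, just a sketch of the shape a post-translation interface takes,
versus a direct-mapping model):

#include <linux/types.h>

/* Illustrative only -- not the actual vbus memctx API.  The point is that
 * the backend touches guest data only through copy_to()/copy_from() on a
 * guest-physical address, so the implementation behind those ops can be a
 * direct mapping, a DMA engine, a grant-table copy, etc.
 */
struct memctx_ops {
	int (*copy_from)(void *priv, void *dst, u64 gpa, size_t len);
	int (*copy_to)(void *priv, u64 gpa, const void *src, size_t len);
};

struct memctx {
	const struct memctx_ops *ops;
	void *priv;
};

/* A pre-translated/direct-mapped model, by contrast, hands the backend a
 * host-virtual pointer and lets it dereference freely:
 *
 *	void *hva = gpa_to_hva(gpa);
 *	memcpy(dst, hva, len);
 *
 * which only works if the whole guest address space is mappable.
 */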

> So neither is a superset of the other, they're just different.
> 
> It doesn't really matter since Xen is unlikely to adopt virtio.

Agreed.

> 
>> An interesting thing here is that you don't even need a fancy
>> multi-homed setup to see the effects of my exit-ratio reduction work:
>> even single port configurations suffer from the phenomenon since many
>> devices have multiple signal-flows (e.g. network adapters tend to have
>> at least 3 flows: rx-ready, tx-complete, and control-events (link-state,
>> etc.).  What's worse is that the flows are often indirectly related (for
>> instance, many host adapters will free tx skbs during rx operations, so
>> you tend to get bursts of tx-completes at the same time as rx-ready).  If
>> the flows map 1:1 with IDT vectors, they will suffer the same problem.
>>
> 
> You can simply use the same vector for both rx and tx and poll both at
> every interrupt.

Yes, but that has its own problems: e.

Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-10-01 Thread Michael S. Tsirkin
On Thu, Oct 01, 2009 at 10:34:17AM +0200, Avi Kivity wrote:
>> Second, I do not use ioeventfd anymore because it has too many problems
>> with the surrounding technology.  However, that is a topic for a
>> different thread.
>>
>
> Please post your issues.  I see ioeventfd/irqfd as critical kvm interfaces.

I second that.  AFAIK ioeventfd/irqfd got exposed to userspace in 2.6.32-rc1;
if there are issues, we'd better nail them before 2.6.32 is out.
And yes, please start a different thread.

-- 
MST


Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-10-01 Thread Avi Kivity
On 09/30/2009 10:04 PM, Gregory Haskins wrote:


>> A 2.6.27 guest, or Windows guest with the existing virtio drivers, won't work
>> over vbus.
>>  
> Binary compatibility with existing virtio drivers, while nice to have,
> is not a specific requirement nor goal.  We will simply load an updated
> KMP/MSI into those guests and they will work again.  As previously
> discussed, this is how more or less any system works today.  It's like
> we are removing an old adapter card and adding a new one to "uprev the
> silicon".
>

Virtualization is about not doing that.  Sometimes it's necessary (when 
you have made unfixable design mistakes), but just to replace a bus, 
with no advantages to the guest that has to be changed (other 
hypervisors or hypervisorless deployment scenarios aren't).

>>   Further, non-shmem virtio can't work over vbus.
>>  
> Actually I misspoke earlier when I said virtio works over non-shmem.
> Thinking about it some more, both virtio and vbus fundamentally require
> shared-memory, since sharing their metadata concurrently on both sides
> is their raison d'être.
>
> The difference is that virtio utilizes a pre-translation/mapping (via
> ->add_buf) from the guest side.  OTOH, vbus uses a post translation
> scheme (via memctx) from the host-side.  If anything, vbus is actually
> more flexible because it doesn't assume the entire guest address space
> is directly mappable.
>
> In summary, your statement is incorrect (though it is my fault for
> putting that idea in your head).
>

Well, Xen requires pre-translation (since the guest has to give the host 
(which is just another guest) permissions to access the data).  So 
neither is a superset of the other, they're just different.

It doesn't really matter since Xen is unlikely to adopt virtio.

> An interesting thing here is that you don't even need a fancy
> multi-homed setup to see the effects of my exit-ratio reduction work:
> even single port configurations suffer from the phenomenon since many
> devices have multiple signal-flows (e.g. network adapters tend to have
> at least 3 flows: rx-ready, tx-complete, and control-events (link-state,
> etc.).  What's worse is that the flows are often indirectly related (for
> instance, many host adapters will free tx skbs during rx operations, so
> you tend to get bursts of tx-completes at the same time as rx-ready).  If
> the flows map 1:1 with IDT vectors, they will suffer the same problem.
>

You can simply use the same vector for both rx and tx and poll both at 
every interrupt.

> In any case, here is an example run of a simple single-homed guest over
> standard GigE.  What's interesting here is the .qnotify to .notify
> ratio, as this is the interrupt-to-signal ratio.  In this case it's
> 151918/170047, which comes out to about an 11% savings in interrupt injections:
>
> vbus-guest:/home/ghaskins # netperf -H dev
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> dev.laurelwood.net (192.168.1.10) port 0 AF_INET
> Recv   Send    Send
> Socket Socket  Message  Elapsed
> Size   Size    Size     Time     Throughput
> bytes  bytes   bytes    secs.    10^6bits/sec
>
> 1048576  16384  16384    10.01     940.77
> vbus-guest:/home/ghaskins # cat /sys/kernel/debug/pci-to-vbus-bridge
>.events: 170048
>.qnotify   : 151918
>.qinject   : 0
>.notify: 170047
>.inject: 18238
>.bridgecalls   : 18
>.buscalls  : 12
> vbus-guest:/home/ghaskins # cat /proc/interrupts
>  CPU0
> 0: 87   IO-APIC-edge  timer
> 1:  6   IO-APIC-edge  i8042
> 4:733   IO-APIC-edge  serial
> 6:  2   IO-APIC-edge  floppy
> 7:  0   IO-APIC-edge  parport0
> 8:  0   IO-APIC-edge  rtc0
> 9:  0   IO-APIC-fasteoi   acpi
>10:  0   IO-APIC-fasteoi   virtio1
>12: 90   IO-APIC-edge  i8042
>14:   3041   IO-APIC-edge  ata_piix
>15:   1008   IO-APIC-edge  ata_piix
>24: 151933   PCI-MSI-edge  vbus
>25:  0   PCI-MSI-edge  virtio0-config
>26:190   PCI-MSI-edge  virtio0-input
>27: 28   PCI-MSI-edge  virtio0-output
>   NMI:  0   Non-maskable interrupts
>   LOC:   9854   Local timer interrupts
>   SPU:  0   Spurious interrupts
>   CNT:  0   Performance counter interrupts
>   PND:  0   Performance pending work
>   RES:  0   Rescheduling interrupts
>   CAL:  0   Function call interrupts
>   TLB:  0   TLB shootdowns
>   TRM:  0   Thermal event interrupts
>   THR:  0   Threshold APIC interrupts
>   MCE:  0   Machine check exceptions
>   MCP:  1   Machine check polls
>   ERR:  0
>   MIS:  0
>
> It's important to note here that we are actually loo

Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-30 Thread Gregory Haskins
Avi Kivity wrote:
> On 09/26/2009 12:32 AM, Gregory Haskins wrote:

 I realize in retrospect that my choice of words above implies vbus _is_
 complete, but this is not what I was saying.  What I was trying to
 convey is that vbus is _more_ complete.  Yes, in either case some kind
 of glue needs to be written.  The difference is that vbus implements
 more of the glue generally, and leaves less required to be customized
 for each iteration.


>>>
>>> No argument there.  Since you care about non-virt scenarios and virtio
>>> doesn't, naturally vbus is a better fit for them as the code stands.
>>>  
>> Thanks for finally starting to acknowledge there's a benefit, at least.
>>
> 
> I think I've mentioned vbus' finer grained layers as helpful here,
> though I doubt the value of this.  Hypervisors are added rarely, while
> devices and drivers are added (and modified) much more often.  I don't
> buy the anything-to-anything promise.

The ease with which a new hypervisor should be able to integrate into the
stack is only one of vbus's many benefits.

> 
>> To be more precise, IMO virtio is designed to be a performance oriented
>> ring-based driver interface that supports all types of hypervisors (e.g.
>> shmem based kvm, and non-shmem based Xen).  vbus is designed to be a
>> high-performance generic shared-memory interconnect (for rings or
>> otherwise) framework for environments where linux is the underpinning
>> "host" (physical or virtual).  They are distinctly different, but
>> complementary (the former addresses part of the front-end, and the
>> latter addresses the back-end plus a different part of the front-end).
>>
> 
> They're not truly complementary since they're incompatible.

No, that is incorrect.  Not to be rude, but for clarity:

  Complementary \Com`ple*men"ta*ry\, a.
 Serving to fill out or to complete; as, complementary
 numbers.
 [1913 Webster]

Citation: www.dict.org

IOW: Something being complementary has nothing to do with guest/host
binary compatibility.  virtio-pci and virtio-vbus are both equally
complementary to virtio since they fill in the bottom layer of the
virtio stack.

So yes, vbus is truly complementary to virtio afaict.

> A 2.6.27 guest, or Windows guest with the existing virtio drivers, won't work
> over vbus.

Binary compatibility with existing virtio drivers, while nice to have,
is not a specific requirement nor goal.  We will simply load an updated
KMP/MSI into those guests and they will work again.  As previously
discussed, this is how more or less any system works today.  It's like
we are removing an old adapter card and adding a new one to "uprev the
silicon".

>  Further, non-shmem virtio can't work over vbus.

Actually I misspoke earlier when I said virtio works over non-shmem.
Thinking about it some more, both virtio and vbus fundamentally require
shared-memory, since sharing their metadata concurrently on both sides
is their raison d'être.

The difference is that virtio utilizes a pre-translation/mapping (via
->add_buf) from the guest side.  OTOH, vbus uses a post translation
scheme (via memctx) from the host-side.  If anything, vbus is actually
more flexible because it doesn't assume the entire guest address space
is directly mappable.

In summary, your statement is incorrect (though it is my fault for
putting that idea in your head).

>  Since
> virtio is guest-oriented and host-agnostic, it can't ignore
> non-shared-memory hosts (even though it's unlikely virtio will be
> adopted there)

Well, to be fair no one said it has to ignore them.  Either virtio-vbus
transport is present and available to the virtio stack, or it isn't.  If
it's present, it may or may not publish objects for consumption.
Providing a virtio-vbus transport in no way limits or degrades the
existing capabilities of the virtio stack.  It only enhances them.

I digress.  The whole point is moot since I realized that the non-shmem
distinction isn't accurate anyway.  They both require shared-memory for
the metadata, and IIUC virtio requires the entire address space to be
mappable whereas vbus only assumes the metadata is.

> 
>> In addition, the kvm-connector used in AlacrityVM's design strives to
>> add value and improve performance via other mechanisms, such as dynamic
>>   allocation, interrupt coalescing (thus reducing exit-ratio, which is a
>> serious issue in KVM)
> 
> Do you have measurements of inter-interrupt coalescing rates (excluding
> intra-interrupt coalescing).

I actually do not have a rig setup to explicitly test inter-interrupt
rates at the moment.  Once things stabilize for me, I will try to
re-gather some numbers here.  Last time I looked, however, there were
some decent savings for inter as well.

Inter rates are interesting because they are what tends to ramp up with
IO load more than intra since guest interrupt mitigation techniques like
NAPI often quell intra-rates naturally.  This is especially true for
data-cent

Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-27 Thread Avi Kivity
On 09/26/2009 12:32 AM, Gregory Haskins wrote:
>>>
>>> I realize in retrospect that my choice of words above implies vbus _is_
>>> complete, but this is not what I was saying.  What I was trying to
>>> convey is that vbus is _more_ complete.  Yes, in either case some kind
>>> of glue needs to be written.  The difference is that vbus implements
>>> more of the glue generally, and leaves less required to be customized
>>> for each iteration.
>>>
>>>
>>
>> No argument there.  Since you care about non-virt scenarios and virtio
>> doesn't, naturally vbus is a better fit for them as the code stands.
>>  
> Thanks for finally starting to acknowledge there's a benefit, at least.
>

I think I've mentioned vbus' finer grained layers as helpful here, 
though I doubt the value of this.  Hypervisors are added rarely, while 
devices and drivers are added (and modified) much more often.  I don't 
buy the anything-to-anything promise.

> To be more precise, IMO virtio is designed to be a performance oriented
> ring-based driver interface that supports all types of hypervisors (e.g.
> shmem based kvm, and non-shmem based Xen).  vbus is designed to be a
> high-performance generic shared-memory interconnect (for rings or
> otherwise) framework for environments where linux is the underpinning
> "host" (physical or virtual).  They are distinctly different, but
> complementary (the former addresses part of the front-end, and the
> latter addresses the back-end plus a different part of the front-end).
>

They're not truly complementary since they're incompatible.  A 2.6.27 
guest, or Windows guest with the existing virtio drivers, won't work 
over vbus.  Further, non-shmem virtio can't work over vbus.  Since 
virtio is guest-oriented and host-agnostic, it can't ignore 
non-shared-memory hosts (even though it's unlikely virtio will be 
adopted there).

> In addition, the kvm-connector used in AlacrityVM's design strives to
> add value and improve performance via other mechanisms, such as dynamic
>   allocation, interrupt coalescing (thus reducing exit-ratio, which is a
> serious issue in KVM)

Do you have measurements of inter-interrupt coalescing rates (excluding 
intra-interrupt coalescing).

> and prioritizable/nestable signals.
>

That doesn't belong in a bus.

> Today there is a large performance disparity between what a KVM guest
> sees and what a native linux application sees on that same host.  Just
> take a look at some of my graphs between "virtio", and "native", for
> example:
>
> http://developer.novell.com/wiki/images/b/b7/31-rc4_throughput.png
>

That's a red herring.  The problem is not with virtio as an ABI, but 
with its implementation in userspace.  vhost-net should offer equivalent 
performance to vbus.

> A dominant vbus design principle is to try to achieve the same IO
> performance for all "linux applications" whether they be literally
> userspace applications, or things like KVM vcpus or Ira's physical
> boards.  It also aims to solve problems not previously expressible with
> current technologies (even virtio), like nested real-time.
>
> And even though you repeatedly insist otherwise, the neat thing here is
> that the two technologies mesh (at least under certain circumstances,
> like when virtio is deployed on a shared-memory friendly linux backend
> like KVM).  I hope that my stack diagram below depicts that clearly.
>

Right, when you ignore the points where they don't fit, it's a perfect mesh.

>> But that's not a strong argument for vbus; instead of adding vbus you
>> could make virtio more friendly to non-virt
>>  
> Actually, it _is_ a strong argument then because adding vbus is what
> helps makes virtio friendly to non-virt, at least for when performance
> matters.
>

As vhost-net shows, you can do that without vbus and without breaking 
compatibility.



>> Right.  virtio assumes that it's in a virt scenario and that the guest
>> architecture already has enumeration and hotplug mechanisms which it
>> would prefer to use.  That happens to be the case for kvm/x86.
>>  
> No, virtio doesn't assume that.  Its stack provides the "virtio-bus"
> abstraction, and what it does assume is that it will be wired up to
> something underneath.  kvm/x86 conveniently has PCI, so the virtio-pci
> adapter was created to reuse much of that facility.  For other things
> like lguest and s390, something new had to be created underneath to make
> up for the lack of pci-like support.
>

Right, I was wrong there.  But it does allow you to have a 1:1 mapping 
between native devices and virtio devices.


>>> So to answer your question, the difference is that the part that has to
>>> be customized in vbus should be a fraction of what needs to be
>>> customized with vhost because it defines more of the stack.
>>>
>> But if you want to use the native mechanisms, vbus doesn't have any
>> added value.
>>  
> First of all, that's incorrect.  If you want to use the "native"

Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-27 Thread Michael S. Tsirkin
On Fri, Sep 25, 2009 at 10:01:58AM -0700, Ira W. Snyder wrote:
> > +   case VHOST_SET_VRING_KICK:
> > +   r = copy_from_user(&f, argp, sizeof f);
> > +   if (r < 0)
> > +   break;
> > +   eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd);
> > +   if (IS_ERR(eventfp))
> > +   return PTR_ERR(eventfp);
> > +   if (eventfp != vq->kick) {
> > +   pollstop = filep = vq->kick;
> > +   pollstart = vq->kick = eventfp;
> > +   } else
> > +   filep = eventfp;
> > +   break;
> > +   case VHOST_SET_VRING_CALL:
> > +   r = copy_from_user(&f, argp, sizeof f);
> > +   if (r < 0)
> > +   break;
> > +   eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd);
> > +   if (IS_ERR(eventfp))
> > +   return PTR_ERR(eventfp);
> > +   if (eventfp != vq->call) {
> > +   filep = vq->call;
> > +   ctx = vq->call_ctx;
> > +   vq->call = eventfp;
> > +   vq->call_ctx = eventfp ?
> > +   eventfd_ctx_fileget(eventfp) : NULL;
> > +   } else
> > +   filep = eventfp;
> > +   break;
> > +   case VHOST_SET_VRING_ERR:
> > +   r = copy_from_user(&f, argp, sizeof f);
> > +   if (r < 0)
> > +   break;
> > +   eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd);
> > +   if (IS_ERR(eventfp))
> > +   return PTR_ERR(eventfp);
> > +   if (eventfp != vq->error) {
> > +   filep = vq->error;
> > +   vq->error = eventfp;
> > +   ctx = vq->error_ctx;
> > +   vq->error_ctx = eventfp ?
> > +   eventfd_ctx_fileget(eventfp) : NULL;
> > +   } else
> > +   filep = eventfp;
> > +   break;
> 
> I'm not sure how these eventfd's save a trip to userspace.
> 
> AFAICT, eventfd's cannot be used to signal another part of the kernel,
> they can only be used to wake up userspace.

Yes, they can.  See irqfd code in virt/kvm/eventfd.c.

> In my system, when an IRQ for kick() comes in, I have an eventfd which
> gets signalled to notify userspace. When I want to send a call(), I have
> to use a special ioctl(), just like lguest does.
> 
> Doesn't this mean that for call(), vhost is just going to signal an
> eventfd to wake up userspace, which is then going to call ioctl(), and
> then we're back in kernelspace. Seems like a wasted userspace
> round-trip.
> 
> Or am I mis-reading this code?

Yes. Kernel can poll eventfd and deliver an interrupt directly
without involving userspace.
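
For reference, a rough userspace sketch of the path being described: the
kick eventfd signalled in-kernel via ioeventfd, and the call eventfd
injected as a guest interrupt via irqfd.  The struct layouts follow the
kvm/vhost uapi headers as they ended up upstream and may differ in this
patch revision; error handling is omitted:

#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>
#include <linux/vhost.h>

static void wire_queue(int kvm_vm_fd, int vhost_fd, int queue_idx,
		       uint64_t pio_addr, int gsi)
{
	int kick = eventfd(0, 0);
	int call = eventfd(0, 0);

	/* Guest PIO write at pio_addr -> kvm signals 'kick' in-kernel,
	 * with no exit to userspace (datamatch omitted for brevity). */
	struct kvm_ioeventfd io = {
		.addr  = pio_addr,
		.len   = 2,
		.fd    = kick,
		.flags = KVM_IOEVENTFD_FLAG_PIO,
	};
	ioctl(kvm_vm_fd, KVM_IOEVENTFD, &io);

	/* vhost polls 'kick' for work and signals 'call' when done. */
	struct vhost_vring_file file = { .index = queue_idx, .fd = kick };
	ioctl(vhost_fd, VHOST_SET_VRING_KICK, &file);
	file.fd = call;
	ioctl(vhost_fd, VHOST_SET_VRING_CALL, &file);

	/* 'call' -> interrupt on 'gsi', injected by kvm directly. */
	struct kvm_irqfd irq = { .fd = call, .gsi = gsi };
	ioctl(kvm_vm_fd, KVM_IRQFD, &irq);
}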

> PS - you can see my current code at:
> http://www.mmarray.org/~iws/virtio-phys/
> 
> Thanks,
> Ira
> 
> > +   default:
> > +   r = -ENOIOCTLCMD;
> > +   }
> > +
> > +   if (pollstop && vq->handle_kick)
> > +   vhost_poll_stop(&vq->poll);
> > +
> > +   if (ctx)
> > +   eventfd_ctx_put(ctx);
> > +   if (filep)
> > +   fput(filep);
> > +
> > +   if (pollstart && vq->handle_kick)
> > +   vhost_poll_start(&vq->poll, vq->kick);
> > +
> > +   mutex_unlock(&vq->mutex);
> > +
> > +   if (pollstop && vq->handle_kick)
> > +   vhost_poll_flush(&vq->poll);
> > +   return 0;
> > +}
> > +
> > +long vhost_dev_ioctl(struct vhost_dev *d, unsigned int ioctl, unsigned 
> > long arg)
> > +{
> > +   void __user *argp = (void __user *)arg;
> > +   long r;
> > +
> > +   mutex_lock(&d->mutex);
> > +   /* If you are not the owner, you can become one */
> > +   if (ioctl == VHOST_SET_OWNER) {
> > +   r = vhost_dev_set_owner(d);
> > +   goto done;
> > +   }
> > +
> > +   /* You must be the owner to do anything else */
> > +   r = vhost_dev_check_owner(d);
> > +   if (r)
> > +   goto done;
> > +
> > +   switch (ioctl) {
> > +   case VHOST_SET_MEM_TABLE:
> > +   r = vhost_set_memory(d, argp);
> > +   break;
> > +   default:
> > +   r = vhost_set_vring(d, ioctl, argp);
> > +   break;
> > +   }
> > +done:
> > +   mutex_unlock(&d->mutex);
> > +   return r;
> > +}
> > +
> > +static const struct vhost_memory_region *find_region(struct vhost_memory 
> > *mem,
> > +__u64 addr, __u32 len)
> > +{
> > +   struct vhost_memory_region *reg;
> > +   int i;
> > +   /* linear search is not brilliant, but we really have on the order of 6
> > +* regions in practice */
> > +   for (i = 0; i < mem->nregions; ++i) {
> > +   reg = mem->regions + i;
> > +   if (reg->guest_phys_addr <= addr &&
> > +   reg->guest_phys_addr + reg->memory_size - 1 >= addr)
> > +   return reg;
> > +   }
> > +   return NULL;
> > +}
> > +
> > +int translate_desc(struct vhost_dev *dev, u64 addr, u32 len,
> > +  struct iovec iov[], int iov_size)
> > +{
> > +   const s

Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-25 Thread Gregory Haskins
Avi Kivity wrote:
> On 09/24/2009 09:03 PM, Gregory Haskins wrote:
>>
>>> I don't really see how vhost and vbus are different here.  vhost expects
>>> signalling to happen through a couple of eventfds and requires someone
>>> to supply them and implement kernel support (if needed).  vbus requires
>>> someone to write a connector to provide the signalling implementation.
>>> Neither will work out-of-the-box when implementing virtio-net over
>>> falling dominos, for example.
>>>  
>> I realize in retrospect that my choice of words above implies vbus _is_
>> complete, but this is not what I was saying.  What I was trying to
>> convey is that vbus is _more_ complete.  Yes, in either case some kind
>> of glue needs to be written.  The difference is that vbus implements
>> more of the glue generally, and leaves less required to be customized
>> for each iteration.
>>
> 
> 
> No argument there.  Since you care about non-virt scenarios and virtio
> doesn't, naturally vbus is a better fit for them as the code stands.

Thanks for finally starting to acknowledge there's a benefit, at least.

To be more precise, IMO virtio is designed to be a performance oriented
ring-based driver interface that supports all types of hypervisors (e.g.
shmem based kvm, and non-shmem based Xen).  vbus is designed to be a
high-performance generic shared-memory interconnect (for rings or
otherwise) framework for environments where linux is the underpinning
"host" (physical or virtual).  They are distinctly different, but
complementary (the former addresses part of the front-end, and the
latter addresses the back-end plus a different part of the front-end).

In addition, the kvm-connector used in AlacrityVM's design strives to
add value and improve performance via other mechanisms, such as dynamic
allocation, interrupt coalescing (thus reducing exit-ratio, which is a
serious issue in KVM) and prioritizable/nestable signals.

Today there is a large performance disparity between what a KVM guest
sees and what a native linux application sees on that same host.  Just
take a look at some of my graphs between "virtio", and "native", for
example:

http://developer.novell.com/wiki/images/b/b7/31-rc4_throughput.png

A dominant vbus design principle is to try to achieve the same IO
performance for all "linux applications" whether they be literally
userspace applications, or things like KVM vcpus or Ira's physical
boards.  It also aims to solve problems not previously expressible with
current technologies (even virtio), like nested real-time.

And even though you repeatedly insist otherwise, the neat thing here is
that the two technologies mesh (at least under certain circumstances,
like when virtio is deployed on a shared-memory friendly linux backend
like KVM).  I hope that my stack diagram below depicts that clearly.


> But that's not a strong argument for vbus; instead of adding vbus you
> could make virtio more friendly to non-virt

Actually, it _is_ a strong argument then because adding vbus is what
helps makes virtio friendly to non-virt, at least for when performance
matters.

> (there's a limit how far you
> can take this, not imposed by the code, but by virtio's charter as a
> virtual device driver framework).
> 
>> Going back to our stack diagrams, you could think of a vhost solution
>> like this:
>>
>> --
>> | virtio-net
>> --
>> | virtio-ring
>> --
>> | virtio-bus
>> --
>> | ? undefined-1 ?
>> --
>> | vhost
>> --
>>
>> and you could think of a vbus solution like this
>>
>> --
>> | virtio-net
>> --
>> | virtio-ring
>> --
>> | virtio-bus
>> --
>> | bus-interface
>> --
>> | ? undefined-2 ?
>> --
>> | bus-model
>> --
>> | virtio-net-device (vhost ported to vbus model? :)
>> --
>>
>>
>> So the difference between vhost and vbus in this particular context is
>> that you need to have "undefined-1" do device discovery/hotswap,
>> config-space, address-decode/isolation, signal-path routing, memory-path
>> routing, etc.  Today this function is filled by things like virtio-pci,
>> pci-bus, KVM/ioeventfd, and QEMU for x86.  I am not as familiar with
>> lguest, but presumably it is filled there by components like
>> virtio-lguest, lguest-bus, lguest.ko, and lguest-launcher.  And to use
>> more contemporary examples, we might have virtio-domino, domino-bus,
>> domino.ko, and domino-launcher as well as virtio-ira, ira-bus, ira.ko,
>> and ira-launcher.
>>
>> Contrast this to the vbus stack:  The bus-X components (when optionally
>> employed by the connector designer) do device-discovery, hotswap,
>> config-space, address-decode/isolation, signal-path and memory-path
>> routing, etc in a general (and pv-centric) way.

Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-25 Thread Ira W. Snyder
On Thu, Aug 27, 2009 at 07:07:50PM +0300, Michael S. Tsirkin wrote:
> What it is: vhost net is a character device that can be used to reduce
> the number of system calls involved in virtio networking.
> Existing virtio net code is used in the guest without modification.
> 
> There's similarity with vringfd, with some differences and reduced scope
> - uses eventfd for signalling
> - structures can be moved around in memory at any time (good for migration)
> - support memory table and not just an offset (needed for kvm)
> 
> common virtio related code has been put in a separate file vhost.c and
> can be made into a separate module if/when more backends appear.  I used
> Rusty's lguest.c as the source for developing this part : this supplied
> me with witty comments I wouldn't be able to write myself.
> 
> What it is not: vhost net is not a bus, and not a generic new system
> call. No assumptions are made on how guest performs hypercalls.
> Userspace hypervisors are supported as well as kvm.
> 
> How it works: Basically, we connect virtio frontend (configured by
> userspace) to a backend. The backend could be a network device, or a
> tun-like device. In this version I only support raw socket as a backend,
> which can be bound to e.g. SR IOV, or to macvlan device.  Backend is
> also configured by userspace, including vlan/mac etc.
> 
> Status:
> This works for me, and I haven't see any crashes.
> I have done some light benchmarking (with v4): compared to userspace, I
> see improved latency (as I save up to 4 system calls per packet) but not
> bandwidth/CPU (as TSO and interrupt mitigation are not supported).  For
> the ping benchmark (where there's no TSO), throughput is also improved.
> 
> Features that I plan to look at in the future:
> - tap support
> - TSO
> - interrupt mitigation
> - zero copy
> 
> Acked-by: Arnd Bergmann 
> Signed-off-by: Michael S. Tsirkin 
> 
> ---
>  MAINTAINERS|   10 +
>  arch/x86/kvm/Kconfig   |1 +
>  drivers/Makefile   |1 +
>  drivers/vhost/Kconfig  |   11 +
>  drivers/vhost/Makefile |2 +
>  drivers/vhost/net.c|  475 ++
>  drivers/vhost/vhost.c  |  688 
> 
>  drivers/vhost/vhost.h  |  122 
>  include/linux/Kbuild   |1 +
>  include/linux/miscdevice.h |1 +
>  include/linux/vhost.h  |  101 +++
>  11 files changed, 1413 insertions(+), 0 deletions(-)
>  create mode 100644 drivers/vhost/Kconfig
>  create mode 100644 drivers/vhost/Makefile
>  create mode 100644 drivers/vhost/net.c
>  create mode 100644 drivers/vhost/vhost.c
>  create mode 100644 drivers/vhost/vhost.h
>  create mode 100644 include/linux/vhost.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index b1114cf..de4587f 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -5431,6 +5431,16 @@ S: Maintained
>  F:   Documentation/filesystems/vfat.txt
>  F:   fs/fat/
>  
> +VIRTIO HOST (VHOST)
> +P:   Michael S. Tsirkin
> +M:   m...@redhat.com
> +L:   k...@vger.kernel.org
> +L:   virtualizat...@lists.osdl.org
> +L:   net...@vger.kernel.org
> +S:   Maintained
> +F:   drivers/vhost/
> +F:   include/linux/vhost.h
> +
>  VIA RHINE NETWORK DRIVER
>  M:   Roger Luethi 
>  S:   Maintained
> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> index b84e571..94f44d9 100644
> --- a/arch/x86/kvm/Kconfig
> +++ b/arch/x86/kvm/Kconfig
> @@ -64,6 +64,7 @@ config KVM_AMD
>  
>  # OK, it's a little counter-intuitive to do this, but it puts it neatly under
>  # the virtualization menu.
> +source drivers/vhost/Kconfig
>  source drivers/lguest/Kconfig
>  source drivers/virtio/Kconfig
>  
> diff --git a/drivers/Makefile b/drivers/Makefile
> index bc4205d..1551ae1 100644
> --- a/drivers/Makefile
> +++ b/drivers/Makefile
> @@ -105,6 +105,7 @@ obj-$(CONFIG_HID) += hid/
>  obj-$(CONFIG_PPC_PS3)+= ps3/
>  obj-$(CONFIG_OF) += of/
>  obj-$(CONFIG_SSB)+= ssb/
> +obj-$(CONFIG_VHOST_NET)  += vhost/
>  obj-$(CONFIG_VIRTIO) += virtio/
>  obj-$(CONFIG_VLYNQ)  += vlynq/
>  obj-$(CONFIG_STAGING)+= staging/
> diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
> new file mode 100644
> index 000..d955406
> --- /dev/null
> +++ b/drivers/vhost/Kconfig
> @@ -0,0 +1,11 @@
> +config VHOST_NET
> + tristate "Host kernel accelerator for virtio net"
> + depends on NET && EVENTFD
> + ---help---
> +   This kernel module can be loaded in host kernel to accelerate
> +   guest networking with virtio_net. Not to be confused with virtio_net
> +   module itself which needs to be loaded in guest kernel.
> +
> +   To compile this driver as a module, choose M here: the module will
> +   be called vhost_net.
> +
> diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
> new file mode 100644
> index 000..72dd020
> --- /dev/null
> +++ b/drivers/vhost/Makefile
> @@ -0,0 +1,2 @@
> +obj-$(C

Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-25 Thread Avi Kivity
On 09/24/2009 09:03 PM, Gregory Haskins wrote:
>
>> I don't really see how vhost and vbus are different here.  vhost expects
>> signalling to happen through a couple of eventfds and requires someone
>> to supply them and implement kernel support (if needed).  vbus requires
>> someone to write a connector to provide the signalling implementation.
>> Neither will work out-of-the-box when implementing virtio-net over
>> falling dominos, for example.
>>  
> I realize in retrospect that my choice of words above implies vbus _is_
> complete, but this is not what I was saying.  What I was trying to
> convey is that vbus is _more_ complete.  Yes, in either case some kind
> of glue needs to be written.  The difference is that vbus implements
> more of the glue generally, and leaves less required to be customized
> for each iteration.
>


No argument there.  Since you care about non-virt scenarios and virtio 
doesn't, naturally vbus is a better fit for them as the code stands.  
But that's not a strong argument for vbus; instead of adding vbus you 
could make virtio more friendly to non-virt (there's a limit how far you 
can take this, not imposed by the code, but by virtio's charter as a 
virtual device driver framework).

> Going back to our stack diagrams, you could think of a vhost solution
> like this:
>
> --
> | virtio-net
> --
> | virtio-ring
> --
> | virtio-bus
> --
> | ? undefined-1 ?
> --
> | vhost
> --
>
> and you could think of a vbus solution like this
>
> --
> | virtio-net
> --
> | virtio-ring
> --
> | virtio-bus
> --
> | bus-interface
> --
> | ? undefined-2 ?
> --
> | bus-model
> --
> | virtio-net-device (vhost ported to vbus model? :)
> --
>
>
> So the difference between vhost and vbus in this particular context is
> that you need to have "undefined-1" do device discovery/hotswap,
> config-space, address-decode/isolation, signal-path routing, memory-path
> routing, etc.  Today this function is filled by things like virtio-pci,
> pci-bus, KVM/ioeventfd, and QEMU for x86.  I am not as familiar with
> lguest, but presumably it is filled there by components like
> virtio-lguest, lguest-bus, lguest.ko, and lguest-launcher.  And to use
> more contemporary examples, we might have virtio-domino, domino-bus,
> domino.ko, and domino-launcher as well as virtio-ira, ira-bus, ira.ko,
> and ira-launcher.
>
> Contrast this to the vbus stack:  The bus-X components (when optionally
> employed by the connector designer) do device-discovery, hotswap,
> config-space, address-decode/isolation, signal-path and memory-path
> routing, etc in a general (and pv-centric) way. The "undefined-2"
> portion is the "connector", and just needs to convey messages like
> "DEVCALL" and "SHMSIGNAL".  The rest is handled in other parts of the stack.
>
>

Right.  virtio assumes that it's in a virt scenario and that the guest 
architecture already has enumeration and hotplug mechanisms which it 
would prefer to use.  That happens to be the case for kvm/x86.

> So to answer your question, the difference is that the part that has to
> be customized in vbus should be a fraction of what needs to be
> customized with vhost because it defines more of the stack.

But if you want to use the native mechanisms, vbus doesn't have any 
added value.

> And, as
> alluded to in my diagram, both virtio-net and vhost (with some
> modifications to fit into the vbus framework) are potentially
> complementary, not competitors.
>

Only theoretically.  The existing installed base would have to be thrown 
away, or we'd need to support both.

  


>> Without a vbus-connector-falling-dominos, vbus-venet can't do anything
>> either.
>>  
> Mostly covered above...
>
> However, I was addressing your assertion that vhost somehow magically
> accomplishes this "container/addressing" function without any specific
> kernel support.  This is incorrect.  I contend that this kernel support
> is required and present.  The difference is that its defined elsewhere
> (and typically in a transport/arch specific way).
>
> IOW: You can basically think of the programmed PIO addresses as forming
> its "container".  Only addresses explicitly added are visible, and
> everything else is inaccessible.  This whole discussion is merely a
> question of what's been generalized verses what needs to be
> re-implemented each time.
>

Sorry, this is too abstract for me.



>> vbus doesn't do kvm guest address decoding for the fast path.  It's
>> still done by ioeventfd.
>>  
> That is not correct.  vbus does its own native address decoding in the
> fast path, such as here:
>
> http://git.kernel.org/?p=linux/kernel/git/ghaskins/alacrityvm/l

Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-25 Thread Avi Kivity
On 09/24/2009 10:27 PM, Ira W. Snyder wrote:
>>> Ira can make ira-bus, and ira-eventfd, etc, etc.
>>>
>>> Each iteration will invariably introduce duplicated parts of the stack.
>>>
>>>
>> Invariably?  Use libraries (virtio-shmem.ko, libvhost.so).
>>
>>  
> Referencing libraries that don't yet exist doesn't seem like a good
> argument against vbus from my point of view. I'm not speficially
> advocating for vbus; I'm just letting you know how it looks to another
> developer in the trenches.
>

My argument is that we shouldn't write a new framework instead of fixing 
or extending an existing one.

> If you'd like to see the amount of duplication present, look at the code
> I'm currently working on.

Yes, virtio-phys-guest looks pretty much duplicated.  Looks like it 
should be pretty easy to deduplicate.

>   It mostly works at this point, though I
> haven't finished my userspace, nor figured out how to actually transfer
> data.
>
> The current question I have (just to let you know where I am in
> development) is:
>
> I have the physical address of the remote data, but how do I get it into
> a userspace buffer, so I can pass it to tun?
>

vhost does guest physical address to host userspace address (in your 
scenario, remote physical to local virtual) using a table of memory 
slots; there's an ioctl that allows userspace to initialize that table.
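
A minimal sketch of how userspace might initialize that table (field
names follow the vhost uapi as it ended up upstream; whether this exact
patch revision uses the same layout is an assumption, and error handling
is omitted):

#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

static void set_mem_table(int vhost_fd, uint64_t gpa_base,
			  uint64_t size, void *hva_base)
{
	struct vhost_memory *mem;

	mem = calloc(1, sizeof(*mem) + sizeof(mem->regions[0]));
	mem->nregions = 1;			/* typically ~6 in practice */
	mem->regions[0].guest_phys_addr = gpa_base;
	mem->regions[0].memory_size     = size;
	mem->regions[0].userspace_addr  = (uint64_t)(uintptr_t)hva_base;

	/* vhost later resolves ring/buffer GPAs through this table. */
	ioctl(vhost_fd, VHOST_SET_MEM_TABLE, mem);
	free(mem);
}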

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-24 Thread Ira W. Snyder
On Thu, Sep 24, 2009 at 10:18:28AM +0300, Avi Kivity wrote:
> On 09/24/2009 12:15 AM, Gregory Haskins wrote:
> >
> >>> There are various aspects about designing high-performance virtual
> >>> devices such as providing the shortest paths possible between the
> >>> physical resources and the consumers.  Conversely, we also need to
> >>> ensure that we meet proper isolation/protection guarantees at the same
> >>> time.  What this means is there are various aspects to any
> >>> high-performance PV design that require to be placed in-kernel to
> >>> maximize the performance yet properly isolate the guest.
> >>>
> >>> For instance, you are required to have your signal-path (interrupts and
> >>> hypercalls), your memory-path (gpa translation), and
> >>> addressing/isolation model in-kernel to maximize performance.
> >>>
> >>>
> >> Exactly.  That's what vhost puts into the kernel and nothing more.
> >>  
> > Actually, no.  Generally, _KVM_ puts those things into the kernel, and
> > vhost consumes them.  Without KVM (or something equivalent), vhost is
> > incomplete.  One of my goals with vbus is to generalize the "something
> > equivalent" part here.
> >
> 
> I don't really see how vhost and vbus are different here.  vhost expects 
> signalling to happen through a couple of eventfds and requires someone 
> to supply them and implement kernel support (if needed).  vbus requires 
> someone to write a connector to provide the signalling implementation.  
> Neither will work out-of-the-box when implementing virtio-net over 
> falling dominos, for example.
> 
> >>> Vbus accomplishes its in-kernel isolation model by providing a
> >>> "container" concept, where objects are placed into this container by
> >>> userspace.  The host kernel enforces isolation/protection by using a
> >>> namespace to identify objects that is only relevant within a specific
> >>> container's context (namely, a "u32 dev-id").  The guest addresses the
> >>> objects by its dev-id, and the kernel ensures that the guest can't
> >>> access objects outside of its dev-id namespace.
> >>>
> >>>
> >> vhost manages to accomplish this without any kernel support.
> >>  
> > No, vhost manages to accomplish this because of KVM's kernel support
> > (ioeventfd, etc).  Without KVM-like in-kernel support, vhost is
> > merely a kind of "tuntap"-like clone signalled by eventfds.
> >
> 
> Without a vbus-connector-falling-dominos, vbus-venet can't do anything 
> either.  Both vhost and vbus need an interface, vhost's is just narrower 
> since it doesn't do configuration or enumeration.
> 
> > This goes directly to my rebuttal of your claim that vbus places too
> > much in the kernel.  I state that, one way or the other, address decode
> > and isolation _must_ be in the kernel for performance.  Vbus does this
> > with a devid/container scheme.  vhost+virtio-pci+kvm does it with
> > pci+pio+ioeventfd.
> >
> 
> vbus doesn't do kvm guest address decoding for the fast path.  It's 
> still done by ioeventfd.
> 
> >>   The guest
> >> simply has not access to any vhost resources other than the guest->host
> >> doorbell, which is handed to the guest outside vhost (so it's somebody
> >> else's problem, in userspace).
> >>  
> > You mean _controlled_ by userspace, right?  Obviously, the other side of
> > the kernel still needs to be programmed (ioeventfd, etc).  Otherwise,
> > vhost would be pointless: e.g. just use vanilla tuntap if you don't need
> > fast in-kernel decoding.
> >
> 
> Yes (though for something like level-triggered interrupts we're probably 
> keeping it in userspace, enjoying the benefits of vhost data path while 
> paying more for signalling).
> 
> >>> All that is required is a way to transport a message with a "devid"
> >>> attribute as an address (such as DEVCALL(devid)) and the framework
> >>> provides the rest of the decode+execute function.
> >>>
> >>>
> >> vhost avoids that.
> >>  
> > No, it doesn't avoid it.  It just doesn't specify how its done, and
> > relies on something else to do it on its behalf.
> >
> 
> That someone else can be in userspace, apart from the actual fast path.
> 
> > Conversely, vbus specifies how its done, but not how to transport the
> > verb "across the wire".  That is the role of the vbus-connector abstraction.
> >
> 
> So again, vbus does everything in the kernel (since it's so easy and 
> cheap) but expects a vbus-connector.  vhost does configuration in 
> userspace (since it's so clunky and fragile) but expects a couple of 
> eventfds.
> 
> >>> Contrast this to vhost+virtio-pci (called simply "vhost" from here).
> >>>
> >>>
> >> It's the wrong name.  vhost implements only the data path.
> >>  
> > Understood, but vhost+virtio-pci is what I am contrasting, and I use
> > "vhost" for short from that point on because I am too lazy to type the
> > whole name over and over ;)
> >
> 
> If you #define A A+B+C don't expect intelligent conversation a

Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-24 Thread Gregory Haskins
Avi Kivity wrote:
> On 09/23/2009 10:37 PM, Avi Kivity wrote:
>>
>> Example: feature negotiation.  If it happens in userspace, it's easy
>> to limit what features we expose to the guest.  If it happens in the
>> kernel, we need to add an interface to let the kernel know which
>> features it should expose to the guest.  We also need to add an
>> interface to let userspace know which features were negotiated, if we
>> want to implement live migration.  Something fairly trivial bloats
>> rapidly.
> 
> btw, we have this issue with kvm reporting cpuid bits to the guest. 
> Instead of letting kvm talk directly to the hardware and the guest, kvm
> gets the cpuid bits from the hardware, strips away features it doesn't
> support, exposes that to userspace, and expects userspace to program the
> cpuid bits it wants to expose to the guest (which may be different than
> what kvm exposed to userspace, and different from guest to guest).
> 

This issue doesn't exist in the model I am referring to, as these are
all virtual-devices anyway.  See my last reply

-Greg




Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-24 Thread Gregory Haskins
Avi Kivity wrote:
> On 09/24/2009 12:15 AM, Gregory Haskins wrote:
>>
 There are various aspects about designing high-performance virtual
 devices such as providing the shortest paths possible between the
 physical resources and the consumers.  Conversely, we also need to
 ensure that we meet proper isolation/protection guarantees at the same
 time.  What this means is there are various aspects to any
 high-performance PV design that require to be placed in-kernel to
 maximize the performance yet properly isolate the guest.

 For instance, you are required to have your signal-path (interrupts and
 hypercalls), your memory-path (gpa translation), and
 addressing/isolation model in-kernel to maximize performance.


>>> Exactly.  That's what vhost puts into the kernel and nothing more.
>>>  
>> Actually, no.  Generally, _KVM_ puts those things into the kernel, and
>> vhost consumes them.  Without KVM (or something equivalent), vhost is
>> incomplete.  One of my goals with vbus is to generalize the "something
>> equivalent" part here.
>>
> 
> I don't really see how vhost and vbus are different here.  vhost expects
> signalling to happen through a couple of eventfds and requires someone
> to supply them and implement kernel support (if needed).  vbus requires
> someone to write a connector to provide the signalling implementation. 
> Neither will work out-of-the-box when implementing virtio-net over
> falling dominos, for example.

I realize in retrospect that my choice of words above implies vbus _is_
complete, but this is not what I was saying.  What I was trying to
convey is that vbus is _more_ complete.  Yes, in either case some kind
of glue needs to be written.  The difference is that vbus implements
more of the glue generally, and leaves less required to be customized
for each iteration.

Going back to our stack diagrams, you could think of a vhost solution
like this:

--
| virtio-net
--
| virtio-ring
--
| virtio-bus
--
| ? undefined-1 ?
--
| vhost
--

and you could think of a vbus solution like this

--
| virtio-net
--
| virtio-ring
--
| virtio-bus
--
| bus-interface
--
| ? undefined-2 ?
--
| bus-model
--
| virtio-net-device (vhost ported to vbus model? :)
--


So the difference between vhost and vbus in this particular context is
that you need to have "undefined-1" do device discovery/hotswap,
config-space, address-decode/isolation, signal-path routing, memory-path
routing, etc.  Today this function is filled by things like virtio-pci,
pci-bus, KVM/ioeventfd, and QEMU for x86.  I am not as familiar with
lguest, but presumably it is filled there by components like
virtio-lguest, lguest-bus, lguest.ko, and lguest-launcher.  And to use
more contemporary examples, we might have virtio-domino, domino-bus,
domino.ko, and domino-launcher as well as virtio-ira, ira-bus, ira.ko,
and ira-launcher.

Contrast this to the vbus stack:  The bus-X components (when optionally
employed by the connector designer) do device-discovery, hotswap,
config-space, address-decode/isolation, signal-path and memory-path
routing, etc in a general (and pv-centric) way. The "undefined-2"
portion is the "connector", and just needs to convey messages like
"DEVCALL" and "SHMSIGNAL".  The rest is handled in other parts of the stack.

So to answer your question, the difference is that the part that has to
be customized in vbus should be a fraction of what needs to be
customized with vhost because it defines more of the stack.  And, as
alluded to in my diagram, both virtio-net and vhost (with some
modifications to fit into the vbus framework) are potentially
complementary, not competitors.

> 
 Vbus accomplishes its in-kernel isolation model by providing a
 "container" concept, where objects are placed into this container by
 userspace.  The host kernel enforces isolation/protection by using a
 namespace to identify objects that is only relevant within a specific
 container's context (namely, a "u32 dev-id").  The guest addresses the
 objects by its dev-id, and the kernel ensures that the guest can't
 access objects outside of its dev-id namespace.


>>> vhost manages to accomplish this without any kernel support.
>>>  
>> No, vhost manages to accomplish this because of KVM's kernel support
>> (ioeventfd, etc).  Without KVM-like in-kernel support, vhost is
>> merely a kind of "tuntap"-like clone signalled by eventfds.
>>
> 
> Without a vbus-connector-falling-dominos, vbus-venet can't do anything
> either.

Mostly covered above...

However, I was addressing your assertion that vhost somehow m

Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-24 Thread Avi Kivity
On 09/23/2009 10:37 PM, Avi Kivity wrote:
>
> Example: feature negotiation.  If it happens in userspace, it's easy 
> to limit what features we expose to the guest.  If it happens in the 
> kernel, we need to add an interface to let the kernel know which 
> features it should expose to the guest.  We also need to add an 
> interface to let userspace know which features were negotiated, if we 
> want to implement live migration.  Something fairly trivial bloats 
> rapidly.

btw, we have this issue with kvm reporting cpuid bits to the guest.  
Instead of letting kvm talk directly to the hardware and the guest, kvm 
gets the cpuid bits from the hardware, strips away features it doesn't 
support, exposes that to userspace, and expects userspace to program the 
cpuid bits it wants to expose to the guest (which may be different than 
what kvm exposed to userspace, and different from guest to guest).
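The userspace side of that dance is just a pair of ioctls; a minimal
sketch (error handling omitted, and the filtering policy is obviously
made up for illustration):

#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <stdlib.h>

#define NENT 100        /* more than enough entries for this sketch */

static void program_guest_cpuid(int kvm_fd, int vcpu_fd)
{
        struct kvm_cpuid2 *cpuid;

        cpuid = calloc(1, sizeof(*cpuid) +
                          NENT * sizeof(struct kvm_cpuid_entry2));
        cpuid->nent = NENT;

        /* what the hardware has, minus what kvm can't virtualize */
        ioctl(kvm_fd, KVM_GET_SUPPORTED_CPUID, cpuid);

        /* userspace policy goes here: strip whatever this particular
         * guest should not see (may differ from guest to guest)      */

        ioctl(vcpu_fd, KVM_SET_CPUID2, cpuid);
        free(cpuid);
}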

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-24 Thread Avi Kivity
On 09/24/2009 12:15 AM, Gregory Haskins wrote:
>
>>> There are various aspects about designing high-performance virtual
>>> devices such as providing the shortest paths possible between the
>>> physical resources and the consumers.  Conversely, we also need to
>>> ensure that we meet proper isolation/protection guarantees at the same
>>> time.  What this means is there are various aspects to any
>>> high-performance PV design that require to be placed in-kernel to
>>> maximize the performance yet properly isolate the guest.
>>>
>>> For instance, you are required to have your signal-path (interrupts and
>>> hypercalls), your memory-path (gpa translation), and
>>> addressing/isolation model in-kernel to maximize performance.
>>>
>>>
>> Exactly.  That's what vhost puts into the kernel and nothing more.
>>  
> Actually, no.  Generally, _KVM_ puts those things into the kernel, and
> vhost consumes them.  Without KVM (or something equivalent), vhost is
> incomplete.  One of my goals with vbus is to generalize the "something
> equivalent" part here.
>

I don't really see how vhost and vbus are different here.  vhost expects 
signalling to happen through a couple of eventfds and requires someone 
to supply them and implement kernel support (if needed).  vbus requires 
someone to write a connector to provide the signalling implementation.  
Neither will work out-of-the-box when implementing virtio-net over 
falling dominos, for example.

>>> Vbus accomplishes its in-kernel isolation model by providing a
>>> "container" concept, where objects are placed into this container by
>>> userspace.  The host kernel enforces isolation/protection by using a
>>> namespace to identify objects that is only relevant within a specific
>>> container's context (namely, a "u32 dev-id").  The guest addresses the
>>> objects by its dev-id, and the kernel ensures that the guest can't
>>> access objects outside of its dev-id namespace.
>>>
>>>
>> vhost manages to accomplish this without any kernel support.
>>  
> No, vhost manages to accomplish this because of KVM's kernel support
> (ioeventfd, etc).   Without KVM-like in-kernel support, vhost is
> merely a kind of "tuntap"-like clone signalled by eventfds.
>

Without a vbus-connector-falling-dominos, vbus-venet can't do anything 
either.  Both vhost and vbus need an interface, vhost's is just narrower 
since it doesn't do configuration or enumeration.
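To make "narrower" concrete, the vhost control surface is basically a
character device plus a handful of ioctls that hand it the ring layout,
a backend, and the two eventfds.  A rough userspace sketch, using the
ioctl names from the vhost-net patches (the vring layout and memory
table are assumed to have been set up already):

#include <linux/vhost.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>

/* Wire one virtqueue of an already-configured vhost-net instance.
 * kickfd: signalled (e.g. via kvm's ioeventfd) when the guest has work.
 * callfd: signalled by vhost when it wants a guest interrupt injected. */
static void wire_vring(int vhost_fd, unsigned int index, int tap_fd)
{
        int kickfd = eventfd(0, 0);
        int callfd = eventfd(0, 0);
        struct vhost_vring_file kick    = { .index = index, .fd = kickfd };
        struct vhost_vring_file call    = { .index = index, .fd = callfd };
        struct vhost_vring_file backend = { .index = index, .fd = tap_fd };

        ioctl(vhost_fd, VHOST_SET_VRING_KICK, &kick);
        ioctl(vhost_fd, VHOST_SET_VRING_CALL, &call);
        ioctl(vhost_fd, VHOST_NET_SET_BACKEND, &backend);

        /* everything else -- enumeration, config space, feature
         * negotiation -- stays with whoever owns vhost_fd in userspace */
}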

> This goes directly to my rebuttal of your claim that vbus places too
> much in the kernel.  I state that, one way or the other, address decode
> and isolation _must_ be in the kernel for performance.  Vbus does this
> with a devid/container scheme.  vhost+virtio-pci+kvm does it with
> pci+pio+ioeventfd.
>

vbus doesn't do kvm guest address decoding for the fast path.  It's 
still done by ioeventfd.
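For reference, that fast-path decode is just the KVM_IOEVENTFD
registration; a minimal sketch of how userspace wires a guest pio
doorbell to an eventfd (error handling omitted):

#include <linux/kvm.h>
#include <sys/ioctl.h>

/* A guest write of 'val' to pio port 'addr' will signal 'fd' entirely
 * in-kernel; no exit to userspace on the fast path.                   */
static int add_pio_doorbell(int vm_fd, int fd, __u64 addr, __u64 val)
{
        struct kvm_ioeventfd ioev = {
                .addr      = addr,
                .len       = 2,                 /* e.g. a 16-bit write */
                .datamatch = val,
                .fd        = fd,
                .flags     = KVM_IOEVENTFD_FLAG_PIO |
                             KVM_IOEVENTFD_FLAG_DATAMATCH,
        };

        return ioctl(vm_fd, KVM_IOEVENTFD, &ioev);
}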

>>   The guest
>> simply has no access to any vhost resources other than the guest->host
>> doorbell, which is handed to the guest outside vhost (so it's somebody
>> else's problem, in userspace).
>>  
> You mean _controlled_ by userspace, right?  Obviously, the other side of
> the kernel still needs to be programmed (ioeventfd, etc).  Otherwise,
> vhost would be pointless: e.g. just use vanilla tuntap if you don't need
> fast in-kernel decoding.
>

Yes (though for something like level-triggered interrupts we're probably 
keeping it in userspace, enjoying the benefits of vhost data path while 
paying more for signalling).

>>> All that is required is a way to transport a message with a "devid"
>>> attribute as an address (such as DEVCALL(devid)) and the framework
>>> provides the rest of the decode+execute function.
>>>
>>>
>> vhost avoids that.
>>  
> No, it doesn't avoid it.  It just doesn't specify how it's done, and
> relies on something else to do it on its behalf.
>

That someone else can be in userspace, apart from the actual fast path.

> Conversely, vbus specifies how it's done, but not how to transport the
> verb "across the wire".  That is the role of the vbus-connector abstraction.
>

So again, vbus does everything in the kernel (since it's so easy and 
cheap) but expects a vbus-connector.  vhost does configuration in 
userspace (since it's so clunky and fragile) but expects a couple of 
eventfds.

>>> Contrast this to vhost+virtio-pci (called simply "vhost" from here).
>>>
>>>
>> It's the wrong name.  vhost implements only the data path.
>>  
> Understood, but vhost+virtio-pci is what I am contrasting, and I use
> "vhost" for short from that point on because I am too lazy to type the
> whole name over and over ;)
>

If you #define A A+B+C don't expect intelligent conversation afterwards.

>>> It is not immune to requiring in-kernel addressing support either, but
>>> rather it just does it differently (and it's not, as you might expect, via
>>> qemu).
>>>
>>> Vhost relies on QEMU to render PCI objects to the guest, which the guest
>>> assigns resources (such as BARs, interrupts, etc).

Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-23 Thread Gregory Haskins
Avi Kivity wrote:
> On 09/23/2009 08:58 PM, Gregory Haskins wrote:
>>>
>>>> It also pulls parts of the device model into the host kernel.

>>> That is the point.  Most of it needs to be there for performance.
>>>  
>> To clarify this point:
>>
>> There are various aspects about designing high-performance virtual
>> devices such as providing the shortest paths possible between the
>> physical resources and the consumers.  Conversely, we also need to
>> ensure that we meet proper isolation/protection guarantees at the same
>> time.  What this means is there are various aspects to any
>> high-performance PV design that require to be placed in-kernel to
>> maximize the performance yet properly isolate the guest.
>>
>> For instance, you are required to have your signal-path (interrupts and
>> hypercalls), your memory-path (gpa translation), and
>> addressing/isolation model in-kernel to maximize performance.
>>
> 
> Exactly.  That's what vhost puts into the kernel and nothing more.

Actually, no.  Generally, _KVM_ puts those things into the kernel, and
vhost consumes them.  Without KVM (or something equivalent), vhost is
incomplete.  One of my goals with vbus is to generalize the "something
equivalent" part here.

I know you may not care about non-kvm use cases, and that's fine.  No one
says you have to.  However, note that some of us do care about these
non-kvm cases, and thus it's a distinction I am making here as a benefit
of the vbus framework.

> 
>> Vbus accomplishes its in-kernel isolation model by providing a
>> "container" concept, where objects are placed into this container by
>> userspace.  The host kernel enforces isolation/protection by using a
>> namespace to identify objects that is only relevant within a specific
>> container's context (namely, a "u32 dev-id").  The guest addresses the
>> objects by its dev-id, and the kernel ensures that the guest can't
>> access objects outside of its dev-id namespace.
>>
> 
> vhost manages to accomplish this without any kernel support.

No, vhost manages to accomplish this because of KVM's kernel support
(ioeventfd, etc).   Without KVM-like in-kernel support, vhost is
merely a kind of "tuntap"-like clone signalled by eventfds.

vbus on the other hand, generalizes one more piece of the puzzle
(namely, the function of pio+ioeventfd and userspace's programming of
it) by presenting the devid namespace and container concept.

This goes directly to my rebuttal of your claim that vbus places too
much in the kernel.  I state that, one way or the other, address decode
and isolation _must_ be in the kernel for performance.  Vbus does this
with a devid/container scheme.  vhost+virtio-pci+kvm does it with
pci+pio+ioeventfd.
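What that decode step amounts to, in either scheme, is a table lookup
keyed by a namespace the kernel enforces per guest.  A hypothetical
sketch (not the actual vbus code; the names are invented for
illustration):

#include <linux/idr.h>
#include <linux/mutex.h>
#include <linux/errno.h>
#include <linux/types.h>

struct vbus_device;

struct vbus_device_ops {
        int (*call)(struct vbus_device *dev, u32 func, void *data, size_t len);
};

struct vbus_device {
        u32 devid;
        const struct vbus_device_ops *ops;
};

struct vbus_container {
        struct idr devices;     /* devid -> device, populated by userspace */
        struct mutex lock;
};

/* What the connector invokes when a DEVCALL(devid) message arrives.
 * Anything not in this container's idr simply does not resolve, which
 * is the isolation model in a nutshell.                               */
static int container_devcall(struct vbus_container *c, u32 devid,
                             u32 func, void *data, size_t len)
{
        struct vbus_device *dev;
        int ret = -ENODEV;

        mutex_lock(&c->lock);
        dev = idr_find(&c->devices, devid);
        if (dev)
                ret = dev->ops->call(dev, func, data, len);
        mutex_unlock(&c->lock);

        return ret;
}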


>  The guest
> simply has no access to any vhost resources other than the guest->host
> doorbell, which is handed to the guest outside vhost (so it's somebody
> else's problem, in userspace).

You mean _controlled_ by userspace, right?  Obviously, the other side of
the kernel still needs to be programmed (ioeventfd, etc).  Otherwise,
vhost would be pointless: e.g. just use vanilla tuntap if you don't need
fast in-kernel decoding.

> 
>> All that is required is a way to transport a message with a "devid"
>> attribute as an address (such as DEVCALL(devid)) and the framework
>> provides the rest of the decode+execute function.
>>
> 
> vhost avoids that.

No, it doesn't avoid it.  It just doesn't specify how it's done, and
relies on something else to do it on its behalf.

Conversely, vbus specifies how it's done, but not how to transport the
verb "across the wire".  That is the role of the vbus-connector abstraction.

> 
>> Contrast this to vhost+virtio-pci (called simply "vhost" from here).
>>
> 
> It's the wrong name.  vhost implements only the data path.

Understood, but vhost+virtio-pci is what I am contrasting, and I use
"vhost" for short from that point on because I am too lazy to type the
whole name over and over ;)

> 
>> It is not immune to requiring in-kernel addressing support either, but
rather it just does it differently (and it's not, as you might expect, via
>> qemu).
>>
>> Vhost relies on QEMU to render PCI objects to the guest, which the guest
>> assigns resources (such as BARs, interrupts, etc).
> 
> vhost does not rely on qemu.  It relies on its user to handle
> configuration.  In one important case it's qemu+pci.  It could just as
> well be the lguest launcher.

I meant vhost=vhost+virtio-pci here.  Sorry for the confusion.

The point I am making specifically is that vhost in general relies on
other in-kernel components to function.  I.e., it cannot function without
having something like the PCI model to build an IO namespace.  That
namespace (in this case, pio address+data tuples) is used for the
in-kernel addressing function under KVM + virtio-pci.

The case of the lguest launcher is a good one to highlight.  Yes, you
can presumably also use lguest with vhost, if the requisite facilities
are exposed.

Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-23 Thread Avi Kivity
On 09/23/2009 08:58 PM, Gregory Haskins wrote:
>>
>>> It also pulls parts of the device model into the host kernel.
>>>
>> That is the point.  Most of it needs to be there for performance.
>>  
> To clarify this point:
>
> There are various aspects about designing high-performance virtual
> devices such as providing the shortest paths possible between the
> physical resources and the consumers.  Conversely, we also need to
> ensure that we meet proper isolation/protection guarantees at the same
> time.  What this means is there are various aspects to any
> high-performance PV design that require to be placed in-kernel to
> maximize the performance yet properly isolate the guest.
>
> For instance, you are required to have your signal-path (interrupts and
> hypercalls), your memory-path (gpa translation), and
> addressing/isolation model in-kernel to maximize performance.
>

Exactly.  That's what vhost puts into the kernel and nothing more.

> Vbus accomplishes its in-kernel isolation model by providing a
> "container" concept, where objects are placed into this container by
> userspace.  The host kernel enforces isolation/protection by using a
> namespace to identify objects that is only relevant within a specific
> container's context (namely, a "u32 dev-id").  The guest addresses the
> objects by its dev-id, and the kernel ensures that the guest can't
> access objects outside of its dev-id namespace.
>

vhost manages to accomplish this without any kernel support.  The guest 
simply has no access to any vhost resources other than the guest->host 
doorbell, which is handed to the guest outside vhost (so it's somebody 
else's problem, in userspace).

> All that is required is a way to transport a message with a "devid"
> attribute as an address (such as DEVCALL(devid)) and the framework
> provides the rest of the decode+execute function.
>

vhost avoids that.

> Contrast this to vhost+virtio-pci (called simply "vhost" from here).
>

It's the wrong name.  vhost implements only the data path.

> It is not immune to requiring in-kernel addressing support either, but
> rather it just does it differently (and it's not, as you might expect, via
> qemu).
>
> Vhost relies on QEMU to render PCI objects to the guest, which the guest
> assigns resources (such as BARs, interrupts, etc).

vhost does not rely on qemu.  It relies on its user to handle 
configuration.  In one important case it's qemu+pci.  It could just as 
well be the lguest launcher.

>A PCI-BAR in this
> example may represent a PIO address for triggering some operation in the
> device-model's fast-path.  For it to have meaning in the fast-path, KVM
> has to have in-kernel knowledge of what a PIO-exit is, and what to do
> with it (this is where pio-bus and ioeventfd come in).  The programming
> of the PIO-exit and the ioeventfd is likewise controlled by some
> userspace management entity (i.e. qemu).   The PIO address and value
> tuple forms the address, and the ioeventfd framework within KVM provides
> the decode+execute function.
>

Right.

> This idea seemingly works fine, mind you, but it rides on top of a *lot*
> of stuff including but not limited to: the guest's pci stack, the qemu
> pci emulation, kvm pio support, and ioeventfd.  When you get into
> situations where you don't have PCI or even KVM underneath you (e.g. a
> userspace container, Ira's rig, etc) trying to recreate all of that PCI
> infrastructure for the sake of using PCI is, IMO, a lot of overhead for
> little gain.
>

For the N+1th time, no.  vhost is perfectly usable without pci.  Can we 
stop raising and debunking this point?

> All you really need is a simple decode+execute mechanism, and a way to
> program it from userspace control.  vbus tries to do just that:
> commoditize it so all you need is the transport of the control messages
> (like DEVCALL()), but the decode+execute itself is reusable, even
> across various environments (like KVM or Ira's rig).
>

If you think it should be "commoditized", write libvhostconfig.so.

> And your argument, I believe, is that vbus allows both to be implemented
> in the kernel (though to reiterate, it's optional) and is therefore a bad
> design, so let's discuss that.
>
> I believe the assertion is that things like config-space are best left
> to userspace, and we should only relegate fast-path duties to the
> kernel.  The problem is that, in my experience, a good deal of
> config-space actually influences the fast-path and thus needs to
> interact with the fast-path mechanism eventually anyway.
> What's left
> over that doesn't fall into this category may cheaply ride on existing
> plumbing, so it's not like we created something new or unnatural just to
> support this subclass of config-space.
>

Flexibility is reduced, because changing code in the kernel is more 
expensive than in userspace, and kernel/user interfaces aren't typically 
as wide as pure userspace interfaces.  Security is reduced, since a bug
in kernel code compromises the whole host rather than a single userspace
process.

Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-23 Thread Gregory Haskins
Gregory Haskins wrote:
> Avi Kivity wrote:
>> On 09/23/2009 05:26 PM, Gregory Haskins wrote:
>>>   
> Yes, I'm having to create my own bus model, a-la lguest, virtio-pci,
> and
> virtio-s390. It isn't especially easy. I can steal lots of code from
> the
> lguest bus model, but sometimes it is good to generalize, especially
> after the fourth implementation or so. I think this is what GHaskins
> tried
> to do.
>
>
 Yes.  vbus is more finely layered so there is less code duplication.
  
>>> To clarify, Ira was correct in stating this generalizing some of these
>>> components was one of the goals for the vbus project: IOW vbus finely
>>> layers and defines what's below virtio, not replaces it.
>>>
>>> You can think of a virtio-stack like this:
>>>
>>> --
>>> | virtio-net
>>> --
>>> | virtio-ring
>>> --
>>> | virtio-bus
>>> --
>>> | ? undefined ?
>>> --
>>>
>>> IOW: The way I see it, virtio is a device interface model only.  The
>>> rest of it is filled in by the virtio-transport and some kind of
>>> back-end.
>>>
>>> So today, we can complete the "? undefined ?" block like this for KVM:
>>>
>>> --
>>> | virtio-pci
>>> --
>>>   |
>>> --
>>> | kvm.ko
>>> --
>>> | qemu
>>> --
>>> | tuntap
>>> --
>>>
>>> In this case, kvm.ko and tuntap are providing plumbing, and qemu is
>>> providing a backend device model (pci-based, etc).
>>>
>>> You can, of course, plug a different stack in (such as virtio-lguest,
>>> virtio-ira, etc) but you are more or less on your own to recreate many
>>> of the various facilities contained in that stack (such as things
>>> provided by QEMU, like discovery/hotswap/addressing), as Ira is
>>> discovering.
>>>
>>> Vbus tries to commoditize more components in the stack (like the bus
>>> model and backend-device model) so they don't need to be redesigned each
>>> time we solve this "virtio-transport" problem.  IOW: stop the
>>> proliferation of the need for pci-bus, lguest-bus, foo-bus underneath
>>> virtio.  Instead, we can then focus on the value add on top, like the
>>> models themselves or the simple glue between them.
>>>
>>> So now you might have something like
>>>
>>> --
>>> | virtio-vbus
>>> --
>>> | vbus-proxy
>>> --
>>> | kvm-guest-connector
>>> --
>>>   |
>>> --
>>> | kvm.ko
>>> --
>>> | kvm-host-connector.ko
>>> --
>>> | vbus.ko
>>> --
>>> | virtio-net-backend.ko
>>> --
>>>
>>> so now we don't need to worry about the bus-model or the device-model
>>> framework.  We only need to implement the connector, etc.  This is handy
>>> when you find yourself in an environment that doesn't support PCI (such
>>> as Ira's rig, or userspace containers), or when you want to add features
>>> that PCI doesn't have (such as fluid event channels for things like IPC
>>> services, or prioritizable interrupts, etc).
>>>
>> Well, vbus does more, for example it tunnels interrupts instead of
>> exposing them 1:1 on the native interface if it exists.
> 
> As I've previously explained, that trait is a function of the
> kvm-connector I've chosen to implement, not of the overall design of vbus.
> 
> The reason why my kvm-connector is designed that way is because my early
> testing/benchmarking shows that one of the issues in KVM performance is
> that the ratio of exits per IO operation is fairly high, especially as
> you scale the io-load.  Therefore, the connector achieves a substantial
> reduction in that ratio by giving "interrupts" the same kind of
> benefits that NAPI brought to general networking: That is, we enqueue
> "interrupt" messages into a lockless ring and only hit the IDT for the
> first occurrence.  Subsequent interrupts are injected in a
> parallel/lockless manner, without hitting the IDT nor incurring an extra
> EOI.  This pays dividends as the IO rate increases, which is when the
> guest needs the most help.
> 
> OTOH, it is entirely possible to design the connector such that we
> maintain a 1:1 ratio of signals to traditional IDT interrupts.  It is
> also possible to design a connector which surfaces as something else,
> such as PCI devices (by terminating the connector in QEMU and utilizing
> its PCI emulation facilities), which would naturally employ 1:1 mapping.
> 
> So if 1:1 mapping is a critical feature (I would argue to the contrary),
> vbus can support it.
> 
>> It also pulls parts of the device model into the host kernel.
> 
> That is the point.  Most of it needs to be there for performance.

To clarify this point:

There are various aspects about designing high-performance virtual
devices, such as providing the shortest paths possible between the
physical resources and the consumers.

Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-23 Thread Gregory Haskins
Avi Kivity wrote:
> On 09/23/2009 05:26 PM, Gregory Haskins wrote:
>>
>>   
>>>> Yes, I'm having to create my own bus model, a-la lguest, virtio-pci,
>>>> and virtio-s390. It isn't especially easy. I can steal lots of code
>>>> from the lguest bus model, but sometimes it is good to generalize,
>>>> especially after the fourth implementation or so. I think this is what
>>>> GHaskins tried to do.


>>> Yes.  vbus is more finely layered so there is less code duplication.
>>>  
>> To clarify, Ira was correct in stating this generalizing some of these
>> components was one of the goals for the vbus project: IOW vbus finely
>> layers and defines what's below virtio, not replaces it.
>>
>> You can think of a virtio-stack like this:
>>
>> --
>> | virtio-net
>> --
>> | virtio-ring
>> --
>> | virtio-bus
>> --
>> | ? undefined ?
>> --
>>
>> IOW: The way I see it, virtio is a device interface model only.  The
>> rest of it is filled in by the virtio-transport and some kind of
>> back-end.
>>
>> So today, we can complete the "? undefined ?" block like this for KVM:
>>
>> --
>> | virtio-pci
>> --
>>   |
>> --
>> | kvm.ko
>> --
>> | qemu
>> --
>> | tuntap
>> --
>>
>> In this case, kvm.ko and tuntap are providing plumbing, and qemu is
>> providing a backend device model (pci-based, etc).
>>
>> You can, of course, plug a different stack in (such as virtio-lguest,
>> virtio-ira, etc) but you are more or less on your own to recreate many
>> of the various facilities contained in that stack (such as things
>> provided by QEMU, like discovery/hotswap/addressing), as Ira is
>> discovering.
>>
>> Vbus tries to commoditize more components in the stack (like the bus
>> model and backend-device model) so they don't need to be redesigned each
>> time we solve this "virtio-transport" problem.  IOW: stop the
>> proliferation of the need for pci-bus, lguest-bus, foo-bus underneath
>> virtio.  Instead, we can then focus on the value add on top, like the
>> models themselves or the simple glue between them.
>>
>> So now you might have something like
>>
>> --
>> | virtio-vbus
>> --
>> | vbus-proxy
>> --
>> | kvm-guest-connector
>> --
>>   |
>> --
>> | kvm.ko
>> --
>> | kvm-host-connector.ko
>> --
>> | vbus.ko
>> --
>> | virtio-net-backend.ko
>> --
>>
>> so now we don't need to worry about the bus-model or the device-model
>> framework.  We only need to implement the connector, etc.  This is handy
>> when you find yourself in an environment that doesn't support PCI (such
>> as Ira's rig, or userspace containers), or when you want to add features
>> that PCI doesn't have (such as fluid event channels for things like IPC
>> services, or prioritizable interrupts, etc).
>>
> 
> Well, vbus does more, for example it tunnels interrupts instead of
> exposing them 1:1 on the native interface if it exists.

As I've previously explained, that trait is a function of the
kvm-connector I've chosen to implement, not of the overall design of vbus.

The reason why my kvm-connector is designed that way is because my early
testing/benchmarking shows that one of the issues in KVM performance is
that the ratio of exits per IO operation is fairly high, especially as
you scale the io-load.  Therefore, the connector achieves a substantial
reduction in that ratio by giving "interrupts" the same kind of
benefits that NAPI brought to general networking: That is, we enqueue
"interrupt" messages into a lockless ring and only hit the IDT for the
first occurrence.  Subsequent interrupts are injected in a
parallel/lockless manner, without hitting the IDT nor incurring an extra
EOI.  This pays dividends as the IO rate increases, which is when the
guest needs the most help.
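The mechanism behind that is basically the same event-suppression trick
the virtio rings use, applied to the interrupt channel itself.  A very
simplified model of the idea (not the actual shm-signal code):

#include <stdatomic.h>
#include <stdbool.h>

/* Shared between guest and host.  The guest turns 'enabled' off while
 * it drains the message ring (NAPI-style), so a burst of events costs
 * a single injected interrupt instead of one per event.              */
struct shm_signal_state {
        atomic_bool enabled;
        atomic_bool pending;
};

/* Host side: returns true only when a real interrupt must be injected. */
static bool host_signal(struct shm_signal_state *s)
{
        atomic_store(&s->pending, true);
        return atomic_load(&s->enabled);
}

/* Guest side, in the interrupt handler. */
static void guest_isr(struct shm_signal_state *s /* , ring to drain */)
{
        atomic_store(&s->enabled, false);
        do {
                atomic_store(&s->pending, false);
                /* ... drain every queued "interrupt" message here ... */
        } while (atomic_load(&s->pending));
        atomic_store(&s->enabled, true);
        /* a real implementation re-checks 'pending' once more here to
         * close the race with a host that saw 'enabled' still false   */
}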

OTOH, it is entirely possible to design the connector such that we
maintain a 1:1 ratio of signals to traditional IDT interrupts.  It is
also possible to design a connector which surfaces as something else,
such as PCI devices (by terminating the connector in QEMU and utilizing
its PCI emulation facilities), which would naturally employ 1:1 mapping.

So if 1:1 mapping is a critical feature (I would argue to the contrary),
vbus can support it.

> It also pulls parts of the device model into the host kernel.

That is the point.  Most of it needs to be there for performance.  And
what doesn't need to be there for performance can either be:

a) skipped at the discretion of the connector/device-model designer

OR

b) included because it's a trivially small subset of the model (e.g. a
mac-address).

Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-23 Thread Avi Kivity
On 09/23/2009 05:26 PM, Gregory Haskins wrote:
>
>
>>> Yes, I'm having to create my own bus model, a-la lguest, virtio-pci, and
>>> virtio-s390. It isn't especially easy. I can steal lots of code from the
>>> lguest bus model, but sometimes it is good to generalize, especially
>>> after the fourth implementation or so. I think this is what GHaskins tried
>>> to do.
>>>
>>>
>> Yes.  vbus is more finely layered so there is less code duplication.
>>  
> To clarify, Ira was correct in stating this generalizing some of these
> components was one of the goals for the vbus project: IOW vbus finely
> layers and defines what's below virtio, not replaces it.
>
> You can think of a virtio-stack like this:
>
> --
> | virtio-net
> --
> | virtio-ring
> --
> | virtio-bus
> --
> | ? undefined ?
> --
>
> IOW: The way I see it, virtio is a device interface model only.  The
> rest of it is filled in by the virtio-transport and some kind of back-end.
>
> So today, we can complete the "? undefined ?" block like this for KVM:
>
> --
> | virtio-pci
> --
>   |
> --
> | kvm.ko
> --
> | qemu
> --
> | tuntap
> --
>
> In this case, kvm.ko and tuntap are providing plumbing, and qemu is
> providing a backend device model (pci-based, etc).
>
> You can, of course, plug a different stack in (such as virtio-lguest,
> virtio-ira, etc) but you are more or less on your own to recreate many
> of the various facilities contained in that stack (such as things
> provided by QEMU, like discovery/hotswap/addressing), as Ira is discovering.
>
> Vbus tries to commoditize more components in the stack (like the bus
> model and backend-device model) so they don't need to be redesigned each
> time we solve this "virtio-transport" problem.  IOW: stop the
> proliferation of the need for pci-bus, lguest-bus, foo-bus underneath
> virtio.  Instead, we can then focus on the value add on top, like the
> models themselves or the simple glue between them.
>
> So now you might have something like
>
> --
> | virtio-vbus
> --
> | vbus-proxy
> --
> | kvm-guest-connector
> --
>   |
> --
> | kvm.ko
> --
> | kvm-host-connector.ko
> --
> | vbus.ko
> --
> | virtio-net-backend.ko
> --
>
> so now we don't need to worry about the bus-model or the device-model
> framework.  We only need to implement the connector, etc.  This is handy
> when you find yourself in an environment that doesn't support PCI (such
> as Ira's rig, or userspace containers), or when you want to add features
> that PCI doesn't have (such as fluid event channels for things like IPC
> services, or prioritizable interrupts, etc).
>

Well, vbus does more, for example it tunnels interrupts instead of 
exposing them 1:1 on the native interface if it exists.  It also pulls 
parts of the device model into the host kernel.

>> The virtio layering was more or less dictated by Xen which doesn't have
>> shared memory (it uses grant references instead).  As a matter of fact
>> lguest, kvm/pci, and kvm/s390 all have shared memory, as you do, so that
>> part is duplicated.  It's probably possible to add a virtio-shmem.ko
>> library that people who do have shared memory can reuse.
>>  
> Note that I do not believe the Xen folk use virtio, so while I can
> appreciate the foresight that went into that particular aspect of the
> design of the virtio model, I am not sure if it's a realistic constraint.
>

Since a virtio goal was to reduce virtual device driver proliferation, 
it was necessary to accommodate Xen.

-- 
error compiling committee.c: too many arguments to function



Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-23 Thread Gregory Haskins
Avi Kivity wrote:
> On 09/22/2009 12:43 AM, Ira W. Snyder wrote:
>>
>>> Sure, virtio-ira and he is on his own to make a bus-model under that, or
>>> virtio-vbus + vbus-ira-connector to use the vbus framework.  Either
>>> model can work, I agree.
>>>
>>>  
>> Yes, I'm having to create my own bus model, a-la lguest, virtio-pci, and
>> virtio-s390. It isn't especially easy. I can steal lots of code from the
>> lguest bus model, but sometimes it is good to generalize, especially
>> after the fourth implementation or so. I think this is what GHaskins tried
>> to do.
>>
> 
> Yes.  vbus is more finely layered so there is less code duplication.

To clarify, Ira was correct in stating this generalizing some of these
components was one of the goals for the vbus project: IOW vbus finely
layers and defines what's below virtio, not replaces it.

You can think of a virtio-stack like this:

--
| virtio-net
--
| virtio-ring
--
| virtio-bus
--
| ? undefined ?
--

IOW: The way I see it, virtio is a device interface model only.  The
rest of it is filled in by the virtio-transport and some kind of back-end.

So today, we can complete the "? undefined ?" block like this for KVM:

--
| virtio-pci
--
 |
--
| kvm.ko
--
| qemu
--
| tuntap
--

In this case, kvm.ko and tuntap are providing plumbing, and qemu is
providing a backend device model (pci-based, etc).

You can, of course, plug a different stack in (such as virtio-lguest,
virtio-ira, etc) but you are more or less on your own to recreate many
of the various facilities contained in that stack (such as things
provided by QEMU, like discovery/hotswap/addressing), as Ira is discovering.

Vbus tries to commoditize more components in the stack (like the bus
model and backend-device model) so they don't need to be redesigned each
time we solve this "virtio-transport" problem.  IOW: stop the
proliferation of the need for pci-bus, lguest-bus, foo-bus underneath
virtio.  Instead, we can then focus on the value add on top, like the
models themselves or the simple glue between them.

So now you might have something like

--
| virtio-vbus
--
| vbus-proxy
--
| kvm-guest-connector
--
 |
--
| kvm.ko
--
| kvm-host-connector.ko
--
| vbus.ko
--
| virtio-net-backend.ko
--

so now we don't need to worry about the bus-model or the device-model
framework.  We only need to implement the connector, etc.  This is handy
when you find yourself in an environment that doesn't support PCI (such
as Ira's rig, or userspace containers), or when you want to add features
that PCI doesn't have (such as fluid event channels for things like IPC
services, or prioritizable interrupts, etc).

> 
> The virtio layering was more or less dictated by Xen which doesn't have
> shared memory (it uses grant references instead).  As a matter of fact
> lguest, kvm/pci, and kvm/s390 all have shared memory, as you do, so that
> part is duplicated.  It's probably possible to add a virtio-shmem.ko
> library that people who do have shared memory can reuse.

Note that I do not believe the Xen folk use virtio, so while I can
appreciate the foresight that went into that particular aspect of the
design of the virtio model, I am not sure if it's a realistic constraint.

The reason why I decided to not worry about that particular model is
twofold:

1) The cost of supporting non-shared-memory designs is prohibitively high for
my performance goals (for instance, requiring an exit on each
->add_buf() in addition to the ->kick()).

2) The Xen guys are unlikely to diverge from something like
xenbus/xennet anyway, so it would be for naught.

Therefore, I just went with a device model optimized for shared-memory
outright.

That said, I believe we can refactor what is called the
"vbus-proxy-device" into this virtio-shmem interface that you and
Anthony have described.  We could make the feature optional and only
support it on architectures where this makes sense.



Kind Regards,
-Greg




Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-22 Thread Avi Kivity
On 09/22/2009 06:25 PM, Ira W. Snyder wrote:
>
>> Yes.  vbus is more finely layered so there is less code duplication.
>>
>> The virtio layering was more or less dictated by Xen which doesn't have
>> shared memory (it uses grant references instead).  As a matter of fact
>> lguest, kvm/pci, and kvm/s390 all have shared memory, as you do, so that
>> part is duplicated.  It's probably possible to add a virtio-shmem.ko
>> library that people who do have shared memory can reuse.
>>
>>  
> Seems like a nice benefit of vbus.
>

Yes, it is.  With some work virtio can gain that too (virtio-shmem.ko).

>>> I've given it some thought, and I think that running vhost-net (or
>>> similar) on the ppc boards, with virtio-net on the x86 crate server will
>>> work. The virtio-ring abstraction is almost good enough to work for this
>>> situation, but I had to re-invent it to work with my boards.
>>>
>>> I've exposed a 16K region of memory as PCI BAR1 from my ppc board.
>>> Remember that this is the "host" system. I used each 4K block as a
>>> "device descriptor" which contains:
>>>
>>> 1) the type of device, config space, etc. for virtio
>>> 2) the "desc" table (virtio memory descriptors, see virtio-ring)
>>> 3) the "avail" table (available entries in the desc table)
>>>
>>>
>> Won't access from x86 to this memory be slow?  (On the other hand, if you
>> change it to main memory, access from ppc will be slow... it really depends
>> on how your system is tuned.)
>>
>>  
> Writes across the bus are fast, reads across the bus are slow. These are
> just the descriptor tables for memory buffers, not the physical memory
> buffers themselves.
>
> These only need to be written by the guest (x86), and read by the host
> (ppc). The host never changes the tables, so we can cache a copy in the
> guest, for a fast detach_buf() implementation (see virtio-ring, which
> I'm copying the design from).
>
> The only accesses are writes across the PCI bus. There is never a need
> to do a read (except for slow-path configuration).
>

Okay, sounds like what you're doing is optimal then.

> In the spirit of "post early and often", I'm making my code available,
> that's all. I'm asking anyone interested for some review, before I have
> to re-code this for about the fifth time now. I'm trying to avoid
> Haskins' situation, where he's invented and debugged a lot of new code,
> and then been told to do it completely differently.
>
> Yes, the code I posted is only compile-tested, because quite a lot of
> code (kernel and userspace) must be working before anything works at
> all. I hate to design the whole thing, then be told that something
> fundamental about it is wrong, and have to completely re-write it.
>

Understood.  Best to get a review from Rusty then.

-- 
error compiling committee.c: too many arguments to function



Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-22 Thread Ira W. Snyder
On Tue, Sep 22, 2009 at 12:43:36PM +0300, Avi Kivity wrote:
> On 09/22/2009 12:43 AM, Ira W. Snyder wrote:
> >
> >> Sure, virtio-ira and he is on his own to make a bus-model under that, or
> >> virtio-vbus + vbus-ira-connector to use the vbus framework.  Either
> >> model can work, I agree.
> >>
> >>  
> > Yes, I'm having to create my own bus model, a-la lguest, virtio-pci, and
> > virtio-s390. It isn't especially easy. I can steal lots of code from the
> > lguest bus model, but sometimes it is good to generalize, especially
> > after the fourth implementation or so. I think this is what GHaskins tried
> > to do.
> >
> 
> Yes.  vbus is more finely layered so there is less code duplication.
> 
> The virtio layering was more or less dictated by Xen which doesn't have 
> shared memory (it uses grant references instead).  As a matter of fact 
> lguest, kvm/pci, and kvm/s390 all have shared memory, as you do, so that 
> part is duplicated.  It's probably possible to add a virtio-shmem.ko 
> library that people who do have shared memory can reuse.
> 

Seems like a nice benefit of vbus.

> > I've given it some thought, and I think that running vhost-net (or
> > similar) on the ppc boards, with virtio-net on the x86 crate server will
> > work. The virtio-ring abstraction is almost good enough to work for this
> > situation, but I had to re-invent it to work with my boards.
> >
> > I've exposed a 16K region of memory as PCI BAR1 from my ppc board.
> > Remember that this is the "host" system. I used each 4K block as a
> > "device descriptor" which contains:
> >
> > 1) the type of device, config space, etc. for virtio
> > 2) the "desc" table (virtio memory descriptors, see virtio-ring)
> > 3) the "avail" table (available entries in the desc table)
> >
> 
> Won't access from x86 to this memory be slow?  (On the other hand, if you
> change it to main memory, access from ppc will be slow... it really depends
> on how your system is tuned.)
> 

Writes across the bus are fast, reads across the bus are slow. These are
just the descriptor tables for memory buffers, not the physical memory
buffers themselves.

These only need to be written by the guest (x86), and read by the host
(ppc). The host never changes the tables, so we can cache a copy in the
guest, for a fast detach_buf() implementation (see virtio-ring, which
I'm copying the design from).

The only accesses are writes across the PCI bus. There is never a need
to do a read (except for slow-path configuration).
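For the sake of discussion, one of those 4K descriptor blocks might look
roughly like this as a C struct (field names and ring sizes are mine,
chosen only to match the description above, not the actual layout):

#include <linux/types.h>
#include <linux/virtio_ring.h>

#define CRATE_RING_ENTRIES 64   /* illustrative; small enough to fit in 4K */
#define CRATE_VQS_PER_DEV  3

/* One virtqueue's guest-written portion, living in the ppc BAR. */
struct crate_vq_block {
        struct vring_desc desc[CRATE_RING_ENTRIES]; /* buffer descriptors */
        __le16 avail_flags;
        __le16 avail_idx;
        __le16 avail_ring[CRATE_RING_ENTRIES];      /* available entries  */
};

/* One 4K "device descriptor" block in BAR1: type/config plus up to
 * three virtqueues, all of it written by the x86 and read by the ppc. */
struct crate_dev_desc {
        __le32 device_type;                         /* virtio device id    */
        __le32 vq_count;
        __u8   config[128];                         /* virtio config space */
        struct crate_vq_block vq[CRATE_VQS_PER_DEV];
};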

> > Parts 2 and 3 are repeated three times, to allow for a maximum of three
> > virtqueues per device. This is good enough for all current drivers.
> >
> 
> The plan is to switch to multiqueue soon.  Will not affect you if your 
> boards are uniprocessor or small smp.
> 

Everything I have is UP. I don't need extreme performance, either.
40MB/sec is the minimum I need to reach, though I'd like to have some
headroom.

For reference, using the CPU to handle data transfers, I get ~2MB/sec
transfers. Using the DMA engine, I've hit about 60MB/sec with my
"crossed-wires" virtio-net.

> > I've gotten plenty of email about this from lots of interested
> > developers. There are people who would like this kind of system to just
> > work, while having to write just some glue for their device, just like a
> > network driver. I hunch most people have created some proprietary mess
> > that basically works, and left it at that.
> >
> 
> So long as you keep the system-dependent features hookable or 
> configurable, it should work.
> 
> > So, here is a desperate cry for help. I'd like to make this work, and
> > I'd really like to see it in mainline. I'm trying to give back to the
> > community from which I've taken plenty.
> >
> 
> Not sure who you're crying for help to.  Once you get this working, post 
> patches.  If the patches are reasonably clean and don't impact 
> performance for the main use case, and if you can show the need, I 
> expect they'll be merged.
> 

In the spirit of "post early and often", I'm making my code available,
that's all. I'm asking anyone interested for some review, before I have
to re-code this for about the fifth time now. I'm trying to avoid
Haskins' situation, where he's invented and debugged a lot of new code,
and then been told to do it completely differently.

Yes, the code I posted is only compile-tested, because quite a lot of
code (kernel and userspace) must be working before anything works at
all. I hate to design the whole thing, then be told that something
fundamental about it is wrong, and have to completely re-write it.

Thanks for the comments,
Ira

> -- 
> error compiling committee.c: too many arguments to function
> 


Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-22 Thread Avi Kivity
On 09/22/2009 12:43 AM, Ira W. Snyder wrote:
>
>> Sure, virtio-ira and he is on his own to make a bus-model under that, or
>> virtio-vbus + vbus-ira-connector to use the vbus framework.  Either
>> model can work, I agree.
>>
>>  
> Yes, I'm having to create my own bus model, a-la lguest, virtio-pci, and
> virtio-s390. It isn't especially easy. I can steal lots of code from the
> lguest bus model, but sometimes it is good to generalize, especially
> after the fourth implementation or so. I think this is what GHaskins tried
> to do.
>

Yes.  vbus is more finely layered so there is less code duplication.

The virtio layering was more or less dictated by Xen which doesn't have 
shared memory (it uses grant references instead).  As a matter of fact 
lguest, kvm/pci, and kvm/s390 all have shared memory, as you do, so that 
part is duplicated.  It's probably possible to add a virtio-shmem.ko 
library that people who do have shared memory can reuse.

> I've given it some thought, and I think that running vhost-net (or
> similar) on the ppc boards, with virtio-net on the x86 crate server will
> work. The virtio-ring abstraction is almost good enough to work for this
> situation, but I had to re-invent it to work with my boards.
>
> I've exposed a 16K region of memory as PCI BAR1 from my ppc board.
> Remember that this is the "host" system. I used each 4K block as a
> "device descriptor" which contains:
>
> 1) the type of device, config space, etc. for virtio
> 2) the "desc" table (virtio memory descriptors, see virtio-ring)
> 3) the "avail" table (available entries in the desc table)
>

Won't access from x86 to this memory be slow?  (On the other hand, if you
change it to main memory, access from ppc will be slow... it really depends
on how your system is tuned.)

> Parts 2 and 3 are repeated three times, to allow for a maximum of three
> virtqueues per device. This is good enough for all current drivers.
>

The plan is to switch to multiqueue soon.  Will not affect you if your 
boards are uniprocessor or small smp.

> I've gotten plenty of email about this from lots of interested
> developers. There are people who would like this kind of system to just
> work, while having to write just some glue for their device, just like a
> network driver. I hunch most people have created some proprietary mess
> that basically works, and left it at that.
>

So long as you keep the system-dependent features hookable or 
configurable, it should work.

> So, here is a desperate cry for help. I'd like to make this work, and
> I'd really like to see it in mainline. I'm trying to give back to the
> community from which I've taken plenty.
>

Not sure who you're crying for help to.  Once you get this working, post 
patches.  If the patches are reasonably clean and don't impact 
performance for the main use case, and if you can show the need, I 
expect they'll be merged.

-- 
error compiling committee.c: too many arguments to function



Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-21 Thread Ira W. Snyder
On Wed, Sep 16, 2009 at 11:11:57PM -0400, Gregory Haskins wrote:
> Avi Kivity wrote:
> > On 09/16/2009 10:22 PM, Gregory Haskins wrote:
> >> Avi Kivity wrote:
> >>   
> >>> On 09/16/2009 05:10 PM, Gregory Haskins wrote:
> >>> 
> > If kvm can do it, others can.
> >
> >  
>  The problem is that you seem to either hand-wave over details like
>  this,
>  or you give details that are pretty much exactly what vbus does
>  already.
> My point is that I've already sat down and thought about these
>  issues
>  and solved them in a freely available GPL'ed software package.
> 
> 
> >>> In the kernel.  IMO that's the wrong place for it.
> >>>  
> >> 3) "in-kernel": You can do something like virtio-net to vhost to
> >> potentially meet some of the requirements, but not all.
> >>
> >> In order to fully meet (3), you would need to do some of that stuff you
> >> mentioned in the last reply with muxing device-nr/reg-nr.  In addition,
> >> we need to have a facility for mapping eventfds and establishing a
> >> signaling mechanism (like PIO+qid), etc. KVM does this with
> >> IRQFD/IOEVENTFD, but we don't have KVM in this case so it needs to be
> >> invented.
> >>
> > 
> > irqfd/eventfd is the abstraction layer, it doesn't need to be reabstracted.
> 
> Not per se, but it needs to be interfaced.  How do I register that
> eventfd with the fastpath in Ira's rig? How do I signal the eventfd
> (x86->ppc, and ppc->x86)?
> 

Sorry to reply so late to this thread, I've been on vacation for the
past week. If you'd like to continue in another thread, please start it
and CC me.

On the PPC, I've got a hardware "doorbell" register which generates 30
distinguishable interrupts over the PCI bus. I have outbound and inbound
registers, which can be used to signal the "other side".

I assume it isn't too much code to signal an eventfd in an interrupt
handler. I haven't gotten to this point in the code yet.
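It really is just a couple of lines; a sketch of what the host-side
handler could look like, assuming the eventfd context was stashed at
setup time (the register layout here is invented):

#include <linux/interrupt.h>
#include <linux/eventfd.h>
#include <linux/io.h>

struct doorbell {
        void __iomem *status;           /* inbound doorbell status register */
        struct eventfd_ctx *eventfd;    /* handed over by the control path  */
};

static irqreturn_t doorbell_isr(int irq, void *data)
{
        struct doorbell *db = data;
        u32 bits = ioread32(db->status); /* which doorbell(s) fired */

        if (!bits)
                return IRQ_NONE;

        iowrite32(bits, db->status);     /* ack; write-1-to-clear assumed */
        eventfd_signal(db->eventfd, 1);

        return IRQ_HANDLED;
}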

> To take it to the next level, how do I organize that mechanism so that
> it works for more than one IO-stream (e.g. address the various queues
> within ethernet or a different device like the console)?  KVM has
> IOEVENTFD and IRQFD managed with MSI and PIO.  This new rig does not
> have the luxury of an established IO paradigm.
> 
> Is vbus the only way to implement a solution?  No.  But it is _a_ way,
> and its one that was specifically designed to solve this very problem
> (as well as others).
> 
> (As an aside, note that you generally will want an abstraction on top of
> irqfd/eventfd like shm-signal or virtqueues to do shared-memory based
> event mitigation, but I digress.  That is a separate topic).
> 
> > 
> >> To meet performance, this stuff has to be in kernel and there has to be
> >> a way to manage it.
> > 
> > and management belongs in userspace.
> 
> vbus does not dictate where the management must be.  It's an extensible
> framework, governed by what you plug into it (a la connectors and devices).
> 
> For instance, the vbus-kvm connector in alacrityvm chooses to put DEVADD
> and DEVDROP hotswap events into the interrupt stream, because they are
> simple and we already needed the interrupt stream anyway for fast-path.
> 
> As another example: venet chose to put ->call(MACQUERY) "config-space"
> into its call namespace because it's simple, and we already need
> ->calls() for fastpath.  It therefore exports an attribute to sysfs that
> allows the management app to set it.
> 
> I could likewise have designed the connector or device-model differently
> as to keep the mac-address and hotswap-events somewhere else (QEMU/PCI
> userspace) but this seems silly to me when they are so trivial, so I didn't.
> 
> > 
> >> Since vbus was designed to do exactly that, this is
> >> what I would advocate.  You could also reinvent these concepts and put
> >> your own mux and mapping code in place, in addition to all the other
> >> stuff that vbus does.  But I am not clear why anyone would want to.
> >>
> > 
> > Maybe they like their backward compatibility and Windows support.
> 
> This is really not relevant to this thread, since we are talking about
> Ira's hardware.  But if you must bring this up, then I will reiterate
> that you just design the connector to interface with QEMU+PCI and you
> have that too if that was important to you.
> 
> But on that topic: Since you could consider KVM a "motherboard
> manufacturer" of sorts (it just happens to be virtual hardware), I don't
> know why KVM seems to consider itself the only motherboard manufacturer
> in the world that has to make everything look legacy.  If a company like
> ASUS wants to add some cutting edge IO controller/bus, they simply do
> it.  Pretty much every product release may contain a different array of
> devices, many of which are not backwards compatible with any prior
> silicon.  The guy/gal installing Windows on that system may see a "?" in
> device-manager until they load a driver that supports the new chip, and
> subsequently it works.

Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-17 Thread Javier Guerra
On Wed, Sep 16, 2009 at 10:11 PM, Gregory Haskins
 wrote:
> It is certainly not a requirement to make said
> chip somehow work with existing drivers/facilities on bare metal, per
> se.  Why should virtual systems be different?

i'd guess it's an issue of support resources.  a hardware developer
creates a chip and immediately sells it, getting small but assured
revenue; with it they write (or pay to write) drivers for a couple of
releases, and stop manufacturing it as soon as it's not profitable.

software has a much longer lifetime, especially at the platform-level
(and KVM is a platform for a lot of us). also, being GPL, it's cheaper
to produce but has (much!) more limited resources.  creating a new
support issue is a scary thought.


-- 
Javier

Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-17 Thread Avi Kivity
On 09/17/2009 06:11 AM, Gregory Haskins wrote:
>
>> irqfd/eventfd is the abstraction layer, it doesn't need to be reabstracted.
>>  
> Not per se, but it needs to be interfaced.  How do I register that
> eventfd with the fastpath in Ira's rig? How do I signal the eventfd
> (x86->ppc, and ppc->x86)?
>

You write a userspace or kernel module to do it.  It's a few dozen lines 
of code.

> To take it to the next level, how do I organize that mechanism so that
> it works for more than one IO-stream (e.g. address the various queues
> within ethernet or a different device like the console)?  KVM has
> IOEVENTFD and IRQFD managed with MSI and PIO.  This new rig does not
> have the luxury of an established IO paradigm.
>
> Is vbus the only way to implement a solution?  No.  But it is _a_ way,
> and its one that was specifically designed to solve this very problem
> (as well as others).
>

virtio assumes that the number of transports will be limited and 
interesting growth is in the number of device classes and drivers.  So 
we have support for just three transports, but 6 device classes (9p, 
rng, balloon, console, blk, net) and 8 drivers (the preceding 6 for 
linux, plus blk/net for Windows).  It would have been nice to be able to 
write a new binding in Visual Basic but it's hardly a killer feature.


>>> Since vbus was designed to do exactly that, this is
>>> what I would advocate.  You could also reinvent these concepts and put
>>> your own mux and mapping code in place, in addition to all the other
>>> stuff that vbus does.  But I am not clear why anyone would want to.
>>>
>>>
>> Maybe they like their backward compatibility and Windows support.
>>  
> This is really not relevant to this thread, since we are talking about
> Ira's hardware.  But if you must bring this up, then I will reiterate
> that you just design the connector to interface with QEMU+PCI and you
> have that too if that was important to you.
>

Well, for Ira the major issue is probably inclusion in the upstream kernel.

> But on that topic: Since you could consider KVM a "motherboard
> manufacturer" of sorts (it just happens to be virtual hardware), I don't
> know why KVM seems to consider itself the only motherboard manufacturer
> in the world that has to make everything look legacy.  If a company like
> ASUS wants to add some cutting edge IO controller/bus, they simply do
> it.

No, they don't.  New buses are added through industry consortiums these 
days.  No one adds a bus that is only available with their machine, not 
even Apple.

> Pretty much every product release may contain a different array of
> devices, many of which are not backwards compatible with any prior
> silicon.  The guy/gal installing Windows on that system may see a "?" in
> device-manager until they load a driver that supports the new chip, and
> subsequently it works.  It is certainly not a requirement to make said
> chip somehow work with existing drivers/facilities on bare metal, per
> se.  Why should virtual systems be different?
>

Devices/drivers are a different matter, and if you have a virtio-net 
device you'll get the same "?" until you load the driver.  That's how 
people and the OS vendors expect things to work.

> What I was getting at is that you can't just hand-wave the datapath
> stuff.  We do fast path in KVM with IRQFD/IOEVENTFD+PIO, and we do
> device discovery/addressing with PCI.

That's not datapath stuff.

> Neither of those are available
> here in Ira's case yet the general concepts are needed.  Therefore, we
> have to come up with something else.
>

Ira has to implement virtio's ->kick() function and come up with 
something for discovery.  It's a lot less lines of code than there are 
messages in this thread.
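On the x86 side the data-path half of that is tiny; a sketch of the
notify hook, assuming the ppc exposes an outbound doorbell register in
one of its BARs (names are illustrative):

#include <linux/virtio.h>
#include <linux/types.h>
#include <linux/io.h>

struct crate_vq_priv {
        void __iomem *doorbell; /* mapped from the ppc board's BAR */
        u16 qid;                /* which queue we are ringing for  */
};

/* The notify hook handed to vring_new_virtqueue(): tell the ppc that
 * the avail ring has new entries by banging the doorbell register.  */
static void crate_vq_notify(struct virtqueue *vq)
{
        struct crate_vq_priv *priv = vq->priv;

        iowrite16(priv->qid, priv->doorbell);
}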

>> Yes.  I'm all for reusing virtio, but I'm not going switch to vbus or
>> support both for this esoteric use case.
>>  
> With all due respect, no one asked you to.  This sub-thread was
> originally about using vhost in Ira's rig.  When problems surfaced in
> that proposed model, I highlighted that I had already addressed that
> problem in vbus, and here we are.
>

Ah, okay.  I have no interest in Ira choosing either virtio or vbus.



>> vhost-net somehow manages to work without the config stuff in the kernel.
>>  
> I was referring to data-path stuff, like signal and memory
> configuration/routing.
>

signal and memory configuration/routing are not data-path stuff.

>> Well, virtio has a similar abstraction on the guest side.  The host side
>> abstraction is limited to signalling since all configuration is in
>> userspace.  vhost-net ought to work for lguest and s390 without change.
>>  
> But IIUC that is primarily because the revectoring work is already in
> QEMU for virtio-u and it rides on that, right?  Not knocking that, that's
> nice and a distinct advantage.  It should just be noted that it's based
> on sunk-cost, and not truly free.  It's just already paid for, which is
> different.

Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-16 Thread Gregory Haskins
Michael S. Tsirkin wrote:
> On Wed, Sep 16, 2009 at 10:10:55AM -0400, Gregory Haskins wrote:
>>> There is no role reversal.
>> So if I have virtio-blk driver running on the x86 and vhost-blk device
>> running on the ppc board, I can use the ppc board as a block-device.
>> What if I really wanted to go the other way?
> 
> It seems ppc is the only one that can initiate DMA to an arbitrary
> address, so you can't do this really, or you can by tunneling each
> request back to ppc, or doing an extra data copy, but it's unlikely to
> work well.
> 
> The limitation comes from hardware, not from the API we use.

Understood, but presumably it can be exposed as a sub-function of the
ppc board's register file as a DMA-controller service to the x86.
This would fall into the "tunnel requests back" category you mention
above, though I think "tunnel" implies a heavier protocol than it would
actually require.  This would look more like a PIO cycle to a DMA
controller than some higher layer protocol.

You would then utilize that DMA service inside the memctx, and the rest
of vbus would work transparently with the existing devices/drivers.
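In other words, the board-specific piece would be confined to a memctx
implementation along these lines (the interface shown is hypothetical;
the point is only that the copy hooks need not be a CPU memcpy):

#include <linux/types.h>

struct memctx;

/* Hypothetical memctx ops: how a device model reaches guest memory.
 * Under kvm these would be gpa->hva copies; on Ira's rig the same two
 * hooks could instead program the ppc board's DMA engine to move the
 * data across the PCI bus and wait for completion.                   */
struct memctx_ops {
        int (*copy_to)(struct memctx *ctx, u64 gpa,
                       const void *src, size_t len);
        int (*copy_from)(struct memctx *ctx, void *dst,
                         u64 gpa, size_t len);
        void (*release)(struct memctx *ctx);
};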

I do agree it would require some benchmarking to determine its
feasibility, which is why I was careful to say things like "may work"
;).  I also do not even know if its possible to expose the service this
way on his system.  If this design is not possible or performs poorly, I
admit vbus is just as hosed as vhost in regard to the "role correction"
benefit.

Kind Regards,
-Greg




Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-16 Thread Michael S. Tsirkin
On Wed, Sep 16, 2009 at 10:10:55AM -0400, Gregory Haskins wrote:
> > There is no role reversal.
> 
> So if I have virtio-blk driver running on the x86 and vhost-blk device
> running on the ppc board, I can use the ppc board as a block-device.
> What if I really wanted to go the other way?

It seems ppc is the only one that can initiate DMA to an arbitrary
address, so you can't do this really, or you can by tunneling each
request back to ppc, or doing an extra data copy, but it's unlikely to
work well.

The limitation comes from hardware, not from the API we use.


Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-16 Thread Gregory Haskins
Avi Kivity wrote:
> On 09/16/2009 10:22 PM, Gregory Haskins wrote:
>> Avi Kivity wrote:
>>   
>>> On 09/16/2009 05:10 PM, Gregory Haskins wrote:
>>> 
> If kvm can do it, others can.
>
>  
 The problem is that you seem to either hand-wave over details like
 this,
 or you give details that are pretty much exactly what vbus does
 already.
My point is that I've already sat down and thought about these
 issues
 and solved them in a freely available GPL'ed software package.


>>> In the kernel.  IMO that's the wrong place for it.
>>>  
>> 3) "in-kernel": You can do something like virtio-net to vhost to
>> potentially meet some of the requirements, but not all.
>>
>> In order to fully meet (3), you would need to do some of that stuff you
>> mentioned in the last reply with muxing device-nr/reg-nr.  In addition,
>> we need to have a facility for mapping eventfds and establishing a
>> signaling mechanism (like PIO+qid), etc. KVM does this with
>> IRQFD/IOEVENTFD, but we don't have KVM in this case so it needs to be
>> invented.
>>
> 
> irqfd/eventfd is the abstraction layer, it doesn't need to be reabstracted.

Not per se, but it needs to be interfaced.  How do I register that
eventfd with the fastpath in Ira's rig? How do I signal the eventfd
(x86->ppc, and ppc->x86)?

To take it to the next level, how do I organize that mechanism so that
it works for more than one IO-stream (e.g. address the various queues
within ethernet or a different device like the console)?  KVM has
IOEVENTFD and IRQFD managed with MSI and PIO.  This new rig does not
have the luxury of an established IO paradigm.

Is vbus the only way to implement a solution?  No.  But it is _a_ way,
and its one that was specifically designed to solve this very problem
(as well as others).

(As an aside, note that you generally will want an abstraction on top of
irqfd/eventfd like shm-signal or virtqueues to do shared-memory based
event mitigation, but I digress.  That is a separate topic).
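
To sketch the mitigation idea -- illustrative only, not the actual
shm-signal code -- the producer only raises the event while the consumer
has notifications armed:

#include <linux/eventfd.h>
#include <asm/barrier.h>

/* Lives in memory shared by both sides (e.g. a shared page). */
struct shm_signal_state {
	u32 enabled;	/* consumer sets this when it wants an event */
	u32 pending;	/* producer sets this when work is available */
};

static void shm_signal_inject(struct shm_signal_state *s,
			      struct eventfd_ctx *evt)
{
	s->pending = 1;
	smp_mb();		/* publish pending before testing enabled */
	if (s->enabled && evt) {
		s->enabled = 0;	/* at most one event per re-arm */
		eventfd_signal(evt, 1);
	}
}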

> 
>> To meet performance, this stuff has to be in kernel and there has to be
>> a way to manage it.
> 
> and management belongs in userspace.

vbus does not dictate where the management must be.  It's an extensible
framework, governed by what you plug into it (a la connectors and devices).

For instance, the vbus-kvm connector in alacrityvm chooses to put DEVADD
and DEVDROP hotswap events into the interrupt stream, because they are
simple and we already needed the interrupt stream anyway for fast-path.

As another example: venet chose to put ->call(MACQUERY) "config-space"
into its call namespace because it's simple, and we already need
->calls() for fastpath.  It therefore exports an attribute to sysfs that
allows the management app to set it.
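
For illustration, such an attribute is only a handful of lines.  This is a
sketch with invented names, not the actual venet code:

#include <linux/device.h>
#include <linux/kernel.h>
#include <linux/stat.h>
#include <linux/string.h>

struct venet_device {
	u8 mac[6];		/* handed back on ->call(MACQUERY) */
};

static ssize_t client_mac_store(struct device *dev,
				struct device_attribute *attr,
				const char *buf, size_t count)
{
	struct venet_device *priv = dev_get_drvdata(dev);
	unsigned int tmp[6];
	int i;

	if (sscanf(buf, "%x:%x:%x:%x:%x:%x",
		   &tmp[0], &tmp[1], &tmp[2],
		   &tmp[3], &tmp[4], &tmp[5]) != 6)
		return -EINVAL;

	for (i = 0; i < 6; i++)
		priv->mac[i] = tmp[i];

	return count;
}
static DEVICE_ATTR(client_mac, S_IWUSR, NULL, client_mac_store);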

I could likewise have designed the connector or device-model differently
so as to keep the mac-address and hotswap-events somewhere else (QEMU/PCI
userspace), but this seems silly to me when they are so trivial, so I didn't.

> 
>> Since vbus was designed to do exactly that, this is
>> what I would advocate.  You could also reinvent these concepts and put
>> your own mux and mapping code in place, in addition to all the other
>> stuff that vbus does.  But I am not clear why anyone would want to.
>>
> 
> Maybe they like their backward compatibility and Windows support.

This is really not relevant to this thread, since we are talking about
Ira's hardware.  But if you must bring this up, then I will reiterate
that you just design the connector to interface with QEMU+PCI and you
have that too if that was important to you.

But on that topic: Since you could consider KVM a "motherboard
manufacturer" of sorts (it just happens to be virtual hardware), I don't
know why KVM seems to consider itself the only motherboard manufacturer
in the world that has to make everything look legacy.  If a company like
ASUS wants to add some cutting edge IO controller/bus, they simply do
it.  Pretty much every product release may contain a different array of
devices, many of which are not backwards compatible with any prior
silicon.  The guy/gal installing Windows on that system may see a "?" in
device-manager until they load a driver that supports the new chip, and
subsequently it works.  It is certainly not a requirement to make said
chip somehow work with existing drivers/facilities on bare metal, per
se.  Why should virtual systems be different?

So, yeah, the current design of the vbus-kvm connector means I have to
provide a driver.  This is understood, and I have no problem with that.

The only thing that I would agree has to be backwards compatible is the
BIOS/boot function.  If you can't support running an image like the
Windows installer, you are hosed.  If you can't use your ethernet until
you get a chance to install a driver after the install completes, it's
just like most other systems in existence.  IOW: It's not a big deal.

For cases where the IO system is needed 

Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-16 Thread Avi Kivity
On 09/16/2009 10:22 PM, Gregory Haskins wrote:
> Avi Kivity wrote:
>
>> On 09/16/2009 05:10 PM, Gregory Haskins wrote:
>>  
 If kvm can do it, others can.

  
>>> The problem is that you seem to either hand-wave over details like this,
>>> or you give details that are pretty much exactly what vbus does already.
>>>My point is that I've already sat down and thought about these issues
>>> and solved them in a freely available GPL'ed software package.
>>>
>>>
>> In the kernel.  IMO that's the wrong place for it.
>>  
> 3) "in-kernel": You can do something like virtio-net to vhost to
> potentially meet some of the requirements, but not all.
>
> In order to fully meet (3), you would need to do some of that stuff you
> mentioned in the last reply with muxing device-nr/reg-nr.  In addition,
> we need to have a facility for mapping eventfds and establishing a
> signaling mechanism (like PIO+qid), etc. KVM does this with
> IRQFD/IOEVENTFD, but we don't have KVM in this case so it needs to be
> invented.
>

irqfd/eventfd is the abstraction layer, it doesn't need to be reabstracted.

> To meet performance, this stuff has to be in kernel and there has to be
> a way to manage it.

and management belongs in userspace.

> Since vbus was designed to do exactly that, this is
> what I would advocate.  You could also reinvent these concepts and put
> your own mux and mapping code in place, in addition to all the other
> stuff that vbus does.  But I am not clear why anyone would want to.
>

Maybe they like their backward compatibility and Windows support.

> So no, the kernel is not the wrong place for it.  It's the _only_ place
> for it.  Otherwise, just use (1) and be done with it.
>
>

I'm talking about the config stuff, not the data path.

>>   Further, if we adopt
>> vbus, we drop compatibility with existing guests or have to support both
>> vbus and virtio-pci.
>>  
> We already need to support both (at least to support Ira).  virtio-pci
> doesn't work here.  Something else (vbus, or vbus-like) is needed.
>

virtio-ira.

>>> So the question is: is your position that vbus is all wrong and you wish
>>> to create a new bus-like thing to solve the problem?
>>>
>> I don't intend to create anything new, I am satisfied with virtio.  If
>> it works for Ira, excellent.  If not, too bad.
>>  
> I think that about sums it up, then.
>

Yes.  I'm all for reusing virtio, but I'm not going switch to vbus or 
support both for this esoteric use case.

>>> If so, how is it
>>> different from what I've already done?  More importantly, what specific
>>> objections do you have to what I've done, as perhaps they can be fixed
>>> instead of starting over?
>>>
>>>
>> The two biggest objections are:
>> - the host side is in the kernel
>>  
> As it needs to be.
>

vhost-net somehow manages to work without the config stuff in the kernel.

> With all due respect, based on all of your comments in aggregate I
> really do not think you are truly grasping what I am actually building here.
>

Thanks.



>>> Bingo.  So now its a question of do you want to write this layer from
>>> scratch, or re-use my framework.
>>>
>>>
>> You will have to implement a connector or whatever for vbus as well.
>> vbus has more layers so it's probably smaller for vbus.
>>  
> Bingo!

(addictive, isn't it)

> That is precisely the point.
>
> All the stuff for how to map eventfds, handle signal mitigation, demux
> device/function pointers, isolation, etc, are built in.  All the
> connector has to do is transport the 4-6 verbs and provide a memory
> mapping/copy function, and the rest is reusable.  The device models
> would then work in all environments unmodified, and likewise the
> connectors could use all device-models unmodified.
>

Well, virtio has a similar abstraction on the guest side.  The host side 
abstraction is limited to signalling since all configuration is in 
userspace.  vhost-net ought to work for lguest and s390 without change.

>> It was already implemented three times for virtio, so apparently that's
>> extensible too.
>>  
> And to my point, I'm trying to commoditize as much of that process as
> possible on both the front and backends (at least for cases where
> performance matters) so that you don't need to reinvent the wheel for
> each one.
>

Since you're interested in any-to-any connectors it makes sense to you.  
I'm only interested in kvm-host-to-kvm-guest, so reducing the already 
minor effort to implement a new virtio binding has little appeal to me.

>> You mean, if the x86 board was able to access the disks and dma into the
>> ppc boards' memory?  You'd run vhost-blk on x86 and virtio-blk on ppc.
>>  
> But as we discussed, vhost doesn't work well if you try to run it on the
> x86 side due to its assumptions about pageable "guest" memory, right?  So
> is that even an option?  And even still, you would still need to solve
> t

Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-16 Thread Gregory Haskins
Avi Kivity wrote:
> On 09/16/2009 05:10 PM, Gregory Haskins wrote:
>>
>>> If kvm can do it, others can.
>>>  
>> The problem is that you seem to either hand-wave over details like this,
>> or you give details that are pretty much exactly what vbus does already.
>>   My point is that I've already sat down and thought about these issues
>> and solved them in a freely available GPL'ed software package.
>>
> 
> In the kernel.  IMO that's the wrong place for it.

In conversations with Ira, he indicated he needs kernel-to-kernel
ethernet for performance, and needs at least ethernet and console
connectivity.  You could conceivably build a solution for this system in
3 basic ways:

1) "completely" in userspace: use things like tuntap on the ppc boards,
and tunnel packets across a custom point-to-point connection formed over
the pci link to a userspace app on the x86 board.  This app then
reinjects the packets into the x86 kernel as a raw socket or tuntap,
etc.  Pretty much vanilla tuntap/vpn kind of stuff.  Advantage: very
little kernel code.  Problem: performance (citation: hopefully obvious).

2) "partially" in userspace: have an in-kernel virtio-net driver talk to
a userspace based virtio-net backend.  This is the (current, non-vhost
oriented) KVM/qemu model.  Advantage, re-uses existing kernel-code.
Problem: performance (citation: see alacrityvm numbers).

3) "in-kernel": You can do something like virtio-net to vhost to
potentially meet some of the requirements, but not all.

In order to fully meet (3), you would need to do some of that stuff you
mentioned in the last reply with muxing device-nr/reg-nr.  In addition,
we need to have a facility for mapping eventfds and establishing a
signaling mechanism (like PIO+qid), etc. KVM does this with
IRQFD/IOEVENTFD, but we don't have KVM in this case so it needs to be
invented.
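
A rough sketch of the kind of eventfd-mapping facility meant here, in the
spirit of ioeventfd (the table layout and names are invented):

#include <linux/err.h>
#include <linux/eventfd.h>

#define MAX_DOORBELLS	64

struct doorbell_map {
	struct eventfd_ctx *evt[MAX_DOORBELLS];
};

/* Management path: bind an eventfd to a doorbell id (e.g. PIO+qid). */
static int doorbell_assign(struct doorbell_map *map, u32 qid, int fd)
{
	struct eventfd_ctx *evt = eventfd_ctx_fdget(fd);

	if (IS_ERR(evt))
		return PTR_ERR(evt);
	if (qid >= MAX_DOORBELLS) {
		eventfd_ctx_put(evt);
		return -EINVAL;
	}
	map->evt[qid] = evt;
	return 0;
}

/* Fast path: the transport decoded a doorbell write, fan it out. */
static void doorbell_ring(struct doorbell_map *map, u32 qid)
{
	if (qid < MAX_DOORBELLS && map->evt[qid])
		eventfd_signal(map->evt[qid], 1);
}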

To meet performance, this stuff has to be in kernel and there has to be
a way to manage it.  Since vbus was designed to do exactly that, this is
what I would advocate.  You could also reinvent these concepts and put
your own mux and mapping code in place, in addition to all the other
stuff that vbus does.  But I am not clear why anyone would want to.

So no, the kernel is not the wrong place for it.  It's the _only_ place
for it.  Otherwise, just use (1) and be done with it.

>  Further, if we adopt
> vbus, we drop compatibility with existing guests or have to support both
> vbus and virtio-pci.

We already need to support both (at least to support Ira).  virtio-pci
doesn't work here.  Something else (vbus, or vbus-like) is needed.

> 
>> So the question is: is your position that vbus is all wrong and you wish
>> to create a new bus-like thing to solve the problem?
> 
> I don't intend to create anything new, I am satisfied with virtio.  If
> it works for Ira, excellent.  If not, too bad.

I think that about sums it up, then.


>  I believe it will work without too much trouble.

Afaict it won't for the reasons I mentioned.

> 
>> If so, how is it
>> different from what I've already done?  More importantly, what specific
>> objections do you have to what I've done, as perhaps they can be fixed
>> instead of starting over?
>>
> 
> The two biggest objections are:
> - the host side is in the kernel

As it needs to be.

> - the guest side is a new bus instead of reusing pci (on x86/kvm),
> making Windows support more difficult

That's a function of the vbus-connector, which is different from
vbus-core.  If you don't like it (and I know you don't), we can write
one that interfaces to qemu's pci system.  I just don't like the
limitations that imposes, nor do I think we need the complexity of
dealing with a split PCI model, so I chose to not implement vbus-kvm
this way.

With all due respect, based on all of your comments in aggregate I
really do not think you are truly grasping what I am actually building here.

> 
> I guess these two are exactly what you think are vbus' greatest
> advantages, so we'll probably have to extend our agree-to-disagree on
> this one.
> 
> I also had issues with using just one interrupt vector to service all
> events, but that's easily fixed.

Again, function of the connector.

> 
>>> There is no guest and host in this scenario.  There's a device side
>>> (ppc) and a driver side (x86).  The driver side can access configuration
>>> information on the device side.  How to multiplex multiple devices is an
>>> interesting exercise for whoever writes the virtio binding for that
>>> setup.
>>>  
>> Bingo.  So now its a question of do you want to write this layer from
>> scratch, or re-use my framework.
>>
> 
> You will have to implement a connector or whatever for vbus as well. 
> vbus has more layers so it's probably smaller for vbus.

Bingo! That is precisely the point.

All the stuff for how to map eventfds, handle signal mitigation, demux
device/function pointers, isolation, etc, is built in.  All the
connector has to do is transport the 4-6 verbs and provide

Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-16 Thread Michael S. Tsirkin
On Wed, Sep 16, 2009 at 05:22:37PM +0200, Arnd Bergmann wrote:
> On Wednesday 16 September 2009, Michael S. Tsirkin wrote:
> > On Wed, Sep 16, 2009 at 04:57:42PM +0200, Arnd Bergmann wrote:
> > > On Tuesday 15 September 2009, Michael S. Tsirkin wrote:
> > > > Userspace in x86 maps a PCI region, uses it for communication with ppc?
> > > 
> > > This might have portability issues. On x86 it should work, but if the
> > > host is powerpc or similar, you cannot reliably access PCI I/O memory
> > > through copy_tofrom_user but have to use memcpy_toio/fromio or 
> > > readl/writel
> > > calls, which don't work on user pointers.
> > > 
> > > Specifically on powerpc, copy_from_user cannot access unaligned buffers
> > > if they are on an I/O mapping.
> > > 
> > We are talking about doing this in userspace, not in kernel.
> 
> Ok, that's fine then. I thought the idea was to use the vhost_net driver

It's a separate issue. We were talking generally about configuration
and setup. Gregory implemented it in kernel, Avi wants it
moved to userspace, with only fastpath in kernel.

> to access the user memory, which would be a really cute hack otherwise,
> as you'd only need to provide the eventfds from a hardware specific
> driver and could use the regular virtio_net on the other side.
> 
>   Arnd <><

To do that, maybe copy to user on ppc can be fixed, or wrapped
in an arch-specific macro, so that everyone else
does not have to go through abstraction layers.
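
Something like the following is what I have in mind -- a sketch only, with
a made-up name, assuming the caller knows when the "user" pointer is
really an ioremapped region:

#include <linux/io.h>
#include <linux/types.h>
#include <linux/uaccess.h>

static inline unsigned long
copy_from_user_or_io(void *dst, const void __user *src,
		     unsigned long len, bool src_is_io)
{
	if (!src_is_io)
		return copy_from_user(dst, src, len);

	/* __force: this "user" pointer is known to be an ioremap. */
	memcpy_fromio(dst, (const void __force __iomem *)src, len);
	return 0;
}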

-- 
MST


Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-16 Thread Avi Kivity
On 09/16/2009 05:10 PM, Gregory Haskins wrote:
>
>> If kvm can do it, others can.
>>  
> The problem is that you seem to either hand-wave over details like this,
> or you give details that are pretty much exactly what vbus does already.
>   My point is that I've already sat down and thought about these issues
> and solved them in a freely available GPL'ed software package.
>

In the kernel.  IMO that's the wrong place for it.  Further, if we adopt 
vbus, we drop compatibility with existing guests or have to support both 
vbus and virtio-pci.

> So the question is: is your position that vbus is all wrong and you wish
> to create a new bus-like thing to solve the problem?

I don't intend to create anything new, I am satisfied with virtio.  If 
it works for Ira, excellent.  If not, too bad.  I believe it will work 
without too much trouble.

> If so, how is it
> different from what I've already done?  More importantly, what specific
> objections do you have to what I've done, as perhaps they can be fixed
> instead of starting over?
>

The two biggest objections are:
- the host side is in the kernel
- the guest side is a new bus instead of reusing pci (on x86/kvm), 
making Windows support more difficult

I guess these two are exactly what you think are vbus' greatest 
advantages, so we'll probably have to extend our agree-to-disagree on 
this one.

I also had issues with using just one interrupt vector to service all 
events, but that's easily fixed.

>> There is no guest and host in this scenario.  There's a device side
>> (ppc) and a driver side (x86).  The driver side can access configuration
>> information on the device side.  How to multiplex multiple devices is an
>> interesting exercise for whoever writes the virtio binding for that setup.
>>  
> Bingo.  So now its a question of do you want to write this layer from
> scratch, or re-use my framework.
>

You will have to implement a connector or whatever for vbus as well.  
vbus has more layers so it's probably smaller for vbus.


  
>>> I am talking about how we would tunnel the config space for N devices
>>> across his transport.
>>>
>>>
>> Sounds trivial.
>>  
> No one said it was rocket science.  But it does need to be designed and
> implemented end-to-end, much of which I've already done in what I hope is
> an extensible way.
>

It was already implemented three times for virtio, so apparently that's 
extensible too.

>>   Write an address containing the device number and
>> register number to one location, read or write data from another.
>>  
> You mean like the "u64 devh", and "u32 func" fields I have here for the
> vbus-kvm connector?
>
> http://git.kernel.org/?p=linux/kernel/git/ghaskins/alacrityvm/linux-2.6.git;a=blob;f=include/linux/vbus_pci.h;h=fe337590e644017392e4c9d9236150adb2333729;hb=ded8ce2005a85c174ba93ee26f8d67049ef11025#l64
>
>

Probably.



>>> That sounds convenient given his hardware, but it has its own set of
>>> problems.  For one, the configuration/inventory of these boards is now
>>> driven by the wrong side and has to be addressed.
>>>
>> Why is it the wrong side?
>>  
> "Wrong" is probably too harsh a word when looking at ethernet.  Its
> certainly "odd", and possibly inconvenient.  It would be like having
> vhost in a KVM guest, and virtio-net running on the host.  You could do
> it, but its weird and awkward.  Where it really falls apart and enters
> the "wrong" category is for non-symmetric devices, like disk-io.
>
>


It's not odd or wrong or weird or awkward.  An ethernet NIC is not 
symmetric, one side does DMA and issues interrupts, the other uses its 
own memory.  That's exactly the case with Ira's setup.

If the ppc boards were to emulate a disk controller, you'd run 
virtio-blk on x86 and vhost-blk on the ppc boards.

>>> Second, the role
>>> reversal will likely not work for many models other than ethernet (e.g.
>>> virtio-console or virtio-blk drivers running on the x86 board would be
>>> naturally consuming services from the slave boards...virtio-net is an
>>> exception because 802.x is generally symmetrical).
>>>
>>>
>> There is no role reversal.
>>  
> So if I have virtio-blk driver running on the x86 and vhost-blk device
> running on the ppc board, I can use the ppc board as a block-device.
> What if I really wanted to go the other way?
>

You mean, if the x86 board was able to access the disks and dma into the 
ppc boards' memory?  You'd run vhost-blk on x86 and virtio-blk on ppc.

As long as you don't use the words "guest" and "host" but keep to 
"driver" and "device", it all works out.

>> The side doing dma is the device, the side
>> accessing its own memory is the driver.  Just like that other 1e12
>> driver/device pairs out there.
>>  
> IIUC, his ppc boards really can be seen as "guests" (they are linux
> instances that are utilizing services from the x86, not the other way
> around).

They aren't guests.  Guests d

Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-16 Thread Arnd Bergmann
On Wednesday 16 September 2009, Michael S. Tsirkin wrote:
> On Wed, Sep 16, 2009 at 04:57:42PM +0200, Arnd Bergmann wrote:
> > On Tuesday 15 September 2009, Michael S. Tsirkin wrote:
> > > Userspace in x86 maps a PCI region, uses it for communication with ppc?
> > 
> > This might have portability issues. On x86 it should work, but if the
> > host is powerpc or similar, you cannot reliably access PCI I/O memory
> > through copy_tofrom_user but have to use memcpy_toio/fromio or readl/writel
> > calls, which don't work on user pointers.
> > 
> > Specifically on powerpc, copy_from_user cannot access unaligned buffers
> > if they are on an I/O mapping.
> > 
> We are talking about doing this in userspace, not in kernel.

Ok, that's fine then. I thought the idea was to use the vhost_net driver
to access the user memory, which would be a really cute hack otherwise,
as you'd only need to provide the eventfds from a hardware specific
driver and could use the regular virtio_net on the other side.

Arnd <><


Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-16 Thread Michael S. Tsirkin
On Wed, Sep 16, 2009 at 04:57:42PM +0200, Arnd Bergmann wrote:
> On Tuesday 15 September 2009, Michael S. Tsirkin wrote:
> > Userspace in x86 maps a PCI region, uses it for communication with ppc?
> 
> This might have portability issues. On x86 it should work, but if the
> host is powerpc or similar, you cannot reliably access PCI I/O memory
> through copy_tofrom_user but have to use memcpy_toio/fromio or readl/writel
> calls, which don't work on user pointers.
> 
> Specifically on powerpc, copy_from_user cannot access unaligned buffers
> if they are on an I/O mapping.
> 
>   Arnd <><

We are talking about doing this in userspace, not in kernel.

-- 
MST


Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-16 Thread Arnd Bergmann
On Tuesday 15 September 2009, Michael S. Tsirkin wrote:
> Userspace in x86 maps a PCI region, uses it for communication with ppc?

This might have portability issues. On x86 it should work, but if the
host is powerpc or similar, you cannot reliably access PCI I/O memory
through copy_tofrom_user but have to use memcpy_toio/fromio or readl/writel
calls, which don't work on user pointers.

Specifically on powerpc, copy_from_user cannot access unaligned buffers
if they are on an I/O mapping.

Arnd <><


Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-16 Thread Gregory Haskins
Avi Kivity wrote:
> On 09/16/2009 02:44 PM, Gregory Haskins wrote:
>> The problem isn't where to find the models...the problem is how to
>> aggregate multiple models to the guest.
>>
> 
> You mean configuration?
> 
>>> You instantiate multiple vhost-nets.  Multiple ethernet NICs is a
>>> supported configuration for kvm.
>>>  
>> But this is not KVM.
>>
>>
> 
> If kvm can do it, others can.

The problem is that you seem to either hand-wave over details like this,
or you give details that are pretty much exactly what vbus does already.
 My point is that I've already sat down and thought about these issues
and solved them in a freely available GPL'ed software package.

So the question is: is your position that vbus is all wrong and you wish
to create a new bus-like thing to solve the problem?  If so, how is it
different from what I've already done?  More importantly, what specific
objections do you have to what I've done, as perhaps they can be fixed
instead of starting over?

> 
 His slave boards surface themselves as PCI devices to the x86
 host.  So how do you use that to make multiple vhost-based devices (say
 two virtio-nets, and a virtio-console) communicate across the
 transport?


>>> I don't really see the difference between 1 and N here.
>>>  
>> A KVM surfaces N virtio-devices as N pci-devices to the guest.  What do
>> we do in Ira's case where the entire guest represents itself as a PCI
>> device to the host, and nothing the other way around?
>>
> 
> There is no guest and host in this scenario.  There's a device side
> (ppc) and a driver side (x86).  The driver side can access configuration
> information on the device side.  How to multiplex multiple devices is an
> interesting exercise for whoever writes the virtio binding for that setup.

Bingo.  So now its a question of do you want to write this layer from
scratch, or re-use my framework.

> 
 There are multiple ways to do this, but what I am saying is that
 whatever is conceived will start to look eerily like a vbus-connector,
 since this is one of its primary purposes ;)


>>> I'm not sure if you're talking about the configuration interface or data
>>> path here.
>>>  
>> I am talking about how we would tunnel the config space for N devices
>> across his transport.
>>
> 
> Sounds trivial.

No one said it was rocket science.  But it does need to be designed and
implemented end-to-end, much of which I've already done in what I hope is
an extensible way.

>  Write an address containing the device number and
> register number to one location, read or write data from another.

You mean like the "u64 devh", and "u32 func" fields I have here for the
vbus-kvm connector?

http://git.kernel.org/?p=linux/kernel/git/ghaskins/alacrityvm/linux-2.6.git;a=blob;f=include/linux/vbus_pci.h;h=fe337590e644017392e4c9d9236150adb2333729;hb=ded8ce2005a85c174ba93ee26f8d67049ef11025#l64

> Just
> like the PCI cf8/cfc interface.
> 
>>> They aren't in the "guest".  The best way to look at it is
>>>
>>> - a device side, with a dma engine: vhost-net
>>> - a driver side, only accessing its own memory: virtio-net
>>>
>>> Given that Ira's config has the dma engine in the ppc boards, that's
>>> where vhost-net would live (the ppc boards acting as NICs to the x86
>>> board, essentially).
>>>  
>> That sounds convenient given his hardware, but it has its own set of
>> problems.  For one, the configuration/inventory of these boards is now
>> driven by the wrong side and has to be addressed.
> 
> Why is it the wrong side?

"Wrong" is probably too harsh a word when looking at ethernet.  Its
certainly "odd", and possibly inconvenient.  It would be like having
vhost in a KVM guest, and virtio-net running on the host.  You could do
it, but its weird and awkward.  Where it really falls apart and enters
the "wrong" category is for non-symmetric devices, like disk-io.

> 
>> Second, the role
>> reversal will likely not work for many models other than ethernet (e.g.
>> virtio-console or virtio-blk drivers running on the x86 board would be
>> naturally consuming services from the slave boards...virtio-net is an
>> exception because 802.x is generally symmetrical).
>>
> 
> There is no role reversal.

So if I have virtio-blk driver running on the x86 and vhost-blk device
running on the ppc board, I can use the ppc board as a block-device.
What if I really wanted to go the other way?

> The side doing dma is the device, the side
> accessing its own memory is the driver.  Just like that other 1e12
> driver/device pairs out there.

IIUC, his ppc boards really can be seen as "guests" (they are linux
instances that are utilizing services from the x86, not the other way
around).  vhost forces the model to have the ppc boards act as IO-hosts,
whereas vbus would likely work in either direction due to its more
refined abstraction layer.

> 
>>> I have no idea, that's for Ira to solve.
>>>  
>> Bingo.  Thus

Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-16 Thread Avi Kivity
On 09/16/2009 02:44 PM, Gregory Haskins wrote:
> The problem isn't where to find the models...the problem is how to
> aggregate multiple models to the guest.
>

You mean configuration?

>> You instantiate multiple vhost-nets.  Multiple ethernet NICs is a
>> supported configuration for kvm.
>>  
> But this is not KVM.
>
>

If kvm can do it, others can.

>>> His slave boards surface themselves as PCI devices to the x86
>>> host.  So how do you use that to make multiple vhost-based devices (say
>>> two virtio-nets, and a virtio-console) communicate across the transport?
>>>
>>>
>> I don't really see the difference between 1 and N here.
>>  
> A KVM surfaces N virtio-devices as N pci-devices to the guest.  What do
> we do in Ira's case where the entire guest represents itself as a PCI
> device to the host, and nothing the other way around?
>

There is no guest and host in this scenario.  There's a device side 
(ppc) and a driver side (x86).  The driver side can access configuration 
information on the device side.  How to multiplex multiple devices is an 
interesting exercise for whoever writes the virtio binding for that setup.

>>> There are multiple ways to do this, but what I am saying is that
>>> whatever is conceived will start to look eerily like a vbus-connector,
>>> since this is one of its primary purposes ;)
>>>
>>>
>> I'm not sure if you're talking about the configuration interface or data
>> path here.
>>  
> I am talking about how we would tunnel the config space for N devices
> across his transport.
>

Sounds trivial.  Write an address containing the device number and 
register number to one location, read or write data from another.  Just 
like the PCI cf8/cfc interface.
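
A minimal sketch of that cf8/cfc-style window (offsets are hypothetical,
and a real version needs a lock around the address/data pair):

#include <linux/io.h>
#include <linux/types.h>

#define CFG_ADDR	0x00	/* write (dev << 16) | reg here   */
#define CFG_DATA	0x04	/* then read/write the value here */

static u32 cfg_read(void __iomem *win, u16 dev, u16 reg)
{
	iowrite32(((u32)dev << 16) | reg, win + CFG_ADDR);
	return ioread32(win + CFG_DATA);
}

static void cfg_write(void __iomem *win, u16 dev, u16 reg, u32 val)
{
	iowrite32(((u32)dev << 16) | reg, win + CFG_ADDR);
	iowrite32(val, win + CFG_DATA);
}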

>> They aren't in the "guest".  The best way to look at it is
>>
>> - a device side, with a dma engine: vhost-net
>> - a driver side, only accessing its own memory: virtio-net
>>
>> Given that Ira's config has the dma engine in the ppc boards, that's
>> where vhost-net would live (the ppc boards acting as NICs to the x86
>> board, essentially).
>>  
> That sounds convenient given his hardware, but it has its own set of
> problems.  For one, the configuration/inventory of these boards is now
> driven by the wrong side and has to be addressed.

Why is it the wrong side?

> Second, the role
> reversal will likely not work for many models other than ethernet (e.g.
> virtio-console or virtio-blk drivers running on the x86 board would be
> naturally consuming services from the slave boards...virtio-net is an
> exception because 802.x is generally symmetrical).
>

There is no role reversal.  The side doing dma is the device, the side 
accessing its own memory is the driver.  Just like that other 1e12 
driver/device pairs out there.

>> I have no idea, that's for Ira to solve.
>>  
> Bingo.  Thus my statement that the vhost proposal is incomplete.  You
> have the virtio-net and vhost-net pieces covering the fast-path
> end-points, but nothing in the middle (transport, aggregation,
> config-space), and nothing on the management-side.  vbus provides most
> of the other pieces, and can even support the same virtio-net protocol
> on top.  The remaining part would be something like a udev script to
> populate the vbus with devices on board-insert events.
>

Of course vhost is incomplete, in the same sense that Linux is 
incomplete.  Both require userspace.

>> If he could fake the PCI
>> config space as seen by the x86 board, he would just show the normal pci
>> config and use virtio-pci (multiple channels would show up as a
>> multifunction device).  Given he can't, he needs to tunnel the virtio
>> config space some other way.
>>  
> Right, and note that vbus was designed to solve this.  This tunneling
> can, of course, be done without vbus using some other design.  However,
> whatever solution is created will look incredibly close to what I've
> already done, so my point is "why reinvent it"?
>

virtio requires a binding for this tunnelling; so does vbus.  It's the same 
problem with the same solution.

-- 
error compiling committee.c: too many arguments to function



Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-16 Thread Gregory Haskins
Avi Kivity wrote:
> On 09/15/2009 11:08 PM, Gregory Haskins wrote:
>>
>>> There's virtio-console, virtio-blk etc.  None of these have kernel-mode
>>> servers, but these could be implemented if/when needed.
>>>  
>> IIUC, Ira already needs at least ethernet and console capability.
>>
>>
> 
> He's welcome to pick up the necessary code from qemu.

The problem isn't where to find the models...the problem is how to
aggregate multiple models to the guest.

> 
 b) what do you suppose this protocol to aggregate the connections would
 look like? (hint: this is what a vbus-connector does).


>>> You mean multilink?  You expose the device as a multiqueue.
>>>  
>> No, what I mean is how do you surface multiple ethernet and consoles to
>> the guests?  For Ira's case, I think he needs at minimum at least one of
>> each, and he mentioned possibly having two unique ethernets at one point.
>>
> 
> You instantiate multiple vhost-nets.  Multiple ethernet NICs is a
> supported configuration for kvm.

But this is not KVM.

> 
>> His slave boards surface themselves as PCI devices to the x86
>> host.  So how do you use that to make multiple vhost-based devices (say
>> two virtio-nets, and a virtio-console) communicate across the transport?
>>
> 
> I don't really see the difference between 1 and N here.

A KVM surfaces N virtio-devices as N pci-devices to the guest.  What do
we do in Ira's case where the entire guest represents itself as a PCI
device to the host, and nothing the other way around?


> 
>> There are multiple ways to do this, but what I am saying is that
>> whatever is conceived will start to look eerily like a vbus-connector,
>> since this is one of its primary purposes ;)
>>
> 
> I'm not sure if you're talking about the configuration interface or data
> path here.

I am talking about how we would tunnel the config space for N devices
across his transport.

As an aside, the vbus-kvm connector makes them one and the same, but
they do not have to be.  It's all in the connector design.

> 
 c) how do you manage the configuration, especially on a per-board
 basis?


>>> pci (for kvm/x86).
>>>  
>> Ok, for kvm understood (and I would also add "qemu" to that mix).  But
>> we are talking about vhost's application in a non-kvm environment here,
>> right?
>>
>> So if the vhost-X devices are in the "guest",
> 
> They aren't in the "guest".  The best way to look at it is
> 
> - a device side, with a dma engine: vhost-net
> - a driver side, only accessing its own memory: virtio-net
> 
> Given that Ira's config has the dma engine in the ppc boards, that's
> where vhost-net would live (the ppc boards acting as NICs to the x86
> board, essentially).

That sounds convenient given his hardware, but it has its own set of
problems.  For one, the configuration/inventory of these boards is now
driven by the wrong side and has to be addressed.  Second, the role
reversal will likely not work for many models other than ethernet (e.g.
virtio-console or virtio-blk drivers running on the x86 board would be
naturally consuming services from the slave boards...virtio-net is an
exception because 802.x is generally symmetrical).

IIUC, vbus would support having the device models live properly on the
x86 side, solving both of these problems.  It would be impossible to
reverse vhost given its current design.

> 
>> and the x86 board is just
>> a slave...How do you tell each ppc board how many devices and what
>> config (e.g. MACs, etc) to instantiate?  Do you assume that they should
>> all be symmetric and based on positional (e.g. slot) data?  What if you
>> want asymmetric configurations (if not here, perhaps in a different
>> environment)?
>>
> 
> I have no idea, that's for Ira to solve.

Bingo.  Thus my statement that the vhost proposal is incomplete.  You
have the virtio-net and vhost-net pieces covering the fast-path
end-points, but nothing in the middle (transport, aggregation,
config-space), and nothing on the management-side.  vbus provides most
of the other pieces, and can even support the same virtio-net protocol
on top.  The remaining part would be something like a udev script to
populate the vbus with devices on board-insert events.

> If he could fake the PCI
> config space as seen by the x86 board, he would just show the normal pci
> config and use virtio-pci (multiple channels would show up as a
> multifunction device).  Given he can't, he needs to tunnel the virtio
> config space some other way.

Right, and note that vbus was designed to solve this.  This tunneling
can, of course, be done without vbus using some other design.  However,
whatever solution is created will look incredibly close to what I've
already done, so my point is "why reinvent it"?

> 
>>> Yes.  virtio is really virtualization oriented.
>>>  
>> I would say that it's vhost in particular that is virtualization
>> oriented.  virtio, as a concept, generally should work in physical
>

Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-16 Thread Avi Kivity
On 09/15/2009 11:08 PM, Gregory Haskins wrote:
>
>> There's virtio-console, virtio-blk etc.  None of these have kernel-mode
>> servers, but these could be implemented if/when needed.
>>  
> IIUC, Ira already needs at least ethernet and console capability.
>
>

He's welcome to pick up the necessary code from qemu.

>>> b) what do you suppose this protocol to aggregate the connections would
>>> look like? (hint: this is what a vbus-connector does).
>>>
>>>
>> You mean multilink?  You expose the device as a multiqueue.
>>  
> No, what I mean is how do you surface multiple ethernet and consoles to
> the guests?  For Ira's case, I think he needs at minimum at least one of
> each, and he mentioned possibly having two unique ethernets at one point.
>

You instantiate multiple vhost-nets.  Multiple ethernet NICs is a 
supported configuration for kvm.

> His slave boards surface themselves as PCI devices to the x86
> host.  So how do you use that to make multiple vhost-based devices (say
> two virtio-nets, and a virtio-console) communicate across the transport?
>

I don't really see the difference between 1 and N here.

> There are multiple ways to do this, but what I am saying is that
> whatever is conceived will start to look eerily like a vbus-connector,
> since this is one of its primary purposes ;)
>

I'm not sure if you're talking about the configuration interface or data 
path here.

>>> c) how do you manage the configuration, especially on a per-board basis?
>>>
>>>
>> pci (for kvm/x86).
>>  
> Ok, for kvm understood (and I would also add "qemu" to that mix).  But
> we are talking about vhost's application in a non-kvm environment here,
> right?
>
> So if the vhost-X devices are in the "guest",

They aren't in the "guest".  The best way to look at it is

- a device side, with a dma engine: vhost-net
- a driver side, only accessing its own memory: virtio-net

Given that Ira's config has the dma engine in the ppc boards, that's 
where vhost-net would live (the ppc boards acting as NICs to the x86 
board, essentially).

> and the x86 board is just
> a slave...How do you tell each ppc board how many devices and what
> config (e.g. MACs, etc) to instantiate?  Do you assume that they should
> all be symmetric and based on positional (e.g. slot) data?  What if you
> want asymmetric configurations (if not here, perhaps in a different
> environment)?
>

I have no idea, that's for Ira to solve.  If he could fake the PCI 
config space as seen by the x86 board, he would just show the normal pci 
config and use virtio-pci (multiple channels would show up as a 
multifunction device).  Given he can't, he needs to tunnel the virtio 
config space some other way.

>> Yes.  virtio is really virtualization oriented.
>>  
> I would say that it's vhost in particular that is virtualization
> oriented.  virtio, as a concept, generally should work in physical
> systems, if perhaps with some minor modifications.  The biggest "limit"
> is having "virt" in its name ;)
>

Let me rephrase.  The virtio developers are virtualization oriented.  If 
it works for non-virt applications, that's good, but not a design goal.

-- 
error compiling committee.c: too many arguments to function



Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-15 Thread Gregory Haskins
Michael S. Tsirkin wrote:
> On Tue, Sep 15, 2009 at 05:39:27PM -0400, Gregory Haskins wrote:
>> Michael S. Tsirkin wrote:
>>> On Tue, Sep 15, 2009 at 04:43:58PM -0400, Gregory Haskins wrote:
 Michael S. Tsirkin wrote:
> On Tue, Sep 15, 2009 at 04:08:23PM -0400, Gregory Haskins wrote:
>> No, what I mean is how do you surface multiple ethernet and consoles to
>> the guests?  For Ira's case, I think he needs at minimum at least one of
>> each, and he mentioned possibly having two unique ethernets at one point.
>>
>> His slave boards surface themselves as PCI devices to the x86
>> host.  So how do you use that to make multiple vhost-based devices (say
>> two virtio-nets, and a virtio-console) communicate across the transport?
>>
>> There are multiple ways to do this, but what I am saying is that
>> whatever is conceived will start to look eerily like a vbus-connector,
>> since this is one of its primary purposes ;)
> Can't all this be in userspace?
 Can you outline your proposal?

 -Greg

>>> Userspace in x86 maps a PCI region, uses it for communication with ppc?
>>>
>> And what do you propose this communication to look like?
> 
> Who cares? Implement vbus protocol there if you like.
> 

Exactly.  My point is that you need something like a vbus protocol there. ;)

Here is the protocol I run over PCI in AlacrityVM:

http://git.kernel.org/?p=linux/kernel/git/ghaskins/alacrityvm/linux-2.6.git;a=blob;f=include/linux/vbus_pci.h;h=fe337590e644017392e4c9d9236150adb2333729;hb=ded8ce2005a85c174ba93ee26f8d67049ef11025

And I guess to your point, yes, the protocol can technically be in
userspace (outside of whatever you need for the in-kernel portion of the
communication transport, if any).

The vbus-connector design does not specify where the protocol needs to
take place, per se.  Note, however, for performance reasons some parts
of the protocol may want to be in the kernel (such as DEVCALL and
SHMSIGNAL).  It is for this reason that I just run all of it there,
because IMO it's simpler than splitting it up.  The slow path stuff just
rides on infrastructure that I need for fast-path anyway, so it doesn't
really cost me anything additional.
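
To give a feel for the shape of it -- illustrative only, the real protocol
is in the vbus_pci.h header linked above -- a connector essentially
carries a handful of verbs plus a device handle:

#include <linux/types.h>

enum connector_verb {
	VERB_DEVADD,	/* hotplug: a device appeared on the bus  */
	VERB_DEVDROP,	/* hotplug: a device went away            */
	VERB_DEVOPEN,	/* driver attaches to a device            */
	VERB_DEVCLOSE,	/* driver detaches                        */
	VERB_DEVCALL,	/* slow-path call into the device model   */
	VERB_SHMSIGNAL,	/* fast-path doorbell for a shared region */
};

struct connector_msg {
	u32 verb;	/* one of the above                       */
	u64 devh;	/* device handle                          */
	u32 func;	/* call/shm identifier within the device  */
	u32 len;	/* payload length                         */
	u8  data[];	/* verb-specific payload                  */
};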

Kind Regards,
-Greg




Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-15 Thread Michael S. Tsirkin
On Tue, Sep 15, 2009 at 05:39:27PM -0400, Gregory Haskins wrote:
> Michael S. Tsirkin wrote:
> > On Tue, Sep 15, 2009 at 04:43:58PM -0400, Gregory Haskins wrote:
> >> Michael S. Tsirkin wrote:
> >>> On Tue, Sep 15, 2009 at 04:08:23PM -0400, Gregory Haskins wrote:
>  No, what I mean is how do you surface multiple ethernet and consoles to
>  the guests?  For Ira's case, I think he needs at minimum at least one of
>  each, and he mentioned possibly having two unique ethernets at one point.
> 
>  His slave boards surface themselves as PCI devices to the x86
>  host.  So how do you use that to make multiple vhost-based devices (say
>  two virtio-nets, and a virtio-console) communicate across the transport?
> 
>  There are multiple ways to do this, but what I am saying is that
>  whatever is conceived will start to look eerily like a vbus-connector,
>  since this is one of its primary purposes ;)
> >>> Can't all this be in userspace?
> >> Can you outline your proposal?
> >>
> >> -Greg
> >>
> > 
> > Userspace in x86 maps a PCI region, uses it for communication with ppc?
> > 
> 
> And what do you propose this communication to look like?

Who cares? Implement vbus protocol there if you like.

> -Greg
> 




Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-15 Thread Gregory Haskins
Michael S. Tsirkin wrote:
> On Tue, Sep 15, 2009 at 04:43:58PM -0400, Gregory Haskins wrote:
>> Michael S. Tsirkin wrote:
>>> On Tue, Sep 15, 2009 at 04:08:23PM -0400, Gregory Haskins wrote:
 No, what I mean is how do you surface multiple ethernet and consoles to
 the guests?  For Ira's case, I think he needs at minimum at least one of
 each, and he mentioned possibly having two unique ethernets at one point.

 His slave boards surface themselves as PCI devices to the x86
 host.  So how do you use that to make multiple vhost-based devices (say
 two virtio-nets, and a virtio-console) communicate across the transport?

 There are multiple ways to do this, but what I am saying is that
 whatever is conceived will start to look eerily like a vbus-connector,
 since this is one of its primary purposes ;)
>>> Can't all this be in userspace?
>> Can you outline your proposal?
>>
>> -Greg
>>
> 
> Userspace in x86 maps a PCI region, uses it for communication with ppc?
> 

And what do you propose this communication to look like?

-Greg




Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-15 Thread Michael S. Tsirkin
On Tue, Sep 15, 2009 at 04:43:58PM -0400, Gregory Haskins wrote:
> Michael S. Tsirkin wrote:
> > On Tue, Sep 15, 2009 at 04:08:23PM -0400, Gregory Haskins wrote:
> >> No, what I mean is how do you surface multiple ethernet and consoles to
> >> the guests?  For Ira's case, I think he needs at minimum at least one of
> >> each, and he mentioned possibly having two unique ethernets at one point.
> >>
> >> His slave boards surface themselves as PCI devices to the x86
> >> host.  So how do you use that to make multiple vhost-based devices (say
> >> two virtio-nets, and a virtio-console) communicate across the transport?
> >>
> >> There are multiple ways to do this, but what I am saying is that
> >> whatever is conceived will start to look eerily like a vbus-connector,
> >> since this is one of its primary purposes ;)
> > 
> > Can't all this be in userspace?
> 
> Can you outline your proposal?
> 
> -Greg
> 

Userspace in x86 maps a PCI region, uses it for communication with ppc?



Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-15 Thread Gregory Haskins
Michael S. Tsirkin wrote:
> On Tue, Sep 15, 2009 at 04:08:23PM -0400, Gregory Haskins wrote:
>> No, what I mean is how do you surface multiple ethernet and consoles to
>> the guests?  For Ira's case, I think he needs at minimum at least one of
>> each, and he mentioned possibly having two unique ethernets at one point.
>>
>> His slave boards surface themselves as PCI devices to the x86
>> host.  So how do you use that to make multiple vhost-based devices (say
>> two virtio-nets, and a virtio-console) communicate across the transport?
>>
>> There are multiple ways to do this, but what I am saying is that
>> whatever is conceived will start to look eerily like a vbus-connector,
>> since this is one of its primary purposes ;)
> 
> Can't all this be in userspace?

Can you outline your proposal?

-Greg




Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-15 Thread Michael S. Tsirkin
On Tue, Sep 15, 2009 at 04:08:23PM -0400, Gregory Haskins wrote:
> No, what I mean is how do you surface multiple ethernet and consoles to
> the guests?  For Ira's case, I think he needs at minimum at least one of
> each, and he mentioned possibly having two unique ethernets at one point.
> 
> His slave boards surface themselves as PCI devices to the x86
> host.  So how do you use that to make multiple vhost-based devices (say
> two virtio-nets, and a virtio-console) communicate across the transport?
> 
> There are multiple ways to do this, but what I am saying is that
> whatever is conceived will start to look eerily like a vbus-connector,
> since this is one of its primary purposes ;)

Can't all this be in userspace?


Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-15 Thread Gregory Haskins
Avi Kivity wrote:
> On 09/15/2009 04:50 PM, Gregory Haskins wrote:
>>> Why?  vhost will call get_user_pages() or copy_*_user() which ought to
>>> do the right thing.
>>>  
>> I was speaking generally, not specifically to Ira's architecture.  What
>> I mean is that vbus was designed to work without assuming that the
>> memory is pageable.  There are environments in which the host is not
>> capable of mapping hvas/*page, but the memctx->copy_to/copy_from
>> paradigm could still work (think rdma, for instance).
>>
> 
> Sure, vbus is more flexible here.
> 
 As an aside: a bigger issue is that, iiuc, Ira wants more than a single
 ethernet channel in his design (multiple ethernets, consoles, etc).  A
 vhost solution in this environment is incomplete.


>>> Why?  Instantiate as many vhost-nets as needed.
>>>  
>> a) what about non-ethernets?
>>
> 
> There's virtio-console, virtio-blk etc.  None of these have kernel-mode
> servers, but these could be implemented if/when needed.

IIUC, Ira already needs at least ethernet and console capability.

> 
>> b) what do you suppose this protocol to aggregate the connections would
>> look like? (hint: this is what a vbus-connector does).
>>
> 
> You mean multilink?  You expose the device as a multiqueue.

No, what I mean is how do you surface multiple ethernet and consoles to
the guests?  For Ira's case, I think he needs at minimum at least one of
each, and he mentioned possibly having two unique ethernets at one point.

His slave boards surface themselves as PCI devices to the x86
host.  So how do you use that to make multiple vhost-based devices (say
two virtio-nets, and a virtio-console) communicate across the transport?

There are multiple ways to do this, but what I am saying is that
whatever is conceived will start to look eerily like a vbus-connector,
since this is one of its primary purposes ;)


> 
>> c) how do you manage the configuration, especially on a per-board basis?
>>
> 
> pci (for kvm/x86).

Ok, for kvm understood (and I would also add "qemu" to that mix).  But
we are talking about vhost's application in a non-kvm environment here,
right?

So if the vhost-X devices are in the "guest", and the x86 board is just
a slave...How do you tell each ppc board how many devices and what
config (e.g. MACs, etc) to instantiate?  Do you assume that they should
all be symmetric and based on positional (e.g. slot) data?  What if you
want asymmetric configurations (if not here, perhaps in a different
environment)?

> 
>> Actually I have patches queued to allow vbus to be managed via ioctls as
>> well, per your feedback (and it solves the permissions/lifetime
> criticisms in alacrityvm-v0.1).
>>
> 
> That will make qemu integration easier.
> 
>>>   The only difference is the implementation.  vhost-net
>>> leaves much more to userspace, that's the main difference.
>>>  
>> Also,
>>
>> *) vhost is virtio-net specific, whereas vbus is a more generic device
>> model where things like virtio-net or venet ride on top.
>>
> 
> I think vhost-net is separated into vhost and vhost-net.

Thats good.

> 
>> *) vhost is only designed to work with environments that look very
>> similar to a KVM guest (slot/hva translatable).  vbus can bridge various
>> environments by abstracting the key components (such as memory access).
>>
> 
> Yes.  virtio is really virtualization oriented.

I would say that it's vhost in particular that is virtualization
oriented.  virtio, as a concept, generally should work in physical
systems, if perhaps with some minor modifications.  The biggest "limit"
is having "virt" in its name ;)

> 
>> *) vhost requires an active userspace management daemon, whereas vbus
>> can be driven by transient components, like scripts (ala udev)
>>
> 
> vhost by design leaves configuration and handshaking to userspace.  I
> see it as an advantage.

The misconception here is that vbus by design _doesn't define_ where
configuration/handshaking happens.  It is primarily implemented by a
modular component called a "vbus-connector", and _I_ see this
flexibility as an advantage.  vhost on the other hand depends on an
active userspace component and a slots/hva memory design, which is more
limiting in where it can be used and forces you to split the logic.
However, I think we both more or less agree on this point already.

For the record, vbus itself is simply a resource container for
virtual-devices, which provides abstractions for the various points of
interest to generalizing PV (memory, signals, etc) and the proper
isolation and protection guarantees.  What you do with it is defined by
the modular virtual-devices (e.g. virtio-net, venet, sched, hrt, scsi,
rdma, etc) and vbus-connectors (vbus-kvm, etc) you plug into it.
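
As a purely illustrative sketch of what the memory abstraction amounts to
(not the actual vbus API or its exact signatures):

#include <linux/types.h>

struct memctx;

/* Device models call these instead of copy_to_user()/get_user_pages(),
 * so the same model works whether the other side is a KVM guest, a PCI
 * board reached through a DMA engine, or an RDMA peer. */
struct memctx_ops {
	int (*copy_to)(struct memctx *ctx, u64 remote_addr,
		       const void *src, size_t len);
	int (*copy_from)(struct memctx *ctx, void *dst,
			 u64 remote_addr, size_t len);
	void (*release)(struct memctx *ctx);
};

struct memctx {
	const struct memctx_ops *ops;
	void *priv;		/* connector-specific state */
};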

As an example, you could emulate the vhost design in vbus by writing a
"vbus-vhost" connector.  This connector would be very thin and terminate
locally in QEMU.  It would provide an ioctl-based verb na

Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-15 Thread Avi Kivity
On 09/15/2009 04:50 PM, Gregory Haskins wrote:
>> Why?  vhost will call get_user_pages() or copy_*_user() which ought to
>> do the right thing.
>>  
> I was speaking generally, not specifically to Ira's architecture.  What
> I mean is that vbus was designed to work without assuming that the
> memory is pageable.  There are environments in which the host is not
> capable of mapping hvas/*page, but the memctx->copy_to/copy_from
> paradigm could still work (think rdma, for instance).
>

Sure, vbus is more flexible here.

>>> As an aside: a bigger issue is that, iiuc, Ira wants more than a single
>>> ethernet channel in his design (multiple ethernets, consoles, etc).  A
>>> vhost solution in this environment is incomplete.
>>>
>>>
>> Why?  Instantiate as many vhost-nets as needed.
>>  
> a) what about non-ethernets?
>

There's virtio-console, virtio-blk etc.  None of these have kernel-mode 
servers, but these could be implemented if/when needed.

> b) what do you suppose this protocol to aggregate the connections would
> look like? (hint: this is what a vbus-connector does).
>

You mean multilink?  You expose the device as a multiqueue.

> c) how do you manage the configuration, especially on a per-board basis?
>

pci (for kvm/x86).

> Actually I have patches queued to allow vbus to be managed via ioctls as
> well, per your feedback (and it solves the permissions/lifetime
> criticisms in alacrityvm-v0.1).
>

That will make qemu integration easier.

>>   The only difference is the implementation.  vhost-net
>> leaves much more to userspace, that's the main difference.
>>  
> Also,
>
> *) vhost is virtio-net specific, whereas vbus is a more generic device
> model where things like virtio-net or venet ride on top.
>

I think vhost-net is separated into vhost and vhost-net.

> *) vhost is only designed to work with environments that look very
> similar to a KVM guest (slot/hva translatable).  vbus can bridge various
> environments by abstracting the key components (such as memory access).
>

Yes.  virtio is really virtualization oriented.

> *) vhost requires an active userspace management daemon, whereas vbus
> can be driven by transient components, like scripts (ala udev)
>

vhost by design leaves configuration and handshaking to userspace.  I 
see it as an advantage.

-- 
error compiling committee.c: too many arguments to function

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-15 Thread Michael S. Tsirkin
On Tue, Sep 15, 2009 at 09:50:39AM -0400, Gregory Haskins wrote:
> Avi Kivity wrote:
> > On 09/15/2009 04:03 PM, Gregory Haskins wrote:
> >>
> >>> In this case the x86 is the owner and the ppc boards use translated
> >>> access.  Just switch drivers and device and it falls into place.
> >>>
> >>>  
> >> You could switch vbus roles as well, I suppose.
> > 
> > Right, there's no real difference in this regard.
> > 
> >> Another potential
> >> option is that he can stop mapping host memory on the guest so that it
> >> follows the more traditional model.  As a bus-master device, the ppc
> >> boards should have access to any host memory at least in the GFP_DMA
> >> range, which would include all relevant pointers here.
> >>
> >> I digress:  I was primarily addressing the concern that Ira would need
> >> to manage the "host" side of the link using hvas mapped from userspace
> >> (even if host side is the ppc boards).  vbus abstracts that access so as
> >> to allow something other than userspace/hva mappings.  OTOH, having each
> >> ppc board run a userspace app to do the mapping on its behalf and feed
> >> it to vhost is probably not a huge deal either.  Where vhost might
> >> really fall apart is when any assumptions about pageable memory occur,
> >> if any.
> >>
> > 
> > Why?  vhost will call get_user_pages() or copy_*_user() which ought to
> > do the right thing.
> 
> I was speaking generally, not specifically to Ira's architecture.  What
> I mean is that vbus was designed to work without assuming that the
> memory is pageable.  There are environments in which the host is not
> capable of mapping hvas/*page, but the memctx->copy_to/copy_from
> paradigm could still work (think rdma, for instance).

rdma interfaces are typically asynchronous, so blocking
copy_from/copy_to can be made to work, but likely won't work
that well. DMA might work better if it is asynchronous as well.

Assuming a synchronous copy is what we need, maybe the issue is that
there aren't good APIs for x86/ppc communication? If so, vhost might not
be the best place to stick them.  Maybe the specific platform can
redefine copy_to/from_user to do the right thing? Or maybe add another
API for that ...
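
For instance, the kind of "another API" hinted at above might look roughly
like this (purely illustrative, invented names):

#include <linux/types.h>

/* an asynchronous copy hook a platform could provide, so an rdma or DMA
 * backend is not forced to fake a blocking copy_from_user() */
struct guest_copy_ops {
        int (*copy_from_guest)(void *ctx, void *dst, u64 guest_addr,
                               size_t len,
                               void (*done)(void *cookie), void *cookie);
        int (*copy_to_guest)(void *ctx, u64 guest_addr, const void *src,
                             size_t len,
                             void (*done)(void *cookie), void *cookie);
};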

> > 
> >> As an aside: a bigger issue is that, iiuc, Ira wants more than a single
> >> ethernet channel in his design (multiple ethernets, consoles, etc).  A
> >> vhost solution in this environment is incomplete.
> >>
> > 
> > Why?  Instantiate as many vhost-nets as needed.
> 
> a) what about non-ethernets?

vhost-net actually does not care:
the packet is passed on to a socket, and we are done.

> b) what do you suppose this protocol to aggregate the connections would
> look like? (hint: this is what a vbus-connector does).

You are talking about management protocol between ppc and x86, right?
One wonders why it has to be in the kernel at all.

> c) how do you manage the configuration, especially on a per-board basis?

Not sure what a board is, but configuration is done in userspace.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-15 Thread Gregory Haskins
Avi Kivity wrote:
> On 09/15/2009 04:03 PM, Gregory Haskins wrote:
>>
>>> In this case the x86 is the owner and the ppc boards use translated
>>> access.  Just switch drivers and device and it falls into place.
>>>
>>>  
>> You could switch vbus roles as well, I suppose.
> 
> Right, there's no real difference in this regard.
> 
>> Another potential
>> option is that he can stop mapping host memory on the guest so that it
>> follows the more traditional model.  As a bus-master device, the ppc
>> boards should have access to any host memory at least in the GFP_DMA
>> range, which would include all relevant pointers here.
>>
>> I digress:  I was primarily addressing the concern that Ira would need
>> to manage the "host" side of the link using hvas mapped from userspace
>> (even if host side is the ppc boards).  vbus abstracts that access so as
>> to allow something other than userspace/hva mappings.  OTOH, having each
>> ppc board run a userspace app to do the mapping on its behalf and feed
>> it to vhost is probably not a huge deal either.  Where vhost might
>> really fall apart is when any assumptions about pageable memory occur,
>> if any.
>>
> 
> Why?  vhost will call get_user_pages() or copy_*_user() which ought to
> do the right thing.

I was speaking generally, not specifically to Ira's architecture.  What
I mean is that vbus was designed to work without assuming that the
memory is pageable.  There are environments in which the host is not
capable of mapping hvas/*page, but the memctx->copy_to/copy_from
paradigm could still work (think rdma, for instance).

> 
>> As an aside: a bigger issue is that, iiuc, Ira wants more than a single
>> ethernet channel in his design (multiple ethernets, consoles, etc).  A
>> vhost solution in this environment is incomplete.
>>
> 
> Why?  Instantiate as many vhost-nets as needed.

a) what about non-ethernets?
b) what do you suppose this protocol to aggregate the connections would
look like? (hint: this is what a vbus-connector does).
c) how do you manage the configuration, especially on a per-board basis?

> 
>> Note that Ira's architecture highlights that vbus's explicit management
>> interface is more valuable here than it is in KVM, since KVM already has
>> its own management interface via QEMU.
>>
> 
> vhost-net and vbus both need management, vhost-net via ioctls and vbus
> via configfs.

Actually I have patches queued to allow vbus to be managed via ioctls as
well, per your feedback (and it solves the permissions/lifetime
criticisms in alacrityvm-v0.1).

>  The only difference is the implementation.  vhost-net
> leaves much more to userspace, that's the main difference.

Also,

*) vhost is virtio-net specific, whereas vbus is a more generic device
model where things like virtio-net or venet ride on top.

*) vhost is only designed to work with environments that look very
similar to a KVM guest (slot/hva translatable).  vbus can bridge various
environments by abstracting the key components (such as memory access).

*) vhost requires an active userspace management daemon, whereas vbus
can be driven by transient components, like scripts (ala udev)

Kind Regards,
-Greg




signature.asc
Description: OpenPGP digital signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization

Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-15 Thread Avi Kivity
On 09/15/2009 04:03 PM, Gregory Haskins wrote:
>
>> In this case the x86 is the owner and the ppc boards use translated
>> access.  Just switch drivers and device and it falls into place.
>>
>>  
> You could switch vbus roles as well, I suppose.

Right, there's no real difference in this regard.

> Another potential
> option is that he can stop mapping host memory on the guest so that it
> follows the more traditional model.  As a bus-master device, the ppc
> boards should have access to any host memory at least in the GFP_DMA
> range, which would include all relevant pointers here.
>
> I digress:  I was primarily addressing the concern that Ira would need
> to manage the "host" side of the link using hvas mapped from userspace
> (even if host side is the ppc boards).  vbus abstracts that access so as
> to allow something other than userspace/hva mappings.  OTOH, having each
> ppc board run a userspace app to do the mapping on its behalf and feed
> it to vhost is probably not a huge deal either.  Where vhost might
> really fall apart is when any assumptions about pageable memory occur,
> if any.
>

Why?  vhost will call get_user_pages() or copy_*_user() which ought to 
do the right thing.

> As an aside: a bigger issue is that, iiuc, Ira wants more than a single
> ethernet channel in his design (multiple ethernets, consoles, etc).  A
> vhost solution in this environment is incomplete.
>

Why?  Instantiate as many vhost-nets as needed.

> Note that Ira's architecture highlights that vbus's explicit management
> interface is more valuable here than it is in KVM, since KVM already has
> its own management interface via QEMU.
>

vhost-net and vbus both need management, vhost-net via ioctls and vbus 
via configfs.  The only difference is the implementation.  vhost-net 
leaves much more to userspace, that's the main difference.

-- 
error compiling committee.c: too many arguments to function

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-15 Thread Gregory Haskins
Avi Kivity wrote:
> On 09/14/2009 10:14 PM, Gregory Haskins wrote:
>> To reiterate, as long as the model is such that the ppc boards are
>> considered the "owner" (direct access, no translation needed) I believe
>> it will work.  If the pointers are expected to be owned by the host,
>> then my model doesn't work well either.
>>
> 
> In this case the x86 is the owner and the ppc boards use translated
> access.  Just switch drivers and device and it falls into place.
> 

You could switch vbus roles as well, I suppose.  Another potential
option is that he can stop mapping host memory on the guest so that it
follows the more traditional model.  As a bus-master device, the ppc
boards should have access to any host memory at least in the GFP_DMA
range, which would include all relevant pointers here.

I digress:  I was primarily addressing the concern that Ira would need
to manage the "host" side of the link using hvas mapped from userspace
(even if host side is the ppc boards).  vbus abstracts that access so as
to allow something other than userspace/hva mappings.  OTOH, having each
ppc board run a userspace app to do the mapping on its behalf and feed
it to vhost is probably not a huge deal either.  Where vhost might
really fall apart is when any assumptions about pageable memory occur,
if any.

As an aside: a bigger issue is that, iiuc, Ira wants more than a single
ethernet channel in his design (multiple ethernets, consoles, etc).  A
vhost solution in this environment is incomplete.

Note that Ira's architecture highlights that vbus's explicit management
interface is more valuable here than it is in KVM, since KVM already has
its own management interface via QEMU.

Kind Regards,
-Greg



signature.asc
Description: OpenPGP digital signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization

Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-15 Thread Avi Kivity
On 09/14/2009 10:14 PM, Gregory Haskins wrote:
> To reiterate, as long as the model is such that the ppc boards are
> considered the "owner" (direct access, no translation needed) I believe
> it will work.  If the pointers are expected to be owned by the host,
> then my model doesn't work well either.
>

In this case the x86 is the owner and the ppc boards use translated 
access.  Just switch drivers and device and it falls into place.

-- 
error compiling committee.c: too many arguments to function

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-15 Thread Avi Kivity
On 09/14/2009 07:47 PM, Michael S. Tsirkin wrote:
> On Mon, Sep 14, 2009 at 12:08:55PM -0400, Gregory Haskins wrote:
>
>> For Ira's example, the addresses would represent a physical address on
>> the PCI boards, and would follow any kind of relevant rules for
>> converting a "GPA" to a host accessible address (even if indirectly, via
>> a dma controller).
>>  
> I don't think limiting addresses to PCI physical addresses will work
> well.  From what I remember, Ira's x86 cannot initiate burst
> transactions on PCI, and it's the ppc that initiates all DMA.
>

vhost-net would run on the PPC then.

>>>   But we can't let the guest specify physical addresses.
>>>
>> Agreed.  Neither your proposal nor mine operate this way afaict.
>>  
> But this seems to be what Ira needs.
>

In Ira's scenario, the "guest" (x86 host) specifies x86 physical 
addresses, and the ppc dmas to them.  It's the virtio model without any 
change.  A normal guest also specifies physical addresses.

-- 
error compiling committee.c: too many arguments to function

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-14 Thread Gregory Haskins
Michael S. Tsirkin wrote:
> On Mon, Sep 14, 2009 at 12:08:55PM -0400, Gregory Haskins wrote:
>> Michael S. Tsirkin wrote:
>>> On Fri, Sep 11, 2009 at 12:00:21PM -0400, Gregory Haskins wrote:
 FWIW: VBUS handles this situation via the "memctx" abstraction.  IOW,
 the memory is not assumed to be a userspace address.  Rather, it is a
 memctx-specific address, which can be userspace, or any other type
 (including hardware, dma-engine, etc).  As long as the memctx knows how
 to translate it, it will work.
>>> How would permissions be handled?
>> Same as anything else, really.  Read on for details.
>>
>>> it's easy to allow an app to pass in virtual addresses in its own address 
>>> space.
>> Agreed, and this is what I do.
>>
>> The guest always passes its own physical addresses (using things like
>> __pa() in linux).  The address passed is memctx-specific, but generally
>> would fall into the category of "virtual-addresses" from the host's
>> perspective.
>>
>> For a KVM/AlacrityVM guest example, the addresses are GPAs, accessed
>> internally to the context via a gfn_to_hva conversion (you can see this
>> occurring in the citation links I sent)
>>
>> For Ira's example, the addresses would represent a physical address on
>> the PCI boards, and would follow any kind of relevant rules for
>> converting a "GPA" to a host accessible address (even if indirectly, via
>> a dma controller).
> 
> So vbus can let an application

"application" means KVM guest, or ppc board, right?

> access either its own virtual memory or a physical memory on a PCI device.

To reiterate from the last reply: the model is that the "guest" owns the
memory.  The host is granted access to that memory by means of a memctx
object, which must be admitted to the host kernel and accessed according
to standard access-policy mechanisms.  Generally the "application" or
guest would never be accessing anything other than its own memory.

> My question is, is any application
> that's allowed to do the former also granted rights to do the latter?

If I understand your question, no.  Can you elaborate?

Kind Regards,
-Greg



signature.asc
Description: OpenPGP digital signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization

Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-14 Thread Gregory Haskins
Michael S. Tsirkin wrote:
> On Mon, Sep 14, 2009 at 12:08:55PM -0400, Gregory Haskins wrote:
>> For Ira's example, the addresses would represent a physical address on
>> the PCI boards, and would follow any kind of relevant rules for
>> converting a "GPA" to a host accessible address (even if indirectly, via
>> a dma controller).
> 
> I don't think limiting addresses to PCI physical addresses will work
> well.

The only "limit" is imposed by the memctx.  If a given context needs to
meet certain requirements beyond PCI physical addresses, it would
presumably be designed that way.


>  From what I remember, Ira's x86 cannot initiate burst
> transactions on PCI, and it's the ppc that initiates all DMA.

The only requirement is that the "guest" "owns" the memory.  IOW: As
with virtio/vhost, the guest can access the pointers in the ring
directly but the host must pass through a translation function.

Your translation is direct: you use a slots/hva scheme.  My translation
is abstracted, which means it can support slots/hva (such as in
alacrityvm) or some other scheme as long as the general model of "guest
owned" holds true.

> 
>>>  But we can't let the guest specify physical addresses.
>> Agreed.  Neither your proposal nor mine operate this way afaict.
> 
> But this seems to be what Ira needs.

So what he could do then is implement the memctx to integrate with the
ppc side dma controller.  E.g. "translation" in his box means a protocol
from the x86 to the ppc to initiate the dma cycle.  This could be
exposed as a dma facility in the register file of the ppc boards, for
instance.
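
In code terms, the hook for that case might reduce to something like this
(ppc_board_dma() and the types are inventions for illustration only):

#include <linux/types.h>

struct ppc_board;
struct memctx { void *priv; };                          /* hypothetical */
int ppc_board_dma(struct ppc_board *b, void *dst,
                  u64 guest_addr, size_t len);          /* hypothetical */

static int ira_memctx_copy_from(struct memctx *ctx, void *host_dst,
                                u64 guest_addr, size_t len)
{
        struct ppc_board *board = ctx->priv;

        /* "translation" = program a descriptor in the board's register
         * file and wait for the ppc-side DMA engine to complete */
        return ppc_board_dma(board, host_dst, guest_addr, len);
}

The device models above it (venet, virtio-net, etc) would not change.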

To reiterate, as long as the model is such that the ppc boards are
considered the "owner" (direct access, no translation needed) I believe
it will work.  If the pointers are expected to be owned by the host,
then my model doesn't work well either.

Kind Regards,
-Greg



signature.asc
Description: OpenPGP digital signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization

Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-14 Thread Michael S. Tsirkin
On Mon, Sep 14, 2009 at 12:08:55PM -0400, Gregory Haskins wrote:
> Michael S. Tsirkin wrote:
> > On Fri, Sep 11, 2009 at 12:00:21PM -0400, Gregory Haskins wrote:
> >> FWIW: VBUS handles this situation via the "memctx" abstraction.  IOW,
> >> the memory is not assumed to be a userspace address.  Rather, it is a
> >> memctx-specific address, which can be userspace, or any other type
> >> (including hardware, dma-engine, etc).  As long as the memctx knows how
> >> to translate it, it will work.
> > 
> > How would permissions be handled?
> 
> Same as anything else, really.  Read on for details.
> 
> > it's easy to allow an app to pass in virtual addresses in its own address 
> > space.
> 
> Agreed, and this is what I do.
> 
> The guest always passes its own physical addresses (using things like
> __pa() in linux).  The address passed is memctx-specific, but generally
> would fall into the category of "virtual-addresses" from the host's
> perspective.
> 
> For a KVM/AlacrityVM guest example, the addresses are GPAs, accessed
> internally to the context via a gfn_to_hva conversion (you can see this
> occurring in the citation links I sent)
> 
> For Ira's example, the addresses would represent a physical address on
> the PCI boards, and would follow any kind of relevant rules for
> converting a "GPA" to a host accessible address (even if indirectly, via
> a dma controller).

So vbus can let an application access either its own virtual memory or a
physical memory on a PCI device.  My question is, is any application
that's allowed to do the former also granted rights to do the latter?

> >  But we can't let the guest specify physical addresses.
> 
> Agreed.  Neither your proposal nor mine operate this way afaict.
> 
> HTH
> 
> Kind Regards,
> -Greg
> 


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-14 Thread Michael S. Tsirkin
On Mon, Sep 14, 2009 at 12:08:55PM -0400, Gregory Haskins wrote:
> For Ira's example, the addresses would represent a physical address on
> the PCI boards, and would follow any kind of relevant rules for
> converting a "GPA" to a host accessible address (even if indirectly, via
> a dma controller).

I don't think limiting addresses to PCI physical addresses will work
well.  From what I remember, Ira's x86 cannot initiate burst
transactions on PCI, and it's the ppc that initiates all DMA.

> 
> >  But we can't let the guest specify physical addresses.
> 
> Agreed.  Neither your proposal nor mine operate this way afaict.

But this seems to be what Ira needs.

> HTH
> 
> Kind Regards,
> -Greg
> 


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-14 Thread Gregory Haskins
Michael S. Tsirkin wrote:
> On Fri, Sep 11, 2009 at 12:00:21PM -0400, Gregory Haskins wrote:
>> FWIW: VBUS handles this situation via the "memctx" abstraction.  IOW,
>> the memory is not assumed to be a userspace address.  Rather, it is a
>> memctx-specific address, which can be userspace, or any other type
>> (including hardware, dma-engine, etc).  As long as the memctx knows how
>> to translate it, it will work.
> 
> How would permissions be handled?

Same as anything else, really.  Read on for details.

> it's easy to allow an app to pass in virtual addresses in its own address 
> space.

Agreed, and this is what I do.

The guest always passes its own physical addresses (using things like
__pa() in linux).  The address passed is memctx-specific, but generally
would fall into the category of "virtual-addresses" from the host's
perspective.

For a KVM/AlacrityVM guest example, the addresses are GPAs, accessed
internally to the context via a gfn_to_hva conversion (you can see this
occurring in the citation links I sent)

For Ira's example, the addresses would represent a physical address on
the PCI boards, and would follow any kind of relevant rules for
converting a "GPA" to a host accessible address (even if indirectly, via
a dma controller).
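
As a sketch of the guest side (the descriptor layout is hypothetical; the
__pa() usage is the point):

#include <linux/types.h>
#include <asm/page.h>           /* __pa() */

struct ring_desc {              /* hypothetical descriptor layout */
        u64 addr;               /* guest-physical address of the buffer */
        u32 len;
};

static void post_buffer(struct ring_desc *desc, void *buf, u32 len)
{
        desc->addr = __pa(buf); /* guest virtual -> guest physical */
        desc->len  = len;
}

What that address means on the other side is entirely up to the memctx:
gfn_to_hva() for a KVM guest, a DMA descriptor for the PCI boards.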


>  But we can't let the guest specify physical addresses.

Agreed.  Neither your proposal nor mine operate this way afaict.

HTH

Kind Regards,
-Greg



signature.asc
Description: OpenPGP digital signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization

Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-14 Thread Michael S. Tsirkin
On Mon, Sep 14, 2009 at 01:57:06PM +0800, Xin, Xiaohui wrote:
> >The irqfd/ioeventfd patches are part of Avi's kvm.git tree:
> >git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm.git
> >
> >I expect them to be merged by 2.6.32-rc1 - right, Avi?
> 
> Michael,
> 
> I think I have the kernel patch for kvm_irqfd and kvm_ioeventfd, but missed 
> the qemu side patch for irqfd and ioeventfd.
> 
> I hit the following compile errors when compiling virtio-pci.c in qemu-kvm:
> 
> /root/work/vmdq/vhost/qemu-kvm/hw/virtio-pci.c:384: error: `KVM_IRQFD` 
> undeclared (first use in this function)
> /root/work/vmdq/vhost/qemu-kvm/hw/virtio-pci.c:400: error: `KVM_IOEVENTFD` 
> undeclared (first use in this function)
> 
> Which qemu tree or patch do you use for kvm_irqfd and kvm_ioeventfd?

I'm using the headers from upstream kernel.
I'll send a patch for that.

> Thanks
> Xiaohui
> 
> -Original Message-
> From: Michael S. Tsirkin [mailto:m...@redhat.com] 
> Sent: Sunday, September 13, 2009 1:46 PM
> To: Xin, Xiaohui
> Cc: Ira W. Snyder; net...@vger.kernel.org; 
> virtualization@lists.linux-foundation.org; k...@vger.kernel.org; 
> linux-ker...@vger.kernel.org; mi...@elte.hu; linux...@kvack.org; 
> a...@linux-foundation.org; h...@zytor.com; gregory.hask...@gmail.com; Rusty 
> Russell; s.he...@linux-ag.com; a...@redhat.com
> Subject: Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server
> 
> On Fri, Sep 11, 2009 at 11:17:33PM +0800, Xin, Xiaohui wrote:
> > Michael,
> > We are very interested in your patch and want to have a try with it.
> > I have collected your 3 patches on the kernel side and 4 patches on the qemu side.
> > The patches are listed here:
> > 
> > PATCHv5-1-3-mm-export-use_mm-unuse_mm-to-modules.patch
> > PATCHv5-2-3-mm-reduce-atomic-use-on-use_mm-fast-path.patch
> > PATCHv5-3-3-vhost_net-a-kernel-level-virtio-server.patch
> > 
> > PATCHv3-1-4-qemu-kvm-move-virtio-pci[1].o-to-near-pci.o.patch
> > PATCHv3-2-4-virtio-move-features-to-an-inline-function.patch
> > PATCHv3-3-4-qemu-kvm-vhost-net-implementation.patch
> > PATCHv3-4-4-qemu-kvm-add-compat-eventfd.patch
> > 
> > I applied the kernel patches on v2.6.31-rc4 and the qemu patches on latest 
> > kvm qemu.
> > But it seems some additional patches are needed on current qemu, at least
> > the irqfd and ioeventfd patches.  I cannot create a kvm guest with "-net
> > nic,model=virtio,vhost=vethX".
> > 
> > Could you kindly advise us of the exact patch list needed to make it work?
> > Thanks a lot. :-)
> > 
> > Thanks
> > Xiaohui
> 
> 
> The irqfd/ioeventfd patches are part of Avi's kvm.git tree:
> git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm.git
> 
> I expect them to be merged by 2.6.32-rc1 - right, Avi?
> 
> -- 
> MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


RE: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-13 Thread Xin, Xiaohui
>The irqfd/ioeventfd patches are part of Avi's kvm.git tree:
>git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm.git
>
>I expect them to be merged by 2.6.32-rc1 - right, Avi?

Michael,

I think I have the kernel patch for kvm_irqfd and kvm_ioeventfd, but missed the 
qemu side patch for irqfd and ioeventfd.

I hit the following compile errors when compiling virtio-pci.c in qemu-kvm:

/root/work/vmdq/vhost/qemu-kvm/hw/virtio-pci.c:384: error: `KVM_IRQFD` 
undeclared (first use in this function)
/root/work/vmdq/vhost/qemu-kvm/hw/virtio-pci.c:400: error: `KVM_IOEVENTFD` 
undeclared (first use in this function)

Which qemu tree or patch do you use for kvm_irqfd and kvm_ioeventfd?

Thanks
Xiaohui

-Original Message-
From: Michael S. Tsirkin [mailto:m...@redhat.com] 
Sent: Sunday, September 13, 2009 1:46 PM
To: Xin, Xiaohui
Cc: Ira W. Snyder; net...@vger.kernel.org; 
virtualization@lists.linux-foundation.org; k...@vger.kernel.org; 
linux-ker...@vger.kernel.org; mi...@elte.hu; linux...@kvack.org; 
a...@linux-foundation.org; h...@zytor.com; gregory.hask...@gmail.com; Rusty 
Russell; s.he...@linux-ag.com; a...@redhat.com
Subject: Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

On Fri, Sep 11, 2009 at 11:17:33PM +0800, Xin, Xiaohui wrote:
> Michael,
> We are very interested in your patch and want to have a try with it.
> I have collected your 3 patches on the kernel side and 4 patches on the qemu side.
> The patches are listed here:
> 
> PATCHv5-1-3-mm-export-use_mm-unuse_mm-to-modules.patch
> PATCHv5-2-3-mm-reduce-atomic-use-on-use_mm-fast-path.patch
> PATCHv5-3-3-vhost_net-a-kernel-level-virtio-server.patch
> 
> PATCHv3-1-4-qemu-kvm-move-virtio-pci[1].o-to-near-pci.o.patch
> PATCHv3-2-4-virtio-move-features-to-an-inline-function.patch
> PATCHv3-3-4-qemu-kvm-vhost-net-implementation.patch
> PATCHv3-4-4-qemu-kvm-add-compat-eventfd.patch
> 
> I applied the kernel patches on v2.6.31-rc4 and the qemu patches on latest 
> kvm qemu.
> But it seems some additional patches are needed on current qemu, at least
> the irqfd and ioeventfd patches.  I cannot create a kvm guest with "-net
> nic,model=virtio,vhost=vethX".
> 
> Could you kindly advise us of the exact patch list needed to make it work?
> Thanks a lot. :-)
> 
> Thanks
> Xiaohui


The irqfd/ioeventfd patches are part of Avi's kvm.git tree:
git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm.git

I expect them to be merged by 2.6.32-rc1 - right, Avi?

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-13 Thread Michael S. Tsirkin
On Fri, Sep 11, 2009 at 12:00:21PM -0400, Gregory Haskins wrote:
> FWIW: VBUS handles this situation via the "memctx" abstraction.  IOW,
> the memory is not assumed to be a userspace address.  Rather, it is a
> memctx-specific address, which can be userspace, or any other type
> (including hardware, dma-engine, etc).  As long as the memctx knows how
> to translate it, it will work.

How would permissions be handled? it's easy to allow an app to pass in
virtual addresses in its own address space.  But we can't let the guest
specify physical addresses.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-12 Thread Michael S. Tsirkin
On Fri, Sep 11, 2009 at 11:17:33PM +0800, Xin, Xiaohui wrote:
> Michael,
> We are very interested in your patch and want to have a try with it.
> I have collected your 3 patches on the kernel side and 4 patches on the qemu side.
> The patches are listed here:
> 
> PATCHv5-1-3-mm-export-use_mm-unuse_mm-to-modules.patch
> PATCHv5-2-3-mm-reduce-atomic-use-on-use_mm-fast-path.patch
> PATCHv5-3-3-vhost_net-a-kernel-level-virtio-server.patch
> 
> PATCHv3-1-4-qemu-kvm-move-virtio-pci[1].o-to-near-pci.o.patch
> PATCHv3-2-4-virtio-move-features-to-an-inline-function.patch
> PATCHv3-3-4-qemu-kvm-vhost-net-implementation.patch
> PATCHv3-4-4-qemu-kvm-add-compat-eventfd.patch
> 
> I applied the kernel patches on v2.6.31-rc4 and the qemu patches on latest 
> kvm qemu.
> But it seems some additional patches are needed on current qemu, at least
> the irqfd and ioeventfd patches.  I cannot create a kvm guest with "-net
> nic,model=virtio,vhost=vethX".
> 
> Could you kindly advise us of the exact patch list needed to make it work?
> Thanks a lot. :-)
> 
> Thanks
> Xiaohui


The irqfd/ioeventfd patches are part of Avi's kvm.git tree:
git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm.git

I expect them to be merged by 2.6.32-rc1 - right, Avi?

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-11 Thread Gregory Haskins
Ira W. Snyder wrote:
> On Mon, Sep 07, 2009 at 01:15:37PM +0300, Michael S. Tsirkin wrote:
>> On Thu, Sep 03, 2009 at 11:39:45AM -0700, Ira W. Snyder wrote:
>>> On Thu, Aug 27, 2009 at 07:07:50PM +0300, Michael S. Tsirkin wrote:
 What it is: vhost net is a character device that can be used to reduce
 the number of system calls involved in virtio networking.
 Existing virtio net code is used in the guest without modification.

 There's similarity with vringfd, with some differences and reduced scope
 - uses eventfd for signalling
 - structures can be moved around in memory at any time (good for migration)
 - support memory table and not just an offset (needed for kvm)

 common virtio related code has been put in a separate file vhost.c and
 can be made into a separate module if/when more backends appear.  I used
 Rusty's lguest.c as the source for developing this part : this supplied
 me with witty comments I wouldn't be able to write myself.

 What it is not: vhost net is not a bus, and not a generic new system
 call. No assumptions are made on how guest performs hypercalls.
 Userspace hypervisors are supported as well as kvm.

 How it works: Basically, we connect virtio frontend (configured by
 userspace) to a backend. The backend could be a network device, or a
 tun-like device. In this version I only support raw socket as a backend,
 which can be bound to e.g. SR IOV, or to macvlan device.  Backend is
 also configured by userspace, including vlan/mac etc.

 Status:
 This works for me, and I haven't seen any crashes.
 I have done some light benchmarking (with v4), compared to userspace, I
 see improved latency (as I save up to 4 system calls per packet) but not
 bandwidth/CPU (as TSO and interrupt mitigation are not supported).  For
 ping benchmark (where there's no TSO) throughput is also improved.

 Features that I plan to look at in the future:
 - tap support
 - TSO
 - interrupt mitigation
 - zero copy

>>> Hello Michael,
>>>
>>> I've started looking at vhost with the intention of using it over PCI to
>>> connect physical machines together.
>>>
>>> The part that I am struggling with the most is figuring out which parts
>>> of the rings are in the host's memory, and which parts are in the
>>> guest's memory.
>> All rings are in guest's memory, to match existing virtio code.
> 
> Ok, this makes sense.
> 
>> vhost
>> assumes that the memory space of the hypervisor userspace process covers
>> the whole of guest memory.
> 
> Is this necessary? Why? The assumption seems very wrong when you're
> doing data transport between two physical systems via PCI.

FWIW: VBUS handles this situation via the "memctx" abstraction.  IOW,
the memory is not assumed to be a userspace address.  Rather, it is a
memctx-specific address, which can be userspace, or any other type
(including hardware, dma-engine, etc).  As long as the memctx knows how
to translate it, it will work.

Kind Regards,
-Greg



signature.asc
Description: OpenPGP digital signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization

Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-11 Thread Gregory Haskins
Gregory Haskins wrote:

[snip]

> 
> FWIW: VBUS handles this situation via the "memctx" abstraction.  IOW,
> the memory is not assumed to be a userspace address.  Rather, it is a
> memctx-specific address, which can be userspace, or any other type
> (including hardware, dma-engine, etc).  As long as the memctx knows how
> to translate it, it will work.
> 

citations:

Here is a packet import (from the perspective of the host side "venet"
device model, similar to Michael's "vhost")

http://git.kernel.org/?p=linux/kernel/git/ghaskins/alacrityvm/linux-2.6.git;a=blob;f=kernel/vbus/devices/venet-tap.c;h=ee091c47f06e9bb8487a45e72d493273fe08329f;hb=ded8ce2005a85c174ba93ee26f8d67049ef11025#l535

Here is the KVM specific memctx:

http://git.kernel.org/?p=linux/kernel/git/ghaskins/alacrityvm/linux-2.6.git;a=blob;f=kernel/vbus/kvm.c;h=56e2c5682a7ca8432c159377b0f7389cf34cbc1b;hb=ded8ce2005a85c174ba93ee26f8d67049ef11025#l188

and

http://git.kernel.org/?p=linux/kernel/git/ghaskins/alacrityvm/linux-2.6.git;a=blob;f=virt/kvm/xinterface.c;h=0cccb6095ca2a51bad01f7ba2137fdd9111b63d3;hb=ded8ce2005a85c174ba93ee26f8d67049ef11025#l289

You could alternatively define a memctx for your environment which knows
how to deal with your PPC board's PCI-based memory, and the devices would
all "just work".

Kind Regards,
-Greg




signature.asc
Description: OpenPGP digital signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization

RE: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-11 Thread Xin, Xiaohui
Michael,
We are very interested in your patch and want to have a try with it.
I have collected your 3 patches on the kernel side and 4 patches on the qemu side.
The patches are listed here:

PATCHv5-1-3-mm-export-use_mm-unuse_mm-to-modules.patch
PATCHv5-2-3-mm-reduce-atomic-use-on-use_mm-fast-path.patch
PATCHv5-3-3-vhost_net-a-kernel-level-virtio-server.patch

PATCHv3-1-4-qemu-kvm-move-virtio-pci[1].o-to-near-pci.o.patch
PATCHv3-2-4-virtio-move-features-to-an-inline-function.patch
PATCHv3-3-4-qemu-kvm-vhost-net-implementation.patch
PATCHv3-4-4-qemu-kvm-add-compat-eventfd.patch

I applied the kernel patches on v2.6.31-rc4 and the qemu patches on latest kvm 
qemu.
But it seems some additional patches are needed on current qemu, at least
the irqfd and ioeventfd patches.  I cannot create a kvm guest with "-net
nic,model=virtio,vhost=vethX".

Could you kindly advise us of the exact patch list needed to make it work?
Thanks a lot. :-)

Thanks
Xiaohui
-Original Message-
From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of 
Michael S. Tsirkin
Sent: Wednesday, September 09, 2009 4:14 AM
To: Ira W. Snyder
Cc: net...@vger.kernel.org; virtualization@lists.linux-foundation.org; 
k...@vger.kernel.org; linux-ker...@vger.kernel.org; mi...@elte.hu; 
linux...@kvack.org; a...@linux-foundation.org; h...@zytor.com; 
gregory.hask...@gmail.com; Rusty Russell; s.he...@linux-ag.com
Subject: Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

On Tue, Sep 08, 2009 at 10:20:35AM -0700, Ira W. Snyder wrote:
> On Mon, Sep 07, 2009 at 01:15:37PM +0300, Michael S. Tsirkin wrote:
> > On Thu, Sep 03, 2009 at 11:39:45AM -0700, Ira W. Snyder wrote:
> > > On Thu, Aug 27, 2009 at 07:07:50PM +0300, Michael S. Tsirkin wrote:
> > > > What it is: vhost net is a character device that can be used to reduce
> > > > the number of system calls involved in virtio networking.
> > > > Existing virtio net code is used in the guest without modification.
> > > > 
> > > > There's similarity with vringfd, with some differences and reduced scope
> > > > - uses eventfd for signalling
> > > > - structures can be moved around in memory at any time (good for 
> > > > migration)
> > > > - support memory table and not just an offset (needed for kvm)
> > > > 
> > > > common virtio related code has been put in a separate file vhost.c and
> > > > can be made into a separate module if/when more backends appear.  I used
> > > > Rusty's lguest.c as the source for developing this part : this supplied
> > > > me with witty comments I wouldn't be able to write myself.
> > > > 
> > > > What it is not: vhost net is not a bus, and not a generic new system
> > > > call. No assumptions are made on how guest performs hypercalls.
> > > > Userspace hypervisors are supported as well as kvm.
> > > > 
> > > > How it works: Basically, we connect virtio frontend (configured by
> > > > userspace) to a backend. The backend could be a network device, or a
> > > > tun-like device. In this version I only support raw socket as a backend,
> > > > which can be bound to e.g. SR IOV, or to macvlan device.  Backend is
> > > > also configured by userspace, including vlan/mac etc.
> > > > 
> > > > Status:
> > > > This works for me, and I haven't seen any crashes.
> > > > I have done some light benchmarking (with v4), compared to userspace, I
> > > > see improved latency (as I save up to 4 system calls per packet) but not
> > > > bandwidth/CPU (as TSO and interrupt mitigation are not supported).  For
> > > > ping benchmark (where there's no TSO) throughput is also improved.
> > > > 
> > > > Features that I plan to look at in the future:
> > > > - tap support
> > > > - TSO
> > > > - interrupt mitigation
> > > > - zero copy
> > > > 
> > > 
> > > Hello Michael,
> > > 
> > > I've started looking at vhost with the intention of using it over PCI to
> > > connect physical machines together.
> > > 
> > > The part that I am struggling with the most is figuring out which parts
> > > of the rings are in the host's memory, and which parts are in the
> > > guest's memory.
> > 
> > All rings are in guest's memory, to match existing virtio code.
> 
> Ok, this makes sense.
> 
> > vhost
> > assumes that the memory space of the hypervisor userspace process covers
> > the whole of guest memory.
> 
> Is this necessary? Why?

Because with virtio ring can give 

Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-08 Thread Michael S. Tsirkin
On Tue, Sep 08, 2009 at 10:20:35AM -0700, Ira W. Snyder wrote:
> On Mon, Sep 07, 2009 at 01:15:37PM +0300, Michael S. Tsirkin wrote:
> > On Thu, Sep 03, 2009 at 11:39:45AM -0700, Ira W. Snyder wrote:
> > > On Thu, Aug 27, 2009 at 07:07:50PM +0300, Michael S. Tsirkin wrote:
> > > > What it is: vhost net is a character device that can be used to reduce
> > > > the number of system calls involved in virtio networking.
> > > > Existing virtio net code is used in the guest without modification.
> > > > 
> > > > There's similarity with vringfd, with some differences and reduced scope
> > > > - uses eventfd for signalling
> > > > - structures can be moved around in memory at any time (good for 
> > > > migration)
> > > > - support memory table and not just an offset (needed for kvm)
> > > > 
> > > > common virtio related code has been put in a separate file vhost.c and
> > > > can be made into a separate module if/when more backends appear.  I used
> > > > Rusty's lguest.c as the source for developing this part : this supplied
> > > > me with witty comments I wouldn't be able to write myself.
> > > > 
> > > > What it is not: vhost net is not a bus, and not a generic new system
> > > > call. No assumptions are made on how guest performs hypercalls.
> > > > Userspace hypervisors are supported as well as kvm.
> > > > 
> > > > How it works: Basically, we connect virtio frontend (configured by
> > > > userspace) to a backend. The backend could be a network device, or a
> > > > tun-like device. In this version I only support raw socket as a backend,
> > > > which can be bound to e.g. SR IOV, or to macvlan device.  Backend is
> > > > also configured by userspace, including vlan/mac etc.
> > > > 
> > > > Status:
> > > > This works for me, and I haven't seen any crashes.
> > > > I have done some light benchmarking (with v4), compared to userspace, I
> > > > see improved latency (as I save up to 4 system calls per packet) but not
> > > > bandwidth/CPU (as TSO and interrupt mitigation are not supported).  For
> > > > ping benchmark (where there's no TSO) throughput is also improved.
> > > > 
> > > > Features that I plan to look at in the future:
> > > > - tap support
> > > > - TSO
> > > > - interrupt mitigation
> > > > - zero copy
> > > > 
> > > 
> > > Hello Michael,
> > > 
> > > I've started looking at vhost with the intention of using it over PCI to
> > > connect physical machines together.
> > > 
> > > The part that I am struggling with the most is figuring out which parts
> > > of the rings are in the host's memory, and which parts are in the
> > > guest's memory.
> > 
> > All rings are in guest's memory, to match existing virtio code.
> 
> Ok, this makes sense.
> 
> > vhost
> > assumes that the memory space of the hypervisor userspace process covers
> > the whole of guest memory.
> 
> Is this necessary? Why?

Because with virtio, the ring can give us arbitrary guest addresses.  If
the guest were limited to using a subset of addresses, the hypervisor
would only have to map those.

> The assumption seems very wrong when you're
> doing data transport between two physical systems via PCI.
> I know vhost has not been designed for this specific situation, but it
> is good to be looking toward other possible uses.
> 
> > And there's a translation table.
> > Ring addresses are userspace addresses, they do not undergo translation.
> > 
> > > If I understand everything correctly, the rings are all userspace
> > > addresses, which means that they can be moved around in physical memory,
> > > and get pushed out to swap.
> > 
> > Unless they are locked, yes.
> > 
> > > AFAIK, this is impossible to handle when
> > > connecting two physical systems, you'd need the rings available in IO
> > > memory (PCI memory), so you can ioreadXX() them instead. To the best of
> > > my knowledge, I shouldn't be using copy_to_user() on an __iomem address.
> > > Also, having them migrate around in memory would be a bad thing.
> > > 
> > > Also, I'm having trouble figuring out how the packet contents are
> > > actually copied from one system to the other. Could you point this out
> > > for me?
> > 
> > The code in net/packet/af_packet.c does it when vhost calls sendmsg.
> > 
> 
> Ok. The sendmsg() implementation uses memcpy_fromiovec(). Is it possible
> to make this use a DMA engine instead?

Maybe.

> I know this was suggested in an earlier thread.

Yes, it might even give some performance benefit with e.g. I/O AT.
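
Sketched very loosely (the helpers below are invented stand-ins, not the
dmaengine API), the change would be to queue the copy and complete the ring
entry from a callback instead of copying inline:

#include <linux/types.h>
#include <linux/uio.h>

struct dma_copy_chan;                                   /* invented */
struct dma_copy_desc;                                   /* invented */
struct dma_copy_desc *dma_copy_submit(struct dma_copy_chan *c, void *dst,
                                      void *src, size_t len);
void dma_copy_on_complete(struct dma_copy_desc *d,
                          void (*done)(void *), void *cookie);

static void queue_rx_copy(struct dma_copy_chan *chan, void *kbuf,
                          struct iovec *iov, size_t len,
                          void (*done)(void *), void *cookie)
{
        /* replaces the synchronous memcpy_fromiovec() done in sendmsg() */
        struct dma_copy_desc *d = dma_copy_submit(chan, kbuf,
                                                  iov->iov_base, len);
        dma_copy_on_complete(d, done, cookie);
}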

> > > Is there somewhere I can find the userspace code (kvm, qemu, lguest,
> > > etc.) code needed for interacting with the vhost misc device so I can
> > > get a better idea of how userspace is supposed to work?
> > 
> > Look in archives for k...@vger.kernel.org. the subject is qemu-kvm: vhost 
> > net.
> > 
> > > (Features
> > > negotiation, etc.)
> > > 
> > 
> > That's not yet implemented as there are no features yet.  I'm working on
> > tap support, which will add a feature bit.  Overall, qemu does an ioctl
> > to query supported features, and then a

Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-08 Thread Ira W. Snyder
On Mon, Sep 07, 2009 at 01:15:37PM +0300, Michael S. Tsirkin wrote:
> On Thu, Sep 03, 2009 at 11:39:45AM -0700, Ira W. Snyder wrote:
> > On Thu, Aug 27, 2009 at 07:07:50PM +0300, Michael S. Tsirkin wrote:
> > > What it is: vhost net is a character device that can be used to reduce
> > > the number of system calls involved in virtio networking.
> > > Existing virtio net code is used in the guest without modification.
> > > 
> > > There's similarity with vringfd, with some differences and reduced scope
> > > - uses eventfd for signalling
> > > - structures can be moved around in memory at any time (good for 
> > > migration)
> > > - support memory table and not just an offset (needed for kvm)
> > > 
> > > common virtio related code has been put in a separate file vhost.c and
> > > can be made into a separate module if/when more backends appear.  I used
> > > Rusty's lguest.c as the source for developing this part : this supplied
> > > me with witty comments I wouldn't be able to write myself.
> > > 
> > > What it is not: vhost net is not a bus, and not a generic new system
> > > call. No assumptions are made on how guest performs hypercalls.
> > > Userspace hypervisors are supported as well as kvm.
> > > 
> > > How it works: Basically, we connect virtio frontend (configured by
> > > userspace) to a backend. The backend could be a network device, or a
> > > tun-like device. In this version I only support raw socket as a backend,
> > > which can be bound to e.g. SR IOV, or to macvlan device.  Backend is
> > > also configured by userspace, including vlan/mac etc.
> > > 
> > > Status:
> > > This works for me, and I haven't seen any crashes.
> > > I have done some light benchmarking (with v4), compared to userspace, I
> > > see improved latency (as I save up to 4 system calls per packet) but not
> > > bandwidth/CPU (as TSO and interrupt mitigation are not supported).  For
> > > ping benchmark (where there's no TSO) throughput is also improved.
> > > 
> > > Features that I plan to look at in the future:
> > > - tap support
> > > - TSO
> > > - interrupt mitigation
> > > - zero copy
> > > 
> > 
> > Hello Michael,
> > 
> > I've started looking at vhost with the intention of using it over PCI to
> > connect physical machines together.
> > 
> > The part that I am struggling with the most is figuring out which parts
> > of the rings are in the host's memory, and which parts are in the
> > guest's memory.
> 
> All rings are in guest's memory, to match existing virtio code.

Ok, this makes sense.

> vhost
> assumes that the memory space of the hypervisor userspace process covers
> the whole of guest memory.

Is this necessary? Why? The assumption seems very wrong when you're
doing data transport between two physical systems via PCI.

I know vhost has not been designed for this specific situation, but it
is good to be looking toward other possible uses.

> And there's a translation table.
> Ring addresses are userspace addresses, they do not undergo translation.
> 
> > If I understand everything correctly, the rings are all userspace
> > addresses, which means that they can be moved around in physical memory,
> > and get pushed out to swap.
> 
> Unless they are locked, yes.
> 
> > AFAIK, this is impossible to handle when
> > connecting two physical systems, you'd need the rings available in IO
> > memory (PCI memory), so you can ioreadXX() them instead. To the best of
> > my knowledge, I shouldn't be using copy_to_user() on an __iomem address.
> > Also, having them migrate around in memory would be a bad thing.
> > 
> > Also, I'm having trouble figuring out how the packet contents are
> > actually copied from one system to the other. Could you point this out
> > for me?
> 
> The code in net/packet/af_packet.c does it when vhost calls sendmsg.
> 

Ok. The sendmsg() implementation uses memcpy_fromiovec(). Is it possible
to make this use a DMA engine instead? I know this was suggested in an
earlier thread.

> > Is there somewhere I can find the userspace code (kvm, qemu, lguest,
> > etc.) code needed for interacting with the vhost misc device so I can
> > get a better idea of how userspace is supposed to work?
> 
> Look in archives for k...@vger.kernel.org. the subject is qemu-kvm: vhost net.
> 
> > (Features
> > negotiation, etc.)
> > 
> 
> That's not yet implemented as there are no features yet.  I'm working on
> tap support, which will add a feature bit.  Overall, qemu does an ioctl
> to query supported features, and then acks them with another ioctl.  I'm
> also trying to avoid duplicating functionality available elsewhere.  So
> that to check e.g. TSO support, you'd just look at the underlying
> hardware device you are binding to.
> 

Ok. Do you have plans to support the VIRTIO_NET_F_MRG_RXBUF feature in
the future? I found that this made an enormous improvement in throughput
on my virtio-net <-> virtio-net system. Perhaps it isn't needed with
vhost-net.

Thanks for replying,
Ira
___

Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-07 Thread Michael S. Tsirkin
On Thu, Sep 03, 2009 at 11:39:45AM -0700, Ira W. Snyder wrote:
> On Thu, Aug 27, 2009 at 07:07:50PM +0300, Michael S. Tsirkin wrote:
> > What it is: vhost net is a character device that can be used to reduce
> > the number of system calls involved in virtio networking.
> > Existing virtio net code is used in the guest without modification.
> > 
> > There's similarity with vringfd, with some differences and reduced scope
> > - uses eventfd for signalling
> > - structures can be moved around in memory at any time (good for migration)
> > - support memory table and not just an offset (needed for kvm)
> > 
> > common virtio related code has been put in a separate file vhost.c and
> > can be made into a separate module if/when more backends appear.  I used
> > Rusty's lguest.c as the source for developing this part : this supplied
> > me with witty comments I wouldn't be able to write myself.
> > 
> > What it is not: vhost net is not a bus, and not a generic new system
> > call. No assumptions are made on how guest performs hypercalls.
> > Userspace hypervisors are supported as well as kvm.
> > 
> > How it works: Basically, we connect virtio frontend (configured by
> > userspace) to a backend. The backend could be a network device, or a
> > tun-like device. In this version I only support raw socket as a backend,
> > which can be bound to e.g. SR IOV, or to macvlan device.  Backend is
> > also configured by userspace, including vlan/mac etc.
> > 
> > Status:
> > This works for me, and I haven't see any crashes.
> > I have done some light benchmarking (with v4), compared to userspace, I
> > see improved latency (as I save up to 4 system calls per packet) but not
> > bandwidth/CPU (as TSO and interrupt mitigation are not supported).  For
> > ping benchmark (where there's no TSO) troughput is also improved.
> > 
> > Features that I plan to look at in the future:
> > - tap support
> > - TSO
> > - interrupt mitigation
> > - zero copy
> > 
> 
> Hello Michael,
> 
> I've started looking at vhost with the intention of using it over PCI to
> connect physical machines together.
> 
> The part that I am struggling with the most is figuring out which parts
> of the rings are in the host's memory, and which parts are in the
> guest's memory.

All rings are in guest's memory, to match existing virtio code.  vhost
assumes that the memory space of the hypervisor userspace process covers
the whole of guest memory. And there's a translation table.
Ring addresses are userspace addresses, they do not undergo translation.
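
Roughly, a translation-table entry looks like this (simplified -- see
include/linux/vhost.h in the patch for the exact layout):

#include <linux/types.h>

struct vhost_memory_region {
        __u64 guest_phys_addr;  /* GPA as the guest sees it             */
        __u64 memory_size;      /* length of the region in bytes        */
        __u64 userspace_addr;   /* where the same bytes live in the
                                 * hypervisor userspace process         */
};

/* For a data buffer, GPA -> HVA is then just
 *      hva = userspace_addr + (gpa - guest_phys_addr)
 * for the region containing gpa; ring addresses skip this step and are
 * used as userspace addresses directly. */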

> If I understand everything correctly, the rings are all userspace
> addresses, which means that they can be moved around in physical memory,
> and get pushed out to swap.

Unless they are locked, yes.

> AFAIK, this is impossible to handle when
> connecting two physical systems, you'd need the rings available in IO
> memory (PCI memory), so you can ioreadXX() them instead. To the best of
> my knowledge, I shouldn't be using copy_to_user() on an __iomem address.
> Also, having them migrate around in memory would be a bad thing.
> 
> Also, I'm having trouble figuring out how the packet contents are
> actually copied from one system to the other. Could you point this out
> for me?

The code in net/packet/af_packet.c does it when vhost calls sendmsg.

> Is there somewhere I can find the userspace code (kvm, qemu, lguest,
> etc.) code needed for interacting with the vhost misc device so I can
> get a better idea of how userspace is supposed to work?

Look in archives for k...@vger.kernel.org. the subject is qemu-kvm: vhost net.

> (Features
> negotiation, etc.)
> 
> Thanks,
> Ira

That's not yet implemented as there are no features yet.  I'm working on
tap support, which will add a feature bit.  Overall, qemu does an ioctl
to query supported features, and then acks them with another ioctl.  I'm
also trying to avoid duplicating functionality available elsewhere.  So
that to check e.g. TSO support, you'd just look at the underlying
hardware device you are binding to.
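
In other words, the intended flow on the qemu side is something like the
following (the ioctl names are placeholders -- as noted, feature bits do
not exist in this version yet):

#include <sys/ioctl.h>
#include <linux/types.h>

/* placeholder ioctls and mask; nothing here is in the v5 patch */
static int negotiate_features(int vhost_fd, __u64 qemu_supported)
{
        __u64 features;

        if (ioctl(vhost_fd, VHOST_GET_FEATURES, &features) < 0)
                return -1;
        features &= qemu_supported;     /* keep only what qemu handles */
        return ioctl(vhost_fd, VHOST_SET_FEATURES, &features);
}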

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-03 Thread Ira W. Snyder
On Thu, Aug 27, 2009 at 07:07:50PM +0300, Michael S. Tsirkin wrote:
> What it is: vhost net is a character device that can be used to reduce
> the number of system calls involved in virtio networking.
> Existing virtio net code is used in the guest without modification.
> 
> There's similarity with vringfd, with some differences and reduced scope
> - uses eventfd for signalling
> - structures can be moved around in memory at any time (good for migration)
> - supports a memory table and not just an offset (needed for kvm)
> 
> common virtio related code has been put in a separate file vhost.c and
> can be made into a separate module if/when more backends appear.  I used
> Rusty's lguest.c as the source for developing this part : this supplied
> me with witty comments I wouldn't be able to write myself.
> 
> What it is not: vhost net is not a bus, and not a generic new system
> call. No assumptions are made on how guest performs hypercalls.
> Userspace hypervisors are supported as well as kvm.
> 
> How it works: Basically, we connect virtio frontend (configured by
> userspace) to a backend. The backend could be a network device, or a
> tun-like device. In this version I only support raw socket as a backend,
> which can be bound to e.g. SR IOV, or to macvlan device.  Backend is
> also configured by userspace, including vlan/mac etc.
> 
> Status:
> This works for me, and I haven't seen any crashes.
> I have done some light benchmarking (with v4); compared to userspace, I
> see improved latency (as I save up to 4 system calls per packet) but not
> bandwidth/CPU (as TSO and interrupt mitigation are not supported).  For the
> ping benchmark (where there's no TSO), throughput is also improved.
> 
> Features that I plan to look at in the future:
> - tap support
> - TSO
> - interrupt mitigation
> - zero copy
> 

Hello Michael,

I've started looking at vhost with the intention of using it over PCI to
connect physical machines together.

The part that I am struggling with the most is figuring out which parts
of the rings are in the host's memory, and which parts are in the
guest's memory.

If I understand everything correctly, the rings are all userspace
addresses, which means that they can be moved around in physical memory,
and get pushed out to swap. AFAIK, this is impossible to handle when
connecting two physical systems, you'd need the rings available in IO
memory (PCI memory), so you can ioreadXX() them instead. To the best of
my knowledge, I shouldn't be using copy_to_user() on an __iomem address.
Also, having them migrate around in memory would be a bad thing.

Also, I'm having trouble figuring out how the packet contents are
actually copied from one system to the other. Could you point this out
for me?

Is there somewhere I can find the userspace code (kvm, qemu, lguest,
etc.) needed for interacting with the vhost misc device so I can
get a better idea of how userspace is supposed to work? (Features
negotiation, etc.)

Thanks,
Ira

> Acked-by: Arnd Bergmann 
> Signed-off-by: Michael S. Tsirkin 
> 
> ---
>  MAINTAINERS|   10 +
>  arch/x86/kvm/Kconfig   |1 +
>  drivers/Makefile   |1 +
>  drivers/vhost/Kconfig  |   11 +
>  drivers/vhost/Makefile |2 +
>  drivers/vhost/net.c|  475 ++
>  drivers/vhost/vhost.c  |  688 
>  drivers/vhost/vhost.h  |  122 
>  include/linux/Kbuild   |1 +
>  include/linux/miscdevice.h |1 +
>  include/linux/vhost.h  |  101 +++
>  11 files changed, 1413 insertions(+), 0 deletions(-)
>  create mode 100644 drivers/vhost/Kconfig
>  create mode 100644 drivers/vhost/Makefile
>  create mode 100644 drivers/vhost/net.c
>  create mode 100644 drivers/vhost/vhost.c
>  create mode 100644 drivers/vhost/vhost.h
>  create mode 100644 include/linux/vhost.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index b1114cf..de4587f 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -5431,6 +5431,16 @@ S: Maintained
>  F:   Documentation/filesystems/vfat.txt
>  F:   fs/fat/
>  
> +VIRTIO HOST (VHOST)
> +P:   Michael S. Tsirkin
> +M:   m...@redhat.com
> +L:   k...@vger.kernel.org
> +L:   virtualizat...@lists.osdl.org
> +L:   net...@vger.kernel.org
> +S:   Maintained
> +F:   drivers/vhost/
> +F:   include/linux/vhost.h
> +
>  VIA RHINE NETWORK DRIVER
>  M:   Roger Luethi 
>  S:   Maintained
> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> index b84e571..94f44d9 100644
> --- a/arch/x86/kvm/Kconfig
> +++ b/arch/x86/kvm/Kconfig
> @@ -64,6 +64,7 @@ config KVM_AMD
>  
>  # OK, it's a little counter-intuitive to do this, but it puts it neatly under
>  # the virtualization menu.
> +source drivers/vhost/Kconfig
>  source drivers/lguest/Kconfig
>  source drivers/virtio/Kconfig
>  
> diff --git a/drivers/Makefile b/drivers/Makefile
> index bc4205d..1551ae1 100644
> --- a/drivers/Makefile
> +++ b/drivers/Makefile
> @@ -105,6 +105,7 @@ obj-$(CONFIG_HID)

RE: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-01 Thread Xin, Xiaohui
Hi, Michael
That's a great job. We are now working on supporting VMDq on KVM, and since the 
VMDq hardware presents L2 sorting based on MAC addresses and VLAN tags, our 
target is to implement a zero copy solution using VMDq. We started from the 
virtio-net architecture. What we want to propose is to use AIO combined with 
direct I/O:
1) Modify the virtio-net backend service in Qemu to submit aio requests composed 
from the virtqueue (see the sketch after this list).
2) Modify the TUN/TAP device to support aio operations and map the user space 
buffer directly into the host kernel.
3) Let a TUN/TAP device bind to a single rx/tx queue of the NIC.
4) Modify the net_dev and skb structures to permit an allocated skb to use a user 
space directly mapped payload buffer address rather than a kernel allocated one.
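
To make step 1 concrete, here is a loose sketch (not taken from any of the
patches discussed here) of submitting an iovec gathered from a virtqueue
descriptor chain as a single asynchronous write through the Linux libaio
interface.  backend_fd and the iovec are placeholders, and a plain tap fd
does not accept such aio today, which is exactly what step 2 proposes to
change.  Link with -laio; the caller sets up the context once with
io_setup() and reaps completions with io_getevents().

/* Illustrative only: queue one vectored write covering the guest buffers. */
#include <libaio.h>
#include <sys/uio.h>

int submit_tx(io_context_t ctx, int backend_fd, struct iovec *iov, int iovcnt)
{
	struct iocb cb;
	struct iocb *cbs[1] = { &cb };

	/* Describe one asynchronous writev of the whole descriptor chain. */
	io_prep_pwritev(&cb, backend_fd, iov, iovcnt, 0);
	/* Queue it; completion is reaped later with io_getevents(). */
	return io_submit(ctx, 1, cbs);
}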

As zero copy is also your goal, we are interested in what you have in mind, and 
would like to collaborate with you if possible.
BTW, we will send our VMDq write-up very soon.

Thanks
Xiaohui

-Original Message-
From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of 
Michael S. Tsirkin
Sent: Wednesday, August 19, 2009 11:03 PM
To: net...@vger.kernel.org; virtualization@lists.linux-foundation.org; 
k...@vger.kernel.org; linux-ker...@vger.kernel.org; mi...@elte.hu; 
linux...@kvack.org; a...@linux-foundation.org; h...@zytor.com; 
gregory.hask...@gmail.com
Subject: [PATCHv4 2/2] vhost_net: a kernel-level virtio server

What it is: vhost net is a character device that can be used to reduce
the number of system calls involved in virtio networking.
Existing virtio net code is used in the guest without modification.

There's similarity with vringfd, with some differences and reduced scope
- uses eventfd for signalling (a minimal sketch of this follows below)
- structures can be moved around in memory at any time (good for migration)
- supports a memory table and not just an offset (needed for kvm)
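
To illustrate the first point: the eventfd side is ordinary userspace code.
A minimal sketch follows; the hand-off of the fd to vhost through its setup
ioctls is not shown, and the write() below merely stands in for the kernel
signalling the fd.

/* Illustrative only: create an eventfd and wait for a notification on it. */
#include <stdint.h>
#include <stdio.h>
#include <sys/eventfd.h>
#include <unistd.h>

int main(void)
{
	uint64_t val = 1, got;
	int efd = eventfd(0, 0);             /* counter starts at zero */

	if (efd < 0)
		return 1;
	/* In real use the fd is passed to the kernel, which signals it. */
	write(efd, &val, sizeof(val));       /* stand-in for that signal */
	if (read(efd, &got, sizeof(got)) == sizeof(got))
		printf("received %llu notification(s)\n",
		       (unsigned long long)got);
	close(efd);
	return 0;
}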

common virtio related code has been put in a separate file vhost.c and
can be made into a separate module if/when more backends appear.  I used
Rusty's lguest.c as the source for developing this part : this supplied
me with witty comments I wouldn't be able to write myself.

What it is not: vhost net is not a bus, and not a generic new system
call. No assumptions are made on how guest performs hypercalls.
Userspace hypervisors are supported as well as kvm.

How it works: Basically, we connect virtio frontend (configured by
userspace) to a backend. The backend could be a network device, or a
tun-like device. In this version I only support raw socket as a backend,
which can be bound to e.g. SR IOV, or to macvlan device.  Backend is
also configured by userspace, including vlan/mac etc.

Status:
This works for me, and I haven't seen any crashes.
I have not run any benchmarks yet; compared to userspace, I expect to
see improved latency (as I save up to 4 system calls per packet) but not
bandwidth/CPU (as TSO and interrupt mitigation are not supported).

Features that I plan to look at in the future:
- TSO
- interrupt mitigation
- zero copy

Acked-by: Arnd Bergmann 
Signed-off-by: Michael S. Tsirkin 
---
 MAINTAINERS|   10 +
 arch/x86/kvm/Kconfig   |1 +
 drivers/Makefile   |1 +
 drivers/vhost/Kconfig  |   11 +
 drivers/vhost/Makefile |2 +
 drivers/vhost/net.c|  429 
 drivers/vhost/vhost.c  |  664 
 drivers/vhost/vhost.h  |  108 +++
 include/linux/Kbuild   |1 +
 include/linux/miscdevice.h |1 +
 include/linux/vhost.h  |  100 +++
 11 files changed, 1328 insertions(+), 0 deletions(-)
 create mode 100644 drivers/vhost/Kconfig
 create mode 100644 drivers/vhost/Makefile
 create mode 100644 drivers/vhost/net.c
 create mode 100644 drivers/vhost/vhost.c
 create mode 100644 drivers/vhost/vhost.h
 create mode 100644 include/linux/vhost.h

diff --git a/MAINTAINERS b/MAINTAINERS
index b1114cf..de4587f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5431,6 +5431,16 @@ S:   Maintained
 F: Documentation/filesystems/vfat.txt
 F: fs/fat/

+VIRTIO HOST (VHOST)
+P: Michael S. Tsirkin
+M: m...@redhat.com
+L: k...@vger.kernel.org
+L: virtualizat...@lists.osdl.org
+L: net...@vger.kernel.org
+S: Maintained
+F: drivers/vhost/
+F: include/linux/vhost.h
+
 VIA RHINE NETWORK DRIVER
 M: Roger Luethi 
 S: Maintained
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index b84e571..94f44d9 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -64,6 +64,7 @@ config KVM_AMD

 # OK, it's a little counter-intuitive to do this, but it puts it neatly under
 # the virtualization menu.
+source drivers/vhost/Kconfig
 source drivers/lguest/Kconfig
 source drivers/virtio/Kconfig

diff --git a/drivers/Makefile b/drivers/Makefile
index bc4205d..1551ae1 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -105,6 +105,7 @@ obj-$(CONFIG_HID)   += hid/
 obj-$(CONFIG_PPC_PS3)  += ps3/
 obj-$(CONFIG

RE: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-01 Thread Xin, Xiaohui
>I don't think we should do that with the tun/tap driver. By design, tun/tap is 
>a way to interact with the networking stack as if coming from a device. The only 
>way this connects to an external adapter is through a bridge or through IP 
>routing, which means that it does not correspond to a specific NIC.
>I have worked on a driver I called 'macvtap', for lack of a better name, to add 
>a new tap frontend to the 'macvlan' driver. Since macvlan lets you add slaves to 
>a single NIC device, this gives you a direct connection between one or multiple 
>tap devices and an external NIC, which works a lot better than when you have a 
>bridge in between. There is also work underway to add a bridging capability to 
>macvlan, so you can communicate directly between guests like you can do with a 
>bridge.
>Michael's vhost_net can plug into the same macvlan infrastructure, so the work 
>is complementary.

We use a TUN/TAP device to implement the prototype, and agree that it's not the 
only choice here. We'd like to compare the two if possible.
What we care more about is the modification needed in the kernel, i.e. the 
changes to the net_dev and skb structures, thanks.

Thanks
Xiaohui

-Original Message-
From: Arnd Bergmann [mailto:a...@arndb.de] 
Sent: Monday, August 31, 2009 11:24 PM
To: Xin, Xiaohui
Cc: m...@redhat.com; net...@vger.kernel.org; 
virtualization@lists.linux-foundation.org; k...@vger.kernel.org; 
linux-ker...@vger.kernel.org; mi...@elte.hu; linux...@kvack.org; 
a...@linux-foundation.org; h...@zytor.com; gregory.hask...@gmail.com
Subject: Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

On Monday 31 August 2009, Xin, Xiaohui wrote:
> 
> Hi, Michael
> That's a great job. We are now working on supporting VMDq on KVM, and since the 
> VMDq hardware presents L2 sorting
> based on MAC addresses and VLAN tags, our target is to implement a zero copy 
> solution using VMDq.

I'm also interested in helping there, please include me in the discussions.

> We started
> from the virtio-net architecture. What we want to propose is to use AIO 
> combined with direct I/O:
> 1) Modify virtio-net Backend service in Qemu to submit aio requests composed 
> from virtqueue.

right, that sounds useful.

> 2) Modify TUN/TAP device to support aio operations and the user space buffer 
> directly mapping into the host kernel.
> 3) Let a TUN/TAP device bind to a single rx/tx queue of the NIC.

I don't think we should do that with the tun/tap driver. By design, tun/tap is 
a way to interact with the
networking stack as if coming from a device. The only way this connects to an 
external adapter is through
a bridge or through IP routing, which means that it does not correspond to a 
specific NIC.

I have worked on a driver I called 'macvtap', for lack of a better name, to add a 
new tap frontend to
the 'macvlan' driver. Since macvlan lets you add slaves to a single NIC device, 
this gives you a direct
connection between one or multiple tap devices and an external NIC, which works 
a lot better than when
you have a bridge in between. There is also work underway to add a bridging 
capability to macvlan, so
you can communicate directly between guests like you can do with a bridge.

Michael's vhost_net can plug into the same macvlan infrastructure, so the work 
is complementary.

> 4) Modify the net_dev and skb structure to permit allocated skb to use user 
> space directly mapped payload
> buffer address rather than kernel allocated.

yes.

> As zero copy is also your goal, we are interested in what you have in mind, and 
> would like to collaborate with you if possible.
> BTW, we will send our VMDq write-up very soon.

Ok, cool.

Arnd <><


RE: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-01 Thread Xin, Xiaohui
> One way to share the effort is to make vmdq queues available as normal 
> kernel interfaces.  It would take quite a bit of work, but the end 
> result is that no other components need to be changed, and it makes vmdq 
> useful outside kvm.  It also greatly reduces the amount of integration 
> work needed throughout the stack (kvm/qemu/libvirt).

Yes. The common queue pair interface which we want to present will also apply 
to normal hardware, and we will try to leave the other components untouched.

Thanks
Xiaohui

-Original Message-
From: Avi Kivity [mailto:a...@redhat.com] 
Sent: Tuesday, September 01, 2009 1:52 AM
To: Xin, Xiaohui
Cc: m...@redhat.com; net...@vger.kernel.org; 
virtualization@lists.linux-foundation.org; k...@vger.kernel.org; 
linux-ker...@vger.kernel.org; mi...@elte.hu; linux...@kvack.org; 
a...@linux-foundation.org; h...@zytor.com; gregory.hask...@gmail.com
Subject: Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

On 08/31/2009 02:42 PM, Xin, Xiaohui wrote:
> Hi, Michael
> That's a great job. We are now working on supporting VMDq on KVM, and since the 
> VMDq hardware presents L2 sorting based on MAC addresses and VLAN tags, our 
> target is to implement a zero copy solution using VMDq. We started from the 
> virtio-net architecture. What we want to propose is to use AIO combined with 
> direct I/O:
> 1) Modify virtio-net Backend service in Qemu to submit aio requests composed 
> from virtqueue.
> 2) Modify TUN/TAP device to support aio operations and the user space buffer 
> directly mapping into the host kernel.
> 3) Let a TUN/TAP device bind to a single rx/tx queue of the NIC.
> 4) Modify the net_dev and skb structure to permit allocated skb to use user 
> space directly mapped payload buffer address rather than kernel allocated.
>
> As zero copy is also your goal, we are interested in what you have in mind, and 
> would like to collaborate with you if possible.
>

One way to share the effort is to make vmdq queues available as normal 
kernel interfaces.  It would take quite a bit of work, but the end 
result is that no other components need to be changed, and it makes vmdq 
useful outside kvm.  It also greatly reduces the amount of integration 
work needed throughout the stack (kvm/qemu/libvirt).

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



RE: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-01 Thread Xin, Xiaohui

>It may be possible to make vmdq appear like an sr-iov capable device 
>from userspace.  sr-iov provides the userspace interfaces to allocate 
>interfaces and assign mac addresses.  To make it useful, you would have 
>to handle tx multiplexing in the driver but that would be much easier to 
>consume for kvm

What we have thought is to support multiple net_dev structures, one per 
queue pair of a VMDq adapter, and to present multiple MAC addresses in 
user space so that each MAC can be used by a guest (a loose sketch of this 
idea follows below).
What exactly does the tx multiplexing in the driver mean?
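
To make the first part concrete, here is a loose kernel-side sketch of the
idea (2009-era APIs, purely illustrative: "vmdq_adapter" is a hypothetical
handle and the empty ops are placeholders; a real driver would implement
ndo_open/ndo_start_xmit against each hardware queue pair):

/* Illustrative only: one net_device, with its own MAC, per VMDq queue pair. */
#include <linux/errno.h>
#include <linux/etherdevice.h>
#include <linux/netdevice.h>
#include <linux/string.h>

struct vmdq_adapter;                    /* hypothetical adapter handle */

static const struct net_device_ops vmdq_queue_ops = {
	/* real code: wire .ndo_open/.ndo_start_xmit to one hw queue pair */
};

static int vmdq_expose_queue_pairs(struct vmdq_adapter *adapter, int nr_pairs,
				   const u8 macs[][ETH_ALEN])
{
	int i, err;

	for (i = 0; i < nr_pairs; i++) {
		struct net_device *dev = alloc_etherdev(0);

		if (!dev)
			return -ENOMEM;
		dev->netdev_ops = &vmdq_queue_ops;
		memcpy(dev->dev_addr, macs[i], ETH_ALEN);  /* one MAC per guest */
		err = register_netdev(dev);
		if (err) {
			free_netdev(dev);
			return err;
		}
	}
	return 0;
}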

Thanks
Xiaohui

-Original Message-
From: Anthony Liguori [mailto:anth...@codemonkey.ws] 
Sent: Tuesday, September 01, 2009 5:57 AM
To: Avi Kivity
Cc: Xin, Xiaohui; m...@redhat.com; net...@vger.kernel.org; 
virtualization@lists.linux-foundation.org; k...@vger.kernel.org; 
linux-ker...@vger.kernel.org; mi...@elte.hu; linux...@kvack.org; 
a...@linux-foundation.org; h...@zytor.com; gregory.hask...@gmail.com
Subject: Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

Avi Kivity wrote:
> On 08/31/2009 02:42 PM, Xin, Xiaohui wrote:
>> Hi, Michael
>> That's a great job. We are now working on supporting VMDq on KVM, and 
>> since the VMDq hardware presents L2 sorting based on MAC addresses 
>> and VLAN tags, our target is to implement a zero copy solution using 
>> VMDq. We started from the virtio-net architecture. What we want to 
>> propose is to use AIO combined with direct I/O:
>> 1) Modify virtio-net Backend service in Qemu to submit aio requests 
>> composed from virtqueue.
>> 2) Modify TUN/TAP device to support aio operations and the user space 
>> buffer directly mapping into the host kernel.
>> 3) Let a TUN/TAP device bind to a single rx/tx queue of the NIC.
>> 4) Modify the net_dev and skb structure to permit allocated skb to 
>> use user space directly mapped payload buffer address rather than 
>> kernel allocated.
>>
>> As zero copy is also your goal, we are interested in what you have in 
>> mind, and would like to collaborate with you if possible.
>>
>
> One way to share the effort is to make vmdq queues available as normal 
> kernel interfaces.

It may be possible to make vmdq appear like an sr-iov capable device 
from userspace.  sr-iov provides the userspace interfaces to allocate 
interfaces and assign mac addresses.  To make it useful, you would have 
to handle tx multiplexing in the driver but that would be much easier to 
consume for kvm.

Regards,

Anthony Liguori


Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-08-31 Thread Anthony Liguori
Avi Kivity wrote:
> On 08/31/2009 02:42 PM, Xin, Xiaohui wrote:
>> Hi, Michael
>> That's a great job. We are now working on supporting VMDq on KVM, and 
>> since the VMDq hardware presents L2 sorting based on MAC addresses 
>> and VLAN tags, our target is to implement a zero copy solution using 
>> VMDq. We started from the virtio-net architecture. What we want to 
>> propose is to use AIO combined with direct I/O:
>> 1) Modify virtio-net Backend service in Qemu to submit aio requests 
>> composed from virtqueue.
>> 2) Modify TUN/TAP device to support aio operations and the user space 
>> buffer directly mapping into the host kernel.
>> 3) Let a TUN/TAP device bind to a single rx/tx queue of the NIC.
>> 4) Modify the net_dev and skb structure to permit allocated skb to 
>> use user space directly mapped payload buffer address rather than 
>> kernel allocated.
>>
>> As zero copy is also your goal, we are interested in what you have in 
>> mind, and would like to collaborate with you if possible.
>>
>
> One way to share the effort is to make vmdq queues available as normal 
> kernel interfaces.

It may be possible to make vmdq appear like an sr-iov capable device 
from userspace.  sr-iov provides the userspace interfaces to allocate 
interfaces and assign mac addresses.  To make it useful, you would have 
to handle tx multiplexing in the driver but that would be much easier to 
consume for kvm.

Regards,

Anthony Liguori


Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-08-31 Thread Avi Kivity
On 08/31/2009 02:42 PM, Xin, Xiaohui wrote:
> Hi, Michael
> That's a great job. We are now working on supporting VMDq on KVM, and since the 
> VMDq hardware presents L2 sorting based on MAC addresses and VLAN tags, our 
> target is to implement a zero copy solution using VMDq. We started from the 
> virtio-net architecture. What we want to propose is to use AIO combined with 
> direct I/O:
> 1) Modify virtio-net Backend service in Qemu to submit aio requests composed 
> from virtqueue.
> 2) Modify TUN/TAP device to support aio operations and the user space buffer 
> directly mapping into the host kernel.
> 3) Let a TUN/TAP device bind to a single rx/tx queue of the NIC.
> 4) Modify the net_dev and skb structure to permit allocated skb to use user 
> space directly mapped payload buffer address rather than kernel allocated.
>
> As zero copy is also your goal, we are interested in what you have in mind, and 
> would like to collaborate with you if possible.
>

One way to share the effort is to make vmdq queues available as normal 
kernel interfaces.  It would take quite a bit of work, but the end 
result is that no other components need to be changed, and it makes vmdq 
useful outside kvm.  It also greatly reduces the amount of integration 
work needed throughout the stack (kvm/qemu/libvirt).

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-08-31 Thread Arnd Bergmann
On Monday 31 August 2009, Xin, Xiaohui wrote:
> 
> Hi, Michael
> That's a great job. We are now working on supporting VMDq on KVM, and since the 
> VMDq hardware presents L2 sorting
> based on MAC addresses and VLAN tags, our target is to implement a zero copy 
> solution using VMDq.

I'm also interested in helping there, please include me in the discussions.

> We started
> from the virtio-net architecture. What we want to propose is to use AIO 
> combined with direct I/O:
> 1) Modify virtio-net Backend service in Qemu to submit aio requests composed 
> from virtqueue.

right, that sounds useful.

> 2) Modify TUN/TAP device to support aio operations and the user space buffer 
> directly mapping into the host kernel.
> 3) Let a TUN/TAP device bind to a single rx/tx queue of the NIC.

I don't think we should do that with the tun/tap driver. By design, tun/tap is 
a way to interact with the
networking stack as if coming from a device. The only way this connects to an 
external adapter is through
a bridge or through IP routing, which means that it does not correspond to a 
specific NIC.

I have worked on a driver I called 'macvtap', for lack of a better name, to add a 
new tap frontend to
the 'macvlan' driver. Since macvlan lets you add slaves to a single NIC device, 
this gives you a direct
connection between one or multiple tap devices and an external NIC, which works 
a lot better than when
you have a bridge in between. There is also work underway to add a bridging 
capability to macvlan, so
you can communicate directly between guests like you can do with a bridge.

Michael's vhost_net can plug into the same macvlan infrastructure, so the work 
is complementary.

> 4) Modify the net_dev and skb structure to permit allocated skb to use user 
> space directly mapped payload
> buffer address rather than kernel allocated.

yes.

> As zero copy is also your goal, we are interested in what you have in mind, and 
> would like to collaborate with you if possible.
> BTW, we will send our VMDq write-up very soon.

Ok, cool.

Arnd <><