Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work

2007-04-15 Thread Rusty Russell
On Thu, 2007-04-12 at 06:32 +0300, Avi Kivity wrote:
 I hadn't considered an always-blocking (or unbuffered) networking API. 
 It's very counter to current APIs, but does make sense with things like
 syslets.  Without syslets, I don't think it's very useful as you need
 some artificial threads to keep things humming along.
 
 (How would userspace specify it? O_DIRECT when opening the tap?)

TBH, I hadn't thought that far.  Tap already has those IFF_NO_PI etc
flags, but it might make sense to just be the default.  From userspace's
POV it's not a semantic change.
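
For readers unfamiliar with the tap flags mentioned above, here is a minimal sketch of how IFF_NO_PI is passed today through the TUNSETIFF ioctl. The helper names `build_ifreq` and `open_tap` are made up for illustration, the TUNSETIFF value shown is the one used on x86 and ARM Linux, and no flag for the blocking behaviour discussed here is shown because the thread does not define one.

```python
import fcntl
import os
import struct

# Constants from <linux/if_tun.h>; TUNSETIFF is _IOW('T', 202, int),
# which evaluates to this value on x86 and ARM (other arches may differ).
TUNSETIFF = 0x400454CA
IFF_TUN = 0x0001
IFF_TAP = 0x0002
IFF_NO_PI = 0x1000
IFNAMSIZ = 16


def build_ifreq(name: str, flags: int) -> bytes:
    """Pack a 40-byte struct ifreq: interface name, then the flags word."""
    encoded = name.encode("ascii")
    if len(encoded) >= IFNAMSIZ:
        raise ValueError("interface name too long")
    # 16-byte name, unsigned short flags, zero padding up to 40 bytes.
    return struct.pack("16sH22x", encoded, flags)


def open_tap(name: str = "tap0", flags: int = IFF_TAP | IFF_NO_PI) -> int:
    """Hypothetical helper: attach to a tap device (needs CAP_NET_ADMIN)."""
    fd = os.open("/dev/net/tun", os.O_RDWR)
    try:
        fcntl.ioctl(fd, TUNSETIFF, build_ifreq(name, flags))
    except OSError:
        os.close(fd)
        raise
    return fd
```

Only `build_ifreq` can be exercised without privileges; `open_tap` needs a Linux host with /dev/net/tun.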

OK, just tested: I can get 230,000 packets (28 byte UDP) through the tun
device in a second (130,000 actually out the 100-base-T NIC, 100,000
dropped).  If the tun driver's write() blocks until the skb is
destroyed, it's 4,000 packets.

So your intuition was right: skb_free latency on xmit (at least for this
e1000) is far too large for anything but an async solution.
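
The figures above can be turned into a rough per-packet latency. This is only back-of-the-envelope arithmetic, assuming the blocking write() keeps exactly one packet in flight at a time; the variable names are illustrative.

```python
# Numbers quoted in the message above.
PACKETS_THROUGH_TUN = 230_000   # written to the tun device per second
PACKETS_ON_WIRE = 130_000       # actually sent by the NIC
PACKETS_DROPPED = 100_000       # dropped before the wire
BLOCKING_PACKETS_PER_SEC = 4_000  # when write() waits for skb destruction

# The two outcomes should account for every packet written.
assert PACKETS_ON_WIRE + PACKETS_DROPPED == PACKETS_THROUGH_TUN

# With one packet in flight, the rate is the inverse of the mean time
# between write() and the skb being destroyed.
blocking_latency_us = 1_000_000 / BLOCKING_PACKETS_PER_SEC

# How much slower the blocking variant is than each async figure.
slowdown_vs_wire = PACKETS_ON_WIRE / BLOCKING_PACKETS_PER_SEC
slowdown_vs_tun = PACKETS_THROUGH_TUN / BLOCKING_PACKETS_PER_SEC

print(f"{blocking_latency_us:.0f} us per packet, "
      f"{slowdown_vs_wire:.1f}x below wire rate, "
      f"{slowdown_vs_tun:.1f}x below tun rate")
```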

Will ponder further.

Thanks!
Rusty.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work

2007-04-15 Thread Avi Kivity
Rusty Russell wrote:
 On Thu, 2007-04-12 at 06:32 +0300, Avi Kivity wrote:
   
 I hadn't considered an always-blocking (or unbuffered) networking API. 
 It's very counter to current APIs, but does make sense with things like
 syslets.  Without syslets, I don't think it's very useful as you need
 some artificial threads to keep things humming along.

 (How would userspace specify it? O_DIRECT when opening the tap?)
 

 TBH, I hadn't thought that far.  Tap already has those IFF_NO_PI etc
 flags, but it might make sense to just be the default.  From userspace's
 POV it's not a semantic change.

 OK, just tested: I can get 230,000 packets (28 byte UDP) through the tun
 device in a second (130,000 actually out the 100-base-T NIC, 100,000
 dropped).  If the tun driver's write() blocks until the skb is
 destroyed, it's 4,000 packets.

 So your intuition was right: skb_free latency on xmit (at least for this
 e1000) is far too large for anything but an async solution.

 Will ponder further.
   

I think aio_write (but done copylessly) is the way to go.  Not only
is the infrastructure there, but the API already allows for multiple
packet submission and for batching completions.  Fitting into that
framework ought to be easier than starting yet another one.

It still misses scatter/gather and integration with fd-based
notification, but there are patches around for that.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work

2007-04-11 Thread Avi Kivity

Rusty Russell wrote:
> On Wed, 2007-04-11 at 07:26 +0300, Avi Kivity wrote:
>> Nope.  Being async is critical for copyless networking:
>>
>> - in the transmit path, so need to stop the sender (guest) from touching
>> the memory until it's on the wire.  This means 100% of packets sent will
>> be blocked.
>
> Hi Avi,
>
> You keep saying stuff like this, and I keep ignoring it.  OK, I'll
> bite:
>
> Why would we try to prevent the sender from altering the packets?

  


To avoid data corruption.

The guest wants to send a packet.  It calls write(), which causes an skb 
to be allocated, data to be copied into it, the entire networking stack 
gets into gear, and the guest-side driver instructs the device to send 
the packet.


With async operations, the saga continues like this: the host-side 
driver allocates an skb, get_page()s and attaches the data to the new 
skb, this skb crosses the bridge, trickles into the real ethernet 
device, gets queued there, sent, interrupts fire, triggering async 
completion.  On this completion, we send a virtual interrupt to the 
guest, which tells it to destroy the skb and reclaim the pages attached 
to it.


Without async operations, we don't have a hook to notify the guest when 
to reclaim the skb.  If we do it too soon, the skb can be reclaimed and 
the memory reused before the real device gets to see it, so we end up 
sending data that we did not intend.  The only way to avoid it is to 
copy the data somewhere safe, but that is exactly what we don't want to do.
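
The hazard described above can be shown with a toy model. This is a sketch, not kvm or kernel code: `GuestBuffer` and `ZeroCopyTx` are invented names, the "device" simply reads the referenced memory when the completion step runs, and the callback stands in for the virtual interrupt.

```python
from dataclasses import dataclass


@dataclass
class GuestBuffer:
    """Guest memory attached to an skb by reference (no copy)."""
    data: bytearray
    in_flight: bool = False


class ZeroCopyTx:
    """Toy host-side driver that only keeps references to guest buffers."""

    def __init__(self):
        self.pending = []   # buffers queued for the device
        self.wire = []      # bytes the device actually transmitted

    def submit(self, buf):
        buf.in_flight = True
        self.pending.append(buf)    # reference only, nothing is copied

    def complete_all(self, on_complete):
        # The device reads guest memory only now, at transmit time.
        while self.pending:
            buf = self.pending.pop(0)
            self.wire.append(bytes(buf.data))
            buf.in_flight = False
            on_complete(buf)        # stands in for the virtual interrupt


def send_without_completion():
    tx = ZeroCopyTx()
    buf = GuestBuffer(bytearray(b"packet-1"))
    tx.submit(buf)
    buf.data[:] = b"garbage!"       # guest reclaimed the pages too soon
    tx.complete_all(lambda b: None)
    return tx.wire


def send_with_completion():
    tx = ZeroCopyTx()
    buf = GuestBuffer(bytearray(b"packet-1"))
    reclaimed = []
    tx.submit(buf)
    tx.complete_all(reclaimed.append)   # guest learns the skb is done
    buf.data[:] = b"reused.."           # safe only after completion
    return tx.wire, reclaimed
```

In the first function the wire sees the overwritten bytes; in the second the original packet goes out and the buffer is reused only after the completion callback.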



>> - multiple packets per operation (for interrupt mitigation) (like
>> lio_listio)
>
> The benefits for interrupt mitigation are less clear to me in a virtual
> environment (scheduling tends to make it happen anyway); I'd want to
> benchmark it.

Yes, the guest will probably submit multiple packets in one hypercall.
It would be nice for the userspace driver to be able to submit them to
the host kernel in one syscall.

> Some kind of batching to reduce syscall overhead, perhaps, but TSO would
> go a fair way towards that anyway (probably not enough).

For some workloads, sure.

>> - scatter/gather packets (iovecs)
>
> Yes, and this is already present in the tap device.  Anthony suggested a
> slightly nasty hack for multiple sg packets in one writev()/readv, which
> could also give us batching.

No need for hacks if we get list aio support one day.

>> - configurable wakeup (by packet count/timeout) for queue management
>
> I'm not convinced that this is a showstopper, though.

It probably isn't.  It's free with aio though.

>> - hacks (tso)
>
> I'd usually go for a batch interface over TSO, but if the card we're
> sending to actually does TSO then TSO will probably win.

Sure, if tso helps a regular host then it should help one that happens
to be running a virtual machine.

>> Most of these can be provided by a combination of the pending aio work,
>> the pending aio/fd integration, and the not-so-pending tap aio work.  As
>> the first two are available as patches and the third is limited to the
>> tap device, it is not unreasonable to try it out.  Maybe it will turn
>> out not to be as difficult as I predicted just a few lines above.
>
> Indeed, I don't think we're asking for a revolution a-la VJ-style
> channels.  But I'm still itching to get back to that, and this might yet
> provide an excuse 8)
  


I'll be happy if this can be made to work.  It will make the paravirt 
guest-side driver work in kvm-less setups, which are useful for testing, 
and of course reduction in kernel code is beneficial.  It will be slower 
than in-kernel, but if we get the batching right, perhaps not 
significantly slower.  I'm mostly concerned that this depends on code 
that has eluded merging for such a long time.



--
error compiling committee.c: too many arguments to function



Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work

2007-04-11 Thread Rusty Russell
On Wed, 2007-04-11 at 17:28 +0300, Avi Kivity wrote:
 Rusty Russell wrote:
  On Wed, 2007-04-11 at 07:26 +0300, Avi Kivity wrote:

  Nope.  Being async is critical for copyless networking:
 
 With async operations, the saga continues like this: the host-side 
 driver allocates an skb, get_page()s and attaches the data to the new 
 skb, this skb crosses the bridge, trickles into the real ethernet 
 device, gets queued there, sent, interrupts fire, triggering async 
 completion.  On this completion, we send a virtual interrupt to the 
 guest, which tells it to destroy the skb and reclaim the pages attached 
 to it.

Hi Avi!

Thanks for spelling it out, I now understand your POV.  I had
considered it obvious that a (non-async) write which didn't copy would
block until the skb was finished with, which is easy to code up within
the tap device itself.  Otherwise it's actually an async write without a
notification mechanism, which I agree is broken.

Note though: if the guest can change the packet headers they can
subvert some firewall rules and possibly crash the host.  None of the
networking code I wrote expects packets to change in flight 8(

This applies to a userspace or kernelspace driver.

  Yes, and this is already present in the tap device.  Anthony suggested a
  slightly nasty hack for multiple sg packets in one writev()/readv, which
  could also give us batching.
 
 No need for hacks if we get list aio support one day.

As you point out though, aio is not something we want to hold our breath
for.  Plus, aio never makes things simpler, and complexity kills
puppies.

Cheers!
Rusty.



Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work

2007-04-10 Thread Evgeniy Polyakov
On Mon, Apr 09, 2007 at 04:38:18PM +0300, Avi Kivity ([EMAIL PROTECTED]) wrote:
 But I don't get this we can enhance the kernel but not userspace vibe
 8(
   
 
 I've been waiting for network aio since ~2003.  If it arrives in the 
 next few days, I'm all for it; much more than kvm can use it 
 profitably.  But I'm not going to write that interface myself.

Hmm, you missed at least two implementations of network aio in the 
previous year, and now with syslets we can have third one.

But it looks from this discussion, that it will not prevent from
changing in-kernel driver - place a hook into skb allocation path and
allocate data from opposing memory - get pages from another side and put
them into fragments, then copy headers into skb->data.

-- 
Evgeniy Polyakov


Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work

2007-04-10 Thread Avi Kivity
Evgeniy Polyakov wrote:
 On Mon, Apr 09, 2007 at 04:38:18PM +0300, Avi Kivity ([EMAIL PROTECTED]) 
 wrote:
   
 But I don't get this we can enhance the kernel but not userspace vibe
 8(
  
   
 I've been waiting for network aio since ~2003.  If it arrives in the 
 next few days, I'm all for it; much more than kvm can use it 
 profitably.  But I'm not going to write that interface myself.
 

 Hmm, you missed at least two implementations of network aio in the 
 previous year, and now with syslets we can have third one.
   

I meant, network aio in the mainline kernel.  I am aware of the various
out-of-tree implementations.

 But it looks from this discussion, that it will not prevent from
 changing in-kernel driver - place a hook into skb allocation path and
 allocate data from opposing memory - get pages from another side and put
 them into fragments, then copy headers into skb->data.
   

I don't understand this (opposing memory, another side?).  Can you
elaborate?

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work

2007-04-10 Thread Evgeniy Polyakov
On Tue, Apr 10, 2007 at 11:19:52AM +0300, Avi Kivity ([EMAIL PROTECTED]) wrote:
 I meant, network aio in the mainline kernel.  I am aware of the various
 out-of-tree implementations.

If potential users do not pay attention to the initial implementation, it is
quite hard for them to get in. But actually it does not matter to this
discussion.

  But it looks from this discussion, that it will not prevent from
  changing in-kernel driver - place a hook into skb allocation path and
  allocate data from opposing memory - get pages from another side and put
  them into fragments, then copy headers into skb->data.

 
 I don't understand this (opposing memory, another side?).  Can you
 elaborate?

You want to implement zero-copy network device between host and guest, if
I understood this thread correctly?
So, for sending part, device allocates pages from receiver's memory (or
from shared memory), receiver gets an 'interrupt' and got pages from own
memory, which are attached to new skb and transferred up to the network
stack.
It can be extended to use shared ring of pages.

-- 
Evgeniy Polyakov


Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work

2007-04-10 Thread Avi Kivity

Evgeniy Polyakov wrote:
>>> But it looks from this discussion, that it will not prevent from
>>> changing in-kernel driver - place a hook into skb allocation path and
>>> allocate data from opposing memory - get pages from another side and put
>>> them into fragments, then copy headers into skb->data.
>>
>> I don't understand this (opposing memory, another side?).  Can you
>> elaborate?
>
> You want to implement zero-copy network device between host and guest, if
> I understood this thread correctly?
> So, for sending part, device allocates pages from receiver's memory (or
> from shared memory), receiver gets an 'interrupt' and got pages from own
> memory, which are attached to new skb and transferred up to the network
> stack.
> It can be extended to use shared ring of pages.
  


This is what Xen does.  It is actually less performant than copying, IIRC.

The problem with flipping pages around is that physical addresses are 
cached both in the kvm mmu and in the on-chip tlbs, necessitating 
expensive page table walks and tlb invalidation IPIs.


Note that sending from the guest to an external host can be done 
copylessly, and for the receive side using a dma engine (like I/OAT) can 
reduce the cost of the copy.


--
error compiling committee.c: too many arguments to function



Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work

2007-04-10 Thread Evgeniy Polyakov
On Tue, Apr 10, 2007 at 02:21:24PM +0300, Avi Kivity ([EMAIL PROTECTED]) wrote:
 You want to implement zero-copy network device between host and guest, if
 I understood this thread correctly?
 So, for sending part, device allocates pages from receiver's memory (or
 from shared memory), receiver gets an 'interrupt' and got pages from own
 memory, which are attached to new skb and transferred up to the network
 stack.
 It can be extended to use shared ring of pages.
   
 
 This is what Xen does.  It is actually less performant than copying, IIRC.
 
 The problem with flipping pages around is that physical addresses are 
 cached both in the kvm mmu and in the on-chip tlbs, necessitating 
 expensive page table walks and tlb invalidation IPIs.

Hmm, I'm not familiar with Xen driver, but similar technique was used
with zero-copy network sniffer some time ago, substituting userspace
pages with pages containing skb data was about 25-50% faster than
copying 1500 bytes in general, and in order of 10 times faster in some
cases.

Check a link please in case we are talking about different ideas:
http://marc.info/?l=linux-netdev&m=112262743505711&w=2

-- 
Evgeniy Polyakov


Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work

2007-04-10 Thread Avi Kivity

Evgeniy Polyakov wrote:
>> This is what Xen does.  It is actually less performant than copying, IIRC.
>>
>> The problem with flipping pages around is that physical addresses are
>> cached both in the kvm mmu and in the on-chip tlbs, necessitating
>> expensive page table walks and tlb invalidation IPIs.
>
> Hmm, I'm not familiar with Xen driver, but similar technique was used
> with zero-copy network sniffer some time ago, substituting userspace
> pages with pages containing skb data was about 25-50% faster than
> copying 1500 bytes in general, and in order of 10 times faster in some
> cases.
>
> Check a link please in case we are talking about different ideas:
> http://marc.info/?l=linux-netdev&m=112262743505711&w=2

  


I don't really understand what you're testing there.  in particular, how 
can the copying time change so dramatically depending on whether you've 
just rebooted or not?




--
error compiling committee.c: too many arguments to function



Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work

2007-04-10 Thread Evgeniy Polyakov
On Tue, Apr 10, 2007 at 03:17:45PM +0300, Avi Kivity ([EMAIL PROTECTED]) wrote:
 Check a link please in case we are talking about different ideas:
 http://marc.info/?l=linux-netdev&m=112262743505711&w=2
 
   
 
 I don't really understand what you're testing there.  in particular, how 
 can the copying time change so dramatically depending on whether you've 
 just rebooted or not?
 
I tested page remapping time - i.e. time to replace a page in two
different mappings - the same should be performed in host and guest
kernels if such design is going to be used for communication.

I can only explain after-reboot slow copy with empty caches - arbitrary
kernel pages were copied into buffer (not the same data as in posted
code).

-- 
Evgeniy Polyakov


Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work

2007-04-10 Thread Avi Kivity

Evgeniy Polyakov wrote:
> On Tue, Apr 10, 2007 at 03:17:45PM +0300, Avi Kivity ([EMAIL PROTECTED]) wrote:
>>> Check a link please in case we are talking about different ideas:
>>> http://marc.info/?l=linux-netdev&m=112262743505711&w=2
>>
>> I don't really understand what you're testing there.  in particular, how
>> can the copying time change so dramatically depending on whether you've
>> just rebooted or not?
>
> I tested page remapping time - i.e. time to replace a page in two
> different mappings - the same should be performed in host and guest
> kernels if such design is going to be used for communication.
>
> I can only explain after-reboot slow copy with empty caches - arbitrary
> kernel pages were copied into buffer (not the same data as in posted
> code).
  


Doing this in kvm would be significantly more complex, as we'd need to 
use full reverse mapping to locate all guest mappings (we already 
reverse map writable pages for other reasons), so the 25-50% difference 
might be nullified or even turn into overhead.


Here are the Xen numbers for reference.  Xen probably has more overhead 
than kvm for such things, though, as it needs to do hypercalls from dom0 
which is in-kernel for kvm.


http://lists.xensource.com/archives/html/xen-devel/2007-03/msg01218.html

--
error compiling committee.c: too many arguments to function



Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work

2007-04-10 Thread Rusty Russell
On Mon, 2007-04-09 at 16:38 +0300, Avi Kivity wrote:
 Moreover, some things just don't lend themselves to a userspace 
 abstraction.  If we want to expose tso (tcp segmentation offload), we 
 can easily do so with a kernel driver since the kernel interfaces are 
 all tso aware.  Tacking on tso awareness to tun/tap is doable, but at 
 the very least weird.

It is kinda weird, yes, but it certainly makes sense.  All the arguments
for tso apply in triplicate to userspace packet sends...

  We're dealing with the tun/tap device here, not a socket.
 
 Hmm.  tun actually has aio_write implemented, but it seems synchronous.  
 So does the read path.
 
 If these are made truly asynchronous, and the write path is made in 
 addition copyless, then we might have something workable.  I still 
 cringe at having a pagetable walk in order to deliver a 1500-byte packet.

Right, now we're talking!

However, it's not clear to me why creating an skb which references a kvm
guest's memory doesn't need a pagetable walk, but a packet in (other)
userspace memory does?

My conviction which started this discussion is that if we can offer an
efficient interface for kvm, we should be able to offer an efficient
interface for any (other) userspace.

As to async, I'm not *so* worried about that for the moment, although it
would probably be nicer to fail than to block.  Otherwise we could
simply set an skb destructor to wake us up.
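
As a userspace analogy for the "skb destructor wakes us up" idea, the sketch below blocks a writer until a fake skb is freed by another thread. `FakeSkb`, `blocking_write` and `delayed_driver` are invented for illustration; a real implementation would live in the tun driver and use the kernel's own wait primitives.

```python
import threading


class FakeSkb:
    """Stand-in for an skb that references the caller's buffer, not a copy."""

    def __init__(self, payload, destructor):
        self.payload = memoryview(payload)
        self.destructor = destructor
        self.freed = False

    def free(self):
        self.freed = True
        self.payload.release()
        self.destructor(self)   # analogous to skb->destructor


def blocking_write(payload, transmit, timeout=5.0):
    """Return only after the skb built on `payload` has been freed."""
    done = threading.Event()
    skb = FakeSkb(payload, destructor=lambda _skb: done.set())
    transmit(skb)
    if not done.wait(timeout):
        raise TimeoutError("skb was never freed")
    return len(payload)


def delayed_driver(delay_s, sent):
    """Toy driver: 'transmits' and frees the skb after delay_s seconds."""
    def transmit(skb):
        def finish():
            sent.append(bytes(skb.payload))
            skb.free()
        threading.Timer(delay_s, finish).start()
    return transmit
```

Every write pays the full delay before returning, which is the serialisation cost that the later measurements in this thread put a number on.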

  Again, sendfile is a *much* harder problem than sending a single packet
  once, which is the question here.
 
 sendfile() is a *different* problem.  It doesn't need completion because 
 the data is assumed not to change under it.

Well, let's not argue over that, it's irrelevant.  Hopefully we can do
that over a beer or equivalent sometime.

I think the first step is to see how much worse a decent userspace net
driver is compared with the current in-kernel one.

Rusty.



Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work

2007-04-10 Thread Avi Kivity
Rusty Russell wrote:
 On Mon, 2007-04-09 at 16:38 +0300, Avi Kivity wrote:
   
 Moreover, some things just don't lend themselves to a userspace 
 abstraction.  If we want to expose tso (tcp segmentation offload), we 
 can easily do so with a kernel driver since the kernel interfaces are 
 all tso aware.  Tacking on tso awareness to tun/tap is doable, but at 
 the very least weird.
 

 It is kinda weird, yes, but it certainly makes sense.  All the arguments
 for tso apply in triplicate to userspace packet sends...

   

Well, write() with a large buffer is a sort of tso device.  The problem
is tso breaks through several layers (like I'm advocating in the other
thread :), pushing tcp functionality into ethernet.  Well, we've seen worse.


 We're dealing with the tun/tap device here, not a socket.
   
 Hmm.  tun actually has aio_write implemented, but it seems synchronous.  
 So does the read path.

 If these are made truly asynchronous, and the write path is made in 
 addition copyless, then we might have something workable.  I still 
 cringe at having a pagetable walk in order to deliver a 1500-byte packet.
 

 Right, now we're talking!

 However, it's not clear to me why creating an skb which references a kvm
 guest's memory doesn't need a pagetable walk, but a packet in (other)
 userspace memory does?
   

Currently guest pages are stashed in a kernel array, as well as being
mmap()ed into user space.

That's not a very strong argument though, as I'd like to be able to map
userspace memory into the guest, or map address_spaces to the guest, or
something, so accessing guest physical memory will become more expensive
in time.

 My conviction which started this discussion is that if we can offer an
 efficient interface for kvm, we should be able to offer an efficient
 interface for any (other) userspace.
   

Fully agreed.  It's mostly a question of who and when.  Designing and
implementing this interface is going to be difficult, require deep
knowledge of Linux networking, and consume a lot of time.

 As to async, I'm not *so* worried about that for the moment, although it
 would probably be nicer to fail than to block.  Otherwise we could
 simply set an skb destructor to wake us up.
   

Nope.  Being async is critical for copyless networking:

- in the transmit path, so need to stop the sender (guest) from touching
the memory until it's on the wire.  This means 100% of packets sent will
be blocked.
- in the receive path, you could separate receive notification from the
single copy that must be done (like poll() + read()), but to make use of
dma engines you need to provide the end address beforehand.

 I think the first step is to see how much worse a decent userspace net
 driver is compared with the current in-kernel one.
   

A userspace net interface needs to provide the following:

- true async operations
- multiple packets per operation (for interrupt mitigation) (like
lio_listio)
- scatter/gather packets (iovecs)
- configurable wakeup (by packet count/timeout) for queue management
- hacks (tso)

Most of these can be provided by a combination of the pending aio work,
the pending aio/fd integration, and the not-so-pending tap aio work.  As
the first two are available as patches and the third is limited to the
tap device, it is not unreasonable to try it out.  Maybe it will turn
out not to be as difficult as I predicted just a few lines above.
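
Of the items listed above, scatter/gather is the one that can be tried from userspace today. A minimal sketch using `os.writev`, with a pipe standing in for the tap fd so it runs without privileges; `send_gathered` is a made-up helper, and on a real tap device one writev() call would carry one packet.

```python
import os


def send_gathered(fd, header, payload):
    """Submit header and payload from separate buffers in one syscall."""
    return os.writev(fd, [header, payload])


# A pipe stands in for the tap device in this sketch.
read_end, write_end = os.pipe()
ethernet_header = b"\x00" * 14          # placeholder 14-byte header
payload = b"hello guest"

written = send_gathered(write_end, ethernet_header, payload)
frame = os.read(read_end, 4096)

os.close(read_end)
os.close(write_end)
```

The other items (true async completion, multi-packet submission, configurable wakeup) have no equivalent in this sketch.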

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work

2007-04-09 Thread Avi Kivity

Rusty Russell wrote:
> On Sun, 2007-04-08 at 08:36 +0300, Avi Kivity wrote:
>> Rusty Russell wrote:
>>> Hi Avi,
>>>
>>> I don't think you've thought about this very hard.  The receive copy is
>>> completely independent with whether the packet is going to the guest via
>>> a kernel driver or via userspace, so not relevant.
>>
>> A packet received in the kernel cannot be made available to userspace in
>> a safe manner without a copy, as it will not be aligned with page
>> boundaries, so userspace cannot examine the packet until after one copy
>> has occurred.
>
> Hi Avi!
>
> I'm a little puzzled by your response.  Hmm...
>
> lguest's userspace network frontend does exactly as many copies as
> Ingo's in-host-kernel code.  One from the Guest, one to the Guest.

  


kvm pvnet is suboptimal now.  The number of copies could be reduced by 
two (to zero), by constructing an skb that points to guest memory.  
Right now, this can only be done in-kernel.


With current userspace networking interfaces, one cannot build a network 
device that has less than one copy on transmit, because sendmsg() *must* 
copy the data (as there is no completion notification).  sendfilev(), 
even if it existed, cannot be used: it is copyless, but lacks completion 
notification.  It is useful only on unchanging data like read-only files.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work

2007-04-09 Thread Rusty Russell
On Mon, 2007-04-09 at 10:10 +0300, Avi Kivity wrote:
 Rusty Russell wrote:
  I'm a little puzzled by your response.  Hmm...
 
  lguest's userspace network frontend does exactly as many copies as
  Ingo's in-host-kernel code.  One from the Guest, one to the Guest.
 
 kvm pvnet is suboptimal now.  The number of copies could be reduced by 
 two (to zero), by constructing an skb that points to guest memory.  
 Right now, this can only be done in-kernel.

Sorry, you lost me here.  You mean both input and output copies can be
eliminated?  Or are you talking about another two copies somewhere?

But I don't get this we can enhance the kernel but not userspace vibe
8(

 With current userspace networking interfaces, one cannot build a network 
 device that has less than one copy on transmit, because sendmsg() *must* 
 copy the data (as there is no completion notification).

Why are you talking about sendmsg()?  Perhaps this is where we're
getting tangled up.

We're dealing with the tun/tap device here, not a socket.

  sendfilev(), 
 even if it existed, cannot be used: it is copyless, but lacks completion 
 notification.  It is useful only on unchanging data like read-only files.

Again, sendfile is a *much* harder problem than sending a single packet
once, which is the question here.

Rusty.



Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work

2007-04-09 Thread Avi Kivity

Rusty Russell wrote:
> On Mon, 2007-04-09 at 10:10 +0300, Avi Kivity wrote:
>> Rusty Russell wrote:
>>> I'm a little puzzled by your response.  Hmm...
>>>
>>> lguest's userspace network frontend does exactly as many copies as
>>> Ingo's in-host-kernel code.  One from the Guest, one to the Guest.
>>
>> kvm pvnet is suboptimal now.  The number of copies could be reduced by
>> two (to zero), by constructing an skb that points to guest memory.
>> Right now, this can only be done in-kernel.
>
> Sorry, you lost me here.  You mean both input and output copies can be
> eliminated?  Or are you talking about another two copies somewhere?
  


On the transmit path, current kvm pvnet has two copies:

1.  on the guest side, the driver copies the skb data into the shared ring
2. on the host side, the device copies the data from the ring into a 
newly allocated skb


Both of these copies can be eliminated with a host-side kernel.  With 
current userspace interfaces, only one copy can be eliminated.


Similar logic applies to receive, except that one copy must remain.


> But I don't get this we can enhance the kernel but not userspace vibe
> 8(
  


I've been waiting for network aio since ~2003.  If it arrives in the 
next few days, I'm all for it; much more than kvm can use it 
profitably.  But I'm not going to write that interface myself.


Moreover, some things just don't lend themselves to a userspace 
abstraction.  If we want to expose tso (tcp segmentation offload), we 
can easily do so with a kernel driver since the kernel interfaces are 
all tso aware.  Tacking on tso awareness to tun/tap is doable, but at 
the very least weird.


  
>> With current userspace networking interfaces, one cannot build a network
>> device that has less than one copy on transmit, because sendmsg() *must*
>> copy the data (as there is no completion notification).
>
> Why are you talking about sendmsg()?  Perhaps this is where we're
> getting tangled up.
>
> We're dealing with the tun/tap device here, not a socket.

  


Hmm.  tun actually has aio_write implemented, but it seems synchronous.  
So does the read path.


If these are made truly asynchronous, and the write path is made in 
addition copyless, then we might have something workable.  I still 
cringe at having a pagetable walk in order to deliver a 1500-byte packet.



>> sendfilev(),
>> even if it existed, cannot be used: it is copyless, but lacks completion
>> notification.  It is useful only on unchanging data like read-only files.
>
> Again, sendfile is a *much* harder problem than sending a single packet
> once, which is the question here.
  


sendfile() is a *different* problem.  It doesn't need completion because 
the data is assumed not to change under it.


Consider that the guest may be issuing a megabyte-sized sendfile() which 
is broken into 17 tso frames.  We need to preserve the large structures 
as much as possible or we end up repeating the simple single packet 
once path 700 times.
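
The 17 and 700 figures above can be reproduced with simple arithmetic. The message does not say how they were derived, so the limits below are assumptions: a TSO frame capped at a 65535-byte IP datagram with 40 bytes of IPv4 and TCP headers, and roughly 1500 bytes per ordinary packet.

```python
import math

SENDFILE_BYTES = 1 << 20        # "megabyte-sized" sendfile()

# Assumed limits, not stated in the thread.
MAX_IP_DATAGRAM = 65535         # largest frame a TSO send can describe
HEADER_BYTES = 40               # IPv4 (20) + TCP (20), no options
PACKET_BYTES = 1500             # rough per-packet size used in the thread

tso_frames = math.ceil(SENDFILE_BYTES / (MAX_IP_DATAGRAM - HEADER_BYTES))
single_packets = math.ceil(SENDFILE_BYTES / PACKET_BYTES)

print(tso_frames, single_packets)
```

Counting TCP payload per packet (1460 bytes) instead of 1500 would give a somewhat larger packet count, so 700 is best read as an order-of-magnitude figure.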


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work

2007-04-08 Thread Muli Ben-Yehuda
On Sun, Apr 08, 2007 at 08:36:14AM +0300, Avi Kivity wrote:

 That is not the common case.  Nor is it true when there is a
 mismatch between the card's capabilities and guest expectations and
 constraints.  For example, guest memory is not physically contiguous
 so a NIC that won't do scatter/gather will require bouncing (or an
 iommu, but that's not here yet).

Actually, Allen Key from Intel just posted the first VT-d patches to
xen-devel a couple of days ago. I wonder if anyone is working on kvm
support (which would require Linux support).

Cheers,
Muli


Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work

2007-04-08 Thread Rusty Russell
On Sun, 2007-04-08 at 08:36 +0300, Avi Kivity wrote:
 Rusty Russell wrote:
  Hi Avi,
 
  I don't think you've thought about this very hard.  The receive copy is
  completely independent of whether the packet is going to the guest via
  a kernel driver or via userspace, so not relevant.

 
 A packet received in the kernel cannot be made available to userspace in 
 a safe manner without a copy, as it will not be aligned with page 
 boundaries, so userspace cannot examine the packet until after one copy 
 has occurred.

Hi Avi!

I'm a little puzzled by your response.  Hmm...

lguest's userspace network frontend does exactly as many copies as
Ingo's in-host-kernel code.  One from the Guest, one to the Guest.

Does that clarify?
Rusty.



Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work

2007-04-07 Thread Avi Kivity

Rusty Russell wrote:

On Thu, 2007-04-05 at 10:17 +0300, Avi Kivity wrote:
  

Rusty Russell wrote:


You didn't quote Anthony's point about it's more about there not being
good enough userspace interfaces to do network IO.

It's easier to write a kernel-space network driver, but it's not
obviously the right thing to do until we can show that an efficient
packet-level userspace interface isn't possible.  I don't think that's
been done, and it would be interesting to try.
  
  
In the case of networking, the copyful interfaces on receive are driven 
by the hardware not knowing how to split the header from the data.  On 
transmit I agree, it could be made copyless from userspace (something 
like sendfilev, only not file oriented).



Hi Avi,

I don't think you've thought about this very hard.  The receive copy is
completely independent of whether the packet is going to the guest via
a kernel driver or via userspace, so not relevant.
  


A packet received in the kernel cannot be made available to userspace in 
a safe manner without a copy, as it will not be aligned with page 
boundaries, so userspace cannot examine the packet until after one copy 
has occurred.  After userspace has determined what to do with the packet, 
another copy must take place to get it there.


There's a counterexample, mmapped sockets, but that works only when all 
packets arriving on a card are exposed to the same process.  This is 
useful for tcpdump or for what you outline below but is hardly generic.



And if all packets from the card are going to the guest, you can
deliver directly.  Userspace or kernel, no difference.
  


That is not the common case.  Nor is it true when there is a mismatch 
between the card's capabilities and guest expectations and constraints.  
For example, guest memory is not physically contiguous so a NIC that 
won't do scatter/gather will require bouncing (or an iommu, but that's 
not here yet).



And we have a sendfilev not file oriented: it's called writev 8)
  


writev() cannot be made copyless for networking.  One needs an async 
interface so the kernel can complete the write after the NIC acks the 
dma transfer, or a kernel driver.
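
The copy semantics behind this point can be seen from userspace. The following is a minimal sketch, assuming a pipe as a stand-in for the tap device (no root needed) and Python's os.writev, which wraps writev(2); the function name and buffer contents are made up for illustration. Once writev() returns, the caller may overwrite its buffers, so the kernel has to have taken its own copy of the data by then.

```python
import os


def gather_write_then_scribble():
    """Send header + payload with one writev(), then overwrite the payload buffer."""
    read_fd, write_fd = os.pipe()
    try:
        header = bytearray(b"HDR:")
        payload = bytearray(b"packet-data")
        # One gather write: both buffers go out in a single system call.
        written = os.writev(write_fd, [header, payload])
        # writev() has returned and there is no later completion event,
        # so the caller is free to reuse its buffer right away.
        payload[:] = b"X" * len(payload)
        # The reader still sees the original bytes: the kernel copied them.
        received = os.read(read_fd, written)
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return written, received, bytes(payload)


written, received, payload_after = gather_write_then_scribble()
print(written, received, payload_after)
```

A copyless variant would have to keep referencing the caller's pages after the call returns, which is only safe if the caller is told when the transfer has finished.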



An in-kernel driver can avoid system call overhead and page references.
But a better tap device helps more than just KVM.
  


I'll believe it when I see it.

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work

2007-04-06 Thread Ingo Molnar

* Ingo Molnar [EMAIL PROTECTED] wrote:

 * Anthony Liguori [EMAIL PROTECTED] wrote:
 
  [...] Did Linux have extremely high quality code in 1994?
 
 yes! It was crucial to strive for extremely high quality code all the 
 time. That was the only way to grow Linux's codebase, which was 
 ~300,000 lines of code in 1994, to the current 7.2+ million lines of 
 code, without losing maintainability. [...]

in fact Linux 1.0, released in early 1994, was only 170,000 LOC:

  http://www.kernel.org/pub/linux/kernel/v1.0/linux-1.0.tar.gz

and i just looked at a few random files in it - it's pretty clean.

Ingo


Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work

2007-04-06 Thread Ingo Molnar

* Rusty Russell [EMAIL PROTECTED] wrote:

  prototyping new kernel APIs to implement user-space network drivers, 
  on a crufty codebase is not something that should be done lightly.
 
 I think you overestimate my radicalism.  I was considering readv() and 
 writev() on the tap device.

ok :-) How would packeting be handled, or would this be like a raw 
socket in essence, but not in 'peek' but 'filter through' mode? I think 
it's not quite trivial. (but maybe i'm way too radical again :)
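
On the packeting question, one possible model is "one call, one packet": each writev() submits exactly one packet and each readv() returns exactly one. The sketch below only illustrates that model and is an analogy, not the tap interface itself. It assumes an AF_UNIX datagram socketpair as a stand-in for the tap fd; the 5-byte header and the function name are made up.

```python
import os
import socket

HEADER_LEN = 5  # made-up fixed header size for this illustration


def send_two_packets():
    """Write two packets with writev() and read them back one per readv()."""
    tx, rx = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        # Each writev() gathers header + body into a single datagram.
        os.writev(tx.fileno(), [b"hdr1|", b"first"])
        os.writev(tx.fileno(), [b"hdr2|", b"second packet"])
        packets = []
        for _ in range(2):
            header = bytearray(HEADER_LEN)
            body = bytearray(64)
            # Each readv() scatters exactly one datagram into the buffers.
            nbytes = os.readv(rx.fileno(), [header, body])
            packets.append((bytes(header), bytes(body[: nbytes - HEADER_LEN])))
    finally:
        tx.close()
        rx.close()
    return packets


packets = send_two_packets()
print(packets)
```

Packet boundaries are preserved by the descriptor itself, so no extra framing is needed in the data stream.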

Ingo


Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work

2007-04-05 Thread Ingo Molnar

* Avi Kivity [EMAIL PROTECTED] wrote:

 [...] But the difference in cruftiness between kvm and qemu code 
 should not enter into the discussion of where to do things.

i agree that it doesnt enter the discussion for the *PIC question, but 
it very much enters the discussion for the question that i replied to:

   You didn't quote Anthony's point about it's more about there not 
   being good enough userspace interfaces to do network IO.
   
   It's easier to write a kernel-space network driver, but it's not
   obviously the right thing to do until we can show that an 
   efficient packet-level userspace interface isn't possible.  I 
   don't think that's been done, and it would be interesting to try.

prototyping new kernel APIs to implement user-space network drivers, on 
a crufty codebase is not something that should be done lightly. Any 
negative result will not bring us any real conclusion. (was the failure 
due to the concept, due to the API or due to the crufty codebase?)

(but ... this is really a side-track issue for the *PIC question at 
hand. PICs are not network devices, they are essential platform 
components and almost an extended part of the CPU.)

Ingo


Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work

2007-04-05 Thread Avi Kivity

Ingo Molnar wrote:
so right now the only option for a clean codebase is the KVM in-kernel 
code.


I strongly disagree with this.  Bad code in userspace is not an excuse 
for shoving stuff into the kernel, where maintaining it is much more 
expensive, and the cause of a mistake can be system crashes and data 
loss, affecting unrelated processes.  If we move something into the 
kernel, we'd better have a really good reason for it.


Qemu code _is_ crufty.  We can do one of three things:
1. live with it
2. fork it and clean it up
3. clean it up incrementally and merge it upstream

Currently we're doing (1).  You're suggesting a variant of (2), fork 
plus move into the kernel.  The right thing to do IMO is (3), but I 
don't see anybody volunteering.  Qemu picked up additional committers 
recently and I believe they would be receptive to cleanups.


[In the *pic/pit case, we have other reasons to push things into the 
kernel.  But "this code is crap, let's rewrite it in the kernel" is not 
a justification I'll accept.  I'd be much happier if we could quantify 
these other reasons.]



--
error compiling committee.c: too many arguments to function



Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work

2007-04-05 Thread Ingo Molnar

* Ingo Molnar [EMAIL PROTECTED] wrote:

 * Rusty Russell [EMAIL PROTECTED] wrote:
 
  It's easier to write a kernel-space network driver, but it's not 
  obviously the right thing to do until we can show that an efficient 
  packet-level userspace interface isn't possible.  I don't think 
  that's been done, and it would be interesting to try.
 
 yes, i agree in theory, [...]

let me explain my position a bit more verbosely:

i agree in terms of 'network driver' (and more generally in terms of 
'device', which includes network, storage, console, etc. devices): 
having a user-space driver option should still be possible and it should 
be integrated well. Qemu is quite rich and flexible in these areas and 
we dont want to throw away or isolate that body of code.

but i dont agree in terms of PIC code, which is the main argument in 
this particular thread. There's little precedent for any add-ons for 
PICs in user-space, nor any particular PIC handling richness in Qemu 
that we'd like to preserve.

Ingo


Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work

2007-04-05 Thread Avi Kivity

Ingo Molnar wrote:

* Avi Kivity [EMAIL PROTECTED] wrote:

  
so right now the only option for a clean codebase is the KVM 
in-kernel code.
  

I strongly disagree with this.



are you disagreeing with my statement that the KVM kernel-side code is 
the only clean codebase here? To me this is a clear fact :)
  


No, I agree with that.  I just disagree with choosing to put the *pic 
code (or other code) into the kernel on *that* basis.  The selection 
should be on design/performance issues alone, *not* the state of 
existing code.


I only pointed out that the only clean codebase at the moment is the KVM 
in-kernel code - i did not make the argument (at all) that every new 
piece of KVM code should be done in the kernel. That would be stupid - 
do you think i'd advocate for example moving command line argument 
parsing into the kernel?
  


No.  But the difference in cruftiness between kvm and qemu code should 
not enter into the discussion of where to do things.


and as i said in the mail: the kernel _is_ the best place to do this 
particular stuff.
  


I agree with this, maybe for different reasons.


--
error compiling committee.c: too many arguments to function



Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work

2007-04-05 Thread Ingo Molnar

* Avi Kivity [EMAIL PROTECTED] wrote:

  so right now the only option for a clean codebase is the KVM 
  in-kernel code.
 
 I strongly disagree with this.

are you disagreeing with my statement that the KVM kernel-side code is 
the only clean codebase here? To me this is a clear fact :)

I only pointed out that the only clean codebase at the moment is the KVM 
in-kernel code - i did not make the argument (at all) that every new 
piece of KVM code should be done in the kernel. That would be stupid - 
do you think i'd advocate for example moving command line argument 
parsing into the kernel?

and as i said in the mail: the kernel _is_ the best place to do this 
particular stuff.

Ingo


Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work

2007-04-05 Thread Ingo Molnar

* Rusty Russell [EMAIL PROTECTED] wrote:

 It's easier to write a kernel-space network driver, but it's not 
 obviously the right thing to do until we can show that an efficient 
 packet-level userspace interface isn't possible.  I don't think that's 
 been done, and it would be interesting to try.

yes, i agree in theory, but IMO this is largely beside the point. What 
matters most for developing a project is _the quality of the codebase_. 
That attracts developers, developers improve the code, which then 
attracts users, which attracts more developers, etc., etc. As long as 
the quality of the codebase is maintained, this is a self-sustaining 
process. You've seen that happen with Linux. [ And of course, the 
crucial step #0 is: a sane, open-minded maintainer with good taste ;-) ]

qemu's code quality is not really suitable for that basic OSS model, in 
my opinion. It has been a mostly one-man show for a long time with 
various hostile forks, bin-only kernel module and other actions that 
easily poison an OSS project.

the result is not surprising: important portions of qemu have grown into 
a hard to hack, hard to maintain codebase with poor code quality, with 
gems like:

 #ifdef _WIN32
 void CALLBACK host_alarm_handler(UINT uTimerID, UINT uMsg,
                                  DWORD_PTR dwUser, DWORD_PTR dw1,
                                  DWORD_PTR dw2)
 #else
 static void host_alarm_handler(int host_signum)
 #endif
 {
 #if 0
 #define DISP_FREQ 1000

and that's not just some random driver - this is _the_ main central 
timer code of qemu.

so right now the only option for a clean codebase is the KVM in-kernel 
code. It's clean and sweet and integrates nicely into the rest of the 
kernel. The kernel is also obviously the final place where most 
virtualization technologies want to show up because it's the entity that 
is the closest to the guest context: we _dont_ want to _force_ network 
traffic (let alone interrupt handling) through a userspace context, only 
if the functionality of the task absolutely requires it. (but in most 
cases we'll try to come up with a maximally flexible scheme that can 
just drive things straight via the kernel. netfilter/iptables isnt in 
user-space either, partly for that reason.)

but architectural issues aside (ignoring that the kernel _is_ the best 
place to do this particular stuff), this question is still mainly 
dominated by the basic question of code quality. I'd rather move 
something into the Linux kernel, enforce its code quality that way, and 
_then_ add whatever clean infrastructure is needed to push it back into 
user-space again (into a different codebase), than having to hack the 
monolithic 200 KLOC+ qemu codebase that is shackled with support for 
tons of arcane architectures nobody uses and tons of arcane OS variants 
that no-one cares about. Now qemu is a very important enabler and 
platform-reference-implementation for KVM to fall back to, but it's not 
the place to put crucial new code into, at least currently.

Ingo


Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work

2007-04-05 Thread Avi Kivity

Rusty Russell wrote:

You didn't quote Anthony's point about it's more about there not being
good enough userspace interfaces to do network IO.

It's easier to write a kernel-space network driver, but it's not
obviously the right thing to do until we can show that an efficient
packet-level userspace interface isn't possible.  I don't think that's
been done, and it would be interesting to try.
  


In the case of networking, the copyful interfaces on receive are driven 
by the hardware not knowing how to split the header from the data.  On 
transmit I agree, it could be made copyless from userspace (something 
like sendfilev, only not file oriented).


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work

2007-04-05 Thread Anthony Liguori

Ingo Molnar wrote:

* Rusty Russell [EMAIL PROTECTED] wrote:

  
It's easier to write a kernel-space network driver, but it's not 
obviously the right thing to do until we can show that an efficient 
packet-level userspace interface isn't possible.  I don't think that's 
been done, and it would be interesting to try.



yes, i agree in theory, but IMO this is largely beside the point. What 
matters most for developing a project is _the quality of the codebase_. 
That attracts developers, developers improve the code, which then 
attracts users, which attracts more developers, etc., etc. As long as 
the quality of the codebase is maintained, this is a self-sustaining 
process. You've seen that happen with Linux. [ And of course, the 
crucial step #0 is: a sane, open-minded maintainer with good taste ;-) ]


qemu's code quality is not really suitable for that basic OSS model, in 
my opinion.


I think you may want to step off your high horse there.  QEMU's code may 
not be Linux kernel quality but it's certainly not anywhere near the 
worst that is out there.  Linux is over a decade old.  QEMU is only around 
3 years old.  Did Linux have extremely high quality code in 1994?  
Instead of posting code snippets to LKML, it would be much more 
constructive to post patches to qemu-devel.  It's not like the QEMU 
maintainers are actively ignoring your efforts to improve the code.


but architectural issues aside (ignoring that the kernel _is_ the best 
place to do this particular stuff),


Right.  We don't put things in the kernel just because we don't like the 
way the userspace code is written.  If that logic was valid, then Linus 
would be working on moving all of Gnome into the kernel.


This discussion has two parts.  The first is whether or not the kernel 
is the right place for a paravirtual network driver backend.  My current 
belief is that we could not get enough performance from something like 
tun to do it in userspace.  I also believe that we could improve tun (or 
create a replacement) so that we could implement a PV network driver 
backend in userspace.  Admittedly, I'm not an expert in networking 
though so I could be wrong here.


The second part is whether the platform devices should go in the 
kernel.  I agree with you that having the PIT in the kernel is probably 
a good idea.  I also agree that we probably have no choice but to move 
the APIC into the kernel (not for PV drivers, but for TPR performance 
and SMP support).


Regards,

Anthony Liguori

 this question is still mainly 
dominated by the basic question of code quality. I'd rather move 
something into the Linux kernel, enforce its code quality that way, and 
_then_ add whatever clean infrastructure is needed to push it back into 
user-space again (into a different codebase), than having to hack the 
monolithic 200 KLOC+ qemu codebase that is shackled with support for 
tons of arcane architectures nobody uses and tons of arcane OS variants 
that no-one cares about. Now qemu is a very important enabler and 
platform-reference-implementation for KVM to fall back to, but it's not 
the place to put crucial new code into, at least currently.


Ingo
  




Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work

2007-04-05 Thread Rusty Russell
On Thu, 2007-04-05 at 13:36 +0200, Ingo Molnar wrote:
 prototyping new kernel APIs to implement user-space network drivers, on 
 a crufty codebase is not something that should be done lightly.

I think you overestimate my radicalism.  I was considering readv() and
writev() on the tap device.

Qemu's infrastructure may hurt kvm here, but lguest won't be able to use
that excuse.

 (but ... this is really a side-track issue for the *PIC question at 
 hand. PICs are not network devices, they are essential platform 
 components and almost an extended part of the CPU.)

Definitely, I'm only interested in stealing^H^H^Hsharing KVM devices.
The subject is now deeply misleading 8(

Cheers,
Rusty.



Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work

2007-04-04 Thread Rusty Russell
On Wed, 2007-04-04 at 23:21 +0200, Ingo Molnar wrote:
 * Anthony Liguori [EMAIL PROTECTED] wrote:
 
  But why is it a good thing to do PV drivers in the kernel?  You lose 
  flexibility and functionality to gain performance. [...]
 
 in Linux a kernel-space network driver can still be tunneled over 
 user-space code, and hence you can add arbitrary add-on functionality 
 (and thus have flexibility), without slowing down the common case (which 
 would be to tunnel the guest's network traffic into the firewall rules 
 of the kernel. No need to touch user-space for any of that).

You didn't quote Anthony's point about it's more about there not being
good enough userspace interfaces to do network IO.

It's easier to write a kernel-space network driver, but it's not
obviously the right thing to do until we can show that an efficient
packet-level userspace interface isn't possible.  I don't think that's
been done, and it would be interesting to try.

Cheers,
Rusty.


