On 16/03/14 22:07, Martin Unzner wrote:
> It is great that if_virt is getting a speedup; it was not excessively fast
> the last time I tried it.

There are two sides to if_virt.  Arguably the interface is rather poorly 
named, but then again I didn't see these kinds of uses in 2008 when I 
wrote it as a way to shovel packets via /dev/tap.  Using tap is 
understandably slow.

These days if_virt serves more as a way to attach a hypercall 
implementation as an interface.  That's what has been my target for 
recent improvements, and therefore transitively improving the 
performance of e.g. dpdk-rumptcpip and netmap-rumptcpip (which both 
attach via if_virt).

> Also, thank you for forcing me to look up jumbo frames and TSO again, I
> had them completely confused. Neither was enabled, though, which is why
> I suspect a netmap bug has caused the trouble.

Not sure how missing TSO would cause bugs -- the stack should just do 
segmentation to (path)mtu-sized chunks itself in that case -- but of 
course trying to process jumbo frames without telling the stack that the 
interface layer is capable of jumbo frames will lead to discrepancies.

> I have another question: Is it OK to use the rump_sys_ methods, or would
> it be faster to do fork and schedule manually? You write that the
> rumpns_sendto method calls curlwp itself, so it should actually not
> matter that I simply replaced the normal system call sendto with
> rump_sys_sendto, should it? In your dissertation, you mention that
> scheduling manually is only necessary if you are missing the wrapper, or
> have I missed something there?

I'm not sure I understand the question, but not deterred by that I'll 
answer anyway ;)

There are two things a thread needs to run correctly in a rump 
kernel: curlwp and curcpu (it's pretty natural: you need to know what 
you're running and where you're running it).  If the host thread you use 
to call the schedule operation of a rump kernel has curlwp set, 
scheduling is a matter of picking curcpu.  If there is no curlwp set 
when rump_schedule() is called, the scheduling routine allocates a 
temporary one for the duration of the call.  The purpose of this 
curlwp-creating dynamicity is to make it as simple as possible to call a 
rump kernel from any host thread context.  The former is optimized to be 
fast, the latter is not.

The fast path is two atomic memory operations per schedule+unschedule 
pair: one to lock curcpu and one to release it.  So, theoretically, assuming you 
have a dedicated core and would want to call sendto() a billion times, 
it would be faster to bypass rump_sys_sendto() interface by calling 
schedule manually, looping a billion times calling rumpns_sendto() and 
unscheduling.  However, note that the rump kernel will still unschedule 
internally whenever it needs to block, and messing with that behaviour 
is seriously asking for deadlocks, so I'm not sure I'd start to optimize 
anything from that angle.

So, yes, rump_sys_sendto() is designed to be a 
drop-in-replacement-with-no-strings-attached for sendto() -- at least 
assuming you have the corresponding socket opened ;).  Calling 
rump_sys_sendto() will be significantly faster if you have curlwp set 
(what I call a "bound" thread), but it will work correctly either way.

_______________________________________________
rumpkernel-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rumpkernel-users