Re: svn commit: r241703 - head/sys/kern

2012-10-18 Thread Andre Oppermann

On 18.10.2012 22:22, Andre Oppermann wrote:

Author: andre
Date: Thu Oct 18 20:22:17 2012
New Revision: 241703
URL: http://svn.freebsd.org/changeset/base/241703

Log:
   Remove double-wrapping of #ifdef ZERO_COPY_SOCKETS within
   zero copy specialized sosend_copyin() helper function.


Note that I'm not saying zero copy should be used or is even
more performant than the optimized m_uiotombuf() function.
Actually there may be some real bit-rot to zero copy sockets.
I've just started looking into it.

Note that "zero copy" isn't entirely accurate either, as it marks
the page as COW.  So when the userspace application reuses the
memory it is copied anyway.  Also the overhead of doing the VM magic
and the mbuf attachment of a VM page isn't free.  To really benefit
from it an application has to be written with COW in mind and must
not reuse the memory that was just written to the socket.  For
non-aware applications it may be a net performance loss overall.
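
To make that concrete: a COW-friendly sender rotates through a pool
of buffers instead of reusing one, so the page just handed to the
kernel is not dirtied while it is still marked COW.  A minimal
userland sketch (the pool, its sizes and fill_payload() are invented
here for illustration):

#include <sys/types.h>
#include <sys/socket.h>

#define	NBUFS	8
#define	BUFSZ	65536

/* Application-specific payload producer; hypothetical. */
extern void fill_payload(char *buf, size_t len);

/*
 * Rotate through page-aligned buffers so the page just passed to
 * send() -- now marked COW by the kernel -- is not rewritten before
 * the NIC is done with it; rewriting it would fault and force the
 * very copy the mechanism tries to avoid.
 */
static char bufs[NBUFS][BUFSZ] __attribute__((aligned(4096)));

ssize_t
cow_friendly_send(int sock, size_t len)
{
	static int slot = 0;
	ssize_t n;

	fill_payload(bufs[slot], len);	/* produce data in a fresh buffer */
	n = send(sock, bufs[slot], len, 0);
	slot = (slot + 1) % NBUFS;	/* leave this buffer alone for a while */
	return (n);
}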

Also I don't like the name zero-copy-socket as it promises
too much for those not into socket, mbuf and VM magic.
I'd rather call it cow-socket or something like that as it
describes much better what is actually happening behind the
scenes.

--
Andre



Re: svn commit: r241703 - head/sys/kern

2012-10-18 Thread Navdeep Parhar

Hello Andre,

A couple of things if you're poking around in this area...

On 10/18/12 13:44, Andre Oppermann wrote:

On 18.10.2012 22:22, Andre Oppermann wrote:

Author: andre
Date: Thu Oct 18 20:22:17 2012
New Revision: 241703
URL: http://svn.freebsd.org/changeset/base/241703

Log:
   Remove double-wrapping of #ifdef ZERO_COPY_SOCKETS within
   zero copy specialized sosend_copyin() helper function.


Note that I'm not saying zero copy should be used or is even
more performant than the optimized m_uiotombuf() function.


Some time back I played around with a modified m_uiotombuf() that was 
aware of the mbuf_jumbo_16K zone (instead of limiting itself to 4K 
mbufs).  In some cases it performed better than the stock m_uiotombuf. 
I suspect this change would also help drivers that are unable to deal 
with long gather lists when doing TSO.  But my testing wasn't rigorous 
enough (I was merely playing around), and the drivers I work with can 
mostly cope with whatever the kernel throws at them.  So nothing came 
out of it.
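
For illustration, the idea was roughly the following (a sketch of the
approach, not the actual patch; error handling is abbreviated):

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/mbuf.h>
#include <sys/uio.h>

/*
 * Copy user data into 16K jumbo clusters while a lot is pending and
 * fall back to PAGE_SIZE clusters for the tail, instead of always
 * limiting the copy to PAGE_SIZE-sized clusters.
 */
static struct mbuf *
m_uiotombuf_16k(struct uio *uio, int how, int total)
{
	struct mbuf *m, *head = NULL, **nextp = &head;
	int copied, left, len, size;

	left = copied = MIN(total, (int)uio->uio_resid);
	while (left > 0) {
		size = (left > MJUMPAGESIZE) ? MJUM16BYTES : MJUMPAGESIZE;
		m = m_getjcl(how, MT_DATA, head == NULL ? M_PKTHDR : 0, size);
		if (m == NULL)
			goto fail;
		len = MIN(left, size);
		if (uiomove(mtod(m, void *), len, uio) != 0) {
			m_free(m);
			goto fail;
		}
		m->m_len = len;
		*nextp = m;
		nextp = &m->m_next;
		left -= len;
	}
	if (head != NULL)
		head->m_pkthdr.len = copied;
	return (head);
fail:
	m_freem(head);
	return (NULL);
}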



Actually there may be some real bit-rot to zero copy sockets.
I've just started looking into it.


I have a cxgbe(4)-specific true zero-copy implementation.  The rx side 
is in head, the tx side works only for blocking sockets (the easy 
case) and I haven't checked it in anywhere.  Take a look at 
t4_soreceive_ddp() and m_mbuftouio_ddp() in sys/dev/cxgbe/t4_ddp.c. 
They're mostly identical to the kernel routines they're based on (read: 
copy-pasted from).  You may find them of some interest if you're working 
in this area and are thinking of adding zero-copy hooks to the socket 
implementation.
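
For those who haven't read them, the shared pattern is roughly this
(a simplified sketch, not the code in t4_ddp.c; the DDP variant
differs mainly in where the data pages come from):

#include <sys/param.h>
#include <sys/mbuf.h>
#include <sys/uio.h>

/*
 * Walk the mbuf chain and uiomove() each mbuf's data into the
 * user's buffers until len is exhausted.
 */
static int
mbuf_chain_to_uio(struct uio *uio, const struct mbuf *m, int len)
{
	int error, n;

	for (; m != NULL && len > 0; m = m->m_next) {
		n = MIN(len, m->m_len);
		error = uiomove(mtod(m, void *), n, uio);
		if (error != 0)
			return (error);
		len -= n;
	}
	return (0);
}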


Regards,
Navdeep


Re: svn commit: r241703 - head/sys/kern

2012-10-18 Thread Andre Oppermann

On 18.10.2012 23:06, Navdeep Parhar wrote:

Hello Andre,

A couple of things if you're poking around in this area...


I didn't really mean to dive too deep into COW socket writes.


On 10/18/12 13:44, Andre Oppermann wrote:

On 18.10.2012 22:22, Andre Oppermann wrote:

Author: andre
Date: Thu Oct 18 20:22:17 2012
New Revision: 241703
URL: http://svn.freebsd.org/changeset/base/241703

Log:
   Remove double-wrapping of #ifdef ZERO_COPY_SOCKETS within
   zero copy specialized sosend_copyin() helper function.


Note that I'm not saying zero copy should be used or is even
more performant than the optimized m_uiotombuf() function.


Some time back I played around with a modified m_uiotombuf() that was
aware of the mbuf_jumbo_16K zone (instead of limiting itself to 4K
mbufs).  In some cases it performed better than the stock m_uiotombuf.
I suspect this change would also help drivers that are unable to deal
with long gather lists when doing TSO.  But my testing wasn't rigorous
enough (I was merely playing around), and the drivers I work with can
mostly cope with whatever the kernel throws at them.  So nothing came
out of it.


The jumbo 16K zone is special in that the memory is actually allocated
by contigmalloc to get physically contiguous RAM. After some uptime and
heavy use this may become difficult to obtain. Also contigmalloc has to
hunt for it, which may cause quite a bit of overhead.

4K mbufs, actually PAGE_SIZE mbufs, are very easily obtainable and fast.

To be honest I'm not really happy about larger-than-PAGE_SIZE mbufs.
They were introduced at a time when DMA engines were more limited and
couldn't do S/G DMA on receive.

So performance with larger-than-PAGE_SIZE mbufs may be a little bit
better, but when you approach memory fragmentation after some heavy
system usage it sucks to the point where allocation fails most of the
time.  PAGE_SIZE mbufs always perform the same with very little
deviation.

In an ideal scenario I'd like to see 9K and 16K mbufs go away and
have the RX DMA ring stitch a packet up out of PAGE_SIZE mbufs.
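
Roughly like this (a purely hypothetical sketch, all names invented;
real code would also strip the pkthdr from the non-head mbufs):

#include <sys/param.h>
#include <sys/mbuf.h>

/*
 * Chain the PAGE_SIZE clusters a received frame landed in, instead
 * of demanding one physically contiguous 9K/16K buffer.  Assumes
 * each ring entry was filled with
 * m_getjcl(M_NOWAIT, MT_DATA, M_PKTHDR, MJUMPAGESIZE).
 */
static struct mbuf *
rx_stitch(struct mbuf **ring, int ringsz, int first, int nsegs,
    int framelen)
{
	struct mbuf *m, *head = NULL, **nextp = &head;
	int i, len, resid = framelen;

	for (i = 0; i < nsegs; i++) {
		m = ring[(first + i) % ringsz];
		ring[(first + i) % ringsz] = NULL;	/* slot is refilled later */
		len = MIN(resid, MJUMPAGESIZE);
		m->m_len = len;
		resid -= len;
		*nextp = m;
		nextp = &m->m_next;
	}
	if (head != NULL)
		head->m_pkthdr.len = framelen;	/* only the head keeps the pkthdr */
	return (head);
}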


Actually there may be some real bit-rot to zero copy sockets.
I've just started looking into it.


I have a cxgbe(4)-specific true zero-copy implementation.  The rx side
is in head, the tx side works only for blocking sockets (the easy case)
and I haven't checked it in anywhere.  Take a look at t4_soreceive_ddp()
and m_mbuftouio_ddp() in sys/dev/cxgbe/t4_ddp.c.  They're mostly
identical to the kernel routines they're based on (read: copy-pasted
from).  You may find them of some interest if you're working in this
area and are thinking of adding zero-copy hooks to the socket
implementation.


I'm going to have a look at it and think about how to generically
support DDP either way with our socket buffer layout.

Actually that may end up as the golden path: do away with
larger-than-PAGE_SIZE mbufs, sink page-flipping COW (incorrectly named
ZERO_COPY) and use DDP for those who need utmost performance (as I
said, only COW-aware applications gain a bit of speed; unaware ones
may end up much worse).

--
Andre



Re: svn commit: r241703 - head/sys/kern

2012-10-18 Thread Navdeep Parhar

On 10/18/12 15:03, Andre Oppermann wrote:

On 18.10.2012 23:06, Navdeep Parhar wrote:

Hello Andre,

A couple of things if you're poking around in this area...


I didn't really mean to dive too deep into COW socket writes.


On 10/18/12 13:44, Andre Oppermann wrote:

On 18.10.2012 22:22, Andre Oppermann wrote:

Author: andre
Date: Thu Oct 18 20:22:17 2012
New Revision: 241703
URL: http://svn.freebsd.org/changeset/base/241703

Log:
   Remove double-wrapping of #ifdef ZERO_COPY_SOCKETS within
   zero copy specialized sosend_copyin() helper function.


Note that I'm not saying zero copy should be used or is even
more performant than the optimized m_uiotombuf() function.


Some time back I played around with a modified m_uiotombuf() that was
aware of the mbuf_jumbo_16K zone (instead of limiting itself to 4K
mbufs).  In some cases it performed better than the stock m_uiotombuf.
I suspect this change would also help drivers that are unable to deal
with long gather lists when doing TSO.  But my testing wasn't rigorous
enough (I was merely playing around), and the drivers I work with can
mostly cope with whatever the kernel throws at them.  So nothing came
out of it.


The jumbo 16K zone is special in that the memory is actually allocated
by contigmalloc to get physically contiguous RAM. After some uptime and
heavy use this may become difficult to obtain. Also contigmalloc has to
hunt for it, which may cause quite a bit of overhead.

4K mbufs, actually PAGE_SIZE mbufs, are very easily obtainable and fast.

To be honest I'm not really happy about larger-than-PAGE_SIZE mbufs.
They were introduced at a time when DMA engines were more limited and
couldn't do S/G DMA on receive.

So performance with larger-than-PAGE_SIZE mbufs may be a little bit
better, but when you approach memory fragmentation after some heavy
system usage it sucks to the point where allocation fails most of the
time.  PAGE_SIZE mbufs always perform the same with very little
deviation.

In an ideal scenario I'd like to see 9K and 16K mbufs go away and
have the RX DMA ring stitch a packet up out of PAGE_SIZE mbufs.


Sure, when the backend allocator gets called it's easier for it to find 
a single page than multiple contiguous pages.  But if the system's 
workload keeps the 16K zone warm then the zone allocator doesn't have to 
reach out to the backend allocator all the time.  The large clusters do 
have their advantages.  I guess cluster consumers that prefer 16K but 
are willing to fall back to PAGE_SIZE when the larger zone is depleted 
will do well no matter what the memory situation is.
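
Something along these lines (sketch only, names invented):

#include <sys/param.h>
#include <sys/mbuf.h>

/*
 * Take a 16K cluster when the warm zone can hand one out
 * immediately, otherwise degrade gracefully to PAGE_SIZE instead of
 * waiting on contigmalloc.
 */
static struct mbuf *
m_get_big_cluster(int flags, int *sizep)
{
	struct mbuf *m;

	m = m_getjcl(M_NOWAIT, MT_DATA, flags, MJUM16BYTES);
	if (m != NULL) {
		*sizep = MJUM16BYTES;
		return (m);
	}
	/* 16K zone cold or RAM fragmented; PAGE_SIZE is nearly always there. */
	m = m_getjcl(M_NOWAIT, MT_DATA, flags, MJUMPAGESIZE);
	if (m != NULL)
		*sizep = MJUMPAGESIZE;
	return (m);
}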


Regards,
Navdeep




Actually there may be some real bit-rot to zero copy sockets.
I've just started looking into it.


I have a cxgbe(4)-specific true zero-copy implementation.  The rx side
is in head, the tx side works only for blocking sockets (the easy case)
and I haven't checked it in anywhere.  Take a look at t4_soreceive_ddp()
and m_mbuftouio_ddp() in sys/dev/cxgbe/t4_ddp.c.  They're mostly
identical to the kernel routines they're based on (read: copy-pasted
from).  You may find them of some interest if you're working in this
area and are thinking of adding zero-copy hooks to the socket
implementation.


I'm going to have a look at it and think about how to generically
support DDP either way with our socket buffer layout.

Actually that may end up as the golden path: do away with
larger-than-PAGE_SIZE mbufs, sink page-flipping COW (incorrectly named
ZERO_COPY) and use DDP for those who need utmost performance (as I
said, only COW-aware applications gain a bit of speed; unaware ones
may end up much worse).


