On Thu, 9 Nov 2006, Ken Hornstein wrote:

While we're on the subject of already tuned solutions: what
possibilities are there to combine this with the use of sendfile()?
Experience from Apache and other projects shows that there are very
noticeable effects even when doing sendfile() on small chunks compared
to the classic read/write approach.

Well ... I am certainly willing to investigate it.  The problems I see are
threefold:

- You really want to make sure the header and bulk data end up in one
 TCP frame.  If you utilize sendfile(), it isn't possible to guarantee
 that because you'll have to do two separate operations: one write() to
 do the header data, then the sendfile() call to move the bulk data (right
 now writev() is used so header data and bulk data get coalesced into one
 TCP frame).  If you have a series of small TCP frames interspersed with
 large frames, performance will go into the crapper.  The way reads are
 done in RxTCP, it could work ... but I see from at least the Linux
 sendfile() manpage that the reader cannot be a socket, so that takes that
 off the table.  Apache has a much simpler problem; they're not trying
 to have a virtualized multichannel stream protocol over TCP.
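A minimal sketch of the writev() approach described above, assuming a
hypothetical helper name (send_header_and_data) and an illustrative header;
this is not the RxTCP code. It only shows header and bulk data handed to the
kernel in a single call. Whether they share a TCP frame is still up to the
stack.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: pass header and payload to the kernel in one
 * writev() call instead of two separate write() calls. */
ssize_t send_header_and_data(int fd, const void *hdr, size_t hdr_len,
                             const void *data, size_t data_len)
{
    struct iovec iov[2];

    iov[0].iov_base = (void *)hdr;    /* writev() never modifies the buffers */
    iov[0].iov_len  = hdr_len;
    iov[1].iov_base = (void *)data;
    iov[1].iov_len  = data_len;

    /* May return a short count on a real socket; a caller would retry. */
    return writev(fd, iov, 2);
}

/* Self-check over a pipe: returns 0 if the reader sees the header
 * immediately followed by the payload. */
int writev_roundtrip_check(void)
{
    int p[2];
    char buf[16];
    ssize_t n, r;

    if (pipe(p) != 0)
        return -1;
    n = send_header_and_data(p[1], "HDR:", 4, "payload", 7);
    r = (n == 11) ? read(p[0], buf, sizeof buf) : -1;
    close(p[0]);
    close(p[1]);
    return (r == 11 && memcmp(buf, "HDR:payload", 11) == 0) ? 0 : -1;
}
```

With write() followed by sendfile(), the two pieces reach the kernel in
separate calls, which is the coalescing concern raised in this item.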

Yeah, sendfile() only addresses the sending side.

 I see that Solaris has sendfilev(), and one of the items it can take is
 a userspace buffer, so that could address the sending issue.  But
 it's not clear to me that the Solaris sendfilev() avoids userspace
 copies, since it's a library function and not a system call.

AIX send_file() can also do this. In the end I guess you'll want some sort of portability layer, or let the #ifdefs eat your code.
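One way such a portability layer could look, as a rough sketch only: the
names xfer_file_range and copy_read_write are made up for illustration. Only
the Linux branch calls sendfile(); the Solaris and AIX variants are left as a
comment rather than guessed at. Everything else falls back to a plain
read/write loop.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#ifdef __linux__
#include <sys/sendfile.h>
#endif

/* Portable fallback: classic read/write loop through a userspace buffer. */
static ssize_t copy_read_write(int out_fd, int in_fd, off_t off, size_t count)
{
    char buf[8192];
    size_t done = 0;

    while (done < count) {
        size_t want = count - done < sizeof buf ? count - done : sizeof buf;
        ssize_t r = pread(in_fd, buf, want, off + (off_t)done);
        if (r < 0) return -1;
        if (r == 0) break;                       /* end of file */
        for (ssize_t w = 0; w < r; ) {
            ssize_t k = write(out_fd, buf + w, (size_t)(r - w));
            if (k < 0) return -1;
            w += k;
        }
        done += (size_t)r;
    }
    return (ssize_t)done;
}

/* Hypothetical wrapper: sendfile() where available, plain copy elsewhere. */
ssize_t xfer_file_range(int out_fd, int in_fd, off_t off, size_t count)
{
#ifdef __linux__
    size_t done = 0;

    while (done < count) {
        ssize_t k = sendfile(out_fd, in_fd, &off, count - done);
        if (k < 0) {
            /* Descriptor types sendfile() rejects: use the copy loop. */
            if (done == 0 && (errno == EINVAL || errno == ENOSYS))
                return copy_read_write(out_fd, in_fd, off, count);
            return -1;
        }
        if (k == 0) break;                       /* end of file */
        done += (size_t)k;
    }
    return (ssize_t)done;
#else
    /* Solaris sendfilev(), AIX, etc. would get their own branches here. */
    return copy_read_write(out_fd, in_fd, off, count);
#endif
}

/* Self-check: send bytes 2..6 of a temp file over a socketpair. */
int xfer_selfcheck(void)
{
    char path[] = "/tmp/xferXXXXXX";
    char buf[32];
    int sv[2];
    ssize_t n, r;
    int fd = mkstemp(path);

    if (fd < 0) return -1;
    unlink(path);
    if (write(fd, "0123456789", 10) != 10) { close(fd); return -1; }
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) { close(fd); return -1; }
    n = xfer_file_range(sv[0], fd, 2, 5);
    r = (n == 5) ? read(sv[1], buf, sizeof buf) : -1;
    close(fd); close(sv[0]); close(sv[1]);
    return (r == 5 && memcmp(buf, "23456", 5) == 0) ? 0 : -1;
}
```

Keeping the #ifdefs inside one function at least confines them to a single
file instead of spreading them through the callers.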

- If you want to do a checksum of the bulk data, you need to read the
 bulk data into memory ... and you lose the benefit of sendfile().
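To illustrate why checksumming defeats sendfile(), here is a small sketch
that checksums data while forwarding it. Adler-32 is used purely as a
stand-in (the checksum Rx would use is not specified here), and the function
names are hypothetical. Every byte has to pass through the userspace buffer
before it can be summed.

```c
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define ADLER_MOD 65521u

/* Fold len bytes into a running Adler-32 state (a in the low 16 bits,
 * b in the high 16 bits). Stand-in checksum for illustration only. */
static uint32_t adler32_update(uint32_t state, const unsigned char *p,
                               size_t len)
{
    uint32_t a = state & 0xffffu, b = state >> 16;

    for (size_t i = 0; i < len; i++) {
        a = (a + p[i]) % ADLER_MOD;
        b = (b + a) % ADLER_MOD;
    }
    return (b << 16) | a;
}

/* Copy in_fd to out_fd until EOF, checksumming each chunk on the way.
 * The read() into buf is exactly the copy sendfile() would have avoided. */
ssize_t checksum_and_forward(int out_fd, int in_fd, uint32_t *sum)
{
    unsigned char buf[4096];
    ssize_t r, total = 0;

    *sum = 1;                                    /* Adler-32 initial state */
    while ((r = read(in_fd, buf, sizeof buf)) > 0) {
        *sum = adler32_update(*sum, buf, (size_t)r);
        for (ssize_t w = 0; w < r; ) {
            ssize_t k = write(out_fd, buf + w, (size_t)(r - w));
            if (k < 0) return -1;
            w += k;
        }
        total += r;
    }
    return r < 0 ? -1 : total;
}

/* Self-check using two pipes and the well-known Adler-32 of "Wikipedia". */
int checksum_selfcheck(void)
{
    int in[2], out[2];
    char buf[16];
    uint32_t sum = 0;
    ssize_t n, r;

    if (pipe(in) != 0) return -1;
    if (pipe(out) != 0) { close(in[0]); close(in[1]); return -1; }
    if (write(in[1], "Wikipedia", 9) != 9) return -1;
    close(in[1]);                                /* reader will see EOF */
    n = checksum_and_forward(out[1], in[0], &sum);
    r = (n == 9) ? read(out[0], buf, sizeof buf) : -1;
    close(in[0]); close(out[0]); close(out[1]);
    return (r == 9 && memcmp(buf, "Wikipedia", 9) == 0
            && sum == 0x11E60398u) ? 0 : -1;
}
```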

Isn't the TCP checksumming enough? Anyhow, encryption would also have this effect.

In any case, I was just curious whether it was possible at all. Modern servers shouldn't have any problem delivering GigE speed without sendfile() given sane code. It will be very interesting to see what happens when 10GigE becomes common, though. A wild guess is that we'll be limited by disk speed.

/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se     |    [EMAIL PROTECTED]
---------------------------------------------------------------------------
 "If the Apocalypse comes, beep me"- Buffy
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
_______________________________________________
OpenAFS-devel mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-devel
