Hi Tom!

On 14 Sep 2005, at 11:15, Tom Keiser wrote:

On 9/14/05, Roland Kuhn <[EMAIL PROTECTED]> wrote:

Dear experts!

Having just strace'd the fileserver (non-LWP, single-threaded) on
Linux, I noticed that the data are read from disk using readv in
packets of 1396 bytes, 16 kB per syscall. In the face of chunksize=1MB
from the client side this does not seem terribly efficient to me, but
of course I see the benefit of reading chunks which can readily be
transferred. If my interpretation is wrong or this is an artifact of
not using tviced, please say so (if possible with a short reference
to the source), otherwise it would be nice to know why the fileserver
cannot read(fd, buf, 1048576) as that would give at least one order
of magnitude better performance from the RAID and (journalled)
filesystem.



This is an artifact of the bad decisions that were made when
implementing the rx jumbogram protocol many years ago.  Unfortunately,
jumbogram extension headers are interspersed between each data
continuation vector.  Thus, we need a separate system iovec for each
rx packet continuation buffer.  The end result is that
storedata_rxstyle and fetchdata_rxstyle each end up doing two vector
I/O syscalls (recvmsg+writev or readv+sendmsg) per ~16 kB of data.
The jumbogram protocol needs to be replaced.
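
For illustration, the read side described above can be sketched in
Python (the fileserver itself is C; this is not its code). The packet
size of 1396 bytes comes from the strace output; the count of 11
buffers per call is an assumption derived from "about 16 kB per
syscall", and read_packets and count_syscalls are hypothetical names.

```python
import os
import tempfile

PACKET_DATA = 1396   # per-packet data size seen in the strace output
PACKETS = 11         # assumption: roughly 16 kB / 1396 bytes per syscall


def read_packets(fd, packets=PACKETS, size=PACKET_DATA):
    """Fill one separate buffer per packet with a single readv call."""
    buffers = [bytearray(size) for _ in range(packets)]
    got = os.readv(fd, buffers)   # one syscall, many small vectors
    return got, buffers


def count_syscalls(total, packets=PACKETS, size=PACKET_DATA):
    """Number of readv calls needed to read `total` bytes this way."""
    per_call = packets * size
    return -(-total // per_call)  # ceiling division


if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        f.write(bytes(20000))
        f.flush()
        f.seek(0)
        got, _ = read_packets(f.fileno())
        print(got, count_syscalls(1048576))
```

Under these assumptions a 1 MB chunk takes 69 readv calls, each
delivering at most 15356 bytes.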

Thanks for the explanation. Wouldn't it be possible to keep the
network protocol (including the sendmsg) as it is, but still read
bigger chunks? The outgoing messages are constructed using iovecs
anyway, so why not intersperse the extension headers at sendmsg time?
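
A minimal sketch of what I have in mind, in Python for brevity (the
fileserver is C): read one large buffer, then build the scatter/gather
list from slices of that buffer with a header placed between them. The
4-byte zero header is only a placeholder, not the real rx jumbogram
header layout, and interleave, read_chunk and send_datagram are
hypothetical names.

```python
import os
import socket

PAYLOAD = 1396          # per-packet data size seen in the strace output
HEADER = b"\x00" * 4    # placeholder header; real rx layout not modelled


def read_chunk(fd, size=1048576):
    """One large read instead of many small per-packet vectors."""
    return os.read(fd, size)


def interleave(chunk, payload=PAYLOAD, header=HEADER):
    """Build a scatter/gather list over one big buffer.

    Payload entries are memoryview slices (no data is copied), and a
    header entry is placed between consecutive payload slices.
    """
    view = memoryview(chunk)
    iov = []
    for off in range(0, len(view), payload):
        if off:
            iov.append(header)
        iov.append(view[off:off + payload])
    return iov


def send_datagram(sock, iov):
    """Hand the interleaved vectors to a single sendmsg call."""
    return sock.sendmsg(iov)
```

Each datagram would still need its own sendmsg, so in practice
interleave would be applied to one datagram's worth of the big buffer
at a time; only the disk read gets larger.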

Ciao,
                    Roland

--
TU Muenchen, Physik-Department E18, James-Franck-Str. 85747 Garching
Telefon 089/289-12592; Telefax 089/289-12570
--
A mouse is a device used to point at
the xterm you want to type in.
Kim Alm on a.s.r.
-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GS/CS/M/MU d-(++) s:+ a-> C+++ UL++++ P-(+) L+++ E(+) W+ !N K- w--- M + !V Y+
PGP++ t+(++) 5 R+ tv-- b+ DI++ e+++>++++ h---- y+++
------END GEEK CODE BLOCK------

