Hi Florian:

This is a very interesting result.  I think you are correct in observing
that there is no reason for TIPC not to return as much data as it can
during a stream-based receive operation, and your suggested modification
looks good to me.  However, I'd like to see if people agree with the
reasoning I've used to arrive at such a conclusion ...

I believe the code exists in its current form because I
misinterpreted the IEEE Std 1003.1 description of what MSG_WAITALL is
supposed to do.  In my defence, the standard appears vague and/or
incomplete in its description of what this flag does.  (See
http://www.opengroup.org/onlinepubs/009695399/toc.htm for the full
text.)

The first statement the standard makes is: "On SOCK_STREAM sockets
[MSG_WAITALL] requests that the function block until the full amount of
data can be returned. The function may return the smaller amount of data
if the socket is a message-based socket, if a signal is caught, if the
connection is terminated, if MSG_PEEK was specified, or if an error is
pending for the socket."  Note that this statement says nothing about
what the effect of MSG_WAITALL is supposed to be on SOCK_DGRAM,
SOCK_RDM, and SOCK_SEQPACKET sockets!
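
To make that first statement concrete, here is a user-space sketch of
what a caller should be able to expect (just a sketch; sock_fd is
assumed to be a connected SOCK_STREAM socket, and this isn't
TIPC-specific code):

        /* needs <stdio.h> and <sys/socket.h> */
        char buf[8192];
        ssize_t n = recv(sock_fd, buf, sizeof(buf), MSG_WAITALL);

        if (n < 0)
                perror("recv");    /* an error was pending on the socket */
        else if (n < (ssize_t)sizeof(buf))
                /* short read: signal caught, connection terminated, etc. */
                printf("short read: %zd of %zu bytes\n", n, sizeof(buf));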

Later on, the standard says: "If the MSG_WAITALL flag is not set, data
shall be returned only up to the end of the first message."  I had
originally thought this statement applied to all socket types, which is
why TIPC's receive code currently bails out once it reaches the end of
the first [TIPC] message in the receive queue.  However, upon reading
some of the surrounding text, I now think that the term "message" refers
to the atomic data units sent by SOCK_DGRAM, SOCK_RDM, and
SOCK_SEQPACKET, meaning the statement does NOT apply to SOCK_STREAM;
consequently, we can/should ignore the TIPC message boundaries in a
SOCK_STREAM receive even when MSG_WAITALL is not specified, just as
Florian suggests.
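
In concrete terms (a sketch with hypothetical numbers, assuming three
1476-byte TIPC messages are already queued on the socket):

        char buf[8192];

        /* Without MSG_WAITALL, a SOCK_STREAM recv() is still free to
         * drain whatever is already queued, crossing TIPC message
         * boundaries: with Florian's change this call could return
         * 3 * 1476 = 4428 bytes at once, instead of stopping at 1476. */
        ssize_t n = recv(sock_fd, buf, sizeof(buf), 0);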

However, there are still a couple of questions we need to answer:

1) If we use Florian's modification, TIPC will still only return data
from the first message in the receive queue when both MSG_PEEK and
MSG_WAITALL are specified, rather than taking as much data as it can.
Is this what we want?  Personally, I'm OK with this.  I don't think it's
worthwhile to further complicate the recv_stream() routine by requiring
it to copy data from a message that isn't the first one in the receive
queue, and I doubt people will specify both of these flags in the same
receive anyway.  Also, bailing out after the first message when MSG_PEEK
is specified appears to be compatible with the wording of the standard,
so I don't think people could complain we weren't doing the right thing.
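
Put another way (again a sketch, with hypothetical sizes):

        char buf[8192];

        /* Even with Florian's change, a peek stops at the first queued
         * TIPC message; this may return 1476 bytes even if several
         * messages are waiting and buf could hold them all. */
        ssize_t n = recv(sock_fd, buf, sizeof(buf), MSG_PEEK | MSG_WAITALL);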

2) We need to define what MSG_WAITALL means for a non-stream socket.
The final statement in the standard raises the possibility that you
could use MSG_WAITALL to receive multiple atomic data units in a single
receive (although the final one might be truncated); however, it
doesn't actually require this (i.e. it only says what should happen if
the flag
is not set, not what should happen if it is set).  Personally, I think
using MSG_WAITALL this way conflicts with the atomic nature of
non-stream message sends, and (again) I doubt many users will want to do
this sort of thing anyway.  TIPC's receive code currently ignores the
MSG_WAITALL flag for non-stream sockets, and as long as we point this
out in our documentation I don't think we will get too many complaints.
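
To illustrate the current behaviour (sketch only; rdm_fd is assumed
to be a connected SOCK_RDM TIPC socket):

        char buf[2048];

        /* MSG_WAITALL is currently ignored on non-stream TIPC sockets:
         * each recv() returns at most one message, truncated if buf is
         * too small, exactly as if the flag had not been set. */
        ssize_t n = recv(rdm_fd, buf, sizeof(buf), MSG_WAITALL);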

Other opinions?

Regards,
Al 

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
Florian Westphal
Sent: Monday, June 25, 2007 5:48 AM
To: tipc-discussion@lists.sourceforge.net
Subject: [tipc-discussion] TIPC slower than TCP - really?

Hello everyone,

I've finally been able to do some measurements on Gigabit Ethernet.
Initial tests showed that TCP (~90 MB/s) was significantly faster than
TIPC (SOCK_STREAM, ~60 MB/s) for more typical write sizes (i.e.
writes in 4k and 8k chunks). But then I noticed that the receiving box
was under 100% CPU load. Upon investigating this, I noticed that all
read() calls to the socket only returned 1476 bytes (1476 + 24 byte
header = 1500, the MTU used). When I looked at the code I saw that
TIPC would only ask for more data if one passes the MSG_WAITALL flag
to recv(). When I changed the benchmark tool on the receiving end
accordingly, load dropped by about 50% and throughput increased to
90-95 MB/s, making TIPC a little faster than TCP in some cases.
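
For reference, the receive-side change amounted to something like the
following (a simplified sketch, not the actual benchmark code):

        char buf[65536];
        ssize_t n;
        long long total = 0;

        /* with MSG_WAITALL each call now fills buf instead of
         * returning a single 1476-byte (one-MTU) chunk */
        while ((n = recv(fd, buf, sizeof(buf), MSG_WAITALL)) > 0)
                total += n;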

Now, I think there is little reason not to check for more data if it
is there, so I suggest the following modification to TIPC:
        if ((sz_copied < buf_len)    /* didn't get all requested data */
-           && (flags & MSG_WAITALL) /* ... and need to wait for more */
+           && (!skb_queue_empty(&sk->sk_receive_queue) /* ... and there is more data to read ... */
+               || (flags & MSG_WAITALL)) /* ... or need to wait for more */
            && (!(flags & MSG_PEEK)) /* ... and aren't just peeking at data */
            && (!err)                /* ... and haven't reached a FIN */
            )

This should have a similar effect to passing MSG_WAITALL under most
high-traffic scenarios (I'll re-test this of course).

I've put some of the results obtained so far here:
http://www.strlen.de/tipc/

Thoughts?

Florian
