Hi Florian: This is a very interesting result. I think you are correct in observing that there is no reason for TIPC not to return as much data as it can during a stream-based receive operation, and your suggested modification looks good to me. However, I'd like to see if people agree with the reasoning I've used to arrive at such a conclusion ...
I believe the code exists in its current form because I misinterpreted the IEEE Std 1003.1 description of what MSG_WAITALL is supposed to do. In my defence, the standard appears vague and/or incomplete in its description of what this flag does. (See http://www.opengroup.org/onlinepubs/009695399/toc.htm for the full text.)

The first statement the standard makes is: "On SOCK_STREAM sockets [MSG_WAITALL] requests that the function block until the full amount of data can be returned. The function may return the smaller amount of data if the socket is a message-based socket, if a signal is caught, if the connection is terminated, if MSG_PEEK was specified, or if an error is pending for the socket." Note that this statement says nothing about what the effect of MSG_WAITALL is supposed to be on SOCK_DGRAM, SOCK_RDM, and SOCK_SEQPACKET sockets!

Later on, the standard says: "If the MSG_WAITALL flag is not set, data shall be returned only up to the end of the first message." I had originally thought this statement applied to all socket types, which is why TIPC's receive code currently bails out once it reaches the end of the first [TIPC] message in the receive queue. However, upon reading some of the surrounding text, I now think that the term "message" refers to the atomic data units sent by SOCK_DGRAM, SOCK_RDM, and SOCK_SEQPACKET sockets, meaning the statement does NOT apply to SOCK_STREAM; consequently, we can/should ignore the TIPC message boundaries in a SOCK_STREAM receive even when MSG_WAITALL is not specified, just as Florian suggests.

However, there are still a couple of questions we need to answer:

1) If we use Florian's modification, TIPC will still only return data from the first message in the receive queue when both MSG_PEEK and MSG_WAITALL are specified, rather than taking as much data as it can. Is this what we want? Personally, I'm OK with this.
I don't think it's worthwhile to further complicate the recv_stream() routine by requiring it to copy data from a message that isn't the first one in the receive queue, and I doubt people will specify both of these flags in the same receive anyway. Also, bailing out after the first message when MSG_PEEK is specified appears to be compatible with the wording of the standard, so I don't think people could complain that we weren't doing the right thing.

2) We need to define what MSG_WAITALL means for a non-stream socket. The final statement in the standard raises the possibility that you could use MSG_WAITALL to receive multiple atomic data units in a single receive (although the final one might be truncated); however, it doesn't actually require this (i.e. it only says what should happen if the flag is not set, not what should happen if it is set). Personally, I think using MSG_WAITALL this way conflicts with the atomic nature of non-stream message sends, and (again) I doubt many users will want to do this sort of thing anyway. TIPC's receive code currently ignores the MSG_WAITALL flag for non-stream sockets, and as long as we point this out in our documentation I don't think we will get too many complaints.

Other opinions?

Regards,
Al

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Florian Westphal
Sent: Monday, June 25, 2007 5:48 AM
To: tipc-discussion@lists.sourceforge.net
Subject: [tipc-discussion] TIPC slower than TCP - really?

Hello everyone,

I've finally been able to do some measurements on gigabit Ethernet. Initial tests showed that TCP (~90 MB/s) was significantly faster than TIPC (SOCK_STREAM, ~60 MB/s) for more typical write sizes (i.e. writes in 4k or 8k chunks). But then I noticed that the receiving box was under 100% CPU load. Upon investigating this, I noticed that all read() calls to the socket only returned 1476 bytes (1476 + 24-byte header = 1500, the MTU used).
When I looked at the code I saw that TIPC would only ask for more data if one passes the MSG_WAITALL flag to recv(). When I changed the benchmark tool on the receiving end accordingly, load dropped by about 50% and throughput increased to 90-95 MB/s, making it a little faster than TCP in some cases.

Now, I think there is little reason not to check for more data if it is there, and I suggest the following modification to TIPC:

     if ((sz_copied < buf_len)     /* didn't get all requested data */
-        && (flags & MSG_WAITALL)  /* ... and need to wait for more */
+        && (!skb_queue_empty(&sk->sk_receive_queue) /* ... and there is more data to read ... */
+            || (flags & MSG_WAITALL)) /* ... or need to wait for more */
         && (!(flags & MSG_PEEK))  /* ... and aren't just peeking at data */
         && (!err)                 /* ... and haven't reached a FIN */
        )

This should have a similar effect to passing MSG_WAITALL under most high-traffic scenarios (I'll re-test this, of course). I've put some of the results obtained so far here: http://www.strlen.de/tipc/

Thoughts?
Florian

_______________________________________________
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion