Hi,

My interpretation is below. I think it is consistent with what others have already expressed here.
What will be really interesting now is to see what performance we can get once we get rid of the "double" de-fragmentation of long messages we currently have.

///jon

Stephens, Allan wrote:
> Hi Florian:
>
> This is a very interesting result. I think you are correct in observing
> that there is no reason for TIPC not to return as much data as it can
> during a stream-based receive operation, and your suggested modification
> looks good to me. However, I'd like to see if people agree with the
> reasoning I've used to arrive at such a conclusion ...
>
> I believe that the code exists in its current form because I
> misinterpreted the IEEE Std 1003.1 description of what MSG_WAITALL is
> supposed to do. In my defence, the standard appears vague and/or
> incomplete in its description of what this flag does. (See
> http://www.opengroup.org/onlinepubs/009695399/toc.htm for the full
> text.)
>
> The first statement the standard makes is: "On SOCK_STREAM sockets
> [MSG_WAITALL] requests that the function block until the full amount of
> data can be returned.

Clear enough to me. There is no such thing as a "message" in SOCK_STREAM sockets, so "the full amount" must refer to the read buffer, which must be filled before the call returns.

> The function may return the smaller amount of data
> if the socket is a message-based socket, if a signal is caught, if the
> connection is terminated, if MSG_PEEK was specified, or if an error is
> pending for the socket." Note that this statement says nothing about
> what the effect of MSG_WAITALL is supposed to be on SOCK_DGRAM,
> SOCK_RDM, and SOCK_SEQPACKET sockets!
>

"Message-based socket" means exactly SOCK_DGRAM, SOCK_RDM and SOCK_SEQPACKET. "Smaller amount of data" once again refers to the receive buffer. We return before that buffer is full, because the message we received was shorter (but still complete). And we never receive incomplete messages anyway in TIPC. When there is a conflict, i.e.
when there isn't enough pending data to fill the receive buffer, MSG_PEEK seems to override MSG_WAITALL, even for SOCK_STREAM sockets, according to the wording.

> Later on, the standard says: "If the MSG_WAITALL flag is not set, data
> shall be returned only up to the end of the first message." I had
> originally thought this statement applied to all socket types, which is
> why TIPC's receive code currently bails out once it reaches the end of
> the first [TIPC] message in the receive queue. However, upon reading
> some of the surrounding text, I now think that the term "message" refers
> to the atomic data units sent by SOCK_DGRAM, SOCK_RDM, and
> SOCK_SEQPACKET, meaning the statement does NOT apply to SOCK_STREAM;
> consequently, we can/should ignore the TIPC message boundaries in a
> SOCK_STREAM receive even when MSG_WAITALL is not specified, just as
> Florian suggests.
>

Agree completely. What happens to be a message at the TIPC level is completely irrelevant at the socket level of a stream socket. It is just a chunk of data that happened to be sent as one unit (in one send() call) from the sending socket. It is still only an anonymous part of a stream.

> However, there are still a couple of questions we need to answer:
>
> 1) If we use Florian's modification, TIPC will still only return data
> from the first message in the receive queue when both MSG_PEEK and
> MSG_WAITALL are specified, rather than taking as much data as it can.
> Is this what we want? Personally, I'm OK with this. I don't think it's
> worthwhile to further complicate the recv_stream() routine by requiring
> it to copy data from a message that isn't the first one in the receive
> queue, and I doubt people will specify both of these flags in the same
> receive anyway.

I think it is ok. If the user has specified MSG_PEEK, he must be prepared to receive less than the full buffer.
> Also, bailing out after the first message when MSG_PEEK
> is specified appears to be compatible with the wording of the standard,
> so I don't think people could complain we weren't doing the right thing.
>
> 2) We need to define what MSG_WAITALL means for a non-stream socket.
> The final statement in the standard raises the possibility that you
> could use MSG_WAITALL to receive multiple atomic data units in a single
> receive (although the final one might be truncated), however it doesn't
> actually require this (i.e. it only says what should happen if the flag
> is not set, not what should happen if it is set). Personally, I think
> using MSG_WAITALL this way conflicts with the atomic nature of
> non-stream message sends, and (again) I doubt many users will want to do
> this sort of thing anyway. TIPC's receive code currently ignores the
> MSG_WAITALL flag for non-stream sockets, and as long as we point this
> out in our documentation I don't think we will get too many complaints.
>

Here the wording is clearly ambiguous. Nothing is said about the effect of MSG_WAITALL if there is more than one pending message. I agree that it is counter-intuitive to allow more than one message in a receive buffer when the socket is message oriented. I also think that Florian has a strong point: UDP ignores the MSG_WAITALL flag in this case. This means that there will be no surprises for most users, and we have a de-facto standard to rely on when the formal standard is insufficient.

> Other opinions?
>
> Regards,
> Al
>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of
> Florian Westphal
> Sent: Monday, June 25, 2007 5:48 AM
> To: tipc-discussion@lists.sourceforge.net
> Subject: [tipc-discussion] TIPC slower than TCP - really?
>
> Hello everyone,
>
> I've finally been able to do some measurements on GB Ethernet.
> Initial tests showed that TCP (~90 MB/s) was significantly faster than
> TIPC (SOCK_STREAM, ~60 MB/s) for more typical write sizes (i.e.
> writes in 4k, 8k chunks). But then I noticed that the receiving box was
> under 100% CPU load. Upon investigating this, I noticed that all read()
> calls to the socket only returned
> 1476 bytes (1476 + 24 byte header = 1500, the MTU used). When I looked
> at the code I saw that TIPC would only ask for more data if one passes
> the MSG_WAITALL flag to recv(). When I changed the benchmark tool on the
> receiving end accordingly, load dropped by about 50% and throughput
> increased to more than 90-95 MB/s, being a little faster than TCP in
> some cases.
>
> Now, I think there is little reason to not check for more data if it is
> there and suggest the following modification to TIPC:
>         if ((sz_copied < buf_len)    /* didn't get all requested data */
>
> -           && (flags & MSG_WAITALL) /* ... and need to wait for more */
> +           && (!skb_queue_empty(&sk->sk_receive_queue) /* ... and there is more data to read ... */
> +               || (flags & MSG_WAITALL)) /* ... or need to wait for more */
>             && (!(flags & MSG_PEEK)) /* ... and aren't just peeking at data */
>             && (!err)                /* ... and haven't reached a FIN */
>            )
>
> This should have a similar effect as passing MSG_WAITALL under most
> high-traffic scenarios (I'll re-test this of course).
>
> I've put some of the results obtained so far here:
> http://www.strlen.de/tipc/
>
> Thoughts?
>
> Florian
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by DB2 Express
> Download DB2 Express C - the FREE version of DB2 express and take
> control of your XML. No limits. Just data. Click to get it now.
> http://sourceforge.net/powerbar/db2/
> _______________________________________________
> tipc-discussion mailing list
> tipc-discussion@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/tipc-discussion
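
For reference, the loop condition in Florian's proposed modification can be restated as a small user-space predicate, which makes the flag interactions discussed in this thread easy to check by hand. This is only a sketch: the function name keep_reading() and its parameters are made up for illustration, the queue_empty argument stands in for the skb_queue_empty(&sk->sk_receive_queue) test, and none of this is the actual TIPC receive code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>   /* MSG_WAITALL, MSG_PEEK */

/*
 * Hypothetical restatement of the patched condition. Returns true when a
 * stream receive should go on to the next queued chunk instead of
 * returning to the caller.
 */
bool keep_reading(size_t sz_copied, size_t buf_len,
                  bool queue_empty, int flags, int err)
{
    assert(sz_copied <= buf_len);   /* never copy past the user buffer */

    return (sz_copied < buf_len)                    /* buffer not yet full */
        && (!queue_empty || (flags & MSG_WAITALL))  /* more is queued, or caller asked to wait */
        && !(flags & MSG_PEEK)                      /* not just peeking */
        && !err;                                    /* no error or FIN seen */
}
```

With 1476 of 8192 bytes copied and more data queued, the predicate is true even without MSG_WAITALL, which is the behaviour change Florian proposes. With MSG_PEEK set it is always false, so a peek still stops after the first chunk, as discussed in point 1 above.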