Re: [PATCH net] sctp: change to save MSG_MORE flag into assoc
On Sat, Mar 4, 2017 at 12:51 PM, Xin Longwrote: > On Sat, Mar 4, 2017 at 1:57 AM, Xin Long wrote: >> On Sat, Mar 4, 2017 at 12:31 AM, David Laight >> wrote: >>> From: Xin Long Sent: 03 March 2017 15:43 >>> ... > It is much more important to get MSG_MORE working 'properly' for SCTP > than for TCP. For TCP an application can always use a long send. >>> "long send" ?, you mean bigger data, or keeping sending? I didn't get the difference between SCTP and TCP, they are similar when sending data. >>> >>> With tcp an application can always replace two send()/write() >>> calls with a single call to writev(). >>> For sctp two send() calls must be made in order to generate two >>> data chunks. >>> So it is much easier for a tcp application to generate 'full' >>> ethernet packets. >> okay, it should not be a important reason, and sctp might also support >> it one day. :-) >> >>> > > ... >> @@ -1982,6 +1982,7 @@ static int sctp_sendmsg(struct sock *sk, struct >> msghdr *msg, size_t msg_len) >>* breaks. >>*/ >> err = sctp_primitive_SEND(net, asoc, datamsg); >> + asoc->force_delay = 0; >> /* Did the lower layer accept the chunk? */ >> if (err) { >> sctp_datamsg_free(datamsg); > > I don't think this is right - or needed. > You only get to the above if some test has decided to send data chunks. > So it just means that the NEXT time someone tries to send data all the > queued data gets sent. >>> the NEXT time someone tries to send data with "MSG_MORE clear", yes, but with "MSG_MORE set", it will still delay. > I'm guessing that the whole thing gets called in a loop (definitely > needed > for very long data chunks, or after the window is opened). >>> yes, if users keep sending data chunks with MSG_MORE set, no data with "MSG_MORE clear" gap. > Now if an application sends a lot of (say) 100 byte chunks with MSG_MORE > set it would expect to see a lot of full ethernet frames be sent. >>> right. >>> > With the above a frame will be sent (containing all but 1 chunk) when the > amount of queued data becomes too large for an ethernet frame, and > immediately > followed by a second ethernet frame with 1 chunk in it. >>> "followed by a second ethernet frame with 1 chunk in it.", I think this's what you're really worried about, right ? But sctp flush data queue NOT like what you think, it's not keep traversing the queue untill the queue is empty. once a packet with chunks in one ethernet frame is sent, sctp_outq_flush will return. it will pack chunks and send the next packet again untill some other 'event' triggers it, like retransmission or data received from peer. I don't think this is a problem. >>> >>> Erm that can't work. >>> I think there is code to convert a large user send into multiple data >>> chunks. >>> So if the user does a 4k (say) send several large chunks get queued. >>> These would need to all be sent at once. >>> >>> Similarly when the transmit window is received. >>> So somewhere there ought to be a loop that will send more than one packet. >> As far as I can see, no loop like you said, mostly, the incoming >> chunk (like SACK) from peer will trigger the next flush out. >> I can try to trace the path in kernel for sure tomorrow. > okay, you are right, I missed sctp_packet_transmit_chunk also call > sctp_packet_transmit to send the current packet. :) > > But if we keep sending data with "MSG_MORE", after one ethernet frame > is sent, "followed by a second ethernet frame with 1 chunk in it" will NOT > happen, as in this loop the asoc's msg_more flag is still set, and this flush > is called by sctp_sendmsg(the function msg_more should care more). > > did I miss something ? Hi, David Laight As now sctp MSG_MORE is broken with msg->force_day, I hope we can use this patch before you get a better one to fix it, so that users would not be confused with the strange behaviour. what do you say ? > >> >>> > Now it might be that the flag needs clearing when retransmissions are > queued. > OTOH they might get sent for other reasons. >>> Before we really overthought about MSG_MORE, no need to care about retransmissions, define MSG_MORE, in my opinion, it works more for *inflight is 0*, if it's not 0, we shouldn't stop other places flushing them. >>> >>> Eh? and when nagle disabled. >>> If 'inflight' isn't 0 then most paths don't flush data. >> I knew, but MSG_MORE is different thing, it should only try to work for the >> current and following data. >> >>> We cannot let asoc's more_more flag work as global, it will block elsewhere sending data chunks, not only sctp_sendmsg. >>> >>> If the connection was flow
Re: [PATCH net] sctp: change to save MSG_MORE flag into assoc
On Sat, Mar 4, 2017 at 1:57 AM, Xin Longwrote: > On Sat, Mar 4, 2017 at 12:31 AM, David Laight wrote: >> From: Xin Long >>> Sent: 03 March 2017 15:43 >> ... >>> > It is much more important to get MSG_MORE working 'properly' for SCTP >>> > than for TCP. For TCP an application can always use a long send. >> >>> "long send" ?, you mean bigger data, or keeping sending? >>> I didn't get the difference between SCTP and TCP, they >>> are similar when sending data. >> >> With tcp an application can always replace two send()/write() >> calls with a single call to writev(). >> For sctp two send() calls must be made in order to generate two >> data chunks. >> So it is much easier for a tcp application to generate 'full' >> ethernet packets. > okay, it should not be a important reason, and sctp might also support > it one day. :-) > >> >>> >>> > >>> > ... >>> >> @@ -1982,6 +1982,7 @@ static int sctp_sendmsg(struct sock *sk, struct >>> >> msghdr *msg, size_t msg_len) >>> >>* breaks. >>> >>*/ >>> >> err = sctp_primitive_SEND(net, asoc, datamsg); >>> >> + asoc->force_delay = 0; >>> >> /* Did the lower layer accept the chunk? */ >>> >> if (err) { >>> >> sctp_datamsg_free(datamsg); >>> > >>> > I don't think this is right - or needed. >>> > You only get to the above if some test has decided to send data chunks. >>> > So it just means that the NEXT time someone tries to send data all the >>> > queued data gets sent. >> >>> the NEXT time someone tries to send data with "MSG_MORE clear", >>> yes, but with "MSG_MORE set", it will still delay. >>> >>> > I'm guessing that the whole thing gets called in a loop (definitely needed >>> > for very long data chunks, or after the window is opened). >> >>> yes, if users keep sending data chunks with MSG_MORE set, no >>> data with "MSG_MORE clear" gap. >>> >>> > Now if an application sends a lot of (say) 100 byte chunks with MSG_MORE >>> > set it would expect to see a lot of full ethernet frames be sent. >> >>> right. >> >>> > With the above a frame will be sent (containing all but 1 chunk) when the >>> > amount of queued data becomes too large for an ethernet frame, and >>> > immediately >>> > followed by a second ethernet frame with 1 chunk in it. >> >>> "followed by a second ethernet frame with 1 chunk in it.", I think this's >>> what you're really worried about, right ? >>> But sctp flush data queue NOT like what you think, it's not keep traversing >>> the queue untill the queue is empty. >>> once a packet with chunks in one ethernet frame is sent, sctp_outq_flush >>> will return. it will pack chunks and send the next packet again untill some >>> other 'event' triggers it, like retransmission or data received from peer. >>> I don't think this is a problem. >> >> Erm that can't work. >> I think there is code to convert a large user send into multiple data chunks. >> So if the user does a 4k (say) send several large chunks get queued. >> These would need to all be sent at once. >> >> Similarly when the transmit window is received. >> So somewhere there ought to be a loop that will send more than one packet. > As far as I can see, no loop like you said, mostly, the incoming > chunk (like SACK) from peer will trigger the next flush out. > I can try to trace the path in kernel for sure tomorrow. okay, you are right, I missed sctp_packet_transmit_chunk also call sctp_packet_transmit to send the current packet. :) But if we keep sending data with "MSG_MORE", after one ethernet frame is sent, "followed by a second ethernet frame with 1 chunk in it" will NOT happen, as in this loop the asoc's msg_more flag is still set, and this flush is called by sctp_sendmsg(the function msg_more should care more). did I miss something ? > >> >>> > Now it might be that the flag needs clearing when retransmissions are >>> > queued. >>> > OTOH they might get sent for other reasons. >> >>> Before we really overthought about MSG_MORE, no need to care about >>> retransmissions, define MSG_MORE, in my opinion, it works more for >>> *inflight is 0*, if it's not 0, we shouldn't stop other places flushing >>> them. >> >> Eh? and when nagle disabled. >> If 'inflight' isn't 0 then most paths don't flush data. > I knew, but MSG_MORE is different thing, it should only try to work for the > current and following data. > >> >>> We cannot let asoc's more_more flag work as global, it will block elsewhere >>> sending data chunks, not only sctp_sendmsg. >> >> If the connection was flow controlled off, and more 'credit' arrives and >> there >> is less that an ethernet frame's worth of data pending, and the last send >> said 'MSG_MORE' there is no point sending anything until the application >> does a send with MSG_MORE clear. > got you, I think you have different understanding about MSG_MORE > while this patch just try to make it work like TCP's msg_more, but what > you mentioned here is the same as TCP thing,
Re: [PATCH net] sctp: change to save MSG_MORE flag into assoc
On Sat, Mar 4, 2017 at 12:31 AM, David Laightwrote: > From: Xin Long >> Sent: 03 March 2017 15:43 > ... >> > It is much more important to get MSG_MORE working 'properly' for SCTP >> > than for TCP. For TCP an application can always use a long send. > >> "long send" ?, you mean bigger data, or keeping sending? >> I didn't get the difference between SCTP and TCP, they >> are similar when sending data. > > With tcp an application can always replace two send()/write() > calls with a single call to writev(). > For sctp two send() calls must be made in order to generate two > data chunks. > So it is much easier for a tcp application to generate 'full' > ethernet packets. okay, it should not be a important reason, and sctp might also support it one day. :-) > >> >> > >> > ... >> >> @@ -1982,6 +1982,7 @@ static int sctp_sendmsg(struct sock *sk, struct >> >> msghdr *msg, size_t msg_len) >> >>* breaks. >> >>*/ >> >> err = sctp_primitive_SEND(net, asoc, datamsg); >> >> + asoc->force_delay = 0; >> >> /* Did the lower layer accept the chunk? */ >> >> if (err) { >> >> sctp_datamsg_free(datamsg); >> > >> > I don't think this is right - or needed. >> > You only get to the above if some test has decided to send data chunks. >> > So it just means that the NEXT time someone tries to send data all the >> > queued data gets sent. > >> the NEXT time someone tries to send data with "MSG_MORE clear", >> yes, but with "MSG_MORE set", it will still delay. >> >> > I'm guessing that the whole thing gets called in a loop (definitely needed >> > for very long data chunks, or after the window is opened). > >> yes, if users keep sending data chunks with MSG_MORE set, no >> data with "MSG_MORE clear" gap. >> >> > Now if an application sends a lot of (say) 100 byte chunks with MSG_MORE >> > set it would expect to see a lot of full ethernet frames be sent. > >> right. > >> > With the above a frame will be sent (containing all but 1 chunk) when the >> > amount of queued data becomes too large for an ethernet frame, and >> > immediately >> > followed by a second ethernet frame with 1 chunk in it. > >> "followed by a second ethernet frame with 1 chunk in it.", I think this's >> what you're really worried about, right ? >> But sctp flush data queue NOT like what you think, it's not keep traversing >> the queue untill the queue is empty. >> once a packet with chunks in one ethernet frame is sent, sctp_outq_flush >> will return. it will pack chunks and send the next packet again untill some >> other 'event' triggers it, like retransmission or data received from peer. >> I don't think this is a problem. > > Erm that can't work. > I think there is code to convert a large user send into multiple data chunks. > So if the user does a 4k (say) send several large chunks get queued. > These would need to all be sent at once. > > Similarly when the transmit window is received. > So somewhere there ought to be a loop that will send more than one packet. As far as I can see, no loop like you said, mostly, the incoming chunk (like SACK) from peer will trigger the next flush out. I can try to trace the path in kernel for sure tomorrow. > >> > Now it might be that the flag needs clearing when retransmissions are >> > queued. >> > OTOH they might get sent for other reasons. > >> Before we really overthought about MSG_MORE, no need to care about >> retransmissions, define MSG_MORE, in my opinion, it works more for >> *inflight is 0*, if it's not 0, we shouldn't stop other places flushing them. > > Eh? and when nagle disabled. > If 'inflight' isn't 0 then most paths don't flush data. I knew, but MSG_MORE is different thing, it should only try to work for the current and following data. > >> We cannot let asoc's more_more flag work as global, it will block elsewhere >> sending data chunks, not only sctp_sendmsg. > > If the connection was flow controlled off, and more 'credit' arrives and there > is less that an ethernet frame's worth of data pending, and the last send > said 'MSG_MORE' there is no point sending anything until the application > does a send with MSG_MORE clear. got you, I think you have different understanding about MSG_MORE while this patch just try to make it work like TCP's msg_more, but what you mentioned here is the same as TCP thing, seems you also want to improve TCP's MSG_MORE :-) > > I'm not sure what causes a retransmission to send data, I suspect that > 'inflight' > can easily be non-zero at that time. The thing that causes a retransmission to send data is that both tx and rtx send data through sctp_outq_flush, in which it will try to send rtx queue, then rx queue. yes, once a packet is sent out and not yet be SACKed, "inflight" will not be zero, so when retransmiting, "inflight" must be non-zero. > Likely something causes a packet be generated - which then collects the data > chunks. > > David > >
RE: [PATCH net] sctp: change to save MSG_MORE flag into assoc
From: Xin Long > Sent: 03 March 2017 15:43 ... > > It is much more important to get MSG_MORE working 'properly' for SCTP > > than for TCP. For TCP an application can always use a long send. > "long send" ?, you mean bigger data, or keeping sending? > I didn't get the difference between SCTP and TCP, they > are similar when sending data. With tcp an application can always replace two send()/write() calls with a single call to writev(). For sctp two send() calls must be made in order to generate two data chunks. So it is much easier for a tcp application to generate 'full' ethernet packets. > > > > > ... > >> @@ -1982,6 +1982,7 @@ static int sctp_sendmsg(struct sock *sk, struct > >> msghdr *msg, size_t msg_len) > >>* breaks. > >>*/ > >> err = sctp_primitive_SEND(net, asoc, datamsg); > >> + asoc->force_delay = 0; > >> /* Did the lower layer accept the chunk? */ > >> if (err) { > >> sctp_datamsg_free(datamsg); > > > > I don't think this is right - or needed. > > You only get to the above if some test has decided to send data chunks. > > So it just means that the NEXT time someone tries to send data all the > > queued data gets sent. > the NEXT time someone tries to send data with "MSG_MORE clear", > yes, but with "MSG_MORE set", it will still delay. > > > I'm guessing that the whole thing gets called in a loop (definitely needed > > for very long data chunks, or after the window is opened). > yes, if users keep sending data chunks with MSG_MORE set, no > data with "MSG_MORE clear" gap. > > > Now if an application sends a lot of (say) 100 byte chunks with MSG_MORE > > set it would expect to see a lot of full ethernet frames be sent. > right. > > With the above a frame will be sent (containing all but 1 chunk) when the > > amount of queued data becomes too large for an ethernet frame, and > > immediately > > followed by a second ethernet frame with 1 chunk in it. > "followed by a second ethernet frame with 1 chunk in it.", I think this's > what you're really worried about, right ? > But sctp flush data queue NOT like what you think, it's not keep traversing > the queue untill the queue is empty. > once a packet with chunks in one ethernet frame is sent, sctp_outq_flush > will return. it will pack chunks and send the next packet again untill some > other 'event' triggers it, like retransmission or data received from peer. > I don't think this is a problem. Erm that can't work. I think there is code to convert a large user send into multiple data chunks. So if the user does a 4k (say) send several large chunks get queued. These would need to all be sent at once. Similarly when the transmit window is received. So somewhere there ought to be a loop that will send more than one packet. > > Now it might be that the flag needs clearing when retransmissions are > > queued. > > OTOH they might get sent for other reasons. > Before we really overthought about MSG_MORE, no need to care about > retransmissions, define MSG_MORE, in my opinion, it works more for > *inflight is 0*, if it's not 0, we shouldn't stop other places flushing them. Eh? and when nagle disabled. If 'inflight' isn't 0 then most paths don't flush data. > We cannot let asoc's more_more flag work as global, it will block elsewhere > sending data chunks, not only sctp_sendmsg. If the connection was flow controlled off, and more 'credit' arrives and there is less that an ethernet frame's worth of data pending, and the last send said 'MSG_MORE' there is no point sending anything until the application does a send with MSG_MORE clear. I'm not sure what causes a retransmission to send data, I suspect that 'inflight' can easily be non-zero at that time. Likely something causes a packet be generated - which then collects the data chunks. David
Re: [PATCH net] sctp: change to save MSG_MORE flag into assoc
On Fri, Mar 3, 2017 at 8:49 PM, David Laightwrote: > From: Xin Long >> Sent: 03 March 2017 06:24 >> David Laight noticed the support for MSG_MORE with datamsg->force_day >> didn't really work as we expected, as the first msg with MSG_MORE set >> would always block the following chunks' dequeuing. >> >> This Patch is to rewrite it by saving the MSG_MORE flag into assoc as >> Divid Laight suggested. >^ typo ah, sorry. :P > >> asoc->force_delay is used to save MSG_MORE flag before a msg is sent. >> Once this msg is queued, asoc->force_delay is set back to 0, so that >> it will not affect other places flushing out queue. > > That doesn't seem right nor make sense. > >> asoc->force_delay works as a 'local param' here as the msg sending is >> under protection of sock lock. It would make sctp's MSG_MORE work as >> tcp's. > > It is much more important to get MSG_MORE working 'properly' for SCTP > than for TCP. For TCP an application can always use a long send. "long send" ?, you mean bigger data, or keeping sending? I didn't get the difference between SCTP and TCP, they are similar when sending data. > > ... >> @@ -1982,6 +1982,7 @@ static int sctp_sendmsg(struct sock *sk, struct msghdr >> *msg, size_t msg_len) >>* breaks. >>*/ >> err = sctp_primitive_SEND(net, asoc, datamsg); >> + asoc->force_delay = 0; >> /* Did the lower layer accept the chunk? */ >> if (err) { >> sctp_datamsg_free(datamsg); > > I don't think this is right - or needed. > You only get to the above if some test has decided to send data chunks. > So it just means that the NEXT time someone tries to send data all the > queued data gets sent. the NEXT time someone tries to send data with "MSG_MORE clear", yes, but with "MSG_MORE set", it will still delay. > I'm guessing that the whole thing gets called in a loop (definitely needed > for very long data chunks, or after the window is opened). yes, if users keep sending data chunks with MSG_MORE set, no data with "MSG_MORE clear" gap. > Now if an application sends a lot of (say) 100 byte chunks with MSG_MORE > set it would expect to see a lot of full ethernet frames be sent. right. > With the above a frame will be sent (containing all but 1 chunk) when the > amount of queued data becomes too large for an ethernet frame, and immediately > followed by a second ethernet frame with 1 chunk in it. "followed by a second ethernet frame with 1 chunk in it.", I think this's what you're really worried about, right ? But sctp flush data queue NOT like what you think, it's not keep traversing the queue untill the queue is empty. once a packet with chunks in one ethernet frame is sent, sctp_outq_flush will return. it will pack chunks and send the next packet again untill some other 'event' triggers it, like retransmission or data received from peer. I don't think this is a problem. > > Now it might be that the flag needs clearing when retransmissions are queued. > OTOH they might get sent for other reasons. Before we really overthought about MSG_MORE, no need to care about retransmissions, define MSG_MORE, in my opinion, it works more for *inflight is 0*, if it's not 0, we shouldn't stop other places flushing them. We cannot let asoc's more_more flag work as global, it will block elsewhere sending data chunks, not only sctp_sendmsg. Thanks > > David > >
RE: [PATCH net] sctp: change to save MSG_MORE flag into assoc
From: Xin Long > Sent: 03 March 2017 06:24 > David Laight noticed the support for MSG_MORE with datamsg->force_day > didn't really work as we expected, as the first msg with MSG_MORE set > would always block the following chunks' dequeuing. > > This Patch is to rewrite it by saving the MSG_MORE flag into assoc as > Divid Laight suggested. ^ typo > asoc->force_delay is used to save MSG_MORE flag before a msg is sent. > Once this msg is queued, asoc->force_delay is set back to 0, so that > it will not affect other places flushing out queue. That doesn't seem right nor make sense. > asoc->force_delay works as a 'local param' here as the msg sending is > under protection of sock lock. It would make sctp's MSG_MORE work as > tcp's. It is much more important to get MSG_MORE working 'properly' for SCTP than for TCP. For TCP an application can always use a long send. ... > @@ -1982,6 +1982,7 @@ static int sctp_sendmsg(struct sock *sk, struct msghdr > *msg, size_t msg_len) >* breaks. >*/ > err = sctp_primitive_SEND(net, asoc, datamsg); > + asoc->force_delay = 0; > /* Did the lower layer accept the chunk? */ > if (err) { > sctp_datamsg_free(datamsg); I don't think this is right - or needed. You only get to the above if some test has decided to send data chunks. So it just means that the NEXT time someone tries to send data all the queued data gets sent. I'm guessing that the whole thing gets called in a loop (definitely needed for very long data chunks, or after the window is opened). Now if an application sends a lot of (say) 100 byte chunks with MSG_MORE set it would expect to see a lot of full ethernet frames be sent. With the above a frame will be sent (containing all but 1 chunk) when the amount of queued data becomes too large for an ethernet frame, and immediately followed by a second ethernet frame with 1 chunk in it. Now it might be that the flag needs clearing when retransmissions are queued. OTOH they might get sent for other reasons. David