Re: [PATCH net] sctp: change to save MSG_MORE flag into assoc

2017-03-07 Thread Xin Long
On Sat, Mar 4, 2017 at 12:51 PM, Xin Long  wrote:
> On Sat, Mar 4, 2017 at 1:57 AM, Xin Long  wrote:
>> On Sat, Mar 4, 2017 at 12:31 AM, David Laight  
>> wrote:
>>> From: Xin Long
 Sent: 03 March 2017 15:43
>>> ...
 > It is much more important to get MSG_MORE working 'properly' for SCTP
 > than for TCP. For TCP an application can always use a long send.
>>>
 "long send" ?, you mean bigger data, or keeping sending?
 I didn't get the difference between SCTP and TCP, they
 are similar when sending data.
>>>
>>> With tcp an application can always replace two send()/write()
>>> calls with a single call to writev().
>>> For sctp two send() calls must be made in order to generate two
>>> data chunks.
>>> So it is much easier for a tcp application to generate 'full'
>>> ethernet packets.
>> okay, it should not be a important reason, and sctp might also support
>> it one day. :-)
>>
>>>

 >
 > ...
 >> @@ -1982,6 +1982,7 @@ static int sctp_sendmsg(struct sock *sk, struct 
 >> msghdr *msg, size_t msg_len)
 >>* breaks.
 >>*/
 >>   err = sctp_primitive_SEND(net, asoc, datamsg);
 >> + asoc->force_delay = 0;
 >>   /* Did the lower layer accept the chunk? */
 >>   if (err) {
 >>   sctp_datamsg_free(datamsg);
 >
 > I don't think this is right - or needed.
 > You only get to the above if some test has decided to send data chunks.
 > So it just means that the NEXT time someone tries to send data all the
 > queued data gets sent.
>>>
 the NEXT time someone tries to send data with "MSG_MORE clear",
 yes, but with "MSG_MORE set", it will still delay.

 > I'm guessing that the whole thing gets called in a loop (definitely 
 > needed
 > for very long data chunks, or after the window is opened).
>>>
 yes, if users keep sending data chunks with MSG_MORE set, no
 data with "MSG_MORE clear" gap.

 > Now if an application sends a lot of (say) 100 byte chunks with MSG_MORE
 > set it would expect to see a lot of full ethernet frames be sent.
>>>
 right.
>>>
 > With the above a frame will be sent (containing all but 1 chunk) when the
 > amount of queued data becomes too large for an ethernet frame, and 
 > immediately
 > followed by a second ethernet frame with 1 chunk in it.
>>>
 "followed by a second ethernet frame with 1 chunk in it.", I think this's
 what you're really worried about, right ?
 But sctp flush data queue NOT like what you think, it's not keep traversing
 the queue untill the queue is empty.
 once a packet with chunks in one ethernet frame is sent, sctp_outq_flush
 will return. it will pack chunks and send the next packet again untill some
 other 'event' triggers it, like retransmission or data received from peer.
 I don't think this is a problem.
>>>
>>> Erm that can't work.
>>> I think there is code to convert a large user send into multiple data 
>>> chunks.
>>> So if the user does a 4k (say) send several large chunks get queued.
>>> These would need to all be sent at once.
>>>
>>> Similarly when the transmit window is received.
>>> So somewhere there ought to be a loop that will send more than one packet.
>> As far as I can see, no loop like you said, mostly, the incoming
>> chunk (like SACK) from peer will trigger the next flush out.
>> I can try to trace the path in kernel for sure tomorrow.
> okay, you are right, I missed sctp_packet_transmit_chunk also call
> sctp_packet_transmit to send the current packet. :)
>
> But if we keep sending data with "MSG_MORE", after one ethernet frame
> is sent, "followed by a second ethernet frame with 1 chunk in it" will NOT
> happen, as in this loop the asoc's msg_more flag is still set, and this flush
> is called by sctp_sendmsg(the function msg_more should care more).
>
> did I miss something ?
Hi, David Laight

As now sctp MSG_MORE is broken with msg->force_day, I hope we
can use this patch before you get a better one to fix it, so that users
would not be confused with the strange behaviour.

what do you say ?

>
>>
>>>
 > Now it might be that the flag needs clearing when retransmissions are 
 > queued.
 > OTOH they might get sent for other reasons.
>>>
 Before we really overthought about MSG_MORE, no need to care about
 retransmissions, define MSG_MORE, in my opinion, it works more for
 *inflight is 0*, if it's not 0, we shouldn't stop other places flushing 
 them.
>>>
>>> Eh? and when nagle disabled.
>>> If 'inflight' isn't 0 then most paths don't flush data.
>> I knew, but MSG_MORE is different thing, it should only try to work for the
>> current and following data.
>>
>>>
 We cannot let asoc's more_more flag work as global, it will block elsewhere
 sending data chunks, not only sctp_sendmsg.
>>>
>>> If the connection was flow 

Re: [PATCH net] sctp: change to save MSG_MORE flag into assoc

2017-03-03 Thread Xin Long
On Sat, Mar 4, 2017 at 1:57 AM, Xin Long  wrote:
> On Sat, Mar 4, 2017 at 12:31 AM, David Laight  wrote:
>> From: Xin Long
>>> Sent: 03 March 2017 15:43
>> ...
>>> > It is much more important to get MSG_MORE working 'properly' for SCTP
>>> > than for TCP. For TCP an application can always use a long send.
>>
>>> "long send" ?, you mean bigger data, or keeping sending?
>>> I didn't get the difference between SCTP and TCP, they
>>> are similar when sending data.
>>
>> With tcp an application can always replace two send()/write()
>> calls with a single call to writev().
>> For sctp two send() calls must be made in order to generate two
>> data chunks.
>> So it is much easier for a tcp application to generate 'full'
>> ethernet packets.
> okay, it should not be a important reason, and sctp might also support
> it one day. :-)
>
>>
>>>
>>> >
>>> > ...
>>> >> @@ -1982,6 +1982,7 @@ static int sctp_sendmsg(struct sock *sk, struct 
>>> >> msghdr *msg, size_t msg_len)
>>> >>* breaks.
>>> >>*/
>>> >>   err = sctp_primitive_SEND(net, asoc, datamsg);
>>> >> + asoc->force_delay = 0;
>>> >>   /* Did the lower layer accept the chunk? */
>>> >>   if (err) {
>>> >>   sctp_datamsg_free(datamsg);
>>> >
>>> > I don't think this is right - or needed.
>>> > You only get to the above if some test has decided to send data chunks.
>>> > So it just means that the NEXT time someone tries to send data all the
>>> > queued data gets sent.
>>
>>> the NEXT time someone tries to send data with "MSG_MORE clear",
>>> yes, but with "MSG_MORE set", it will still delay.
>>>
>>> > I'm guessing that the whole thing gets called in a loop (definitely needed
>>> > for very long data chunks, or after the window is opened).
>>
>>> yes, if users keep sending data chunks with MSG_MORE set, no
>>> data with "MSG_MORE clear" gap.
>>>
>>> > Now if an application sends a lot of (say) 100 byte chunks with MSG_MORE
>>> > set it would expect to see a lot of full ethernet frames be sent.
>>
>>> right.
>>
>>> > With the above a frame will be sent (containing all but 1 chunk) when the
>>> > amount of queued data becomes too large for an ethernet frame, and 
>>> > immediately
>>> > followed by a second ethernet frame with 1 chunk in it.
>>
>>> "followed by a second ethernet frame with 1 chunk in it.", I think this's
>>> what you're really worried about, right ?
>>> But sctp flush data queue NOT like what you think, it's not keep traversing
>>> the queue untill the queue is empty.
>>> once a packet with chunks in one ethernet frame is sent, sctp_outq_flush
>>> will return. it will pack chunks and send the next packet again untill some
>>> other 'event' triggers it, like retransmission or data received from peer.
>>> I don't think this is a problem.
>>
>> Erm that can't work.
>> I think there is code to convert a large user send into multiple data chunks.
>> So if the user does a 4k (say) send several large chunks get queued.
>> These would need to all be sent at once.
>>
>> Similarly when the transmit window is received.
>> So somewhere there ought to be a loop that will send more than one packet.
> As far as I can see, no loop like you said, mostly, the incoming
> chunk (like SACK) from peer will trigger the next flush out.
> I can try to trace the path in kernel for sure tomorrow.
okay, you are right, I missed sctp_packet_transmit_chunk also call
sctp_packet_transmit to send the current packet. :)

But if we keep sending data with "MSG_MORE", after one ethernet frame
is sent, "followed by a second ethernet frame with 1 chunk in it" will NOT
happen, as in this loop the asoc's msg_more flag is still set, and this flush
is called by sctp_sendmsg(the function msg_more should care more).

did I miss something ?

>
>>
>>> > Now it might be that the flag needs clearing when retransmissions are 
>>> > queued.
>>> > OTOH they might get sent for other reasons.
>>
>>> Before we really overthought about MSG_MORE, no need to care about
>>> retransmissions, define MSG_MORE, in my opinion, it works more for
>>> *inflight is 0*, if it's not 0, we shouldn't stop other places flushing 
>>> them.
>>
>> Eh? and when nagle disabled.
>> If 'inflight' isn't 0 then most paths don't flush data.
> I knew, but MSG_MORE is different thing, it should only try to work for the
> current and following data.
>
>>
>>> We cannot let asoc's more_more flag work as global, it will block elsewhere
>>> sending data chunks, not only sctp_sendmsg.
>>
>> If the connection was flow controlled off, and more 'credit' arrives and 
>> there
>> is less that an ethernet frame's worth of data pending, and the last send
>> said 'MSG_MORE' there is no point sending anything until the application
>> does a send with MSG_MORE clear.
> got you, I think you have different understanding about MSG_MORE
> while this patch just try to make it work like TCP's msg_more, but what
> you mentioned here is the same as TCP thing, 

Re: [PATCH net] sctp: change to save MSG_MORE flag into assoc

2017-03-03 Thread Xin Long
On Sat, Mar 4, 2017 at 12:31 AM, David Laight  wrote:
> From: Xin Long
>> Sent: 03 March 2017 15:43
> ...
>> > It is much more important to get MSG_MORE working 'properly' for SCTP
>> > than for TCP. For TCP an application can always use a long send.
>
>> "long send" ?, you mean bigger data, or keeping sending?
>> I didn't get the difference between SCTP and TCP, they
>> are similar when sending data.
>
> With tcp an application can always replace two send()/write()
> calls with a single call to writev().
> For sctp two send() calls must be made in order to generate two
> data chunks.
> So it is much easier for a tcp application to generate 'full'
> ethernet packets.
okay, it should not be a important reason, and sctp might also support
it one day. :-)

>
>>
>> >
>> > ...
>> >> @@ -1982,6 +1982,7 @@ static int sctp_sendmsg(struct sock *sk, struct 
>> >> msghdr *msg, size_t msg_len)
>> >>* breaks.
>> >>*/
>> >>   err = sctp_primitive_SEND(net, asoc, datamsg);
>> >> + asoc->force_delay = 0;
>> >>   /* Did the lower layer accept the chunk? */
>> >>   if (err) {
>> >>   sctp_datamsg_free(datamsg);
>> >
>> > I don't think this is right - or needed.
>> > You only get to the above if some test has decided to send data chunks.
>> > So it just means that the NEXT time someone tries to send data all the
>> > queued data gets sent.
>
>> the NEXT time someone tries to send data with "MSG_MORE clear",
>> yes, but with "MSG_MORE set", it will still delay.
>>
>> > I'm guessing that the whole thing gets called in a loop (definitely needed
>> > for very long data chunks, or after the window is opened).
>
>> yes, if users keep sending data chunks with MSG_MORE set, no
>> data with "MSG_MORE clear" gap.
>>
>> > Now if an application sends a lot of (say) 100 byte chunks with MSG_MORE
>> > set it would expect to see a lot of full ethernet frames be sent.
>
>> right.
>
>> > With the above a frame will be sent (containing all but 1 chunk) when the
>> > amount of queued data becomes too large for an ethernet frame, and 
>> > immediately
>> > followed by a second ethernet frame with 1 chunk in it.
>
>> "followed by a second ethernet frame with 1 chunk in it.", I think this's
>> what you're really worried about, right ?
>> But sctp flush data queue NOT like what you think, it's not keep traversing
>> the queue untill the queue is empty.
>> once a packet with chunks in one ethernet frame is sent, sctp_outq_flush
>> will return. it will pack chunks and send the next packet again untill some
>> other 'event' triggers it, like retransmission or data received from peer.
>> I don't think this is a problem.
>
> Erm that can't work.
> I think there is code to convert a large user send into multiple data chunks.
> So if the user does a 4k (say) send several large chunks get queued.
> These would need to all be sent at once.
>
> Similarly when the transmit window is received.
> So somewhere there ought to be a loop that will send more than one packet.
As far as I can see, no loop like you said, mostly, the incoming
chunk (like SACK) from peer will trigger the next flush out.
I can try to trace the path in kernel for sure tomorrow.

>
>> > Now it might be that the flag needs clearing when retransmissions are 
>> > queued.
>> > OTOH they might get sent for other reasons.
>
>> Before we really overthought about MSG_MORE, no need to care about
>> retransmissions, define MSG_MORE, in my opinion, it works more for
>> *inflight is 0*, if it's not 0, we shouldn't stop other places flushing them.
>
> Eh? and when nagle disabled.
> If 'inflight' isn't 0 then most paths don't flush data.
I knew, but MSG_MORE is different thing, it should only try to work for the
current and following data.

>
>> We cannot let asoc's more_more flag work as global, it will block elsewhere
>> sending data chunks, not only sctp_sendmsg.
>
> If the connection was flow controlled off, and more 'credit' arrives and there
> is less that an ethernet frame's worth of data pending, and the last send
> said 'MSG_MORE' there is no point sending anything until the application
> does a send with MSG_MORE clear.
got you, I think you have different understanding about MSG_MORE
while this patch just try to make it work like TCP's msg_more, but what
you mentioned here is the same as TCP thing, seems you also want
to improve TCP's MSG_MORE :-)

>
> I'm not sure what causes a retransmission to send data, I suspect that 
> 'inflight'
> can easily be non-zero at that time.
The thing that causes a retransmission to send data is that both tx and
rtx send data through sctp_outq_flush, in which it will try to send rtx queue,
then rx queue.

yes, once a packet is sent out and not yet be SACKed, "inflight" will not be
zero, so when retransmiting, "inflight" must be non-zero.

> Likely something causes a packet be generated - which then collects the data 
> chunks.
>
> David
>
>


RE: [PATCH net] sctp: change to save MSG_MORE flag into assoc

2017-03-03 Thread David Laight
From: Xin Long
> Sent: 03 March 2017 15:43
...
> > It is much more important to get MSG_MORE working 'properly' for SCTP
> > than for TCP. For TCP an application can always use a long send.

> "long send" ?, you mean bigger data, or keeping sending?
> I didn't get the difference between SCTP and TCP, they
> are similar when sending data.

With tcp an application can always replace two send()/write()
calls with a single call to writev().
For sctp two send() calls must be made in order to generate two
data chunks.
So it is much easier for a tcp application to generate 'full'
ethernet packets. 

> 
> >
> > ...
> >> @@ -1982,6 +1982,7 @@ static int sctp_sendmsg(struct sock *sk, struct 
> >> msghdr *msg, size_t msg_len)
> >>* breaks.
> >>*/
> >>   err = sctp_primitive_SEND(net, asoc, datamsg);
> >> + asoc->force_delay = 0;
> >>   /* Did the lower layer accept the chunk? */
> >>   if (err) {
> >>   sctp_datamsg_free(datamsg);
> >
> > I don't think this is right - or needed.
> > You only get to the above if some test has decided to send data chunks.
> > So it just means that the NEXT time someone tries to send data all the
> > queued data gets sent.

> the NEXT time someone tries to send data with "MSG_MORE clear",
> yes, but with "MSG_MORE set", it will still delay.
> 
> > I'm guessing that the whole thing gets called in a loop (definitely needed
> > for very long data chunks, or after the window is opened).

> yes, if users keep sending data chunks with MSG_MORE set, no
> data with "MSG_MORE clear" gap.
> 
> > Now if an application sends a lot of (say) 100 byte chunks with MSG_MORE
> > set it would expect to see a lot of full ethernet frames be sent.

> right.

> > With the above a frame will be sent (containing all but 1 chunk) when the
> > amount of queued data becomes too large for an ethernet frame, and 
> > immediately
> > followed by a second ethernet frame with 1 chunk in it.

> "followed by a second ethernet frame with 1 chunk in it.", I think this's
> what you're really worried about, right ?
> But sctp flush data queue NOT like what you think, it's not keep traversing
> the queue untill the queue is empty.
> once a packet with chunks in one ethernet frame is sent, sctp_outq_flush
> will return. it will pack chunks and send the next packet again untill some
> other 'event' triggers it, like retransmission or data received from peer.
> I don't think this is a problem.

Erm that can't work.
I think there is code to convert a large user send into multiple data chunks.
So if the user does a 4k (say) send several large chunks get queued.
These would need to all be sent at once.

Similarly when the transmit window is received.
So somewhere there ought to be a loop that will send more than one packet.

> > Now it might be that the flag needs clearing when retransmissions are 
> > queued.
> > OTOH they might get sent for other reasons.

> Before we really overthought about MSG_MORE, no need to care about
> retransmissions, define MSG_MORE, in my opinion, it works more for
> *inflight is 0*, if it's not 0, we shouldn't stop other places flushing them.

Eh? and when nagle disabled.
If 'inflight' isn't 0 then most paths don't flush data.

> We cannot let asoc's more_more flag work as global, it will block elsewhere
> sending data chunks, not only sctp_sendmsg.

If the connection was flow controlled off, and more 'credit' arrives and there
is less that an ethernet frame's worth of data pending, and the last send
said 'MSG_MORE' there is no point sending anything until the application
does a send with MSG_MORE clear.

I'm not sure what causes a retransmission to send data, I suspect that 
'inflight'
can easily be non-zero at that time.
Likely something causes a packet be generated - which then collects the data 
chunks.

David




Re: [PATCH net] sctp: change to save MSG_MORE flag into assoc

2017-03-03 Thread Xin Long
On Fri, Mar 3, 2017 at 8:49 PM, David Laight  wrote:
> From: Xin Long
>> Sent: 03 March 2017 06:24
>> David Laight noticed the support for MSG_MORE with datamsg->force_day
>> didn't really work as we expected, as the first msg with MSG_MORE set
>> would always block the following chunks' dequeuing.
>>
>> This Patch is to rewrite it by saving the MSG_MORE flag into assoc as
>> Divid Laight suggested.
>^ typo
ah, sorry. :P
>
>> asoc->force_delay is used to save MSG_MORE flag before a msg is sent.
>> Once this msg is queued, asoc->force_delay is set back to 0, so that
>> it will not affect other places flushing out queue.
>
> That doesn't seem right nor make sense.
>
>> asoc->force_delay works as a 'local param' here as the msg sending is
>> under protection of sock lock.  It would make sctp's MSG_MORE work as
>> tcp's.
>
> It is much more important to get MSG_MORE working 'properly' for SCTP
> than for TCP. For TCP an application can always use a long send.
"long send" ?, you mean bigger data, or keeping sending?
I didn't get the difference between SCTP and TCP, they
are similar when sending data.

>
> ...
>> @@ -1982,6 +1982,7 @@ static int sctp_sendmsg(struct sock *sk, struct msghdr 
>> *msg, size_t msg_len)
>>* breaks.
>>*/
>>   err = sctp_primitive_SEND(net, asoc, datamsg);
>> + asoc->force_delay = 0;
>>   /* Did the lower layer accept the chunk? */
>>   if (err) {
>>   sctp_datamsg_free(datamsg);
>
> I don't think this is right - or needed.
> You only get to the above if some test has decided to send data chunks.
> So it just means that the NEXT time someone tries to send data all the
> queued data gets sent.
the NEXT time someone tries to send data with "MSG_MORE clear",
yes, but with "MSG_MORE set", it will still delay.

> I'm guessing that the whole thing gets called in a loop (definitely needed
> for very long data chunks, or after the window is opened).
yes, if users keep sending data chunks with MSG_MORE set, no
data with "MSG_MORE clear" gap.

> Now if an application sends a lot of (say) 100 byte chunks with MSG_MORE
> set it would expect to see a lot of full ethernet frames be sent.
right.
> With the above a frame will be sent (containing all but 1 chunk) when the
> amount of queued data becomes too large for an ethernet frame, and immediately
> followed by a second ethernet frame with 1 chunk in it.
"followed by a second ethernet frame with 1 chunk in it.", I think this's
what you're really worried about, right ?
But sctp flush data queue NOT like what you think, it's not keep traversing
the queue untill the queue is empty.
once a packet with chunks in one ethernet frame is sent, sctp_outq_flush
will return. it will pack chunks and send the next packet again untill some
other 'event' triggers it, like retransmission or data received from peer.
I don't think this is a problem.

>
> Now it might be that the flag needs clearing when retransmissions are queued.
> OTOH they might get sent for other reasons.
Before we really overthought about MSG_MORE, no need to care about
retransmissions, define MSG_MORE, in my opinion, it works more for
*inflight is 0*, if it's not 0, we shouldn't stop other places flushing them.

We cannot let asoc's more_more flag work as global, it will block elsewhere
sending data chunks, not only sctp_sendmsg.

Thanks

>
> David
>
>


RE: [PATCH net] sctp: change to save MSG_MORE flag into assoc

2017-03-03 Thread David Laight
From: Xin Long
> Sent: 03 March 2017 06:24
> David Laight noticed the support for MSG_MORE with datamsg->force_day
> didn't really work as we expected, as the first msg with MSG_MORE set
> would always block the following chunks' dequeuing.
> 
> This Patch is to rewrite it by saving the MSG_MORE flag into assoc as
> Divid Laight suggested.
   ^ typo

> asoc->force_delay is used to save MSG_MORE flag before a msg is sent.
> Once this msg is queued, asoc->force_delay is set back to 0, so that
> it will not affect other places flushing out queue.

That doesn't seem right nor make sense.

> asoc->force_delay works as a 'local param' here as the msg sending is
> under protection of sock lock.  It would make sctp's MSG_MORE work as
> tcp's.

It is much more important to get MSG_MORE working 'properly' for SCTP
than for TCP. For TCP an application can always use a long send.

...
> @@ -1982,6 +1982,7 @@ static int sctp_sendmsg(struct sock *sk, struct msghdr 
> *msg, size_t msg_len)
>* breaks.
>*/
>   err = sctp_primitive_SEND(net, asoc, datamsg);
> + asoc->force_delay = 0;
>   /* Did the lower layer accept the chunk? */
>   if (err) {
>   sctp_datamsg_free(datamsg);

I don't think this is right - or needed.
You only get to the above if some test has decided to send data chunks.
So it just means that the NEXT time someone tries to send data all the
queued data gets sent.
I'm guessing that the whole thing gets called in a loop (definitely needed
for very long data chunks, or after the window is opened).
Now if an application sends a lot of (say) 100 byte chunks with MSG_MORE
set it would expect to see a lot of full ethernet frames be sent.
With the above a frame will be sent (containing all but 1 chunk) when the
amount of queued data becomes too large for an ethernet frame, and immediately
followed by a second ethernet frame with 1 chunk in it.

Now it might be that the flag needs clearing when retransmissions are queued.
OTOH they might get sent for other reasons.

David