-----Original Message-----
From: Jon Maloy <jma...@redhat.com>
Sent: Tuesday, May 5, 2020 1:13 AM
To: Tuong Tong Lien <tuong.t.l...@dektech.com.au>; ma...@donjonn.com; ying....@windriver.com; tipc-discussion@lists.sourceforge.net
Cc: tipc-dek <tipc-...@dektech.com.au>
Subject: Re: [RFC PATCH 2/2] tipc: add test for Nagle algorithm effectiveness
On 5/4/20 7:28 AM, Tuong Lien wrote:
> When streaming in Nagle mode, we try to bundle as many small user
> messages as possible if there is one outstanding buffer, i.e. one not
> yet ACK-ed by the receiving side, which helps boost the overall
> throughput. So, the algorithm's effectiveness really depends on when
> the Nagle ACK comes and on the specific network latency (RTT),
> compared to the user's message sending rate.
>
> In a bad case, when the user's sending rate is low or the network
> latency is small, there will not be many bundles, so making a Nagle
> ACK or waiting for it is not meaningful.
> For example: if a user sends a message every 100ms and the RTT is
> 50ms, then for each message we require one Nagle ACK, yet each time
> only one user message is sent, without any bundling.
>
> In a better case, even if we get a few bundles (e.g. with RTT =
> 300ms), if the user sends medium-sized messages there will be no
> difference at all; that is, 3 x 1000-byte data messages, even if
> bundled, will still result in 3 bundles with MTU = 1500.
>
> When Nagle is ineffective, the delay in user message sending is
> clearly wasted compared to sending directly.
>
> Besides, adding Nagle ACKs consumes some processor load on both the
> sending and receiving sides.
>
> This commit adds a test of the Nagle algorithm's effectiveness for an
> individual connection on the network where it actually runs.
> Particularly, upon receipt of a Nagle ACK we compare the number of
> bundles in the backlog queue to the number of user messages which
> would have been sent directly without Nagle. If the ratio is good
> (e.g. >= 2), Nagle mode is kept for further message sending.
> Otherwise, we leave Nagle and put a 'penalty' on the connection, so
> it will have to spend more 'one-way' messages before being able to
> re-enter Nagle mode.
>
> In addition, the 'ack-required' bit is now only set when really
> needed, so that the number of Nagle ACKs is reduced during Nagle
> mode.
>
> Testing with the benchmark tool showed that, with the patch, there
> was not much difference in throughput for small messages, since the
> tool continuously sends messages without a break, so Nagle still
> takes effect.
>
> Signed-off-by: Tuong Lien <tuong.t.l...@dektech.com.au>
> ---
>  net/tipc/msg.c    |  3 ---
>  net/tipc/msg.h    | 14 +++++++++++--
>  net/tipc/socket.c | 60 ++++++++++++++++++++++++++++++++++++++++++++-----------
>  3 files changed, 60 insertions(+), 17 deletions(-)
>
> diff --git a/net/tipc/msg.c b/net/tipc/msg.c
> index 69d68512300a..732cd95b5c75 100644
> --- a/net/tipc/msg.c
> +++ b/net/tipc/msg.c
> @@ -235,9 +235,6 @@ int tipc_msg_append(struct tipc_msg *_hdr, struct msghdr *m, int dlen,
>                          msg_set_size(hdr, MIN_H_SIZE);
>                          __skb_queue_tail(txq, skb);
>                          total += 1;
> -                        if (prev)
> -                                msg_set_ack_required(buf_msg(prev), 0);
> -                        msg_set_ack_required(hdr, 1);
>                  }
>                  hdr = buf_msg(skb);
>                  curr = msg_blocks(hdr);
> diff --git a/net/tipc/msg.h b/net/tipc/msg.h
> index 5f37a611e8c9..44543892af11 100644
> --- a/net/tipc/msg.h
> +++ b/net/tipc/msg.h
> @@ -340,9 +340,19 @@ static inline int msg_ack_required(struct tipc_msg *m)
>          return msg_bits(m, 0, 18, 1);
>  }
>
> -static inline void msg_set_ack_required(struct tipc_msg *m, u32 d)
> +static inline void msg_set_ack_required(struct tipc_msg *m)
>  {
> -        msg_set_bits(m, 0, 18, 1, d);
> +        msg_set_bits(m, 0, 18, 1, 1);
> +}
> +
> +static inline int msg_nagle_ack(struct tipc_msg *m)
> +{
> +        return msg_bits(m, 0, 18, 1);
> +}
> +
> +static inline void msg_set_nagle_ack(struct tipc_msg *m)
> +{
> +        msg_set_bits(m, 0, 18, 1, 1);
>  }
>
>  static inline bool msg_is_rcast(struct tipc_msg *m)
> diff --git a/net/tipc/socket.c b/net/tipc/socket.c
> index 4e71774528ad..93b0a6159cb1 100644
> --- a/net/tipc/socket.c
> +++ b/net/tipc/socket.c
> @@ -119,7 +119,10 @@ struct tipc_sock {
>          struct rcu_head rcu;
>          struct tipc_group *group;
>          u32 oneway;
> +        u32 nagle_start;
>          u16 snd_backlog;
> +        u16 msg_acc;
> +        u16 pkt_cnt;
>          bool expect_ack;
>          bool nodelay;
>          bool group_is_open;
> @@ -143,7 +146,7 @@ static int tipc_sk_insert(struct tipc_sock *tsk);
>  static void tipc_sk_remove(struct tipc_sock *tsk);
>  static int __tipc_sendstream(struct socket *sock, struct msghdr *m, size_t dsz);
>  static int __tipc_sendmsg(struct socket *sock, struct msghdr *m, size_t dsz);
> -static void tipc_sk_push_backlog(struct tipc_sock *tsk);
> +static void tipc_sk_push_backlog(struct tipc_sock *tsk, bool nagle_ack);
>
>  static const struct proto_ops packet_ops;
>  static const struct proto_ops stream_ops;
> @@ -474,6 +477,7 @@ static int tipc_sk_create(struct net *net, struct socket *sock,
>          tsk = tipc_sk(sk);
>          tsk->max_pkt = MAX_PKT_DEFAULT;
>          tsk->maxnagle = 0;
> +        tsk->nagle_start = 4;
>          INIT_LIST_HEAD(&tsk->publications);
>          INIT_LIST_HEAD(&tsk->cong_links);
>          msg = &tsk->phdr;
> @@ -541,7 +545,7 @@ static void __tipc_shutdown(struct socket *sock, int error)
>                                               !tsk_conn_cong(tsk)));
>
>          /* Push out delayed messages if in Nagle mode */
> -        tipc_sk_push_backlog(tsk);
> +        tipc_sk_push_backlog(tsk, false);
>          /* Remove pending SYN */
>          __skb_queue_purge(&sk->sk_write_queue);
>
> @@ -1252,14 +1256,37 @@ void tipc_sk_mcast_rcv(struct net *net, struct sk_buff_head *arrvq,
>  /* tipc_sk_push_backlog(): send accumulated buffers in socket write queue
>   * when socket is in Nagle mode
>   */
> -static void tipc_sk_push_backlog(struct tipc_sock *tsk)
> +static void tipc_sk_push_backlog(struct tipc_sock *tsk, bool nagle_ack)
>  {
>          struct sk_buff_head *txq = &tsk->sk.sk_write_queue;
> +        struct sk_buff *skb = skb_peek_tail(txq);
>          struct net *net = sock_net(&tsk->sk);
>          u32 dnode = tsk_peer_node(tsk);
> -        struct sk_buff *skb = skb_peek(txq);
>          int rc;
>
> +        if (nagle_ack) {
> +                tsk->pkt_cnt += skb_queue_len(txq);
> +                if (!tsk->pkt_cnt || tsk->msg_acc / tsk->pkt_cnt < 2) {
> +                        tsk->oneway = 0;
> +                        if (tsk->nagle_start < 1000)
> +                                tsk->nagle_start *= 2;
> +                        tsk->expect_ack = false;
> +                        pr_debug("tsk %10u: bad nagle %u -> %u, next start %u!\n",
> +                                 tsk->portid, tsk->msg_acc, tsk->pkt_cnt,
> +                                 tsk->nagle_start);
> +                } else {
> +                        tsk->nagle_start = 4;
> +                        if (skb) {
> +                                msg_set_ack_required(buf_msg(skb));
> +                                tsk->expect_ack = true;
> +                        } else {
> +                                tsk->expect_ack = false;
> +                        }
> +                }
> +                tsk->msg_acc = 0;
> +                tsk->pkt_cnt = 0;
> +        }
> +
>          if (!skb || tsk->cong_link_cnt)
>                  return;
>
> @@ -1267,9 +1294,10 @@ static void tipc_sk_push_backlog(struct tipc_sock *tsk)
>          if (msg_is_syn(buf_msg(skb)))
>                  return;
>
> +        if (tsk->msg_acc)
> +                tsk->pkt_cnt += skb_queue_len(txq);
>          tsk->snt_unacked += tsk->snd_backlog;
>          tsk->snd_backlog = 0;
> -        tsk->expect_ack = true;
>          rc = tipc_node_xmit(net, txq, dnode, tsk->portid);
>          if (rc == -ELINKCONG)
>                  tsk->cong_link_cnt = 1;
> @@ -1322,8 +1350,7 @@ static void tipc_sk_conn_proto_rcv(struct tipc_sock *tsk, struct sk_buff *skb,
>                  return;
>          } else if (mtyp == CONN_ACK) {
>                  was_cong = tsk_conn_cong(tsk);
> -                tsk->expect_ack = false;
> -                tipc_sk_push_backlog(tsk);
> +                tipc_sk_push_backlog(tsk, msg_nagle_ack(hdr));
>                  tsk->snt_unacked -= msg_conn_ack(hdr);
>                  if (tsk->peer_caps & TIPC_BLOCK_FLOWCTL)
>                          tsk->snd_win = msg_adv_win(hdr);
> @@ -1544,17 +1571,24 @@ static int __tipc_sendstream(struct socket *sock, struct msghdr *m, size_t dlen)
>                          break;
>                  send = min_t(size_t, dlen - sent, TIPC_MAX_USER_MSG_SIZE);
>                  blocks = tsk->snd_backlog;
> -                if (tsk->oneway++ >= 4 && send <= maxnagle) {
> +                if (tsk->oneway++ >= tsk->nagle_start && send <= maxnagle) {
>                          rc = tipc_msg_append(hdr, m, send, maxnagle, txq);
>                          if (unlikely(rc < 0))
>                                  break;
>                          blocks += rc;
> +                        tsk->msg_acc++;
>                          if (blocks <= 64 && tsk->expect_ack) {
>                                  tsk->snd_backlog = blocks;
>                                  sent += send;
>                                  break;
> +                        } else if (blocks > 64) {
> +                                tsk->pkt_cnt += skb_queue_len(txq);
> +                        } else {
> +                                msg_set_ack_required(buf_msg(skb_peek_tail(txq)));
> +                                tsk->expect_ack = true;
> +                                tsk->msg_acc = 0;
> +                                tsk->pkt_cnt = 0;
>                          }
> -                        tsk->expect_ack = true;
>                  } else {
>                          rc = tipc_msg_build(hdr, m, sent, send, maxpkt, txq);
>                          if (unlikely(rc != send))
> @@ -2092,7 +2126,7 @@ static void tipc_sk_proto_rcv(struct sock *sk,
>                  smp_wmb();
>                  tsk->cong_link_cnt--;
>                  wakeup = true;
> -                tipc_sk_push_backlog(tsk);
> +                tipc_sk_push_backlog(tsk, false);
>                  break;
>          case GROUP_PROTOCOL:
>                  tipc_group_proto_rcv(grp, &wakeup, hdr, inputq, xmitq);
> @@ -2181,7 +2215,7 @@ static bool tipc_sk_filter_connect(struct tipc_sock *tsk, struct sk_buff *skb,
>                  return false;
>          case TIPC_ESTABLISHED:
>                  if (!skb_queue_empty(&sk->sk_write_queue))
> -                        tipc_sk_push_backlog(tsk);
> +                        tipc_sk_push_backlog(tsk, false);
>                  /* Accept only connection-based messages sent by peer */
>                  if (likely(con_msg && !err && pport == oport &&
>                             pnode == onode)) {
> @@ -2189,8 +2223,10 @@ static bool tipc_sk_filter_connect(struct tipc_sock *tsk, struct sk_buff *skb,
>                          struct sk_buff *skb;
>
>                          skb = tipc_sk_build_ack(tsk);
> -                        if (skb)
> +                        if (skb) {
> +                                msg_set_nagle_ack(buf_msg(skb));
>                                  __skb_queue_tail(xmitq, skb);
> +                        }
>                  }
>                  return true;
>          }

Nice job. Does this even solve the latency problem you had observed at the customer site?

[Tuong]: Yes, after applying these two patches, they didn't observe the increase in latency or CPU load in their tests (even though Nagle was still enabled in some way).
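For anyone skimming the patch, the effectiveness check on receipt of a
Nagle ACK distils to roughly the following. This is a standalone sketch,
not kernel code: the struct and the main() driver here are illustrative,
and only the field names (msg_acc, pkt_cnt, nagle_start, oneway,
expect_ack) and the constants (ratio >= 2, start value 4, cap at 1000)
come from the patch itself.

#include <stdbool.h>
#include <stdio.h>

struct conn_state {
        unsigned int msg_acc;     /* user messages accepted into the backlog */
        unsigned int pkt_cnt;     /* packets actually queued for the wire */
        unsigned int nagle_start; /* one-way msgs needed to (re-)enter Nagle */
        unsigned int oneway;      /* msgs sent with no reverse traffic seen */
        bool expect_ack;          /* a Nagle ACK is outstanding */
};

/* On a Nagle ACK: keep Nagle only if, on average, at least two user
 * messages were squeezed into each packet; otherwise leave Nagle and
 * double the re-entry 'penalty', capped once it reaches 1000.
 */
static void nagle_ack_received(struct conn_state *c, unsigned int txq_len)
{
        c->pkt_cnt += txq_len;
        if (!c->pkt_cnt || c->msg_acc / c->pkt_cnt < 2) {
                c->oneway = 0;               /* bad ratio: leave Nagle mode */
                if (c->nagle_start < 1000)
                        c->nagle_start *= 2; /* exponential penalty */
                c->expect_ack = false;
        } else {
                c->nagle_start = 4;          /* good ratio: reset penalty */
                c->expect_ack = true;        /* and request the next ACK */
        }
        c->msg_acc = 0;
        c->pkt_cnt = 0;
}

int main(void)
{
        struct conn_state c = { .nagle_start = 4 };

        /* 10 user messages ended up in 2 packets: ratio 5 >= 2, keep Nagle. */
        c.msg_acc = 10;
        nagle_ack_received(&c, 2);
        printf("good ratio -> nagle_start = %u\n", c.nagle_start); /* 4 */

        /* 3 user messages in 3 packets: ratio 1 < 2, penalize. */
        c.msg_acc = 3;
        nagle_ack_received(&c, 3);
        printf("bad ratio  -> nagle_start = %u\n", c.nagle_start); /* 8 */
        return 0;
}

Under repeated bad verdicts the threshold grows 4, 8, 16, ... 512, 1024
and then stops doubling, so a connection that never bundles well stays
out of Nagle mode almost permanently.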
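The commit message's medium-size example can also be put in numbers.
Again just a sketch: header overhead is ignored, a 1500-byte MTU is
assumed, and the real code counts capacity in message blocks rather
than raw bytes.

#include <stdio.h>

int main(void)
{
        const unsigned int mtu = 1500;
        const unsigned int sizes[] = { 100, 500, 1000 };

        for (unsigned int i = 0; i < 3; i++) {
                /* How many user messages fit into one packet payload. */
                unsigned int per_pkt = mtu / sizes[i];

                printf("%4u-byte msgs: %2u per packet -> bundling %s\n",
                       sizes[i], per_pkt,
                       per_pkt >= 2 ? "worthwhile (ratio can reach >= 2)"
                                    : "pointless (ratio stuck at 1)");
        }
        return 0;
}

With 1000-byte messages each packet carries exactly one message, so the
msg_acc / pkt_cnt ratio can never reach 2 and the effectiveness check
will always drop the connection out of Nagle mode, which is the
intended behaviour.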
BR/Tuong

Acked-by: Jon Maloy <jma...@redhat.com>