> -----Original Message-----
> From: Xue, Ying [mailto:ying....@windriver.com]
> Sent: Tuesday, 23 August, 2016 05:48
> To: Jon Maloy <jon.ma...@ericsson.com>; tipc-discussion@lists.sourceforge.net;
> Parthasarathy Bhuvaragan <parthasarathy.bhuvara...@ericsson.com>; Richard
> Alpe <richard.a...@ericsson.com>
> Cc: ma...@donjonn.com; gbala...@gmail.com
> Subject: RE: [PATCH net-next 1/3] tipc: transfer broadcast nacks in link state
> messages
> 
> -----Original Message-----
> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> Sent: Wednesday, August 17, 2016 2:09 AM
> To: tipc-discussion@lists.sourceforge.net;
> parthasarathy.bhuvara...@ericsson.com; Xue, Ying; richard.a...@ericsson.com;
> jon.ma...@ericsson.com
> Cc: ma...@donjonn.com; gbala...@gmail.com
> Subject: [PATCH net-next 1/3] tipc: transfer broadcast nacks in link state
> messages
> 
> When we send broadcasts in clusters of more 70-80 nodes, we sometimes see
> the broadcast link resetting because of an excessive number of 
> retransmissions.
> This is caused by a combination of two factors:
> 
> 1) A 'NACK crunch", where loss of broadcast packets is discovered
>    and NACK'ed by several nodes simultaneously, leading to multiple
>    redundant broadcast retransmissions.
> 
> 2) The fact that the NACKS as such also are sent as broadcast, leading
>    to excessive load and packet loss on the transmitting switch/bridge.
> 
> This commit deals with the latter problem, by moving sending of broadcast 
> nacks
> from the dedicated BCAST_PROTOCOL/NACK message type to regular unicast
> LINK_PROTOCOL/STATE messages. We allocate 10 unused bits in word 8 of the
> said message for this purpose, and introduce a new capability bit,
> TIPC_BCAST_STATE_NACK in order to keep the change backwards compatible.
> 
> [Ying] Using unicast to send NACK message is much better than bcast.
> 
> 
> /* tipc_link_bc_sync_rcv - update rcv link according to peer's send state
>   */
> -void tipc_link_bc_sync_rcv(struct tipc_link *l, struct tipc_msg *hdr,
> -                        struct sk_buff_head *xmitq)
> +int tipc_link_bc_sync_rcv(struct tipc_link *l, struct tipc_msg *hdr,
> +                       struct sk_buff_head *xmitq)
>  {
>       u16 peers_snd_nxt = msg_bc_snd_nxt(hdr);
> +     u16 from = msg_bcast_ack(hdr) + 1;
> +     u16 to = from + msg_bc_gap(hdr) - 1;
> +     int rc = 0;
> 
>       if (!link_is_up(l))
> -             return;
> +             return rc;
> 
>       if (!msg_peer_node_is_up(hdr))
> -             return;
> +             return rc;
> 
>       /* Open when peer ackowledges our bcast init msg (pkt #1) */
>       if (msg_ack(hdr))
>               l->bc_peer_is_up = true;
> 
>       if (!l->bc_peer_is_up)
> -             return;
> +             return rc;
> 
>       /* Ignore if peers_snd_nxt goes beyond receive window */
>       if (more(peers_snd_nxt, l->rcv_nxt + l->window))
> -             return;
> +             return rc;
> +
> +     if (!less(to, from)) {
> +             rc = tipc_link_retrans(l->bc_sndlink, from, to, xmitq);
> +             l->stats.recv_nacks++;
> +     }
> +
> +     l->snd_nxt = peers_snd_nxt;
> +     if (link_bc_rcv_gap(l))
> +             rc |= TIPC_LINK_SND_STATE;
> +
> +     /* Return now if sender supports nack via STATE messages */
> +     if (l->peer_caps & TIPC_BCAST_STATE_NACK)
> +             return rc;
> +
> 
> [Ying] If peer nodes doesn't support the new NACK way, it seems we not only
> retransmit the missed messages through tipc_link_retrans(), but also we send a
> NACK message created with tipc_link_build_bc_proto_msg(). But in the old
> mode, we don't need to retransmit the missed messages at the moment. Do you
> think this different behavior is right?

In the old code link_bc_sync_rcv() was dealing only with one task: to let the 
receiving endpoint discover gaps in the packet sequence, and if necessary send 
NACKs according to certain rules.

In the new code this function has two tasks: 1) the receiving endpoint still 
discovers gaps, but now sends NACKs unconditionally (in STATE messages).  2) 
The sending endpoint registers reported gaps in msg_bc_gap() (i.e., a received 
NACKs), and performs retransmission as requested.

Now, if the receiving node supports the new algorithm and the sending doesn't, 
the receiver will produce an 'old' NACK message and send it out, but not a 
'new' one. Neither will it retransmit anything, both because there probably is 
nothing to retransmit (it is the receiver), and because msg_bc_gap() can only 
be zero,  so 'to' will be less than 'from', and nothing will ever be 
retransmitted.

If it is the other way around, the sender supports the new algorithm and the 
receiver doesn't, the receiver can only produce an 'old' NACK, and the sender 
will have to retransmit in the (now otherwise unused) tipc_link_bc_nack_rcv() 
function. Once again, msg_bc_gap() can only be zero, so nothing will be 
retransmitted in link_bc_sync_rcv().

So, from an 'old' node's viewpoint there is no changed behavior, and everything 
is backwards compatible.

BR
///jon

> 
> Thanks,
> Ying
> 
> +     /* Otherwise, be backwards compatible */
> 
>       if (!more(peers_snd_nxt, l->rcv_nxt)) {
>               l->nack_state = BC_NACK_SND_CONDITIONAL;
> -             return;
> +             return 0;
>       }
> 
>       /* Don't NACK if one was recently sent or peeked */
>       if (l->nack_state == BC_NACK_SND_SUPPRESS) {
>               l->nack_state = BC_NACK_SND_UNCONDITIONAL;
> -             return;
> +             return 0;
>       }
> 
>       /* Conditionally delay NACK sending until next synch rcv */
>       if (l->nack_state == BC_NACK_SND_CONDITIONAL) {
>               l->nack_state = BC_NACK_SND_UNCONDITIONAL;
>               if ((peers_snd_nxt - l->rcv_nxt) < TIPC_MIN_LINK_WIN)
> -                     return;
> +                     return 0;
>       }
> 
>       /* Send NACK now but suppress next one */
>       tipc_link_build_bc_proto_msg(l, true, peers_snd_nxt, xmitq);
>       l->nack_state = BC_NACK_SND_SUPPRESS;
> +     return 0;
>  }

------------------------------------------------------------------------------
_______________________________________________
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion

Reply via email to