Re: [tipc-discussion] [PATCH RFC 2/2] tipc: fix changeover issues due to large packet
Hi Tuong. Thank you for your explanation . It makes sense to me. Please go ahead. Thanks, Ying -Original Message- From: Tuong Lien Tong [mailto:tuong.t.l...@dektech.com.au] Sent: Monday, June 17, 2019 4:49 PM To: Xue, Ying; tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com; ma...@donjonn.com Subject: RE: [PATCH RFC 2/2] tipc: fix changeover issues due to large packet Hi Ying, Thanks for your comments! Regarding your last statement, yes when making the patch, I noticed that the "tipc_msg_build()" and "tipc_msg_fragment()" do a similar task, also I tried to think a way to combine them but didn't because of the reasons: 1- The "core" functions to copy the data are different since the "tipc_msg_build()" plays with user data in the iov buffers, whereas, for the other, it's skb data. Also, the outputs are different, the first function will set the messages' type in header such as "FIRST_FRAGMENT", "FRAGMENT" or "LAST_FRAGMENT", but not with the other because it will overwrite the tunnel messages' type... that I had to use the other field (fragm_no/nof_fragms) to determine this at the receiving side... 2- I don't want to touch the old code that can be risky :( BR/Tuong -Original Message- From: Ying Xue Sent: Sunday, June 16, 2019 1:42 PM To: Tuong Lien ; tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com; ma...@donjonn.com Subject: Re: [PATCH RFC 2/2] tipc: fix changeover issues due to large packet > 2) The same scenario above can happen more easily in case the MTU of > the links is set differently or when changing. In that case, as long as > a large message in the failure link's transmq queue was built and > fragmented with its link's MTU > the other link's one, the issue will > happen (there is no need of a link synching in advance). > > 3) The link synching procedure also faces with the same issue but since > the link synching is only started upon receipt of a SYNCH_MSG, dropping > the message will not result in a state deadlock, but it is not expected > as design. > > The 1) & 3) issues are resolved by the previous commit 81e4dd94b214 This is the same as previous commit. The commit ID might be invalid after it's merged into upstream. > ("tipc: optimize link synching mechanism") by generating only a dummy > SYNCH_MSG (i.e. without data) at the link synching, so the size of a > FAILOVER_MSG if any then will never exceed the link's MTU. > /** > + * tipc_msg_fragment - build a fragment skb list for TIPC message > + * > + * @skb: TIPC message skb > + * @hdr: internal msg header to be put on the top of the fragments > + * @pktmax: max size of a fragment incl. the header > + * @frags: returned fragment skb list > + * > + * Returns 0 if the fragmentation is successful, otherwise: -EINVAL > + * or -ENOMEM > + */ > +int tipc_msg_fragment(struct sk_buff *skb, const struct tipc_msg *hdr, > + int pktmax, struct sk_buff_head *frags) > +{ > + int pktno, nof_fragms, dsz, dmax, eat; > + struct tipc_msg *_hdr; > + struct sk_buff *_skb; > + u8 *data; > + > + /* Non-linear buffer? */ > + if (skb_linearize(skb)) > + return -ENOMEM; > + > + data = (u8 *)skb->data; > + dsz = msg_size(buf_msg(skb)); > + dmax = pktmax - INT_H_SIZE; > + > + if (dsz <= dmax || !dmax) > + return -EINVAL; > + > + nof_fragms = dsz / dmax + 1; > + > + for (pktno = 1; pktno <= nof_fragms; pktno++) { > + if (pktno < nof_fragms) > + eat = dmax; > + else > + eat = dsz % dmax; > + > + _skb = tipc_buf_acquire(INT_H_SIZE + eat, GFP_ATOMIC); > + if (!_skb) > + goto error; > + > + skb_orphan(_skb); > + __skb_queue_tail(frags, _skb); > + > + skb_copy_to_linear_data(_skb, hdr, INT_H_SIZE); > + skb_copy_to_linear_data_offset(_skb, INT_H_SIZE, data, eat); > + data += eat; > + > + _hdr = buf_msg(_skb); > + msg_set_fragm_no(_hdr, pktno); > + msg_set_nof_fragms(_hdr, nof_fragms); > + msg_set_size(_hdr, INT_H_SIZE + eat); > + } > + return 0; > + In fact we have similar code in tipc_msg_build() where we also fragment packet if necessary. In order to eliminate redundant code, I suggest we should extract the common code into a separate function and then tipc_msg_build() and tipc_msg_fragment() call it. ___ tipc-discussion mailing list tipc-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/tipc-discussion
Re: [tipc-discussion] [PATCH RFC 2/2] tipc: fix changeover issues due to large packet
Hi Ying, Thanks for your comments! Regarding your last statement, yes when making the patch, I noticed that the "tipc_msg_build()" and "tipc_msg_fragment()" do a similar task, also I tried to think a way to combine them but didn't because of the reasons: 1- The "core" functions to copy the data are different since the "tipc_msg_build()" plays with user data in the iov buffers, whereas, for the other, it's skb data. Also, the outputs are different, the first function will set the messages' type in header such as "FIRST_FRAGMENT", "FRAGMENT" or "LAST_FRAGMENT", but not with the other because it will overwrite the tunnel messages' type... that I had to use the other field (fragm_no/nof_fragms) to determine this at the receiving side... 2- I don't want to touch the old code that can be risky :( BR/Tuong -Original Message- From: Ying Xue Sent: Sunday, June 16, 2019 1:42 PM To: Tuong Lien ; tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com; ma...@donjonn.com Subject: Re: [PATCH RFC 2/2] tipc: fix changeover issues due to large packet > 2) The same scenario above can happen more easily in case the MTU of > the links is set differently or when changing. In that case, as long as > a large message in the failure link's transmq queue was built and > fragmented with its link's MTU > the other link's one, the issue will > happen (there is no need of a link synching in advance). > > 3) The link synching procedure also faces with the same issue but since > the link synching is only started upon receipt of a SYNCH_MSG, dropping > the message will not result in a state deadlock, but it is not expected > as design. > > The 1) & 3) issues are resolved by the previous commit 81e4dd94b214 This is the same as previous commit. The commit ID might be invalid after it's merged into upstream. > ("tipc: optimize link synching mechanism") by generating only a dummy > SYNCH_MSG (i.e. without data) at the link synching, so the size of a > FAILOVER_MSG if any then will never exceed the link's MTU. > /** > + * tipc_msg_fragment - build a fragment skb list for TIPC message > + * > + * @skb: TIPC message skb > + * @hdr: internal msg header to be put on the top of the fragments > + * @pktmax: max size of a fragment incl. the header > + * @frags: returned fragment skb list > + * > + * Returns 0 if the fragmentation is successful, otherwise: -EINVAL > + * or -ENOMEM > + */ > +int tipc_msg_fragment(struct sk_buff *skb, const struct tipc_msg *hdr, > + int pktmax, struct sk_buff_head *frags) > +{ > + int pktno, nof_fragms, dsz, dmax, eat; > + struct tipc_msg *_hdr; > + struct sk_buff *_skb; > + u8 *data; > + > + /* Non-linear buffer? */ > + if (skb_linearize(skb)) > + return -ENOMEM; > + > + data = (u8 *)skb->data; > + dsz = msg_size(buf_msg(skb)); > + dmax = pktmax - INT_H_SIZE; > + > + if (dsz <= dmax || !dmax) > + return -EINVAL; > + > + nof_fragms = dsz / dmax + 1; > + > + for (pktno = 1; pktno <= nof_fragms; pktno++) { > + if (pktno < nof_fragms) > + eat = dmax; > + else > + eat = dsz % dmax; > + > + _skb = tipc_buf_acquire(INT_H_SIZE + eat, GFP_ATOMIC); > + if (!_skb) > + goto error; > + > + skb_orphan(_skb); > + __skb_queue_tail(frags, _skb); > + > + skb_copy_to_linear_data(_skb, hdr, INT_H_SIZE); > + skb_copy_to_linear_data_offset(_skb, INT_H_SIZE, data, eat); > + data += eat; > + > + _hdr = buf_msg(_skb); > + msg_set_fragm_no(_hdr, pktno); > + msg_set_nof_fragms(_hdr, nof_fragms); > + msg_set_size(_hdr, INT_H_SIZE + eat); > + } > + return 0; > + In fact we have similar code in tipc_msg_build() where we also fragment packet if necessary. In order to eliminate redundant code, I suggest we should extract the common code into a separate function and then tipc_msg_build() and tipc_msg_fragment() call it. ___ tipc-discussion mailing list tipc-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/tipc-discussion
Re: [tipc-discussion] [PATCH RFC 2/2] tipc: fix changeover issues due to large packet
> 2) The same scenario above can happen more easily in case the MTU of > the links is set differently or when changing. In that case, as long as > a large message in the failure link's transmq queue was built and > fragmented with its link's MTU > the other link's one, the issue will > happen (there is no need of a link synching in advance). > > 3) The link synching procedure also faces with the same issue but since > the link synching is only started upon receipt of a SYNCH_MSG, dropping > the message will not result in a state deadlock, but it is not expected > as design. > > The 1) & 3) issues are resolved by the previous commit 81e4dd94b214 This is the same as previous commit. The commit ID might be invalid after it's merged into upstream. > ("tipc: optimize link synching mechanism") by generating only a dummy > SYNCH_MSG (i.e. without data) at the link synching, so the size of a > FAILOVER_MSG if any then will never exceed the link's MTU. > /** > + * tipc_msg_fragment - build a fragment skb list for TIPC message > + * > + * @skb: TIPC message skb > + * @hdr: internal msg header to be put on the top of the fragments > + * @pktmax: max size of a fragment incl. the header > + * @frags: returned fragment skb list > + * > + * Returns 0 if the fragmentation is successful, otherwise: -EINVAL > + * or -ENOMEM > + */ > +int tipc_msg_fragment(struct sk_buff *skb, const struct tipc_msg *hdr, > + int pktmax, struct sk_buff_head *frags) > +{ > + int pktno, nof_fragms, dsz, dmax, eat; > + struct tipc_msg *_hdr; > + struct sk_buff *_skb; > + u8 *data; > + > + /* Non-linear buffer? */ > + if (skb_linearize(skb)) > + return -ENOMEM; > + > + data = (u8 *)skb->data; > + dsz = msg_size(buf_msg(skb)); > + dmax = pktmax - INT_H_SIZE; > + > + if (dsz <= dmax || !dmax) > + return -EINVAL; > + > + nof_fragms = dsz / dmax + 1; > + > + for (pktno = 1; pktno <= nof_fragms; pktno++) { > + if (pktno < nof_fragms) > + eat = dmax; > + else > + eat = dsz % dmax; > + > + _skb = tipc_buf_acquire(INT_H_SIZE + eat, GFP_ATOMIC); > + if (!_skb) > + goto error; > + > + skb_orphan(_skb); > + __skb_queue_tail(frags, _skb); > + > + skb_copy_to_linear_data(_skb, hdr, INT_H_SIZE); > + skb_copy_to_linear_data_offset(_skb, INT_H_SIZE, data, eat); > + data += eat; > + > + _hdr = buf_msg(_skb); > + msg_set_fragm_no(_hdr, pktno); > + msg_set_nof_fragms(_hdr, nof_fragms); > + msg_set_size(_hdr, INT_H_SIZE + eat); > + } > + return 0; > + In fact we have similar code in tipc_msg_build() where we also fragment packet if necessary. In order to eliminate redundant code, I suggest we should extract the common code into a separate function and then tipc_msg_build() and tipc_msg_fragment() call it. ___ tipc-discussion mailing list tipc-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/tipc-discussion
[tipc-discussion] [PATCH RFC 2/2] tipc: fix changeover issues due to large packet
In conjunction with changing the interfaces' MTU (e.g. especially in the case of a bonding) where the TIPC links are brought up and down in a short time, a couple of issues were detected with the current link changeover mechanism: 1) When one link is up but immediately forced down again, the failover procedure will be carried out in order to failover all the messages in the link's transmq queue onto the other working link. The link and node state is also set to FAILINGOVER as part of the process. The message will be transmited in form of a FAILOVER_MSG, so its size is plus of 40 bytes (= the message header size). There is no problem if the original message size is not larger than the link's MTU - 40, and indeed this is the max size of a normal payload messages. However, in the situation above, because the link has just been up, the messages in the link's transmq are almost SYNCH_MSGs which had been generated by the link synching procedure, then their size might reach the max value already! When the FAILOVER_MSG is built on the top of such a SYNCH_MSG, its size will exceed the link's MTU. As a result, the messages are dropped silently and the failover procedure will never end up, the link will not be able to exit the FAILINGOVER state, so cannot be re-established. 2) The same scenario above can happen more easily in case the MTU of the links is set differently or when changing. In that case, as long as a large message in the failure link's transmq queue was built and fragmented with its link's MTU > the other link's one, the issue will happen (there is no need of a link synching in advance). 3) The link synching procedure also faces with the same issue but since the link synching is only started upon receipt of a SYNCH_MSG, dropping the message will not result in a state deadlock, but it is not expected as design. The 1) & 3) issues are resolved by the previous commit 81e4dd94b214 ("tipc: optimize link synching mechanism") by generating only a dummy SYNCH_MSG (i.e. without data) at the link synching, so the size of a FAILOVER_MSG if any then will never exceed the link's MTU. For the 2) issue, the only solution is trying to fragment the messages in the failure link's transmq queue according to the working link's MTU so they can be failovered then. A new function is made to accomplish this, it will still be a TUNNEL PROTOCOL/FAILOVER MSG but if the original message size is too large, it will be fragmented & reassembled at the receiving side. Signed-off-by: Tuong Lien --- net/tipc/link.c | 92 + net/tipc/msg.c | 62 ++ net/tipc/msg.h | 18 ++- 3 files changed, 158 insertions(+), 14 deletions(-) diff --git a/net/tipc/link.c b/net/tipc/link.c index 6924cf1e526f..3d2d2e4e4f14 100644 --- a/net/tipc/link.c +++ b/net/tipc/link.c @@ -182,6 +182,7 @@ struct tipc_link { /* Fragmentation/reassembly */ struct sk_buff *reasm_buf; + struct sk_buff *reasm_tnlmsg; /* Broadcast */ u16 ackers; @@ -899,8 +900,10 @@ void tipc_link_reset(struct tipc_link *l) l->backlog[TIPC_CRITICAL_IMPORTANCE].len = 0; l->backlog[TIPC_SYSTEM_IMPORTANCE].len = 0; kfree_skb(l->reasm_buf); + kfree_skb(l->reasm_tnlmsg); kfree_skb(l->failover_reasm_skb); l->reasm_buf = NULL; + l->reasm_tnlmsg = NULL; l->failover_reasm_skb = NULL; l->rcv_unacked = 0; l->snd_nxt = 1; @@ -943,6 +946,9 @@ int tipc_link_xmit(struct tipc_link *l, struct sk_buff_head *list, int rc = 0; if (unlikely(msg_size(hdr) > mtu)) { + pr_warn("Purging list (len = %d), head msg <%d, %d>: %d\n", + skb_queue_len(list), msg_user(hdr), + msg_type(hdr), msg_size(hdr)); skb_queue_purge(list); return -EMSGSIZE; } @@ -1212,6 +1218,7 @@ static int tipc_link_tnl_rcv(struct tipc_link *l, struct sk_buff *skb, struct sk_buff_head *inputq) { struct sk_buff **reasm_skb = >failover_reasm_skb; + struct sk_buff **reasm_tnlmsg = >reasm_tnlmsg; struct sk_buff_head *fdefq = >failover_deferdq; struct tipc_msg *hdr = buf_msg(skb); struct sk_buff *iskb; @@ -1219,25 +1226,44 @@ static int tipc_link_tnl_rcv(struct tipc_link *l, struct sk_buff *skb, int rc = 0; u16 seqno; - /* SYNCH_MSG */ - if (msg_type(hdr) == SYNCH_MSG) - goto drop; + if (msg_type(hdr) == SYNCH_MSG) { + kfree_skb(skb); + return 0; + } - /* FAILOVER_MSG */ - if (!tipc_msg_extract(skb, , )) { - pr_warn_ratelimited("Cannot extract FAILOVER_MSG, defq: %d\n", - skb_queue_len(fdefq)); - return rc; + /* No fragmentation? */ + if (likely(!msg_nof_fragms(hdr))) { +