Re: [tipc-discussion] [PATCH RFC 2/2] tipc: fix changeover issues due to large packet

2019-06-18 Thread Xue, Ying
Hi Tuong.

Thank you for your explanation . It makes sense to me. 
Please go ahead. 

Thanks,
Ying

-Original Message-
From: Tuong Lien Tong [mailto:tuong.t.l...@dektech.com.au] 
Sent: Monday, June 17, 2019 4:49 PM
To: Xue, Ying; tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com; 
ma...@donjonn.com
Subject: RE: [PATCH RFC 2/2] tipc: fix changeover issues due to large packet

Hi Ying,

Thanks for your comments!
Regarding your last statement, yes when making the patch, I noticed that the 
"tipc_msg_build()" and "tipc_msg_fragment()" do a similar task, also I tried to 
think a way to combine them but didn't because of the reasons:
1- The "core" functions to copy the data are different since the 
"tipc_msg_build()" plays with user data in the iov buffers, whereas, for the 
other, it's skb data.
Also, the outputs are different, the first function will set the messages' type 
in header such as "FIRST_FRAGMENT", "FRAGMENT" or "LAST_FRAGMENT", but not with 
the other because it will overwrite the tunnel messages' type... that I had to 
use the other field (fragm_no/nof_fragms) to determine this at the receiving 
side...
2- I don't want to touch the old code that can be risky :(

BR/Tuong

-Original Message-
From: Ying Xue  
Sent: Sunday, June 16, 2019 1:42 PM
To: Tuong Lien ; 
tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com; ma...@donjonn.com
Subject: Re: [PATCH RFC 2/2] tipc: fix changeover issues due to large packet

> 2) The same scenario above can happen more easily in case the MTU of
> the links is set differently or when changing. In that case, as long as
> a large message in the failure link's transmq queue was built and
> fragmented with its link's MTU > the other link's one, the issue will
> happen (there is no need of a link synching in advance).
> 
> 3) The link synching procedure also faces with the same issue but since
> the link synching is only started upon receipt of a SYNCH_MSG, dropping
> the message will not result in a state deadlock, but it is not expected
> as design.
> 
> The 1) & 3) issues are resolved by the previous commit 81e4dd94b214

This is the same as previous commit. The commit ID might be invalid
after it's merged into upstream.

> ("tipc: optimize link synching mechanism") by generating only a dummy
> SYNCH_MSG (i.e. without data) at the link synching, so the size of a
> FAILOVER_MSG if any then will never exceed the link's MTU.

>  /**
> + * tipc_msg_fragment - build a fragment skb list for TIPC message
> + *
> + * @skb: TIPC message skb
> + * @hdr: internal msg header to be put on the top of the fragments
> + * @pktmax: max size of a fragment incl. the header
> + * @frags: returned fragment skb list
> + *
> + * Returns 0 if the fragmentation is successful, otherwise: -EINVAL
> + * or -ENOMEM
> + */
> +int tipc_msg_fragment(struct sk_buff *skb, const struct tipc_msg *hdr,
> +   int pktmax, struct sk_buff_head *frags)
> +{
> + int pktno, nof_fragms, dsz, dmax, eat;
> + struct tipc_msg *_hdr;
> + struct sk_buff *_skb;
> + u8 *data;
> +
> + /* Non-linear buffer? */
> + if (skb_linearize(skb))
> + return -ENOMEM;
> +
> + data = (u8 *)skb->data;
> + dsz = msg_size(buf_msg(skb));
> + dmax = pktmax - INT_H_SIZE;
> +
> + if (dsz <= dmax || !dmax)
> + return -EINVAL;
> +
> + nof_fragms = dsz / dmax + 1;
> +
> + for (pktno = 1; pktno <= nof_fragms; pktno++) {
> + if (pktno < nof_fragms)
> + eat = dmax;
> + else
> + eat = dsz % dmax;
> +
> + _skb = tipc_buf_acquire(INT_H_SIZE + eat, GFP_ATOMIC);
> + if (!_skb)
> + goto error;
> +
> + skb_orphan(_skb);
> + __skb_queue_tail(frags, _skb);
> +
> + skb_copy_to_linear_data(_skb, hdr, INT_H_SIZE);
> + skb_copy_to_linear_data_offset(_skb, INT_H_SIZE, data, eat);
> + data += eat;
> +
> + _hdr = buf_msg(_skb);
> + msg_set_fragm_no(_hdr, pktno);
> + msg_set_nof_fragms(_hdr, nof_fragms);
> + msg_set_size(_hdr, INT_H_SIZE + eat);
> + }
> + return 0;
> +

In fact we have similar code in tipc_msg_build() where we also fragment
packet if necessary. In order to eliminate redundant code, I suggest we
should extract the common code into a separate function and then
tipc_msg_build() and tipc_msg_fragment() call it.



___
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion


Re: [tipc-discussion] [PATCH RFC 2/2] tipc: fix changeover issues due to large packet

2019-06-17 Thread Tuong Lien Tong
Hi Ying,

Thanks for your comments!
Regarding your last statement, yes when making the patch, I noticed that the 
"tipc_msg_build()" and "tipc_msg_fragment()" do a similar task, also I tried to 
think a way to combine them but didn't because of the reasons:
1- The "core" functions to copy the data are different since the 
"tipc_msg_build()" plays with user data in the iov buffers, whereas, for the 
other, it's skb data.
Also, the outputs are different, the first function will set the messages' type 
in header such as "FIRST_FRAGMENT", "FRAGMENT" or "LAST_FRAGMENT", but not with 
the other because it will overwrite the tunnel messages' type... that I had to 
use the other field (fragm_no/nof_fragms) to determine this at the receiving 
side...
2- I don't want to touch the old code that can be risky :(

BR/Tuong

-Original Message-
From: Ying Xue  
Sent: Sunday, June 16, 2019 1:42 PM
To: Tuong Lien ; 
tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com; ma...@donjonn.com
Subject: Re: [PATCH RFC 2/2] tipc: fix changeover issues due to large packet

> 2) The same scenario above can happen more easily in case the MTU of
> the links is set differently or when changing. In that case, as long as
> a large message in the failure link's transmq queue was built and
> fragmented with its link's MTU > the other link's one, the issue will
> happen (there is no need of a link synching in advance).
> 
> 3) The link synching procedure also faces with the same issue but since
> the link synching is only started upon receipt of a SYNCH_MSG, dropping
> the message will not result in a state deadlock, but it is not expected
> as design.
> 
> The 1) & 3) issues are resolved by the previous commit 81e4dd94b214

This is the same as previous commit. The commit ID might be invalid
after it's merged into upstream.

> ("tipc: optimize link synching mechanism") by generating only a dummy
> SYNCH_MSG (i.e. without data) at the link synching, so the size of a
> FAILOVER_MSG if any then will never exceed the link's MTU.

>  /**
> + * tipc_msg_fragment - build a fragment skb list for TIPC message
> + *
> + * @skb: TIPC message skb
> + * @hdr: internal msg header to be put on the top of the fragments
> + * @pktmax: max size of a fragment incl. the header
> + * @frags: returned fragment skb list
> + *
> + * Returns 0 if the fragmentation is successful, otherwise: -EINVAL
> + * or -ENOMEM
> + */
> +int tipc_msg_fragment(struct sk_buff *skb, const struct tipc_msg *hdr,
> +   int pktmax, struct sk_buff_head *frags)
> +{
> + int pktno, nof_fragms, dsz, dmax, eat;
> + struct tipc_msg *_hdr;
> + struct sk_buff *_skb;
> + u8 *data;
> +
> + /* Non-linear buffer? */
> + if (skb_linearize(skb))
> + return -ENOMEM;
> +
> + data = (u8 *)skb->data;
> + dsz = msg_size(buf_msg(skb));
> + dmax = pktmax - INT_H_SIZE;
> +
> + if (dsz <= dmax || !dmax)
> + return -EINVAL;
> +
> + nof_fragms = dsz / dmax + 1;
> +
> + for (pktno = 1; pktno <= nof_fragms; pktno++) {
> + if (pktno < nof_fragms)
> + eat = dmax;
> + else
> + eat = dsz % dmax;
> +
> + _skb = tipc_buf_acquire(INT_H_SIZE + eat, GFP_ATOMIC);
> + if (!_skb)
> + goto error;
> +
> + skb_orphan(_skb);
> + __skb_queue_tail(frags, _skb);
> +
> + skb_copy_to_linear_data(_skb, hdr, INT_H_SIZE);
> + skb_copy_to_linear_data_offset(_skb, INT_H_SIZE, data, eat);
> + data += eat;
> +
> + _hdr = buf_msg(_skb);
> + msg_set_fragm_no(_hdr, pktno);
> + msg_set_nof_fragms(_hdr, nof_fragms);
> + msg_set_size(_hdr, INT_H_SIZE + eat);
> + }
> + return 0;
> +

In fact we have similar code in tipc_msg_build() where we also fragment
packet if necessary. In order to eliminate redundant code, I suggest we
should extract the common code into a separate function and then
tipc_msg_build() and tipc_msg_fragment() call it.




___
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion


Re: [tipc-discussion] [PATCH RFC 2/2] tipc: fix changeover issues due to large packet

2019-06-16 Thread Ying Xue
> 2) The same scenario above can happen more easily in case the MTU of
> the links is set differently or when changing. In that case, as long as
> a large message in the failure link's transmq queue was built and
> fragmented with its link's MTU > the other link's one, the issue will
> happen (there is no need of a link synching in advance).
> 
> 3) The link synching procedure also faces with the same issue but since
> the link synching is only started upon receipt of a SYNCH_MSG, dropping
> the message will not result in a state deadlock, but it is not expected
> as design.
> 
> The 1) & 3) issues are resolved by the previous commit 81e4dd94b214

This is the same as previous commit. The commit ID might be invalid
after it's merged into upstream.

> ("tipc: optimize link synching mechanism") by generating only a dummy
> SYNCH_MSG (i.e. without data) at the link synching, so the size of a
> FAILOVER_MSG if any then will never exceed the link's MTU.

>  /**
> + * tipc_msg_fragment - build a fragment skb list for TIPC message
> + *
> + * @skb: TIPC message skb
> + * @hdr: internal msg header to be put on the top of the fragments
> + * @pktmax: max size of a fragment incl. the header
> + * @frags: returned fragment skb list
> + *
> + * Returns 0 if the fragmentation is successful, otherwise: -EINVAL
> + * or -ENOMEM
> + */
> +int tipc_msg_fragment(struct sk_buff *skb, const struct tipc_msg *hdr,
> +   int pktmax, struct sk_buff_head *frags)
> +{
> + int pktno, nof_fragms, dsz, dmax, eat;
> + struct tipc_msg *_hdr;
> + struct sk_buff *_skb;
> + u8 *data;
> +
> + /* Non-linear buffer? */
> + if (skb_linearize(skb))
> + return -ENOMEM;
> +
> + data = (u8 *)skb->data;
> + dsz = msg_size(buf_msg(skb));
> + dmax = pktmax - INT_H_SIZE;
> +
> + if (dsz <= dmax || !dmax)
> + return -EINVAL;
> +
> + nof_fragms = dsz / dmax + 1;
> +
> + for (pktno = 1; pktno <= nof_fragms; pktno++) {
> + if (pktno < nof_fragms)
> + eat = dmax;
> + else
> + eat = dsz % dmax;
> +
> + _skb = tipc_buf_acquire(INT_H_SIZE + eat, GFP_ATOMIC);
> + if (!_skb)
> + goto error;
> +
> + skb_orphan(_skb);
> + __skb_queue_tail(frags, _skb);
> +
> + skb_copy_to_linear_data(_skb, hdr, INT_H_SIZE);
> + skb_copy_to_linear_data_offset(_skb, INT_H_SIZE, data, eat);
> + data += eat;
> +
> + _hdr = buf_msg(_skb);
> + msg_set_fragm_no(_hdr, pktno);
> + msg_set_nof_fragms(_hdr, nof_fragms);
> + msg_set_size(_hdr, INT_H_SIZE + eat);
> + }
> + return 0;
> +

In fact we have similar code in tipc_msg_build() where we also fragment
packet if necessary. In order to eliminate redundant code, I suggest we
should extract the common code into a separate function and then
tipc_msg_build() and tipc_msg_fragment() call it.



___
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion


[tipc-discussion] [PATCH RFC 2/2] tipc: fix changeover issues due to large packet

2019-06-03 Thread Tuong Lien
In conjunction with changing the interfaces' MTU (e.g. especially in
the case of a bonding) where the TIPC links are brought up and down
in a short time, a couple of issues were detected with the current link
changeover mechanism:

1) When one link is up but immediately forced down again, the failover
procedure will be carried out in order to failover all the messages in
the link's transmq queue onto the other working link. The link and node
state is also set to FAILINGOVER as part of the process. The message
will be transmited in form of a FAILOVER_MSG, so its size is plus of 40
bytes (= the message header size). There is no problem if the original
message size is not larger than the link's MTU - 40, and indeed this is
the max size of a normal payload messages. However, in the situation
above, because the link has just been up, the messages in the link's
transmq are almost SYNCH_MSGs which had been generated by the link
synching procedure, then their size might reach the max value already!
When the FAILOVER_MSG is built on the top of such a SYNCH_MSG, its size
will exceed the link's MTU. As a result, the messages are dropped
silently and the failover procedure will never end up, the link will
not be able to exit the FAILINGOVER state, so cannot be re-established.

2) The same scenario above can happen more easily in case the MTU of
the links is set differently or when changing. In that case, as long as
a large message in the failure link's transmq queue was built and
fragmented with its link's MTU > the other link's one, the issue will
happen (there is no need of a link synching in advance).

3) The link synching procedure also faces with the same issue but since
the link synching is only started upon receipt of a SYNCH_MSG, dropping
the message will not result in a state deadlock, but it is not expected
as design.

The 1) & 3) issues are resolved by the previous commit 81e4dd94b214
("tipc: optimize link synching mechanism") by generating only a dummy
SYNCH_MSG (i.e. without data) at the link synching, so the size of a
FAILOVER_MSG if any then will never exceed the link's MTU.

For the 2) issue, the only solution is trying to fragment the messages
in the failure link's transmq queue according to the working link's MTU
so they can be failovered then. A new function is made to accomplish
this, it will still be a TUNNEL PROTOCOL/FAILOVER MSG but if the
original message size is too large, it will be fragmented & reassembled
at the receiving side.

Signed-off-by: Tuong Lien 
---
 net/tipc/link.c | 92 +
 net/tipc/msg.c  | 62 ++
 net/tipc/msg.h  | 18 ++-
 3 files changed, 158 insertions(+), 14 deletions(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index 6924cf1e526f..3d2d2e4e4f14 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -182,6 +182,7 @@ struct tipc_link {
 
/* Fragmentation/reassembly */
struct sk_buff *reasm_buf;
+   struct sk_buff *reasm_tnlmsg;
 
/* Broadcast */
u16 ackers;
@@ -899,8 +900,10 @@ void tipc_link_reset(struct tipc_link *l)
l->backlog[TIPC_CRITICAL_IMPORTANCE].len = 0;
l->backlog[TIPC_SYSTEM_IMPORTANCE].len = 0;
kfree_skb(l->reasm_buf);
+   kfree_skb(l->reasm_tnlmsg);
kfree_skb(l->failover_reasm_skb);
l->reasm_buf = NULL;
+   l->reasm_tnlmsg = NULL;
l->failover_reasm_skb = NULL;
l->rcv_unacked = 0;
l->snd_nxt = 1;
@@ -943,6 +946,9 @@ int tipc_link_xmit(struct tipc_link *l, struct sk_buff_head 
*list,
int rc = 0;
 
if (unlikely(msg_size(hdr) > mtu)) {
+   pr_warn("Purging list (len = %d), head msg <%d, %d>: %d\n",
+   skb_queue_len(list), msg_user(hdr),
+   msg_type(hdr), msg_size(hdr));
skb_queue_purge(list);
return -EMSGSIZE;
}
@@ -1212,6 +1218,7 @@ static int tipc_link_tnl_rcv(struct tipc_link *l, struct 
sk_buff *skb,
 struct sk_buff_head *inputq)
 {
struct sk_buff **reasm_skb = >failover_reasm_skb;
+   struct sk_buff **reasm_tnlmsg = >reasm_tnlmsg;
struct sk_buff_head *fdefq = >failover_deferdq;
struct tipc_msg *hdr = buf_msg(skb);
struct sk_buff *iskb;
@@ -1219,25 +1226,44 @@ static int tipc_link_tnl_rcv(struct tipc_link *l, 
struct sk_buff *skb,
int rc = 0;
u16 seqno;
 
-   /* SYNCH_MSG */
-   if (msg_type(hdr) == SYNCH_MSG)
-   goto drop;
+   if (msg_type(hdr) == SYNCH_MSG) {
+   kfree_skb(skb);
+   return 0;
+   }
 
-   /* FAILOVER_MSG */
-   if (!tipc_msg_extract(skb, , )) {
-   pr_warn_ratelimited("Cannot extract FAILOVER_MSG, defq: %d\n",
-   skb_queue_len(fdefq));
-   return rc;
+   /* No fragmentation? */
+   if (likely(!msg_nof_fragms(hdr))) {
+