Re: [tipc-discussion] [PATCH RFC 1/2] tipc: add Gap ACK blocks support for broadcast link

2020-03-18 Thread Tuong Lien Tong
Report #30 from :2@0/0:3415801530@1001003: Recv 0 [0,0] (UC 0, AC 0, MC 0, 
BC 0) OK

Report #31 from :2@0/0:2647656161@1001003: Recv 0 [0,0] (UC 0, AC 0, MC 0, 
BC 0) OK

Report #32 from :2@0/0:0628455908@1001003: Recv 0 [0,0] (UC 0, AC 0, MC 0, 
BC 0) OK

>> Multicast Test SUCCESSFUL

 

>> Starting Broadcast Test

Commander: Scaling out to 1 Workers with Id 0/1

Commander: Received 1 UP Events for Member Id 0

Commander: Scaling out to 16 Workers with Id 1/1

Commander: Received 16 UP Events for Member Id 1

Commander: Scaling out to 16 Workers with Id 2/2

Commander: Received 16 UP Events for Member Id 2

:1@0/0:0695351808@1001002: Sent UC 0, AC 0, MC 0, BC 44, throughput last 
intv 5 Mb/s

:1@0/0:0695351808@1001002: Sent UC 0, AC 0, MC 0, BC 88, throughput last 
intv 5 Mb/s

:1@0/0:0695351808@1001002: Sent UC 0, AC 0, MC 0, BC 115, throughput last 
intv 2 Mb/s

:1@0/0:0695351808@1001002: Sent UC 0, AC 0, MC 0, BC 149, throughput last 
intv 4 Mb/s

Commander: Scaling in to 0 Workers with Cmd Member Id 1

Commander: Scaling in to 0 Workers with Cmd Member Id 2

Commander: Scaling in to 0 Workers with Cmd Member Id 0

Report #0  from :1@0/0:0695351808@1001002: Sent 183 [0,182] (UC 0, AC 0, MC 
0, BC 183) OK

Report #1  from :1@0/0:2378798558@1001003: Recv 180 [1,180] (UC 0, AC 0, MC 
0, BC 180) OK

Report #2  from :2@0/0:4138497570@1001004: Recv 180 [1,180] (UC 0, AC 0, MC 
0, BC 180) OK

Report #3  from :2@0/0:1578004322@1001004: Recv 180 [1,180] (UC 0, AC 0, MC 
0, BC 180) OK

Report #4  from :2@0/0:1963550269@1001003: Recv 180 [1,180] (UC 0, AC 0, MC 
0, BC 180) OK

Report #5  from :1@0/0:0758742723@1001003: Recv 180 [1,180] (UC 0, AC 0, MC 
0, BC 180) OK

Report #6  from :2@0/0:2249538355@1001003: Recv 180 [1,180] (UC 0, AC 0, MC 
0, BC 180) OK

Report #7  from :1@0/0:2618664889@1001004: Recv 181 [0,180] (UC 0, AC 0, MC 
0, BC 181) OK

Report #8  from :2@0/0:0089095375@1001003: Recv 180 [1,180] (UC 0, AC 0, MC 
0, BC 180) OK

Report #9  from :2@0/0:2509170385@1001003: Recv 180 [1,180] (UC 0, AC 0, MC 
0, BC 180) OK

Report #10 from :1@0/0:2696073811@1001004: Recv 180 [1,180] (UC 0, AC 0, MC 
0, BC 180) OK

Report #11 from :1@0/0:3980318671@1001004: Recv 180 [1,180] (UC 0, AC 0, MC 
0, BC 180) OK

Report #12 from :1@0/0:0106403114@1001004: Recv 180 [1,180] (UC 0, AC 0, MC 
0, BC 180) OK

Report #13 from :2@0/0:3523511402@1001003: Recv 180 [1,180] (UC 0, AC 0, MC 
0, BC 180) OK

Report #14 from :2@0/0:1617008759@1001004: Recv 180 [1,180] (UC 0, AC 0, MC 
0, BC 180) OK

Report #15 from :1@0/0:2862823240@1001004: Recv 180 [1,180] (UC 0, AC 0, MC 
0, BC 180) OK

Report #16 from :1@0/0:2426779766@1001003: Recv 180 [1,180] (UC 0, AC 0, MC 
0, BC 180) OK

Report #17 from :2@0/0:1044722113@1001004: Recv 180 [1,180] (UC 0, AC 0, MC 
0, BC 180) OK

Report #18 from :2@0/0:2464695570@1001004: Recv 180 [1,180] (UC 0, AC 0, MC 
0, BC 180) OK

Report #19 from :2@0/0:0091087500@1001003: Recv 180 [1,180] (UC 0, AC 0, MC 
0, BC 180) OK

Report #20 from :2@0/0:3515828561@1001004: Recv 180 [1,180] (UC 0, AC 0, MC 
0, BC 180) OK

Report #21 from :2@0/0:3633045121@1001004: Recv 180 [1,180] (UC 0, AC 0, MC 
0, BC 180) OK

Report #22 from :1@0/0:1262462104@1001004: Recv 181 [0,180] (UC 0, AC 0, MC 
0, BC 181) OK

Report #23 from :1@0/0:3238164678@1001004: Recv 180 [1,180] (UC 0, AC 0, MC 
0, BC 180) OK

Report #24 from :2@0/0:3734069867@1001003: Recv 180 [1,180] (UC 0, AC 0, MC 
0, BC 180) OK

Report #25 from :2@0/0:2038920765@1001004: Recv 180 [1,180] (UC 0, AC 0, MC 
0, BC 180) OK

Report #26 from :1@0/0:2642249832@1001004: Recv 180 [1,180] (UC 0, AC 0, MC 
0, BC 180) OK

Report #27 from :1@0/0:0375382704@1001003: Recv 180 [1,180] (UC 0, AC 0, MC 
0, BC 180) OK

Report #28 from :1@0/0:0878979536@1001003: Recv 181 [0,180] (UC 0, AC 0, MC 
0, BC 181) OK

Report #29 from :1@0/0:3519740202@1001003: Recv 180 [1,180] (UC 0, AC 0, MC 
0, BC 180) OK

Report #30 from :1@0/0:2767769890@1001003: Recv 181 [0,180] (UC 0, AC 0, MC 
0, BC 181) OK

Report #31 from :1@0/0:3714610817@1001003: Recv 180 [1,180] (UC 0, AC 0, MC 
0, BC 180) OK

Report #32 from :2@0/0:0482654234@1001003: Recv 180 [1,180] (UC 0, AC 0, MC 
0, BC 180) OK

>> Broadcast Test SUCCESSFUL

 

*** TIPC Group Messaging Test Finished 

 

Note: with the patch, the broadcast link bug has been eliminated because 
'tipc_link_bc_retrans()' is completely removed.

BR/Tuong

 

From: Jon Maloy  
Sent: Wednesday, March 18, 2020 1:12 AM
To: Tuong Lien Tong ; 
tipc-discussion@lists.sourceforge.net; ma...@donjonn.com; 
ying@windriver.com; l...@redhat.com
Subject: Re: [tipc-discussion] [PATCH RFC 1/2] tipc: add Gap ACK blocks support 
for broadcast link

 

 

On 3/17/20 4:15 AM, Tuong Lien Tong wrote:

Hi Jon,

 

In terms of scalability, yes, the design was indeed focusing on it, the new 
stuffs are per i

Re: [tipc-discussion] [net-next 3/3] tipc: introduce variable window congestion control

2020-03-17 Thread Tuong Lien Tong
Hi Jon,

OK, that makes sense (but we should have covered the case where a broadcast 
packet is released too...).
However, I have another concern about the logic here:

> + /* Enter fast recovery */
> + if (unlikely(retransmitted)) {
> + l->ssthresh = max_t(u16, l->window / 2, 300);
> + l->window = l->ssthresh;
> + return;
> + }

What will happen if we have a retransmission while the link is still in the 
slow-start phase? For example:
l->ssthresh = 300
l->window = 60
==> retransmitted = true, then: l->ssthresh = 300; l->window = 300???

This does not look correct, does it?
Should it be:

> + /* Enter fast recovery */
> + if (unlikely(retransmitted)) {
> + l->ssthresh = max_t(u16, l->window / 2, 300);
> - l->window = l->ssthresh;
> + l->window = min_t(u16, l->window, l->ssthresh);
> + return;
> + }

So, would this fix the issue in the broadcast case as well?

BR/Tuong

-----Original Message-----
From: Jon Maloy  
Sent: Wednesday, March 18, 2020 1:38 AM
To: Tuong Lien Tong ; 'Jon Maloy' 
; 'Jon Maloy' 
Cc: tipc-discussion@lists.sourceforge.net; 
mohan.krishna.ghanta.krishnamur...@ericsson.com
Subject: Re: [tipc-discussion] [net-next 3/3] tipc: introduce variable window 
congestion control



On 3/17/20 6:55 AM, Tuong Lien Tong wrote:
> Hi Jon,
>
> For the "variable window congestion control" patch, if I remember correctly,
> it is for unicast link only? Why did you apply it for broadcast link, a
> mistake or ...?
I did it so the code would be the same everywhere. Then, by setting both
min_win and max_win to the same value BC_LINK_WIN_DEFAULT (==50)
in the broadcast send link, this window should never change.

> It now causes user messages to be disordered on the receiving side, because on the
> sending side, the broadcast link's window is suddenly increased to 300 (i.e.
> max_t(u16, l->window / 2, 300)) at a packet retransmission, leaving some
> gaps between the link's 'transmq' & 'backlogq' unexpectedly... Will we fix
> this by removing it?
That is clearly a bug that breaks the above stated limitation.
It should be sufficient to also check that l->ssthresh never exceeds
l->max_win to remedy this.

///jon

>
> @@ -1160,7 +1224,6 @@ static int tipc_link_bc_retrans(struct tipc_link *l,
> struct tipc_link *r,
>   continue;
>   if (more(msg_seqno(hdr), to))
>   break;
> -
>   if (time_before(jiffies, TIPC_SKB_CB(skb)->nxt_retr))
>   continue;
>   TIPC_SKB_CB(skb)->nxt_retr = TIPC_BC_RETR_LIM;
> @@ -1173,11 +1236,12 @@ static int tipc_link_bc_retrans(struct tipc_link *l,
> struct tipc_link *r,
>   _skb->priority = TC_PRIO_CONTROL;
>   __skb_queue_tail(xmitq, _skb);
>   l->stats.retransmitted++;
> -
> + retransmitted++;
>   /* Increase actual retrans counter & mark first time */
>   if (!TIPC_SKB_CB(skb)->retr_cnt++)
>   TIPC_SKB_CB(skb)->retr_stamp = jiffies;
>   }
> + tipc_link_update_cwin(l, 0, retransmitted); // ???
>   return 0;
>   }
>
> +static void tipc_link_update_cwin(struct tipc_link *l, int released,
> +   bool retransmitted)
> +{
> + int bklog_len = skb_queue_len(&l->backlogq);
> + struct sk_buff_head *txq = &l->transmq;
> + int txq_len = skb_queue_len(txq);
> + u16 cwin = l->window;
> +
> + /* Enter fast recovery */
> + if (unlikely(retransmitted)) {
> + l->ssthresh = max_t(u16, l->window / 2, 300);
> + l->window = l->ssthresh;
> + return;
> + }
>
> BR/Tuong
>
> -----Original Message-----
> From: Jon Maloy 
> Sent: Monday, December 2, 2019 7:33 AM
> To: Jon Maloy ; Jon Maloy 
> Cc: mohan.krishna.ghanta.krishnamur...@ericsson.com;
> parthasarathy.bhuvara...@gmail.com; tung.q.ngu...@dektech.com.au;
> hoang.h...@dektech.com.au; tuong.t.l...@dektech.com.au;
> gordan.mihalje...@dektech.com.au; ying@windriver.com;
> tipc-discussion@lists.sourceforge.net
> Subject: [net-next 3/3] tipc: introduce variable window congestion control
>
> We introduce a simple variable window congestion control for links.
> The algorithm is inspired by the Reno algorithm, covering both 'slow
> start', 'congestion avoidance', and 'fast recovery' modes.
>
> - We introduce hard lower and upper window limits per link, still
>different and configurable per bearer type.
>
> - We introduce a 'slow start threshold' variable, initially set to
>the maximum window size.
>
> - We let a link start at the minimum congestion window, i.e. in slow
>   start mode, and then let it grow rapidly (+1 per received ACK) until
>   it reaches the slow start threshold and enters congestion avoidance
>   mode.

Re: [tipc-discussion] [net-next 3/3] tipc: introduce variable window congestion control

2020-03-17 Thread Tuong Lien Tong
Hi Jon,

For the "variable window congestion control" patch, if I remember correctly,
it is for unicast link only? Why did you apply it for broadcast link, a
mistake or ...?
It now causes user messages to be disordered on the receiving side, because on the
sending side, the broadcast link's window is suddenly increased to 300 (i.e.
max_t(u16, l->window / 2, 300)) at a packet retransmission, leaving some
gaps between the link's 'transmq' & 'backlogq' unexpectedly... Will we fix
this by removing it?

@@ -1160,7 +1224,6 @@ static int tipc_link_bc_retrans(struct tipc_link *l,
struct tipc_link *r,
continue;
if (more(msg_seqno(hdr), to))
break;
-
if (time_before(jiffies, TIPC_SKB_CB(skb)->nxt_retr))
continue;
TIPC_SKB_CB(skb)->nxt_retr = TIPC_BC_RETR_LIM;
@@ -1173,11 +1236,12 @@ static int tipc_link_bc_retrans(struct tipc_link *l,
struct tipc_link *r,
_skb->priority = TC_PRIO_CONTROL;
__skb_queue_tail(xmitq, _skb);
l->stats.retransmitted++;
-
+   retransmitted++;
/* Increase actual retrans counter & mark first time */
if (!TIPC_SKB_CB(skb)->retr_cnt++)
TIPC_SKB_CB(skb)->retr_stamp = jiffies;
}
+   tipc_link_update_cwin(l, 0, retransmitted); // ???
return 0;
 }

+static void tipc_link_update_cwin(struct tipc_link *l, int released,
+ bool retransmitted)
+{
+   int bklog_len = skb_queue_len(&l->backlogq);
+   struct sk_buff_head *txq = &l->transmq;
+   int txq_len = skb_queue_len(txq);
+   u16 cwin = l->window;
+
+   /* Enter fast recovery */
+   if (unlikely(retransmitted)) {
+   l->ssthresh = max_t(u16, l->window / 2, 300);
+   l->window = l->ssthresh;
+   return;
+   }

BR/Tuong

-----Original Message-----
From: Jon Maloy  
Sent: Monday, December 2, 2019 7:33 AM
To: Jon Maloy ; Jon Maloy 
Cc: mohan.krishna.ghanta.krishnamur...@ericsson.com;
parthasarathy.bhuvara...@gmail.com; tung.q.ngu...@dektech.com.au;
hoang.h...@dektech.com.au; tuong.t.l...@dektech.com.au;
gordan.mihalje...@dektech.com.au; ying@windriver.com;
tipc-discussion@lists.sourceforge.net
Subject: [net-next 3/3] tipc: introduce variable window congestion control

We introduce a simple variable window congestion control for links.
The algorithm is inspired by the Reno algorithm, covering both 'slow
start', 'congestion avoidance', and 'fast recovery' modes.

- We introduce hard lower and upper window limits per link, still
  different and configurable per bearer type.

- We introduce a 'slow start threshold' variable, initially set to
  the maximum window size.

- We let a link start at the minimum congestion window, i.e. in slow
  start mode, and then let it grow rapidly (+1 per received ACK) until
  it reaches the slow start threshold and enters congestion avoidance
  mode.

- In congestion avoidance mode we increment the congestion window for
  each window_size number of acked packets, up to a possible maximum
  equal to the configured maximum window.

- For each non-duplicate NACK received, we drop back to fast recovery
  mode, by setting both the slow start threshold and the
  congestion window to (current_congestion_window / 2).

- If the timeout handler finds that the transmit queue has not moved
  since the previous timeout, it drops the link back to slow start and
  forces a probe containing the last sent sequence number to be sent to
  the peer.

This change does in reality have effect only on unicast ethernet
transport, as we have seen that there is no room whatsoever for
increasing the window max size for the UDP bearer.
For now, we also choose to keep the limits for the broadcast link
unchanged and equal.

This algorithm seems to give a 50-100% throughput improvement for
messages larger than MTU.

Suggested-by: Xin Long 
Acked-by: Xin Long 
Signed-off-by: Jon Maloy 
---
 net/tipc/bcast.c |  11 ++--
 net/tipc/bearer.c|  11 ++--
 net/tipc/bearer.h|   6 +-
 net/tipc/eth_media.c |   3 +-
 net/tipc/ib_media.c  |   5 +-
 net/tipc/link.c  | 175
+++
 net/tipc/link.h  |   9 +--
 net/tipc/node.c  |  16 ++---
 net/tipc/udp_media.c |   3 +-
 9 files changed, 160 insertions(+), 79 deletions(-)







___
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion


Re: [tipc-discussion] [PATCH RFC 1/2] tipc: add Gap ACK blocks support for broadcast link

2020-03-17 Thread Tuong Lien Tong
Commander: Scaling in to 0 Workers with Cmd Member Id 2

Commander: Scaling in to 0 Workers with Cmd Member Id 0

Report #0  from :1@0/0:3890883707@1001002: Sent 91 [0,90] (UC 0, AC 0, MC 
0, BC 91) OK

Report #1  from :2@0/0:1098659974@1001004: Recv 79 [2,80] (UC 0, AC 0, MC 
0, BC 79) OK

Report #2  from :1@0/0:3441709654@1001003: Recv 80 [1,80] (UC 0, AC 0, MC 
0, BC 80) OK

Report #3  from :2@0/0:0018441197@1001003: Recv 79 [2,80] (UC 0, AC 0, MC 
0, BC 79) OK

Report #4  from :2@0/0:0584054290@1001003: Recv 79 [2,80] (UC 0, AC 0, MC 
0, BC 79) OK

Report #5  from :2@0/0:4244201461@1001004: Recv 79 [2,80] (UC 0, AC 0, MC 
0, BC 79) OK

Report #6  from :2@0/0:1307600351@1001003: Recv 79 [2,80] (UC 0, AC 0, MC 
0, BC 79) OK

Report #7  from :2@0/0:3941241491@1001003: Recv 79 [2,80] (UC 0, AC 0, MC 
0, BC 79) OK

Report #8  from :2@0/0:2927828986@1001003: Recv 79 [2,80] (UC 0, AC 0, MC 
0, BC 79) OK

Report #9  from :1@0/0:1743241608@1001003: Recv 80 [1,80] (UC 0, AC 0, MC 
0, BC 80) OK

Report #10 from :2@0/0:1115515409@1001003: Recv 79 [2,80] (UC 0, AC 0, MC 
0, BC 79) OK

Report #11 from :1@0/0:0815130796@1001004: Recv 80 [1,80] (UC 0, AC 0, MC 
0, BC 80) OK

Report #12 from :1@0/0:2618044568@1001004: Recv 79 [2,80] (UC 0, AC 0, MC 
0, BC 79) OK

Report #13 from :2@0/0:1424259027@1001003: Recv 79 [2,80] (UC 0, AC 0, MC 
0, BC 79) OK

Report #14 from :1@0/0:1011077421@1001003: Recv 80 [1,80] (UC 0, AC 0, MC 
0, BC 80) OK

Report #15 from :1@0/0:3249391177@1001003: Recv 80 [1,80] (UC 0, AC 0, MC 
0, BC 80) OK

Report #16 from :1@0/0:277433@1001003: Recv 81 [0,80] (UC 0, AC 0, MC 
0, BC 81) OK

Report #17 from :1@0/0:0860766920@1001004: Recv 81 [0,80] (UC 0, AC 0, MC 
0, BC 81) OK

Report #18 from :1@0/0:0196231326@1001003: Recv 81 [0,80] (UC 0, AC 0, MC 
0, BC 81) OK

Report #19 from :1@0/0:4278611377@1001003: Recv 79 [2,80] (UC 0, AC 0, MC 
0, BC 79) OK

Report #20 from :1@0/0:3464416884@1001003: Recv 80 [1,80] (UC 0, AC 0, MC 
0, BC 80) OK

Report #21 from :2@0/0:1718387937@1001004: Recv 79 [2,80] (UC 0, AC 0, MC 
0, BC 79) OK

Report #22 from :2@0/0:0267090087@1001003: Recv 79 [2,80] (UC 0, AC 0, MC 
0, BC 79) OK

Report #23 from :2@0/0:1694243136@1001004: Recv 79 [2,80] (UC 0, AC 0, MC 
0, BC 79) OK

Report #24 from :2@0/0:0918300899@1001004: Recv 79 [2,80] (UC 0, AC 0, MC 
0, BC 79) OK

Report #25 from :2@0/0:0811475995@1001004: Recv 79 [2,80] (UC 0, AC 0, MC 
0, BC 79) OK

Report #26 from :1@0/0:0388357605@1001004: Recv 80 [1,80] (UC 0, AC 0, MC 
0, BC 80) OK

Report #27 from :1@0/0:1113395305@1001004: Recv 81 [0,80] (UC 0, AC 0, MC 
0, BC 81) OK

Report #28 from :1@0/0:3413026333@1001004: Recv 80 [1,80] (UC 0, AC 0, MC 
0, BC 80) OK

Report #29 from :2@0/0:2907075331@1001004: Recv 79 [2,80] (UC 0, AC 0, MC 
0, BC 79) OK

Report #30 from :1@0/0:1393297086@1001004: Recv 80 [1,80] (UC 0, AC 0, MC 
0, BC 80) OK

Report #31 from :2@0/0:3493179185@1001004: Recv 79 [2,80] (UC 0, AC 0, MC 
0, BC 79) OK

Report #32 from :1@0/0:3166541927@1001004: Recv 80 [1,80] (UC 0, AC 0, MC 
0, BC 80) OK

>> Broadcast Test SUCCESSFUL

 

*** TIPC Group Messaging Test Finished 

 

# tipc l st sh

Link 

  Window:50 packets

  RX packets:0 fragments:0/0 bundles:0/0

  TX packets:5908 fragments:5904/123 bundles:0/0

  RX naks:0 defs:0 dups:0

  TX naks:0 acks:0 retrans:324

  Congestion link:32  Send queue max:0 avg:0

 

BR/Tuong

 

-----Original Message-----
From: Jon Maloy  
Sent: Tuesday, March 17, 2020 2:49 AM
To: Tuong Lien Tong ; 
tipc-discussion@lists.sourceforge.net; ma...@donjonn.com; 
ying@windriver.com; l...@redhat.com
Subject: Re: [tipc-discussion] [PATCH RFC 1/2] tipc: add Gap ACK blocks support 
for broadcast link

 

 

 

On 3/16/20 2:18 PM, Jon Maloy wrote:

> 

> 

> On 3/16/20 7:23 AM, Tuong Lien Tong wrote:

>> 

[...]

>> 

> The improvement shown here is truly impressive. However, you are only 

> showing tipc-pipe with small messages. How does this look when you 

> send full-size 66k messages? How does it scale when the number of 

> destinations grows up to tens or even hundreds? I am particularly 

> concerned that the use of unicast retransmission may become a 

> sub-optimization if the number of destinations is large.

> 

> ///jon

You should try the "multicast_blast" program under tipc-utils/test. That 

will give you

numbers both on throughput and loss rates as you let the number of nodes

grow.

 

///jon

 

> 

>> BR/Tuong

>> 

>> *From:* Jon Maloy <jma...@redhat.com>

>> *Sent:* Friday, March 13, 2020 10:47 PM

>> *To:* Tuong Lien <tuong.t.l...@dektech.com.au>; 

>> tipc-discussion@lists.sourceforge.net; ma...@donj

[tipc-discussion] FW: [PATCH] ptts: fix tipcTS failure in case of latency

2020-03-11 Thread Tuong Lien Tong
Resend this... It seemed to be dropped somehow...

BR/Tuong

-----Original Message-----
From: Tuong Lien  
Sent: Wednesday, February 19, 2020 2:42 PM
To: tipc-discussion@lists.sourceforge.net; jma...@redhat.com
Cc: tipc-...@dektech.com.au; Tuong Lien 
Subject: [PATCH] ptts: fix tipcTS failure in case of latency

The 'ptts' test keeps failing when run under high traffic with some
network latency. This is because the 'tipcTS' server side doesn't wait
long enough at its 'select()' call, just over 1s, and gets a timeout. The
timeout variable is also not re-initialized after the 1st timeout, so the
next attempts return almost immediately, i.e. timeout = 0:

./tipcTS -v
...
Received on 0 sockets in subtest 6, expected 2
Received on 0 sockets in subtest 6, expected 2
Received on 0 sockets in subtest 6, expected 2
===>Finished SubTest 7: received 0 msgs of sz -1 at 2 sockets (40 per
socket)
TEST FAILED Received wrong number of multicast messages

This commit fixes the issue by increasing the timeout value to 3s and
re-initializing it correctly.

Signed-off-by: Tuong Lien 
---
 test/ptts/tipc_ts_server.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/test/ptts/tipc_ts_server.c b/test/ptts/tipc_ts_server.c
index 3a2f96f..e102c94 100644
--- a/test/ptts/tipc_ts_server.c
+++ b/test/ptts/tipc_ts_server.c
@@ -610,7 +610,7 @@ void server_mcast
rcvbuf = malloc(66000);
buf = rcvbuf;
recvSyncTIPC (TS_SYNC_ID_3);/* wait for client to tell us to
start */
-   timeout.tv_sec  = 1;
+   timeout.tv_sec  = 3;
timeout.tv_usec = 0;
dbg1("===>Starting SubTest %d\n", st);
 
@@ -625,12 +625,12 @@ void server_mcast
while (sk_cnt < exp_sks ) {
fds = *readfds; 
num_ready = select(FD_SETSIZE, &fds, NULL, NULL,
&timeout);
+   timeout.tv_sec  = 3;
if (!num_ready) {
printf("Received on %u sockets in subtest
%u, expected %u\n",
   sk_cnt, st, exp_num[numSubTest]);
break;
}
-   timeout.tv_sec  = 1;
for (i = 0; i < TIPC_MCAST_SOCKETS; i++) {

if (!FD_ISSET(sd[i], &fds))
-- 
2.1.4






Re: [tipc-discussion] Fwd: [net v2] tipc: fix wrong connect() return code

2020-01-22 Thread Tuong Lien Tong
An alternative is to allow the 'connect()' to return '0' in this particular 
case. The socket state is already 'DISCONNECTING', so any further write will 
get '-EPIPE' but the socket is still able to read the messages existing in its 
receive queue. This may seem a bit contrary to the previous patch, but they are 
not mutually exclusive; the previous one only deals with the case where the 
socket state is 'DISCONNECTING' but 'sk->sk_err' != 0.
Does this look correct from a functional standpoint?

BR/Tuong

-----Original Message-----
From: Jon Maloy  
Sent: Tuesday, January 21, 2020 7:11 PM
To: ying@windriver.com
Cc: tipc-discussion@lists.sourceforge.net
Subject: [tipc-discussion] Fwd: [net v2] tipc: fix wrong connect() return code

Hi Ying,
Do you have any ideas regarding this? My best proposal is to attach a time stamp 
to the new server socket at creation,
and then delay a 'close' if the socket is < 5ms old. 
But I am not quite happy with this solution, so any better idea would be 
welcome.

///jon


> 
> 
>----- Forwarded Message ----- From: Tuong Lien Tong
>To: 'Ying Xue' ;
>"jma...@redhat.com" ; "ma...@donjonn.com"
>Cc: "tipc-...@dektech.com.au"
>Sent: Monday, January 13, 2020, 04:57:58 AM
>GMT-5Subject: FW: [net v2] tipc: fix wrong connect() return code
>  
> Hi Ying, Jon,
> 
>   
> 
> The last change has exposed a different issue with our “lightweight”
> connection setup mechanism,
> 
> Please see the scenario below:
> 
>   
> 
>   
> 
>server process                       client process
>--------------                       --------------
>create()      |                    |
>listen()     *tskL                 |
>bind()        |                    | create()
>+- accept()*  |                   *tskC
>|             |                    | connect() ~+
>wait_for_rcv()|<------[SYN]--------|            |
>|             |                    | //tskC: CONNECTING
>|             |                    | *wait_for_conn()
>+->*tskS      |                    |   -> add to wait queue
>   |//tskS: ESTABLISHED            |
>   |----------[ACK]--------------->| //tskC: ESTABLISHED
>   |                               |   -> wakeup(): SLEEPING -> run queue
>send() |------[DATA]-------------->|\   -> wakeup(): already woken
>send() |------[DATA]-------------->| |  -> wakeup(): already woken
>  .    .          .                . |-> receive queue
>send() |------[DATA]-------------->|/   -> wakeup(): already woken
>close()|------[FIN]--------------->| //tskC: DISCONNECTING
>                                   |            |
>                                   | ~~> schedule(): run queue -> RUNNING
>                                   |   -> add back to wait queue
>                                   |      (state != ESTABLISHED)
>                                   .
>                                   * timed out

Re: [tipc-discussion] [net] tipc: fix wrong connect() return code

2019-12-30 Thread Tuong Lien Tong
Hi Ying,

Unfortunately, the 'tipc_wait_for_cond()' macro is not applicable here... It's 
for the sendmsg() cases only, which expect the socket to be in the 'ESTABLISHED' 
state first; otherwise it will return '-ENOTCONN' (see 
'tipc_sk_sock_err()'). Of course, we could adjust it for this case, but that 
doesn't look so good..., so I have decided to keep the function and just change 
the wait condition to 'tipc_sk_connected()'.
Please see the patch v2 I just sent out.

BR/Tuong

-----Original Message-----
From: Ying Xue  
Sent: Friday, December 27, 2019 2:04 PM
To: Tuong Lien Tong ; 
tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com; ma...@donjonn.com
Subject: Re: [tipc-discussion] [net] tipc: fix wrong connect() return code

On 12/27/19 10:22 AM, Tuong Lien Tong wrote:
> Hi Ying,
> 
> Thinking more about this...
> I suppose we can even remove the function and utilize the generic macro
> 'tipc_wait_for_cond()' but with the new condition, that is much simpler and
> also saves some footprint.

Agreed.

> The 'tipc_sk_connected()' condition is really what we should expect in this
> case instead.

Yes.

> So, in the case of 'TIPC_DISCONNECTING', regardless of whether we have a
> 'sk->sk_err', it will return that error or '-ETIMEDOUT' but never '0'.

But I think it's better to return the error attached on 'sk->sk_err' to
user because it can really reflect why connect() is failed.

> I will send the patch v2 for your review.
> 
> BR/Tuong
> 
> -----Original Message-----
> From: Tuong Lien Tong  
> Sent: Thursday, December 26, 2019 5:39 PM
> To: 'Ying Xue' ;
> tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com;
> ma...@donjonn.com
> Subject: Re: [tipc-discussion] [net] tipc: fix wrong connect() return code
> 
> What about this?
> 
> diff --git a/net/tipc/socket.c b/net/tipc/socket.c
> index 6ebd809ef207..04f035bcc272 100644
> --- a/net/tipc/socket.c
> +++ b/net/tipc/socket.c
> @@ -2446,7 +2446,7 @@ static int tipc_wait_for_connect(struct socket *sock,
> long *timeo_p)
> done = sk_wait_event(sk, timeo_p,
>  sk->sk_state != TIPC_CONNECTING, &wait);
> remove_wait_queue(sk_sleep(sk), &wait);
> -   } while (!done);
> +   } while (!tipc_sk_connected(sk));
> return 0;
>  }
> 
> BR/Tuong
> 
> -----Original Message-----
> From: Ying Xue  
> Sent: Thursday, December 26, 2019 2:55 PM
> To: Tuong Lien Tong ;
> tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com;
> ma...@donjonn.com
> Subject: Re: [net] tipc: fix wrong connect() return code
> 
> On 12/26/19 3:16 PM, Tuong Lien Tong wrote:
>> Sorry, my mistake! I thought it was "while (!done || !sk->sk_err)"...
>> But, this is really confusing and one might ask why we continue the loop
> while the socket has encountered an error (sk->sk_err != 0)???
> 
> This is totally reasonable because it can make code simpler.
> [Tuong]: but a performance hit since it almost looks like the sk_err has to
> be read twice?
> 
>> Moreover, it's not necessarily the "sk_err" will be the only case,
>> For example:
>>  sk->sk_state != TIPC_CONNECTING but sk->sk_err = 0 (somehow) and a
> pending/interrupt signal
> 
> If sk->sk_state is changed from TIPC_CONNECTING to TIPC_ESTABLISHED,
> sk->sk_err should be kept 0, but if sk->sk_state is changed to other
> states, sk->sk_err must be set with a proper error code in socket
> receive path, otherwise, it's a bug.
> 
> However, there might be one below race condition:
> 
> sk->sk_state is set to TIPC_DISCONNECTING, but before sk->sk_err is set
> with an error code, the process that calls connect() is woken up by a
> signal. Even so, the process is still blocked when it tries to hold the sock
> lock with lock_sock() in sk_wait_event() because the socket lock is
> taken in socket receive path. After socket lock is released, the process
> will continue being run. But at that moment, sk->sk_err has been set
> with an error code. So, this is no problem for us.
> 
>>
>> then we should return '-EINTR'?
> 
> 
> 
> 
> 





Re: [tipc-discussion] [net] tipc: fix wrong connect() return code

2019-12-26 Thread Tuong Lien Tong
Hi Ying,

Thinking more about this...
I suppose we can even remove the function and utilize the generic macro
'tipc_wait_for_cond()' but with the new condition, that is much simpler and
also saves some footprint.
The 'tipc_sk_connected()' condition is really what we should expect in this
case instead.
So, in the case of 'TIPC_DISCONNECTING', regardless of whether we have a
'sk->sk_err', it will return that error or '-ETIMEDOUT' but never '0'.
I will send the patch v2 for your review.

BR/Tuong

-----Original Message-----
From: Tuong Lien Tong  
Sent: Thursday, December 26, 2019 5:39 PM
To: 'Ying Xue' ;
tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com;
ma...@donjonn.com
Subject: Re: [tipc-discussion] [net] tipc: fix wrong connect() return code

What about this?

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 6ebd809ef207..04f035bcc272 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -2446,7 +2446,7 @@ static int tipc_wait_for_connect(struct socket *sock,
long *timeo_p)
done = sk_wait_event(sk, timeo_p,
 sk->sk_state != TIPC_CONNECTING, &wait);
remove_wait_queue(sk_sleep(sk), &wait);
-   } while (!done);
+   } while (!tipc_sk_connected(sk));
return 0;
 }

BR/Tuong

-----Original Message-----
From: Ying Xue  
Sent: Thursday, December 26, 2019 2:55 PM
To: Tuong Lien Tong ;
tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com;
ma...@donjonn.com
Subject: Re: [net] tipc: fix wrong connect() return code

On 12/26/19 3:16 PM, Tuong Lien Tong wrote:
> Sorry, my mistake! I thought it was "while (!done || !sk->sk_err)"...
> But, this is really confusing and one might ask why we continue the loop
while the socket has encountered an error (sk->sk_err != 0)???

This is totally reasonable because it can make code simpler.
[Tuong]: but a performance hit since it almost looks like the sk_err has to
be read twice?

> Moreover, it's not necessarily the "sk_err" will be the only case,
> For example:
>   sk->sk_state != TIPC_CONNECTING but sk->sk_err = 0 (somehow) and a
pending/interrupt signal

If sk->sk_state is changed from TIPC_CONNECTING to TIPC_ESTABLISHED,
sk->sk_err should be kept 0, but if sk->sk_state is changed to other
states, sk->sk_err must be set with a proper error code in socket
receive path, otherwise, it's a bug.

However, there might be one below race condition:

sk->sk_state is set to TIPC_DISCONNECTING, but before sk->sk_err is set
with an error code, the process that calls connect() is woken up by a
signal. Even so, the process is still blocked when it tries to hold the sock
lock with lock_sock() in sk_wait_event() because the socket lock is
taken in socket receive path. After socket lock is released, the process
will continue being run. But at that moment, sk->sk_err has been set
with an error code. So, this is no problem for us.

> 
> then we should return '-EINTR'?








Re: [tipc-discussion] [net] tipc: fix wrong connect() return code

2019-12-26 Thread Tuong Lien Tong
What about this?

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 6ebd809ef207..04f035bcc272 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -2446,7 +2446,7 @@ static int tipc_wait_for_connect(struct socket *sock, 
long *timeo_p)
done = sk_wait_event(sk, timeo_p,
 sk->sk_state != TIPC_CONNECTING, &wait);
remove_wait_queue(sk_sleep(sk), &wait);
-   } while (!done);
+   } while (!tipc_sk_connected(sk));
return 0;
 }

BR/Tuong

-----Original Message-----
From: Ying Xue  
Sent: Thursday, December 26, 2019 2:55 PM
To: Tuong Lien Tong ; 
tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com; ma...@donjonn.com
Subject: Re: [net] tipc: fix wrong connect() return code

On 12/26/19 3:16 PM, Tuong Lien Tong wrote:
> Sorry, my mistake! I thought it was "while (!done || !sk->sk_err)"...
> But, this is really confusing and one might ask why we continue the loop 
> while the socket has encountered an error (sk->sk_err != 0)???

This is totally reasonable because it can make code simpler.
[Tuong]: but a performance hit since it almost looks like the sk_err has to be 
read twice?

> Moreover, it's not necessarily the "sk_err" will be the only case,
> For example:
>   sk->sk_state != TIPC_CONNECTING but sk->sk_err = 0 (somehow) and a 
> pending/interrupt signal

If sk->sk_state is changed from TIPC_CONNECTING to TIPC_ESTABLISHED,
sk->sk_err should be kept 0, but if sk->sk_state is changed to other
states, sk->sk_err must be set with a proper error code in socket
receive path, otherwise, it's a bug.

However, there might be a race condition as below:

sk->sk_state is set to TIPC_DISCONNECTING, but before sk->sk_err is set
with an error code, the process that called connect() is woken up by a
signal. Even so, the process still blocks when it tries to take the sock
lock with lock_sock() in sk_wait_event(), because the socket lock is
taken in the socket receive path. After the socket lock is released, the
process resumes; at that moment sk->sk_err has already been set with an
error code. So, this is no problem for us.

> 
> then we should return '-EINTR'?





Re: [tipc-discussion] [net] tipc: fix wrong connect() return code

2019-12-25 Thread Tuong Lien Tong
Hi Ying,

Sorry, my mistake! I thought it was "while (!done || !sk->sk_err)"...
But, this is really confusing and one might ask why we continue the loop while
the socket has encountered an error (sk->sk_err != 0)?
Moreover, "sk_err" will not necessarily be the only case.
For example:
sk->sk_state != TIPC_CONNECTING but sk->sk_err == 0 (somehow) and a
pending/interrupting signal

then we should return '-EINTR'?

BR/Tuong

-Original Message-
From: Ying Xue  
Sent: Thursday, December 26, 2019 1:17 PM
To: Tuong Lien Tong ; 
tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com; ma...@donjonn.com
Subject: Re: [net] tipc: fix wrong connect() return code

On 12/26/19 9:04 AM, Tuong Lien Tong wrote:
> Hi Ying,
> 
> Actually, the 'done' flag has been set in the particular case (since the
> 'sk->sk_state' changed to 'TIPC_DISCONNECTING' after receiving a rejection
> of its SYN from server...) and what we want to achieve is the error code
> from the 'sock_error(sk)' to be returned to user correctly.
> For your code, there is no difference, the function would still return '0'
> for the said case.

--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -2435,7 +2435,7 @@ static int tipc_wait_for_connect(struct socket *sock,
long *timeo_p)
done = sk_wait_event(sk, timeo_p,
 sk->sk_state != TIPC_CONNECTING,
 &wait);
remove_wait_queue(sk_sleep(sk), &wait);
-   } while (!done);
+   } while (!done || sk->sk_err);
return 0;
 }

Sorry, in my understanding, if the case you mentioned above occurs,
"done" will really be set to 1, which means the while loop will exit and
0 will be returned to user. This is our current problem.

But, if we add "sk->sk_err" as another condition of the while statement,
the while loop will not exit because "sk->sk_err" is not 0. As a
consequence, in the next iteration, the sock error code will be returned
to user because sock_error() is not 0.
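Ying's point about the extra loop pass can be sketched as a tiny user-space model; all names here are illustrative, this is not the kernel code:

```c
#include <assert.h>

/* Toy model of tipc_wait_for_connect()'s loop. One pass observes the
 * (already settled) socket state; 'done' becomes true once the state
 * has left CONNECTING. With the extra '|| err' condition, a pass that
 * sets 'done' while an error is pending loops once more, and the error
 * check at the top returns the code to the caller. */

enum conn_state { ST_CONNECTING, ST_ESTABLISHED, ST_DISCONNECTING };

struct toy_sock {
	enum conn_state state;
	int err;		/* 0 or a positive errno value */
};

static int wait_for_connect_model(const struct toy_sock *sk)
{
	int done = 0;

	do {
		if (sk->err)
			return -sk->err;	/* mirrors the sock_error() check */
		done = (sk->state != ST_CONNECTING);
	} while (!done || sk->err);
	return 0;
}
```

With a refused connection the model returns the negative error instead of 0, which is exactly the behavior the patch is after.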

> I considered an alternative that:
> 
> done = sk_wait_event(sk, timeo_p,
>  sk->sk_state != TIPC_CONNECTING,
> &wait);
> remove_wait_queue(sk_sleep(sk), &wait);
> } while (!done);
> -   return 0;
> +  return sock_error(sk);
> 
> but this could get more concerns or we should check the socket state at the
> return e.g. 
> return (sk->sk_state != TIPC_ESTABLISHED) ? sock_error(sk) : 0;
> 
> and the fact is that we have this code already in the while statement, so I
> have decided to go with the code below.





Re: [tipc-discussion] [net] tipc: fix wrong connect() return code

2019-12-25 Thread Tuong Lien Tong
Hi Ying,

Actually, the 'done' flag has been set in the particular case (since the
'sk->sk_state' changed to 'TIPC_DISCONNECTING' after receiving a rejection
of its SYN from server...) and what we want to achieve is the error code
from the 'sock_error(sk)' to be returned to user correctly.
For your code, there is no difference, the function would still return '0'
for the said case.
I considered an alternative that:

done = sk_wait_event(sk, timeo_p,
 sk->sk_state != TIPC_CONNECTING,
 &wait);
remove_wait_queue(sk_sleep(sk), &wait);
} while (!done);
-   return 0;
+  return sock_error(sk);

but this could raise further concerns, or we would have to check the socket
state at return, e.g.:
return (sk->sk_state != TIPC_ESTABLISHED) ? sock_error(sk) : 0;

and the fact is that we have this code already in the while statement, so I
have decided to go with the code below.

BR/Tuong

-Original Message-
From: Xue, Ying  
Sent: Wednesday, December 25, 2019 8:48 PM
To: Tuong Lien ;
tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com;
ma...@donjonn.com
Subject: RE: [net] tipc: fix wrong connect() return code

Probably the change below is easier to understand:

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 6552f98..358cc55 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -2435,7 +2435,7 @@ static int tipc_wait_for_connect(struct socket *sock,
long *timeo_p)
done = sk_wait_event(sk, timeo_p,
 sk->sk_state != TIPC_CONNECTING,
 &wait);
remove_wait_queue(sk_sleep(sk), &wait);
-   } while (!done);
+   } while (!done || sk->sk_err);
return 0;
 }

-Original Message-
From: Tuong Lien [mailto:tuong.t.l...@dektech.com.au] 
Sent: Tuesday, December 24, 2019 4:06 PM
To: tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com;
ma...@donjonn.com; Xue, Ying
Subject: [net] tipc: fix wrong connect() return code

The current 'tipc_wait_for_connect()' function loops waiting for the
condition 'sk->sk_state != TIPC_CONNECTING' to decide whether the
connect attempt has completed. However, when the condition is met, it
always returns '0', even when the connect attempt actually failed (e.g.
was refused because the server socket has closed...) and the socket
state was set to 'TIPC_DISCONNECTING'.
This results in a wrong return code for the user's 'connect()' call,
making it believe the connection is established and go ahead with
further actions, e.g. building & sending a message, only to get an
unexpected result (e.g. '-EPIPE').

This commit fixes the issue by returning the corresponding error code,
if any, when the waiting process is woken up.

Signed-off-by: Tuong Lien 
---
 net/tipc/socket.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 8b1daf3634b0..2e5faf89ef80 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -2428,7 +2428,7 @@ static int tipc_wait_for_connect(struct socket *sock,
long *timeo_p)
 {
DEFINE_WAIT_FUNC(wait, woken_wake_function);
struct sock *sk = sock->sk;
-   int done;
+   int done = 0;
 
do {
int err = sock_error(sk);
@@ -2438,12 +2438,14 @@ static int tipc_wait_for_connect(struct socket
*sock, long *timeo_p)
return -ETIMEDOUT;
if (signal_pending(current))
return sock_intr_errno(*timeo_p);
+   if (done)
+   return 0;
 
add_wait_queue(sk_sleep(sk), &wait);
done = sk_wait_event(sk, timeo_p,
 sk->sk_state != TIPC_CONNECTING,
 &wait);
remove_wait_queue(sk_sleep(sk), &wait);
-   } while (!done);
+   } while (1);
return 0;
 }
 
-- 
2.13.7






Re: [tipc-discussion] [PATCH RFC] tipc: add automatic rekeying for TIPC encryption

2019-12-24 Thread Tuong Lien Tong
Hi Jon,

Please see more feedback inline.

BR/Tuong

-Original Message-
From: Jon Maloy  
Sent: Monday, December 23, 2019 10:20 PM
To: Tuong Tong Lien ; tipc-...@dektech.com.au; 
tipc-discussion@lists.sourceforge.net
Cc: 'Xin Long' ; 'Xin Long' ; 'Ying Xue' 

Subject: RE: [PATCH RFC] tipc: add automatic rekeying for TIPC encryption



> -Original Message-
> From: Tuong Lien Tong 
> Sent: 23-Dec-19 06:49
> To: Jon Maloy ; tipc-...@dektech.com.au; tipc-
> discuss...@lists.sourceforge.net
> Cc: 'Xin Long' ; 'Xin Long' ; 'Ying 
> Xue'
> 
> Subject: RE: [PATCH RFC] tipc: add automatic rekeying for TIPC encryption
> 
> Hi Jon,
> 
> Please see my comments below.
> 
> BR/Tuong
> 
> -Original Message-
> From: Jon Maloy 
> Sent: Saturday, December 21, 2019 11:03 PM
> To: Tuong Tong Lien ; tipc-...@dektech.com.au; 
> tipc-
> discuss...@lists.sourceforge.net
> Cc: Xin Long ; Xin Long ; 'Ying Xue' 
> 
> Subject: RE: [PATCH RFC] tipc: add automatic rekeying for TIPC encryption
> 
> Adding tipc-discussion to the recipient list of this discussion.
> 
> ///jon
> 
> > -Original Message-
> > From: Jon Maloy
> > Sent: 21-Dec-19 10:56
> > To: Tuong Lien ; tipc-...@dektech.com.au
> > Subject: RE: [PATCH RFC] tipc: add automatic rekeying for TIPC encryption
> >
> > Hi Tuong,
> > I am getting more and more ambivalent about this.
> > Before introducing this we must weigh the pro and cons correctly:
> >
> > Pros:
> > - It will give less time for a sniffer to crack the key, since we change it 
> > often.
> > - If a sniffer cracks a key, he only gets access to messages from the node 
> > owning that key, not the whole
> > cluster.
> 
> [Tuong]: - implicit authentication (since the initial key for a node or the 
> cluster is set by user...).
> 
> >
> > Cons:
> > - Larger code footprint.
> > - No forward security. If a sniffer gets hold of a node key, he can listen 
> > forever, as you have already
> noted.
> [Tuong]: Yes and this can be vital from a security perspective!
> 
> > - New nodes cannot be added to the cluster unless there is a way for the 
> > user land framework to read the
> > current key on each node and distribute to the new node.
> > - What happens when a node restarts, and forgets about all current keys, 
> > both its own and those of its
> > peers.
> >
> > The last two bullets are showstoppers,  in my view. We must find a way to 
> > add/re-insert nodes as easily as
> > we do now, at least.
> [Tuong]: The last two bullets are actually the same, so the trouble lies here 
> as to how a "new" VM joins the
> cluster.
> 
> >
> > This means we must come back to the solution with a "master"  cluster key.
> > I suspect a typical user will just aim for adding the key or keys to
> > the cluster configuration file, and at a maximum update that key at regular 
> > intervals.
> >
> > So, maybe, apart from the 'current' node key we also keep a 'cluster'  
> > master key that can always be used
> > for attaching new nodes, but allow for this to be updated via the tipc tool.
> 
> [Tuong]: This will require changes/updates to the current key handling
> mechanism, since we support
> automatic key switching but at any time there is only one active key for TX &
> RX, and it never "remembers" or
> holds any keys for other purposes...
> 
> I have an idea that we can turn it into a simple "rule" here: when a new VM 
> joins the cluster, a new key will
> have to be generated & set for the entire cluster incl. the new VM(s) by user 
> (or the framework which had
> set the initial key in advance...). The automatic key switching will then 
> take over to make the new key active
> and the new VM(s) can join the cluster as well, this is what we have already 
> supported... Later on, the
> automatic rekeying will start to work for all the VMs in the cluster as by 
> this patch...

Not a bad idea. Basically, we replace all keys with a "cluster key" every time 
a new node is added, if I understand you correctly, and after a short while 
this key will become invalid.
So, we are back at an improved version of the cluster key option.
[Tuong]: Yes, this is what I want to say; actually we already support this,
so there is almost no change needed.
This also partially solves the "forward secrecy" issue, since a completely
new/independent key is entered that a sniffer has no way to derive from the
previous ones. Let's say this is both authentication & forward secrecy,
achieved simply through the user (or the framework)...

Another approach is to add

Re: [tipc-discussion] [PATCH RFC] tipc: add automatic rekeying for TIPC encryption

2019-12-23 Thread Tuong Lien Tong
Hi Jon,

Please see my comments below.

BR/Tuong

-Original Message-
From: Jon Maloy  
Sent: Saturday, December 21, 2019 11:03 PM
To: Tuong Tong Lien ; tipc-...@dektech.com.au; 
tipc-discussion@lists.sourceforge.net
Cc: Xin Long ; Xin Long ; 'Ying Xue' 

Subject: RE: [PATCH RFC] tipc: add automatic rekeying for TIPC encryption

Adding tipc-discussion to the recipient list of this discussion.

///jon

> -Original Message-
> From: Jon Maloy
> Sent: 21-Dec-19 10:56
> To: Tuong Lien ; tipc-...@dektech.com.au
> Subject: RE: [PATCH RFC] tipc: add automatic rekeying for TIPC encryption
> 
> Hi Tuong,
> I am getting more and more ambivalent about this.
> Before introducing this we must weigh the pro and cons correctly:
> 
> Pros:
> - It will give less time for a sniffer to crack the key, since we change it 
> often.
> - If a sniffer cracks a key, he only gets access to messages from the node 
> owning that key, not the whole
> cluster.

[Tuong]: - implicit authentication (since the initial key for a node or the 
cluster is set by user...).

> 
> Cons:
> - Larger code footprint.
> - No forward security. If a sniffer gets hold of a node key, he can listen 
> forever, as you have already noted.
[Tuong]: Yes and this can be vital from a security perspective!

> - New nodes cannot be added to the cluster unless there is a way for the user 
> land framework to read the
> current key on each node and distribute to the new node.
> - What happens when a node restarts, and forgets about all current keys, both 
> its own and those of its
> peers.
> 
> The last two bullets are showstoppers,  in my view. We must find a way to 
> add/re-insert nodes as easily as
> we do now, at least.
[Tuong]: The last two bullets are actually the same, so the trouble lies here 
as to how a "new" VM joins the cluster.

> 
> This means we must come back to the solution with a "master"  cluster key. 
> I suspect a typical user will just aim for adding the key or keys to
> the cluster configuration file, and at a maximum update that key at regular 
> intervals.
> 
> So, maybe, apart from the 'current' node key we also keep a 'cluster'  master 
> key that can always be used
> for attaching new nodes, but allow for this to be updated via the tipc tool.

[Tuong]: This will require changes/updates to the current key handling
mechanism, since we support automatic key switching but at any time there is
only one active key for TX & RX, and it never "remembers" or holds any keys
for other purposes...

I have an idea that we can turn it into a simple "rule" here: when a new VM 
joins the cluster, a new key will have to be generated & set for the entire 
cluster incl. the new VM(s) by user (or the framework which had set the initial 
key in advance...). The automatic key switching will then take over to make the 
new key active and the new VM(s) can join the cluster as well, this is what we 
have already supported... Later on, the automatic rekeying will start to work 
for all the VMs in the cluster, as per this patch...
In this way, we can also solve the authentication issue which could happen in
the 2nd model (where a key agreement protocol like DH or ECDH has to be used
to establish a secure communication channel first...).
What do you think?

> The user may then, of he wants, replace/update this key at regular intervals.
> At the same time this key must be reserved for new/restarting nodes only, and 
> not be permitted for use by
> existing nodes which already are using 'current' keys.
> 
> This would at least make it somewhat more acceptable for the user than the 
> current approach.
> 
> Regards
> ///jon
> 
> 
> 
> 
> > -Original Message-
> > From: Jon Maloy
> > Sent: 18-Dec-19 19:33
> > To: Tuong Lien ; tipc-...@dektech.com.au
> > Subject: RE: [PATCH RFC] tipc: add automatic rekeying for TIPC encryption
> >
> > I have a couple comments below, but in general it looks good.
> > So, after some hesitation I suggest you send this to tipc-discussion for 
> > further review.
> >
> > ///jon
> >
> >
> > > -Original Message-
> > > From: Tuong Lien 
> > > Sent: 10-Dec-19 06:48
> > > To: Jon Maloy ; tipc-...@dektech.com.au
> > > Subject: [PATCH RFC] tipc: add automatic rekeying for TIPC encryption
> > >
> > > This commit adds functionality for TIPC encryption to automatically
> > > generate a new symmetric key and attach as new TX key for the own node.
> > > The new key will also be distributed to peer nodes and set as RX keys,
> > > i.e. the per-node key mode. This process of rekeying will be repeated,
> > > for example every ~ 10 minutes (from the time a certain key becomes
> > > active), so new session keys will be created and applied regularly to
> > > the cluster.
> > >
> > > Since the key exchange needs a secure communication channel in advance,
> > > the commit simplifies this by only starting with an initial key (either
> > > in cluster or per-node key mode) which is set by user first (e.g. via
> > > the "tipc node set 

Re: [tipc-discussion] [PATCH net/tipc] Replace rcu_swap_protected() with rcu_replace_pointer()

2019-12-10 Thread Tuong Lien Tong
Hi Ying, Paul,

Please see my comments inline. Thanks!

BR/Tuong

-Original Message-
From: Ying Xue  
Sent: Wednesday, December 11, 2019 8:32 AM
To: paul...@kernel.org
Cc: net...@vger.kernel.org; linux-ker...@vger.kernel.org; mi...@kernel.org;
tipc-discussion@lists.sourceforge.net; kernel-t...@fb.com;
torva...@linux-foundation.org; da...@davemloft.net
Subject: Re: [tipc-discussion] [PATCH net/tipc] Replace rcu_swap_protected()
with rcu_replace_pointer()

On 12/11/19 6:38 AM, Paul E. McKenney wrote:
> commit 4ee8e2c68b076867b7a5af82a38010fffcab611c
> Author: Paul E. McKenney 
> Date:   Mon Dec 9 19:13:45 2019 -0800
> 
> net/tipc: Replace rcu_swap_protected() with rcu_replace_pointer()
> 
> This commit replaces the use of rcu_swap_protected() with the more
> intuitively appealing rcu_replace_pointer() as a step towards removing
> rcu_swap_protected().
> 
> Link:
https://lore.kernel.org/lkml/CAHk-=wiAsJLw1egFEE=Z7-GGtM6wcvtyytXZA1+BHqta4g
g...@mail.gmail.com/
> Reported-by: Linus Torvalds 
> Reported-by: kbuild test robot 
> Signed-off-by: Paul E. McKenney 
> Cc: Jon Maloy 
> Cc: Ying Xue 
> Cc: "David S. Miller" 
> Cc: 
> Cc: 
> 

Acked-by: Ying Xue 

> diff --git a/net/tipc/crypto.c b/net/tipc/crypto.c
> index 990a872..978d2db 100644
> --- a/net/tipc/crypto.c
> +++ b/net/tipc/crypto.c
> @@ -257,9 +257,6 @@ static char *tipc_key_change_dump(struct tipc_key old,
struct tipc_key new,
>  #define tipc_aead_rcu_ptr(rcu_ptr, lock) \
>   rcu_dereference_protected((rcu_ptr), lockdep_is_held(lock))
>  
> -#define tipc_aead_rcu_swap(rcu_ptr, ptr, lock)
\
> - rcu_swap_protected((rcu_ptr), (ptr), lockdep_is_held(lock))
> -
>  #define tipc_aead_rcu_replace(rcu_ptr, ptr, lock)\
>  do { \
>   typeof(rcu_ptr) __tmp = rcu_dereference_protected((rcu_ptr),\
> @@ -1189,7 +1186,7 @@ static bool tipc_crypto_key_try_align(struct
tipc_crypto *rx, u8 new_pending)
>  
>   /* Move passive key if any */
>   if (key.passive) {
> - tipc_aead_rcu_swap(rx->aead[key.passive], tmp2, &rx->lock);
> + tmp2 = rcu_replace_pointer(rx->aead[key.passive], tmp2,
> &rx->lock);
The 3rd parameter should be the lockdep condition check instead of the
spinlock's pointer, i.e. "lockdep_is_held(&rx->lock)"?
That's why I'd prefer to use the 'tipc_aead_rcu_swap()' macro, which is
clear & concise, at least for the context here. It might be re-used later as
well...
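For readers unfamiliar with the new helper: rcu_replace_pointer(p, new, cond) stores the new pointer into the RCU-protected slot and hands the old value back, with the third argument being a lockdep assertion rather than a lock. Stripped of the RCU barriers and lockdep machinery, its semantics reduce to the user-space sketch below (a model only, not the kernel macro):

```c
#include <assert.h>

/* Plain-pointer model of rcu_replace_pointer() semantics: publish the
 * new pointer and return the previous one, whose cleanup the caller
 * now owns. The real macro additionally uses rcu_assign_pointer() for
 * memory ordering and verifies a lockdep condition for the update-side
 * lock, neither of which is modeled here. */
static void *replace_pointer_model(void **slot, void *new_ptr)
{
	void *old = *slot;

	*slot = new_ptr;
	return old;
}
```

This also shows why the discussed call reads 'tmp2 = rcu_replace_pointer(..., tmp2, ...)': the old slot contents and tmp2 are effectively swapped, which is what tipc_aead_rcu_swap() expressed directly.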

>   x = (key.passive - key.pending + new_pending) % KEY_MAX;
>   new_passive = (x <= 0) ? x + KEY_MAX : x;
>   }
> 




Re: [tipc-discussion] [net] tipc: fix memory leak in socket streaming

2019-11-25 Thread Tuong Lien Tong
Hi Tung,

OK, got it. I lost track when your patch was sent in August... ☹.
Anyway, I think you can consider my commit message, which highlights this as a
serious memory leak issue.

BR/Tuong

-Original Message-
From: tung quang nguyen  
Sent: Monday, November 25, 2019 3:23 PM
To: 'Tuong Lien' ; 
tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com; 
ma...@donjonn.com; ying@windriver.com
Subject: RE: [tipc-discussion] [net] tipc: fix memory leak in socket streaming

Hi Tuong,

I fixed it in this commit "[tipc-discussion] [net v1 2/3] tipc: fix wrong
socket reference counter after tipc_sk_timeout() returns" but I have not
sent to David yet.
I intended to send it in a series of patch for fixing bugs at socket layer.

Thanks.

Best regards,
Tung Nguyen

-Original Message-
From: Tuong Lien  
Sent: Monday, November 25, 2019 3:13 PM
To: tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com;
ma...@donjonn.com; ying@windriver.com
Subject: [tipc-discussion] [net] tipc: fix memory leak in socket streaming

In case of stream sockets, a per-socket timer is set up for either the
SYN attempt or connection supervision mechanism. When the socket timer
expires, an appropriate action (i.e. sending SYN or PROBE message)
would be taken just in the case the socket is not being owned by user
(e.g. via the 'lock_sock()').

In the latter case, nothing happens except that the timer is re-scheduled
for a short period of time (~ 50ms) to try again. The function just
'return's immediately without decreasing the socket 'refcnt' which was
held in advance for the timer callback itself! The same happens if, at
the next attempt, the socket is still busy...

As a result, the socket 'refcnt' is increased without being decreased, so
the sock object cannot be freed at all (due to 'refcnt' != 0) even when
the connection is closed and user releases all related resources.

The memory leak is hard to detect because the probe interval is set to
1 hour once the connection is established, but in the case of a SYN
attempt, the leak is much more likely to show up.

The commit fixes the bug by calling the 'sock_put()' in the case
mentioned above, then the socket 'refcnt' will be increased & decreased
correctly and the sock object can be released later.

Fixes: 0d5fcebf3c37 ("tipc: refactor tipc_sk_timeout() function")
Signed-off-by: Tuong Lien 
---
 net/tipc/socket.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index a1c8d722ca20..d67c3747d2c3 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -2757,7 +2757,7 @@ static void tipc_sk_timeout(struct timer_list *t)
if (sock_owned_by_user(sk)) {
sk_reset_timer(sk, &sk->sk_timer, jiffies + HZ / 20);
bh_unlock_sock(sk);
-   return;
+   goto exit;
}
 
if (sk->sk_state == TIPC_ESTABLISHED)
@@ -2775,6 +2775,8 @@ static void tipc_sk_timeout(struct timer_list *t)
tipc_dest_push(&tsk->cong_links, pnode, 0);
tsk->cong_link_cnt = 1;
}
+
+exit:
sock_put(sk);
 }
 
-- 
2.13.7
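The imbalance this patch fixes is an instance of a general pattern: a reference is taken when the timer is armed, and the callback must drop it on every exit path. A minimal user-space model (all names are illustrative, not the TIPC code):

```c
#include <assert.h>

/* Model of a timer callback that must balance a reference taken when
 * the timer was armed. Returning early on the "socket owned by user"
 * branch without the put leaks one reference per firing. */

struct sim_sk {
	int refcnt;
	int owned_by_user;
};

static void sk_hold(struct sim_sk *sk) { sk->refcnt++; }
static void sk_put(struct sim_sk *sk)  { sk->refcnt--; }

static void timer_cb_buggy(struct sim_sk *sk)
{
	if (sk->owned_by_user)
		return;		/* early return: the reference is leaked */
	sk_put(sk);
}

static void timer_cb_fixed(struct sim_sk *sk)
{
	if (sk->owned_by_user)
		goto exit;	/* the real code also re-arms the timer on this branch */
exit:
	sk_put(sk);		/* balanced on every exit path */
}
```

In the real kernel code the re-armed timer takes its own reference via sk_reset_timer(), so the sock_put() at the exit label always balances the reference held for the firing that just completed.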





Re: [tipc-discussion] [net-next v2] tipc: support in-order name publication events

2019-11-20 Thread Tuong Lien Tong
Hi David,

The fact is we still want to keep it with that explicit meaning, to make the
code easy to understand. Yes, 'time_after32()' or another macro can give the
same result, but it makes little sense in this particular scenario. Otherwise,
would you like something such as:

#define publication_after(...) time_after32(...)

BR/Tuong

-Original Message-
From: David Miller  
Sent: Thursday, November 21, 2019 1:14 PM
To: tuong.t.l...@dektech.com.au
Cc: jon.ma...@ericsson.com; ma...@donjonn.com; ying@windriver.com;
net...@vger.kernel.org; tipc-discussion@lists.sourceforge.net
Subject: Re: [net-next v2] tipc: support in-order name publication events

From: Tuong Lien 
Date: Thu, 21 Nov 2019 09:53:25 +0700

> +static inline int publication_after(struct publication *pa,
> + struct publication *pb)
> +{
> + return ((int)(pb->id - pa->id) < 0);
> +}

Just use time_after32() et al. instead of reinventing the same exact
code please.
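For context, the trick both the quoted publication_after() helper and time_after32() rely on is serial-number arithmetic: subtracting two unsigned 32-bit counters and casting to signed gives a wrap-safe ordering. A standalone restatement (hypothetical helper name, not the kernel's):

```c
#include <assert.h>
#include <stdint.h>

/* Wrap-safe "a came after b" for monotonically increasing u32 ids.
 * The unsigned subtraction wraps modulo 2^32; interpreting the result
 * as signed makes the comparison correct as long as the two ids are
 * less than 2^31 apart. */
static int id_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;	/* true if a is "after" b */
}
```

This is why publication ordering keeps working even after the 32-bit publication id counter wraps around zero.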





Re: [tipc-discussion] [iproute2] tipc: add new commands to set TIPC AEAD key

2019-11-19 Thread Tuong Lien Tong
Hi Jon/Ying,

We still have this patch (i.e. for the 'tipc node set/flush key...' commands), 
may I put your ACK on it before sending to iproute2-next?
Many thanks!

BR/Tuong

-Original Message-
From: Jon Maloy  
Sent: Wednesday, October 16, 2019 9:51 PM
To: Ying Xue ; Tuong Tong Lien 
; tipc-discussion@lists.sourceforge.net; 
ma...@donjonn.com
Subject: RE: [iproute2] tipc: add new commands to set TIPC AEAD key



> -Original Message-
> From: Ying Xue 
> Sent: 16-Oct-19 08:30
> To: Tuong Tong Lien ; tipc-
> discuss...@lists.sourceforge.net; Jon Maloy ;
> ma...@donjonn.com
> Subject: Re: [iproute2] tipc: add new commands to set TIPC AEAD key
> 
> It looks like we will use the "tipc node" command to configure a static key
> in the TIPC module, right?

The key is static in the sense that TIPC itself cannot change the key. But the 
protocol ensures that keys can be replaced without any traffic disturbances.

> 
> Do we plan to support dynamic key setting? If yes, what kinds of key exchange
> protocol would we use? For example, in IPSEC, it uses IKEv2 as its key
> exchange protocol.

At the moment we assume there is an external user land framework where node 
authentication is done and where keys are generated and distributed (via TLS) 
to the nodes.
When we want to replace a key (probably at fix pre-defined intervals), the 
framework has to generate new keys and distribute/inject those to TIPC.

> 
> Will key be expired after a specific lifetime? For instance, in
> IPSEC/Raccoon2 or strongswan, they use rekey feature to provide this
> function to make security association safer.

We are considering this, so that the external framework can be kept simpler or 
even be eliminated. That would be the next step, once this series is applied.

Regards
///jon


> 
> On 10/14/19 7:36 PM, Tuong Lien wrote:
> > Two new commands are added as part of 'tipc node' command:
> >
> >  $tipc node set key KEY [algname ALGNAME] [nodeid NODEID]  $tipc node
> > flush key
> >
> > which enable user to set and remove AEAD keys in kernel TIPC.
> >
> > For the 'set key' command, the given 'nodeid' parameter decides the
> > mode to be applied to the key, particularly:
> >
> > - If NODEID is empty, the key is a 'cluster' key which will be used
> > for all message encryption/decryption from/to the node (i.e. both TX & RX).
> > The same key needs to be set in the other nodes i.e. the 'cluster key'
> > mode.
> >
> > - If NODEID is own node, the key is used for message encryption (TX)
> > from the node. Whereas, if NODEID is a peer node, the key is for
> > message decryption (RX) from that peer node.
> > This is the 'per-node-key' mode that each nodes in the cluster has its
> > specific (TX) key.
> >
> > Signed-off-by: Tuong Lien 
> > ---
> >  include/uapi/linux/tipc.h |  21 ++
> >  include/uapi/linux/tipc_netlink.h |   4 ++
> >  tipc/misc.c   |  38 +++
> >  tipc/misc.h   |   1 +
> >  tipc/node.c   | 133
> +-
> >  5 files changed, 195 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/uapi/linux/tipc.h b/include/uapi/linux/tipc.h
> > index e16cb4e2..b118ce9b 100644
> > --- a/include/uapi/linux/tipc.h
> > +++ b/include/uapi/linux/tipc.h
> > @@ -232,6 +232,27 @@ struct tipc_sioc_nodeid_req {
> > char node_id[TIPC_NODEID_LEN];
> >  };
> >
> > +/*
> > + * TIPC Crypto, AEAD mode
> > + */
> > +#define TIPC_AEAD_MAX_ALG_NAME (32)
> > +#define TIPC_AEAD_MIN_KEYLEN   (16 + 4)
> > +#define TIPC_AEAD_MAX_KEYLEN   (32 + 4)
> > +
> > +struct tipc_aead_key {
> > +   char alg_name[TIPC_AEAD_MAX_ALG_NAME];
> > +   unsigned int keylen;/* in bytes */
> > +   char key[];
> > +};
> > +
> > +#define TIPC_AEAD_KEY_MAX_SIZE (sizeof(struct tipc_aead_key) + \
> > +   TIPC_AEAD_MAX_KEYLEN)
> > +
> > +static inline int tipc_aead_key_size(struct tipc_aead_key *key) {
> > +   return sizeof(*key) + key->keylen;
> > +}
> > +
> >  /* The macros and functions below are deprecated:
> >   */
> >
> > diff --git a/include/uapi/linux/tipc_netlink.h
> > b/include/uapi/linux/tipc_netlink.h
> > index efb958fd..6c2194ab 100644
> > --- a/include/uapi/linux/tipc_netlink.h
> > +++ b/include/uapi/linux/tipc_netlink.h
> > @@ -63,6 +63,8 @@ enum {
> > TIPC_NL_PEER_REMOVE,
> > TIPC_NL_BEARER_ADD,
> > TIPC_NL_UDP_GET_REMOTEIP,
> > +   TIPC_NL_KEY_SET,
> > +   TIPC_NL_KEY_FLUSH,
> >
> > __TIPC_NL_CMD_MAX,
> > TIPC_NL_CMD_MAX = __TIPC_NL_CMD_MAX - 1 @@ -160,6 +162,8
> @@ enum {
> > TIPC_NLA_NODE_UNSPEC,
> > TIPC_NLA_NODE_ADDR, /* u32 */
> > TIPC_NLA_NODE_UP,   /* flag */
> > +   TIPC_NLA_NODE_ID,   /* data */
> > +   TIPC_NLA_NODE_KEY,  /* data */
> >
> > __TIPC_NLA_NODE_MAX,
> > TIPC_NLA_NODE_MAX = __TIPC_NLA_NODE_MAX - 1 diff --git
> a/tipc/misc.c
> > b/tipc/misc.c index e4b1cd0c..1daf3072 100644
> > --- 

Re: [tipc-discussion] [net-next v3 1/1] tipc: introduce variable window congestion control

2019-11-18 Thread Tuong Lien Tong
Hi Jon,

@@ -1045,7 +1056,13 @@ static void tipc_link_advance_backlog(struct
tipc_link *l,
u16 bc_ack = l->bc_rcvlink->rcv_nxt - 1;
u32 imp;
 
-   while (skb_queue_len(&l->transmq) < l->window) {
+   qlen = skb_queue_len(txq);
+   if (qlen >= cwin && (l->snd_nxt - buf_seqno(skb_peek(txq)) == qlen))
{

[Tuong]: now it should be: 'qlen + backlogq_len >= cwin' instead since we
might have released some packets from the transmq just before calling this
function...?

+   add = l->cong_acks++ % 32 ? 0 : 1;
+   cwin = min_t(u16, cwin + add, l->max_win);
+   l->window = cwin;
+   }
+   while (skb_queue_len(txq) < cwin) {

BR/Tuong
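A plain-C sketch of the variable-window rules under discussion, window +1 for every 32nd non-duplicate ACK up to the max, halved on a non-duplicate NACK but never below the min. All names are illustrative, not the kernel's:

```c
#include <assert.h>

/* Minimal model of the link's variable congestion window. */
struct vwin {
	unsigned int window, min_win, max_win;
	unsigned int cong_acks;	/* non-duplicate ACKs seen so far */
};

/* Grow by 1 on every 32nd non-duplicate ACK, capped at max_win
 * (mirrors 'add = l->cong_acks++ % 32 ? 0 : 1' in the patch). */
static void vwin_on_ack(struct vwin *l)
{
	if (l->cong_acks++ % 32 == 0 && l->window < l->max_win)
		l->window++;
}

/* Halve on a non-duplicate NACK, floored at min_win. */
static void vwin_on_nack(struct vwin *l)
{
	unsigned int w = l->window / 2;

	l->window = w < l->min_win ? l->min_win : w;
}
```

The slow additive increase and halving decrease give the Reno-like "permanent congestion avoidance" behavior the commit message describes.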

-Original Message-
From: Jon Maloy  
Sent: Tuesday, November 19, 2019 6:33 AM
To: Jon Maloy ; Jon Maloy 
Cc: mohan.krishna.ghanta.krishnamur...@ericsson.com;
parthasarathy.bhuvara...@gmail.com; tung.q.ngu...@dektech.com.au;
hoang.h...@dektech.com.au; tuong.t.l...@dektech.com.au;
gordan.mihalje...@dektech.com.au; ying@windriver.com;
tipc-discussion@lists.sourceforge.net
Subject: [net-next v3 1/1] tipc: introduce variable window congestion
control

We introduce a simple variable window congestion control for links.
The algorithm is inspired by the Reno algorithm, and can best be
described as working in permanent "congestion avoidance" mode, within
strict limits.

- We introduce hard lower and upper window limits per link, still
  different and configurable per bearer type.

- Next, we let a link start at the minimum window, and then slowly
  increment it for each 32 received non-duplicate ACK. This goes on
  until it either reaches the upper limit, or until it receives a
  NACK message.

- For each non-duplicate NACK received, we let the window decrease by
  intervals of 1/2 of the current window, but not below the minimum
  window.

The change does in reality have effect only on unicast ethernet
transport, as we have seen that there is no room whatsoever for
increasing the window max size for the UDP bearer.
For now, we also choose to keep the limits for the broadcast link
unchanged and equal.

This algorithm seems to give a ~25% throughput improvement for large
messages, while it has no effect on throughput for small messages.

Suggested-by: Xin Long 
Acked-by: Xin Long 
Signed-off-by: Jon Maloy 

---
v2: - Moved window increment in tipc_advance_backlogq() to before
  the transfer loop, as suggested by Tuong.
- Introduced logic for incrementing the window even for the
  broadcast send link, also suggested by Tuong.
v3: - Rebased to latest net-next
---
 net/tipc/bcast.c | 11 
 net/tipc/bearer.c| 11 
 net/tipc/bearer.h|  6 +++--
 net/tipc/eth_media.c |  6 -
 net/tipc/ib_media.c  |  5 +++-
 net/tipc/link.c  | 76
++--
 net/tipc/link.h  |  9 ---
 net/tipc/node.c  | 13 +
 net/tipc/udp_media.c |  3 ++-
 9 files changed, 90 insertions(+), 50 deletions(-)

diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
index f41096a..84da317 100644
--- a/net/tipc/bcast.c
+++ b/net/tipc/bcast.c
@@ -562,18 +562,18 @@ int tipc_bclink_reset_stats(struct net *net)
return 0;
 }
 
-static int tipc_bc_link_set_queue_limits(struct net *net, u32 limit)
+static int tipc_bc_link_set_queue_limits(struct net *net, u32 max_win)
 {
struct tipc_link *l = tipc_bc_sndlink(net);
 
if (!l)
return -ENOPROTOOPT;
-   if (limit < BCLINK_WIN_MIN)
-   limit = BCLINK_WIN_MIN;
-   if (limit > TIPC_MAX_LINK_WIN)
+   if (max_win < BCLINK_WIN_MIN)
+   max_win = BCLINK_WIN_MIN;
+   if (max_win > TIPC_MAX_LINK_WIN)
return -EINVAL;
tipc_bcast_lock(net);
-   tipc_link_set_queue_limits(l, limit);
+   tipc_link_set_queue_limits(l, BCLINK_WIN_MIN, max_win);
tipc_bcast_unlock(net);
return 0;
 }
@@ -683,6 +683,7 @@ int tipc_bcast_init(struct net *net)
if (!tipc_link_bc_create(net, 0, 0,
 FB_MTU,
 BCLINK_WIN_DEFAULT,
+BCLINK_WIN_DEFAULT,
 0,
 >inputq,
 NULL,
diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c
index d7ec26b..34ca7b7 100644
--- a/net/tipc/bearer.c
+++ b/net/tipc/bearer.c
@@ -311,7 +311,8 @@ static int tipc_enable_bearer(struct net *net, const
char *name,
 
b->identity = bearer_id;
b->tolerance = m->tolerance;
-   b->window = m->window;
+   b->min_win = m->min_win;
+   b->max_win = m->max_win;
b->domain = disc_domain;
b->net_plane = bearer_id + 'A';
b->priority = prio;
@@ -796,7 +797,7 @@ static int __tipc_nl_add_bearer(struct tipc_nl_msg *msg,
goto prop_msg_full;
if (nla_put_u32(msg->skb, TIPC_NLA_PROP_TOL, bearer->tolerance))
goto 

Re: [tipc-discussion] [bug report] tipc: introduce TIPC encryption & authentication

2019-11-13 Thread Tuong Lien Tong
Thanks Dan,
This is, however, expected: the function will return early at line 1727,
i.e. the '-ENOKEY' case, so 'aead' is never dereferenced inside
tipc_crypto_rcv_complete()!

BR/Tuong

-Original Message-
From: Dan Carpenter  
Sent: Thursday, November 14, 2019 1:33 AM
To: tuong.t.l...@dektech.com.au
Cc: tipc-discussion@lists.sourceforge.net
Subject: [bug report] tipc: introduce TIPC encryption & authentication

Hello Tuong Lien,

This is a semi-automatic email about new static checker warnings.

The patch fc1b6d6de220: "tipc: introduce TIPC encryption & 
authentication" from Nov 8, 2019, leads to the following Smatch 
complaint:

net/tipc/crypto.c:1734 tipc_crypto_rcv()
 error: we previously assumed 'aead' could be null (see line 1697)

net/tipc/crypto.c
  1696  aead = tipc_crypto_key_pick_tx(tx, rx, *skb);
  1697  if (aead)
  1698  goto decrypt;
  1699  goto exit;
^^
"aead" is NULL here.

  1700  
  1701  decrypt:
  1702  rcu_read_lock();
  1703  if (!aead)
  1704  aead = tipc_aead_get(rx->aead[tx_key]);
  1705  rc = tipc_aead_decrypt(net, aead, *skb, b);
  1706  rcu_read_unlock();
  1707  
  1708  exit:
  1709  stats = ((rx) ?: tx)->stats;
  1710  switch (rc) {
  1711  case 0:
  1712  this_cpu_inc(stats->stat[STAT_OK]);
  1713  break;
  1714  case -EINPROGRESS:
  1715  case -EBUSY:
  1716  this_cpu_inc(stats->stat[STAT_ASYNC]);
  1717  *skb = NULL;
  1718  return rc;
  1719  default:
  1720  this_cpu_inc(stats->stat[STAT_NOK]);
  1721  if (rc == -ENOKEY) {
  1722  kfree_skb(*skb);
  1723  *skb = NULL;
  1724  if (rx)
  1725  tipc_node_put(rx->node);
  1726  this_cpu_inc(stats->stat[STAT_NOKEYS]);
  1727  return rc;
  1728  } else if (rc == -EBADMSG) {
  1729  this_cpu_inc(stats->stat[STAT_BADMSGS]);
  1730  }
  1731  break;
  1732  }
  1733  
  1734  tipc_crypto_rcv_complete(net, aead, b, skb, rc);
  
Dereferenced inside function.

  1735  return rc;
  1736  }

regards,
dan carpenter



___
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion


Re: [tipc-discussion] [net-next 1/3] tipc: introduce variable window congestion control

2019-11-08 Thread Tuong Lien Tong
Hi Jon,

I just had a look into the patch, a few concerns:
1) Do we apply the algorithm to the broadcast sender link as well? If so, I
guess we will have to decrease the link window at the bc_retrans()...?
2) Why don't we try to increase the link window just prior to the backlog
advancing (its criteria might need to change a bit...)? Then, we can just go
ahead at the link_xmit() (i.e. no worry about a gap between the two
queues...), so even xmit more packets somehow (i.e. no need to put into the
backlogq and wait for next chances...)?

BR/Tuong

-Original Message-
From: Jon Maloy  
Sent: Tuesday, November 5, 2019 1:39 AM
To: Jon Maloy ; Jon Maloy 
Cc: mohan.krishna.ghanta.krishnamur...@ericsson.com;
parthasarathy.bhuvara...@gmail.com; tung.q.ngu...@dektech.com.au;
hoang.h...@dektech.com.au; tuong.t.l...@dektech.com.au;
gordan.mihalje...@dektech.com.au; ying@windriver.com;
tipc-discussion@lists.sourceforge.net
Subject: [net-next 1/3] tipc: introduce variable window congestion control

We introduce a simple variable window congestion control for links.
The algorithm is inspired by the Reno algorithm, and can best be
described as working in permanent "congestion avoidance" mode, within
strict limits.

- We introduce hard lower and upper window limits per link, still
  different and configurable per bearer type.

- Next, we let a link start at the minimum window, and then slowly
  increment it for each 32 received non-duplicate ACK. This goes on
  until it either reaches the upper limit, or until it receives a
  NACK message.

- For each non-duplicate NACK received, we let the window decrease by
  intervals of 1/2 of the current window, but not below the minimum
  window.

The change does in reality have effect only on unicast ethernet
transport, as we have seen that there is no room whatsoever for
increasing the window size for the UDP bearer.

This algorithm seems to give a ~25% throughput improvement for large
messages, while it has no effect on small message throughput.

Suggested-by: Xin Long 
Signed-off-by: Jon Maloy 
---
 net/tipc/bcast.c | 11 +
 net/tipc/bearer.c| 11 +
 net/tipc/bearer.h|  6 +++--
 net/tipc/eth_media.c |  6 -
 net/tipc/ib_media.c  |  5 +++-
 net/tipc/link.c  | 70 ++--
 net/tipc/link.h  |  9 ---
 net/tipc/node.c  | 13 ++
 net/tipc/udp_media.c |  3 ++-
 9 files changed, 86 insertions(+), 48 deletions(-)

diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
index 6ef1abd..12fde9a 100644
--- a/net/tipc/bcast.c
+++ b/net/tipc/bcast.c
@@ -562,18 +562,18 @@ int tipc_bclink_reset_stats(struct net *net)
return 0;
 }
 
-static int tipc_bc_link_set_queue_limits(struct net *net, u32 limit)
+static int tipc_bc_link_set_queue_limits(struct net *net, u32 max_win)
 {
struct tipc_link *l = tipc_bc_sndlink(net);
 
if (!l)
return -ENOPROTOOPT;
-   if (limit < BCLINK_WIN_MIN)
-   limit = BCLINK_WIN_MIN;
-   if (limit > TIPC_MAX_LINK_WIN)
+   if (max_win < BCLINK_WIN_MIN)
+   max_win = BCLINK_WIN_MIN;
+   if (max_win > TIPC_MAX_LINK_WIN)
return -EINVAL;
tipc_bcast_lock(net);
-   tipc_link_set_queue_limits(l, limit);
+   tipc_link_set_queue_limits(l, BCLINK_WIN_MIN, max_win);
tipc_bcast_unlock(net);
return 0;
 }
@@ -683,6 +683,7 @@ int tipc_bcast_init(struct net *net)
if (!tipc_link_bc_create(net, 0, 0,
 FB_MTU,
 BCLINK_WIN_DEFAULT,
+BCLINK_WIN_DEFAULT,
 0,
 >inputq,
 NULL,
diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c
index 0214aa1..f994961 100644
--- a/net/tipc/bearer.c
+++ b/net/tipc/bearer.c
@@ -310,7 +310,8 @@ static int tipc_enable_bearer(struct net *net, const
char *name,
 
b->identity = bearer_id;
b->tolerance = m->tolerance;
-   b->window = m->window;
+   b->min_win = m->min_win;
+   b->max_win = m->max_win;
b->domain = disc_domain;
b->net_plane = bearer_id + 'A';
b->priority = prio;
@@ -765,7 +766,7 @@ static int __tipc_nl_add_bearer(struct tipc_nl_msg *msg,
goto prop_msg_full;
if (nla_put_u32(msg->skb, TIPC_NLA_PROP_TOL, bearer->tolerance))
goto prop_msg_full;
-   if (nla_put_u32(msg->skb, TIPC_NLA_PROP_WIN, bearer->window))
+   if (nla_put_u32(msg->skb, TIPC_NLA_PROP_WIN, bearer->max_win))
goto prop_msg_full;
if (bearer->media->type_id == TIPC_MEDIA_TYPE_UDP)
if (nla_put_u32(msg->skb, TIPC_NLA_PROP_MTU, bearer->mtu))
@@ -1057,7 +1058,7 @@ int __tipc_nl_bearer_set(struct sk_buff *skb, struct
genl_info *info)
if (props[TIPC_NLA_PROP_PRIO])
b->priority =

Re: [tipc-discussion] [net-next 0/5] TIPC encryption

2019-11-07 Thread Tuong Lien Tong
You are right, David.
I am going to resend the v2 series with an update for it.

Thanks/Tuong

-Original Message-
From: David Miller  
Sent: Friday, November 8, 2019 11:07 AM
To: tuong.t.l...@dektech.com.au
Cc: jon.ma...@ericsson.com; ma...@donjonn.com; ying@windriver.com;
net...@vger.kernel.org; tipc-discussion@lists.sourceforge.net
Subject: Re: [net-next 0/5] TIPC encryption

From: Tuong Lien 
Date: Fri,  8 Nov 2019 08:42:08 +0700

> This series provides TIPC encryption feature, kernel part. There will be
> another one in the 'iproute2/tipc' for user space to set key.

If gcm(aes) is the only algorithm you accept, you will need to express
this dependency in the Kconfig file.  Otherwise it is pointless to
turn on the TIPC crypto Kconfig option.





Re: [tipc-discussion] [PATCH v2 4/5] tipc: introduce TIPC encryption & authentication

2019-11-06 Thread Tuong Lien Tong
Hi Ying,

Ok, I will remove that "ifdef ... else"; I don't like it either, I was just 
worried about a compiler warning, but I have tested it and it is ok.
I think we can pass the node pointer to "__dnode" anyway (no need for a 
NULL), so we can also remove the other "ifdef ... else" at every function call 
(total = 8), do you agree?

BR/Tuong

-Original Message-
From: Ying Xue  
Sent: Wednesday, November 6, 2019 3:42 PM
To: Tuong Lien ; 
tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com; ma...@donjonn.com
Subject: Re: [PATCH v2 4/5] tipc: introduce TIPC encryption & authentication

On 11/5/19 6:50 PM, Tuong Lien wrote:
>  void tipc_bearer_xmit(struct net *net, u32 bearer_id,
> struct sk_buff_head *xmitq,
> -   struct tipc_media_addr *dst)
> +#ifdef CONFIG_TIPC_CRYPTO
> +   struct tipc_media_addr *dst,
> +   struct tipc_node *__dnode
> +#else
> +   struct tipc_media_addr *dst
> +#endif
> +   )

To make code simpler, I think we can directly add one new parameter (ie,
__dnode) no matter whether CONFIG_TIPC_CRYPTO is enabled or not.

When CONFIG_TIPC_CRYPTO is disabled, we just pass NULL to "__dnode"
parameter. As a consequence, we don't need to use CONFIG_TIPC_CRYPTO
config to separate the two different situations of whether
CONFIG_TIPC_CRYPTO is enabled or not whenever tipc_bearer_xmit() is called.
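The simplification being suggested can be sketched like this (hypothetical toy code, not the actual net/tipc/bearer.c; the struct body and return values are invented for illustration): keep one unconditional parameter and pass NULL when crypto is disabled, instead of changing the signature under #ifdef.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for the real node structure, for illustration only. */
struct tipc_node {
	int addr;
};

/* One signature for both configurations: the crypto path would use
 * __dnode, while the plain path simply passes and ignores NULL. */
static int toy_bearer_xmit(const char *dst, struct tipc_node *__dnode)
{
	(void)dst;
	return __dnode ? 1 : 0;	/* 1 = "crypto path", 0 = "plain path" */
}
```

With this shape, callers compiled without CONFIG_TIPC_CRYPTO just write `toy_bearer_xmit(dst, NULL)` and no per-call-site #ifdef is needed.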





Re: [tipc-discussion] [PATCH RFC 0/5] TIPC encryption

2019-11-04 Thread Tuong Lien Tong
Hi Ying,

Thanks a lot for reviewing the series!
Your idea of a new kernel option is fine, but I'm not sure what its goal is. 
The new code is already "disabled" by default unless the user sets a key, so 
it is generally still under the user's control... The advantage I can see is 
the module's size, but that is not much (compared to the whole kernel). On the 
other hand, we would need to customize the kernel to turn the feature on, plus 
some additional code for the "ifdef...else..." instructions. Do we really need 
the option?

@Jon: What is your opinion about this?

BR/Tuong

-Original Message-
From: Xue, Ying  
Sent: Friday, November 1, 2019 9:20 PM
To: Tuong Lien ; 
tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com; ma...@donjonn.com
Subject: RE: [PATCH RFC 0/5] TIPC encryption

Good job. 

This is a big and complex feature. Particularly for most of users who might not 
consider to use this feature, please consider to give them a choice to 
completely disable it by adding a new kernel option like TIPC_CRYPTO.

Thanks,
Ying

-Original Message-
From: Tuong Lien [mailto:tuong.t.l...@dektech.com.au] 
Sent: Monday, October 14, 2019 7:07 PM
To: tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com; 
ma...@donjonn.com; Xue, Ying
Subject: [PATCH RFC 0/5] TIPC encryption

This series provides TIPC encryption feature, kernel part. There will be
another one in the 'iproute2/tipc' for user space to set key.

Tuong Lien (5):
  tipc: add reference counter to bearer
  tipc: enable creating a "preliminary" node
  tipc: add new AEAD key structure for user API
  tipc: introduce TIPC encryption & authentication
  tipc: add support for AEAD key setting via netlink

 include/uapi/linux/tipc.h |   21 +
 include/uapi/linux/tipc_netlink.h |4 +
 net/tipc/Makefile |2 +-
 net/tipc/bcast.c  |2 +-
 net/tipc/bearer.c |   52 +-
 net/tipc/bearer.h |6 +-
 net/tipc/core.c   |   10 +
 net/tipc/core.h   |4 +
 net/tipc/crypto.c | 1986 +
 net/tipc/crypto.h |  166 
 net/tipc/link.c   |   16 +-
 net/tipc/link.h   |1 +
 net/tipc/msg.c|   24 +-
 net/tipc/msg.h|   44 +-
 net/tipc/netlink.c|   16 +-
 net/tipc/node.c   |  314 +-
 net/tipc/node.h   |   10 +
 net/tipc/sysctl.c |9 +
 net/tipc/udp_media.c  |1 +
 19 files changed, 2604 insertions(+), 84 deletions(-)
 create mode 100644 net/tipc/crypto.c
 create mode 100644 net/tipc/crypto.h

-- 
2.13.7






Re: [tipc-discussion] [net-next v3 1/1] tipc: add smart nagle feature

2019-10-30 Thread Tuong Lien Tong
Hi Jon,

Please see below one comment from my side.

BR/Tuong

@@ -1437,16 +1492,17 @@ static int __tipc_sendstream(struct socket *sock,
struct msghdr *m, size_t dlen)
struct sock *sk = sock->sk;
DECLARE_SOCKADDR(struct sockaddr_tipc *, dest, m->msg_name);
long timeout = sock_sndtimeo(sk, m->msg_flags & MSG_DONTWAIT);
+   struct sk_buff_head *txq = >sk_write_queue;
struct tipc_sock *tsk = tipc_sk(sk);
struct tipc_msg *hdr = >phdr;
struct net *net = sock_net(sk);
-   struct sk_buff_head pkts;
u32 dnode = tsk_peer_node(tsk);
+   int blocks = tsk->snd_backlog;
+   int maxnagle = tsk->maxnagle;
+   int maxpkt = tsk->max_pkt;
int send, sent = 0;
int rc = 0;
 
-   __skb_queue_head_init();
-
if (unlikely(dlen > INT_MAX))
return -EMSGSIZE;
 
@@ -1467,21 +1523,35 @@ static int __tipc_sendstream(struct socket *sock,
struct msghdr *m, size_t dlen)
 tipc_sk_connected(sk)));
if (unlikely(rc))
break;
-
send = min_t(size_t, dlen - sent, TIPC_MAX_USER_MSG_SIZE);
-   rc = tipc_msg_build(hdr, m, sent, send, tsk->max_pkt,
);
-   if (unlikely(rc != send))
-   break;
 
-   trace_tipc_sk_sendstream(sk, skb_peek(),

[Tuong]: Should we set the 'blocks' here instead, i.e. blocks =
tsk->snd_backlog, as it can change if we have to release the sock & sleep
beforehand (e.g. tipc_wait_for_cond), and the 'while' statement can be
re-run?

+   if (tsk->oneway++ >= 4 && send <= maxnagle) {
+   rc = tipc_msg_append(hdr, m, send, maxnagle, txq);
+   if (rc < 0)
+   break;
+   blocks += rc;
+   if (blocks <= 64 && tsk->expect_ack) {
+   tsk->snd_backlog = blocks;
+   sent += send;
+   break;
+   }
+   tsk->expect_ack = true;
+   } else {
+   rc = tipc_msg_build(hdr, m, sent, send, maxpkt,
txq);
+   if (unlikely(rc != send))
+   break;
+   blocks += tsk_inc(tsk, send + MIN_H_SIZE);
+   }
+   trace_tipc_sk_sendstream(sk, skb_peek(txq),
 TIPC_DUMP_SK_SNDQ, " ");
-   rc = tipc_node_xmit(net, , dnode, tsk->portid);
+   rc = tipc_node_xmit(net, txq, dnode, tsk->portid);
if (unlikely(rc == -ELINKCONG)) {
tsk->cong_link_cnt = 1;
rc = 0;
}
if (likely(!rc)) {
-   tsk->snt_unacked += tsk_inc(tsk, send + MIN_H_SIZE);
+   tsk->snt_unacked += blocks;
+   tsk->snd_backlog = 0;
sent += send;
}
} while (sent < dlen && !rc);
@@ -1528,6 +1598,7 @@ static void tipc_sk_finish_conn(struct tipc_sock *tsk,
u32 peer_port,
tipc_node_add_conn(net, peer_node, tsk->portid, peer_port);
tsk->max_pkt = tipc_node_get_mtu(net, peer_node, tsk->portid);






Re: [tipc-discussion] [PATCH RFC 2/2] tipc: improve message bundling algorithm

2019-10-14 Thread Tuong Lien Tong
Hi Ying,

Agree, it's hard to trace...
I've changed the way we approach, will post it as a new patch, please take a
look from there!
Thanks a lot!

BR/Tuong

-Original Message-
From: Xue, Ying  
Sent: Friday, October 11, 2019 9:52 PM
To: Tuong Lien ;
tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com;
ma...@donjonn.com
Subject: RE: [PATCH RFC 2/2] tipc: improve message bundling algorithm

I can recognize this is a good improvement, except that the following switch
cases on the return values of tipc_msg_try_bundle() are not very friendly to
a code reader. Although I do understand their real meanings, I had to spend
time checking the context back and forth. At the least we should remove the
meaningless hard-coded case numbers, or we should change the return value
numbers of tipc_msg_try_bundle().

+   n = tipc_msg_try_bundle(>backlog[imp].target_bskb, skb,
+   mtu - INT_H_SIZE,
+   l->addr);
+   switch (n) {
+   case 0:
+   break;
+   case 1:
+   __skb_queue_tail(backlogq, skb);
l->backlog[imp].len++;
-   l->stats.sent_bundled++;
+   continue;
+   case 2:
l->stats.sent_bundles++;
+   l->stats.sent_bundled++;
+   default:
+   kfree_skb(skb);
+   l->stats.sent_bundled++;
continue;
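One way to address the readability concern (hypothetical names, just a sketch, not part of the actual patch) is a named enum for the return values, so each switch case documents itself:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical return codes for tipc_msg_try_bundle(); the numeric
 * values mirror the bare 0/1/2 cases in the switch above, with the
 * default branch covering "appended to an existing bundle". */
enum bundle_result {
	BUNDLE_SKIPPED = 0,	/* not bundled, transmit the skb as-is */
	BUNDLE_QUEUED  = 1,	/* skb queued whole onto the backlog */
	BUNDLE_CREATED = 2,	/* a new bundle was created from the skb */
};

static const char *bundle_result_name(enum bundle_result r)
{
	switch (r) {
	case BUNDLE_SKIPPED:	return "skipped";
	case BUNDLE_QUEUED:	return "queued";
	case BUNDLE_CREATED:	return "created";
	}
	return "appended";	/* any other value: merged into a bundle */
}
```

The switch in the patch could then read `case BUNDLE_QUEUED:` instead of `case 1:`, which removes the need to check the callee's context.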








Re: [tipc-discussion] [PATCH RFC 1/2] tipc: fix unlimited bundling of small messages

2019-09-09 Thread Tuong Lien Tong
Hi Jon,

Agree, the patch doesn't show an improvement in throughput. However, I
believe this is because our test tool, e.g. the benchmark, just runs tests
with the same message size in each case. In practice, when applications send
differently sized messages at the same time, the new bundling strategy will
take effect.

By the way, since we now limit exactly the number of packets in the backlog
at each level, this obviously reduces the throughput of small messages (as
bundles), but I wonder why we need to set the limits so small? When trying a
larger value, I have observed a significant improvement in throughput, for
large messages as well. Our current approach at the link layer doesn't seem
very good, as the limit is fixed without considering the actual number of
users or connections...

BR/Tuong

-Original Message-
From: Jon Maloy  
Sent: Saturday, August 31, 2019 3:57 AM
To: Tuong Tong Lien ;
tipc-discussion@lists.sourceforge.net; ma...@donjonn.com;
ying@windriver.com
Subject: RE: [PATCH RFC 1/2] tipc: fix unlimited bundling of small messages

Hi Tuong,
Both patches are good with me. Unfortunately I could not register any
measurable performance improvement, but I still think this is the right
thing to do.

Acked-by: Jon



> -Original Message-
> From: Tuong Lien 
> Sent: 29-Aug-19 07:36
> To: tipc-discussion@lists.sourceforge.net; Jon Maloy
> ; ma...@donjonn.com; ying@windriver.com
> Subject: [PATCH RFC 1/2] tipc: fix unlimited bundling of small messages
> 
> We have identified a problem with the "oversubscription" policy in the
link
> transmission code.
> 
> When small messages are transmitted, and the sending link has reached the
> transmit window limit, those messages will be bundled and put into the
link
> backlog queue. However, bundles of data messages are counted at the
> 'CRITICAL' level, so that the counter for that level, instead of the
counter for
> the real, bundled message's level is the one being increased.
> Subsequent, to-be-bundled data messages at non-CRITICAL levels continue to
> be tested against the unchanged counter for their own level, while
> contributing to an unrestrained increase at the CRITICAL backlog level.
> 
> This leaves a gap in congestion control algorithm for small messages that
can
> result in starvation for other users or a "real" CRITICAL user. Even that
> eventually can lead to buffer exhaustion & link reset.
> 
> We fix this by keeping a 'target_bskb' buffer pointer at each levels, then
when
> bundling, we only bundle messages at the same importance level only. This
> way, we know exactly how many slots a certain level have occupied in the
> queue, so can manage level congestion accurately.
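The per-level bundling idea can be modeled in isolation (an illustrative toy, not the kernel's skb/backlog code; the capacity and field names are invented): each importance level keeps its own open bundle, so a message is only merged with messages of the same level.

```c
#include <assert.h>

enum { LEVELS = 4, BUNDLE_CAP = 1400 };	/* illustrative bundle capacity */

struct level_backlog {
	int bundle_used;	/* bytes used in this level's open bundle, 0 = none */
	int len;		/* queue slots this level occupies */
};

/* Returns 1 if the message was merged into this level's open bundle,
 * 0 if it opened a new bundle (occupying a new backlog slot). */
static int enqueue(struct level_backlog *bl, int level, int msg_size)
{
	struct level_backlog *b = &bl[level];

	if (b->bundle_used && b->bundle_used + msg_size <= BUNDLE_CAP) {
		b->bundle_used += msg_size;
		return 1;		/* bundled, no new slot consumed */
	}
	b->bundle_used = msg_size;	/* start a new bundle for this level */
	b->len++;
	return 0;
}
```

Because each level's `len` counts only its own slots, the per-level congestion test stays accurate even when small messages of one level interleave with large messages of another.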
> 
> By bundling messages at the same level, we even have more benefits. Let
> consider this:
> - One socket sends 64-byte messages at the 'CRITICAL' level;
> - Another sends 4096-byte messages at the 'LOW' level;
> 
> When a 64-byte message comes and is bundled the first time, we put the
> overhead of message bundle to it (+ 40-byte header, data copy, etc.) for
later
> use, but the next message can be a 4096-byte one that cannot be bundled to
> the previous one. This means the last bundle carries only one payload
message
> which is totally inefficient, as for the receiver also! Later on, another
64-byte
> message comes, now we make a new bundle and the same story repeats...
> 
> With the new bundling algorithm, this will not happen, the 64-byte
messages
> will be bundled together even when the 4096-byte message(s) comes in
> between. However, if the 4096-byte messages are sent at the same level
i.e.
> 'CRITICAL', the bundling algorithm will again cause the same overhead.
> 
> Also, the same will happen even with only one socket sending small
messages
> at a rate close to the link transmit's one, so that, when one message is
> bundled, it's transmitted shortly. Then, another message comes, a new
bundle
> is created and so on...
> 
> We will solve this issue radically by the 2nd patch of this series.
> 
> Fixes: 365ad353c256 ("tipc: reduce risk of user starvation during link
> congestion")
> Reported-by: Hoang Le 
> Acked-by: Jon Maloy 
> Signed-off-by: Tuong Lien 
> ---
>  net/tipc/link.c | 29 ++---  net/tipc/msg.c  |  5
+
>  2 files changed, 19 insertions(+), 15 deletions(-)
> 
> diff --git a/net/tipc/link.c b/net/tipc/link.c index
6cc75ffd9e2c..999eab592de8
> 100644
> --- a/net/tipc/link.c
> +++ b/net/tipc/link.c
> @@ -160,6 +160,7 @@ struct tipc_link {
>   struct {
>   u16 len;
>   u16 limit;
> + struct sk_buff *target_bskb;
>   } backlog[5];
>   u16 snd_nxt;
>   u16 window;
> @@ -880,6 +881,7 @@ static void link_prepare_wakeup(struct tipc_link *l)
> void tipc_link_reset(struct tipc_link *l)  {
>   struct sk_buff_head list;
> + u32 imp;
> 
>   __skb_queue_head_init();
> 
> @@ -901,11 +903,10 @@ void tipc_link_reset(struct 

Re: [tipc-discussion] [net-next 1/1] tipc: reduce risk of wakeup queue starvation

2019-08-26 Thread Tuong Lien Tong
Hi Jon,

Yes, you are right, my previous patch was not complete (sorry, I have not
verified it but just wanted to give a general idea...). Actually, we could
still preserve the necessary data/header of the original message for
building the wakeup message later as needed (look, just the message 'dport'
is enough). However, I don't really like this approach because there is
still an issue there (see below).

Your patch can fix the bug I mentioned earlier (i.e. unlimited bundles of
small messages...), but it looks like it has a side-effect. We may again
encounter the starvation issue that the previous patches tried to overcome,
that is, a socket user sending messages at a certain importance level can
starve the others, in this case at the 'CRITICAL' level. With the additional
condition in link_xmit(), we will limit users at the other levels (i.e.
report link congestion & cause them to wait...) just because of congestion at
the 'CRITICAL' level (i.e. not their own levels) of the backlog queue. Even a
"true" CRITICAL user that wants to send a message will face the issue,
because the bundles of small messages at lower levels occupy all the
'CRITICAL' slots...

Really, I don't understand why we set the importance level of a bundle to
'CRITICAL', which even gives more slots to the "less important" users with
small messages... Isn't it that, by dividing & increasing the backlog level
limits, we want to give higher-level users more chances to send messages? I
think we should instead improve the bundling algorithm a little bit to
reflect the backlog level usage accurately. I will send you another
patch...
 
BR/Tuong

-Original Message-
From: Jon Maloy  
Sent: Saturday, August 24, 2019 9:29 PM
To: Jon Maloy 
Cc: Mohan Krishna Ghanta Krishnamurthy
;
parthasarathy.bhuvara...@gmail.com; Tung Quang Nguyen
; Hoang Huu Le ;
Tuong Tong Lien ; Gordan Mihaljevic
; ying@windriver.com;
tipc-discussion@lists.sourceforge.net
Subject: RE: [net-next 1/1] tipc: reduce risk of wakeup queue starvation

Hi Tuong,
While experimenting with byte-oriented flow control I realized that this is
a very real problem that has to be fixed.
I first tried your suggestion of putting the congestion test at the end of
tipc_link_xmit(), but realized that we need access to the original message
header when we are scheduling a user to the wakeup queue. But this header is
already gone if the original message was bundled and deleted!
Also, there is no more space in the CB area for storing the per-level
counter in the bundle packets, as was my first suggestion.

So, this was the simplest solution I could come up with. It seems to work
well, but gives a slight performance degradation. I am afraid we will have to
accept that for now.

Please give feedback.

///jon



> -Original Message-
> From: Jon Maloy 
> Sent: 24-Aug-19 10:19
> To: Jon Maloy ; Jon Maloy
> 
> Cc: Mohan Krishna Ghanta Krishnamurthy
> ;
> parthasarathy.bhuvara...@gmail.com; Tung Quang Nguyen
> ; Hoang Huu Le
> ; Tuong Tong Lien
> ; Gordan Mihaljevic
> ; ying@windriver.com; tipc-
> discuss...@lists.sourceforge.net
> Subject: [net-next 1/1] tipc: reduce risk of wakeup queue starvation
> 
> We have identified a problem with the "oversubscription" policy in the
link
> transmission code.
> 
> When small messages are transmitted, and the sending link has reached the
> transmit window limit, those messages will be bundled and put into the
link
> backlog queue. However, bundles of data messages are counted at the
> 'CRITICAL' level, so that the counter for that level, instead of the
counter for
> the real, bundled message's level is the one being increased.
> Subsequent, to-be-bundled data messages at non-CRITICAL levels continue to
> be tested against the unchanged counter for their own level, while
> contributing to an unrestrained increase at the CRITICAL backlog level.
> 
> This leaves a gap in congestion control algorithm for small messages, and
may
> eventually lead to buffer exhaustion and link reset.
> 
> We fix this by adding a test for congestion at the CRITICAL level, as well
as the
> existing testing for the message's own level, whenever a message is
> transmitted. We also refuse to notify any waiting users as long as
congestion at
> the CRITICAL level exists.
> 
> Reported-by: Tuong Lien 
> Signed-off-by: Jon Maloy 
> ---
>  net/tipc/link.c | 18 +-
>  1 file changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/net/tipc/link.c b/net/tipc/link.c index 6cc75ff..25a6acb
100644
> --- a/net/tipc/link.c
> +++ b/net/tipc/link.c
> @@ -77,6 +77,11 @@ struct tipc_stats {
>   u32 msg_length_profile[7]; /* used for msg. length profiling */  };
> 
> +struct tipc_backlog {
> + u16 len;
> + u16 limit;
> +};
> +
>  /**
>   * struct tipc_link - TIPC link data structure
>   * @addr: network address of link's peer node @@ -157,10 +162,7 @@
> struct tipc_link {
>   /* Sending */
>   struct sk_buff_head transmq;
>   

Re: [tipc-discussion] [net] tipc: fix false detection of retransmit failures

2019-07-31 Thread Tuong Lien Tong
Hi Ying,

See below my answers inline.

BR/Tuong

-Original Message-
From: Ying Xue  
Sent: Wednesday, July 31, 2019 8:21 PM
To: Tuong Lien ; 
tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com; ma...@donjonn.com
Subject: Re: [net] tipc: fix false detection of retransmit failures

On 7/19/19 11:56 AM, Tuong Lien wrote:
> This commit eliminates the use of the link 'stale_limit' & 'prev_from'
> (besides the already removed - 'stale_cnt') variables in the detection
> of repeated retransmit failures as there is no proper way to initialize
> them to avoid a false detection, i.e. it is not really a retransmission
> failure but due to a garbage values in the variables.

Sorry, I couldn't understand the reason why we have no proper way to
initialize 'stale_limit' & 'prev_from' variables of tipc_link struct.

As far as I understand, the two variables have been set to zero when
tipc_link object is allocated with kzalloc() in tipc_link_create().

Can you please help me understand what real scenario we cannot properly
set them.

[Tuong]: Yes, these two variables have been initialized to zero at link 
creation, but zero, or any other value, is not 'safe' for them: the 
retransmit failure criteria can be met without a real failure. Specifically, 
'time_after()' can return 'true' unexpectedly due to a garbage value (like 
zero...) in 'stale_limit', unless it is intentionally set in the 1st 
condition. However, the 1st condition, on 'prev_from', is not always 
satisfied: in case the next 'from' is coincidentally identical to 
'prev_from', we will miss it.

> - if (r->prev_from != from) {
> - r->prev_from = from;
> - r->stale_limit = jiffies + msecs_to_jiffies(r->tolerance);
> - } else if (time_after(jiffies, r->stale_limit)) {
> - pr_warn("Retransmission failure on link <%s>\n", l->name);

For example:
1) n-th retransmit: (prev_from = x, from = 100)
==> ok, we set the variables: prev_from = 100, stale_limit = jiffies + 1.5s
2) now, this pkt #100 was retransmitted successfully...
3) Later on, n+1-th retransmit: (prev_from = 100, from = 100)
-> We don’t set the 'stale_limit' but do the “else if” and bump!

Now, we can try to reset or re-initialize the 'prev_from' somehow, e.g. when 
the pkt #100 is ack-ed & released, but what value should it get? Note, any 
value is a valid sequence number, and if the next 'from' equals that value, we 
will face the same trouble again.
Trying to set the 'stale_limit' very far in the future would not work either, 
because that effectively disables the feature in the case where the same 
'from' really is hitting a repeated retransmit failure!
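The per-skb timestamp approach described in the patch can be modeled without any kernel machinery (a simplified sketch, not the net/tipc/link.c implementation; plain integers stand in for jiffies and the field names are illustrative): stamp each packet at its first retransmission, then declare failure only when the oldest packet in the transmit queue has been retransmitting for longer than the link tolerance.

```c
#include <assert.h>

struct model_skb {
	long first_retx;	/* time of first retransmission, 0 = none yet */
	int  retx_cnt;		/* actual retransmissions, for failure reports */
};

/* Called for the oldest skb in the transmit queue on each retransmit
 * attempt; returns 1 when a repeated retransmit failure is detected. */
static int link_retransmit_failure(struct model_skb *oldest,
				   long now, long tolerance)
{
	if (!oldest->first_retx) {
		oldest->first_retx = now;	/* first retransmission: stamp it */
		oldest->retx_cnt = 1;
		return 0;
	}
	oldest->retx_cnt++;
	return now - oldest->first_retx > tolerance;
}
```

Because the stamp lives in the skb and starts only at the first real retransmission, there is no garbage-value window: an acked and released skb takes its stamp with it, unlike the link-level 'prev_from'/'stale_limit' pair.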

> 
> Instead, a jiffies variable will be added to individual skbs (like the
> way we restrict the skb retransmissions) in order to mark the first skb
> retransmit time. Later on, at the next retransmissions, the timestamp
> will be checked to see if the skb in the link transmq is "too stale",
> that is, the link tolerance time has passed, so that a link reset will
> be ordered. Note, just checking on the first skb in the queue is fine
> enough since it must be the oldest one.
> A counter is also added to keep track the actual skb retransmissions'
> number for later checking when the failure happens.
> 
> The downside of this approach is that the skb->cb[] buffer is about to
> be exhausted, however it is always able to allocate another memory area
> and keep a reference to it when needed.
> 
> Fixes: 77cf8edbc0e7 ("tipc: simplify stale link failure criteria")
> Reported-by: Hoang Le 
> Signed-off-by: Tuong Lien 
> ---
>  net/tipc/link.c | 92 
> -
>  net/tipc/msg.h  |  8 +++--
>  2 files changed, 57 insertions(+), 43 deletions(-)
> 
> diff --git a/net/tipc/link.c b/net/tipc/link.c
> index 66d3a07bc571..c2c5c53cad22 100644
> --- a/net/tipc/link.c
> +++ b/net/tipc/link.c
> @@ -106,8 +106,6 @@ struct tipc_stats {
>   * @transmitq: queue for sent, non-acked messages
>   * @backlogq: queue for messages waiting to be sent
>   * @snt_nxt: next sequence number to use for outbound messages
> - * @prev_from: sequence number of most previous retransmission request
> - * @stale_limit: time when repeated identical retransmits must force link 
> reset
>   * @ackers: # of peers that needs to ack each packet before it can be 
> released
>   * @acked: # last packet acked by a certain peer. Used for broadcast.
>   * @rcv_nxt: next sequence number to expect for inbound messages
> @@ -164,9 +162,7 @@ struct tipc_link {
>   u16 limit;
>   } backlog[5];
>   u16 snd_nxt;
> - u16 prev_from;
>   u16 window;
> - unsigned long stale_limit;
>  
>   /* Reception */
>   u16 rcv_nxt;
> @@ -1044,47 +1040,53 @@ static void tipc_link_advance_backlog(struct tipc_link *l,
>   * link_retransmit_failure() - Detect repeated retransmit failures
>   * @l: tipc link sender
>   * @r: tipc link receiver (= l in case of 

Re: [tipc-discussion] [PATCH RFC 2/2] tipc: fix changeover issues due to large packet

2019-06-17 Thread Tuong Lien Tong
Hi Ying,

Thanks for your comments!
Regarding your last statement, yes when making the patch, I noticed that the 
"tipc_msg_build()" and "tipc_msg_fragment()" do a similar task, also I tried to 
think a way to combine them but didn't because of the reasons:
1- The "core" functions to copy the data are different since the 
"tipc_msg_build()" plays with user data in the iov buffers, whereas, for the 
other, it's skb data.
Also, the outputs are different, the first function will set the messages' type 
in header such as "FIRST_FRAGMENT", "FRAGMENT" or "LAST_FRAGMENT", but not with 
the other because it will overwrite the tunnel messages' type... that I had to 
use the other field (fragm_no/nof_fragms) to determine this at the receiving 
side...
2- I don't want to touch the old code that can be risky :(

BR/Tuong

-Original Message-
From: Ying Xue  
Sent: Sunday, June 16, 2019 1:42 PM
To: Tuong Lien ; 
tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com; ma...@donjonn.com
Subject: Re: [PATCH RFC 2/2] tipc: fix changeover issues due to large packet

> 2) The same scenario above can happen more easily in case the MTU of
> the links is set differently or when changing. In that case, as long as
> a large message in the failure link's transmq queue was built and
> fragmented with its link's MTU > the other link's one, the issue will
> happen (there is no need of a link synching in advance).
> 
> 3) The link synching procedure also faces the same issue, but since
> the link synching is only started upon receipt of a SYNCH_MSG, dropping
> the message will not result in a state deadlock; still, it is not what
> the design expects.
> 
> The 1) & 3) issues are resolved by the previous commit 81e4dd94b214

This is the same as in the previous commit: the commit ID might become
invalid after it's merged upstream.

> ("tipc: optimize link synching mechanism") by generating only a dummy
> SYNCH_MSG (i.e. without data) at the link synching, so the size of a
> FAILOVER_MSG if any then will never exceed the link's MTU.

>  /**
> + * tipc_msg_fragment - build a fragment skb list for TIPC message
> + *
> + * @skb: TIPC message skb
> + * @hdr: internal msg header to be put on the top of the fragments
> + * @pktmax: max size of a fragment incl. the header
> + * @frags: returned fragment skb list
> + *
> + * Returns 0 if the fragmentation is successful, otherwise: -EINVAL
> + * or -ENOMEM
> + */
> +int tipc_msg_fragment(struct sk_buff *skb, const struct tipc_msg *hdr,
> +   int pktmax, struct sk_buff_head *frags)
> +{
> + int pktno, nof_fragms, dsz, dmax, eat;
> + struct tipc_msg *_hdr;
> + struct sk_buff *_skb;
> + u8 *data;
> +
> + /* Non-linear buffer? */
> + if (skb_linearize(skb))
> + return -ENOMEM;
> +
> + data = (u8 *)skb->data;
> + dsz = msg_size(buf_msg(skb));
> + dmax = pktmax - INT_H_SIZE;
> +
> + if (dsz <= dmax || !dmax)
> + return -EINVAL;
> +
> + nof_fragms = dsz / dmax + 1;
> +
> + for (pktno = 1; pktno <= nof_fragms; pktno++) {
> + if (pktno < nof_fragms)
> + eat = dmax;
> + else
> + eat = dsz % dmax;
> +
> + _skb = tipc_buf_acquire(INT_H_SIZE + eat, GFP_ATOMIC);
> + if (!_skb)
> + goto error;
> +
> + skb_orphan(_skb);
> + __skb_queue_tail(frags, _skb);
> +
> + skb_copy_to_linear_data(_skb, hdr, INT_H_SIZE);
> + skb_copy_to_linear_data_offset(_skb, INT_H_SIZE, data, eat);
> + data += eat;
> +
> + _hdr = buf_msg(_skb);
> + msg_set_fragm_no(_hdr, pktno);
> + msg_set_nof_fragms(_hdr, nof_fragms);
> + msg_set_size(_hdr, INT_H_SIZE + eat);
> + }
> + return 0;
> +

In fact we have similar code in tipc_msg_build(), where we also fragment
the packet if necessary. In order to eliminate redundant code, I suggest
we extract the common code into a separate function and have both
tipc_msg_build() and tipc_msg_fragment() call it.
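For what it's worth, the part the two functions share is essentially the per-fragment size arithmetic. Below is a userspace sketch of just that piece (the helper name and array-based interface are made up for illustration), mirroring the size computation of the quoted loop in tipc_msg_fragment():

```c
/* Split dsz payload bytes into fragments carrying at most dmax payload
 * bytes each, mirroring the size arithmetic of the quoted loop in
 * tipc_msg_fragment(): the first nof_fragms - 1 fragments carry dmax
 * bytes, the last one carries the remainder (dsz % dmax). */
static int fragment_sizes(int dsz, int dmax, int *sizes, int max_frags)
{
	int nof_fragms, pktno;

	if (dsz <= dmax || !dmax)
		return -1;	/* nothing to fragment, or bad input */
	nof_fragms = dsz / dmax + 1;
	if (nof_fragms > max_frags)
		return -1;	/* caller's array too small */
	for (pktno = 1; pktno <= nof_fragms; pktno++)
		sizes[pktno - 1] = (pktno < nof_fragms) ? dmax : dsz % dmax;
	return nof_fragms;
}
```

One detail this sketch makes visible: exactly as in the quoted loop, a dsz that is an exact multiple of dmax yields a final zero-byte fragment (dsz % dmax == 0), which may be worth a second look in the review.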




___
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion


Re: [tipc-discussion] [PATCH RFC] tipc: fix link "half-failover" issue

2019-04-24 Thread Tuong Lien Tong
Thanks a lot, Jon!
Please see my answers inline... I will send the patch to net-next then.

BR/Tuong

-Original Message-
From: Jon Maloy  
Sent: Wednesday, April 24, 2019 11:41 PM
To: Tuong Tong Lien ; ma...@donjonn.com; 
ying@windriver.com; tipc-discussion@lists.sourceforge.net
Subject: RE: [PATCH RFC] tipc: fix link "half-failover" issue

Acked-by: Jon Maloy 

Also see my comments below.

///jon

> -Original Message-
> From: Tuong Lien 
> Sent: 24-Apr-19 00:41
> To: Jon Maloy ; ma...@donjonn.com;
> ying@windriver.com; tipc-discussion@lists.sourceforge.net
> Subject: [PATCH RFC] tipc: fix link "half-failover" issue
> 
> A TIPC link can temporarily fall into a "half-established" state, in which
> only one of the link endpoints is ESTABLISHED and starts to send traffic or
> PROTOCOL messages, whereas the other link endpoint is not up (e.g. the
> network interface goes down immediately after the endpoint receives
> ACTIVATE_MSG...).
> 
> This is a normal situation and will settle itself, because the link endpoint
> will eventually be brought down after the link tolerance time.
> 
> However, the situation becomes worse when the second link is established
> before the first link endpoint goes down. For example:
> 
>1. Both links <1A-1B>, <2A-2B> down

Confusing terminology here. How can there be a link <1A-1B>?
I think you want to say "Both link endpoints" 1A,1B down etc.

[Tuong]: I didn't realize this confusion until you pointed it out here!
It should be "Both links <1A-2A>, <1B-2B> down", where the numeric part
represents the node number and the character part is the interface name.

>2. Link endpoint 1B up, but 1A still down (e.g. due to network
>   disturbance, wrong session, etc.)

>3. Link <2A-2B> up
Same here.
[Tuong]: Same above, it should be "Link <1B-2B> up"

>4. Link endpoint 1B down (e.g. due to link tolerance timeout)
>5. Node B starts failover onto link <2A-2B>
> 
>==> Node A does never start link failover.
> 
> When the "half-failover" situation happens, two consequences have been
> observed:
> 
> a) Peer link/node gets stuck in FAILINGOVER state;
> b) Traffic or user messages that peer node is trying to failover onto the
> second link can be partially or completely dropped by this node.
> 
> The consequence a) was actually solved by commit c140eb166d68 ("tipc:
> fix failover problem"), but that commit didn't cover b). This is due to
> the fact that the tunnel link endpoint has never been prepared for a
> failover, so the 'l->drop_point' (and the other data...) is not set
> correctly. When a
> TUNNLE_MSG 

s/TUNNLE_MSG/TUNNEL_MSG

[Tuong]: yep, a typo!

> from peer node arrives on the link, depending on the inner
> message's seqno and the current 'l->drop_point'
> value, the message can be dropped (- treated as a duplicate message) or
> processed.
> At this early stage, the traffic messages from peer are likely to be
> NAME_DISTRIBUTORs, this means some name table entries will be missed on
> the node forever!
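The gating role of 'l->drop_point' described here can be sketched as follows. This is a simplified userspace model with illustrative names, not the real receive-path code; the seqno comparison wraps at 16 bits, modeled here on TIPC's less() via a signed 16-bit difference:

```c
#include <stdbool.h>
#include <stdint.h>

/* 16-bit sequence-number comparison with wrap-around, modeled on
 * TIPC's less() helper */
static bool seqno_less(uint16_t left, uint16_t right)
{
	return (int16_t)(left - right) < 0;
}

/* Inner tunneled messages whose seqno lies below drop_point are treated
 * as duplicates of traffic already delivered and silently dropped; when
 * drop_point was never initialized for a failover (the "half-failover"
 * case), legitimate messages such as NAME_DISTRIBUTORs get lost. */
static bool tnl_inner_dropped(uint16_t inner_seqno, uint16_t drop_point)
{
	return seqno_less(inner_seqno, drop_point);
}
```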
> 
> The commit resolves the issue by starting the FAILOVER process on this node
> as well. Another benefit from this solution is that we ensure the link will 
> not
> be re-established until the failover ends.
> 
> Signed-off-by: Tuong Lien 
> ---
>  net/tipc/link.c | 35 +++
>  net/tipc/link.h |  2 ++
>  net/tipc/node.c | 54
> +++---
>  3 files changed, 84 insertions(+), 7 deletions(-)
> 
> diff --git a/net/tipc/link.c b/net/tipc/link.c
> index 6053489c8063..fa639054329d 100644
> --- a/net/tipc/link.c
> +++ b/net/tipc/link.c
> @@ -1705,6 +1705,41 @@ void tipc_link_tnl_prepare(struct tipc_link *l, struct tipc_link *tnl,
>   }
>  }
> 
> +/**
> + * tipc_link_failover_prepare() - prepare tnl for link failover
> + *
> + * This is a special version of the precursor - tipc_link_tnl_prepare(),
> + * see the __tipc_node_link_failover() for details
> + *
> + * @l: failover link
> + * @tnl: tunnel link
> + * @xmitq: queue for messages to be xmited
> + */
> +void tipc_link_failover_prepare(struct tipc_link *l, struct tipc_link *tnl,
> + struct sk_buff_head *xmitq)
> +{
> +	struct sk_buff_head *fdefq = &tnl->failover_deferdq;
> +
> + tipc_link_create_dummy_tnl_msg(tnl, xmitq);
> +
> + /* This failover link enpoint should never be established

s/should never be established before, so did not../was never established, so it 
has not ...

[Tuong]: ok, will update it.

> +  * before, so did not receive anything from peer.
> +  * Otherwise, it must be a normal failover situation or the
> +  * node has entered SELF_DOWN_PEER_LEAVING and both peer nodes
> +  * would have to start over from scratch instead.
> +  */
> + WARN_ON(l && tipc_link_is_up(l));
> + tnl->drop_point = 1;
> + tnl->failover_reasm_skb = NULL;
> +
> + /* Initiate the link's failover deferdq */
> + if (unlikely(!skb_queue_empty(fdefq))) {
> +

Re: [tipc-discussion] [net-next 1/3] tipc: improve TIPC throughput by Gap ACK blocks

2019-03-25 Thread Tuong Lien Tong
Hi Ying!

Correct, the idea was inspired by SACK in the SCTP protocol, but kept simple.
Also, I see your suggestion about the "duplicated packets"... which is
connected to the SCTP "delayed SACK" algorithm (i.e. in the case of no payload
message loss). In TIPC, as I understand it so far, we already have such a
delay on acknowledgements via the link's "rcv_unacked" (- Jon may correct
me?), though we don't implement a timer for the SACK delay timeout, i.e. the
200ms in the SCTP RFC. However, that duplicated-TSN concept is based on DATA
chunks _and_ "with no new DATA chunk(s)", which usually happens in case of
SACK loss, when the sender has retried the same DATA chunks after its RTO
timer expired...; obviously an immediate SACK is needed in that situation. In
TIPC, we may not face this situation because we have no retransmission timer
at the sender side; duplicates occur almost entirely due to overactive NACK
sending and should be covered by the 2nd patch of the series.
For me, in the case of packet loss an immediate retransmission is important;
otherwise it can reduce the performance. However, because we never know
whether a packet is really lost or just delayed, we have to apply the "1ms
restriction" to reduce duplicates (- as you can also see in the 2nd patch).
Fast retransmission was also tried; Jon and I had some discussions before...
but the fact is that, in TIPC, the sender is passive (due to no retransmission
timer) and we could be in trouble if we tried to wait for the 2nd or 3rd
indications... Instead, the NACK sending criteria have been changed by the 2nd
patch to both reduce duplicates and try to keep the performance.
Actually, in SCTP the situation is a bit different, as they deal with "chunks"
and "multi-streaming" rather than individual packets like ours at the link
layer, and many chunks can optionally be bundled into a single packet because
of slow-start or Nagle's algorithm...
Anyway, if you have any ideas to improve TIPC performance further, I will try
to see what happens.
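For reference, the "1ms restriction" mentioned above amounts to a simple per-packet time gate on retransmission. A minimal sketch, with made-up names and milliseconds standing in for jiffies (the real check lives in the link retransmit path):

```c
#include <stdbool.h>
#include <stdint.h>

#define RETR_MIN_GAP_MS 1	/* the "1ms restriction" discussed above */

struct pkt_state {
	uint64_t last_retr_ms;	/* when this packet was last retransmitted */
};

/* Retransmit only if at least RETR_MIN_GAP_MS has passed since the last
 * retransmission of this packet: a NACK arriving sooner is likely for a
 * packet that is merely delayed (or already resent), not lost. */
static bool may_retransmit(struct pkt_state *p, uint64_t now_ms)
{
	if (now_ms - p->last_retr_ms < RETR_MIN_GAP_MS)
		return false;
	p->last_retr_ms = now_ms;
	return true;
}
```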
Thanks a lot!

BR/Tuong 

-Original Message-
From: Ying Xue  
Sent: Friday, March 22, 2019 7:52 PM
To: Tuong Lien ; 
tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com; ma...@donjonn.com
Subject: Re: [net-next 1/3] tipc: improve TIPC throughput by Gap ACK blocks

Hi Tuong,

Great job! It's a very nice enhancement, and we should do the
improvement early.

On 3/20/19 11:28 AM, Tuong Lien wrote:
> During unicast link transmission, it's observed very often that because
> of one or a few lost/dis-ordered packets, the sending side will quickly
> reach the send window limit and must wait for the packets to arrive at
> the receiving side, or in the worst case, a retransmission must be done
> first. The sending side cannot release a lot of subsequent packets in
> its transmq even though all of them might have already been received by
> the receiving side.
> That is, one or two packets dis-ordered/lost and dozens of packets have
> to wait; this obviously reduces the overall throughput!
> 
> This commit introduces an algorithm to overcome this by using "Gap ACK
> blocks". Basically, a Gap ACK block will consist of <ack, gap> numbers
> that describes the link deferdq where packets have been got by the
> receiving side but with gaps, for example:
> 
>   link deferdq: [1 2 3 4  10 11  13 14 15   20]
> --> Gap ACK blocks:   <4, 5>,   <11, 1>,  <15, 4>, <20, 0>
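The quoted example can be reproduced mechanically: walk the sorted deferdq, and for each contiguous run emit an <ack, gap> pair, where "ack" is the run's last seqno and "gap" counts the missing seqnos before the next run. A userspace sketch (illustrative interface, not the patch's kernel code):

```c
/* Build <ack, gap> pairs from a sorted deferred-receive queue.
 * Returns the number of blocks written to acks[]/gaps[]. */
static int build_gap_ack_blocks(const int *deferdq, int n,
				int *acks, int *gaps)
{
	int i, nblocks = 0;

	for (i = 0; i < n; i++) {
		/* end of a contiguous run of sequence numbers? */
		if (i + 1 == n || deferdq[i + 1] != deferdq[i] + 1) {
			acks[nblocks] = deferdq[i];
			gaps[nblocks] = (i + 1 == n) ? 0
				      : deferdq[i + 1] - deferdq[i] - 1;
			nblocks++;
		}
	}
	return nblocks;
}
```

Running this on the quoted deferdq [1 2 3 4 10 11 13 14 15 20] yields exactly the blocks shown above: <4, 5>, <11, 1>, <15, 4>, <20, 0>.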

This idea is exactly the same as SACK in SCTP:
https://tools.ietf.org/html/rfc4960#section-3.3.4


0   1   2   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Type = 3|Chunk  Flags   |  Chunk Length |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Cumulative TSN Ack   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Advertised Receiver Window Credit (a_rwnd)   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Number of Gap Ack Blocks = N  |  Number of Duplicate TSNs = X |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Gap Ack Block #1 Start   |   Gap Ack Block #1 End|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   /   /
   \  ...  \
   /   /
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Gap Ack Block #N Start  |  Gap Ack Block #N End |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Duplicate TSN 1 

[tipc-discussion] FW: [tipc:tipcutils] New commit [f6ae0c] by Tuong Lien

2019-01-09 Thread Tuong Lien Tong
Hi all,

Just in case you didn't notice this commit, I'd like to mention that,
in addition to the TIPC trace-events in the kernel, we now have a 'front-end'
tool named 'tipc-trace' as part of tipcutils, which offers some options to
make tracing more convenient.
Please feel free to try and enhance it!
Thanks a lot!

BR/Tuong

-Original Message-
From: TIPC Cluster Domain Sockets Git repository 
 
Sent: Tuesday, January 8, 2019 2:30 PM
To: TIPC Cluster Domain Sockets Git repository 

Subject: [tipc:tipcutils] New commit [f6ae0c] by Tuong Lien


## Branch: master  

tipcutils: introduce a front-end tool for TIPC traces

This commit adds 'tipc-trace' to tipcutils, a front-end tool
for TIPC traces. The tool allows tracing TIPC by means of the
TIPC trace_events in the kernel and offers some options to do
the traces more conveniently:

- Print all available kernel TIPC trace-events on the node with a short
  description, and check the trace status;
- Provide some 'trace-suites' with 'built-in' configurations in
  order to automatically enable other traces, ... when a particular
  event happens (e.g. link lost, etc.);
- Configure the traces: enable, disable, set filter/trigger for
  traces or save traces as a trace-suite for later use;
- Start, stop the traces, collect the trace outputs;
- Interpret the trace outputs online or offline;
- Etc.

Tested-by: Jon Maloy 
Acked-by: Jon Maloy 
Signed-off-by: Tuong Lien 

By Tuong Lien on 12/14/2018 12:00
[**View 
Changes**](https://sourceforge.net/p/tipc/tipcutils/ci/f6ae0ce5fba595b5e725e2b5adfde1791e860c29/)







Re: [tipc-discussion] [PATCH v2 0/5] tipc: tracepoints and trace_events in TIPC

2018-12-18 Thread Tuong Lien Tong
Thanks Jon & Ying!
I have just sent the patches to the net-next.

BR/Tuong

-Original Message-
From: Ying Xue  
Sent: Wednesday, December 19, 2018 7:28 AM
To: Tuong Lien ; 
tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com
Subject: Re: [PATCH v2 0/5] tipc: tracepoints and trace_events in TIPC


Your series looks good to me.

Acked-by: Ying Xue 

On 12/18/18 6:11 PM, Tuong Lien wrote:
> The patch series is the first step of introducing a tracing framework in
> TIPC, which will assist in collecting complete & plentiful data for post
> analysis, even in the case of a single failure occurrence e.g. when the
> failure is unreproducible.
> 
> The tracing code in TIPC utilizes the powerful kernel tracepoints, trace
> events features along with particular dump functions to trace the TIPC
> object data and events (incl. bearer, link, socket, node, etc.).
> 
> The tracing code should generate zero-load to TIPC when the trace events
> are not enabled.
> 
> v2: improve trace output format;
> improve the socket traces, make it now convenient for user to do the
> socket traces on any specific socket when needed (by using the new
> sysctl file - '/proc/sys/net/tipc/sk_filter' with a socket 'tuple'
> for filtering);
> add a new dump option for link's wakeup queue, add trace for link
> congestion situation;
> improve the node and bearer traces;
> 
> Tuong Lien (5):
>   tipc: enable tracepoints in tipc
>   tipc: add trace_events for tipc link
>   tipc: add trace_events for tipc socket
>   tipc: add trace_events for tipc node
>   tipc: add trace_events for tipc bearer
> 
>  net/tipc/Makefile |   4 +-
>  net/tipc/bearer.c |   9 +-
>  net/tipc/bearer.h |   2 +-
>  net/tipc/link.c   | 153 ++-
>  net/tipc/link.h   |   2 +
>  net/tipc/node.c   |  88 ++-
>  net/tipc/node.h   |   1 +
>  net/tipc/socket.c | 230 -
>  net/tipc/socket.h |   4 +
>  net/tipc/sysctl.c |   8 +
>  net/tipc/trace.c  | 206 ++
>  net/tipc/trace.h  | 431 ++
>  12 files changed, 1122 insertions(+), 16 deletions(-)
>  create mode 100644 net/tipc/trace.c
>  create mode 100644 net/tipc/trace.h
> 





Re: [tipc-discussion] [PATCH v1 0/5] tipc: tracepoints and trace_events in TIPC

2018-12-18 Thread Tuong Lien Tong
Hi Jon & Ying,

Thanks so much for your reviews and tests,
Also, I have just sent version 2 of the patch series with a few
improvements, so please help me check it out; after that I'll send it to
net-next.
Many thanks!

BR/Tuong

-Original Message-
From: Jon Maloy  
Sent: Wednesday, December 12, 2018 3:51 AM
To: Tuong Tong Lien ;
tipc-discussion@lists.sourceforge.net; ying@windriver.com
Subject: RE: [PATCH v1 0/5] tipc: tracepoints and trace_events in TIPC

Hi Tuong,
Now I have also installed and tested it. I have no more comments; this is
solid work. You can go ahead and post it to netdev, just remember to add
'net-next' in the patch headers.

Acked-by: Jon

///jon


> -Original Message-
> From: Tuong Lien 
> Sent: 27-Nov-18 07:38
> To: tipc-discussion@lists.sourceforge.net; Jon Maloy
> ; ying@windriver.com
> Subject: [PATCH v1 0/5] tipc: tracepoints and trace_events in TIPC
> 
> The patch series is the first step of introducing a tracing framework in
> TIPC, which will assist in collecting complete & plentiful data for post
> analysis, even in the case of a single failure occurrence, e.g. when the
> failure is unreproducible.
> 
> The tracing code in TIPC utilizes the powerful kernel tracepoints, trace
> events features along with particular dump functions to trace the TIPC
> object data and events (incl. bearer, link, socket, node, etc.).
> 
> The tracing code should generate zero-load to TIPC when the trace events
> are not enabled.
> 
> Tuong Lien (5):
>   tipc: enable tracepoints in tipc
>   tipc: add trace_events for tipc link
>   tipc: add trace_events for tipc socket
>   tipc: add trace_events for tipc node
>   tipc: add trace_events for tipc bearer
> 
>  net/tipc/Makefile |   4 +-
>  net/tipc/bearer.c |   3 +
>  net/tipc/link.c   | 160 -
>  net/tipc/link.h   |   2 +
>  net/tipc/node.c   |  94 +++-
>  net/tipc/node.h   |   1 +
>  net/tipc/socket.c | 162 -
>  net/tipc/socket.h |   4 +
>  net/tipc/trace.c  | 196 +
>  net/tipc/trace.h  | 418 ++
>  10 files changed, 1035 insertions(+), 9 deletions(-)
>  create mode 100644 net/tipc/trace.c
>  create mode 100644 net/tipc/trace.h
> 
> --
> 2.13.7






Re: [tipc-discussion] [PATCH v1 1/5] tipc: enable tracepoints in tipc

2018-12-09 Thread Tuong Lien Tong
Hi Ying,

Thank you very much for having reviewed and tested the patch!
Regarding your observation about the "X" in the 'tipc/enable' file, it usually
happens when only some of the TIPC trace events are enabled (or all of them
are enabled but some are then disabled...), so please double-check whether
that is the case.

BR/Tuong

-Original Message-
From: Ying Xue  
Sent: Saturday, December 8, 2018 4:14 PM
To: Tuong Lien ; 
tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com
Subject: Re: [PATCH v1 1/5] tipc: enable tracepoints in tipc

Great job!

On 11/27/18 8:37 PM, Tuong Lien wrote:
> cat trace_pipe > /trace.out &
> 
> To disable all the TIPC trace_events:
> 
> echo 0 > /sys/kernel/debug/tracing/events/tipc/enable
> 
> To clear the trace buffer:
> 
> echo > trace
> 
> d) Like the other trace_events, the feature like 'filter' or 'trigger'
> is also usable for the tipc trace_events.
> For more details, have a look at:
> 
> Documentation/trace/ftrace.txt
> 
> MAINTAINERS | add two new files 'trace.h' & 'trace.c' in tipc
> 
> Signed-off-by: Tuong Lien 

Acked-by: Ying Xue 
Tested-by: Ying Xue 

The only minor thing is that I found the default value of
/sys/kernel/debug/tracing/events/tipc/enable is not 0, but it's "X".
Please check why.

> ---
>  net/tipc/Makefile |   4 +-
>  net/tipc/link.c   | 115 +++
>  net/tipc/link.h   |   1 +
>  net/tipc/node.c   |  61 +++


