Hi Hoang,
See below.

> -----Original Message-----
> From: Hoang Le <[email protected]>
> Sent: 15-Nov-18 04:17
> To: [email protected]; Jon Maloy
> <[email protected]>; [email protected]; [email protected]
> Subject: RE: [tipc-discussion] [net-next] tipc: fix node keep alive interval
> calculation
> 
> Hi Jon,
> 
> I think the issue now is not minor as I concluded. Support we have a huge
> cluster 75 VMS node, this issue will caused traffic performance degradation.
> So, please take time to review the patch.
> 
> Thanks,
> Hoang
> -----Original Message-----
> From: Hoang Le <[email protected]>
> Sent: Monday, November 5, 2018 3:29 PM
> To: [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: [tipc-discussion] [net-next] tipc: fix node keep alive interval
> calculation
> 
> When setting LINK tolerance, node timer interval will be calculated base on
> the LINK with lowest tolerance.
> 
> But when calculated, the old node timer interval only updated if current
> setting value (tolerance/4) less than old ones regardless of number of links 
> as
> well as links' lowest tolerance value.
> 
> This caused to two cases missing if tolerance changed as following:
> Case 1:
> 1.1/ There is one link (L1) available in the system 1.2/ Set L1's tolerance 
> from
> 1500ms => lower (i.e 500ms) 1.3/ Then, fallback to default (1500ms) or higher
> (i.e 2000ms)
> 
> Expected:
>     node timer interval is 1500/4=375ms after 1.3
> 
> Result:
> node timer interval will not being updated after changing tolerance at 1.3
> since its value 1500/4=375ms is not less than 500/4=125ms at 1.2.
> 
Yes, this is not good.

> Case 2:
> 2.1/ There are two links (L1, L2) available in the system 2.2/ L1 and L2
> tolerance value are 2000ms as initial 2.3/ Set L2's tolerance from 2000ms =>
> lower 1500ms 2.4/ Disable link L2 (bring down its bearer)

If we take down the bearer L2 should be deleted, and its value ignored during 
further calculations.

> 
> Expected:
>     node timer interval is 2000ms/4=500ms after 2.4
> 
Similar issue as above.

> Result:
> node timer interval will not being updated after disabling L2 since its value
> 2000ms/4=500ms is still not less than 1500/4=375ms at 2.3 although L2 is
> already not available in the system.
> 
> To fix this, we will update node timer interval according to link tolerance
> value setting if there is only one link available and recalculate the link 
> with
> lowest tolerance for each iteration checking.

I think you can do this much simpler:
Just set n->keepalive_intv to a high value, e.g., 10000 ms, before you enter 
the loop.
Then, in any iteration where a lower value is calculated (i.e., inclusive the 
first one) n->keepalive_intv will be updated accordingly.
No more changes needed 😉

Regards
///jon

> 
> Signed-off-by: Hoang Le <[email protected]>
> ---
>  net/tipc/node.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/net/tipc/node.c b/net/tipc/node.c index
> 2afc4f8c37a7..a0e5b2e2720d 100644
> --- a/net/tipc/node.c
> +++ b/net/tipc/node.c
> @@ -437,13 +437,14 @@ static struct tipc_node *tipc_node_create(struct
> net *net, u32 addr,
>       return n;
>  }
> 
> -static void tipc_node_calculate_timer(struct tipc_node *n, struct tipc_link
> *l)
> +static void tipc_node_calculate_timer(struct tipc_node *n, struct tipc_link
> *l,
> +                                   bool first_link)
>  {
>       unsigned long tol = tipc_link_tolerance(l);
>       unsigned long intv = ((tol / 4) > 500) ? 500 : tol / 4;
> 
>       /* Link with lowest tolerance determines timer interval */
> -     if (intv < n->keepalive_intv)
> +     if (intv < n->keepalive_intv || first_link || n->link_cnt == 1)
>               n->keepalive_intv = intv;
> 
>       /* Ensure link's abort limit corresponds to current tolerance */ @@ -
> 627,7 +628,9 @@ static void tipc_node_timeout(struct timer_list *t)
>               if (le->link) {
>                       spin_lock_bh(&le->lock);
>                       /* Link tolerance may change asynchronously: */
> -                     tipc_node_calculate_timer(n, le->link);
> +                     tipc_node_calculate_timer(n, le->link,
> +                                               (bearer_id == 0) ?
> +                                               true : false);
>                       rc = tipc_link_timeout(le->link, &xmitq);
>                       spin_unlock_bh(&le->lock);
>                       remains--;
> @@ -1018,7 +1021,8 @@ void tipc_node_check_dest(struct net *net, u32
> addr,
>                       tipc_link_fsm_evt(l, LINK_FAILOVER_BEGIN_EVT);
>               le->link = l;
>               n->link_cnt++;
> -             tipc_node_calculate_timer(n, l);
> +             tipc_node_calculate_timer(n, l, n->link_cnt == 1 ?
> +                                       true : false);
>               if (n->link_cnt == 1) {
>                       intv = jiffies + msecs_to_jiffies(n->keepalive_intv);
>                       if (!mod_timer(&n->timer, intv))
> --
> 2.17.1
> 
> 
> 
> _______________________________________________
> tipc-discussion mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/tipc-discussion


_______________________________________________
tipc-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/tipc-discussion

Reply via email to