Faster TCP keepalive
Greetings,

I'm writing to ask whether there have been any thoughts or efforts toward
allowing a sub-second TCP keepalive interval. One application is TCP
connections between IP hosts connected by an internal backplane, where
faster detection is a necessity and the increased traffic can be
accommodated.

Suggestions on other ways to quickly tear down TCP connections to a
rebooted host in the application above are welcome.

Thank you,
Stephen.
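For context, here is a minimal sketch of what the existing Linux socket
options allow, assuming a Linux host. TCP_KEEPIDLE, TCP_KEEPINTVL and
TCP_KEEPCNT take whole seconds, so one second is the smallest setting.
TCP_USER_TIMEOUT takes milliseconds, but it only bounds how long
transmitted data may remain unacknowledged; it is not a sub-second
keepalive probe. The helper name set_fast_keepalive() is hypothetical
and used only for illustration.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original post): enable keepalive
 * with the smallest whole-second timers and set TCP_USER_TIMEOUT,
 * which is expressed in milliseconds.
 * Returns 0 on success, -1 on failure (errno is set by setsockopt).
 */
static int set_fast_keepalive(int fd, unsigned int user_timeout_ms)
{
	int on = 1;
	int idle = 1;	/* seconds of idle time before the first probe */
	int intvl = 1;	/* seconds between probes */
	int cnt = 3;	/* unanswered probes before the connection is dropped */

	if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) < 0)
		return -1;
	/* Millisecond granularity, but applies to unacknowledged data only. */
	if (setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
		       &user_timeout_ms, sizeof(user_timeout_ms)) < 0)
		return -1;
	return 0;
}
```

With these values an idle dead peer is detected after roughly
idle + intvl * cnt seconds at best, which is why the question about
sub-second intervals arises.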
[PATCH net,v2] ipv4: use new_gw for redirect neigh lookup
In v2.6, ip_rt_redirect() calls arp_bind_neighbour(), which returns 0,
and then the state of the neigh for the new_gw is checked. If the state
isn't valid, the redirected route is deleted. This behavior is
maintained up to v3.5.7 by check_peer_redir() because rt->rt_gateway is
set to peer->redirect_learned.a4 before calling ipv4_neigh_lookup().

After commit 5943634fc559 ("ipv4: Maintain redirect and PMTU info in
struct rtable again."), ipv4_neigh_lookup() is performed without
rt_gateway being set to the new_gw. When rt_gateway (old_gw) isn't zero,
the function uses it as the key. The neigh is most likely valid since
the old_gw is the one that sends the ICMP redirect message. Then the
new_gw is assigned to fib_nh_exception. The problem is: the ARP entry
for the new_gw may never get resolved, and the traffic is blackholed.

So, use the new_gw for neigh lookup.

Changes from v1:
 - use __ipv4_neigh_lookup instead (per Eric Dumazet).

Fixes: 5943634fc559 ("ipv4: Maintain redirect and PMTU info in struct rtable again.")
Signed-off-by: Stephen Suryaputra Lin <ssu...@ieee.org>
---
 net/ipv4/route.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 62d4d90c1389..2a57566e6e91 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -753,7 +753,9 @@ static void __ip_do_redirect(struct rtable *rt, struct sk_buff *skb, struct flow
 			goto reject_redirect;
 	}
 
-	n = ipv4_neigh_lookup(&rt->dst, NULL, &new_gw);
+	n = __ipv4_neigh_lookup(rt->dst.dev, new_gw);
+	if (!n)
+		n = neigh_create(&arp_tbl, &new_gw, rt->dst.dev);
 	if (!IS_ERR(n)) {
 		if (!(n->nud_state & NUD_VALID)) {
 			neigh_event_send(n, NULL);
-- 
2.7.4
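The key selection described in this commit message can be illustrated
with a small user-space model. This is only a sketch, not kernel code:
struct model_rt, model_lookup_key(), key_before_fix() and
key_after_fix() are hypothetical names, and the model assumes only what
the message states, namely that a non-zero rt_gateway is used as the
lookup key in preference to the address passed by the caller.

```c
#include <stdint.h>

/* Simplified stand-in for struct rtable; only the field that matters here. */
struct model_rt {
	uint32_t rt_gateway;	/* current next hop (old_gw); 0 means none */
};

/*
 * Models the key choice described above: when rt_gateway is non-zero it
 * is used as the neighbour key, otherwise the caller's address is used.
 */
static uint32_t model_lookup_key(const struct model_rt *rt, uint32_t daddr)
{
	return rt->rt_gateway ? rt->rt_gateway : daddr;
}

/* Before the fix: new_gw is passed in, but old_gw may be what gets checked. */
static uint32_t key_before_fix(const struct model_rt *rt, uint32_t new_gw)
{
	return model_lookup_key(rt, new_gw);
}

/* With the patch: the lookup is keyed on new_gw directly. */
static uint32_t key_after_fix(const struct model_rt *rt, uint32_t new_gw)
{
	(void)rt;	/* the route's current gateway no longer influences the key */
	return new_gw;
}
```

With rt_gateway set to the old gateway, key_before_fix() returns the
old gateway, so the NUD_VALID check is applied to the wrong neighbour,
while key_after_fix() returns the new gateway.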
Re: [PATCH net] Fixes: 5943634fc559 ("ipv4: Maintain redirect and PMTU info in struct rtable again.")
I modeled the temporary clearing and restoring of rt_gateway on the
deleted function check_peer_redir(). Looking at that function again,
though, its assignment of peer->redirect_learned.a4 to rt_gateway can be
permanent, because the old_gw is restored only on errors.

I have updated the patch to use __ipv4_neigh_lookup().

Thank you.

On Mon, Nov 07, 2016 at 11:20:16AM -0500, David Miller wrote:
> From: Eric Dumazet
> Date: Mon, 07 Nov 2016 08:08:52 -0800
>
> > In any case, rt is a shared object at that time, so even temporarily
> > clearing/restoring rt_gateway seems wrong to me.
> >
> > I would rather call __ipv4_neigh_lookup(dst->dev, new_gw) directly at
> > this point.
>
> Agreed.
[PATCH net,v2] Fixes: 5943634fc559 ("ipv4: Maintain redirect and PMTU info in struct rtable again.")
ICMP redirect behavior is different after the commit above. An email
asking why the behavior needs to be different was sent earlier to netdev
(https://patchwork.ozlabs.org/patch/687728/). Since there hasn't been a
reply yet, I decided to prepare this formal patch.

In the v2.6 kernel, ip_rt_redirect() calls arp_bind_neighbour(), which
returns 0, and then the state of the neigh for the new_gw is checked. If
the state isn't valid, the redirected route is deleted. This behavior is
maintained up to v3.5.7 by check_peer_redir() because rt->rt_gateway is
set to peer->redirect_learned.a4 before calling ipv4_neigh_lookup().

After the commit, ipv4_neigh_lookup() is performed without rt_gateway
being set to the new_gw. When rt_gateway (old_gw) isn't zero, the
function uses it as the key. The neigh is most likely valid since the
old_gw is the one that sends the ICMP redirect message. Then the new_gw
is assigned to fib_nh_exception. The problem is: the ARP entry for the
new_gw may never get resolved, and the traffic is blackholed.

Changes from v1:
 - use __ipv4_neigh_lookup instead (per Eric Dumazet).

Signed-off-by: Stephen Suryaputra Lin <ssu...@ieee.org>
---
 net/ipv4/route.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 62d4d90c1389..2a57566e6e91 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -753,7 +753,9 @@ static void __ip_do_redirect(struct rtable *rt, struct sk_buff *skb, struct flow
 			goto reject_redirect;
 	}
 
-	n = ipv4_neigh_lookup(&rt->dst, NULL, &new_gw);
+	n = __ipv4_neigh_lookup(rt->dst.dev, new_gw);
+	if (!n)
+		n = neigh_create(&arp_tbl, &new_gw, rt->dst.dev);
 	if (!IS_ERR(n)) {
 		if (!(n->nud_state & NUD_VALID)) {
 			neigh_event_send(n, NULL);
-- 
2.7.4
[PATCH net] Fixes: 5943634fc559 ("ipv4: Maintain redirect and PMTU info in struct rtable again.")
ICMP redirect behavior is different after the commit above. An email
asking why the behavior needs to be different was sent earlier to netdev
(https://patchwork.ozlabs.org/patch/687728/). Since there hasn't been a
reply yet, I decided to prepare this formal patch.

In the v2.6 kernel, ip_rt_redirect() calls arp_bind_neighbour(), which
returns 0, and then the state of the neigh for the new_gw is checked. If
the state isn't valid, the redirected route is deleted. This behavior is
maintained up to v3.5.7 by check_peer_redir() because rt->rt_gateway is
set to peer->redirect_learned.a4 before calling ipv4_neigh_lookup().

After the commit, ipv4_neigh_lookup() is performed without rt_gateway
being set to the new_gw. When rt_gateway (old_gw) isn't zero, the
function uses it as the key. The neigh is most likely valid since the
old_gw is the one that sends the ICMP redirect message. Then the new_gw
is assigned to fib_nh_exception. The problem is: the ARP entry for the
new_gw may never get resolved, and the traffic is blackholed.

Signed-off-by: Stephen Suryaputra Lin <ssu...@ieee.org>
---
 net/ipv4/route.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 62d4d90c1389..510045cefcab 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -753,7 +753,9 @@ static void __ip_do_redirect(struct rtable *rt, struct sk_buff *skb, struct flow
 			goto reject_redirect;
 	}
 
+	rt->rt_gateway = 0;
 	n = ipv4_neigh_lookup(&rt->dst, NULL, &new_gw);
+	rt->rt_gateway = old_gw;
 	if (!IS_ERR(n)) {
 		if (!(n->nud_state & NUD_VALID)) {
 			neigh_event_send(n, NULL);
-- 
2.7.4
ICMP redirects behavior
Hi, All,

I noticed through code inspection that ICMP redirect behavior is
different after commit 5943634fc5592037db0693b261f7f4bea6bb9457.

In the v2.6 kernel, ip_rt_redirect() calls arp_bind_neighbour(), which
returns 0, and then the state of the neigh for the new_gw is checked. If
the state isn't valid, the redirected route is deleted. From what I can
tell, this behavior is maintained up to v3.5.7 by check_peer_redir()
because rt->rt_gateway is set to peer->redirect_learned.a4 before
calling ipv4_neigh_lookup().

After the commit, ipv4_neigh_lookup() is performed without rt_gateway
being set to the new_gw. In my case, since rt_gateway (old_gw) isn't
zero, the function uses it as the key. The neigh is valid since that
gateway is the one that sends the ICMP redirect message. Then the new_gw
is assigned. The problem is: the ARP entry for the new_gw never gets
resolved, and the traffic is blackholed. My version is v3.18.24.

Is there a justification for this behavioral change? I traced the origin
of the code to v2.1.15, where the check is performed when
rfc1620_redirects is set.

I propose the following patch to restore the previous behavior.

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 62d4d90..510045c 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -753,7 +753,9 @@ static void __ip_do_redirect(struct rtable *rt, struct sk_buff *skb, struct flow
 			goto reject_redirect;
 	}
 
+	rt->rt_gateway = 0;
 	n = ipv4_neigh_lookup(&rt->dst, NULL, &new_gw);
+	rt->rt_gateway = old_gw;
 	if (!IS_ERR(n)) {
 		if (!(n->nud_state & NUD_VALID)) {
 			neigh_event_send(n, NULL);

Regards,
Stephen.