Faster TCP keepalive

2017-06-23 Thread Stephen Suryaputra Lin
Greetings,

I'm writing to ask whether there have been any thoughts or efforts toward
allowing a sub-second TCP keepalive interval. One application is TCP
connections between IP hosts connected by an internal backplane, where
faster failure detection is a necessity and the increased traffic can be
accommodated.

Suggestions on other ways to quickly tear down TCP connections to a
rebooted host in the application above are welcome.

Thank you,

Stephen.
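
For context on the question above, here is a minimal sketch of the knobs Linux already exposes, assuming a Linux host: the keepalive socket options (TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT) take whole seconds, while TCP_USER_TIMEOUT takes milliseconds. The helper name set_fast_failure_detection() and the chosen values are hypothetical, and whether this combination meets a sub-second detection target is not something the sketch establishes.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (not from the thread): enable the most aggressive
 * keepalive that whole-second options allow, and set TCP_USER_TIMEOUT,
 * which is expressed in milliseconds.
 * Returns 0 on success, -1 if any setsockopt() call fails.
 */
static int set_fast_failure_detection(int fd, unsigned int user_timeout_ms)
{
	int on = 1;
	int idle_s = 1;   /* seconds of idle time before the first probe */
	int intvl_s = 1;  /* seconds between probes */
	int cnt = 3;      /* unanswered probes before the connection is dropped */

	assert(fd >= 0);

	if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s, sizeof(intvl_s)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) < 0)
		return -1;

	/* Upper bound, in milliseconds, on how long sent data may stay unacknowledged. */
	if (setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
		       &user_timeout_ms, sizeof(user_timeout_ms)) < 0)
		return -1;

	return 0;
}
```

A caller would apply this to a connected socket, for example set_fast_failure_detection(fd, 500). The values 1/1/3 are the smallest sensible whole-second settings, which is exactly the granularity limit the question is about.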


[PATCH net,v2] ipv4: use new_gw for redirect neigh lookup

2016-11-10 Thread Stephen Suryaputra Lin
In v2.6, ip_rt_redirect() calls arp_bind_neighbour(), which returns 0,
and then the state of the neigh for the new_gw is checked. If the state
isn't valid, the redirected route is deleted. This behavior is
maintained up to v3.5.7 by check_peer_redir(), because rt->rt_gateway
is set to peer->redirect_learned.a4 before calling
ipv4_neigh_lookup().

After commit 5943634fc559 ("ipv4: Maintain redirect and PMTU info in
struct rtable again."), ipv4_neigh_lookup() is performed without
rt_gateway first being set to the new_gw. When rt_gateway (old_gw)
isn't zero, the function uses it as the key. That neigh is most likely
valid, since the old_gw is the one that sent the ICMP redirect message.
Then the new_gw is assigned to fib_nh_exception. The problem is that
ARP for the new_gw may never get resolved, and the traffic is blackholed.

So, use the new_gw for neigh lookup.

Changes from v1:
 - use __ipv4_neigh_lookup instead (per Eric Dumazet).

Fixes: 5943634fc559 ("ipv4: Maintain redirect and PMTU info in struct rtable again.")
Signed-off-by: Stephen Suryaputra Lin <ssu...@ieee.org>
---
 net/ipv4/route.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 62d4d90c1389..2a57566e6e91 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -753,7 +753,9 @@ static void __ip_do_redirect(struct rtable *rt, struct sk_buff *skb, struct flow
 			goto reject_redirect;
 	}
 
-	n = ipv4_neigh_lookup(&rt->dst, NULL, &new_gw);
+	n = __ipv4_neigh_lookup(rt->dst.dev, new_gw);
+	if (!n)
+		n = neigh_create(&arp_tbl, &new_gw, rt->dst.dev);
 	if (!IS_ERR(n)) {
 		if (!(n->nud_state & NUD_VALID)) {
 			neigh_event_send(n, NULL);
-- 
2.7.4



Re: [PATCH net] Fixes: 5943634fc559 ("ipv4: Maintain redirect and PMTU info in struct rtable again.")

2016-11-07 Thread Stephen Suryaputra Lin
I modeled the temporary clearing and restoring of rt_gateway on the
deleted function check_peer_redir(). Looking at that function again,
though, the assignment of peer->redirect_learned.a4 to rt_gateway can be
permanent, because the old_gw is restored only on errors.

I have updated the patch to use __ipv4_neigh_lookup().

Thank you.

On Mon, Nov 07, 2016 at 11:20:16AM -0500, David Miller wrote:
> From: Eric Dumazet 
> Date: Mon, 07 Nov 2016 08:08:52 -0800
> 
> > In any case, rt is a shared object at that time, so even temporarily
> > clearing/restoring rt_gateway seems wrong to me.
> > 
> > I would rather call __ipv4_neigh_lookup(dst->dev, new_gw) directly at
> > this point.
> 
> Agreed.
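
The approach agreed on above (look up the neighbour by new_gw directly, without touching the shared route) can be illustrated with a small userspace model. This is only a sketch: toy_neigh, toy_route, toy_lookup(), toy_create() and toy_neigh_for_redirect() are hypothetical stand-ins, not kernel APIs, and the fixed-size table replaces the real neighbour table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_TABLE_SIZE 8

struct toy_neigh {
	uint32_t key;   /* IPv4 address this entry resolves */
	int valid;      /* nonzero once resolution has completed */
	int in_use;     /* slot is occupied */
};

struct toy_route {
	uint32_t gateway;   /* stands in for rt_gateway (the old_gw) */
};

static struct toy_neigh toy_table[TOY_TABLE_SIZE];

/* Return the entry for key, or NULL if there is none. */
static struct toy_neigh *toy_lookup(uint32_t key)
{
	for (size_t i = 0; i < TOY_TABLE_SIZE; i++)
		if (toy_table[i].in_use && toy_table[i].key == key)
			return &toy_table[i];
	return NULL;
}

/* Create a new, not yet resolved entry; NULL if the table is full. */
static struct toy_neigh *toy_create(uint32_t key)
{
	for (size_t i = 0; i < TOY_TABLE_SIZE; i++) {
		if (!toy_table[i].in_use) {
			toy_table[i].key = key;
			toy_table[i].valid = 0;
			toy_table[i].in_use = 1;
			return &toy_table[i];
		}
	}
	return NULL;
}

/*
 * Look up (or create) the neighbour keyed by new_gw. The route is taken
 * as const: the shared object is never modified, not even temporarily.
 */
static struct toy_neigh *toy_neigh_for_redirect(const struct toy_route *rt,
						uint32_t new_gw)
{
	struct toy_neigh *n;

	assert(rt != NULL);
	n = toy_lookup(new_gw);
	if (!n)
		n = toy_create(new_gw);
	return n;
}
```

Taking the route by const pointer makes the design choice explicit: the key comes from the redirect itself, so there is no need to write to a structure other contexts may be reading.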


[PATCH net,v2] Fixes: 5943634fc559 ("ipv4: Maintain redirect and PMTU info in struct rtable again.")

2016-11-07 Thread Stephen Suryaputra Lin
ICMP redirect behavior is different after the commit above. An email
asking why the behavior needs to be different was sent earlier to
netdev (https://patchwork.ozlabs.org/patch/687728/). Since there has
been no reply yet, I decided to prepare this formal patch.

In the v2.6 kernel, ip_rt_redirect() calls arp_bind_neighbour(), which
returns 0, and then the state of the neigh for the new_gw is checked. If
the state isn't valid, the redirected route is deleted. This behavior is
maintained up to v3.5.7 by check_peer_redir(), because rt->rt_gateway is
set to peer->redirect_learned.a4 before calling ipv4_neigh_lookup().

After the commit, ipv4_neigh_lookup() is performed without rt_gateway
first being set to the new_gw. When rt_gateway (old_gw) isn't zero, the
function uses it as the key. That neigh is most likely valid, since the
old_gw is the one that sent the ICMP redirect message. Then the new_gw is
assigned to fib_nh_exception. The problem is that ARP for the new_gw may
never get resolved, and the traffic is blackholed.

Changes from v1:
 - use __ipv4_neigh_lookup instead (per Eric Dumazet).

Signed-off-by: Stephen Suryaputra Lin <ssu...@ieee.org>
---
 net/ipv4/route.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 62d4d90c1389..2a57566e6e91 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -753,7 +753,9 @@ static void __ip_do_redirect(struct rtable *rt, struct sk_buff *skb, struct flow
 			goto reject_redirect;
 	}
 
-	n = ipv4_neigh_lookup(&rt->dst, NULL, &new_gw);
+	n = __ipv4_neigh_lookup(rt->dst.dev, new_gw);
+	if (!n)
+		n = neigh_create(&arp_tbl, &new_gw, rt->dst.dev);
 	if (!IS_ERR(n)) {
 		if (!(n->nud_state & NUD_VALID)) {
 			neigh_event_send(n, NULL);
-- 
2.7.4



[PATCH net] Fixes: 5943634fc559 ("ipv4: Maintain redirect and PMTU info in struct rtable again.")

2016-11-07 Thread Stephen Suryaputra Lin
ICMP redirect behavior is different after the commit above. An email
asking why the behavior needs to be different was sent earlier to
netdev (https://patchwork.ozlabs.org/patch/687728/). Since there has
been no reply yet, I decided to prepare this formal patch.

In the v2.6 kernel, ip_rt_redirect() calls arp_bind_neighbour(), which
returns 0, and then the state of the neigh for the new_gw is checked. If
the state isn't valid, the redirected route is deleted. This behavior is
maintained up to v3.5.7 by check_peer_redir(), because rt->rt_gateway is
set to peer->redirect_learned.a4 before calling ipv4_neigh_lookup().

After the commit, ipv4_neigh_lookup() is performed without rt_gateway
first being set to the new_gw. When rt_gateway (old_gw) isn't zero, the
function uses it as the key. That neigh is most likely valid, since the
old_gw is the one that sent the ICMP redirect message. Then the new_gw is
assigned to fib_nh_exception. The problem is that ARP for the new_gw may
never get resolved, and the traffic is blackholed.

Signed-off-by: Stephen Suryaputra Lin <ssu...@ieee.org>
---
 net/ipv4/route.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 62d4d90c1389..510045cefcab 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -753,7 +753,9 @@ static void __ip_do_redirect(struct rtable *rt, struct sk_buff *skb, struct flow
 			goto reject_redirect;
 	}
 
+	rt->rt_gateway = 0;
 	n = ipv4_neigh_lookup(&rt->dst, NULL, &new_gw);
+	rt->rt_gateway = old_gw;
 	if (!IS_ERR(n)) {
 		if (!(n->nud_state & NUD_VALID)) {
 			neigh_event_send(n, NULL);
-- 
2.7.4



ICMP redirects behavior

2016-10-27 Thread Stephen Suryaputra Lin
Hi, All,

I noticed through code inspection that ICMP redirects behavior is
different after commit 5943634fc5592037db0693b261f7f4bea6bb9457.

In the v2.6 kernel, ip_rt_redirect() calls arp_bind_neighbour(), which
returns 0, and then the state of the neigh for the new_gw is checked. If
the state isn't valid, the redirected route is deleted. From what I can
tell, this behavior is maintained up to v3.5.7 by check_peer_redir(),
because rt->rt_gateway is set to peer->redirect_learned.a4 before
calling ipv4_neigh_lookup().

After the commit, ipv4_neigh_lookup() is performed without rt_gateway
first being set to the new_gw. In my case, since rt_gateway (old_gw)
isn't zero, the function uses it as the key. The neigh is valid, since
that gateway is the one that sent the ICMP redirect message. Then the
new_gw is assigned. The problem is that ARP for the new_gw never gets
resolved and the traffic is blackholed. My version is v3.18.24.

Is there a justification for this behavioral change? I traced the origin
of the code to v2.1.15 where the check is performed when
rfc1620_redirects is set. I propose the following patch to restore the
previous behavior.

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 62d4d90..510045c 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -753,7 +753,9 @@ static void __ip_do_redirect(struct rtable *rt, struct sk_buff *skb, struct flow
 			goto reject_redirect;
 	}
 
+	rt->rt_gateway = 0;
 	n = ipv4_neigh_lookup(&rt->dst, NULL, &new_gw);
+	rt->rt_gateway = old_gw;
 	if (!IS_ERR(n)) {
 		if (!(n->nud_state & NUD_VALID)) {
 			neigh_event_send(n, NULL);

Regards,

Stephen.
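
The key-selection behavior described in this message can be modeled in a few lines. This is a simplified sketch of the behavior as described here, not the kernel's code; toy_neigh_key() and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of the lookup key choice described above: when the route already
 * has a nonzero gateway (the old_gw), that gateway is used as the key and
 * the address passed by the caller (the new_gw) is ignored.
 */
static uint32_t toy_neigh_key(uint32_t rt_gateway, uint32_t caller_key)
{
	assert(caller_key != 0);   /* the redirect always names a new gateway */
	return rt_gateway ? rt_gateway : caller_key;
}
```

With a nonzero rt_gateway the model returns the old_gw, so the state that gets checked belongs to the router that sent the redirect. With rt_gateway cleared to 0, as the proposed patch does around the lookup, it returns the new_gw.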