Re: net/arp: ARP cache aging failed.
On 2016/12/15 4:15, Julian Anastasov wrote: > > Hello, > > On Wed, 14 Dec 2016, YueHaibing wrote: > >> On 2016/11/26 4:40, Julian Anastasov wrote: >>> >>> So, the idea is to move TCP and other similar >>> users to the new dst_confirm_sk() method. If other >>> dst_confirm() users are left, they should be checked >>> if dsts with rt_gateway = 0 can be wrongly used. >> >> Sorry for so late. > > In fact, I'm late too because I almost finished > my changes, the only remaining part is the cxgb files... > >> Based on your ideas, I make a patch and test it for a while. > > The problem is that it is valid only for TCP. > Also, this flag should be reset sometimes, eg. when sk dst > changes... > >> It works for me. >> >> @@ -847,7 +847,7 @@ static int ping_v4_sendmsg(struct sock *sk, struct >> msghdr *msg, size_t len) >> return err; >> >> do_confirm: >> -dst_confirm(>dst); >> +dst_confirm_sk(sk); > > MSG_CONFIRM from sendmsg needs special treatment. The > problem is that UDP sending does not lock the socket, so I also > added a skb flag to handle this situation in ip*_append_data. > We do not want threaded application firing at different > destinations to confirm the wrong neighbour. MSG_PROBE is > another issue, the XFRM dst chaining, etc... > > I hope, I'll be ready this weekend with few patches > that change all dst_confirm users... There I'll explain > everything in detail. > > Regards > > -- > Julian Anastasov> > . > Great, I'll be glad to do a testing.
Re: net/arp: ARP cache aging failed.
Hello, On Wed, 14 Dec 2016, YueHaibing wrote: > On 2016/11/26 4:40, Julian Anastasov wrote: > > > > So, the idea is to move TCP and other similar > > users to the new dst_confirm_sk() method. If other > > dst_confirm() users are left, they should be checked > > if dsts with rt_gateway = 0 can be wrongly used. > > Sorry for so late. In fact, I'm late too because I almost finished my changes, the only remaining part is the cxgb files... > Based on your ideas, I make a patch and test it for a while. The problem is that it is valid only for TCP. Also, this flag should be reset sometimes, eg. when sk dst changes... > It works for me. > > @@ -847,7 +847,7 @@ static int ping_v4_sendmsg(struct sock *sk, struct msghdr > *msg, size_t len) > return err; > > do_confirm: > - dst_confirm(>dst); > + dst_confirm_sk(sk); MSG_CONFIRM from sendmsg needs special treatment. The problem is that UDP sending does not lock the socket, so I also added a skb flag to handle this situation in ip*_append_data. We do not want threaded application firing at different destinations to confirm the wrong neighbour. MSG_PROBE is another issue, the XFRM dst chaining, etc... I hope, I'll be ready this weekend with few patches that change all dst_confirm users... There I'll explain everything in detail. Regards -- Julian Anastasov
Re: net/arp: ARP cache aging failed.
On 2016/11/26 4:40, Julian Anastasov wrote: > > Hello, > > On Fri, 25 Nov 2016, Hannes Frederic Sowa wrote: > >> On 25.11.2016 09:18, Julian Anastasov wrote: >>> >>> Another option would be to add similar bit to >>> struct sock (sk_dst_pending_confirm), may be ip_finish_output2 >>> can propagate it to dst_neigh_output via new arg and then to >>> reset it. >> >> I don't understand how this can help? Maybe I understood it wrong? > > The idea is, the indication from highest level (TCP) to > be propagated to lowest level (neigh). > >>> Or to change n->confirmed via some new inline sock >>> function instead of changing dst_neigh_output. At this place >>> skb->sk is optional, may be we need (skb->sk && dst == >>> skb->sk->sk_dst_cache) check. Also, when sk_dst_cache changes >>> we should clear this flag. >> >> In "(skb->sk && dst == skb->sk->sk_dst_cache)" where does dst come from? > > This is in case we do not want to propagate indication > from TCP to tunnels, see below... > >> I don't see a possibility besides using mac_header or inner_mac_header >> to look up the incoming MAC address and confirm that one in the neighbor >> cache so far (we could try to optimize this case for rt_gateway though). > > My idea is as follows: > > - TCP instead of calling dst_confirm should call new func > dst_confirm_sk(sk) where sk->sk_dst_pending_confirm is set, > not dst->pending_confirm. > > - ip_finish_output2: use skb->sk (TCP) to check for > sk_dst_pending_confirm and update n->confirmed in some > new inline func > > Correct me if I'm wrong but here is how I see the > situation: > > - TCP caches output dst in socket, for example, dst1, > sets it as skb->dst > > - if xfrm tunnels are used then dst1 can point to some tunnel, > i.e. it is not in all cases the same skb->dst, as seen by > ip_finish_output2 > > - netfilter can DNAT and assign different skb->dst > > - as result, before now, dst1->pending_confirm is set but > it is not used properly because ip_finish_output2 uses > the final skb->dst which can be different or with > rt_gateway = 0 > > - considering that tunnels do not use dst_confirm, > n->confirmed is not changed every time. There are some > interesting cases: > > 1. both dst1 and the final skb->dst point to same > dst with rt_gateway = 0: n->confirmed is updated but > as wee see it can be for wrong neigh. > > 2. dst1 is different from skb->dst, so n->confirmed is > not changed. This can happen on DNAT or when using > tunnel to secure gateway. > > - in ip_finish_output2 we have skb->sk (original TCP sk) and > sk arg (equal to skb->sk or second option is sock of some UDP > tunnel, etc). The idea is to use skb->sk->sk_dst_pending_confirm, > i.e. highest level sk because the sk arg, if different, does not > have such flag set (tunnels do not call dst_confirm). > > - ip_finish_output2 should call new func dst_neigh_confirm_sk > (which has nothing related to dst, hm... better name is needed): > > if (!IS_ERR(neigh)) { > - int res = dst_neigh_output(dst, neigh, skb); > + int res; > + > + dst_neigh_confirm_sk(skb->sk, neigh); > + res = dst_neigh_output(dst, neigh, skb); > > which should do something like this: > > if (sk && sk->sk_dst_pending_confirm) { > unsigned long now = jiffies; > > sk->sk_dst_pending_confirm = 0; > /* avoid dirtying neighbour */ > if (n->confirmed != now) > n->confirmed = now; > } > > Additional dst == skb->sk->sk_dst_cache check will > not propagate the indication on DNAT/tunnel. For me, > it is better without such check. > > So, the idea is to move TCP and other similar > users to the new dst_confirm_sk() method. If other > dst_confirm() users are left, they should be checked > if dsts with rt_gateway = 0 can be wrongly used. > > Regards > > -- > Julian Anastasov> > . > Sorry for so late. Based on your ideas, I make a patch and test it for a while. It works for me. >From dc033fe4bac8931590e15774a298f04ccea8ed4c Mon Sep 17 00:00:00 2001 From: YueHabing Date: Wed, 14 Dec 2016 02:43:51 -0500 Subject: [PATCH] arp: do neigh confirm based on sk arg Signed-off-by: YueHabing --- include/net/sock.h | 18 ++ net/ipv4/ip_output.c | 5 - net/ipv4/ping.c| 2 +- net/ipv4/raw.c | 2 +- net/ipv4/tcp_input.c | 8 ++-- net/ipv4/tcp_metrics.c | 5 +++-- net/ipv4/udp.c | 2 +- net/ipv6/raw.c | 2 +- net/ipv6/route.c | 6 +++--- net/ipv6/udp.c | 2 +- 10 files changed, 35 insertions(+), 17 deletions(-) diff --git a/include/net/sock.h b/include/net/sock.h index 92b2697..ece8dfa 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -434,6 +434,7 @@ struct sock { struct sk_buff *sk_send_head; __s32
Re: net/arp: ARP cache aging failed.
Hello, On Fri, 25 Nov 2016, Hannes Frederic Sowa wrote: > On 25.11.2016 09:18, Julian Anastasov wrote: > > > > Another option would be to add similar bit to > > struct sock (sk_dst_pending_confirm), may be ip_finish_output2 > > can propagate it to dst_neigh_output via new arg and then to > > reset it. > > I don't understand how this can help? Maybe I understood it wrong? The idea is, the indication from highest level (TCP) to be propagated to lowest level (neigh). > > Or to change n->confirmed via some new inline sock > > function instead of changing dst_neigh_output. At this place > > skb->sk is optional, may be we need (skb->sk && dst == > > skb->sk->sk_dst_cache) check. Also, when sk_dst_cache changes > > we should clear this flag. > > In "(skb->sk && dst == skb->sk->sk_dst_cache)" where does dst come from? This is in case we do not want to propagate indication from TCP to tunnels, see below... > I don't see a possibility besides using mac_header or inner_mac_header > to look up the incoming MAC address and confirm that one in the neighbor > cache so far (we could try to optimize this case for rt_gateway though). My idea is as follows: - TCP instead of calling dst_confirm should call new func dst_confirm_sk(sk) where sk->sk_dst_pending_confirm is set, not dst->pending_confirm. - ip_finish_output2: use skb->sk (TCP) to check for sk_dst_pending_confirm and update n->confirmed in some new inline func Correct me if I'm wrong but here is how I see the situation: - TCP caches output dst in socket, for example, dst1, sets it as skb->dst - if xfrm tunnels are used then dst1 can point to some tunnel, i.e. it is not in all cases the same skb->dst, as seen by ip_finish_output2 - netfilter can DNAT and assign different skb->dst - as result, before now, dst1->pending_confirm is set but it is not used properly because ip_finish_output2 uses the final skb->dst which can be different or with rt_gateway = 0 - considering that tunnels do not use dst_confirm, n->confirmed is not changed every time. There are some interesting cases: 1. both dst1 and the final skb->dst point to same dst with rt_gateway = 0: n->confirmed is updated but as wee see it can be for wrong neigh. 2. dst1 is different from skb->dst, so n->confirmed is not changed. This can happen on DNAT or when using tunnel to secure gateway. - in ip_finish_output2 we have skb->sk (original TCP sk) and sk arg (equal to skb->sk or second option is sock of some UDP tunnel, etc). The idea is to use skb->sk->sk_dst_pending_confirm, i.e. highest level sk because the sk arg, if different, does not have such flag set (tunnels do not call dst_confirm). - ip_finish_output2 should call new func dst_neigh_confirm_sk (which has nothing related to dst, hm... better name is needed): if (!IS_ERR(neigh)) { - int res = dst_neigh_output(dst, neigh, skb); + int res; + + dst_neigh_confirm_sk(skb->sk, neigh); + res = dst_neigh_output(dst, neigh, skb); which should do something like this: if (sk && sk->sk_dst_pending_confirm) { unsigned long now = jiffies; sk->sk_dst_pending_confirm = 0; /* avoid dirtying neighbour */ if (n->confirmed != now) n->confirmed = now; } Additional dst == skb->sk->sk_dst_cache check will not propagate the indication on DNAT/tunnel. For me, it is better without such check. So, the idea is to move TCP and other similar users to the new dst_confirm_sk() method. If other dst_confirm() users are left, they should be checked if dsts with rt_gateway = 0 can be wrongly used. Regards -- Julian Anastasov
Re: net/arp: ARP cache aging failed.
On 25.11.2016 09:18, Julian Anastasov wrote: > > Hello, > > On Thu, 24 Nov 2016, Hannes Frederic Sowa wrote: > >> I think some people are thinking about it already (me also ;) ). >> >> But it is not easy to come up with a solution. First of all, we need to >> look up the L2 address again in the neighbor cache and confirm the >> appropriate neighbor. Secondly we should only do that for packets which >> we can actually confirm (that means passing the TCP recv tests or some >> other kind of confirmation besides simply spamming the box etc). Also it >> needs to be fast. > > Another option would be to add similar bit to > struct sock (sk_dst_pending_confirm), may be ip_finish_output2 > can propagate it to dst_neigh_output via new arg and then to > reset it. I don't understand how this can help? Maybe I understood it wrong? > Or to change n->confirmed via some new inline sock > function instead of changing dst_neigh_output. At this place > skb->sk is optional, may be we need (skb->sk && dst == > skb->sk->sk_dst_cache) check. Also, when sk_dst_cache changes > we should clear this flag. In "(skb->sk && dst == skb->sk->sk_dst_cache)" where does dst come from? I don't see a possibility besides using mac_header or inner_mac_header to look up the incoming MAC address and confirm that one in the neighbor cache so far (we could try to optimize this case for rt_gateway though). Bye, Hannes
Re: net/arp: ARP cache aging failed.
Hello, On Thu, 24 Nov 2016, Hannes Frederic Sowa wrote: > I think some people are thinking about it already (me also ;) ). > > But it is not easy to come up with a solution. First of all, we need to > look up the L2 address again in the neighbor cache and confirm the > appropriate neighbor. Secondly we should only do that for packets which > we can actually confirm (that means passing the TCP recv tests or some > other kind of confirmation besides simply spamming the box etc). Also it > needs to be fast. Another option would be to add similar bit to struct sock (sk_dst_pending_confirm), may be ip_finish_output2 can propagate it to dst_neigh_output via new arg and then to reset it. Or to change n->confirmed via some new inline sock function instead of changing dst_neigh_output. At this place skb->sk is optional, may be we need (skb->sk && dst == skb->sk->sk_dst_cache) check. Also, when sk_dst_cache changes we should clear this flag. Regards -- Julian Anastasov
Re: net/arp: ARP cache aging failed.
On 24.11.2016 10:06, YueHaibing wrote: > On 2016/11/24 15:51, Julian Anastasov wrote: >> >> Hello, >> >> On Wed, 23 Nov 2016, Eric Dumazet wrote: >> >>> On Wed, 2016-11-23 at 15:37 +0100, Hannes Frederic Sowa wrote: >>> Irregardless about the question if bonding should keep the MAC address alive, a MAC address can certainly change below a TCP connection. >>> >>> Of course ;) >>> > > I configured bonding fail_over_mac=1 ,so the bonding MAC always be the MAC > address of the currently active slave. > dst_entry is 1:n to neigh_entry and as such we can end up confirming an aging neighbor while sending a reply with dst->pending_confirm set while the confirming packet actually came from a different neighbor. I agree with Julian, pending_confirm became useless in this way. >>> >>> Let's kill it then ;) >> >> It works for traffic via gateway. I now see that >> we can even avoid write in dst_confirm: >> >> if (!dst->pending_confirm) >> dst->pending_confirm = 1; >> >> because it is called by non-dup TCP ACKs. >> >> But for traffic to hosts on LAN we need different solution, >> i.e. for cached dsts with rt_gateway = 0 (last entry below). >> >> rt_uses_gateway rt_gateway DST_NOCACHE Description >> >> 1 nh_gw ANY Traffic via gateway >> 0 LAN_host 1 FLOWI_FLAG_KNOWN_NH (nexthop >>set by IPVS, hdrincl, xt_TEE) >> 0 0 0 1 dst for many subnet hosts >> >> Regards >> >> -- >> Julian Anastasov>> >> . >> > > As above,Is there a plan to fix the problem ? Should we just not call > dst_confirm > when in the case rt->rt_uses_gateway/DST_NOCACHE? I think some people are thinking about it already (me also ;) ). But it is not easy to come up with a solution. First of all, we need to look up the L2 address again in the neighbor cache and confirm the appropriate neighbor. Secondly we should only do that for packets which we can actually confirm (that means passing the TCP recv tests or some other kind of confirmation besides simply spamming the box etc). Also it needs to be fast. Bye, Hannes
Re: net/arp: ARP cache aging failed.
On 2016/11/24 15:51, Julian Anastasov wrote: > > Hello, > > On Wed, 23 Nov 2016, Eric Dumazet wrote: > >> On Wed, 2016-11-23 at 15:37 +0100, Hannes Frederic Sowa wrote: >> >>> Irregardless about the question if bonding should keep the MAC address >>> alive, a MAC address can certainly change below a TCP connection. >> >> Of course ;) >> I configured bonding fail_over_mac=1 ,so the bonding MAC always be the MAC address of the currently active slave. >>> >>> dst_entry is 1:n to neigh_entry and as such we can end up confirming an >>> aging neighbor while sending a reply with dst->pending_confirm set while >>> the confirming packet actually came from a different neighbor. >>> >>> I agree with Julian, pending_confirm became useless in this way. >> >> Let's kill it then ;) > > It works for traffic via gateway. I now see that > we can even avoid write in dst_confirm: > > if (!dst->pending_confirm) > dst->pending_confirm = 1; > > because it is called by non-dup TCP ACKs. > > But for traffic to hosts on LAN we need different solution, > i.e. for cached dsts with rt_gateway = 0 (last entry below). > > rt_uses_gateway rt_gateway DST_NOCACHE Description > > 1 nh_gw ANY Traffic via gateway > 0 LAN_host 1 FLOWI_FLAG_KNOWN_NH (nexthop >set by IPVS, hdrincl, xt_TEE) > 0 0 0 1 dst for many subnet hosts > > Regards > > -- > Julian Anastasov> > . > As above,Is there a plan to fix the problem ? Should we just not call dst_confirm when in the case rt->rt_uses_gateway/DST_NOCACHE?
Re: net/arp: ARP cache aging failed.
Hello, On Wed, 23 Nov 2016, Eric Dumazet wrote: > On Wed, 2016-11-23 at 15:37 +0100, Hannes Frederic Sowa wrote: > > > Irregardless about the question if bonding should keep the MAC address > > alive, a MAC address can certainly change below a TCP connection. > > Of course ;) > > > > > dst_entry is 1:n to neigh_entry and as such we can end up confirming an > > aging neighbor while sending a reply with dst->pending_confirm set while > > the confirming packet actually came from a different neighbor. > > > > I agree with Julian, pending_confirm became useless in this way. > > Let's kill it then ;) It works for traffic via gateway. I now see that we can even avoid write in dst_confirm: if (!dst->pending_confirm) dst->pending_confirm = 1; because it is called by non-dup TCP ACKs. But for traffic to hosts on LAN we need different solution, i.e. for cached dsts with rt_gateway = 0 (last entry below). rt_uses_gateway rt_gateway DST_NOCACHE Description 1 nh_gw ANY Traffic via gateway 0 LAN_host 1 FLOWI_FLAG_KNOWN_NH (nexthop set by IPVS, hdrincl, xt_TEE) 0 0 0 1 dst for many subnet hosts Regards -- Julian Anastasov
Re: net/arp: ARP cache aging failed.
On Wed, 2016-11-23 at 15:37 +0100, Hannes Frederic Sowa wrote: > Irregardless about the question if bonding should keep the MAC address > alive, a MAC address can certainly change below a TCP connection. Of course ;) > > dst_entry is 1:n to neigh_entry and as such we can end up confirming an > aging neighbor while sending a reply with dst->pending_confirm set while > the confirming packet actually came from a different neighbor. > > I agree with Julian, pending_confirm became useless in this way. Let's kill it then ;)
Re: net/arp: ARP cache aging failed.
On 23.11.2016 13:05, Eric Dumazet wrote: > On Wed, 2016-11-23 at 10:33 +0200, Julian Anastasov wrote: >> Hello, >> >> On Wed, 23 Nov 2016, yuehaibing wrote: >> >>> As to my topo,HOST1 and HOST3 share one route on HOST2, tcp connection >>> between HOST2 and HOST3 may call tcp_ack to set dst->pending_confirm. >>> >>> So dst_neigh_output may wrongly freshed n->confirmed which stands for >>> HOST1,however HOST1'MAC had been changed. >>> >>> The possibility of this occurred Significantly increases ,when ping and >>> TCP transaction are set the same processor affinity on the HOST2. >>> >>> It seems that the issue is brought in commit >>> 5110effee8fde2edfacac9cd12a9960ab2dc39ea ("net: Do delayed neigh >>> confirmation."). >> >> Bad news. Problem is not in delayed confirmation but >> in the mechanism to use same dst for different neighbours on >> LAN. We don't have a dst->neighbour reference anymore. >> >> For IPv4 this is related to rt->rt_uses_gateway but >> also to DST_NOCACHE. In the other cases we can not call >> dst_confirm, may be we should lookup the neigh entry instead. >> But we need a way to reduce such lookups on every packet, >> for example, by remembering in struct sock and checking if >> some bits of jiffies (at least 4-5) are changed from >> previous lookup. > > > I thought bonding would keep the MAC address 'alive'. I wonder about this, too. > If TCP packets are confirmed, this means the old MAC address is still > valid, what am I missing here ? Irregardless about the question if bonding should keep the MAC address alive, a MAC address can certainly change below a TCP connection. dst_entry is 1:n to neigh_entry and as such we can end up confirming an aging neighbor while sending a reply with dst->pending_confirm set while the confirming packet actually came from a different neighbor. I agree with Julian, pending_confirm became useless in this way. Bye, Hannes
Re: net/arp: ARP cache aging failed.
On Wed, 2016-11-23 at 10:33 +0200, Julian Anastasov wrote: > Hello, > > On Wed, 23 Nov 2016, yuehaibing wrote: > > > As to my topo,HOST1 and HOST3 share one route on HOST2, tcp connection > > between HOST2 and HOST3 may call tcp_ack to set dst->pending_confirm. > > > > So dst_neigh_output may wrongly freshed n->confirmed which stands for > > HOST1,however HOST1'MAC had been changed. > > > > The possibility of this occurred Significantly increases ,when ping and > > TCP transaction are set the same processor affinity on the HOST2. > > > > It seems that the issue is brought in commit > > 5110effee8fde2edfacac9cd12a9960ab2dc39ea ("net: Do delayed neigh > > confirmation."). > > Bad news. Problem is not in delayed confirmation but > in the mechanism to use same dst for different neighbours on > LAN. We don't have a dst->neighbour reference anymore. > > For IPv4 this is related to rt->rt_uses_gateway but > also to DST_NOCACHE. In the other cases we can not call > dst_confirm, may be we should lookup the neigh entry instead. > But we need a way to reduce such lookups on every packet, > for example, by remembering in struct sock and checking if > some bits of jiffies (at least 4-5) are changed from > previous lookup. I thought bonding would keep the MAC address 'alive'. If TCP packets are confirmed, this means the old MAC address is still valid, what am I missing here ?
Re: net/arp: ARP cache aging failed.
Hello, On Wed, 23 Nov 2016, yuehaibing wrote: > As to my topo,HOST1 and HOST3 share one route on HOST2, tcp connection > between HOST2 and HOST3 may call tcp_ack to set dst->pending_confirm. > > So dst_neigh_output may wrongly freshed n->confirmed which stands for > HOST1,however HOST1'MAC had been changed. > > The possibility of this occurred Significantly increases ,when ping and > TCP transaction are set the same processor affinity on the HOST2. > > It seems that the issue is brought in commit > 5110effee8fde2edfacac9cd12a9960ab2dc39ea ("net: Do delayed neigh > confirmation."). Bad news. Problem is not in delayed confirmation but in the mechanism to use same dst for different neighbours on LAN. We don't have a dst->neighbour reference anymore. For IPv4 this is related to rt->rt_uses_gateway but also to DST_NOCACHE. In the other cases we can not call dst_confirm, may be we should lookup the neigh entry instead. But we need a way to reduce such lookups on every packet, for example, by remembering in struct sock and checking if some bits of jiffies (at least 4-5) are changed from previous lookup. Regards -- Julian Anastasov
net/arp: ARP cache aging failed.
Hi, I've encountered a arp cache aging failed bug in 4.9 kernel.The topo is as follow: HOST1 - IP1 | Switch |IP2-| HOST2 | | - --Bonging | | | IP3 MAC1MAC2 | HOST3 | HOST1 have a bonding interface which including two NICs IP1:192.168.1.100/24 IP2:192.168.1.200/24 IP2:192.168.1.300/24 There are large numbers of TCP transaction between HOST2 and HOST3. The Host2 can ping HOST1 normally.However,It cannot ping after HOST1 bonding interface deactived a working NIC. on HOST2 ,use fowllow command: watch "ip -s neigh show|grep 192.168.1.100" I noticed the old HOST1 arp cache aging counter is gradually increased ,then reset before it reached 30.This process is repeated, thus the arp cache holding REACHABLE status.The new HOST1 MAC arp cache cannot been renewed ,and thus ping cannot sendto the correct HOST1 MAC. Then I found n->confirmed is freshed in dst_neigh_output while dst->pending_confirm is set to 1. include/net/dst.h static inline void dst_confirm(struct dst_entry *dst) { dst->pending_confirm = 1; } static inline int dst_neigh_output(struct dst_entry *dst, struct neighbour *n, struct sk_buff *skb) { const struct hh_cache *hh; if (dst->pending_confirm) { unsigned long now = jiffies; dst->pending_confirm = 0; /* avoid dirtying neighbour */ if (n->confirmed != now) n->confirmed = now; } ... } dst_confirm can be called by tcp_ack in net/ipv4/tcp_input.c. /* This routine deals with incoming acks, but not outgoing ones. */ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag) { . if ((flag & FLAG_FORWARD_PROGRESS) || !(flag & FLAG_NOT_DUP)) { struct dst_entry *dst = __sk_dst_get(sk); if (dst) dst_confirm(dst); } . } As to my topo,HOST1 and HOST3 share one route on HOST2, tcp connection between HOST2 and HOST3 may call tcp_ack to set dst->pending_confirm. So dst_neigh_output may wrongly freshed n->confirmed which stands for HOST1,however HOST1'MAC had been changed. The possibility of this occurred Significantly increases ,when ping and TCP transaction are set the same processor affinity on the HOST2. It seems that the issue is brought in commit 5110effee8fde2edfacac9cd12a9960ab2dc39ea ("net: Do delayed neigh confirmation.").