Re: [PATCH] [PATCH net] net: Do not hold the reference for the same sk_rx_dst.
On Thu, 2017-03-16 at 16:13 -0700, Cong Wang wrote: > On Thu, Mar 16, 2017 at 2:33 PM, Eric Dumazetwrote: > > Have you backported the redirect fix ? > > > > commit 45caeaa5ac0b4b11784ac6f932c0ad4c6b67cda0 > > > > Or other fixes that went very recently (pick David Miller net tree) > > Why the commit above is relevant here? It fixes a double-release, > while Kevin's case is a double-hold... Not to mention it sk_rx_dst > instead of sk_dst_cache, according to Kevin. Yes you are right. I was lazy, the other fix I wanted to mention was 02b2faaf0af1d85585f In any case I wanted to make sure the reported issue is not for some old kernels.
Re: [PATCH] [PATCH net] net: Do not hold the reference for the same sk_rx_dst.
On Thu, Mar 16, 2017 at 2:33 PM, Eric Dumazetwrote: > Have you backported the redirect fix ? > > commit 45caeaa5ac0b4b11784ac6f932c0ad4c6b67cda0 > > Or other fixes that went very recently (pick David Miller net tree) Why the commit above is relevant here? It fixes a double-release, while Kevin's case is a double-hold... Not to mention it sk_rx_dst instead of sk_dst_cache, according to Kevin.
Re: [PATCH] [PATCH net] net: Do not hold the reference for the same sk_rx_dst.
On Thu, 2017-03-16 at 01:08 -0700, Kevin Xu wrote: > In some rare cases, inet_sk_rx_dst_set() may be called multiple times > on the same dst, causing reference count leakage. Eventually, it > prevents net_device to be destroyed. The bug then manifested as > > unregister_netdevice: waiting for lo to become free. Usage count = 1 > > in the kernel log, preventing new network namespace creation. > > The patch works around the issue by checking whether the socket already > has the same dst set. > > Signed-off-by: Kevin Xu> --- > net/ipv4/tcp_ipv4.c | 11 --- > 1 file changed, 8 insertions(+), 3 deletions(-) > > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c > index 575e19d..f425c14 100644 > --- a/net/ipv4/tcp_ipv4.c > +++ b/net/ipv4/tcp_ipv4.c > @@ -1807,9 +1807,14 @@ void inet_sk_rx_dst_set(struct sock *sk, const struct > sk_buff *skb) > { > struct dst_entry *dst = skb_dst(skb); > > - if (dst && dst_hold_safe(dst)) { > - sk->sk_rx_dst = dst; > - inet_sk(sk)->rx_dst_ifindex = skb->skb_iif; > + if (dst) { > + if (unlikely(dst == sk->sk_rx_dst)) > + return; > + > + if (dst_hold_safe(dst)) { > + sk->sk_rx_dst = dst; > + inet_sk(sk)->rx_dst_ifindex = skb->skb_iif; > + } That should not be needed. Have you backported the redirect fix ? commit 45caeaa5ac0b4b11784ac6f932c0ad4c6b67cda0 Or other fixes that went very recently (pick David Miller net tree)
Re: [PATCH] [PATCH net] net: Do not hold the reference for the same sk_rx_dst.
On Thu, Mar 16, 2017 at 11:18 AM, Kaiwen Xuwrote: > On Thu, Mar 16, 2017 at 11:45:03AM +0100, Jakub Sitnicki wrote: >> On Thu, Mar 16, 2017 at 10:12 AM GMT, Kevin Xu wrote: >> > Do you mean the message looping endlessly? >> >> No, the message is emitted just once. Around 100 seconds after >> destroying a few namespaces. Occurs not so often, maybe once per ten >> runs. >> >> -Jakub >> > > I saw that happening from time to time during my test as well, I suspect > it was because some TCP sockets stays in either TCP_TIME_WAIT or > TCP_FIN_WAIT1. But eventually those sockets get destroyed and lo gets > deleted as well. But TIMEWAIT sockets are purged by inet_twsk_purge() during netns destroy, apparently before lo is destroyed. > > The patch was fixing an issue I am seeing, that the message gets looped > forever, and causing a deadlock on new network ns creation. > But as DaveM said, the race still could happen after your patch, the if check you add is not atomic at all. Also, we should already lock the sock at the time we call inet_sk_rx_dst_set(), but perhaps not for TCP_LISTEN case...
Re: [PATCH] [PATCH net] net: Do not hold the reference for the same sk_rx_dst.
On Thu, Mar 16, 2017 at 11:45:03AM +0100, Jakub Sitnicki wrote: > On Thu, Mar 16, 2017 at 10:12 AM GMT, Kevin Xu wrote: > > Do you mean the message looping endlessly? > > No, the message is emitted just once. Around 100 seconds after > destroying a few namespaces. Occurs not so often, maybe once per ten > runs. > > -Jakub > I saw that happening from time to time during my test as well, I suspect it was because some TCP sockets stays in either TCP_TIME_WAIT or TCP_FIN_WAIT1. But eventually those sockets get destroyed and lo gets deleted as well. The patch was fixing an issue I am seeing, that the message gets looped forever, and causing a deadlock on new network ns creation. Kevin > > If so, then I suppose it's a different bug. > > > > Kevin > > > >> On Mar 16, 2017, at 3:01 AM, Jakub Sitnickiwrote: > >> > >>> On Thu, Mar 16, 2017 at 08:08 AM GMT, Kevin Xu wrote: > >>> In some rare cases, inet_sk_rx_dst_set() may be called multiple times > >>> on the same dst, causing reference count leakage. Eventually, it > >>> prevents net_device to be destroyed. The bug then manifested as > >>> > >>> unregister_netdevice: waiting for lo to become free. Usage count = 1 > >>> > >>> in the kernel log, preventing new network namespace creation. > >>> > >>> The patch works around the issue by checking whether the socket already > >>> has the same dst set. > >>> > >>> Signed-off-by: Kevin Xu > >>> --- > >> > >> FWIW, with this patch applied I'm still sometimes seeing: > >> > >> [ 125.928095] unregister_netdevice: waiting for lo to become free. Usage > >> count = 1 > >> > >> -Jakub
Re: [PATCH] [PATCH net] net: Do not hold the reference for the same sk_rx_dst.
From: Kevin XuDate: Thu, 16 Mar 2017 01:08:30 -0700 > In some rare cases, inet_sk_rx_dst_set() may be called multiple times > on the same dst, causing reference count leakage. Eventually, it > prevents net_device to be destroyed. The bug then manifested as > > unregister_netdevice: waiting for lo to become free. Usage count = 1 > > in the kernel log, preventing new network namespace creation. > > The patch works around the issue by checking whether the socket already > has the same dst set. > > Signed-off-by: Kevin Xu You need to prevent this parallel execution of this function or use atomic compare-and-exchange to set the rx_dst in order to prevent the double refcounting. This patch by itself is even worse than a workaround, because depending upon how the compiler reloads values from memory, the problem can still occur.
Re: [PATCH] [PATCH net] net: Do not hold the reference for the same sk_rx_dst.
On Thu, Mar 16, 2017 at 10:12 AM GMT, Kevin Xu wrote: > Do you mean the message looping endlessly? No, the message is emitted just once. Around 100 seconds after destroying a few namespaces. Occurs not so often, maybe once per ten runs. -Jakub > If so, then I suppose it's a different bug. > > Kevin > >> On Mar 16, 2017, at 3:01 AM, Jakub Sitnickiwrote: >> >>> On Thu, Mar 16, 2017 at 08:08 AM GMT, Kevin Xu wrote: >>> In some rare cases, inet_sk_rx_dst_set() may be called multiple times >>> on the same dst, causing reference count leakage. Eventually, it >>> prevents net_device to be destroyed. The bug then manifested as >>> >>> unregister_netdevice: waiting for lo to become free. Usage count = 1 >>> >>> in the kernel log, preventing new network namespace creation. >>> >>> The patch works around the issue by checking whether the socket already >>> has the same dst set. >>> >>> Signed-off-by: Kevin Xu >>> --- >> >> FWIW, with this patch applied I'm still sometimes seeing: >> >> [ 125.928095] unregister_netdevice: waiting for lo to become free. Usage >> count = 1 >> >> -Jakub
Re: [PATCH] [PATCH net] net: Do not hold the reference for the same sk_rx_dst.
Do you mean the message looping endlessly? If so, then I suppose it's a different bug. Kevin > On Mar 16, 2017, at 3:01 AM, Jakub Sitnickiwrote: > >> On Thu, Mar 16, 2017 at 08:08 AM GMT, Kevin Xu wrote: >> In some rare cases, inet_sk_rx_dst_set() may be called multiple times >> on the same dst, causing reference count leakage. Eventually, it >> prevents net_device to be destroyed. The bug then manifested as >> >> unregister_netdevice: waiting for lo to become free. Usage count = 1 >> >> in the kernel log, preventing new network namespace creation. >> >> The patch works around the issue by checking whether the socket already >> has the same dst set. >> >> Signed-off-by: Kevin Xu >> --- > > FWIW, with this patch applied I'm still sometimes seeing: > > [ 125.928095] unregister_netdevice: waiting for lo to become free. Usage > count = 1 > > -Jakub
Re: [PATCH] [PATCH net] net: Do not hold the reference for the same sk_rx_dst.
On Thu, Mar 16, 2017 at 08:08 AM GMT, Kevin Xu wrote: > In some rare cases, inet_sk_rx_dst_set() may be called multiple times > on the same dst, causing reference count leakage. Eventually, it > prevents net_device to be destroyed. The bug then manifested as > > unregister_netdevice: waiting for lo to become free. Usage count = 1 > > in the kernel log, preventing new network namespace creation. > > The patch works around the issue by checking whether the socket already > has the same dst set. > > Signed-off-by: Kevin Xu> --- FWIW, with this patch applied I'm still sometimes seeing: [ 125.928095] unregister_netdevice: waiting for lo to become free. Usage count = 1 -Jakub
[PATCH] [PATCH net] net: Do not hold the reference for the same sk_rx_dst.
In some rare cases, inet_sk_rx_dst_set() may be called multiple times on the same dst, causing reference count leakage. Eventually, it prevents net_device to be destroyed. The bug then manifested as unregister_netdevice: waiting for lo to become free. Usage count = 1 in the kernel log, preventing new network namespace creation. The patch works around the issue by checking whether the socket already has the same dst set. Signed-off-by: Kevin Xu--- net/ipv4/tcp_ipv4.c | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 575e19d..f425c14 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -1807,9 +1807,14 @@ void inet_sk_rx_dst_set(struct sock *sk, const struct sk_buff *skb) { struct dst_entry *dst = skb_dst(skb); - if (dst && dst_hold_safe(dst)) { - sk->sk_rx_dst = dst; - inet_sk(sk)->rx_dst_ifindex = skb->skb_iif; + if (dst) { + if (unlikely(dst == sk->sk_rx_dst)) + return; + + if (dst_hold_safe(dst)) { + sk->sk_rx_dst = dst; + inet_sk(sk)->rx_dst_ifindex = skb->skb_iif; + } } } EXPORT_SYMBOL(inet_sk_rx_dst_set); -- 1.9.1