Re: [PATCH] [PATCH net] net: Do not hold the reference for the same sk_rx_dst.

2017-03-16 Thread Eric Dumazet
On Thu, 2017-03-16 at 16:13 -0700, Cong Wang wrote:
> On Thu, Mar 16, 2017 at 2:33 PM, Eric Dumazet  wrote:
> > Have you backported the redirect fix ?
> >
> > commit 45caeaa5ac0b4b11784ac6f932c0ad4c6b67cda0
> >
> > Or other fixes that went very recently (pick David Miller net tree)
> 
> Why the commit above is relevant here? It fixes a double-release,
> while Kevin's case is a double-hold... Not to mention it sk_rx_dst
> instead of sk_dst_cache, according to Kevin.

Yes you are right.

I was lazy, the other fix I wanted to mention was 
02b2faaf0af1d85585f

In any case I wanted to make sure the reported issue is not for some old
kernels.





Re: [PATCH] [PATCH net] net: Do not hold the reference for the same sk_rx_dst.

2017-03-16 Thread Cong Wang
On Thu, Mar 16, 2017 at 2:33 PM, Eric Dumazet  wrote:
> Have you backported the redirect fix ?
>
> commit 45caeaa5ac0b4b11784ac6f932c0ad4c6b67cda0
>
> Or other fixes that went very recently (pick David Miller net tree)

Why the commit above is relevant here? It fixes a double-release,
while Kevin's case is a double-hold... Not to mention it sk_rx_dst
instead of sk_dst_cache, according to Kevin.


Re: [PATCH] [PATCH net] net: Do not hold the reference for the same sk_rx_dst.

2017-03-16 Thread Eric Dumazet
On Thu, 2017-03-16 at 01:08 -0700, Kevin Xu wrote:
> In some rare cases, inet_sk_rx_dst_set() may be called multiple times
> on the same dst, causing reference count leakage. Eventually, it
> prevents net_device to be destroyed. The bug then manifested as
> 
> unregister_netdevice: waiting for lo to become free. Usage count = 1
> 
> in the kernel log, preventing new network namespace creation.
> 
> The patch works around the issue by checking whether the socket already
> has the same dst set.
> 
> Signed-off-by: Kevin Xu 
> ---
>  net/ipv4/tcp_ipv4.c | 11 ---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 575e19d..f425c14 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1807,9 +1807,14 @@ void inet_sk_rx_dst_set(struct sock *sk, const struct 
> sk_buff *skb)
>  {
>   struct dst_entry *dst = skb_dst(skb);
>  
> - if (dst && dst_hold_safe(dst)) {
> - sk->sk_rx_dst = dst;
> - inet_sk(sk)->rx_dst_ifindex = skb->skb_iif;
> + if (dst) {
> + if (unlikely(dst == sk->sk_rx_dst))
> + return;
> +
> + if (dst_hold_safe(dst)) {
> + sk->sk_rx_dst = dst;
> + inet_sk(sk)->rx_dst_ifindex = skb->skb_iif;
> + }


That should not be needed.

Have you backported the redirect fix ?

commit 45caeaa5ac0b4b11784ac6f932c0ad4c6b67cda0

Or other fixes that went very recently (pick David Miller net tree)





Re: [PATCH] [PATCH net] net: Do not hold the reference for the same sk_rx_dst.

2017-03-16 Thread Cong Wang
On Thu, Mar 16, 2017 at 11:18 AM, Kaiwen Xu  wrote:
> On Thu, Mar 16, 2017 at 11:45:03AM +0100, Jakub Sitnicki wrote:
>> On Thu, Mar 16, 2017 at 10:12 AM GMT, Kevin Xu wrote:
>> > Do you mean the message looping endlessly?
>>
>> No, the message is emitted just once. Around 100 seconds after
>> destroying a few namespaces. Occurs not so often, maybe once per ten
>> runs.
>>
>> -Jakub
>>
>
> I saw that happening from time to time during my test as well, I suspect
> it was because some TCP sockets stays in either TCP_TIME_WAIT or
> TCP_FIN_WAIT1. But eventually those sockets get destroyed and lo gets
> deleted as well.

But TIMEWAIT sockets are purged by inet_twsk_purge() during netns
destroy, apparently before lo is destroyed.

>
> The patch was fixing an issue I am seeing, that the message gets looped
> forever, and causing a deadlock on new network ns creation.
>

But as DaveM said, the race still could happen after your patch,
the if check you add is not atomic at all.

Also, we should already lock the sock at the time we call
inet_sk_rx_dst_set(), but perhaps not for TCP_LISTEN case...


Re: [PATCH] [PATCH net] net: Do not hold the reference for the same sk_rx_dst.

2017-03-16 Thread Kaiwen Xu
On Thu, Mar 16, 2017 at 11:45:03AM +0100, Jakub Sitnicki wrote:
> On Thu, Mar 16, 2017 at 10:12 AM GMT, Kevin Xu wrote:
> > Do you mean the message looping endlessly?
> 
> No, the message is emitted just once. Around 100 seconds after
> destroying a few namespaces. Occurs not so often, maybe once per ten
> runs.
> 
> -Jakub
> 

I saw that happening from time to time during my test as well, I suspect
it was because some TCP sockets stays in either TCP_TIME_WAIT or
TCP_FIN_WAIT1. But eventually those sockets get destroyed and lo gets
deleted as well.

The patch was fixing an issue I am seeing, that the message gets looped
forever, and causing a deadlock on new network ns creation.

Kevin

> > If so, then I suppose it's a different bug.
> >
> > Kevin
> >
> >> On Mar 16, 2017, at 3:01 AM, Jakub Sitnicki  wrote:
> >>
> >>> On Thu, Mar 16, 2017 at 08:08 AM GMT, Kevin Xu wrote:
> >>> In some rare cases, inet_sk_rx_dst_set() may be called multiple times
> >>> on the same dst, causing reference count leakage. Eventually, it
> >>> prevents net_device to be destroyed. The bug then manifested as
> >>>
> >>> unregister_netdevice: waiting for lo to become free. Usage count = 1
> >>>
> >>> in the kernel log, preventing new network namespace creation.
> >>>
> >>> The patch works around the issue by checking whether the socket already
> >>> has the same dst set.
> >>>
> >>> Signed-off-by: Kevin Xu 
> >>> ---
> >>
> >> FWIW, with this patch applied I'm still sometimes seeing:
> >>
> >> [  125.928095] unregister_netdevice: waiting for lo to become free. Usage 
> >> count = 1
> >>
> >> -Jakub


Re: [PATCH] [PATCH net] net: Do not hold the reference for the same sk_rx_dst.

2017-03-16 Thread David Miller
From: Kevin Xu 
Date: Thu, 16 Mar 2017 01:08:30 -0700

> In some rare cases, inet_sk_rx_dst_set() may be called multiple times
> on the same dst, causing reference count leakage. Eventually, it
> prevents net_device to be destroyed. The bug then manifested as
> 
> unregister_netdevice: waiting for lo to become free. Usage count = 1
> 
> in the kernel log, preventing new network namespace creation.
> 
> The patch works around the issue by checking whether the socket already
> has the same dst set.
> 
> Signed-off-by: Kevin Xu 

You need to prevent this parallel execution of this function or use
atomic compare-and-exchange to set the rx_dst in order to prevent
the double refcounting.

This patch by itself is even worse than a workaround, because depending
upon how the compiler reloads values from memory, the problem can still
occur.


Re: [PATCH] [PATCH net] net: Do not hold the reference for the same sk_rx_dst.

2017-03-16 Thread Jakub Sitnicki
On Thu, Mar 16, 2017 at 10:12 AM GMT, Kevin Xu wrote:
> Do you mean the message looping endlessly?

No, the message is emitted just once. Around 100 seconds after
destroying a few namespaces. Occurs not so often, maybe once per ten
runs.

-Jakub

> If so, then I suppose it's a different bug.
>
> Kevin
>
>> On Mar 16, 2017, at 3:01 AM, Jakub Sitnicki  wrote:
>>
>>> On Thu, Mar 16, 2017 at 08:08 AM GMT, Kevin Xu wrote:
>>> In some rare cases, inet_sk_rx_dst_set() may be called multiple times
>>> on the same dst, causing reference count leakage. Eventually, it
>>> prevents net_device to be destroyed. The bug then manifested as
>>>
>>> unregister_netdevice: waiting for lo to become free. Usage count = 1
>>>
>>> in the kernel log, preventing new network namespace creation.
>>>
>>> The patch works around the issue by checking whether the socket already
>>> has the same dst set.
>>>
>>> Signed-off-by: Kevin Xu 
>>> ---
>>
>> FWIW, with this patch applied I'm still sometimes seeing:
>>
>> [  125.928095] unregister_netdevice: waiting for lo to become free. Usage 
>> count = 1
>>
>> -Jakub


Re: [PATCH] [PATCH net] net: Do not hold the reference for the same sk_rx_dst.

2017-03-16 Thread Kevin Xu
Do you mean the message looping endlessly?

If so, then I suppose it's a different bug.

Kevin

> On Mar 16, 2017, at 3:01 AM, Jakub Sitnicki  wrote:
> 
>> On Thu, Mar 16, 2017 at 08:08 AM GMT, Kevin Xu wrote:
>> In some rare cases, inet_sk_rx_dst_set() may be called multiple times
>> on the same dst, causing reference count leakage. Eventually, it
>> prevents net_device to be destroyed. The bug then manifested as
>> 
>> unregister_netdevice: waiting for lo to become free. Usage count = 1
>> 
>> in the kernel log, preventing new network namespace creation.
>> 
>> The patch works around the issue by checking whether the socket already
>> has the same dst set.
>> 
>> Signed-off-by: Kevin Xu 
>> ---
> 
> FWIW, with this patch applied I'm still sometimes seeing:
> 
> [  125.928095] unregister_netdevice: waiting for lo to become free. Usage 
> count = 1
> 
> -Jakub


Re: [PATCH] [PATCH net] net: Do not hold the reference for the same sk_rx_dst.

2017-03-16 Thread Jakub Sitnicki
On Thu, Mar 16, 2017 at 08:08 AM GMT, Kevin Xu wrote:
> In some rare cases, inet_sk_rx_dst_set() may be called multiple times
> on the same dst, causing reference count leakage. Eventually, it
> prevents net_device to be destroyed. The bug then manifested as
>
> unregister_netdevice: waiting for lo to become free. Usage count = 1
>
> in the kernel log, preventing new network namespace creation.
>
> The patch works around the issue by checking whether the socket already
> has the same dst set.
>
> Signed-off-by: Kevin Xu 
> ---

FWIW, with this patch applied I'm still sometimes seeing:

[  125.928095] unregister_netdevice: waiting for lo to become free. Usage count 
= 1

-Jakub


[PATCH] [PATCH net] net: Do not hold the reference for the same sk_rx_dst.

2017-03-16 Thread Kevin Xu
In some rare cases, inet_sk_rx_dst_set() may be called multiple times
on the same dst, causing reference count leakage. Eventually, it
prevents net_device to be destroyed. The bug then manifested as

unregister_netdevice: waiting for lo to become free. Usage count = 1

in the kernel log, preventing new network namespace creation.

The patch works around the issue by checking whether the socket already
has the same dst set.

Signed-off-by: Kevin Xu 
---
 net/ipv4/tcp_ipv4.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 575e19d..f425c14 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1807,9 +1807,14 @@ void inet_sk_rx_dst_set(struct sock *sk, const struct 
sk_buff *skb)
 {
struct dst_entry *dst = skb_dst(skb);
 
-   if (dst && dst_hold_safe(dst)) {
-   sk->sk_rx_dst = dst;
-   inet_sk(sk)->rx_dst_ifindex = skb->skb_iif;
+   if (dst) {
+   if (unlikely(dst == sk->sk_rx_dst))
+   return;
+
+   if (dst_hold_safe(dst)) {
+   sk->sk_rx_dst = dst;
+   inet_sk(sk)->rx_dst_ifindex = skb->skb_iif;
+   }
}
 }
 EXPORT_SYMBOL(inet_sk_rx_dst_set);
-- 
1.9.1