Re: [PATCH v5 2/2] sock: Move the socket inuse to namespace.

2017-12-09 Thread Cong Wang
On Fri, Dec 8, 2017 at 9:27 PM, Tonghao Zhang  wrote:
> On Sat, Dec 9, 2017 at 6:09 AM, Cong Wang  wrote:
>> On Thu, Dec 7, 2017 at 9:28 PM, Tonghao Zhang  
>> wrote:
>>>
>>> Release the netlink sock created in kernel(not hold the _net_ namespace):
>>>
>>
>> You can avoid counting kernel sock by testing 'kern' in sk_alloc()
>> and testing 'sk->sk_net_refcnt' in __sk_free().
> Hi Cong, if we do it this way, we will not count the socks created
> in the kernel, right?

Yes, it is not very useful for user-space to know how many kernel
sockets we create, IMHO, so not counting kernel sockets seems
fine.
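
For illustration, the suggestion could look roughly like the user-space
sketch below. It is not kernel code: struct net_model, struct sock_model,
sk_alloc_model() and sk_free_model() are hypothetical stand-ins, and a plain
int replaces the per-cpu counter. It only shows the symmetry between testing
'kern' at allocation time and 'sk_net_refcnt' at free time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Simplified stand-ins for the kernel structures; not the real definitions. */
struct net_model {
	int sock_inuse;		/* the kernel patch uses a per-cpu counter */
};

struct sock_model {
	struct net_model *net;
	bool sk_net_refcnt;	/* set only for user-space sockets */
};

/* Models sk_alloc(): kernel sockets (kern == true) are not counted. */
static struct sock_model *sk_alloc_model(struct net_model *net, bool kern)
{
	struct sock_model *sk = calloc(1, sizeof(*sk));

	if (!sk)
		return NULL;
	sk->net = net;
	sk->sk_net_refcnt = !kern;
	if (!kern)
		net->sock_inuse++;
	return sk;
}

/* Models __sk_free(): only undo the count if it was taken at allocation. */
static void sk_free_model(struct sock_model *sk)
{
	if (sk->sk_net_refcnt)
		sk->net->sock_inuse--;
	free(sk);
}
```

With this shape, a kernel socket freed late (for example from an RCU
callback) never touches the counter, so the counter can be released without
waiting for it.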


Re: [PATCH v5 2/2] sock: Move the socket inuse to namespace.

2017-12-08 Thread Tonghao Zhang
On Sat, Dec 9, 2017 at 6:09 AM, Cong Wang  wrote:
> On Thu, Dec 7, 2017 at 9:28 PM, Tonghao Zhang  
> wrote:
>>
>> Release the netlink sock created in kernel(not hold the _net_ namespace):
>>
>
> You can avoid counting kernel sock by testing 'kern' in sk_alloc()
> and testing 'sk->sk_net_refcnt' in __sk_free().
Hi Cong, if we do it this way, we will not count the socks created
in the kernel, right?


Re: [PATCH v5 2/2] sock: Move the socket inuse to namespace.

2017-12-08 Thread Tonghao Zhang
On Fri, Dec 8, 2017 at 9:24 PM, Eric Dumazet  wrote:
> On Fri, 2017-12-08 at 19:29 +0800, Tonghao Zhang wrote:
>> hi all. we can add synchronize_rcu and rcu_barrier in
>> sock_inuse_exit_net to
>> ensure there are no outstanding rcu callbacks using this network
>> namespace.
>> we will not have to test if net->core.sock_inuse is NULL or not from
>> sock_inuse_add(). :)
>>
>>  static void __net_exit sock_inuse_exit_net(struct net *net)
>>  {
>> free_percpu(net->core.prot_inuse);
>> +
>> +   synchronize_rcu();
>> +   rcu_barrier();
>> +
>> +   free_percpu(net->core.sock_inuse);
>>  }
>
>
> Oh well. Do you have any idea of the major problem this would add ?
>
> Try the following, before and after your patches :
>
> for i in `seq 1 40`
> do
>  (for j in `seq 1 100` ; do unshare -n /bin/true >/dev/null ; done) &
> done
> wait
>
> ( Check commit 8ca712c373a462cfa1b62272870b6c2c74aa83f9 )
>
Yes, I ran the test. The patches degrade performance (real time goes from 8m19s to 8m52s).
Before patch:
# time ./add_del_unshare.sh
net_namespace 97 125 6016 5 8 : tunables 0 0 0 : slabdata 25 25 0

real 8m19.665s
user 0m4.268s
sys 0m6.477s

After :
# time ./add_del_unshare.sh
net_namespace 102 130 6016 5 8 : tunables 0 0 0 : slabdata 26 26 0

real 8m52.563s
user 0m4.040s
sys 0m7.558s

>
> This is a complex problem, we won't accept patches that kill network
> namespace dismantling performance by adding brute-force
> synchronize_rcu() or rcu_barrier() calls.
>
> Why not free net->core.sock_inuse right before freeing net itself in
> net_free()?
I tried this approach: allocate core.sock_inuse in net_alloc() and free it in net_free().
It does not hurt performance, and we no longer have to check core.sock_inuse
in sock_inuse_add().

After :
# time ./add_del_unshare.sh
net_namespace 109 135 6016 5 8 : tunables 0 0 0 : slabdata 27 27 0

real 8m19.265s
user 0m4.090s
sys 0m8.185s
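
To make the lifetime argument concrete, here is a minimal user-space sketch
of the idea, not the actual v6 code. All names ending in _model are
hypothetical, and a single heap-allocated int stands in for the per-cpu
allocation. The counter is created with the namespace and released with it,
so an update arriving after the exit methods still finds a valid pointer.

```c
#include <assert.h>
#include <stdlib.h>

struct net_model {
	int *sock_inuse;	/* stands in for the per-cpu allocation */
};

/* Models net_alloc(): the counter is created together with the namespace. */
static struct net_model *net_alloc_model(void)
{
	struct net_model *net = calloc(1, sizeof(*net));

	if (!net)
		return NULL;
	net->sock_inuse = calloc(1, sizeof(*net->sock_inuse));
	if (!net->sock_inuse) {
		free(net);
		return NULL;
	}
	return net;
}

/* No NULL test: the pointer is valid for the whole lifetime of net. */
static void sock_inuse_add_model(struct net_model *net, int val)
{
	*net->sock_inuse += val;
}

/* Models a pernet exit method: it no longer touches sock_inuse. */
static void exit_net_model(struct net_model *net)
{
	(void)net;
}

/* Models net_free(): returns the final count so a caller can check for leaks. */
static int net_free_model(struct net_model *net)
{
	int left = *net->sock_inuse;

	free(net->sock_inuse);
	free(net);
	return left;
}
```

Returning the leftover count from net_free_model() is only a convenience of
the sketch; it mirrors the wish to verify that the count is 0 before the
counter goes away.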

> You do not have to hijack sock_inuse_exit_net() just because it has a
> misleading name.
>
>


Re: [PATCH v5 2/2] sock: Move the socket inuse to namespace.

2017-12-08 Thread Cong Wang
On Thu, Dec 7, 2017 at 9:28 PM, Tonghao Zhang  wrote:
>
> Release the netlink sock created in kernel(not hold the _net_ namespace):
>

You can avoid counting kernel sock by testing 'kern' in sk_alloc()
and testing 'sk->sk_net_refcnt' in __sk_free().


Re: [PATCH v5 2/2] sock: Move the socket inuse to namespace.

2017-12-08 Thread Eric Dumazet
On Fri, 2017-12-08 at 19:29 +0800, Tonghao Zhang wrote:
> hi all. we can add synchronize_rcu and rcu_barrier in
> sock_inuse_exit_net to
> ensure there are no outstanding rcu callbacks using this network
> namespace.
> we will not have to test if net->core.sock_inuse is NULL or not from
> sock_inuse_add(). :)
> 
>  static void __net_exit sock_inuse_exit_net(struct net *net)
>  {
> free_percpu(net->core.prot_inuse);
> +
> +   synchronize_rcu();
> +   rcu_barrier();
> +
> +   free_percpu(net->core.sock_inuse);
>  }


Oh well. Do you have any idea of the major problem this would add ?

Try the following, before and after your patches :

for i in `seq 1 40`
do
 (for j in `seq 1 100` ; do unshare -n /bin/true >/dev/null ; done) & 
done
wait

( Check commit 8ca712c373a462cfa1b62272870b6c2c74aa83f9 )


This is a complex problem, we won't accept patches that kill network
namespace dismantling performance by adding brute-force
synchronize_rcu() or rcu_barrier() calls.

Why not free net->core.sock_inuse right before freeing net itself in
net_free()?

You do not have to hijack sock_inuse_exit_net() just because it has a
misleading name.




Re: [PATCH v5 2/2] sock: Move the socket inuse to namespace.

2017-12-08 Thread Tonghao Zhang
Hi all. We can add synchronize_rcu() and rcu_barrier() in sock_inuse_exit_net() to
ensure there are no outstanding RCU callbacks using this network namespace.
Then we will not have to test whether net->core.sock_inuse is NULL in
sock_inuse_add(). :)

 static void __net_exit sock_inuse_exit_net(struct net *net)
 {
 	free_percpu(net->core.prot_inuse);
+
+	synchronize_rcu();
+	rcu_barrier();
+
+	free_percpu(net->core.sock_inuse);
 }


On Fri, Dec 8, 2017 at 5:52 PM, Tonghao Zhang  wrote:
> On Fri, Dec 8, 2017 at 1:40 PM, Eric Dumazet  wrote:
>> On Fri, 2017-12-08 at 13:28 +0800, Tonghao Zhang wrote:
>>> On Fri, Dec 8, 2017 at 1:20 AM, Eric Dumazet 
>>> wrote:
>>> > On Thu, 2017-12-07 at 08:45 -0800, Tonghao Zhang wrote:
>>> > > In some case, we want to know how many sockets are in use in
>>> > > different _net_ namespaces. It's a key resource metric.
>>> > >
>>> >
>>> > ...
>>> >
>>> > > +static void sock_inuse_add(struct net *net, int val)
>>> > > +{
>>> > > + if (net->core.prot_inuse)
>>> > > + this_cpu_add(*net->core.sock_inuse, val);
>>> > > +}
>>> >
>>> > This is very confusing.
>>> >
>>> > Why testing net->core.prot_inuse for NULL is needed at all ?
>>> >
>>> > Why not testing net->core.sock_inuse instead ?
>>> >
>>>
>>> Hi Eric and Cong, oh it's a typo. it's net->core.sock_inuse there.
>>> Why
>>> we should check the net->core.sock_inuse
>>> Now show you the code:
>>>
>>> cleanup_net will call all of the network namespace exit methods,
>>> rcu_barrier, and then remove the _net_ namespace.
>>>
>>> cleanup_net:
>>> list_for_each_entry_reverse(ops, &pernet_list, list)
>>>  ops_exit_list(ops, &net_exit_list);
>>>
>>> rcu_barrier(); /* for netlink sock, the ‘deferred_put_nlk_sk’
>>> will
>>> be called. But sock_inuse has been released. */
>>
>>
>> Thats would be a bug.
>>
>> Please find another way, but we want ultimately to check that before
>> net->core.sock_inuse is freed, folding the inuse count on all cpus is
>> 0, to make sure we do not have a bug somewhere.
>
> Yes, I am aware of this issue even we will destroy the network namespace.
> By the way, we can counter the socket-inuse in sock_alloc or sock_release.
> In this way, we have to hold the network namespace again(via
> get_net()) while sock
> may hold it.
>
> what do you think of this idea?
>
>> We should not have to test if net->core.sock_inuse is NULL or not from
>> sock_inuse_add(). Pointer must be there all the time.
>>
>> The freeing should only happen once we are sure sock_inuse_add() can
>> not be called anymore.
>>
>>>
>>>
>>> /* Finally it is safe to free my network namespace structure */
>>> list_for_each_entry_safe(net, tmp, &net_exit_list, exit_list) {}
>>>
>>>
>>>
>>> Release the netlink sock created in kernel(not hold the _net_
>>> namespace):
>>>
>>> netlink_release
>>>    call_rcu(&nlk->rcu, deferred_put_nlk_sk);
>>>
>>> deferred_put_nlk_sk
>>>sk_free(sk);
>>>
>>>
>>> I may add a comment for sock_inuse_add in v6.
>>
>>


Re: [PATCH v5 2/2] sock: Move the socket inuse to namespace.

2017-12-08 Thread Tonghao Zhang
On Fri, Dec 8, 2017 at 1:40 PM, Eric Dumazet  wrote:
> On Fri, 2017-12-08 at 13:28 +0800, Tonghao Zhang wrote:
>> On Fri, Dec 8, 2017 at 1:20 AM, Eric Dumazet 
>> wrote:
>> > On Thu, 2017-12-07 at 08:45 -0800, Tonghao Zhang wrote:
>> > > In some case, we want to know how many sockets are in use in
>> > > different _net_ namespaces. It's a key resource metric.
>> > >
>> >
>> > ...
>> >
>> > > +static void sock_inuse_add(struct net *net, int val)
>> > > +{
>> > > + if (net->core.prot_inuse)
>> > > + this_cpu_add(*net->core.sock_inuse, val);
>> > > +}
>> >
>> > This is very confusing.
>> >
>> > Why testing net->core.prot_inuse for NULL is needed at all ?
>> >
>> > Why not testing net->core.sock_inuse instead ?
>> >
>>
>> Hi Eric and Cong, oh it's a typo. it's net->core.sock_inuse there.
>> Why
>> we should check the net->core.sock_inuse
>> Now show you the code:
>>
>> cleanup_net will call all of the network namespace exit methods,
>> rcu_barrier, and then remove the _net_ namespace.
>>
>> cleanup_net:
>> list_for_each_entry_reverse(ops, &pernet_list, list)
>>  ops_exit_list(ops, &net_exit_list);
>>
>> rcu_barrier(); /* for netlink sock, the ‘deferred_put_nlk_sk’
>> will
>> be called. But sock_inuse has been released. */
>
>
> Thats would be a bug.
>
> Please find another way, but we want ultimately to check that before
> net->core.sock_inuse is freed, folding the inuse count on all cpus is
> 0, to make sure we do not have a bug somewhere.

Yes, I am aware of this issue, even though we are about to destroy the network namespace.
By the way, we could count socket-inuse in sock_alloc and sock_release instead.
That way, we would have to hold the network namespace again (via get_net()),
while the sock may already hold it.

What do you think of this idea?

> We should not have to test if net->core.sock_inuse is NULL or not from
> sock_inuse_add(). Pointer must be there all the time.
>
> The freeing should only happen once we are sure sock_inuse_add() can
> not be called anymore.
>
>>
>>
>> /* Finally it is safe to free my network namespace structure */
>> list_for_each_entry_safe(net, tmp, &net_exit_list, exit_list) {}
>>
>>
>>
>> Release the netlink sock created in kernel(not hold the _net_
>> namespace):
>>
>> netlink_release
>>    call_rcu(&nlk->rcu, deferred_put_nlk_sk);
>>
>> deferred_put_nlk_sk
>>sk_free(sk);
>>
>>
>> I may add a comment for sock_inuse_add in v6.
>
>


Re: [PATCH v5 2/2] sock: Move the socket inuse to namespace.

2017-12-07 Thread Eric Dumazet
On Fri, 2017-12-08 at 13:28 +0800, Tonghao Zhang wrote:
> On Fri, Dec 8, 2017 at 1:20 AM, Eric Dumazet 
> wrote:
> > On Thu, 2017-12-07 at 08:45 -0800, Tonghao Zhang wrote:
> > > In some case, we want to know how many sockets are in use in
> > > different _net_ namespaces. It's a key resource metric.
> > > 
> > 
> > ...
> > 
> > > +static void sock_inuse_add(struct net *net, int val)
> > > +{
> > > + if (net->core.prot_inuse)
> > > + this_cpu_add(*net->core.sock_inuse, val);
> > > +}
> > 
> > This is very confusing.
> > 
> > Why testing net->core.prot_inuse for NULL is needed at all ?
> > 
> > Why not testing net->core.sock_inuse instead ?
> > 
> 
> Hi Eric and Cong, oh it's a typo. it's net->core.sock_inuse there.
> Why
> we should check the net->core.sock_inuse
> Now show you the code:
> 
> cleanup_net will call all of the network namespace exit methods,
> rcu_barrier, and then remove the _net_ namespace.
> 
> cleanup_net:
> list_for_each_entry_reverse(ops, &pernet_list, list)
>  ops_exit_list(ops, &net_exit_list);
> 
> rcu_barrier(); /* for netlink sock, the ‘deferred_put_nlk_sk’
> will
> be called. But sock_inuse has been released. */


That would be a bug.

Please find another way. Ultimately, before net->core.sock_inuse is
freed, we want to check that the inuse count folded over all CPUs is
0, to make sure we do not have a bug somewhere.

We should not have to test if net->core.sock_inuse is NULL or not from
sock_inuse_add(). Pointer must be there all the time.

The freeing should only happen once we are sure sock_inuse_add() can
not be called anymore.
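
As a rough illustration of what "folding" means here, consider the
user-space sketch below. NR_CPUS_MODEL, struct net_model and the two helper
functions are hypothetical names chosen for the example, and a fixed array
replaces the per-cpu allocation. A socket may be counted on one CPU and
uncounted on another, so individual slots can be non-zero (even negative)
while the sum over all slots is what must reach 0.

```c
#include <assert.h>

#define NR_CPUS_MODEL 4	/* hypothetical CPU count for the sketch */

struct net_model {
	int sock_inuse[NR_CPUS_MODEL];	/* one slot per CPU */
};

/* Models this_cpu_add(): only the slot of the calling CPU is touched. */
static void sock_inuse_add_model(struct net_model *net, int cpu, int val)
{
	net->sock_inuse[cpu] += val;
}

/* Folds the per-CPU slots into a single value, as sock_inuse_get() does. */
static int sock_inuse_fold_model(const struct net_model *net)
{
	int cpu, res = 0;

	for (cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		res += net->sock_inuse[cpu];
	return res;
}
```

A check of the kind described above would assert that the folded value is 0
right before the counter is freed.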

> 
> 
> /* Finally it is safe to free my network namespace structure */
> list_for_each_entry_safe(net, tmp, &net_exit_list, exit_list) {}
> 
> 
> 
> Release the netlink sock created in kernel(not hold the _net_
> namespace):
> 
> netlink_release
>    call_rcu(&nlk->rcu, deferred_put_nlk_sk);
> 
> deferred_put_nlk_sk
>    sk_free(sk);
> 
> 
> I may add a comment for sock_inuse_add in v6.




Re: [PATCH v5 2/2] sock: Move the socket inuse to namespace.

2017-12-07 Thread Tonghao Zhang
On Fri, Dec 8, 2017 at 1:20 AM, Eric Dumazet  wrote:
> On Thu, 2017-12-07 at 08:45 -0800, Tonghao Zhang wrote:
>> In some case, we want to know how many sockets are in use in
>> different _net_ namespaces. It's a key resource metric.
>>
>
> ...
>
>> +static void sock_inuse_add(struct net *net, int val)
>> +{
>> + if (net->core.prot_inuse)
>> + this_cpu_add(*net->core.sock_inuse, val);
>> +}
>
> This is very confusing.
>
> Why testing net->core.prot_inuse for NULL is needed at all ?
>
> Why not testing net->core.sock_inuse instead ?
>
Hi Eric and Cong, oh, it's a typo. It should be net->core.sock_inuse there.
Here is why we should check net->core.sock_inuse; let me show you the code:

cleanup_net will call all of the network namespace exit methods,
rcu_barrier, and then remove the _net_ namespace.

cleanup_net:
list_for_each_entry_reverse(ops, &pernet_list, list)
 ops_exit_list(ops, &net_exit_list);

rcu_barrier(); /* for netlink sock, the ‘deferred_put_nlk_sk’ will
be called. But sock_inuse has been released. */


/* Finally it is safe to free my network namespace structure */
list_for_each_entry_safe(net, tmp, &net_exit_list, exit_list) {}



Releasing a netlink sock created in the kernel (which does not hold the _net_ namespace):

netlink_release
   call_rcu(&nlk->rcu, deferred_put_nlk_sk);

deferred_put_nlk_sk
   sk_free(sk);


I may add a comment for sock_inuse_add in v6.
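
The ordering described above can be modelled in user space as follows. This
is only a sketch under stated assumptions: every name ending in _model is
hypothetical, the 'missed' field exists purely to make the effect
observable, and the exit function assumes the intended behaviour of v5
(freeing sock_inuse and setting that pointer to NULL). It shows that with
the NULL test, a decrement arriving from the deferred callback after the
exit method is silently dropped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct net_model {
	int *sock_inuse;
	int missed;	/* updates that arrived after the counter was freed */
};

/* Models sock_inuse_add() with the NULL test discussed in this thread. */
static void sock_inuse_add_model(struct net_model *net, int val)
{
	if (!net->sock_inuse) {
		net->missed++;
		return;
	}
	*net->sock_inuse += val;
}

/* Models the exit method: the counter is freed before RCU callbacks run. */
static void exit_net_model(struct net_model *net)
{
	free(net->sock_inuse);
	net->sock_inuse = NULL;
}

/* Models deferred_put_nlk_sk(): invoked from an RCU callback, maybe late. */
static void deferred_put_model(struct net_model *net)
{
	sock_inuse_add_model(net, -1);
}
```

Without the NULL test the late call would dereference freed memory, which is
why the test was there in the first place.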


Re: [PATCH v5 2/2] sock: Move the socket inuse to namespace.

2017-12-07 Thread Cong Wang
On Thu, Dec 7, 2017 at 9:20 AM, Eric Dumazet  wrote:
> On Thu, 2017-12-07 at 08:45 -0800, Tonghao Zhang wrote:
>> In some case, we want to know how many sockets are in use in
>> different _net_ namespaces. It's a key resource metric.
>>
>
> ...
>
>> +static void sock_inuse_add(struct net *net, int val)
>> +{
>> + if (net->core.prot_inuse)
>> + this_cpu_add(*net->core.sock_inuse, val);
>> +}
>
> This is very confusing.
>
> Why testing net->core.prot_inuse for NULL is needed at all ?
>
> Why not testing net->core.sock_inuse instead ?

I bet that is a copy-and-paste error, given that sock_inuse_exit_net()
has a similar typo.


Re: [PATCH v5 2/2] sock: Move the socket inuse to namespace.

2017-12-07 Thread Eric Dumazet
On Thu, 2017-12-07 at 08:45 -0800, Tonghao Zhang wrote:
> In some case, we want to know how many sockets are in use in
> different _net_ namespaces. It's a key resource metric.
> 

...

> +static void sock_inuse_add(struct net *net, int val)
> +{
> + if (net->core.prot_inuse)
> + this_cpu_add(*net->core.sock_inuse, val);
> +}

This is very confusing.

Why is testing net->core.prot_inuse for NULL needed at all?

Why not test net->core.sock_inuse instead?



[PATCH v5 2/2] sock: Move the socket inuse to namespace.

2017-12-07 Thread Tonghao Zhang
In some cases, we want to know how many sockets are in use in
different _net_ namespaces. It's a key resource metric.

This patch adds a member to struct netns_core: a counter of the
sockets in use in the _net_ namespace. The patch increments and
decrements the counter in sk_alloc, sk_clone_lock and __sk_free.

The main reasons for doing it this way are:

1. When Linux calls 'do_exit' for a process to exit, the functions
'exit_task_namespaces' and 'exit_task_work' are called sequentially.
'exit_task_namespaces' may have destroyed the _net_ namespace, but
'sock_release', called in 'exit_task_work', may use the _net_ namespace
if we count socket-inuse in sock_release.

2. socket and sock come in pairs. More importantly, sock holds the _net_
namespace. We count socket-inuse in sock to avoid holding the
_net_ namespace again in socket. It's an easy way to maintain the code.

Signed-off-by: Martin Zhang 
Signed-off-by: Tonghao Zhang 
---
 include/net/netns/core.h |  1 +
 include/net/sock.h   |  1 +
 net/core/sock.c  | 52 ++--
 net/socket.c | 21 ++-
 4 files changed, 54 insertions(+), 21 deletions(-)

diff --git a/include/net/netns/core.h b/include/net/netns/core.h
index 45cfb5d..d1b4748f 100644
--- a/include/net/netns/core.h
+++ b/include/net/netns/core.h
@@ -11,6 +11,7 @@ struct netns_core {
 
int sysctl_somaxconn;
 
+   int __percpu *sock_inuse;
struct prot_inuse __percpu *prot_inuse;
 };
 
diff --git a/include/net/sock.h b/include/net/sock.h
index 79e1a2c..0809b31 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1266,6 +1266,7 @@ static inline void sk_sockets_allocated_inc(struct sock *sk)
 /* Called with local bh disabled */
 void sock_prot_inuse_add(struct net *net, struct proto *prot, int inc);
 int sock_prot_inuse_get(struct net *net, struct proto *proto);
+int sock_inuse_get(struct net *net);
 #else
 static inline void sock_prot_inuse_add(struct net *net, struct proto *prot,
int inc)
diff --git a/net/core/sock.c b/net/core/sock.c
index c2dd2d3..a11680a 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -145,6 +145,8 @@
 static DEFINE_MUTEX(proto_list_mutex);
 static LIST_HEAD(proto_list);
 
+static void sock_inuse_add(struct net *net, int val);
+
 /**
  * sk_ns_capable - General socket capability test
  * @sk: Socket to use a capability on or through
@@ -1534,6 +1536,7 @@ struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
 		if (likely(sk->sk_net_refcnt))
 			get_net(net);
 		sock_net_set(sk, net);
+		sock_inuse_add(net, 1);
 		refcount_set(&sk->sk_wmem_alloc, 1);
 
mem_cgroup_sk_alloc(sk);
@@ -1595,6 +1598,8 @@ void sk_destruct(struct sock *sk)
 
 static void __sk_free(struct sock *sk)
 {
+   sock_inuse_add(sock_net(sk), -1);
+
if (unlikely(sock_diag_has_destroy_listeners(sk) && sk->sk_net_refcnt))
sock_diag_broadcast_destroy(sk);
else
@@ -1716,6 +1721,7 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 		newsk->sk_priority = 0;
 		newsk->sk_incoming_cpu = raw_smp_processor_id();
 		atomic64_set(&newsk->sk_cookie, 0);
+		sock_inuse_add(sock_net(newsk), 1);
 
 		/*
 		 * Before updating sk_refcnt, we must commit prior changes to memory
@@ -3061,15 +3067,53 @@ int sock_prot_inuse_get(struct net *net, struct proto *prot)
 }
 EXPORT_SYMBOL_GPL(sock_prot_inuse_get);
 
+static void sock_inuse_add(struct net *net, int val)
+{
+   if (net->core.prot_inuse)
+   this_cpu_add(*net->core.sock_inuse, val);
+}
+
+int sock_inuse_get(struct net *net)
+{
+   int cpu, res = 0;
+
+   if (!net->core.prot_inuse)
+   return 0;
+
+   for_each_possible_cpu(cpu)
+   res += *per_cpu_ptr(net->core.sock_inuse, cpu);
+
+   return res >= 0 ? res : 0;
+}
+EXPORT_SYMBOL_GPL(sock_inuse_get);
+
 static int __net_init sock_inuse_init_net(struct net *net)
 {
net->core.prot_inuse = alloc_percpu(struct prot_inuse);
-   return net->core.prot_inuse ? 0 : -ENOMEM;
+   if (!net->core.prot_inuse)
+   return -ENOMEM;
+
+   net->core.sock_inuse = alloc_percpu(int);
+   if (!net->core.sock_inuse)
+   goto out;
+
+   return 0;
+out:
+   free_percpu(net->core.prot_inuse);
+   return -ENOMEM;
 }
 
 static void __net_exit sock_inuse_exit_net(struct net *net)
 {
-   free_percpu(net->core.prot_inuse);
+   if (net->core.prot_inuse) {
+   free_percpu(net->core.prot_inuse);
+   net->core.prot_inuse = NULL;
+   }
+
+   if (net->core.sock_inuse) {
+   free_percpu(net->core.sock_inuse);
+   net->core.prot_inuse = NULL;
+   }
 }
 
 static struct