Re: [PATCH] ipv4: Namespaceify tcp_max_orphans knob

2017-09-09 Thread 严海双


> On Sep 9, 2017, at 1:16 PM, David Miller wrote:
> 
> From: 严海双 
> Date: Sat, 9 Sep 2017 13:09:57 +0800
> 
>> 
>> 
>>> On Sep 9, 2017, at 12:35 PM, Cong Wang wrote:
>>> 
>>> On Fri, Sep 8, 2017 at 6:25 PM, 严海双 wrote:
>>>> 
>>>> 
>>>>> On Sep 9, 2017, at 6:13 AM, Cong Wang wrote:
>>>>> 
>>>>> On Wed, Sep 6, 2017 at 8:10 PM, Haishuang Yan wrote:
>>>>>> Different namespace application might require different maximal number
>>>>>> of TCP sockets independently of the host.
>>>>> 
>>>>> So after your patch we could have N * net->ipv4.sysctl_tcp_max_orphans
>>>>> in a whole system, right? This just makes OOM easier to trigger.
>>>>> 
>>>> 
>>>> From my understanding, before the patch, we had
>>>> N * net->ipv4.sysctl_tcp_max_orphans, and after the patch, we could have
>>>> ns1.sysctl_tcp_max_orphans + ns2.sysctl_tcp_max_orphans
>>>> + ns3.sysctl_tcp_max_orphans, is that right? Thanks for your reviewing.
>>> 
>>> Nope, by N I mean the number of containers. Before your patch, the limit
>>> is global, after your patch it is per container.
>>> 
>> 
>> Yeah, for example, if there is N containers, before the patch, I mean the 
>> limit is:
>> 
>>  N * net->ipv4.sysctl_tcp_max_orphans
>> 
>> After the patch, the limit is:
>> 
>>  ns1. net->ipv4.sysctl_tcp_max_orphans + ns2. net->ipv4.sysctl_tcp_max_orphans + …
> 
> Not true.
> 
> Please remove "N" from your equation of the current situation.
> 
> "sysctl_tcp_max_orphans" applies to entire system, it is a global limit,
> comparing one limit against all orphans in the system, there is no N.

Yes, you’re right. I browsed the source code and found that it’s a global limit,
sorry for my mistake.

Thanks David and Cong.






Re: [PATCH] ipv4: Namespaceify tcp_max_orphans knob

2017-09-08 Thread David Miller
From: 严海双 
Date: Sat, 9 Sep 2017 13:09:57 +0800

> 
> 
>> On Sep 9, 2017, at 12:35 PM, Cong Wang wrote:
>> 
>> On Fri, Sep 8, 2017 at 6:25 PM, 严海双 wrote:
>>> 
>>> 
>>>> On Sep 9, 2017, at 6:13 AM, Cong Wang wrote:
>>>> 
>>>> On Wed, Sep 6, 2017 at 8:10 PM, Haishuang Yan wrote:
>>>>> Different namespace application might require different maximal number
>>>>> of TCP sockets independently of the host.
>>>> 
>>>> So after your patch we could have N * net->ipv4.sysctl_tcp_max_orphans
>>>> in a whole system, right? This just makes OOM easier to trigger.
>>>> 
>>> 
>>> From my understanding, before the patch, we had N * 
>>> net->ipv4.sysctl_tcp_max_orphans,
>>> and after the patch, we could have ns1.sysctl_tcp_max_orphans + 
>>> ns2.sysctl_tcp_max_orphans
>>> + ns3.sysctl_tcp_max_orphans, is that right? Thanks for your reviewing.
>> 
>> Nope, by N I mean the number of containers. Before your patch, the limit
>> is global, after your patch it is per container.
>> 
> 
> Yeah, for example, if there is N containers, before the patch, I mean the 
> limit is:
> 
>   N * net->ipv4.sysctl_tcp_max_orphans
> 
> After the patch, the limit is:
> 
>   ns1. net->ipv4.sysctl_tcp_max_orphans + ns2. net->ipv4.sysctl_tcp_max_orphans + …

Not true.

Please remove "N" from your equation of the current situation.

"sysctl_tcp_max_orphans" applies to the entire system. It is a global limit:
one limit is compared against all orphans in the system, so there is no N.
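
To make the point concrete, here is a minimal userspace sketch of the pre-patch situation: one limit and one counter for the whole system, regardless of how many containers contribute orphans. The `orphan_state` struct and both helpers are hypothetical stand-ins for illustration, not kernel code; only the `orphans << shift > limit` comparison follows the shape of the check in `tcp_too_many_orphans()`.

```c
#include <stdbool.h>

/* Hypothetical model of the pre-patch state: everything is system-wide. */
struct orphan_state {
	int max_orphans;   /* the single global limit */
	int orphan_count;  /* the single global orphan counter */
};

/* Orphans from any container are added to the same counter. */
static void add_orphans(struct orphan_state *st, int n)
{
	st->orphan_count += n;
}

/* Same comparison shape as the kernel check: orphans << shift > limit. */
static bool too_many_orphans(const struct orphan_state *st, int shift)
{
	return (st->orphan_count << shift) > st->max_orphans;
}
```

With a limit of 4096, two containers contributing 1500 orphans each stay under the limit, and a third pushes the total to 4500 and trips the check. The number of containers never enters the comparison.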


Re: [PATCH] ipv4: Namespaceify tcp_max_orphans knob

2017-09-08 Thread 严海双


> On Sep 9, 2017, at 12:35 PM, Cong Wang wrote:
> 
> On Fri, Sep 8, 2017 at 6:25 PM, 严海双 wrote:
>> 
>> 
>>> On Sep 9, 2017, at 6:13 AM, Cong Wang wrote:
>>> 
>>> On Wed, Sep 6, 2017 at 8:10 PM, Haishuang Yan wrote:
>>>> Different namespace application might require different maximal number
>>>> of TCP sockets independently of the host.
>>> 
>>> So after your patch we could have N * net->ipv4.sysctl_tcp_max_orphans
>>> in a whole system, right? This just makes OOM easier to trigger.
>>> 
>> 
>> From my understanding, before the patch, we had N * 
>> net->ipv4.sysctl_tcp_max_orphans,
>> and after the patch, we could have ns1.sysctl_tcp_max_orphans + 
>> ns2.sysctl_tcp_max_orphans
>> + ns3.sysctl_tcp_max_orphans, is that right? Thanks for your reviewing.
> 
> Nope, by N I mean the number of containers. Before your patch, the limit
> is global, after your patch it is per container.
> 

Yeah, for example, if there are N containers, before the patch, I mean the limit
is:

N * net->ipv4.sysctl_tcp_max_orphans

After the patch, the limit is:

ns1. net->ipv4.sysctl_tcp_max_orphans + ns2. net->ipv4.sysctl_tcp_max_orphans + …






Re: [PATCH] ipv4: Namespaceify tcp_max_orphans knob

2017-09-08 Thread Cong Wang
On Fri, Sep 8, 2017 at 6:25 PM, 严海双  wrote:
>
>
>> On Sep 9, 2017, at 6:13 AM, Cong Wang wrote:
>>
>> On Wed, Sep 6, 2017 at 8:10 PM, Haishuang Yan
>>  wrote:
>>> Different namespace application might require different maximal number
>>> of TCP sockets independently of the host.
>>
>> So after your patch we could have N * net->ipv4.sysctl_tcp_max_orphans
>> in a whole system, right? This just makes OOM easier to trigger.
>>
>
> From my understanding, before the patch, we had N * 
> net->ipv4.sysctl_tcp_max_orphans,
> and after the patch, we could have ns1.sysctl_tcp_max_orphans + 
> ns2.sysctl_tcp_max_orphans
> + ns3.sysctl_tcp_max_orphans, is that right? Thanks for your reviewing.

Nope, by N I mean the number of containers. Before your patch, the limit
is global, after your patch it is per container.


Re: [PATCH] ipv4: Namespaceify tcp_max_orphans knob

2017-09-08 Thread 严海双


> On Sep 9, 2017, at 6:13 AM, Cong Wang wrote:
> 
> On Wed, Sep 6, 2017 at 8:10 PM, Haishuang Yan
>  wrote:
>> Different namespace application might require different maximal number
>> of TCP sockets independently of the host.
> 
> So after your patch we could have N * net->ipv4.sysctl_tcp_max_orphans
> in a whole system, right? This just makes OOM easier to trigger.
> 

From my understanding, before the patch, we had
N * net->ipv4.sysctl_tcp_max_orphans, and after the patch, we could have
ns1.sysctl_tcp_max_orphans + ns2.sysctl_tcp_max_orphans
+ ns3.sysctl_tcp_max_orphans, is that right? Thanks for reviewing.



Re: [PATCH] ipv4: Namespaceify tcp_max_orphans knob

2017-09-08 Thread Cong Wang
On Wed, Sep 6, 2017 at 8:10 PM, Haishuang Yan
 wrote:
> Different namespace application might require different maximal number
> of TCP sockets independently of the host.

So after your patch we could have N * net->ipv4.sysctl_tcp_max_orphans
in a whole system, right? This just makes OOM easier to trigger.
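
The concern can be written as simple arithmetic. The sketch below assumes, as this message does, that each namespace could hold orphans up to its own limit; `aggregate_orphan_limit()` is a hypothetical helper for illustration, not a kernel function.

```c
#include <stddef.h>

/*
 * Sum of per-namespace limits. Under the assumption that every namespace
 * may accumulate orphans up to its own limit, this is the system-wide
 * upper bound the message is worried about.
 */
static long aggregate_orphan_limit(const int *limits, size_t n)
{
	long total = 0;

	for (size_t i = 0; i < n; i++)
		total += limits[i];
	return total;
}
```

For three namespaces each left at a limit of 4096, the aggregate is 12288, i.e. N times the single-namespace value.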


[PATCH] ipv4: Namespaceify tcp_max_orphans knob

2017-09-06 Thread Haishuang Yan
Applications in different namespaces might require a different maximum number
of orphaned TCP sockets, independently of the host.

Signed-off-by: Haishuang Yan 
---
 include/net/netns/ipv4.h   |  1 +
 include/net/tcp.h          |  5 +++--
 net/ipv4/sysctl_net_ipv4.c | 14 +++---
 net/ipv4/tcp.c             |  3 ---
 net/ipv4/tcp_input.c       |  1 -
 net/ipv4/tcp_ipv4.c        |  1 +
 6 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 20d061c..305e031 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -127,6 +127,7 @@ struct netns_ipv4 {
 	int sysctl_tcp_timestamps;
 	struct inet_timewait_death_row tcp_death_row;
 	int sysctl_max_syn_backlog;
+	int sysctl_tcp_max_orphans;
 
 #ifdef CONFIG_NET_L3_MASTER_DEV
 	int sysctl_udp_l3mdev_accept;
diff --git a/include/net/tcp.h b/include/net/tcp.h
index b510f28..ac2d998 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -320,10 +320,11 @@ static inline bool tcp_too_many_orphans(struct sock *sk, int shift)
 {
 	struct percpu_counter *ocp = sk->sk_prot->orphan_count;
 	int orphans = percpu_counter_read_positive(ocp);
+	int tcp_max_orphans = sock_net(sk)->ipv4.sysctl_tcp_max_orphans;
 
-	if (orphans << shift > sysctl_tcp_max_orphans) {
+	if (orphans << shift > tcp_max_orphans) {
 		orphans = percpu_counter_sum_positive(ocp);
-		if (orphans << shift > sysctl_tcp_max_orphans)
+		if (orphans << shift > tcp_max_orphans)
 			return true;
 	}
 	return false;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 0d3c038..4f26c8d3 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -394,13 +394,6 @@ static int proc_tcp_available_ulp(struct ctl_table *ctl,
 		.proc_handler	= proc_dointvec
 	},
 	{
-		.procname	= "tcp_max_orphans",
-		.data		= &sysctl_tcp_max_orphans,
-		.maxlen		= sizeof(int),
-		.mode		= 0644,
-		.proc_handler	= proc_dointvec
-	},
-	{
 		.procname	= "tcp_fastopen",
 		.data		= &sysctl_tcp_fastopen,
 		.maxlen		= sizeof(int),
@@ -1085,6 +1078,13 @@ static int proc_tcp_available_ulp(struct ctl_table *ctl,
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec
 	},
+	{
+		.procname	= "tcp_max_orphans",
+		.data		= &init_net.ipv4.sysctl_tcp_max_orphans,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec
+	},
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
 	{
 		.procname	= "fib_multipath_use_neigh",
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 5091402..39187ac 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3522,9 +3522,6 @@ void __init tcp_init(void)
 	}
 
 
-	cnt = tcp_hashinfo.ehash_mask + 1;
-	sysctl_tcp_max_orphans = cnt / 2;
-
 	tcp_init_mem();
 	/* Set per-socket limits to no more than 1/128 the pressure threshold */
 	limit = nr_free_buffer_pages() << (PAGE_SHIFT - 7);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c5d7656..0230509 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -88,7 +88,6 @@
 
 int sysctl_tcp_stdurg __read_mostly;
 int sysctl_tcp_rfc1337 __read_mostly;
-int sysctl_tcp_max_orphans __read_mostly = NR_FILE;
 int sysctl_tcp_frto __read_mostly = 2;
 int sysctl_tcp_min_rtt_wlen __read_mostly = 300;
 int sysctl_tcp_moderate_rcvbuf __read_mostly = 1;
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index a63486a..4b17a91 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2468,6 +2468,7 @@ static int __net_init tcp_sk_init(struct net *net)
 	net->ipv4.tcp_death_row.hashinfo = &tcp_hashinfo;
 
 	net->ipv4.sysctl_max_syn_backlog = max(128, cnt / 256);
+	net->ipv4.sysctl_tcp_max_orphans = cnt / 2;
 	net->ipv4.sysctl_tcp_sack = 1;
 	net->ipv4.sysctl_tcp_window_scaling = 1;
 	net->ipv4.sysctl_tcp_timestamps = 1;
-- 
1.8.3.1
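
For readers who want to experiment with the patched check outside the kernel, the following is a rough userspace model of it. As in the hunk for `tcp_too_many_orphans()`, the orphan counter comes from the protocol (in this model it is shared by all namespaces) while the limit comes from the socket's namespace, and the exact per-CPU sum is only computed when the cheap read already exceeds the limit. `counter_model`, `netns_model`, and `NR_CPUS_MODEL` are hypothetical names for illustration; the real code uses `struct percpu_counter` and `struct net`.

```c
#include <stdbool.h>

#define NR_CPUS_MODEL 4	/* hypothetical CPU count for this model */

/* Model of a per-CPU counter: a cheap cached value plus per-CPU deltas. */
struct counter_model {
	int approx;			/* cheap, possibly stale value */
	int percpu[NR_CPUS_MODEL];	/* deltas not yet folded into approx */
};

/* Model of a network namespace carrying its own limit, as in the patch. */
struct netns_model {
	int tcp_max_orphans;
};

/* Cheap read: returns the cached value, clamped at zero. */
static int counter_read_positive(const struct counter_model *c)
{
	return c->approx > 0 ? c->approx : 0;
}

/* Exact read: cached value plus all per-CPU deltas, clamped at zero. */
static int counter_sum_positive(const struct counter_model *c)
{
	int sum = c->approx;

	for (int i = 0; i < NR_CPUS_MODEL; i++)
		sum += c->percpu[i];
	return sum > 0 ? sum : 0;
}

/* Two-stage check: only pay for the exact sum if the cheap read is over. */
static bool too_many_orphans(const struct counter_model *shared,
			     const struct netns_model *ns, int shift)
{
	int orphans = counter_read_positive(shared);

	if (orphans << shift > ns->tcp_max_orphans) {
		orphans = counter_sum_positive(shared);
		if (orphans << shift > ns->tcp_max_orphans)
			return true;
	}
	return false;
}
```

In this model, two namespaces with limits of 2048 and 8192 get different answers for the same shared count of 3000, and a stale cached value that is over the limit is corrected by the exact sum before the check reports true.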