Re: [PATCH] ipv4: Namespaceify tcp_max_orphans knob
> On Sep 9, 2017, at 1:16 PM, David Miller wrote:
>
> From: 严海双
> Date: Sat, 9 Sep 2017 13:09:57 +0800
>
>>> On Sep 9, 2017, at 12:35 PM, Cong Wang wrote:
>>>
>>> On Fri, Sep 8, 2017 at 6:25 PM, 严海双 wrote:
>>>>> On Sep 9, 2017, at 6:13 AM, Cong Wang wrote:
>>>>>
>>>>> On Wed, Sep 6, 2017 at 8:10 PM, Haishuang Yan wrote:
>>>>>> Different namespace application might require different maximal number
>>>>>> of TCP sockets independently of the host.
>>>>>
>>>>> So after your patch we could have N * net->ipv4.sysctl_tcp_max_orphans
>>>>> in a whole system, right? This just makes OOM easier to trigger.
>>>>
>>>> From my understanding, before the patch, we had
>>>> N * net->ipv4.sysctl_tcp_max_orphans, and after the patch, we could have
>>>> ns1.sysctl_tcp_max_orphans + ns2.sysctl_tcp_max_orphans +
>>>> ns3.sysctl_tcp_max_orphans. Is that right?
>>>>
>>>> Thanks for your review.
>>>
>>> Nope, by N I mean the number of containers. Before your patch, the limit
>>> is global; after your patch, it is per container.
>>
>> Yeah. For example, if there are N containers, I mean that before the patch
>> the limit is:
>>
>> N * net->ipv4.sysctl_tcp_max_orphans
>>
>> After the patch, the limit is:
>>
>> ns1.net->ipv4.sysctl_tcp_max_orphans + ns2.net->ipv4.sysctl_tcp_max_orphans + …
>
> Not true.
>
> Please remove "N" from your equation of the current situation.
>
> "sysctl_tcp_max_orphans" applies to the entire system. It is a global limit,
> comparing one limit against all orphans in the system; there is no N.

Yes, that's right. I browsed the source code and found that it's a global
limit. Sorry for my mistake.

Thanks, David and Cong.
Re: [PATCH] ipv4: Namespaceify tcp_max_orphans knob
From: 严海双
Date: Sat, 9 Sep 2017 13:09:57 +0800

>> On Sep 9, 2017, at 12:35 PM, Cong Wang wrote:
>>
>> On Fri, Sep 8, 2017 at 6:25 PM, 严海双 wrote:
>>>> On Sep 9, 2017, at 6:13 AM, Cong Wang wrote:
>>>>
>>>> On Wed, Sep 6, 2017 at 8:10 PM, Haishuang Yan wrote:
>>>>> Different namespace application might require different maximal number
>>>>> of TCP sockets independently of the host.
>>>>
>>>> So after your patch we could have N * net->ipv4.sysctl_tcp_max_orphans
>>>> in a whole system, right? This just makes OOM easier to trigger.
>>>
>>> From my understanding, before the patch, we had
>>> N * net->ipv4.sysctl_tcp_max_orphans, and after the patch, we could have
>>> ns1.sysctl_tcp_max_orphans + ns2.sysctl_tcp_max_orphans +
>>> ns3.sysctl_tcp_max_orphans. Is that right?
>>>
>>> Thanks for your review.
>>
>> Nope, by N I mean the number of containers. Before your patch, the limit
>> is global; after your patch, it is per container.
>
> Yeah. For example, if there are N containers, I mean that before the patch
> the limit is:
>
> N * net->ipv4.sysctl_tcp_max_orphans
>
> After the patch, the limit is:
>
> ns1.net->ipv4.sysctl_tcp_max_orphans + ns2.net->ipv4.sysctl_tcp_max_orphans + …

Not true.

Please remove "N" from your equation of the current situation.

"sysctl_tcp_max_orphans" applies to the entire system. It is a global limit,
comparing one limit against all orphans in the system; there is no N.
Re: [PATCH] ipv4: Namespaceify tcp_max_orphans knob
> On Sep 9, 2017, at 12:35 PM, Cong Wang wrote:
>
> On Fri, Sep 8, 2017 at 6:25 PM, 严海双 wrote:
>>> On Sep 9, 2017, at 6:13 AM, Cong Wang wrote:
>>>
>>> On Wed, Sep 6, 2017 at 8:10 PM, Haishuang Yan wrote:
>>>> Different namespace application might require different maximal number
>>>> of TCP sockets independently of the host.
>>>
>>> So after your patch we could have N * net->ipv4.sysctl_tcp_max_orphans
>>> in a whole system, right? This just makes OOM easier to trigger.
>>
>> From my understanding, before the patch, we had
>> N * net->ipv4.sysctl_tcp_max_orphans, and after the patch, we could have
>> ns1.sysctl_tcp_max_orphans + ns2.sysctl_tcp_max_orphans +
>> ns3.sysctl_tcp_max_orphans. Is that right?
>>
>> Thanks for your review.
>
> Nope, by N I mean the number of containers. Before your patch, the limit
> is global; after your patch, it is per container.

Yeah. For example, if there are N containers, I mean that before the patch
the limit is:

N * net->ipv4.sysctl_tcp_max_orphans

After the patch, the limit is:

ns1.net->ipv4.sysctl_tcp_max_orphans + ns2.net->ipv4.sysctl_tcp_max_orphans + …
Re: [PATCH] ipv4: Namespaceify tcp_max_orphans knob
On Fri, Sep 8, 2017 at 6:25 PM, 严海双 wrote:
>> On Sep 9, 2017, at 6:13 AM, Cong Wang wrote:
>>
>> On Wed, Sep 6, 2017 at 8:10 PM, Haishuang Yan wrote:
>>> Different namespace application might require different maximal number
>>> of TCP sockets independently of the host.
>>
>> So after your patch we could have N * net->ipv4.sysctl_tcp_max_orphans
>> in a whole system, right? This just makes OOM easier to trigger.
>
> From my understanding, before the patch, we had
> N * net->ipv4.sysctl_tcp_max_orphans, and after the patch, we could have
> ns1.sysctl_tcp_max_orphans + ns2.sysctl_tcp_max_orphans +
> ns3.sysctl_tcp_max_orphans. Is that right?
>
> Thanks for your review.

Nope, by N I mean the number of containers. Before your patch, the limit
is global; after your patch, it is per container.
Re: [PATCH] ipv4: Namespaceify tcp_max_orphans knob
> On Sep 9, 2017, at 6:13 AM, Cong Wang wrote:
>
> On Wed, Sep 6, 2017 at 8:10 PM, Haishuang Yan wrote:
>> Different namespace application might require different maximal number
>> of TCP sockets independently of the host.
>
> So after your patch we could have N * net->ipv4.sysctl_tcp_max_orphans
> in a whole system, right? This just makes OOM easier to trigger.

From my understanding, before the patch, we had
N * net->ipv4.sysctl_tcp_max_orphans, and after the patch, we could have
ns1.sysctl_tcp_max_orphans + ns2.sysctl_tcp_max_orphans +
ns3.sysctl_tcp_max_orphans. Is that right?

Thanks for your review.
Re: [PATCH] ipv4: Namespaceify tcp_max_orphans knob
On Wed, Sep 6, 2017 at 8:10 PM, Haishuang Yan wrote:
> Different namespace application might require different maximal number
> of TCP sockets independently of the host.

So after your patch we could have N * net->ipv4.sysctl_tcp_max_orphans
in a whole system, right? This just makes OOM easier to trigger.
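To make the arithmetic behind this concern concrete, here is a minimal sketch in C. It assumes, purely for illustration, that each namespace could fill its own limit independently of the others; the helper name aggregate_orphan_bound() and the values used with it are hypothetical and do not come from the patch.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not kernel code: the worst-case number of orphans
 * system-wide if every namespace could reach its own limit independently.
 * With one global knob the bound is a single value; with N per-namespace
 * knobs at the same default it grows to N times that value.
 */
static long aggregate_orphan_bound(const int *per_ns_limit, size_t n)
{
	long total = 0;

	for (size_t i = 0; i < n; i++)
		total += per_ns_limit[i];
	return total;
}
```

For example, three namespaces that each keep a limit of 65536 (an arbitrary example value) give a bound of 196608 under this assumption, against 65536 for a single global knob.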
[PATCH] ipv4: Namespaceify tcp_max_orphans knob
Different namespace application might require different maximal number
of TCP sockets independently of the host.

Signed-off-by: Haishuang Yan
---
 include/net/netns/ipv4.h   |  1 +
 include/net/tcp.h          |  5 +++--
 net/ipv4/sysctl_net_ipv4.c | 14 +++++++-------
 net/ipv4/tcp.c             |  3 ---
 net/ipv4/tcp_input.c       |  1 -
 net/ipv4/tcp_ipv4.c        |  1 +
 6 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 20d061c..305e031 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -127,6 +127,7 @@ struct netns_ipv4 {
 	int sysctl_tcp_timestamps;
 	struct inet_timewait_death_row tcp_death_row;
 	int sysctl_max_syn_backlog;
+	int sysctl_tcp_max_orphans;
 
 #ifdef CONFIG_NET_L3_MASTER_DEV
 	int sysctl_udp_l3mdev_accept;
diff --git a/include/net/tcp.h b/include/net/tcp.h
index b510f28..ac2d998 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -320,10 +320,11 @@ static inline bool tcp_too_many_orphans(struct sock *sk, int shift)
 {
 	struct percpu_counter *ocp = sk->sk_prot->orphan_count;
 	int orphans = percpu_counter_read_positive(ocp);
+	int tcp_max_orphans = sock_net(sk)->ipv4.sysctl_tcp_max_orphans;
 
-	if (orphans << shift > sysctl_tcp_max_orphans) {
+	if (orphans << shift > tcp_max_orphans) {
 		orphans = percpu_counter_sum_positive(ocp);
-		if (orphans << shift > sysctl_tcp_max_orphans)
+		if (orphans << shift > tcp_max_orphans)
 			return true;
 	}
 	return false;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 0d3c038..4f26c8d3 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -394,13 +394,6 @@ static int proc_tcp_available_ulp(struct ctl_table *ctl,
 		.proc_handler = proc_dointvec
 	},
 	{
-		.procname = "tcp_max_orphans",
-		.data = &sysctl_tcp_max_orphans,
-		.maxlen = sizeof(int),
-		.mode = 0644,
-		.proc_handler = proc_dointvec
-	},
-	{
 		.procname = "tcp_fastopen",
 		.data = &sysctl_tcp_fastopen,
 		.maxlen = sizeof(int),
@@ -1085,6 +1078,13 @@ static int proc_tcp_available_ulp(struct ctl_table *ctl,
 		.mode = 0644,
 		.proc_handler = proc_dointvec
 	},
+	{
+		.procname = "tcp_max_orphans",
+		.data = &init_net.ipv4.sysctl_tcp_max_orphans,
+		.maxlen = sizeof(int),
+		.mode = 0644,
+		.proc_handler = proc_dointvec
+	},
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
 	{
 		.procname = "fib_multipath_use_neigh",
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 5091402..39187ac 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3522,9 +3522,6 @@ void __init tcp_init(void)
 	}
 
 
-	cnt = tcp_hashinfo.ehash_mask + 1;
-	sysctl_tcp_max_orphans = cnt / 2;
-
 	tcp_init_mem();
 	/* Set per-socket limits to no more than 1/128 the pressure threshold */
 	limit = nr_free_buffer_pages() << (PAGE_SHIFT - 7);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c5d7656..0230509 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -88,7 +88,6 @@
 
 int sysctl_tcp_stdurg __read_mostly;
 int sysctl_tcp_rfc1337 __read_mostly;
-int sysctl_tcp_max_orphans __read_mostly = NR_FILE;
 int sysctl_tcp_frto __read_mostly = 2;
 int sysctl_tcp_min_rtt_wlen __read_mostly = 300;
 int sysctl_tcp_moderate_rcvbuf __read_mostly = 1;
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index a63486a..4b17a91 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2468,6 +2468,7 @@ static int __net_init tcp_sk_init(struct net *net)
 	net->ipv4.tcp_death_row.hashinfo = &tcp_hashinfo;
 
 	net->ipv4.sysctl_max_syn_backlog = max(128, cnt / 256);
+	net->ipv4.sysctl_tcp_max_orphans = cnt / 2;
 	net->ipv4.sysctl_tcp_sack = 1;
 	net->ipv4.sysctl_tcp_window_scaling = 1;
 	net->ipv4.sysctl_tcp_timestamps = 1;
-- 
1.8.3.1
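The comparison changed in tcp_too_many_orphans() can be modelled in userspace as a rough sketch. The names struct model_net and model_too_many_orphans() are hypothetical stand-ins, not kernel types or functions, and the model collapses the kernel's two-step read (a cheap approximate percpu read, then an exact sum) into one plain integer. As far as the diff shows, only the limit is looked up per namespace through sock_net(sk); the counter is still read from sk->sk_prot->orphan_count.

```c
#include <stdbool.h>

/* Userspace stand-in for the per-namespace field added by the patch. */
struct model_net {
	int sysctl_tcp_max_orphans;	/* per-namespace limit */
};

/*
 * Models the test "orphans << shift > tcp_max_orphans" from the patch.
 * The orphan count is scaled by 2^shift before the comparison, so a
 * larger shift makes the check trip at a proportionally lower count.
 * The count is passed in by the caller; this sketch does not model
 * where it comes from.
 */
static bool model_too_many_orphans(const struct model_net *net,
				   int orphans, int shift)
{
	return (orphans << shift) > net->sysctl_tcp_max_orphans;
}
```

With a limit of 100, a count of 101 trips the check at shift 0 and a count of 51 trips it at shift 1, while a namespace whose limit is 1000 accepts the same counts.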