Re: Something hitting my total number of connections to the server
On Wed, Aug 23, 2017 at 1:08 AM, Akshat Kakkar wrote:
>
> On Tue, Aug 22, 2017 at 5:58 PM, Neal Cardwell wrote:
> > On Tue, Aug 22, 2017 at 1:42 AM, Akshat Kakkar wrote:
> >> There are multiple hosts/clients. All are mainly Windows based.
> >>
> >> Timestamp is not used as my clients are mainly Windows based, and
> >> on Windows the TCP timestamp is disabled by default.
> > ...
> >> net.ipv4.tcp_tw_reuse=1
> >> net.ipv4.tcp_tw_recycle=1
> >
> > I suspect the problem is there. The net.ipv4.tcp_tw_recycle setting
> > should be 0. Running with the value 1 is known to cause buggy behavior
> > related to TCP timestamps, and that feature has been removed in kernel
> > v4.12.
> >
> > Can you please re-run your tests with net.ipv4.tcp_tw_recycle=0 or a
> > newer kernel?
> >
> > neal
>
> Thanks for your reply.
>
> I understand that.
>
> But my point is, though tcp timestamp is enabled on the server, the
> client is not using it ... so how come this _bug_ (if any) is
> triggered in the first place?

You mention "clients mainly are windows based". If they are only
"mainly" Windows-based, and some run other OSes that do use TCP
timestamps, and the remote address is the same for
TCP-timestamp-using and non-TCP-timestamp-using clients, then running
with timestamps enabled on the server could tickle the bugs in
pre-4.12 kernels that save info from TCP-timestamp-using connections
and erroneously try to use that info to validate
non-TCP-timestamp-using connections.

But the main point is that the configuration you cited
(net.ipv4.tcp_tw_recycle=1) is an unsupported configuration with known
bugs. The best resolution would be to just run with
net.ipv4.tcp_tw_recycle=0. It's not worth digging any further unless
you run with net.ipv4.tcp_tw_recycle=0 or a kernel that is v4.12 or
later and still have problems.

Hope that helps,
neal
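A minimal sketch of how one might check which of the two situations
described above applies on a given server. The function name
tw_recycle_state and its optional argument (a /proc/sys-style root
directory, added only so the check can be tried against a scratch
directory) are hypothetical and not from this thread. It relies on
the statement above that the setting was removed in v4.12, so a
missing file is reported as "absent".

```shell
# Report the state of net.ipv4.tcp_tw_recycle as one word.
# $1: optional /proc/sys-style root (hypothetical hook for dry runs).
tw_recycle_state() {
    twr_file="${1:-/proc/sys}/net/ipv4/tcp_tw_recycle"
    if [ ! -e "$twr_file" ]; then
        echo "absent"       # knob not present, e.g. kernel v4.12 or later
    elif [ "$(cat "$twr_file")" = "0" ]; then
        echo "disabled"     # the supported configuration
    else
        echo "enabled"      # the unsupported configuration cited above
    fi
}
```

Running tw_recycle_state with no argument inspects the live system.
If it prints "enabled", the setting can be turned off as root with
sysctl -w net.ipv4.tcp_tw_recycle=0, and the matching line in the
sysctl configuration file should be changed as well so the value does
not come back after a reboot.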
Re: Something hitting my total number of connections to the server
On Tue, Aug 22, 2017 at 5:58 PM, Neal Cardwell wrote:
> On Tue, Aug 22, 2017 at 1:42 AM, Akshat Kakkar wrote:
>> There are multiple hosts/clients. All are mainly Windows based.
>>
>> Timestamp is not used as my clients are mainly Windows based, and
>> on Windows the TCP timestamp is disabled by default.
> ...
>> net.ipv4.tcp_tw_reuse=1
>> net.ipv4.tcp_tw_recycle=1
>
> I suspect the problem is there. The net.ipv4.tcp_tw_recycle setting
> should be 0. Running with the value 1 is known to cause buggy behavior
> related to TCP timestamps, and that feature has been removed in kernel
> v4.12.
>
> Can you please re-run your tests with net.ipv4.tcp_tw_recycle=0 or a
> newer kernel?
>
> neal

Thanks for your reply.

I understand that.

But my point is, though tcp timestamp is enabled on the server, the
client is not using it ... so how come this _bug_ (if any) is
triggered in the first place?
Re: Something hitting my total number of connections to the server
On Tue, 2017-08-22 at 10:46 -0700, David Ahern wrote:
> On 8/22/17 10:44 AM, Eric Dumazet wrote:
> > Willem wrote this doc in 2013, before we finally went back to 1000.
> >
> > We should update this doc.
>
> And these too:
>
> $ egrep -r netdev_max_backlog Documentation/networking/
> Documentation/networking//cxgb.txt: sysctl -w net.core.netdev_max_backlog=300000
> Documentation/networking//ixgb.txt:net.core.netdev_max_backlog = 300000

Yes, whoever wrote this had no idea of the implications I guess.
Re: Something hitting my total number of connections to the server
On 8/22/17 10:44 AM, Eric Dumazet wrote:
> Willem wrote this doc in 2013, before we finally went back to 1000.
>
> We should update this doc.

And these too:

$ egrep -r netdev_max_backlog Documentation/networking/
Documentation/networking//cxgb.txt: sysctl -w net.core.netdev_max_backlog=300000
Documentation/networking//ixgb.txt:net.core.netdev_max_backlog = 300000
Re: Something hitting my total number of connections to the server
On Tue, 2017-08-22 at 09:43 -0700, David Ahern wrote:
> On 8/22/17 6:02 AM, Eric Dumazet wrote:
> >>
> >> net.core.netdev_max_backlog=10000
> > This is an insane backlog.
> >
>
> https://www.kernel.org/doc/Documentation/networking/scaling.txt
>
> "== Suggested Configuration
>
> Flow limit is useful on systems with many concurrent connections,
> where a single connection taking up 50% of a CPU indicates a problem.
> In such environments, enable the feature on all CPUs that handle
> network rx interrupts (as set in /proc/irq/N/smp_affinity).
>
> The feature depends on the input packet queue length to exceed
> the flow limit threshold (50%) + the flow history length (256).
> Setting net.core.netdev_max_backlog to either 1000 or 10000
> performed well in experiments."

10000 is adding tail latencies.

At Google we run all the fleet with a backlog of 1000.

And yes, it took time to get rid of the backlog of 10000 that was set
up years ago, because of old constraints and some fears.

Willem wrote this doc in 2013, before we finally went back to 1000.

We should update this doc.
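As a rough illustration of the point above, the following sketch
compares the live net.core.netdev_max_backlog with the value of 1000
mentioned in this message. The function name check_backlog and its
optional argument (a /proc/sys-style root directory, so that the check
can be pointed at a scratch directory) are hypothetical. It only
reports the difference and makes no claim about what value suits a
particular workload.

```shell
# Print the current netdev_max_backlog and whether it equals 1000.
# $1: optional /proc/sys-style root (hypothetical hook for dry runs).
check_backlog() {
    bl_file="${1:-/proc/sys}/net/core/netdev_max_backlog"
    bl_current="$(cat "$bl_file")" || return 1
    if [ "$bl_current" -eq 1000 ]; then
        echo "netdev_max_backlog=$bl_current (matches 1000)"
    else
        echo "netdev_max_backlog=$bl_current (differs from 1000)"
    fi
}
```

Calling check_backlog with no argument reads the running kernel's
value from /proc/sys.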
Re: Something hitting my total number of connections to the server
On 8/22/17 6:02 AM, Eric Dumazet wrote:
>>
>> net.core.netdev_max_backlog=10000
> This is an insane backlog.
>

https://www.kernel.org/doc/Documentation/networking/scaling.txt

"== Suggested Configuration

Flow limit is useful on systems with many concurrent connections,
where a single connection taking up 50% of a CPU indicates a problem.
In such environments, enable the feature on all CPUs that handle
network rx interrupts (as set in /proc/irq/N/smp_affinity).

The feature depends on the input packet queue length to exceed
the flow limit threshold (50%) + the flow history length (256).
Setting net.core.netdev_max_backlog to either 1000 or 10000
performed well in experiments."
Re: Something hitting my total number of connections to the server
On Tue, 2017-08-22 at 11:12 +0530, Akshat Kakkar wrote:
> There are multiple hosts/clients. All are mainly Windows based.
>
> Timestamp is not used as my clients are mainly Windows based, and
> on Windows the TCP timestamp is disabled by default.
>
> sysctl is as follows:
>
> kernel.shmmax = 68719476736
> kernel.shmall = 4294967296
> kernel.pid_max=4194303
> vm.max_map_count=131072
> kernel.sem=250 32000 32 250
>
> net.netfilter.nf_conntrack_generic_timeout = 300
> net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 60
> net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 30
> net.netfilter.nf_conntrack_tcp_timeout_established = 7200
> net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 60
> net.netfilter.nf_conntrack_tcp_timeout_close_wait = 30
> net.netfilter.nf_conntrack_tcp_timeout_last_ack = 30
> net.netfilter.nf_conntrack_tcp_timeout_time_wait = 60
> net.netfilter.nf_conntrack_tcp_timeout_close = 10
> net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 300
> net.netfilter.nf_conntrack_tcp_timeout_unacknowledged = 300
> net.netfilter.nf_conntrack_udp_timeout = 30
> net.netfilter.nf_conntrack_udp_timeout_stream = 180
> net.netfilter.nf_conntrack_icmp_timeout = 30
> net.netfilter.nf_conntrack_events_retry_timeout = 15
> net.core.rmem_max = 8388608
> net.core.wmem_max = 8388608
>
> net.ipv4.tcp_tw_reuse=1
> net.ipv4.tcp_tw_recycle=1

This is exactly what I feared.

We do not support tcp_tw_reuse = 1 AND tcp_tw_recycle = 1

This is a very well known bad combination.

> net.ipv4.tcp_fin_timeout=30
> net.ipv4.tcp_keepalive_time=1800
> net.ipv4.tcp_keepalive_intvl=60
> net.ipv4.tcp_keepalive_probes=20
> net.ipv4.tcp_max_syn_backlog=4096
> net.ipv4.tcp_syncookies=1
> net.ipv4.tcp_sack=1
> net.ipv4.tcp_dsack=1
> net.ipv4.tcp_window_scaling=1
> net.ipv4.tcp_syn_retries=3
> net.ipv4.tcp_synack_retries=3
> net.ipv4.tcp_retries1=3
> net.ipv4.tcp_retries2=15
> net.ipv4.ip_local_port_range=1024 65535
>
> net.ipv4.tcp_timestamps=0
>
> net.core.netdev_max_backlog=10000

This is an insane backlog.

> net.core.somaxconn=10
> net.core.optmem_max=81920
>
> net.netfilter.nf_conntrack_max=524288
> net.nf_conntrack_max=524288
> net.ipv6.conf.all.disable_ipv6 = 1
> fs.file-max=100
>
> net.ipv4.tcp_no_metrics_save = 1
> net.ipv4.tcp_max_syn_backlog = 10240
> net.ipv4.tcp_congestion_control=htcp
>
> net.ipv4.tcp_rfc1337 = 1
> net.core.netdev_max_backlog = 65536

This is a crazy backlog. Do not do that.

> net.ipv4.tcp_max_tw_buckets = 144
>
> net.core.rmem_max = 134217728
> net.core.wmem_max = 134217728
>

It looks like your sysctls have been set to unreasonable values.
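One thing the listing above also shows is that several keys are
assigned more than once (net.core.netdev_max_backlog,
net.core.rmem_max, net.core.wmem_max, net.ipv4.tcp_max_syn_backlog).
When such a file is applied top to bottom, the later assignment is the
one left in effect. The sketch below is one way to spot such repeats.
The function name duplicate_sysctl_keys is hypothetical, and the file
path in the usage note is only an assumption about where the settings
are kept.

```shell
# Print every key that is assigned more than once in sysctl.conf-style
# input (files given as arguments, or standard input if none).
duplicate_sysctl_keys() {
    awk -F= '
        /^[ \t]*(#|;|$)/ { next }       # skip comments and blank lines
        {
            key = $1
            gsub(/[ \t]/, "", key)      # "a.b = 1" and "a.b=1" share a key
            if (++seen[key] == 2) print key   # report each repeat once
        }
    ' "$@"
}
```

For example, duplicate_sysctl_keys /etc/sysctl.conf lists the repeated
keys of that file, one per line, in the order in which the second
assignment appears.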
Re: Something hitting my total number of connections to the server
On Tue, Aug 22, 2017 at 1:42 AM, Akshat Kakkar wrote:
> There are multiple hosts/clients. All are mainly Windows based.
>
> Timestamp is not used as my clients are mainly Windows based, and
> on Windows the TCP timestamp is disabled by default.
...
> net.ipv4.tcp_tw_reuse=1
> net.ipv4.tcp_tw_recycle=1

I suspect the problem is there. The net.ipv4.tcp_tw_recycle setting
should be 0. Running with the value 1 is known to cause buggy behavior
related to TCP timestamps, and that feature has been removed in kernel
v4.12.

Can you please re-run your tests with net.ipv4.tcp_tw_recycle=0 or a
newer kernel?

neal
Re: Something hitting my total number of connections to the server
On Tue, Aug 22, 2017 at 11:12 AM, Akshat Kakkar wrote:
> On Mon, Aug 21, 2017 at 11:14 PM, Eric Dumazet wrote:
>> On Mon, 2017-08-21 at 10:44 -0700, Eric Dumazet wrote:
>>
>>> - Why is timewait not being used ?
>>>
>>
>> s/timewait/timestamps/

[Apologies for top post.]

There are multiple hosts/clients. All are mainly Windows based.

Timestamp is not used as my clients are mainly Windows based, and
on Windows the TCP timestamp is disabled by default.

sysctl is as follows:

kernel.shmmax = 68719476736
kernel.shmall = 4294967296
kernel.pid_max=4194303
vm.max_map_count=131072
kernel.sem=250 32000 32 250

net.netfilter.nf_conntrack_generic_timeout = 300
net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 60
net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 30
net.netfilter.nf_conntrack_tcp_timeout_established = 7200
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 60
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 30
net.netfilter.nf_conntrack_tcp_timeout_last_ack = 30
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 60
net.netfilter.nf_conntrack_tcp_timeout_close = 10
net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 300
net.netfilter.nf_conntrack_tcp_timeout_unacknowledged = 300
net.netfilter.nf_conntrack_udp_timeout = 30
net.netfilter.nf_conntrack_udp_timeout_stream = 180
net.netfilter.nf_conntrack_icmp_timeout = 30
net.netfilter.nf_conntrack_events_retry_timeout = 15
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608

net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_tw_recycle=1
net.ipv4.tcp_fin_timeout=30
net.ipv4.tcp_keepalive_time=1800
net.ipv4.tcp_keepalive_intvl=60
net.ipv4.tcp_keepalive_probes=20
net.ipv4.tcp_max_syn_backlog=4096
net.ipv4.tcp_syncookies=1
net.ipv4.tcp_sack=1
net.ipv4.tcp_dsack=1
net.ipv4.tcp_window_scaling=1
net.ipv4.tcp_syn_retries=3
net.ipv4.tcp_synack_retries=3
net.ipv4.tcp_retries1=3
net.ipv4.tcp_retries2=15
net.ipv4.ip_local_port_range=1024 65535

net.ipv4.tcp_timestamps=0

net.core.netdev_max_backlog=10000
net.core.somaxconn=10
net.core.optmem_max=81920

net.netfilter.nf_conntrack_max=524288
net.nf_conntrack_max=524288
net.ipv6.conf.all.disable_ipv6 = 1
fs.file-max=100

net.ipv4.tcp_no_metrics_save = 1
net.ipv4.tcp_max_syn_backlog = 10240
net.ipv4.tcp_congestion_control=htcp

net.ipv4.tcp_rfc1337 = 1
net.core.netdev_max_backlog = 65536
net.ipv4.tcp_max_tw_buckets = 144

net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
Re: Something hitting my total number of connections to the server
There are multiple hosts/clients. All are mainly Windows based.

Timestamp is not used as my clients are mainly Windows based, and
on Windows the TCP timestamp is disabled by default.

sysctl is as follows:

kernel.shmmax = 68719476736
kernel.shmall = 4294967296
kernel.pid_max=4194303
vm.max_map_count=131072
kernel.sem=250 32000 32 250

net.netfilter.nf_conntrack_generic_timeout = 300
net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 60
net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 30
net.netfilter.nf_conntrack_tcp_timeout_established = 7200
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 60
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 30
net.netfilter.nf_conntrack_tcp_timeout_last_ack = 30
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 60
net.netfilter.nf_conntrack_tcp_timeout_close = 10
net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 300
net.netfilter.nf_conntrack_tcp_timeout_unacknowledged = 300
net.netfilter.nf_conntrack_udp_timeout = 30
net.netfilter.nf_conntrack_udp_timeout_stream = 180
net.netfilter.nf_conntrack_icmp_timeout = 30
net.netfilter.nf_conntrack_events_retry_timeout = 15
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608

net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_tw_recycle=1
net.ipv4.tcp_fin_timeout=30
net.ipv4.tcp_keepalive_time=1800
net.ipv4.tcp_keepalive_intvl=60
net.ipv4.tcp_keepalive_probes=20
net.ipv4.tcp_max_syn_backlog=4096
net.ipv4.tcp_syncookies=1
net.ipv4.tcp_sack=1
net.ipv4.tcp_dsack=1
net.ipv4.tcp_window_scaling=1
net.ipv4.tcp_syn_retries=3
net.ipv4.tcp_synack_retries=3
net.ipv4.tcp_retries1=3
net.ipv4.tcp_retries2=15
net.ipv4.ip_local_port_range=1024 65535

net.ipv4.tcp_timestamps=0

net.core.netdev_max_backlog=10000
net.core.somaxconn=10
net.core.optmem_max=81920

net.netfilter.nf_conntrack_max=524288
net.nf_conntrack_max=524288
net.ipv6.conf.all.disable_ipv6 = 1
fs.file-max=100

net.ipv4.tcp_no_metrics_save = 1
net.ipv4.tcp_max_syn_backlog = 10240
net.ipv4.tcp_congestion_control=htcp

net.ipv4.tcp_rfc1337 = 1
net.core.netdev_max_backlog = 65536
net.ipv4.tcp_max_tw_buckets = 144

net.core.rmem_max = 134217728
net.core.wmem_max = 134217728

On Mon, Aug 21, 2017 at 11:14 PM, Eric Dumazet wrote:
> On Mon, 2017-08-21 at 10:44 -0700, Eric Dumazet wrote:
>
>> - Why is timewait not being used ?
>>
>
> s/timewait/timestamps/
Re: Something hitting my total number of connections to the server
On Mon, 2017-08-21 at 10:44 -0700, Eric Dumazet wrote:

> - Why is timewait not being used ?
>

s/timewait/timestamps/
Re: Something hitting my total number of connections to the server
On Mon, 2017-08-21 at 22:58 +0530, Akshat Kakkar wrote:

> As mentioned in my initial description, the server is not sending
> SYN-ACK. That's the main symptom. For completeness, it's not
> sending any RST either.
> However, if I disable TCP timestamp ... the server starts giving SYN-ACK.
> The strangest thing is, my client doesn't initiate a connection with
> tcp timestamp, so how come disabling tcp timestamp is making things
> work?

As I said, maybe the bug was already fixed months ago.

By running an old kernel, you want us to spend time on something that
might already have been fixed.

Only if you run a current kernel _and_ reproduce the problem might we
take a look.

I suspect your client is a single host ?

- Why is timewait not being used ?

- What sysctls have been changed on your server ?
Re: Something hitting my total number of connections to the server
On Monday, August 21, 2017, Eric Dumazet wrote:
>
> On Mon, 2017-08-21 at 15:26 +0530, Akshat Kakkar wrote:
> > On Mon, Aug 21, 2017 at 3:13 PM, David Laight wrote:
> > > From: Akshat Kakkar
> > >> Sent: 18 August 2017 10:14
> > >> On Thu, Aug 17, 2017 at 5:06 PM, Eric Dumazet wrote:
> > >> > On Thu, 2017-08-17 at 14:35 +0530, Akshat Kakkar wrote:
> > >> >
> > >> >> I upgraded to 4.4 but still experiencing same issue.
> > >> >> Please help.
> > >> >
> > >> > Still too old kernel, shoot again ;)
> > >> >
> > >>
> > >> Sorry but that's the maximum I can try as of now as it's the LT version.
> > >
> > > You should be able to build a current kernel and run it with your
> > > existing user space.
> > >
> > > David
> > >
> >
> > The issue is with tcp timestamp. When I am disabling it, things are
> > working fine but when I enable it the issue re-occurs. However, I am
> > not seeing tcp timestamps on packets, even when it is enabled, simply
> > because my client doesn't support it.
> >
> > But the question is, if my client doesn't support timestamp, why
> > is enabling timestamp on the server side creating an issue??
>
> Maybe you changed some sysctls wrongly ?
>

As mentioned in my initial description, the server is not sending
SYN-ACK. That's the main symptom. For completeness, it's not
sending any RST either.
However, if I disable TCP timestamp ... the server starts giving SYN-ACK.
The strangest thing is, my client doesn't initiate a connection with
tcp timestamp, so how come disabling tcp timestamp is making things
work?
Re: Something hitting my total number of connections to the server
On Mon, 2017-08-21 at 15:26 +0530, Akshat Kakkar wrote:
> On Mon, Aug 21, 2017 at 3:13 PM, David Laight wrote:
> > From: Akshat Kakkar
> >> Sent: 18 August 2017 10:14
> >> On Thu, Aug 17, 2017 at 5:06 PM, Eric Dumazet wrote:
> >> > On Thu, 2017-08-17 at 14:35 +0530, Akshat Kakkar wrote:
> >> >
> >> >> I upgraded to 4.4 but still experiencing same issue.
> >> >> Please help.
> >> >
> >> > Still too old kernel, shoot again ;)
> >> >
> >>
> >> Sorry but that's the maximum I can try as of now as it's the LT version.
> >
> > You should be able to build a current kernel and run it with your
> > existing user space.
> >
> > David
> >
>
> The issue is with tcp timestamp. When I am disabling it, things are
> working fine but when I enable it the issue re-occurs. However, I am
> not seeing tcp timestamps on packets, even when it is enabled, simply
> because my client doesn't support it.
>
> But the question is, if my client doesn't support timestamp, why
> is enabling timestamp on the server side creating an issue??

Maybe you changed some sysctls wrongly ?
Re: Something hitting my total number of connections to the server
On Mon, Aug 21, 2017 at 5:56 AM, Akshat Kakkar wrote:
>
> The issue is with tcp timestamp. When I am disabling it, things are
> working fine but when I enable it the issue re-occurs. However, I am
> not seeing tcp timestamps on packets, even when it is enabled, simply
> because my client doesn't support it.
>
> But the question is, if my client doesn't support timestamp, why
> is enabling timestamp on the server side creating an issue??

To help shed light on this, you could try collecting and dumping the
nstat counters when the system is in the mode where it is not
creating/accepting new connections, e.g.:

  nstat > /dev/null && sleep 10 && nstat

The sleep interval would need to be long enough to cover a failing
client connect attempt.

It would also be helpful to gather a tcpdump trace over the interval,
to see if the server is sending a RST, SYN+ACK, or nothing.

neal
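The nstat output can be long, so a small filter may help when looking
for dropped connection attempts. The sketch below keeps only three
counters and only when they are non-zero. The function name
syn_drop_counters is hypothetical, and the choice of counters is an
assumption rather than something stated in this thread:
TcpExtListenDrops and TcpExtListenOverflows cover listen-queue drops,
and TcpExtPAWSPassive is, as far as I know, the counter that pre-4.12
kernels increment when a SYN is rejected by the tcp_tw_recycle
timestamp check.

```shell
# Read nstat output on standard input and print the name and value of
# a few SYN-drop-related counters that are greater than zero.
# The counter list is an assumption; extend it as needed.
syn_drop_counters() {
    awk '$1 ~ /^TcpExt(ListenDrops|ListenOverflows|PAWSPassive)$/ &&
         $2 + 0 > 0 { print $1, $2 }'
}
```

It can be chained onto the one-liner above, for example:
nstat > /dev/null && sleep 10 && nstat | syn_drop_counters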
Re: Something hitting my total number of connections to the server
On Mon, Aug 21, 2017 at 3:13 PM, David Laight wrote:
> From: Akshat Kakkar
>> Sent: 18 August 2017 10:14
>> On Thu, Aug 17, 2017 at 5:06 PM, Eric Dumazet wrote:
>> > On Thu, 2017-08-17 at 14:35 +0530, Akshat Kakkar wrote:
>> >
>> >> I upgraded to 4.4 but still experiencing same issue.
>> >> Please help.
>> >
>> > Still too old kernel, shoot again ;)
>> >
>>
>> Sorry but that's the maximum I can try as of now as it's the LT version.
>
> You should be able to build a current kernel and run it with your
> existing user space.
>
> David
>

The issue is with tcp timestamp. When I am disabling it, things are
working fine but when I enable it the issue re-occurs. However, I am
not seeing tcp timestamps on packets, even when it is enabled, simply
because my client doesn't support it.

But the question is, if my client doesn't support timestamp, why
is enabling timestamp on the server side creating an issue??
RE: Something hitting my total number of connections to the server
From: Akshat Kakkar
> Sent: 18 August 2017 10:14
> On Thu, Aug 17, 2017 at 5:06 PM, Eric Dumazet wrote:
> > On Thu, 2017-08-17 at 14:35 +0530, Akshat Kakkar wrote:
> >
> >> I upgraded to 4.4 but still experiencing same issue.
> >> Please help.
> >
> > Still too old kernel, shoot again ;)
> >
>
> Sorry but that's the maximum I can try as of now as it's the LT version.

You should be able to build a current kernel and run it with your
existing user space.

David
Re: Something hitting my total number of connections to the server
On Fri, 2017-08-18 at 18:14 +0530, Akshat Kakkar wrote:
> On Fri, Aug 18, 2017 at 5:36 PM, Eric Dumazet wrote:
> > On Fri, 2017-08-18 at 14:44 +0530, Akshat Kakkar wrote:
> >> On Thu, Aug 17, 2017 at 5:06 PM, Eric Dumazet wrote:
> >> > On Thu, 2017-08-17 at 14:35 +0530, Akshat Kakkar wrote:
> >> >
> >> >> I upgraded to 4.4 but still experiencing same issue.
> >> >> Please help.
> >> >
> >> > Still too old kernel, shoot again ;)
> >> >
> >>
> >> Sorry but that's the maximum I can try as of now as it's the LT version.
> >>
> >> Besides, this issue was not present in 2.6.32 but came with 3.10 and
> >> is still there in 4.4, so I doubt it has much to do with the kernel
> >> and/or kernel parameters, as you guys are good enough not to keep an
> >> issue for so long (around 3 years).
> >>
> >> So please help.
> >
> > netdev is the developer list.
> >
> > We deal with recent kernels only. Because we already spent time fixing
> > all these issues, we are not going to spend time fixing old kernels.
> >
> > Please ask your distro provider to backport the needed patches.
> >
>
> I appreciate that.
> Can you just recall if there was any such issue which was fixed after 4.4?

More than one hundred patches, yes.

Sorry, someone other than me will have to build a list of these
patches.
Re: Something hitting my total number of connections to the server
On Fri, Aug 18, 2017 at 5:36 PM, Eric Dumazet wrote:
> On Fri, 2017-08-18 at 14:44 +0530, Akshat Kakkar wrote:
>> On Thu, Aug 17, 2017 at 5:06 PM, Eric Dumazet wrote:
>> > On Thu, 2017-08-17 at 14:35 +0530, Akshat Kakkar wrote:
>> >
>> >> I upgraded to 4.4 but still experiencing same issue.
>> >> Please help.
>> >
>> > Still too old kernel, shoot again ;)
>> >
>>
>> Sorry but that's the maximum I can try as of now as it's the LT version.
>>
>> Besides, this issue was not present in 2.6.32 but came with 3.10 and
>> is still there in 4.4, so I doubt it has much to do with the kernel
>> and/or kernel parameters, as you guys are good enough not to keep an
>> issue for so long (around 3 years).
>>
>> So please help.
>
> netdev is the developer list.
>
> We deal with recent kernels only. Because we already spent time fixing
> all these issues, we are not going to spend time fixing old kernels.
>
> Please ask your distro provider to backport the needed patches.
>

I appreciate that.
Can you just recall if there was any such issue which was fixed after 4.4?
Re: Something hitting my total number of connections to the server
On Fri, 2017-08-18 at 14:44 +0530, Akshat Kakkar wrote:
> On Thu, Aug 17, 2017 at 5:06 PM, Eric Dumazet wrote:
> > On Thu, 2017-08-17 at 14:35 +0530, Akshat Kakkar wrote:
> >
> >> I upgraded to 4.4 but still experiencing same issue.
> >> Please help.
> >
> > Still too old kernel, shoot again ;)
> >
>
> Sorry but that's the maximum I can try as of now as it's the LT version.
>
> Besides, this issue was not present in 2.6.32 but came with 3.10 and
> is still there in 4.4, so I doubt it has much to do with the kernel
> and/or kernel parameters, as you guys are good enough not to keep an
> issue for so long (around 3 years).
>
> So please help.

netdev is the developer list.

We deal with recent kernels only. Because we already spent time fixing
all these issues, we are not going to spend time fixing old kernels.

Please ask your distro provider to backport the needed patches.
Re: Something hitting my total number of connections to the server
On Thu, Aug 17, 2017 at 5:06 PM, Eric Dumazet wrote:
> On Thu, 2017-08-17 at 14:35 +0530, Akshat Kakkar wrote:
>
>> I upgraded to 4.4 but still experiencing same issue.
>> Please help.
>
> Still too old kernel, shoot again ;)
>

Sorry but that's the maximum I can try as of now as it's the LT version.

Besides, this issue was not present in 2.6.32 but came with 3.10 and
is still there in 4.4, so I doubt it has much to do with the kernel
and/or kernel parameters, as you guys are good enough not to keep an
issue for so long (around 3 years).

So please help.
Re: Something hitting my total number of connections to the server
On Thu, 2017-08-17 at 14:35 +0530, Akshat Kakkar wrote:

> I upgraded to 4.4 but still experiencing same issue.
> Please help.

Still too old kernel, shoot again ;)
Re: Something hitting my total number of connections to the server
On Wed, Aug 16, 2017 at 4:04 PM, Eric Dumazet wrote:
> On Wed, 2017-08-16 at 10:18 +0530, Akshat Kakkar wrote:
>> On Mon, Aug 14, 2017 at 2:37 PM, Akshat Kakkar wrote:
>> > I have CentOS 7.3 (kernel 3.10) running on a server with 128GB RAM and
>> > 2 x 10 Core Xeon Processor.
>> > I have hosted a webserver on it and enabled ssh for remote maintenance.
>> > Previously it was running on CentOS 6.3.
>> > After upgrading to CentOS 7.3, occasionally (probably when the number
>> > of hits on the server is higher), I am not able to create new
>> > connections (neither on web nor on ssh). Existing connections keep on
>> > running fine.
>> >
>> > I did packet capturing using tcpdump to understand if it's some
>> > intermediate network issue.
>> > What I found was the server is not replying to new SYN requests.
>> >
>> > So it's clear that it's not an application issue at all. Also, there
>> > are no entries in the application logs for any dropped connections.
>> >
>> > I checked my firewall rules for any rate limiting.
>> > There is nothing in there.
>> >
>> > I checked tc, in case some rate limiting was imposed by mistake.
>> > There is nothing in there either.
>> >
>> > I have increased noOfFiles to 100 and other sysctl parameters, but
>> > the issue is still there.
>> >
>> > Has anybody experienced the same?
>> >
>> > How to go about? Anybody ... Please Help!!!
>>
>> It's getting lonely out here. Anybody there ???
>
> We won't help you unless you use a recent kernel.
>
> 3.10 misses all recent improvements in TCP stack (4 years of hard work)
>

I upgraded to 4.4 but still experiencing same issue.
Please help.
Re: Something hitting my total number of connections to the server
On Wed, 2017-08-16 at 10:18 +0530, Akshat Kakkar wrote:
> On Mon, Aug 14, 2017 at 2:37 PM, Akshat Kakkar wrote:
> > I have CentOS 7.3 (kernel 3.10) running on a server with 128GB RAM and
> > 2 x 10 Core Xeon Processor.
> > I have hosted a webserver on it and enabled ssh for remote maintenance.
> > Previously it was running on CentOS 6.3.
> > After upgrading to CentOS 7.3, occasionally (probably when the number
> > of hits on the server is higher), I am not able to create new
> > connections (neither on web nor on ssh). Existing connections keep on
> > running fine.
> >
> > I did packet capturing using tcpdump to understand if it's some
> > intermediate network issue.
> > What I found was the server is not replying to new SYN requests.
> >
> > So it's clear that it's not an application issue at all. Also, there
> > are no entries in the application logs for any dropped connections.
> >
> > I checked my firewall rules for any rate limiting.
> > There is nothing in there.
> >
> > I checked tc, in case some rate limiting was imposed by mistake.
> > There is nothing in there either.
> >
> > I have increased noOfFiles to 100 and other sysctl parameters, but
> > the issue is still there.
> >
> > Has anybody experienced the same?
> >
> > How to go about? Anybody ... Please Help!!!
>
> It's getting lonely out here. Anybody there ???

We won't help you unless you use a recent kernel.

3.10 misses all recent improvements in TCP stack (4 years of hard work)
Re: Something hitting my total number of connections to the server
On Mon, Aug 14, 2017 at 2:37 PM, Akshat Kakkar wrote:
> I have CentOS 7.3 (kernel 3.10) running on a server with 128GB RAM and
> 2 x 10 Core Xeon Processor.
> I have hosted a webserver on it and enabled ssh for remote maintenance.
> Previously it was running on CentOS 6.3.
> After upgrading to CentOS 7.3, occasionally (probably when the number
> of hits on the server is higher), I am not able to create new
> connections (neither on web nor on ssh). Existing connections keep on
> running fine.
>
> I did packet capturing using tcpdump to understand if it's some
> intermediate network issue.
> What I found was the server is not replying to new SYN requests.
>
> So it's clear that it's not an application issue at all. Also, there
> are no entries in the application logs for any dropped connections.
>
> I checked my firewall rules for any rate limiting.
> There is nothing in there.
>
> I checked tc, in case some rate limiting was imposed by mistake.
> There is nothing in there either.
>
> I have increased noOfFiles to 100 and other sysctl parameters, but
> the issue is still there.
>
> Has anybody experienced the same?
>
> How to go about? Anybody ... Please Help!!!

It's getting lonely out here. Anybody there ???