Re: Something hitting my total number of connections to the server

2017-08-23 Neal Cardwell
On Wed, Aug 23, 2017 at 1:08 AM, Akshat Kakkar  wrote:
>
> On Tue, Aug 22, 2017 at 5:58 PM, Neal Cardwell  wrote:
> > On Tue, Aug 22, 2017 at 1:42 AM, Akshat Kakkar  
> > wrote:
> >> There are multiple hosts/clients. They are mainly Windows-based.
> >>
> >> Timestamps are not used, as my clients mainly are Windows-based, and on
> >> Windows TCP timestamps are disabled by default.
> > ...
> >> net.ipv4.tcp_tw_reuse=1
> >> net.ipv4.tcp_tw_recycle=1
> >
> > I suspect the problem is there. The net.ipv4.tcp_tw_recycle setting
> > should be 0. Running with the value 1 is known to cause buggy behavior
> > related to TCP timestamps, and that feature has been removed in kernel
> > v4.12.
> >
> > Can you please re-run your tests with net.ipv4.tcp_tw_recycle=0 or a
> > newer kernel?
> >
> > neal
>
> Thanks for your reply.
>
> I understand that.
>
> But my point is: although TCP timestamps are enabled on the server, the
> client is not using them, so how is this _bug_ (if any) triggered in
> the first place?

You mention "clients mainly are windows based". If they are only
"mainly" Windows-based, and some run other OSes that do use TCP
timestamps, and the remote address is the same for TCP-timestamp-using
and non-TCP-timestamp-using clients (for example, because they share a
NAT gateway), then running with timestamps enabled on the server could
tickle the bugs in pre-4.12 kernels that save per-peer info from
TCP-timestamp-using connections and erroneously try to use that info
to validate non-TCP-timestamp-using connections.
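A simplified model of that check may make the symptom easier to see. This is a sketch, not kernel code: it assumes the pre-4.12 behaviour where, with net.ipv4.tcp_tw_recycle=1, a new SYN is compared against the timestamp last cached for the same peer IP (done in tcp_peer_is_proven(), as far as the pre-4.12 sources show), and it ignores 32-bit wraparound. The helper name syn_verdict and its arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Simplified model (NOT kernel code) of the per-peer check that pre-4.12
# kernels apply to a new SYN when net.ipv4.tcp_tw_recycle=1.
# Values mirror TCP_PAWS_MSL and TCP_PAWS_WINDOW in include/net/tcp.h.
PAWS_MSL=60      # seconds a cached per-peer timestamp is trusted
PAWS_WINDOW=1    # tolerated backwards step of the peer's TSval

# syn_verdict AGE CACHED_TSVAL SYN_HAS_TS SYN_TSVAL   (hypothetical helper)
#   AGE           seconds since a timestamped connection from this peer IP was cached
#   CACHED_TSVAL  TSval remembered for that peer IP
#   SYN_HAS_TS    1 if the new SYN carries the timestamp option, 0 otherwise
#   SYN_TSVAL     TSval carried by the new SYN (ignored when SYN_HAS_TS=0)
# Prints "drop" or "accept". 32-bit wraparound is ignored for simplicity.
syn_verdict() {
    local age=$1 cached=$2 has_ts=$3 tsval=$4
    if (( age < PAWS_MSL )) && { (( has_ts == 0 )) || (( cached - tsval > PAWS_WINDOW )); }; then
        echo drop      # dropped silently: neither SYN-ACK nor RST goes out
    else
        echo accept
    fi
}
```

For example, `syn_verdict 10 5000 0 0` (a SYN without timestamps from an IP that sent a timestamped connection 10 seconds earlier) prints drop, while `syn_verdict 120 5000 0 0` prints accept because the cached entry is older than 60 seconds.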

But the main point is that the configuration you cited
(net.ipv4.tcp_tw_recycle=1) is an unsupported configuration with known
bugs. The best resolution would be to just run with
net.ipv4.tcp_tw_recycle=0. It's not worth digging any further unless
you run with net.ipv4.tcp_tw_recycle=0 or a kernel that is v4.12 or
later and still have problems.
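Applying that advice to a configuration file like the one posted in this thread could look like the sketch below. It assumes the setting lives in a plain sysctl file (on CentOS 7 typically /etc/sysctl.conf) and a sed that accepts -E and -i.bak; the helper name disable_tw_recycle is hypothetical. On v4.12 and later the knob no longer exists, so there the line should be deleted rather than rewritten.

```shell
#!/usr/bin/env bash
# disable_tw_recycle FILE   (hypothetical helper)
# Rewrites any "net.ipv4.tcp_tw_recycle = <n>" assignment in FILE to 0,
# tolerating optional whitespace around "=". The original is kept as FILE.bak.
disable_tw_recycle() {
    local file=$1
    # \1 re-emits the key and "=", the literal 0 replaces the old value
    sed -i.bak -E 's/^([[:space:]]*net\.ipv4\.tcp_tw_recycle[[:space:]]*=[[:space:]]*)[0-9]+/\10/' "$file"
}
```

As root, something like `disable_tw_recycle /etc/sysctl.conf && sysctl -p /etc/sysctl.conf` would apply it, and `sysctl net.ipv4.tcp_tw_recycle` should then report 0.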

Hope that helps,
neal


2017-08-22 Akshat Kakkar
On Tue, Aug 22, 2017 at 5:58 PM, Neal Cardwell  wrote:
> On Tue, Aug 22, 2017 at 1:42 AM, Akshat Kakkar  wrote:
>> There are multiple hosts/clients. They are mainly Windows-based.
>>
>> Timestamps are not used, as my clients mainly are Windows-based, and on
>> Windows TCP timestamps are disabled by default.
> ...
>> net.ipv4.tcp_tw_reuse=1
>> net.ipv4.tcp_tw_recycle=1
>
> I suspect the problem is there. The net.ipv4.tcp_tw_recycle setting
> should be 0. Running with the value 1 is known to cause buggy behavior
> related to TCP timestamps, and that feature has been removed in kernel
> v4.12.
>
> Can you please re-run your tests with net.ipv4.tcp_tw_recycle=0 or a
> newer kernel?
>
> neal

Thanks for your reply.

I understand that.

But my point is: although TCP timestamps are enabled on the server, the
client is not using them, so how is this _bug_ (if any) triggered in
the first place?


2017-08-22 Eric Dumazet
On Tue, 2017-08-22 at 10:46 -0700, David Ahern wrote:
> On 8/22/17 10:44 AM, Eric Dumazet wrote:
> > Willem wrote this doc in 2013, before we finally went back to 1000.
> > 
> > We should update this doc.
> 
> 
> And these too:
> 
> $ egrep -r netdev_max_backlog Documentation/networking/
> Documentation/networking//cxgb.txt:  sysctl -w net.core.netdev_max_backlog=300000
> Documentation/networking//ixgb.txt:net.core.netdev_max_backlog = 300000

Yes, whoever wrote this had no idea of the implications, I guess.




2017-08-22 David Ahern
On 8/22/17 10:44 AM, Eric Dumazet wrote:
> Willem wrote this doc in 2013, before we finally went back to 1000.
> 
> We should update this doc.


And these too:

$ egrep -r netdev_max_backlog Documentation/networking/
Documentation/networking//cxgb.txt:  sysctl -w net.core.netdev_max_backlog=300000
Documentation/networking//ixgb.txt:net.core.netdev_max_backlog = 300000


2017-08-22 Eric Dumazet
On Tue, 2017-08-22 at 09:43 -0700, David Ahern wrote:
> On 8/22/17 6:02 AM, Eric Dumazet wrote:
> >>
> >> net.core.netdev_max_backlog=10000
> > This is an insane backlog.
> > 
> 
> https://www.kernel.org/doc/Documentation/networking/scaling.txt
> 
> "== Suggested Configuration
> 
> Flow limit is useful on systems with many concurrent connections,
> where a single connection taking up 50% of a CPU indicates a problem.
> In such environments, enable the feature on all CPUs that handle
> network rx interrupts (as set in /proc/irq/N/smp_affinity).
> 
> The feature depends on the input packet queue length to exceed
> the flow limit threshold (50%) + the flow history length (256).
> Setting net.core.netdev_max_backlog to either 1000 or 10000
> performed well in experiments."

10000 adds tail latencies.

At Google we run the whole fleet with a backlog of 1000.

And yes, it took time to get rid of the backlog of 10000 that was set up
years ago, because of old constraints and some fears.

Willem wrote this doc in 2013, before we finally went back to 1000.

We should update this doc.
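Whether a given netdev_max_backlog is actually too small can be measured rather than guessed. The sketch below assumes the usual /proc/net/softnet_stat layout (one line per CPU, hexadecimal columns, the second column counting packets dropped because that CPU's input backlog was full); the helper name softnet_drops is hypothetical.

```shell
#!/usr/bin/env bash
# softnet_drops [FILE]   (hypothetical helper)
# Sums the "dropped" column (2nd field, hexadecimal) of /proc/net/softnet_stat,
# i.e. packets dropped because a CPU's input backlog queue was full.
softnet_drops() {
    local file=${1:-/proc/net/softnet_stat}
    local total=0 processed dropped rest   # processed (1st column) is unused here
    while read -r processed dropped rest; do
        total=$(( total + 16#$dropped ))   # bash base#number syntax parses hex
    done < "$file"
    echo "$total"
}
```

Sampling it before and after `sysctl -w net.core.netdev_max_backlog=1000` under real load shows whether the smaller queue drops anything.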




2017-08-22 David Ahern
On 8/22/17 6:02 AM, Eric Dumazet wrote:
>>
>> net.core.netdev_max_backlog=10000
> This is an insane backlog.
> 

https://www.kernel.org/doc/Documentation/networking/scaling.txt

"== Suggested Configuration

Flow limit is useful on systems with many concurrent connections,
where a single connection taking up 50% of a CPU indicates a problem.
In such environments, enable the feature on all CPUs that handle
network rx interrupts (as set in /proc/irq/N/smp_affinity).

The feature depends on the input packet queue length to exceed
the flow limit threshold (50%) + the flow history length (256).
Setting net.core.netdev_max_backlog to either 1000 or 10000
performed well in experiments."


2017-08-22 Eric Dumazet
On Tue, 2017-08-22 at 11:12 +0530, Akshat Kakkar wrote:
> There are multiple hosts/clients. They are mainly Windows-based.
> 
> Timestamps are not used, as my clients mainly are Windows-based, and on
> Windows TCP timestamps are disabled by default.
> 
> sysctl is as follows:
> 
> kernel.shmmax = 68719476736
> kernel.shmall = 4294967296
> kernel.pid_max=4194303
> vm.max_map_count=131072
> kernel.sem=250 32000 32 250
> 
> net.netfilter.nf_conntrack_generic_timeout = 300
> net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 60
> net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 30
> net.netfilter.nf_conntrack_tcp_timeout_established = 7200
> net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 60
> net.netfilter.nf_conntrack_tcp_timeout_close_wait = 30
> net.netfilter.nf_conntrack_tcp_timeout_last_ack = 30
> net.netfilter.nf_conntrack_tcp_timeout_time_wait = 60
> net.netfilter.nf_conntrack_tcp_timeout_close = 10
> net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 300
> net.netfilter.nf_conntrack_tcp_timeout_unacknowledged = 300
> net.netfilter.nf_conntrack_udp_timeout = 30
> net.netfilter.nf_conntrack_udp_timeout_stream = 180
> net.netfilter.nf_conntrack_icmp_timeout = 30
> net.netfilter.nf_conntrack_events_retry_timeout = 15
> net.core.rmem_max = 8388608
> net.core.wmem_max = 8388608
> 
> net.ipv4.tcp_tw_reuse=1
> net.ipv4.tcp_tw_recycle=1

This is exactly what I feared.

We do not support tcp_tw_reuse = 1 AND tcp_tw_recycle = 1.

This is a very well-known bad combination.



> net.ipv4.tcp_fin_timeout=30
> net.ipv4.tcp_keepalive_time=1800
> net.ipv4.tcp_keepalive_intvl=60
> net.ipv4.tcp_keepalive_probes=20
> net.ipv4.tcp_max_syn_backlog=4096
> net.ipv4.tcp_syncookies=1
> net.ipv4.tcp_sack=1
> net.ipv4.tcp_dsack=1
> net.ipv4.tcp_window_scaling=1
> net.ipv4.tcp_syn_retries=3
> net.ipv4.tcp_synack_retries=3
> net.ipv4.tcp_retries1=3
> net.ipv4.tcp_retries2=15
> net.ipv4.ip_local_port_range=1024 65535
> 
> net.ipv4.tcp_timestamps=0
> 
> net.core.netdev_max_backlog=10000

This is an insane backlog.

> net.core.somaxconn=10
> net.core.optmem_max=81920
> 
> net.netfilter.nf_conntrack_max=524288
> net.nf_conntrack_max=524288
> net.ipv6.conf.all.disable_ipv6 = 1
> fs.file-max=100
> 
> net.ipv4.tcp_no_metrics_save = 1
> net.ipv4.tcp_max_syn_backlog = 10240
> net.ipv4.tcp_congestion_control=htcp
> 
> net.ipv4.tcp_rfc1337 = 1
> net.core.netdev_max_backlog = 65536

This is a crazy backlog. Do not do that.


> net.ipv4.tcp_max_tw_buckets = 144
> 
> net.core.rmem_max = 134217728
> net.core.wmem_max = 134217728
> 
> 
> 


It looks like your sysctls have been set to unreasonable values.
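Part of this can be spotted mechanically. In the posted file, net.core.netdev_max_backlog is assigned twice (the second time to 65536), as are net.ipv4.tcp_max_syn_backlog, net.core.rmem_max and net.core.wmem_max; when the file is applied top to bottom with `sysctl -p`, the later value is the one that sticks. The awk-based checker below is a hypothetical helper (lint_sysctl) that only reports duplicate keys and an enabled tcp_tw_recycle; it does not judge whether a value is sensible.

```shell
#!/usr/bin/env bash
# lint_sysctl FILE   (hypothetical helper)
# Reports keys assigned more than once (the last assignment wins when the file
# is applied in order with "sysctl -p") and an enabled net.ipv4.tcp_tw_recycle.
lint_sysctl() {
    awk -F'=' '
        /^[ \t]*$/    { next }                # skip blank lines
        /^[ \t]*[#;]/ { next }                # skip comments
        {
            key = $1; val = $2
            gsub(/[ \t]/, "", key)            # "a.b = 1" and "a.b=1" are the same key
            sub(/^[ \t]+/, "", val); sub(/[ \t]+$/, "", val)
            if (key in seen)
                printf "duplicate: %s (%s, then %s)\n", key, seen[key], val
            seen[key] = val
        }
        END {
            if (seen["net.ipv4.tcp_tw_recycle"] == "1")
                print "tcp_tw_recycle=1: unsupported, set it to 0"
        }
    ' "$1"
}
```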





2017-08-22 Neal Cardwell
On Tue, Aug 22, 2017 at 1:42 AM, Akshat Kakkar  wrote:
> There are multiple hosts/clients. They are mainly Windows-based.
>
> Timestamps are not used, as my clients mainly are Windows-based, and on
> Windows TCP timestamps are disabled by default.
...
> net.ipv4.tcp_tw_reuse=1
> net.ipv4.tcp_tw_recycle=1

I suspect the problem is there. The net.ipv4.tcp_tw_recycle setting
should be 0. Running with the value 1 is known to cause buggy behavior
related to TCP timestamps, and that feature has been removed in kernel
v4.12.

Can you please re-run your tests with net.ipv4.tcp_tw_recycle=0 or a
newer kernel?
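To tell which of the two options applies to a machine, one rough approach is to compare the running kernel's version with v4.12, the release named above as removing the knob. The sketch assumes GNU `sort -V`; the helper name tw_recycle_exists is hypothetical. Distribution kernels such as CentOS 7's 3.10 carry many backports, so checking for /proc/sys/net/ipv4/tcp_tw_recycle directly is the more reliable test; the version comparison is only a first hint.

```shell
#!/usr/bin/env bash
# tw_recycle_exists VERSION   (hypothetical helper)
# Succeeds if a kernel of that version predates v4.12, i.e. still has
# net.ipv4.tcp_tw_recycle. Relies on GNU "sort -V" for version ordering.
tw_recycle_exists() {
    local v=${1%%-*}    # strip a suffix such as "-514.el7.x86_64" or "-93-generic"
    [[ "$v" != "4.12" ]] &&
        [[ "$(printf '%s\n' "$v" 4.12 | sort -V | head -n1)" == "$v" ]]
}
```

Typical use: `tw_recycle_exists "$(uname -r)" && sysctl net.ipv4.tcp_tw_recycle`.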

neal


2017-08-21 Akshat Kakkar
On Tue, Aug 22, 2017 at 11:12 AM, Akshat Kakkar  wrote:
> There are multiple hosts/clients. They are mainly Windows-based.
>
> Timestamps are not used, as my clients mainly are Windows-based, and on
> Windows TCP timestamps are disabled by default.
>
> sysctl is as follows:
>
> kernel.shmmax = 68719476736
> kernel.shmall = 4294967296
> kernel.pid_max=4194303
> vm.max_map_count=131072
> kernel.sem=250 32000 32 250
>
> net.netfilter.nf_conntrack_generic_timeout = 300
> net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 60
> net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 30
> net.netfilter.nf_conntrack_tcp_timeout_established = 7200
> net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 60
> net.netfilter.nf_conntrack_tcp_timeout_close_wait = 30
> net.netfilter.nf_conntrack_tcp_timeout_last_ack = 30
> net.netfilter.nf_conntrack_tcp_timeout_time_wait = 60
> net.netfilter.nf_conntrack_tcp_timeout_close = 10
> net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 300
> net.netfilter.nf_conntrack_tcp_timeout_unacknowledged = 300
> net.netfilter.nf_conntrack_udp_timeout = 30
> net.netfilter.nf_conntrack_udp_timeout_stream = 180
> net.netfilter.nf_conntrack_icmp_timeout = 30
> net.netfilter.nf_conntrack_events_retry_timeout = 15
> net.core.rmem_max = 8388608
> net.core.wmem_max = 8388608
>
> net.ipv4.tcp_tw_reuse=1
> net.ipv4.tcp_tw_recycle=1
> net.ipv4.tcp_fin_timeout=30
> net.ipv4.tcp_keepalive_time=1800
> net.ipv4.tcp_keepalive_intvl=60
> net.ipv4.tcp_keepalive_probes=20
> net.ipv4.tcp_max_syn_backlog=4096
> net.ipv4.tcp_syncookies=1
> net.ipv4.tcp_sack=1
> net.ipv4.tcp_dsack=1
> net.ipv4.tcp_window_scaling=1
> net.ipv4.tcp_syn_retries=3
> net.ipv4.tcp_synack_retries=3
> net.ipv4.tcp_retries1=3
> net.ipv4.tcp_retries2=15
> net.ipv4.ip_local_port_range=1024 65535
>
> net.ipv4.tcp_timestamps=0
>
> net.core.netdev_max_backlog=10000
> net.core.somaxconn=10
> net.core.optmem_max=81920
>
> net.netfilter.nf_conntrack_max=524288
> net.nf_conntrack_max=524288
> net.ipv6.conf.all.disable_ipv6 = 1
> fs.file-max=100
>
> net.ipv4.tcp_no_metrics_save = 1
> net.ipv4.tcp_max_syn_backlog = 10240
> net.ipv4.tcp_congestion_control=htcp
>
> net.ipv4.tcp_rfc1337 = 1
> net.core.netdev_max_backlog = 65536
> net.ipv4.tcp_max_tw_buckets = 144
>
> net.core.rmem_max = 134217728
> net.core.wmem_max = 134217728
>
>
>
>
> On Mon, Aug 21, 2017 at 11:14 PM, Eric Dumazet  wrote:
>> On Mon, 2017-08-21 at 10:44 -0700, Eric Dumazet wrote:
>>
>>> - Why is timewait not being used?
>>>
>>
>> s/timewait/timestamps/
>>
>>
>>
[Apologies for the top post.]


There are multiple hosts/clients. They are mainly Windows-based.

Timestamps are not used, as my clients mainly are Windows-based, and on
Windows TCP timestamps are disabled by default.

sysctl is as follows:

kernel.shmmax = 68719476736
kernel.shmall = 4294967296
kernel.pid_max=4194303
vm.max_map_count=131072
kernel.sem=250 32000 32 250

net.netfilter.nf_conntrack_generic_timeout = 300
net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 60
net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 30
net.netfilter.nf_conntrack_tcp_timeout_established = 7200
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 60
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 30
net.netfilter.nf_conntrack_tcp_timeout_last_ack = 30
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 60
net.netfilter.nf_conntrack_tcp_timeout_close = 10
net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 300
net.netfilter.nf_conntrack_tcp_timeout_unacknowledged = 300
net.netfilter.nf_conntrack_udp_timeout = 30
net.netfilter.nf_conntrack_udp_timeout_stream = 180
net.netfilter.nf_conntrack_icmp_timeout = 30
net.netfilter.nf_conntrack_events_retry_timeout = 15
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608

net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_tw_recycle=1
net.ipv4.tcp_fin_timeout=30
net.ipv4.tcp_keepalive_time=1800
net.ipv4.tcp_keepalive_intvl=60
net.ipv4.tcp_keepalive_probes=20
net.ipv4.tcp_max_syn_backlog=4096
net.ipv4.tcp_syncookies=1
net.ipv4.tcp_sack=1
net.ipv4.tcp_dsack=1
net.ipv4.tcp_window_scaling=1
net.ipv4.tcp_syn_retries=3
net.ipv4.tcp_synack_retries=3
net.ipv4.tcp_retries1=3
net.ipv4.tcp_retries2=15
net.ipv4.ip_local_port_range=1024 65535

net.ipv4.tcp_timestamps=0

net.core.netdev_max_backlog=10000
net.core.somaxconn=10
net.core.optmem_max=81920

net.netfilter.nf_conntrack_max=524288
net.nf_conntrack_max=524288
net.ipv6.conf.all.disable_ipv6 = 1
fs.file-max=100

net.ipv4.tcp_no_metrics_save = 1
net.ipv4.tcp_max_syn_backlog = 10240
net.ipv4.tcp_congestion_control=htcp

net.ipv4.tcp_rfc1337 = 1
net.core.netdev_max_backlog = 65536
net.ipv4.tcp_max_tw_buckets = 144

net.core.rmem_max = 134217728
net.core.wmem_max = 134217728


2017-08-21 Akshat Kakkar
There are multiple hosts/clients. They are mainly Windows-based.

Timestamps are not used, as my clients mainly are Windows-based, and on
Windows TCP timestamps are disabled by default.

sysctl is as follows:

kernel.shmmax = 68719476736
kernel.shmall = 4294967296
kernel.pid_max=4194303
vm.max_map_count=131072
kernel.sem=250 32000 32 250

net.netfilter.nf_conntrack_generic_timeout = 300
net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 60
net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 30
net.netfilter.nf_conntrack_tcp_timeout_established = 7200
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 60
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 30
net.netfilter.nf_conntrack_tcp_timeout_last_ack = 30
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 60
net.netfilter.nf_conntrack_tcp_timeout_close = 10
net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 300
net.netfilter.nf_conntrack_tcp_timeout_unacknowledged = 300
net.netfilter.nf_conntrack_udp_timeout = 30
net.netfilter.nf_conntrack_udp_timeout_stream = 180
net.netfilter.nf_conntrack_icmp_timeout = 30
net.netfilter.nf_conntrack_events_retry_timeout = 15
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608

net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_tw_recycle=1
net.ipv4.tcp_fin_timeout=30
net.ipv4.tcp_keepalive_time=1800
net.ipv4.tcp_keepalive_intvl=60
net.ipv4.tcp_keepalive_probes=20
net.ipv4.tcp_max_syn_backlog=4096
net.ipv4.tcp_syncookies=1
net.ipv4.tcp_sack=1
net.ipv4.tcp_dsack=1
net.ipv4.tcp_window_scaling=1
net.ipv4.tcp_syn_retries=3
net.ipv4.tcp_synack_retries=3
net.ipv4.tcp_retries1=3
net.ipv4.tcp_retries2=15
net.ipv4.ip_local_port_range=1024 65535

net.ipv4.tcp_timestamps=0

net.core.netdev_max_backlog=10000
net.core.somaxconn=10
net.core.optmem_max=81920

net.netfilter.nf_conntrack_max=524288
net.nf_conntrack_max=524288
net.ipv6.conf.all.disable_ipv6 = 1
fs.file-max=100

net.ipv4.tcp_no_metrics_save = 1
net.ipv4.tcp_max_syn_backlog = 10240
net.ipv4.tcp_congestion_control=htcp

net.ipv4.tcp_rfc1337 = 1
net.core.netdev_max_backlog = 65536
net.ipv4.tcp_max_tw_buckets = 144

net.core.rmem_max = 134217728
net.core.wmem_max = 134217728




On Mon, Aug 21, 2017 at 11:14 PM, Eric Dumazet  wrote:
> On Mon, 2017-08-21 at 10:44 -0700, Eric Dumazet wrote:
>
>> - Why is timewait not being used?
>>
>
> s/timewait/timestamps/
>
>
>


2017-08-21 Eric Dumazet
On Mon, 2017-08-21 at 10:44 -0700, Eric Dumazet wrote:

> - Why is timewait not being used?
> 

s/timewait/timestamps/





2017-08-21 Eric Dumazet
On Mon, 2017-08-21 at 22:58 +0530, Akshat Kakkar wrote:

> As mentioned in my initial description, the server is not sending a
> SYN-ACK. That's the main symptom. For completeness, it's not
> sending any RST either.
> However, if I disable TCP timestamps, the server starts sending SYN-ACKs.
> The strangest thing is that my client doesn't initiate connections with
> TCP timestamps, so how is disabling TCP timestamps making things
> work?

As I said, maybe the bug was already fixed months ago.

By running an old kernel, you want us to spend time on something that
might already have been fixed.

Only if you run a current kernel _and_ can reproduce the problem
might we take a look.

I suspect your client is a single host?

- Why is timewait not being used?

- What sysctls have been changed on your server?





2017-08-21 Akshat Kakkar
On Monday, August 21, 2017, Eric Dumazet  wrote:
>
> On Mon, 2017-08-21 at 15:26 +0530, Akshat Kakkar wrote:
> > On Mon, Aug 21, 2017 at 3:13 PM, David Laight  
> > wrote:
> > > From: Akshat Kakkar
> > >> Sent: 18 August 2017 10:14
> > >> On Thu, Aug 17, 2017 at 5:06 PM, Eric Dumazet  
> > >> wrote:
> > >> > On Thu, 2017-08-17 at 14:35 +0530, Akshat Kakkar wrote:
> > >> >
> > >> >> I upgraded to 4.4 but am still experiencing the same issue.
> > >> >> Please help.
> > >> >
> > >> > Still too old a kernel, shoot again ;)
> > >> >
> > >> >
> > >>
> > >>
> > >> Sorry, but that's the maximum I can try as of now, as it's the LT version.
> > >
> > > You should be able to build a current kernel and run it with your
> > > existing user space.
> > >
> > > David
> > >
> >
> > The issue is with TCP timestamps. When I disable them, things work
> > fine, but when I enable them the issue reoccurs. However, I do not
> > see TCP timestamps on the packets, even when they are enabled, simply
> > because my client doesn't support them.
> >
> > But the question is: if my client doesn't support timestamps, why
> > does enabling timestamps on the server side create an issue?
>
> Maybe you changed some sysctls wrongly?
>
>

As mentioned in my initial description, the server is not sending a
SYN-ACK. That's the main symptom. For completeness, it's not
sending any RST either.
However, if I disable TCP timestamps, the server starts sending SYN-ACKs.
The strangest thing is that my client doesn't initiate connections with
TCP timestamps, so how is disabling TCP timestamps making things
work?


2017-08-21 Eric Dumazet
On Mon, 2017-08-21 at 15:26 +0530, Akshat Kakkar wrote:
> On Mon, Aug 21, 2017 at 3:13 PM, David Laight  wrote:
> > From: Akshat Kakkar
> >> Sent: 18 August 2017 10:14
> >> On Thu, Aug 17, 2017 at 5:06 PM, Eric Dumazet  
> >> wrote:
> >> > On Thu, 2017-08-17 at 14:35 +0530, Akshat Kakkar wrote:
> >> >
> >> >> I upgraded to 4.4 but am still experiencing the same issue.
> >> >> Please help.
> >> >
> >> > Still too old a kernel, shoot again ;)
> >> >
> >> >
> >>
> >>
> >> Sorry, but that's the maximum I can try as of now, as it's the LT version.
> >
> > You should be able to build a current kernel and run it with your
> > existing user space.
> >
> > David
> >
> 
> The issue is with TCP timestamps. When I disable them, things work
> fine, but when I enable them the issue reoccurs. However, I do not
> see TCP timestamps on the packets, even when they are enabled, simply
> because my client doesn't support them.
> 
> But the question is: if my client doesn't support timestamps, why
> does enabling timestamps on the server side create an issue?

Maybe you changed some sysctls wrongly?




2017-08-21 Neal Cardwell
On Mon, Aug 21, 2017 at 5:56 AM, Akshat Kakkar  wrote:
>
> The issue is with TCP timestamps. When I disable them, things work
> fine, but when I enable them the issue reoccurs. However, I do not
> see TCP timestamps on the packets, even when they are enabled, simply
> because my client doesn't support them.
>
> But the question is: if my client doesn't support timestamps, why
> does enabling timestamps on the server side create an issue?

To help shed light on this, you could try collecting and dumping the
nstat counters when the system is in the mode where it is not
creating/accepting new connections, e.g.:

nstat > /dev/null && sleep 10 && nstat
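Once that output is saved to a file, a few counters are worth pulling out. The sketch assumes nstat's usual "name value rate" columns; counter and report are hypothetical helpers. TcpExtListenOverflows and TcpExtListenDrops count SYNs dropped by a listening socket (for example, because its accept queue is full), and, as far as the pre-4.12 sources show, SYNs rejected by the tcp_tw_recycle per-peer timestamp check are counted in TcpExtPAWSPassive.

```shell
#!/usr/bin/env bash
# counter NAME [FILE]   (hypothetical helper)
# Prints the value of one counter from saved nstat output (stdin if FILE is
# omitted), or 0 if absent: nstat leaves out zero counters unless given -z.
counter() {
    awk -v name="$1" '$1 == name { print $2; found = 1 } END { if (!found) print 0 }' ${2:+"$2"}
}

# report FILE   (hypothetical helper)
# Prints the counters most relevant to SYNs that never get an answer.
report() {
    local c
    for c in TcpExtPAWSPassive TcpExtListenOverflows TcpExtListenDrops; do
        printf '%s %s\n' "$c" "$(counter "$c" "$1")"
    done
}
```

For example: `nstat > /dev/null && sleep 10 && nstat > /tmp/nstat.txt; report /tmp/nstat.txt`.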

The sleep interval would need to be long enough to cover a failing
client connect attempt. It would also be helpful to gather a tcpdump
trace over the interval, to see if the server is sending a RST,
SYN+ACK, or nothing.
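For the capture, a filter that keeps only connection-setup segments makes it easy to see which of the three cases applies. This sketch uses standard pcap-filter syntax (tcp[tcpflags], tcp-syn, tcp-rst); syn_filter is a hypothetical helper, and port 80 in the usage line is just an example.

```shell
#!/usr/bin/env bash
# syn_filter PORT   (hypothetical helper)
# Builds a pcap filter matching only segments with SYN or RST set on PORT:
# client SYNs, server SYN+ACKs and RSTs. Plain data/ACK traffic is excluded.
syn_filter() {
    printf 'tcp port %s and (tcp[tcpflags] & (tcp-syn|tcp-rst) != 0)\n' "$1"
}
```

Run as root alongside the nstat command, for example `tcpdump -ni any -w syn.pcap "$(syn_filter 80)"`; client SYNs that are followed by neither a SYN+ACK nor a RST from the server are the silent drops described in this thread.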

neal


2017-08-21 Akshat Kakkar
On Mon, Aug 21, 2017 at 3:13 PM, David Laight  wrote:
> From: Akshat Kakkar
>> Sent: 18 August 2017 10:14
>> On Thu, Aug 17, 2017 at 5:06 PM, Eric Dumazet  wrote:
>> > On Thu, 2017-08-17 at 14:35 +0530, Akshat Kakkar wrote:
>> >
>> >> I upgraded to 4.4 but am still experiencing the same issue.
>> >> Please help.
>> >
>> > Still too old a kernel, shoot again ;)
>> >
>> >
>>
>>
>> Sorry, but that's the maximum I can try as of now, as it's the LT version.
>
> You should be able to build a current kernel and run it with your
> existing user space.
>
> David
>

The issue is with TCP timestamps. When I disable them, things work
fine, but when I enable them the issue reoccurs. However, I do not
see TCP timestamps on the packets, even when they are enabled, simply
because my client doesn't support them.

But the question is: if my client doesn't support timestamps, why
does enabling timestamps on the server side create an issue?


2017-08-21 David Laight
From: Akshat Kakkar
> Sent: 18 August 2017 10:14
> On Thu, Aug 17, 2017 at 5:06 PM, Eric Dumazet  wrote:
> > On Thu, 2017-08-17 at 14:35 +0530, Akshat Kakkar wrote:
> >
> >> I upgraded to 4.4 but am still experiencing the same issue.
> >> Please help.
> >
> > Still too old a kernel, shoot again ;)
> >
> >
> 
> 
> Sorry, but that's the maximum I can try as of now, as it's the LT version.

You should be able to build a current kernel and run it with your
existing user space.

David



2017-08-18 Eric Dumazet
On Fri, 2017-08-18 at 18:14 +0530, Akshat Kakkar wrote:
> On Fri, Aug 18, 2017 at 5:36 PM, Eric Dumazet  wrote:
> > On Fri, 2017-08-18 at 14:44 +0530, Akshat Kakkar wrote:
> >> On Thu, Aug 17, 2017 at 5:06 PM, Eric Dumazet  
> >> wrote:
> >> > On Thu, 2017-08-17 at 14:35 +0530, Akshat Kakkar wrote:
> >> >
> >> >> I upgraded to 4.4 but am still experiencing the same issue.
> >> >> Please help.
> >> >
> >> > Still too old a kernel, shoot again ;)
> >> >
> >> >
> >>
> >>
> >> Sorry, but that's the maximum I can try as of now, as it's the LT version.
> >>
> >> Besides, this issue was not present in 2.6.32 but appeared with 3.10 and
> >> is still there in 4.4, so I doubt it has much to do with the kernel and/or
> >> kernel parameters, as you guys are good enough not to leave an
> >> issue open for so long (around 3 years).
> >>
> >> So please help.
> >
> > netdev is the developer list.
> >
> > We deal with recent kernels only. Because we already spent time fixing
> > all these issues, we are not going to spend time fixing old kernels.
> >
> > Please ask your distro provider to backport the needed patches.
> >
> >
> >
> I appreciate that.
> Can you recall whether any such issue was fixed after 4.4?

More than one hundred patches, yes.

Sorry, someone other than me will have to build a list of these patches.




2017-08-18 Akshat Kakkar
On Fri, Aug 18, 2017 at 5:36 PM, Eric Dumazet  wrote:
> On Fri, 2017-08-18 at 14:44 +0530, Akshat Kakkar wrote:
>> On Thu, Aug 17, 2017 at 5:06 PM, Eric Dumazet  wrote:
>> > On Thu, 2017-08-17 at 14:35 +0530, Akshat Kakkar wrote:
>> >
>> >> I upgraded to 4.4 but am still experiencing the same issue.
>> >> Please help.
>> >
>> > Still too old a kernel, shoot again ;)
>> >
>> >
>>
>>
>> Sorry, but that's the maximum I can try as of now, as it's the LT version.
>>
>> Besides, this issue was not present in 2.6.32 but appeared with 3.10 and
>> is still there in 4.4, so I doubt it has much to do with the kernel and/or
>> kernel parameters, as you guys are good enough not to leave an
>> issue open for so long (around 3 years).
>>
>> So please help.
>
> netdev is the developer list.
>
> We deal with recent kernels only. Because we already spent time fixing
> all these issues, we are not going to spend time fixing old kernels.
>
> Please ask your distro provider to backport the needed patches.
>
>
>
I appreciate that.
Can you recall whether any such issue was fixed after 4.4?


2017-08-18 Eric Dumazet
On Fri, 2017-08-18 at 14:44 +0530, Akshat Kakkar wrote:
> On Thu, Aug 17, 2017 at 5:06 PM, Eric Dumazet  wrote:
> > On Thu, 2017-08-17 at 14:35 +0530, Akshat Kakkar wrote:
> >
> >> I upgraded to 4.4 but am still experiencing the same issue.
> >> Please help.
> >
> > Still too old a kernel, shoot again ;)
> >
> >
> 
> 
> Sorry, but that's the maximum I can try as of now, as it's the LT version.
> 
> Besides, this issue was not present in 2.6.32 but appeared with 3.10 and
> is still there in 4.4, so I doubt it has much to do with the kernel and/or
> kernel parameters, as you guys are good enough not to leave an
> issue open for so long (around 3 years).
> 
> So please help.

netdev is the developer list.

We deal with recent kernels only. Because we already spent time fixing
all these issues, we are not going to spend time fixing old kernels.

Please ask your distro provider to backport the needed patches.





2017-08-18 Akshat Kakkar
On Thu, Aug 17, 2017 at 5:06 PM, Eric Dumazet  wrote:
> On Thu, 2017-08-17 at 14:35 +0530, Akshat Kakkar wrote:
>
>> I upgraded to 4.4 but am still experiencing the same issue.
>> Please help.
>
> Still too old a kernel, shoot again ;)
>
>


Sorry, but that's the maximum I can try as of now, as it's the LT version.

Besides, this issue was not present in 2.6.32 but appeared with 3.10 and
is still there in 4.4, so I doubt it has much to do with the kernel and/or
kernel parameters, as you guys are good enough not to leave an
issue open for so long (around 3 years).

So please help.


2017-08-17 Eric Dumazet
On Thu, 2017-08-17 at 14:35 +0530, Akshat Kakkar wrote:

> I upgraded to 4.4 but am still experiencing the same issue.
> Please help.

Still too old a kernel, shoot again ;)




2017-08-17 Akshat Kakkar
On Wed, Aug 16, 2017 at 4:04 PM, Eric Dumazet  wrote:
> On Wed, 2017-08-16 at 10:18 +0530, Akshat Kakkar wrote:
>> On Mon, Aug 14, 2017 at 2:37 PM, Akshat Kakkar  wrote:
>> > I have CentOS 7.3 (kernel 3.10) running on a server with 128GB RAM and
>> > 2 x 10-core Xeon processors.
>> > I host a web server on it and have enabled SSH for remote maintenance.
>> > Previously it was running CentOS 6.3.
>> > After upgrading to CentOS 7.3, occasionally (probably when the number of
>> > hits on the server is higher), I am not able to create new connections
>> > (neither web nor SSH). Existing connections keep running
>> > fine.
>> >
>> > I captured packets using tcpdump to check whether it is some
>> > intermediate network issue.
>> > What I found was that the server is not replying to new SYN requests.
>> >
>> > So it's clear that it is not an application issue at all. Also, there
>> > are no entries in the application logs for any dropped connections.
>> >
>> > I checked my firewall rules for any rate limiting that might be imposed.
>> > There is nothing there.
>> >
>> > I checked tc in case some rate limiting was imposed by mistake. There is
>> > nothing there either.
>> >
>> > I have increased noOfFiles to 100 and other sysctl parameters, but
>> > the issue is still there.
>> >
>> > Has anybody experienced the same?
>> >
>> > How should I go about this? Anybody ... Please help!!!
>>
>> It's getting lonely out here. Anybody there???
>
> We won't help you unless you use a recent kernel.
>
> 3.10 misses all the recent improvements in the TCP stack (4 years of hard work).
>
>
>
>
>

I upgraded to 4.4 but am still experiencing the same issue.
Please help.


2017-08-16 Eric Dumazet
On Wed, 2017-08-16 at 10:18 +0530, Akshat Kakkar wrote:
> On Mon, Aug 14, 2017 at 2:37 PM, Akshat Kakkar  wrote:
> > I have CentOS 7.3 (kernel 3.10) running on a server with 128GB RAM and
> > 2 x 10-core Xeon processors.
> > I host a web server on it and have enabled SSH for remote maintenance.
> > Previously it was running CentOS 6.3.
> > After upgrading to CentOS 7.3, occasionally (probably when the number of
> > hits on the server is higher), I am not able to create new connections
> > (neither web nor SSH). Existing connections keep running
> > fine.
> >
> > I captured packets using tcpdump to check whether it is some
> > intermediate network issue.
> > What I found was that the server is not replying to new SYN requests.
> >
> > So it's clear that it is not an application issue at all. Also, there
> > are no entries in the application logs for any dropped connections.
> >
> > I checked my firewall rules for any rate limiting that might be imposed.
> > There is nothing there.
> >
> > I checked tc in case some rate limiting was imposed by mistake. There is
> > nothing there either.
> >
> > I have increased noOfFiles to 100 and other sysctl parameters, but
> > the issue is still there.
> >
> > Has anybody experienced the same?
> >
> > How should I go about this? Anybody ... Please help!!!
> 
> It's getting lonely out here. Anybody there???

We won't help you unless you use a recent kernel.

3.10 misses all the recent improvements in the TCP stack (4 years of hard work).







2017-08-15 Akshat Kakkar
On Mon, Aug 14, 2017 at 2:37 PM, Akshat Kakkar  wrote:
> I have CentOS 7.3 (kernel 3.10) running on a server with 128GB RAM and
> 2 x 10-core Xeon processors.
> I host a web server on it and have enabled SSH for remote maintenance.
> Previously it was running CentOS 6.3.
> After upgrading to CentOS 7.3, occasionally (probably when the number of
> hits on the server is higher), I am not able to create new connections
> (neither web nor SSH). Existing connections keep running
> fine.
>
> I captured packets using tcpdump to check whether it is some
> intermediate network issue.
> What I found was that the server is not replying to new SYN requests.
>
> So it's clear that it is not an application issue at all. Also, there
> are no entries in the application logs for any dropped connections.
>
> I checked my firewall rules for any rate limiting that might be imposed.
> There is nothing there.
>
> I checked tc in case some rate limiting was imposed by mistake. There is
> nothing there either.
>
> I have increased noOfFiles to 100 and other sysctl parameters, but
> the issue is still there.
>
> Has anybody experienced the same?
>
> How should I go about this? Anybody ... Please help!!!

It's getting lonely out here. Anybody there???