Hi Sergio,

I’m not enough of a network wonk to make definitive statements on network configuration; finding your in-company network expert is going to be a lot more productive. I’ve forgotten whether you are on-prem or in AWS; if you are in AWS and paying for support, replace “your network wonk” with “your AWS support contact”. I will make two more concrete observations though, and you can run these notions down as appropriate.
When C* starts up, see if the logs contain a warning about jemalloc not being detected. That’s something we missed in our 3.11.4 setup, and it is on my to-do list to circle back and evaluate later. JVMs have some rather complicated memory management that relates to efficient allocation of memory to threads (this isn’t strictly a JVM thing, but JVMs definitely care). If you have high connection counts, I can see that mattering to you.

Also, as part of that, the memory arena setting of 4 that is Cassandra’s default may not be the right one for you. The more concurrency you have, the more that number may need to go up to avoid contention on memory allocations. We haven’t played with it because our simultaneous connection counts are modest. Note that Cassandra can create a lot of threads, but many of them have low activity, so I think it’s more about how many are actually active. Large connection counts will move the needle up on you and may motivate tuning the arena count.

When talking to your network person, I’d see what they think about C*’s defaults on TCP_NODELAY vs delayed ACKs. The DataStax docs say that the TCP_NODELAY default setting is false in C*, but I looked in the 3.11.4 source and the default is coded as true. It’s only via the config file samples that bounce around that it typically gets set to false. There are times when Nagle and delayed ACKs don’t play well together and induce stalls. I’m not the person to help you investigate that because it gets a bit gnarly on the details (for example, a refinement to the Nagle algorithm was proposed in the 1990s that exists in some OSes and can make my comments here moot). Somebody who lives this stuff will be a more definitive source, but you are welcome to copy-paste my thoughts to them for context.
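For context on what TCP_NODELAY means at the socket level, here is a minimal Python sketch. It is not Cassandra code and says nothing about how C* applies the setting internally; the helper names `make_client_socket` and `nagle_disabled` are made up for illustration. It only shows that setting the option to 1 turns Nagle's algorithm off for that socket, and that you can read the flag back from the kernel.

```python
import socket


def make_client_socket(no_delay: bool = True) -> socket.socket:
    """Create an unconnected TCP socket with Nagle's algorithm on or off."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY=1 disables Nagle's algorithm, so small writes are sent
    # immediately instead of being held back for coalescing.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1 if no_delay else 0)
    return sock


def nagle_disabled(sock: socket.socket) -> bool:
    """Report whether TCP_NODELAY is currently set on the socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


if __name__ == "__main__":
    for flag in (True, False):
        s = make_client_socket(no_delay=flag)
        print(f"requested no_delay={flag}, kernel reports {nagle_disabled(s)}")
        s.close()
```

Whether Nagle on or off is the better choice for a given workload is exactly the question to take to the network person; the sketch only makes the knob concrete.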
R

From: Sergio <lapostadiser...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Wednesday, October 30, 2019 at 5:56 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Cassandra 3.11.4 Node the load starts to increase after few minutes to 40 on 4 CPU machine

Hi Reid,

I no longer have this load problem. I solved it by changing the Cassandra driver configuration. Now my cluster is pretty stable and I don't have machines with crazy CPU load.

The only thing I still need to investigate, though it is not urgent, is the number of ESTABLISHED TCP connections. I see one node with 7K ESTABLISHED TCP connections while the others have around 4-6K connections open. So the newest nodes added to the cluster have a higher number of ESTABLISHED TCP connections.

default['cassandra']['sysctl'] = {
  'net.ipv4.tcp_keepalive_time' => 60,
  'net.ipv4.tcp_keepalive_probes' => 3,
  'net.ipv4.tcp_keepalive_intvl' => 10,
  'net.core.rmem_max' => 16777216,
  'net.core.wmem_max' => 16777216,
  'net.core.rmem_default' => 16777216,
  'net.core.wmem_default' => 16777216,
  'net.core.optmem_max' => 40960,
  'net.ipv4.tcp_rmem' => '4096 87380 16777216',
  'net.ipv4.tcp_wmem' => '4096 65536 16777216',
  'net.ipv4.ip_local_port_range' => '10000 65535',
  'net.ipv4.tcp_window_scaling' => 1,
  'net.core.netdev_max_backlog' => 2500,
  'net.core.somaxconn' => 65000,
  'vm.max_map_count' => 1048575,
  'vm.swappiness' => 0
}

These are my tweaked values; I used the values recommended by DataStax. Do you have something different?

Best,
Sergio

On Wed, Oct 30, 2019 at 1:27 PM Reid Pinchback <rpinchb...@tripadvisor.com> wrote:

Oh nvm, didn't see the later msg about just posting what your fix was.
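One way to confirm that the sysctl settings quoted above are actually live on a node (newer nodes may have been provisioned differently) is to compare them with what the kernel reports under /proc/sys. The following is a rough Python sketch, assuming a Linux host; the helper names and the subset of keys in `EXPECTED` are illustrative choices, not part of any Cassandra or Chef tooling.

```python
from pathlib import Path

# A few of the values from the settings quoted above; extend as needed.
EXPECTED = {
    "net.ipv4.tcp_keepalive_time": "60",
    "net.core.somaxconn": "65000",
    "net.ipv4.tcp_rmem": "4096 87380 16777216",
    "vm.swappiness": "0",
}


def sysctl_path(key: str, root: str = "/proc/sys") -> Path:
    """Map a dotted sysctl key to its file under /proc/sys."""
    return Path(root, *key.split("."))


def normalize(value: str) -> str:
    """Collapse whitespace (the kernel separates multi-value fields with tabs)."""
    return " ".join(value.split())


def diff_sysctls(expected: dict, root: str = "/proc/sys") -> dict:
    """Return {key: (expected, actual)} for every setting that differs.

    actual is None when the file is missing or unreadable.
    """
    mismatches = {}
    for key, want in expected.items():
        try:
            have = normalize(sysctl_path(key, root).read_text())
        except OSError:
            have = None
        if have != normalize(want):
            mismatches[key] = (normalize(want), have)
    return mismatches


if __name__ == "__main__":
    for key, (want, have) in diff_sysctls(EXPECTED).items():
        print(f"{key}: expected {want!r}, found {have!r}")
```

Running it on each node and comparing the output would show whether the nodes with higher connection counts differ in kernel settings from the rest.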
R

On 10/30/19, 4:24 PM, "Reid Pinchback" <rpinchb...@tripadvisor.com> wrote:

Hi Sergio,

Assuming nobody is actually mounting a SYN flood attack, this sounds like you're either being hammered with connection requests in very short periods of time, or your TCP backlog tuning is off. At least, that's where I'd start looking. If you take that log message and google it ("Possible SYN flooding... Sending cookies") you'll find explanations. Or just google "TCP backlog tuning".

R

On 10/30/19, 3:29 PM, "Sergio Bilello" <lapostadiser...@gmail.com> wrote:

>> Oct 17 00:23:03 prod-personalization-live-data-cassandra-08 kernel: TCP: request_sock_TCP: Possible SYN flooding on port 9042. Sending cookies. Check SNMP counters.
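The kernel message says to check SNMP counters. On Linux, the TCP extension counters are exposed in /proc/net/netstat as pairs of lines: a row of names followed by a row of values. Below is a small Python sketch, assuming that format, which pulls out the counters most relevant to listen-queue pressure (SyncookiesSent, ListenOverflows, ListenDrops). The function names are hypothetical.

```python
from pathlib import Path


def parse_netstat(text: str) -> dict:
    """Parse /proc/net/netstat-style text into {section: {counter: value}}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    sections = {}
    # Lines come in pairs: a header row of counter names, then their values.
    for header, values in zip(lines[0::2], lines[1::2]):
        name, _, names = header.partition(":")
        value_name, _, numbers = values.partition(":")
        if name != value_name:
            continue  # malformed pair; skip it
        sections[name] = dict(zip(names.split(), (int(n) for n in numbers.split())))
    return sections


def listen_pressure(text: str) -> dict:
    """Pick the counters that hint at an overflowing accept/SYN backlog."""
    tcp_ext = parse_netstat(text).get("TcpExt", {})
    wanted = ("SyncookiesSent", "ListenOverflows", "ListenDrops")
    return {key: tcp_ext.get(key) for key in wanted}


if __name__ == "__main__":
    source = Path("/proc/net/netstat")
    if source.exists():
        print(listen_pressure(source.read_text()))
    else:
        print("no /proc/net/netstat on this system")
```

The counters are cumulative since boot, so sampling them twice a few seconds apart and looking at the difference says more than a single reading.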