https://docs.datastax.com/en/drivers/java/2.2/com/datastax/driver/core/policies/LatencyAwarePolicy.html I had to change the Policy in the Cassandra Driver. I solved this problem few weeks ago. I am just posting the solution for anyone that could hit the same issue. Best, Sergio
On 2019/10/17 02:46:01, Sergio Bilello <lapostadiser...@gmail.com> wrote: > Hello guys! > > I performed a thread dump > https://fastthread.io/my-thread-report.jsp?p=c2hhcmVkLzIwMTkvMTAvMTcvLS1kdW1wLnR4dC0tMC0zMC00MA==& > while try to join the node with > > -Dcassandra.join_ring=false > > OR > -Dcassandra.join.ring=false > > OR > > -Djoin.ring=false > > because the node spiked in load and latency was affecting the clients. > > With or without that flag the node is high in latency and I see the load sky > rocketing when the number of TCP established connections increases > > Analyzing the /var/log/messages I am able to read > > Oct 17 00:23:39 prod-personalization-live-data-cassandra-08 cassandra: INFO > [Service Thread] 2019-10-17 00:23:39,030 GCInspector.java:284 - G1 Young > Generation GC in 255ms. G1 Eden Space: 361758720 -> 0; G1 Old Gen: 1855455944 > -> 1781007048; G1 Survivor Space: 39845888 -> 32505856; > > Oct 17 00:23:40 prod-personalization-live-data-cassandra-08 cassandra: INFO > [ScheduledTasks:1] 2019-10-17 00:23:40,352 NoSpamLogger.java:91 - Some > operations were slow, details available at debug level (debug.log) > > > Oct 17 00:23:03 prod-personalization-live-data-cassandra-08 kernel: TCP: > request_sock_TCP: Possible SYN flooding on port 9042. Sending cookies. Check > SNMP counters. > > I don't see anything on debug.log that looks to be relevant > > The machine is on aws with 4 cpu with 32GB Ram and 1 TB SSD i3.xlarge > > > > > > [sergio.bilello@prod-personalization-live-data-cassandra-08 ~]$ nodetool > tpstats > > Pool Name Active Pending Completed Blocked All time blocked > > ReadStage 32 53 559304 0 0 > > MiscStage 0 0 0 0 0 > > CompactionExecutor 1 107 118 0 0 > > MutationStage 0 0 2695 0 0 > > MemtableReclaimMemory 0 0 11 0 0 > > PendingRangeCalculator 0 0 33 0 0 > > GossipStage 0 0 4314 0 0 > > SecondaryIndexManagement 0 0 0 0 0 > > HintsDispatcher 0 0 0 0 0 > > RequestResponseStage 0 0 421865 0 0 > > Native-Transport-Requests 22 0 1903400 0 0 > > ReadRepairStage 0 0 59078 0 0 > > CounterMutationStage 0 0 0 0 0 > > MigrationStage 0 0 0 0 0 > > MemtablePostFlush 0 0 32 0 0 > > PerDiskMemtableFlushWriter_0 0 0 11 0 0 > > ValidationExecutor 0 0 0 0 0 > > Sampler 0 0 0 0 0 > > MemtableFlushWriter 0 0 11 0 0 > > InternalResponseStage 0 0 0 0 0 > > ViewMutationStage 0 0 0 0 0 > > AntiEntropyStage 0 0 0 0 0 > > CacheCleanupExecutor 0 0 0 0 0 > > > > Message type Dropped > > READ 0 > > RANGE_SLICE 0 > > _TRACE 0 > > HINT 0 > > MUTATION 0 > > COUNTER_MUTATION 0 > > BATCH_STORE 0 > > BATCH_REMOVE 0 > > REQUEST_RESPONSE 0 > > PAGED_RANGE 0 > > READ_REPAIR 0 > > [sergio.bilello@prod-personalization-live-data-cassandra-08 ~]$ > > > > > > top - 01:44:15 up 2 days, 1:45, 4 users, load average: 34.45, 27.71, 15.37 > > Tasks: 140 total, 1 running, 74 sleeping, 0 stopped, 0 zombie > > %Cpu(s): 90.0 us, 4.5 sy, 3.0 ni, 1.1 id, 0.0 wa, 0.0 hi, 1.4 si, 0.0 st > > KiB Mem : 31391772 total, 250504 free, 10880364 used, 20260904 buff/cache > > KiB Swap: 0 total, 0 free, 0 used. 19341960 avail Mem > > > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > > 20712 cassand+ 20 0 194.1g 14.4g 4.6g S 392.0 48.2 74:50.48 java > > 20823 sergio.+ 20 0 124856 6304 3136 S 1.7 0.0 0:13.51 htop > > 7865 root 20 0 1062684 39880 11428 S 0.7 0.1 4:06.02 ir_agent > > 3557 consul 20 0 41568 30192 18832 S 0.3 0.1 13:16.37 consul > > 7600 root 20 0 2082700 46624 11880 S 0.3 0.1 4:14.60 ir_agent > > 1 root 20 0 193660 7740 5220 S 0.0 0.0 0:56.36 systemd > > 2 root 20 0 0 0 0 S 0.0 0.0 0:00.08 kthreadd > > 4 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/0:0H > > 6 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 mm_percpu_wq > > 7 root 20 0 0 0 0 S 0.0 0.0 0:06.04 ksoftirqd/0 > > > > > > [sergio.bilello@prod-personalization-live-data-cassandra-08 ~]$ free > > total used free shared buff/cache available > > Mem: 31391772 10880916 256732 426552 20254124 19341768 > > Swap: 0 0 0 > > [sergio.bilello@prod-personalization-live-data-cassandra-08 ~]$ > > > > > > > > bash-4.2$ java -jar sjk.jar ttop -p 20712 > > Monitoring threads ... > > > > 2019-10-17T01:45:33.352+0000 Process summary > > process cpu=363.58% > > application cpu=261.91% (user=248.65% sys=13.26%) > > other: cpu=101.67% > > thread count: 474 > > heap allocation rate 583mb/s > > [000039] user=13.56% sys=-0.59% alloc= 11mb/s - OptionalTasks:1 > > [000379] user= 8.57% sys=-0.27% alloc= 18mb/s - ReadStage-19 > > [000380] user= 7.85% sys= 0.22% alloc= 19mb/s - Native-Transport-Requests-21 > > [000295] user= 7.14% sys= 0.23% alloc= 14mb/s - Native-Transport-Requests-5 > > [000378] user= 7.14% sys=-0.03% alloc= 22mb/s - Native-Transport-Requests-17 > > [000514] user= 6.42% sys= 0.12% alloc= 20mb/s - Native-Transport-Requests-85 > > [000293] user= 6.66% sys=-0.32% alloc= 12mb/s - Native-Transport-Requests-2 > > [000392] user= 6.19% sys= 0.14% alloc= 9545kb/s - Native-Transport-Requests-12 > > [000492] user= 5.71% sys=-0.24% alloc= 15mb/s - Native-Transport-Requests-24 > > [000294] user= 5.23% sys=-0.25% alloc= 14mb/s - Native-Transport-Requests-3 > > [000381] user= 5.47% sys=-0.52% alloc= 7430kb/s - Native-Transport-Requests-23 > > [000672] user= 4.52% sys= 0.25% alloc= 14mb/s - Native-Transport-Requests-270 > > [000296] user= 5.23% sys=-0.47% alloc= 13mb/s - ReadStage-7 > > [000673] user= 4.52% sys= 0.05% alloc= 13mb/s - Native-Transport-Requests-269 > > [000118] user= 4.28% sys= 0.28% alloc= 13mb/s - Native-Transport-Requests-1 > > [000570] user= 4.28% sys= 0.27% alloc= 13mb/s - ReadStage-31 > > [000671] user= 4.28% sys= 0.25% alloc= 13mb/s - ReadStage-271 > > [000376] user= 4.52% sys=-0.03% alloc= 13mb/s - Native-Transport-Requests-13 > > [000302] user= 4.76% sys=-0.30% alloc= 13mb/s - ReadStage-8 > > [000377] user= 4.28% sys= 0.17% alloc= 13mb/s - ReadStage-15 > > > > > > ^Cbash-4.2$ dstat > > You did not select any stats, using -cdngy by default. > > ----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- > > usr sys idl wai hiq siq| read writ| recv send| in out | int csw > > 33 2 65 0 0 0| 443k 7988k| 0 0 | 0 0 |3255 2324 > > 92 6 1 0 0 2|4732k 0 |1437k 1039k| 0 0 | 20k 24k > > 94 4 1 0 0 1|2588k 104k|1371k 545k| 0 0 | 13k 14k > > 91 6 1 0 0 2|1548k 0 |2045k 949k| 0 0 | 17k 20k > > 94 4 1 0 0 2|1120k 0 |1364k 372k| 0 0 | 12k 11k > > 95 4 0 0 0 1|3616k 0 |2137k 333k| 0 0 | 12k 7875 > > --------------------------------------------------------------------- > To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org > For additional commands, e-mail: user-h...@cassandra.apache.org > > --------------------------------------------------------------------- To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org For additional commands, e-mail: user-h...@cassandra.apache.org