I have disabled autocompaction and stopped the running compactions on the high-load node. The freezes hit all nodes sequentially, 2-6 at a time.
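For reference, disabling and stopping compactions can be done with nodetool roughly like this (the keyspace/table names below are placeholders, not our real schema):

$ nodetool disableautocompaction my_keyspace big_table   # stop scheduling new compactions for the table
$ nodetool stop COMPACTION                               # abort compactions that are already running
$ nodetool compactionstats                               # verify nothing is still compacting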
Heap is 8 GB. gc_grace is 86400. All SSTables are about 200-300 MB.

$ nodetool compactionstats
pending tasks: 14

$ dstat -lvnr 10
---load-avg--- ---procs--- ------memory-usage----- ---paging-- -dsk/total- ---system-- ----total-cpu-usage---- -net/total- --io/total-
 1m   5m  15m |run blk new| used  buff  cach  free|  in   out | read  writ| int   csw |usr sys idl wai hiq siq| recv  send| read  writ
29.4 28.6 23.5|0.0   0 1.2|11.3G  190M 17.6G  407M|   0     0 |7507k 7330k|  13k   40k| 11   1  88   0   0   0|   0     0 |96.5  64.6
29.3 28.6 23.5| 29   0 0.9|11.3G  190M 17.6G  408M|   0     0 |   0   189k|9822  2319 | 99   0   0   0   0   0| 138k  120k|   0  4.30
29.4 28.6 23.6| 30   0 2.0|11.3G  190M 17.6G  408M|   0     0 |   0    26k|8689  2189 |100   0   0   0   0   0| 139k  120k|   0  2.70
29.4 28.7 23.6| 29   0 3.0|11.3G  190M 17.6G  408M|   0     0 |   0    20k|8722  1846 | 99   0   0   0   0   0| 136k  120k|   0  1.50
^C

JvmTop 0.8.0 alpha - 15:20:37, amd64, 16 cpus, Linux 3.14.44-3, load avg 28.09
http://code.google.com/p/jvmtop

PID 32505: org.apache.cassandra.service.CassandraDaemon
ARGS:
VMARGS: -ea -javaagent:/usr/share/cassandra/lib/jamm-0.3.0.jar -XX:+CMSCl[...]
VM: Oracle Corporation Java HotSpot(TM) 64-Bit Server VM 1.8.0_65
UP:  8:31m  #THR: 334  #THRPEAK: 437  #THRCREATED: 4694  USER: cassandra
GC-Time:  0: 8m  #GC-Runs: 6378  #TotalLoadedClasses: 5926
CPU: 97.96%  GC: 0.00%  HEAP: 6049m / 7540m  NONHEAP: 82m / n/a

  TID  NAME                      STATE     CPU     TOTALCPU  BLOCKEDBY
  447  SharedPool-Worker-45      RUNNABLE  60.47%  1.03%
  343  SharedPool-Worker-2       RUNNABLE  56.46%  3.07%
  349  SharedPool-Worker-8       RUNNABLE  56.43%  1.61%
  456  SharedPool-Worker-25      RUNNABLE  55.25%  1.06%
  483  SharedPool-Worker-40      RUNNABLE  53.06%  1.04%
  475  SharedPool-Worker-53      RUNNABLE  52.31%  1.03%
  464  SharedPool-Worker-20      RUNNABLE  52.00%  1.11%
  577  SharedPool-Worker-71      RUNNABLE  51.73%  1.02%
  404  SharedPool-Worker-10      RUNNABLE  51.10%  1.29%
  486  SharedPool-Worker-34      RUNNABLE  51.06%  1.03%
  Note: Only top 10 threads (according cpu load) are shown!

> On 12 Feb 2016, at 18:14, Julien Anguenot <jul...@anguenot.org> wrote:
>
> At the time when the load is high and you have to restart, do you see any
> pending compactions when using `nodetool compactionstats`?
>
> Possible to see a `nodetool compactionstats` taken *when* the load is too
> high? Have you checked the size of your SSTables for that big table? Any
> large ones in there? What about the Java HEAP configuration on these nodes?
>
> If you have too many tombstones I would try to decrease gc_grace_seconds so
> they get cleared out earlier during compactions.
>
> J.
>
>> On Feb 12, 2016, at 8:45 AM, Skvazh Roman <r...@skvazh.com> wrote:
>>
>> There are 1-4 compactions running at that moment.
>> We have many tombstones which are not getting removed.
>> DroppableTombstoneRatio is 5-6 (greater than 1).
>>
>>> On 12 Feb 2016, at 15:53, Julien Anguenot <jul...@anguenot.org> wrote:
>>>
>>> Hey,
>>>
>>> What about compactions count when that is happening?
>>>
>>> J.
>>>
>>>
>>>> On Feb 12, 2016, at 3:06 AM, Skvazh Roman <r...@skvazh.com> wrote:
>>>>
>>>> Hello!
>>>> We have a cluster of 25 c3.4xlarge nodes (16 cores, 32 GiB) with an attached
>>>> 1.5 TB 4000 PIOPS EBS drive each.
>>>> Sometimes user CPU on one or two nodes spikes to 100% and load average rises
>>>> to 20-30, and read requests drop off.
>>>> Only a restart of the Cassandra service helps.
>>>> Please advise.
>>>>
>>>> One big table with wide rows. 600 GB per node.
>>>> LZ4Compressor
>>>> LeveledCompaction
>>>>
>>>> concurrent compactors: 4
>>>> compactor throughput: tried from 16 to 128
>>>> Concurrent_readers: from 16 to 32
>>>> Concurrent_writers: 128
>>>>
>>>>
>>>> https://gist.github.com/rskvazh/de916327779b98a437a6
>>>>
>>>>
>>>> JvmTop 0.8.0 alpha - 06:51:10, amd64, 16 cpus, Linux 3.14.44-3, load avg 19.35
>>>> http://code.google.com/p/jvmtop
>>>>
>>>> Profiling PID 9256: org.apache.cassandra.service.CassandraDa
>>>>
>>>>  95.73% ( 4.31s) ....google.common.collect.AbstractIterator.tryToComputeN()
>>>>   1.39% ( 0.06s) com.google.common.base.Objects.hashCode()
>>>>   1.26% ( 0.06s) io.netty.channel.epoll.Native.epollWait()
>>>>   0.85% ( 0.04s) net.jpountz.lz4.LZ4JNI.LZ4_compress_limitedOutput()
>>>>   0.46% ( 0.02s) net.jpountz.lz4.LZ4JNI.LZ4_decompress_fast()
>>>>   0.26% ( 0.01s) com.google.common.collect.Iterators$7.computeNext()
>>>>   0.06% ( 0.00s) io.netty.channel.epoll.Native.eventFdWrite()
>>>>
>>>>
>>>> ttop:
>>>>
>>>> 2016-02-12T08:20:25.605+0000 Process summary
>>>>   process cpu=1565.15%
>>>>   application cpu=1314.48% (user=1354.48% sys=-40.00%)
>>>>   other: cpu=250.67%
>>>>   heap allocation rate 146mb/s
>>>> [000405] user=76.25% sys=-0.54% alloc=      0b/s - SharedPool-Worker-9
>>>> [000457] user=75.54% sys=-1.26% alloc=      0b/s - SharedPool-Worker-14
>>>> [000451] user=73.52% sys= 0.29% alloc=      0b/s - SharedPool-Worker-16
>>>> [000311] user=76.45% sys=-2.99% alloc=      0b/s - SharedPool-Worker-4
>>>> [000389] user=70.69% sys= 2.62% alloc=      0b/s - SharedPool-Worker-6
>>>> [000388] user=86.95% sys=-14.28% alloc=     0b/s - SharedPool-Worker-5
>>>> [000404] user=70.69% sys= 0.10% alloc=      0b/s - SharedPool-Worker-8
>>>> [000390] user=72.61% sys=-1.82% alloc=      0b/s - SharedPool-Worker-7
>>>> [000255] user=87.86% sys=-17.87% alloc=     0b/s - SharedPool-Worker-1
>>>> [000444] user=72.21% sys=-2.30% alloc=      0b/s - SharedPool-Worker-12
>>>> [000310] user=71.50% sys=-2.31% alloc=      0b/s - SharedPool-Worker-3
>>>> [000445] user=69.68% sys=-0.83% alloc=      0b/s - SharedPool-Worker-13
>>>> [000406] user=72.61% sys=-4.40% alloc=      0b/s - SharedPool-Worker-10
>>>> [000446] user=69.78% sys=-1.65% alloc=      0b/s - SharedPool-Worker-11
>>>> [000452] user=66.86% sys= 0.22% alloc=      0b/s - SharedPool-Worker-15
>>>> [000256] user=69.08% sys=-2.42% alloc=      0b/s - SharedPool-Worker-2
>>>> [004496] user=29.99% sys= 0.59% alloc=   30mb/s - CompactionExecutor:15
>>>> [004906] user=29.49% sys= 0.74% alloc=   39mb/s - CompactionExecutor:16
>>>> [010143] user=28.58% sys= 0.25% alloc=   26mb/s - CompactionExecutor:17
>>>> [000785] user=27.87% sys= 0.70% alloc=   38mb/s - CompactionExecutor:12
>>>> [012723] user= 9.09% sys= 2.46% alloc= 2977kb/s - RMI TCP Connection(2673)-127.0.0.1
>>>> [000555] user= 5.35% sys=-0.08% alloc=  474kb/s - SharedPool-Worker-24
>>>> [000560] user= 3.94% sys= 0.07% alloc=  434kb/s - SharedPool-Worker-22
>>>> [000557] user= 3.94% sys=-0.17% alloc=  339kb/s - SharedPool-Worker-25
>>>> [000447] user= 2.73% sys= 0.60% alloc=  436kb/s - SharedPool-Worker-19
>>>> [000563] user= 3.33% sys=-0.04% alloc=  460kb/s - SharedPool-Worker-20
>>>> [000448] user= 2.73% sys= 0.27% alloc=  414kb/s - SharedPool-Worker-21
>>>> [000554] user= 1.72% sys= 0.70% alloc=  232kb/s - SharedPool-Worker-26
>>>> [000558] user= 1.41% sys= 0.39% alloc=  213kb/s - SharedPool-Worker-23
>>>> [000450] user= 1.41% sys=-0.03% alloc=  158kb/s - SharedPool-Worker-17
>>>
>
>
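A side note for anyone reading later: lowering gc_grace_seconds, as Julien suggests, is a per-table CQL change, and the droppable-tombstone estimate can be checked with sstablemetadata. A rough sketch, with placeholder keyspace/table/path names (and keep gc_grace_seconds longer than your repair interval, otherwise deleted data can be resurrected):

cqlsh> ALTER TABLE my_keyspace.big_table WITH gc_grace_seconds = 3600;

$ sstablemetadata /var/lib/cassandra/data/my_keyspace/big_table-*/*-Data.db | grep -i droppable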