Hi,

After changing the concurrent compactor setting (concurrent_compactors: 1), we can limit Cassandra to 100% of one core during these periods.
I also captured the stack of the "crazy" thread. It stays on the same stack for 2~3 minutes. Any clue about this issue?

Thread 18114: (state = IN_JAVA)
 - java.util.AbstractList$Itr.hasNext() @bci=8, line=339 (Compiled frame; information may be imprecise)
 - org.apache.cassandra.db.ColumnFamilyStore.removeDeletedStandard(org.apache.cassandra.db.ColumnFamily, int) @bci=6, line=841 (Compiled frame)
 - org.apache.cassandra.db.ColumnFamilyStore.removeDeletedColumnsOnly(org.apache.cassandra.db.ColumnFamily, int) @bci=17, line=835 (Compiled frame)
 - org.apache.cassandra.db.ColumnFamilyStore.removeDeleted(org.apache.cassandra.db.ColumnFamily, int) @bci=8, line=826 (Compiled frame)
 - org.apache.cassandra.db.compaction.PrecompactedRow.removeDeletedAndOldShards(org.apache.cassandra.db.DecoratedKey, org.apache.cassandra.db.compaction.CompactionController, org.apache.cassandra.db.ColumnFamily) @bci=38, line=77 (Compiled frame)
 - org.apache.cassandra.db.compaction.PrecompactedRow.<init>(org.apache.cassandra.db.compaction.CompactionController, java.util.List) @bci=33, line=102 (Compiled frame)
 - org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(java.util.List) @bci=223, line=133 (Compiled frame)
 - org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced() @bci=44, line=102 (Compiled frame)
 - org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced() @bci=1, line=87 (Compiled frame)
 - org.apache.cassandra.utils.MergeIterator$ManyToOne.consume() @bci=88, line=116 (Compiled frame)
 - org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext() @bci=5, line=99 (Compiled frame)
 - com.google.common.collect.AbstractIterator.tryToComputeNext() @bci=9, line=140 (Compiled frame)
 - com.google.common.collect.AbstractIterator.hasNext() @bci=61, line=135 (Compiled frame)
 - com.google.common.collect.Iterators$7.computeNext() @bci=4, line=614 (Compiled frame)
 - com.google.common.collect.AbstractIterator.tryToComputeNext() @bci=9, line=140 (Compiled frame)
 - com.google.common.collect.AbstractIterator.hasNext() @bci=61, line=135 (Compiled frame)
 - org.apache.cassandra.db.compaction.CompactionTask.execute(org.apache.cassandra.db.compaction.CompactionManager$CompactionExecutorStatsCollector) @bci=542, line=141 (Compiled frame)
 - org.apache.cassandra.db.compaction.CompactionManager$1.call() @bci=117, line=134 (Interpreted frame)
 - org.apache.cassandra.db.compaction.CompactionManager$1.call() @bci=1, line=114 (Interpreted frame)
 - java.util.concurrent.FutureTask$Sync.innerRun() @bci=30, line=303 (Interpreted frame)
 - java.util.concurrent.FutureTask.run() @bci=4, line=138 (Interpreted frame)
 - java.util.concurrent.ThreadPoolExecutor$Worker.runTask(java.lang.Runnable) @bci=59, line=886 (Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=28, line=908 (Compiled frame)
 - java.lang.Thread.run() @bci=11, line=662 (Interpreted frame)

BRs
//Jason

2012/7/11 Jason Tang <ares.t...@gmail.com>

> Hi
>
> I have run into a high CPU problem with Cassandra 1.0.3. It happens with
> both size-tiered and leveled compaction (6G heap, 64-bit Oracle Java).
> Under normal traffic, Cassandra uses 15% CPU.
>
> But every half hour, Cassandra uses almost 100% of total CPU (SUSE,
> 12 cores).
>
> Here is the top output at that moment.
>
> #top -H -p 12451
>
> top - 12:30:14 up 15 days, 12:49, 6 users, load average: 10.52, 8.92, 8.14
> Tasks: 706 total, 21 running, 685 sleeping, 0 stopped, 0 zombie
> Cpu(s): 25.7%us, 14.0%sy, 48.9%ni, 6.5%id, 0.0%wa, 0.0%hi, 4.9%si, 0.0%st
> Mem: 24150M total, 12218M used, 11932M free, 142M buffers
> Swap: 0M total, 0M used, 0M free, 3714M cached
>
>   PID USER   PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
> 20291 casadm 24  4 8003m 5.4g 167m R   92 22.7 0:42.46 java
> 20276 casadm 24  4 8003m 5.4g 167m R   88 22.7 0:43.88 java
> 20181 casadm 24  4 8003m 5.4g 167m R   86 22.7 0:52.97 java
> 20213 casadm 24  4 8003m 5.4g 167m R   85 22.7 0:49.21 java
> 20188 casadm 24  4 8003m 5.4g 167m R   82 22.7 0:54.34 java
> 20268 casadm 24  4 8003m 5.4g 167m R   81 22.7 0:46.25 java
> 20269 casadm 24  4 8003m 5.4g 167m R   41 22.7 0:15.11 java
> 20316 casadm 24  4 8003m 5.4g 167m S   20 22.7 0:02.35 java
> 20191 casadm 24  4 8003m 5.4g 167m R   15 22.7 0:16.85 java
> 12500 casadm 20  0 8003m 5.4g 167m R    6 22.7 1:07.86 java
> 15245 casadm 20  0 8003m 5.4g 167m D    5 22.7 0:36.45 java
>
> jstack cannot print the stack:
>
> Thread 20291: (state = IN_JAVA)
> Error occurred during stack walking:
> ...
> Thread 20276: (state = IN_JAVA)
> Error occurred during stack walking:
>
> After it comes back, the stack shows:
>
> Thread 20291: (state = BLOCKED)
>  - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
>  - java.util.concurrent.locks.LockSupport.parkNanos(java.lang.Object, long) @bci=20, line=196 (Compiled frame)
>  - java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(java.util.concurrent.SynchronousQueue$TransferStack$SNode, boolean, long) @bci=174, line=424 (Compiled frame)
>  - java.util.concurrent.SynchronousQueue$TransferStack.transfer(java.lang.Object, boolean, long) @bci=102, line=323 (Compiled frame)
>  - java.util.concurrent.SynchronousQueue.poll(long, java.util.concurrent.TimeUnit) @bci=11, line=874 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor.getTask() @bci=62, line=945 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=18, line=907 (Compiled frame)
>  - java.lang.Thread.run() @bci=11, line=662 (Interpreted frame)
>
> And after this happened, the data is not correct: some large columns
> which were supposed to be deleted come back again.
>
> Here is the suspect thread when it uses up 100%:
>
> Thread 20191: (state = IN_VM)
>  - sun.misc.Unsafe.unpark(java.lang.Object) @bci=0 (Compiled frame; information may be imprecise)
>  - java.util.concurrent.locks.LockSupport.unpark(java.lang.Thread) @bci=8, line=122 (Compiled frame)
>  - java.util.concurrent.SynchronousQueue$TransferStack$SNode.tryMatch(java.util.concurrent.SynchronousQueue$TransferStack$SNode) @bci=34, line=242 (Compiled frame)
>  - java.util.concurrent.SynchronousQueue$TransferStack.transfer(java.lang.Object, boolean, long) @bci=268, line=344 (Compiled frame)
>  - java.util.concurrent.SynchronousQueue.offer(java.lang.Object) @bci=19, line=846 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor.execute(java.lang.Runnable) @bci=43, line=653 (Compiled frame)
>  - java.util.concurrent.AbstractExecutorService.submit(java.util.concurrent.Callable) @bci=20, line=92 (Compiled frame)
>  - org.apache.cassandra.db.compaction.ParallelCompactionIterable$Reducer.getCompactedRow(java.util.List) @bci=86, line=190 (Compiled frame)
>  - org.apache.cassandra.db.compaction.ParallelCompactionIterable$Reducer.getReduced() @bci=31, line=164 (Compiled frame)
>  - org.apache.cassandra.db.compaction.ParallelCompactionIterable$Reducer.getReduced() @bci=1, line=144 (Compiled frame)
>  - org.apache.cassandra.utils.MergeIterator$ManyToOne.consume() @bci=88, line=116 (Compiled frame)
>  - org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext() @bci=5, line=99 (Compiled frame)
>  - com.google.common.collect.AbstractIterator.tryToComputeNext() @bci=9, line=140 (Compiled frame)
>  - com.google.common.collect.AbstractIterator.hasNext() @bci=61, line=135 (Compiled frame)
>  - org.apache.cassandra.db.compaction.ParallelCompactionIterable$Unwrapper.computeNext() @bci=4, line=103 (Compiled frame)
>  - org.apache.cassandra.db.compaction.ParallelCompactionIterable$Unwrapper.computeNext() @bci=1, line=90 (Compiled frame)
>  - com.google.common.collect.AbstractIterator.tryToComputeNext() @bci=9, line=140 (Compiled frame)
>  - com.google.common.collect.AbstractIterator.hasNext() @bci=61, line=135 (Compiled frame)
>  - com.google.common.collect.Iterators$7.computeNext() @bci=4, line=614 (Compiled frame)
>  - com.google.common.collect.AbstractIterator.tryToComputeNext() @bci=9, line=140 (Compiled frame)
>  - com.google.common.collect.AbstractIterator.hasNext() @bci=61, line=135 (Compiled frame)
>  - org.apache.cassandra.db.compaction.CompactionTask.execute(org.apache.cassandra.db.compaction.CompactionManager$CompactionExecutorStatsCollector) @bci=772, line=172 (Compiled frame)
>  - org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(org.apache.cassandra.db.compaction.CompactionManager$CompactionExecutorStatsCollector) @bci=2, line=57 (Interpreted frame)
>  - org.apache.cassandra.db.compaction.CompactionManager$1.call() @bci=117, line=134 (Interpreted frame)
>  - org.apache.cassandra.db.compaction.CompactionManager$1.call() @bci=1, line=114 (Interpreted frame)
>  - java.util.concurrent.FutureTask$Sync.innerRun() @bci=30, line=303 (Compiled frame)
>  - java.util.concurrent.FutureTask.run() @bci=4, line=138 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor$Worker.runTask(java.lang.Runnable) @bci=59, line=886 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=28, line=908 (Compiled frame)
>  - java.lang.Thread.run() @bci=11, line=662 (Interpreted frame)
>
> Thread 20269: (state = BLOCKED)
>  - org.apache.cassandra.utils.obs.OpenBitSet.<init>(long, boolean) @bci=51, line=104 (Compiled frame)
>  - org.apache.cassandra.utils.obs.OpenBitSet.<init>(long) @bci=3, line=92 (Compiled frame)
>  - org.apache.cassandra.utils.BloomFilter.bucketsFor(long, int) @bci=12, line=54 (Compiled frame)
>  - org.apache.cassandra.utils.BloomFilter.getFilter(long, int) @bci=110, line=73 (Compiled frame)
>  - org.apache.cassandra.db.ColumnIndexer.serialize(org.apache.cassandra.io.util.IIterableColumns) @bci=10, line=83 (Compiled frame)
>  - org.apache.cassandra.db.ColumnIndexer.serialize(org.apache.cassandra.io.util.IIterableColumns, java.io.DataOutput) @bci=5, line=51 (Compiled frame)
>  - org.apache.cassandra.db.compaction.PrecompactedRow.write(java.io.DataOutput) @bci=42, line=140 (Compiled frame)
>  - org.apache.cassandra.io.sstable.SSTableWriter.append(org.apache.cassandra.db.compaction.AbstractCompactedRow) @bci=43, line=160 (Compiled frame)
>  - org.apache.cassandra.db.compaction.CompactionTask.execute(org.apache.cassandra.db.compaction.CompactionManager$CompactionExecutorStatsCollector) @bci=685, line=158 (Compiled frame)
>  - org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(org.apache.cassandra.db.compaction.CompactionManager$CompactionExecutorStatsCollector) @bci=2, line=57 (Interpreted frame)
>  - org.apache.cassandra.db.compaction.CompactionManager$1.call() @bci=117, line=134 (Interpreted frame)
>  - org.apache.cassandra.db.compaction.CompactionManager$1.call() @bci=1, line=114 (Interpreted frame)
>  - java.util.concurrent.FutureTask$Sync.innerRun() @bci=30, line=303 (Compiled frame)
>  - java.util.concurrent.FutureTask.run() @bci=4, line=138 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor$Worker.runTask(java.lang.Runnable) @bci=59, line=886 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=28, line=908 (Compiled frame)
>  - java.lang.Thread.run() @bci=11, line=662 (Interpreted frame)
>
> BRs
> //Tang Weiqiang
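
A note for anyone matching the top -H listing against thread dumps: the PID column of top -H holds per-thread IDs in decimal, and the "Thread 20291: ..." headers in the dumps above use the same decimal IDs, whereas a regular jstack thread dump prints them in hex as nid=0x.... Below is a minimal Python sketch of that lookup. The function name hot_threads, the 50% threshold, and the trimmed TOP_SAMPLE are only illustrative assumptions; the rows themselves are copied from the listing in this thread.

```python
# Rows copied from the top -H listing in this thread (trimmed to four).
TOP_SAMPLE = """\
  PID USER   PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
20291 casadm 24  4 8003m 5.4g 167m R   92 22.7 0:42.46 java
20276 casadm 24  4 8003m 5.4g 167m R   88 22.7 0:43.88 java
20269 casadm 24  4 8003m 5.4g 167m R   41 22.7 0:15.11 java
20191 casadm 24  4 8003m 5.4g 167m R   15 22.7 0:16.85 java
"""


def hot_threads(top_output, min_cpu=50.0):
    """Return (tid, cpu, nid_hex) for threads at or above min_cpu, hottest first.

    Hypothetical helper: assumes the 12-column layout shown above, where
    column 0 is the thread ID and column 8 is %CPU.
    """
    rows = []
    for line in top_output.splitlines():
        fields = line.split()
        # Data rows start with a numeric thread ID; this skips the header.
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        tid = int(fields[0])
        cpu = float(fields[8])
        if cpu >= min_cpu:
            # hex(tid) is the value to search for as nid=... in a plain jstack dump.
            rows.append((tid, cpu, hex(tid)))
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for tid, cpu, nid in hot_threads(TOP_SAMPLE):
        print(f"thread {tid} ({cpu:.0f}% CPU) -> nid={nid}")
```

The threshold is arbitrary; lowering min_cpu to 0 lists every thread in the sample.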