Anything in the logs? It *could* be https://issues.apache.org/jira/browse/CASSANDRA-13873
On Tue, Oct 24, 2017 at 11:18 PM, Sotirios Delimanolis < sotodel...@yahoo.com.invalid> wrote: > On a Cassandra 2.2.11 cluster, I noticed estimated compactions > accumulating on one node. nodetool compactionstats showed the following: > > compaction type keyspace table completed > total unit progress > Compaction ks1 some_table 204.68 MB > 204.98 MB bytes 99.86% > Index summary redistribution *null* *null* 457.72 > KB 950 MB bytes *0.05%* > Compaction ks1 some_table 461.61 MB > 461.95 MB bytes 99.93% > Tombstone Compaction ks1 some_table 618.34 MB > 618.47 MB bytes 99.98% > Compaction ks1 some_table 378.37 MB > 380 MB bytes 99.57% > Tombstone Compaction ks1 some_table 326.51 MB > 327.63 MB bytes 99.66% > Tombstone Compaction ks2 other_table 29.38 MB > 29.38 MB bytes 100.00% > Tombstone Compaction ks1 some_table 503.4 MB > 507.28 MB bytes 99.24% > Compaction ks1 some_table 353.44 MB > 353.47 MB bytes 99.99% > > > They had been like this for a while (all different tables). A thread dump > showed all 8 CompactionExecutor threads looking like > > "CompactionExecutor:6" #84 daemon prio=1 os_prio=4 tid=0x00007f5771172000 > nid=0x7646 waiting on condition [0x00007f578847b000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x00000005fe5656e8> (a > com.google.common.util.concurrent.AbstractFuture$Sync) > at java.util.concurrent.locks.LockSupport.park(LockSupport. > java:175) > at java.util.concurrent.locks.AbstractQueuedSynchronizer. > parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) > at java.util.concurrent.locks.AbstractQueuedSynchronizer. > doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997) > at java.util.concurrent.locks.AbstractQueuedSynchronizer. > acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) > at com.google.common.util.concurrent.AbstractFuture$ > Sync.get(AbstractFuture.java:285) > at com.google.common.util.concurrent.AbstractFuture.get( > AbstractFuture.java:116) > at org.apache.cassandra.utils.FBUtilities.waitOnFuture( > FBUtilities.java:390) > at org.apache.cassandra.db.SystemKeyspace.forceBlockingFlush( > SystemKeyspace.java:593) > at org.apache.cassandra.db.SystemKeyspace.finishCompaction( > SystemKeyspace.java:368) > at org.apache.cassandra.db.compaction.CompactionTask. > runMayThrow(CompactionTask.java:205) > at org.apache.cassandra.utils.WrappedRunnable.run( > WrappedRunnable.java:28) > at org.apache.cassandra.db.compaction.CompactionTask. > executeInternal(CompactionTask.java:74) > at org.apache.cassandra.db.compaction.AbstractCompactionTask. > execute(AbstractCompactionTask.java:80) > at org.apache.cassandra.db.compaction.CompactionManager$ > BackgroundCompactionCandidate.run(CompactionManager.java:257) > at java.util.concurrent.Executors$RunnableAdapter. > call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.ThreadPoolExecutor.runWorker( > ThreadPoolExecutor.java:1142) > at java.util.concurrent.ThreadPoolExecutor$Worker.run( > ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > > A MemtablePostFlush thread was awaiting some flush count down latch > > "MemtablePostFlush:1" #30 daemon prio=5 os_prio=0 tid=0x00007f57705dac00 > nid=0x75bf waiting on condition [0x00007f578a8fb000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x0000000573da6c90> (a > java.util.concurrent.CountDownLatch$Sync) > at java.util.concurrent.locks.LockSupport.park(LockSupport. > java:175) > at java.util.concurrent.locks.AbstractQueuedSynchronizer. > parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) > at java.util.concurrent.locks.AbstractQueuedSynchronizer. > doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997) > at java.util.concurrent.locks.AbstractQueuedSynchronizer. > acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) > at java.util.concurrent.CountDownLatch.await( > CountDownLatch.java:231) > at org.apache.cassandra.db.ColumnFamilyStore$PostFlush. > call(ColumnFamilyStore.java:1073) > at org.apache.cassandra.db.ColumnFamilyStore$PostFlush. > call(ColumnFamilyStore.java:1026) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.ThreadPoolExecutor.runWorker( > ThreadPoolExecutor.java:1142) > at java.util.concurrent.ThreadPoolExecutor$Worker.run( > ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > > > The 4 MemtableFlushWriter threads were all RUNNABLE, sorting something in > IntervalTree. Finally, the IndexSummaryManager thread was also RUNNABLE: > > "IndexSummaryManager:1" #1463 daemon prio=1 os_prio=4 > tid=0x00007f577139b000 nid=0x8100 runnable [0x00007f5726f6c000] > java.lang.Thread.State: RUNNABLE > at com.google.common.collect.ImmutableSet.construct( > ImmutableSet.java:206) > at com.google.common.collect.ImmutableSet.copyOf( > ImmutableSet.java:375) > at org.apache.cassandra.db.lifecycle.Helpers.replace( > Helpers.java:43) > at org.apache.cassandra.db.lifecycle.View$2.apply(View.java:166) > at org.apache.cassandra.db.lifecycle.View$2.apply(View.java:161) > at org.apache.cassandra.db.lifecycle.Tracker.apply( > Tracker.java:138) > at org.apache.cassandra.db.lifecycle.Tracker.apply( > Tracker.java:111) > at org.apache.cassandra.db.lifecycle.Tracker.apply( > Tracker.java:118) > at org.apache.cassandra.db.lifecycle.LifecycleTransaction. > unmarkCompacting(LifecycleTransaction.java:445) > at org.apache.cassandra.db.lifecycle.LifecycleTransaction.cancel( > LifecycleTransaction.java:400) > at org.apache.cassandra.io.sstable.IndexSummaryRedistribution. > adjustSamplingLevels(IndexSummaryRedistribution.java:230) > at org.apache.cassandra.io.sstable.IndexSummaryRedistribution. > redistributeSummaries(IndexSummaryRedistribution.java:126) > at org.apache.cassandra.db.compaction.CompactionManager. > runIndexSummaryRedistribution(CompactionManager.java:1400) > at org.apache.cassandra.io.sstable.IndexSummaryManager. > redistributeSummaries(IndexSummaryManager.java:250) > at org.apache.cassandra.io.sstable.IndexSummaryManager. > redistributeSummaries(IndexSummaryManager.java:228) > at org.apache.cassandra.io.sstable.IndexSummaryManager$1. > runMayThrow(IndexSummaryManager.java:125) > at org.apache.cassandra.utils.WrappedRunnable.run( > WrappedRunnable.java:28) > at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolE > xecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolE > xecutor.java:118) > at java.util.concurrent.Executors$RunnableAdapter. > call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset( > FutureTask.java:308) > at java.util.concurrent.ScheduledThreadPoolExecutor$ > ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at java.util.concurrent.ScheduledThreadPoolExecutor$ > ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at java.util.concurrent.ThreadPoolExecutor.runWorker( > ThreadPoolExecutor.java:1142) > at java.util.concurrent.ThreadPoolExecutor$Worker.run( > ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > > How should I interpret these? What flushing behavior is blocking the > compactions? > >