[jira] [Updated] (CASSANDRA-13948) Reload compaction strategies when JBOD disk boundary changes
[ https://issues.apache.org/jira/browse/CASSANDRA-13948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Loic Lambiel updated CASSANDRA-13948: - Attachment: threaddump-cleanup.txt

Compactions are now being processed as expected with your latest patch :)

However, I'm facing an issue with nodetool cleanup, and I don't know whether it is related. Starting a cleanup cancels the ongoing compactions (which is expected, as far as I understand), but after that nothing seems to make progress: no cleanup is performed and the pending compactions are not processed. One thread uses 100% of a core the whole time. The log doesn't show any error. I've attached a thread dump (threaddump-cleanup.txt). I'm happy to open another Jira if it's not related.

> Reload compaction strategies when JBOD disk boundary changes
>
>                 Key: CASSANDRA-13948
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13948
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Compaction
>            Reporter: Paulo Motta
>            Assignee: Paulo Motta
>             Fix For: 3.11.x, 4.x
>
>         Attachments: debug.log, threaddump-cleanup.txt, threaddump.txt, trace.log
>
>
> The thread dump below shows a race between an sstable replacement by the
> {{IndexSummaryRedistribution}} and
> {{AbstractCompactionTask.getNextBackgroundTask}}:
> {noformat}
> Thread 94580: (state = BLOCKED)
>  - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
>  - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=175 (Compiled frame)
>  - java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() @bci=1, line=836 (Compiled frame)
>  - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(java.util.concurrent.locks.AbstractQueuedSynchronizer$Node, int) @bci=67, line=870 (Compiled frame)
>  - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(int) @bci=17, line=1199 (Compiled frame)
>  - java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock() @bci=5, line=943 (Compiled frame)
>  - org.apache.cassandra.db.compaction.CompactionStrategyManager.handleListChangedNotification(java.lang.Iterable, java.lang.Iterable) @bci=359, line=483 (Interpreted frame)
>  - org.apache.cassandra.db.compaction.CompactionStrategyManager.handleNotification(org.apache.cassandra.notifications.INotification, java.lang.Object) @bci=53, line=555 (Interpreted frame)
>  - org.apache.cassandra.db.lifecycle.Tracker.notifySSTablesChanged(java.util.Collection, java.util.Collection, org.apache.cassandra.db.compaction.OperationType, java.lang.Throwable) @bci=50, line=409 (Interpreted frame)
>  - org.apache.cassandra.db.lifecycle.LifecycleTransaction.doCommit(java.lang.Throwable) @bci=157, line=227 (Interpreted frame)
>  - org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.commit(java.lang.Throwable) @bci=61, line=116 (Compiled frame)
>  - org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.commit() @bci=2, line=200 (Interpreted frame)
>  - org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.finish() @bci=5, line=185 (Interpreted frame)
>  - org.apache.cassandra.io.sstable.IndexSummaryRedistribution.redistributeSummaries() @bci=559, line=130 (Interpreted frame)
>  - org.apache.cassandra.db.compaction.CompactionManager.runIndexSummaryRedistribution(org.apache.cassandra.io.sstable.IndexSummaryRedistribution) @bci=9, line=1420 (Interpreted frame)
>  - org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries(org.apache.cassandra.io.sstable.IndexSummaryRedistribution) @bci=4, line=250 (Interpreted frame)
>  - org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries() @bci=30, line=228 (Interpreted frame)
>  - org.apache.cassandra.io.sstable.IndexSummaryManager$1.runMayThrow() @bci=4, line=125 (Interpreted frame)
>  - org.apache.cassandra.utils.WrappedRunnable.run() @bci=1, line=28 (Interpreted frame)
>  - org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run() @bci=4, line=118 (Compiled frame)
>  - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 (Compiled frame)
>  - java.util.concurrent.FutureTask.runAndReset() @bci=47, line=308 (Compiled frame)
>  - java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask) @bci=1, line=180 (Compiled frame)
>  - java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run() @bci=37, line=294 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1149 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=624 (Interpreted frame)
>  - org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(java.lang.Runnable) @bci=1, line=81 (Interpreted frame)
>  - org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$8.run() @bci=4 (Interpreted frame)
>  - java.lang.Thread.run() @bci=11, line=748 (Compiled frame)
> {noformat}
> {noformat}
> Thread 94573: (state = IN_JAVA)
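The blocked thread in the quoted dump is parked in `ReentrantReadWriteLock$WriteLock.lock()` inside `handleListChangedNotification`, so it cannot proceed until every other holder of that lock lets go. The JDK-only sketch below (not Cassandra code; the class and method names are hypothetical) reproduces that shape under one stated assumption: the other holder is a thread keeping the read lock for a long time, standing in for `getNextBackgroundTask`. It uses `tryLock` with a timeout instead of `lock()` so the demo returns instead of parking forever.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo, not Cassandra code: a writer cannot take the write lock
// while another thread holds the read lock of the same ReentrantReadWriteLock.
public class ReadWriteLockBlockingDemo {

    /** Returns true if the writer got the write lock within the timeout while a reader held the read lock. */
    static boolean writerAcquiresWhileReaderHolds(long timeoutMillis) throws InterruptedException {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        CountDownLatch readerHolding = new CountDownLatch(1);
        CountDownLatch releaseReader = new CountDownLatch(1);

        // Assumed stand-in for a long-running task selection that holds the read lock.
        Thread reader = new Thread(() -> {
            lock.readLock().lock();
            try {
                readerHolding.countDown();   // signal: read lock is now held
                releaseReader.await();       // keep holding it until told to stop
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.readLock().unlock();
            }
        });
        reader.start();
        readerHolding.await();

        // Stand-in for the sstable-list-changed handler that needs the write lock.
        // A timed tryLock is used so the demo terminates instead of parking forever.
        boolean acquired = lock.writeLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
        if (acquired) {
            lock.writeLock().unlock();
        }

        releaseReader.countDown();
        reader.join();
        return acquired;
    }

    public static void main(String[] args) throws InterruptedException {
        // The reader never releases during the timeout, so the writer does not get the lock.
        System.out.println(writerAcquiresWhileReaderHolds(200));
    }
}
```

Running `main` prints `false`. With a plain `lock()` call the writer would stay parked exactly like thread 94580 until the reader releases.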
[jira] [Commented] (CASSANDRA-13948) Reload compaction strategies when JBOD disk boundary changes
[ https://issues.apache.org/jira/browse/CASSANDRA-13948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16240341#comment-16240341 ] Loic Lambiel commented on CASSANDRA-13948: --

Thanks [~krummas]. The patches from [~pauloricardomg] have already reduced the startup time by at least 10x.
[jira] [Updated] (CASSANDRA-13948) Reload compaction strategies when JBOD disk boundary changes
[ https://issues.apache.org/jira/browse/CASSANDRA-13948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Loic Lambiel updated CASSANDRA-13948: - Attachment: threaddump.txt
[jira] [Commented] (CASSANDRA-13948) Reload compaction strategies when JBOD disk boundary changes
[ https://issues.apache.org/jira/browse/CASSANDRA-13948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16238384#comment-16238384 ] Loic Lambiel commented on CASSANDRA-13948: --

I've deployed the patch on a few big nodes and haven't seen the error pop up so far. However, I'm still facing issues with compactions. These are big nodes with a big CF holding many SSTables and pending compactions. According to the thread dump, it seems to be stuck around getNextBackgroundTask. Compactions are still being processed for the other keyspace. Besides that, the node is running normally, although some nodetool commands, such as compactionstats, take a long time to complete. The debug log doesn't show any error.

{code:java}
CREATE TABLE blobstore.block (
    inode uuid,
    version timeuuid,
    block bigint,
    offset bigint,
    chunksize int,
    payload blob,
    PRIMARY KEY ((inode, version, block), offset)
) WITH CLUSTERING ORDER BY (offset ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy', 'enabled': 'true', 'tombstone_compaction_interval': '60', 'tombstone_threshold': '0.2', 'unchecked_tombstone_compaction': 'false'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 172000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';
{code}

{code:java}
Keyspace : blobstore
    Read Count: 97019
    Read Latency: 2.4842547026871027 ms.
    Write Count: 472590
    Write Latency: 0.060107954040500226 ms.
    Pending Flushes: 0
        Table: block
        SSTable count: 43373
        SSTables in each level: [18890/4, 115/10, 198/100, 1905/1000, 9451, 12814, 0, 0, 0]
        Space used (live): 4839933810943
        Space used (total): 4839933815913
        Space used by snapshots (total): 0
        Off heap memory used (total): 3273703284
        SSTable Compression Ratio: 0.9416884172984209
        Number of partitions (estimate): 2925826
        Memtable cell count: 41542
        Memtable data size: 2631688187
        Memtable off heap memory used: 2638649871
        Memtable switch count: 7
        Local read count: 87281
        Local read latency: 2.186 ms
        Local write count: 465591
        Local write latency: 0.124 ms
        Pending flushes: 0
        Percent repaired: 4.01
        Bloom filter false positives: 297882
        Bloom filter false ratio: 0.69198
        Bloom filter space used: 5111208
        Bloom filter off heap memory used: 4764232
        Index summary off heap memory used: 3360917
        Compression metadata off heap memory used: 626928264
        Compacted partition minimum bytes: 61
        Compacted partition maximum bytes: 186563160
        Compacted partition mean bytes: 1797922
        Average live cells per slice (last five minutes): 8.641592920353983
        Maximum live cells per slice (last five minutes): 258
        Average tombstones per slice (last five minutes): 1.0
        Maximum tombstones per slice (last five minutes): 1
        Dropped Mutations: 0
{code}

{code:java}
nodetool compactionstats
pending tasks: 3362
- blobstore.block: 3362
{code}
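The `SSTables in each level` line in the tablestats output packs a lot into one string. The sketch below (a hypothetical helper, not part of Cassandra or nodetool) parses it under the assumption that an entry written as `count/max` marks a level holding more SSTables than its nominal count, taken here as 4 for L0 and 10^level for higher levels, i.e. the default LCS fanout of 10. Applied to the values reported above, it flags L0 through L3 as over capacity and confirms that the per-level counts add up to the reported `SSTable count` of 43373.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for reading the "SSTables in each level" line of nodetool tablestats.
// Assumption: nominal capacity is 4 SSTables for L0 and 10^level for L1 and above (fanout 10).
public class LcsLevelSummary {

    /** Parses e.g. "[18890/4, 115/10, 9451, 0]" into per-level SSTable counts. */
    static long[] counts(String line) {
        String body = line.trim();
        body = body.substring(1, body.length() - 1); // strip the surrounding [ and ]
        String[] parts = body.split(",");
        long[] result = new long[parts.length];
        for (int i = 0; i < parts.length; i++) {
            String entry = parts[i].trim();
            int slash = entry.indexOf('/');
            // "count/max" -> count; a bare number is already the count
            result[i] = Long.parseLong(slash < 0 ? entry : entry.substring(0, slash));
        }
        return result;
    }

    /** Levels whose SSTable count exceeds the assumed nominal capacity. */
    static List<Integer> overfullLevels(long[] counts) {
        List<Integer> over = new ArrayList<>();
        for (int level = 0; level < counts.length; level++) {
            long capacity = (level == 0) ? 4 : pow10(level);
            if (counts[level] > capacity) {
                over.add(level);
            }
        }
        return over;
    }

    static long pow10(int exp) {
        long v = 1;
        for (int i = 0; i < exp; i++) {
            v *= 10;
        }
        return v;
    }

    static long total(long[] counts) {
        long sum = 0;
        for (long c : counts) {
            sum += c;
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] c = counts("[18890/4, 115/10, 198/100, 1905/1000, 9451, 12814, 0, 0, 0]");
        System.out.println(total(c));          // 43373, the reported SSTable count
        System.out.println(overfullLevels(c)); // [0, 1, 2, 3]
    }
}
```

The large L0 backlog (18890 SSTables against a nominal 4) is consistent with the 3362 pending compactions reported for blobstore.block.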
[jira] [Updated] (CASSANDRA-13948) Reload compaction strategies when JBOD disk boundary changes
[ https://issues.apache.org/jira/browse/CASSANDRA-13948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Loic Lambiel updated CASSANDRA-13948: - Attachment: trace.log

OK, I was able to reproduce it on one node and have attached the trace log. It's unfiltered, since I didn't manage to restrict it to org.apache.cassandra.db.compaction only.
[jira] [Commented] (CASSANDRA-13948) Reload compaction strategies when JBOD disk boundary changes
[ https://issues.apache.org/jira/browse/CASSANDRA-13948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236514#comment-16236514 ] Loic Lambiel commented on CASSANDRA-13948: --

I did some additional testing and wasn't able to reproduce it :-/ I'll try the patch on more representative nodes in the coming days and report back any issues.
[jira] [Commented] (CASSANDRA-13948) Reload compaction strategies when JBOD disk boundary changes
[ https://issues.apache.org/jira/browse/CASSANDRA-13948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234902#comment-16234902 ] Loic Lambiel commented on CASSANDRA-13948: --

The strategy is LCS, on a node with 9 data locations. The error repeats frequently. I'll attach a debug log tomorrow and run some additional tests.
[jira] [Comment Edited] (CASSANDRA-13948) Reload compaction strategies when JBOD disk boundary changes
[ https://issues.apache.org/jira/browse/CASSANDRA-13948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234460#comment-16234460 ] Loic Lambiel edited comment on CASSANDRA-13948 at 11/1/17 5:55 PM: --- Yes [~krummas], build including the latest patch. was (Author: llambiel): Yes @krummas, build including the latest patch.
[jira] [Comment Edited] (CASSANDRA-13948) Reload compaction strategies when JBOD disk boundary changes
[ https://issues.apache.org/jira/browse/CASSANDRA-13948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234460#comment-16234460 ] Loic Lambiel edited comment on CASSANDRA-13948 at 11/1/17 5:44 PM: --- Yes @krummas, build including the latest patch. was (Author: llambiel): Yes, build including the latest patch.
[jira] [Commented] (CASSANDRA-13948) Reload compaction strategies when JBOD disk boundary changes
[ https://issues.apache.org/jira/browse/CASSANDRA-13948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234460#comment-16234460 ] Loic Lambiel commented on CASSANDRA-13948: -- Yes, build including the latest patch.
[jira] [Commented] (CASSANDRA-13948) Reload compaction strategies when JBOD disk boundary changes
[ https://issues.apache.org/jira/browse/CASSANDRA-13948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234385#comment-16234385 ] Loic Lambiel commented on CASSANDRA-13948: -- I tried your patch on 3.11.2 and got the following errors: {code:java} ERROR [Reference-Reaper:1] 2017-11-01 17:51:35,397 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@51f52b91) to class org.apache.cassandra.io.util.MmappedRegions$Tidier@1358582595:/var/lib/cassandra/data/datadisk7/blobstore/block-ad8329f0740d11e68fe6cba3b122d983/mc-103504-big-Data.db was not released before the reference was garbage collected ERROR [Reference-Reaper:1] 2017-11-01 17:51:35,413 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@70a08046) to class org.apache.cassandra.io.util.MmappedRegions$Tidier@632323950:/var/lib/cassandra/data/datadisk7/blobstore/block-ad8329f0740d11e68fe6cba3b122d983/mc-103503-big-Data.db was not released before the reference was garbage collected ERROR [Reference-Reaper:1] 2017-11-01 17:51:35,413 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@5d6161ea) to class org.apache.cassandra.io.util.FileHandle$Cleanup@1594052942:/var/lib/cassandra/data/datadisk7/blobstore/block-ad8329f0740d11e68fe6cba3b122d983/mc-103502-big-Index.db was not released before the reference was garbage collected ERROR [Reference-Reaper:1] 2017-11-01 17:51:35,429 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@2bd55858) to class org.apache.cassandra.io.util.MmappedRegions$Tidier@230164803:/var/lib/cassandra/data/datadisk7/blobstore/block-ad8329f0740d11e68fe6cba3b122d983/mc-103502-big-Data.db was not released before the reference was garbage collected ERROR [Reference-Reaper:1] 2017-11-01 17:51:35,429 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@5b00472f) to class 
org.apache.cassandra.io.util.SafeMemory$MemoryTidy@508355616:Memory@[7f6b54130b10..7f6b54136f10) was not released before the reference was garbage collected ERROR [Reference-Reaper:1] 2017-11-01 17:51:35,429 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@1f5a7829) to class org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$Tidy@1390774416:[Memory@[0..20), Memory@[0..240)] was not released before the reference was garbage collected ERROR [Reference-Reaper:1] 2017-11-01 17:51:35,430 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@b8594dc) to class org.apache.cassandra.io.util.FileHandle$Cleanup@1913719912:/var/lib/cassandra/data/datadisk7/blobstore/block-ad8329f0740d11e68fe6cba3b122d983/mc-103503-big-Index.db was not released before the reference was garbage collected ERROR [Reference-Reaper:1] 2017-11-01 17:51:35,430 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@3ec6a933) to class org.apache.cassandra.io.util.SafeMemory$MemoryTidy@1083770739:Memory@[7f6b5453ff30..7f6b54546330) was not released before the reference was garbage collected ERROR [Reference-Reaper:1] 2017-11-01 17:51:35,430 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@671d48c8) to class org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$Tidy@496335375:[Memory@[0..20), Memory@[0..240)] was not released before the reference was garbage collected ERROR [Reference-Reaper:1] 2017-11-01 17:51:35,430 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@1611d7bf) to class org.apache.cassandra.io.util.SafeMemory$MemoryTidy@515635345:Memory@[7f6b540f7dc0..7f6b540fe1c0) was not released before the reference was garbage collected ERROR [Reference-Reaper:1] 2017-11-01 17:51:35,430 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@136db886) to 
class org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$Tidy@622788070:[Memory@[0..20), Memory@[0..240)] was not released before the reference was garbage collected ERROR [Reference-Reaper:1] 2017-11-01 17:51:35,430 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@3daa7ad5) to class org.apache.cassandra.io.util.FileHandle$Cleanup@2090103425:/var/lib/cassandra/data/datadisk7/blobstore/block-ad8329f0740d11e68fe6cba3b122d983/mc-103504-big-Index.db was not released before the reference was garbage collected ERROR [Reference-Reaper:1] 2017-11-01 17:51:35,430 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@2435c1d) to class org.apache.cassandra.io.util.SafeMemory$MemoryTidy@1348493438:Memory@[7f6b546ebd80..7f6b546f2180) was not released before the reference was garbage collected ERROR [Reference-Reaper:1] 2017-11-01
[jira] [Commented] (CASSANDRA-13980) Compaction deadlock
[ https://issues.apache.org/jira/browse/CASSANDRA-13980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16224401#comment-16224401 ] Loic Lambiel commented on CASSANDRA-13980: -- Yes, 4. > Compaction deadlock > --- > > Key: CASSANDRA-13980 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13980 > Project: Cassandra > Issue Type: Bug > Components: Compaction > Environment: Cassandra 3.11.2 > 48-node cluster using LCS, JBOD > Big nodes with many SSTables >Reporter: Loic Lambiel >Priority: Critical > Fix For: 3.11.x > > Attachments: threaddump.log > > > While upgrading the cluster from 2.1.16 to 3.11.2, after a few hours most > of the upgraded nodes went into an infinite compaction loop, logging > many events like the one below (always for the same SSTable): > {code:java} > INFO [CompactionExecutor:4] 2017-10-29 00:28:31,480 LeveledManifest.java:474 > - Adding high-level (L5) > BigTableReader(path='/var/lib/cassandra/data/datadisk4/blobstore/block-1d63273065b911e49cd7ef0972cffde6/blobstore-block-ka-201694-Data.db') > to candidates > {code} > Since the log gets spammed at a huge rate, I'm unable to retrieve any earlier > events. > I tried restarts and sstablescrub -m without success. The only workaround that > seems to work (so far) is sstablelevelreset. > I've attached the dump. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-13980) Compaction deadlock
Loic Lambiel created CASSANDRA-13980: Summary: Compaction deadlock Key: CASSANDRA-13980 URL: https://issues.apache.org/jira/browse/CASSANDRA-13980 Project: Cassandra Issue Type: Bug Components: Compaction Environment: Cassandra 3.11.2 48-node cluster using LCS, JBOD Big nodes with many SSTables Reporter: Loic Lambiel Priority: Critical Fix For: 3.11.x Attachments: threaddump.log While upgrading the cluster from 2.1.16 to 3.11.2, after a few hours most of the upgraded nodes went into an infinite compaction loop, logging many events like the one below (always for the same SSTable): {code:java} INFO [CompactionExecutor:4] 2017-10-29 00:28:31,480 LeveledManifest.java:474 - Adding high-level (L5) BigTableReader(path='/var/lib/cassandra/data/datadisk4/blobstore/block-1d63273065b911e49cd7ef0972cffde6/blobstore-block-ka-201694-Data.db') to candidates {code} Since the log gets spammed at a huge rate, I'm unable to retrieve any earlier events. I tried restarts and sstablescrub -m without success. The only workaround that seems to work (so far) is sstablelevelreset. I've attached the dump.
[jira] [Updated] (CASSANDRA-9625) GraphiteReporter not reporting
[ https://issues.apache.org/jira/browse/CASSANDRA-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Loic Lambiel updated CASSANDRA-9625: Attachment: thread-dump2.log New thread dump attached > GraphiteReporter not reporting > -- > > Key: CASSANDRA-9625 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9625 > Project: Cassandra > Issue Type: Bug > Environment: Debian Jessie, 7u79-2.5.5-1~deb8u1, Cassandra 2.1.3 >Reporter: Eric Evans >Assignee: Stefan Podkowinski > Attachments: Screen Shot 2016-04-13 at 10.40.58 AM.png, metrics.yaml, > thread-dump.log, thread-dump2.log > > > When upgrading from 2.1.3 to 2.1.6, the Graphite metrics reporter stops > working. The usual startup is logged, and one batch of samples is sent, but > the reporting interval comes and goes, and no other samples are ever sent. > The logs are free from errors. > Frustratingly, metrics reporting works in our smaller (staging) environment > on 2.1.6; we are able to reproduce this on all 6 of our production nodes, but not > on a 3-node (otherwise identical) staging cluster (maybe it takes a certain > level of concurrency?). > Attached are a thread dump and our metrics.yaml. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
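The description reports that one batch of samples is sent and then nothing more, with no errors in the logs. As a hedged illustration only (a generic JDK pitfall, not the diagnosed root cause of this ticket; the class and method names are hypothetical), a task scheduled with `ScheduledExecutorService.scheduleAtFixedRate` is never run again once one execution throws, and the exception is retained only in the returned `ScheduledFuture`, so nothing reaches the log unless someone inspects that future:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo, not Cassandra or metrics-library code: a periodic
// "reporter" whose second run throws is never scheduled again, and the
// failure is only visible through the returned ScheduledFuture.
class SilentReporterDemo {

    /** Schedules a fake reporter every 10 ms and returns how many times it ran. */
    static int runFakeReporter() {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();
        CountDownLatch secondRunStarted = new CountDownLatch(1);

        ScheduledFuture<?> future = scheduler.scheduleAtFixedRate(() -> {
            if (runs.incrementAndGet() == 2) {
                secondRunStarted.countDown();
                // Stand-in for a transient error while sending a batch (assumption).
                throw new RuntimeException("simulated send failure");
            }
        }, 0, 10, TimeUnit.MILLISECONDS);

        try {
            secondRunStarted.await();
            Thread.sleep(200); // ~20 more periods: a healthy task would have run again
            future.get(1, TimeUnit.SECONDS);
        } catch (ExecutionException e) {
            // The only place the exception surfaces; nothing was logged.
            System.out.println("reporter stopped: " + e.getCause().getMessage());
        } catch (TimeoutException e) {
            System.out.println("reporter still scheduled");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            scheduler.shutdownNow();
        }
        return runs.get();
    }

    public static void main(String[] args) {
        System.out.println("runs = " + runFakeReporter());
    }
}
```

Running `main` prints `reporter stopped: simulated send failure` and then `runs = 2`. A periodic task that must survive transient failures would typically catch and log exceptions inside its own body so the scheduler never sees them.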
[jira] [Commented] (CASSANDRA-9625) GraphiteReporter not reporting
[ https://issues.apache.org/jira/browse/CASSANDRA-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15760731#comment-15760731 ] Loic Lambiel commented on CASSANDRA-9625: - Well, after running the patch for 4 days, most of the graphite reporter threads are dead. The patch does not seem to fix the issue. :-(
[jira] [Commented] (CASSANDRA-9625) GraphiteReporter not reporting
[ https://issues.apache.org/jira/browse/CASSANDRA-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15753883#comment-15753883 ] Loic Lambiel commented on CASSANDRA-9625: - Running the patch flawlessly on our 2.1.16 production cluster for 24 hours now, many thanks! > GraphiteReporter not reporting > -- > > Key: CASSANDRA-9625 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9625 > Project: Cassandra > Issue Type: Bug > Environment: Debian Jessie, 7u79-2.5.5-1~deb8u1, Cassandra 2.1.3 >Reporter: Eric Evans >Assignee: T Jake Luciani > Attachments: Screen Shot 2016-04-13 at 10.40.58 AM.png, metrics.yaml, > thread-dump.log > > > When upgrading from 2.1.3 to 2.1.6, the Graphite metrics reporter stops > working. The usual startup is logged, and one batch of samples is sent, but > the reporting interval comes and goes, and no other samples are ever sent. > The logs are free from errors. > Frustratingly, metrics reporting works in our smaller (staging) environment > on 2.1.6; we are able to reproduce this on all 6 of our production nodes, but not > on a 3-node (otherwise identical) staging cluster (maybe it takes a certain > level of concurrency?). > Attached are a thread dump and our metrics.yaml.
[jira] [Commented] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions
[ https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15657238#comment-15657238 ] Loic Lambiel commented on CASSANDRA-9754: - Any update on your ongoing tests, [~mkjellman]? > Make index info heap friendly for large CQL partitions > -- > > Key: CASSANDRA-9754 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9754 > Project: Cassandra > Issue Type: Improvement >Reporter: sankalp kohli >Assignee: Michael Kjellman >Priority: Minor > Fix For: 4.x > > Attachments: 0f8e28c220fd5af6c7b5dd2d3dab6936c4aa4b6b.patch, > gc_collection_times_with_birch.png, gc_collection_times_without_birch.png, > gc_counts_with_birch.png, gc_counts_without_birch.png, > perf_cluster_1_with_birch_read_latency_and_counts.png, > perf_cluster_1_with_birch_write_latency_and_counts.png, > perf_cluster_2_with_birch_read_latency_and_counts.png, > perf_cluster_2_with_birch_write_latency_and_counts.png, > perf_cluster_3_without_birch_read_latency_and_counts.png, > perf_cluster_3_without_birch_write_latency_and_counts.png > > > Looking at a heap dump of a 2.0 cluster, I found that the majority of the objects > are IndexInfo objects and their ByteBuffers. This is especially bad on endpoints with > large CQL partitions. If a CQL partition is, say, 6.4GB, it will have 100K > IndexInfo objects and 200K ByteBuffers. This will create a lot of churn for > GC. Can this be improved by not creating so many objects?
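The 100K/200K figures in the description can be reproduced with a back-of-the-envelope calculation. The sketch below assumes the default `column_index_size_in_kb` of 64 (one index block per roughly 64 KiB of serialized partition data) and, matching the 2:1 ratio quoted above, two ByteBuffers per `IndexInfo` entry (assumed to be the first and last clustering names of each block). The class and method names are hypothetical; this is not Cassandra's indexing code:

```java
// Back-of-the-envelope estimate only; names are hypothetical and this does not
// reproduce Cassandra's indexing code. Assumes the default
// column_index_size_in_kb = 64 and two ByteBuffers per IndexInfo entry.
class IndexInfoEstimate {
    static final long COLUMN_INDEX_SIZE_BYTES = 64L * 1024; // 64 KiB

    /** Upper-bound number of index blocks: ceiling(partitionBytes / 64 KiB). */
    static long indexInfoEntries(long partitionBytes) {
        return (partitionBytes + COLUMN_INDEX_SIZE_BYTES - 1) / COLUMN_INDEX_SIZE_BYTES;
    }

    /** Two buffers per entry: first and last clustering name of the block (assumed). */
    static long byteBuffers(long partitionBytes) {
        return 2 * indexInfoEntries(partitionBytes);
    }

    public static void main(String[] args) {
        long partitionBytes = 6_400_000_000L; // "6.4GB" read as 6.4 x 10^9 bytes
        System.out.println("IndexInfo entries: " + indexInfoEntries(partitionBytes)); // 97657
        System.out.println("ByteBuffers: " + byteBuffers(partitionBytes));            // 195314
    }
}
```

For 6.4 GB this gives 97,657 entries and 195,314 buffers, the same order of magnitude as the "100K" and "200K" in the description. If, as assumed here, a block is closed only once it reaches the 64 KiB threshold, the ceiling division is an upper bound on the real count.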
[jira] [Commented] (CASSANDRA-9625) GraphiteReporter not reporting
[ https://issues.apache.org/jira/browse/CASSANDRA-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15588224#comment-15588224 ] Loic Lambiel commented on CASSANDRA-9625: - Same issue with 2.1.16
[jira] [Commented] (CASSANDRA-9625) GraphiteReporter not reporting
[ https://issues.apache.org/jira/browse/CASSANDRA-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15457863#comment-15457863 ] Loic Lambiel commented on CASSANDRA-9625: - How can we move forward and fix this annoying bug? We're running Cassandra 2.1.13, and it happens randomly when a certain number of compactions are queued or running on the nodes. There's nothing in the log at the time it stops reporting the metrics. It also happens when there's no repair in progress.
[jira] [Commented] (CASSANDRA-9683) Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 2.1.7
[ https://issues.apache.org/jira/browse/CASSANDRA-9683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631408#comment-14631408 ] Loic Lambiel commented on CASSANDRA-9683: - Those stats come from OpsCenter. We're using durable_writes = false in the blobstore keyspace, where most data is written. This may explain the low write latency. I'm going to try to reproduce this on a new single-node setup, as I don't want to kill this cluster. I'll do it in the coming days, as soon as I have the pipe. We're using Cassandra as the backend for our object storage service, fronted by our pithos (http://pithos.io) API. Data can then be uploaded using any S3-compatible tool, such as s3cmd. I don't know if you want to go into such a setup (we could help, or do it remotely if needed). I also don't know how to reproduce the data pattern and usage without it. Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 2.1.7 -- Key: CASSANDRA-9683 URL: https://issues.apache.org/jira/browse/CASSANDRA-9683 Project: Cassandra Issue Type: Bug Environment: Ubuntu 12.04 (3.13 Kernel) * 3 JDK: Oracle JDK 7 RAM: 32GB Cores 4 (+4 HT) Reporter: Loic Lambiel Assignee: Ariel Weisberg Fix For: 2.1.x Attachments: cassandra-env.sh, cassandra.yaml, cfstats.txt, os_load.png, pending_compactions.png, read_latency.png, schema.txt, system.log, write_latency.png After upgrading our Cassandra staging cluster from 2.1.6 to 2.1.7, the average load grew from 0.1-0.3 to 1.8. Latencies increased as well. We see an increase in pending compactions, probably due to CASSANDRA-9592. This cluster has almost no workload (staging environment).
[jira] [Commented] (CASSANDRA-9683) Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 2.1.7
[ https://issues.apache.org/jira/browse/CASSANDRA-9683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625905#comment-14625905 ] Loic Lambiel commented on CASSANDRA-9683: - 2.1.8 has the same performance issue, which goes away when running nodetool disableautocompaction. 2.1.8 compiled without https://issues.apache.org/jira/browse/CASSANDRA-9592 does not have the performance issue, so the issue is related to https://issues.apache.org/jira/browse/CASSANDRA-9592. Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 2.1.7 -- Key: CASSANDRA-9683 URL: https://issues.apache.org/jira/browse/CASSANDRA-9683 Project: Cassandra Issue Type: Bug Environment: Ubuntu 12.04 (3.13 Kernel) * 3 JDK: Oracle JDK 7 RAM: 32GB Cores 4 (+4 HT) Reporter: Loic Lambiel Assignee: Ariel Weisberg Fix For: 2.1.x Attachments: cassandra-env.sh, cassandra.yaml, cfstats.txt, os_load.png, pending_compactions.png, read_latency.png, schema.txt, system.log, write_latency.png
[jira] [Commented] (CASSANDRA-9683) Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 2.1.7
[ https://issues.apache.org/jira/browse/CASSANDRA-9683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620044#comment-14620044 ] Loic Lambiel commented on CASSANDRA-9683: - 2.1.8-tentative compiled without https://issues.apache.org/jira/browse/CASSANDRA-9592 is running just fine. What could the next steps be? Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 2.1.7 -- Key: CASSANDRA-9683 URL: https://issues.apache.org/jira/browse/CASSANDRA-9683 Project: Cassandra Issue Type: Bug Environment: Ubuntu 12.04 (3.13 Kernel) * 3 JDK: Oracle JDK 7 RAM: 32GB Cores 4 (+4 HT) Reporter: Loic Lambiel Assignee: Ariel Weisberg Fix For: 2.1.x Attachments: cassandra-env.sh, cassandra.yaml, cfstats.txt, os_load.png, pending_compactions.png, read_latency.png, schema.txt, system.log, write_latency.png
[jira] [Commented] (CASSANDRA-9683) Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 2.1.7
[ https://issues.apache.org/jira/browse/CASSANDRA-9683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14618168#comment-14618168 ] Loic Lambiel commented on CASSANDRA-9683: - Sorry for the delayed response. We're running a 3-node cluster with C* 2.1.8-tentative in a single DC. The hosts are physical machines connected to the same network switch:
- Ubuntu 12.04 (3.13 kernel)
- 32 GB RAM
- 4 cores (+4 HT)
- SSD for OS and logs
- 8 JBOD SATA drives for data
- Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
cat /etc/default/cassandra:
    MAX_HEAP_SIZE=8G
    HEAP_NEWSIZE=800m
    JVM_OPTS="$JVM_OPTS -Dcassandra.metricsReporterConfigFile=metrics-reporter-config.yaml"
We're using the stock cassandra-env.sh (attached). The consistency level is QUORUM. On this cluster we:
- write a few KB to a CF, read the data back and then delete it, every minute
- write and delete ~300 MB once a day
I'm going to take some thread dumps to see if I can find anything. Thanks for your support! Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 2.1.7 -- Key: CASSANDRA-9683 URL: https://issues.apache.org/jira/browse/CASSANDRA-9683 Project: Cassandra Issue Type: Bug Environment: Ubuntu 12.04 (3.13 Kernel) * 3 JDK: Oracle JDK 7 RAM: 32GB Cores 4 (+4 HT) Reporter: Loic Lambiel Assignee: Ariel Weisberg Fix For: 2.1.x Attachments: cassandra.yaml, cfstats.txt, os_load.png, pending_compactions.png, read_latency.png, schema.txt, system.log, write_latency.png
[jira] [Updated] (CASSANDRA-9683) Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 2.1.7
[ https://issues.apache.org/jira/browse/CASSANDRA-9683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Loic Lambiel updated CASSANDRA-9683: Attachment: cassandra-env.sh Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 2.1.7 -- Key: CASSANDRA-9683 URL: https://issues.apache.org/jira/browse/CASSANDRA-9683 Project: Cassandra Issue Type: Bug Environment: Ubuntu 12.04 (3.13 Kernel) * 3 JDK: Oracle JDK 7 RAM: 32GB Cores 4 (+4 HT) Reporter: Loic Lambiel Assignee: Ariel Weisberg Fix For: 2.1.x Attachments: cassandra-env.sh, cassandra.yaml, cfstats.txt, os_load.png, pending_compactions.png, read_latency.png, schema.txt, system.log, write_latency.png
[jira] [Commented] (CASSANDRA-9683) Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 2.1.7
[ https://issues.apache.org/jira/browse/CASSANDRA-9683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14618809#comment-14618809 ] Loic Lambiel commented on CASSANDRA-9683: - I ran nodetool disableautocompaction on all nodes, and the load and latencies went back down to their previous 2.1.6 levels. Could the issue be related to https://issues.apache.org/jira/browse/CASSANDRA-9592? Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 2.1.7 -- Key: CASSANDRA-9683 URL: https://issues.apache.org/jira/browse/CASSANDRA-9683 Project: Cassandra Issue Type: Bug Environment: Ubuntu 12.04 (3.13 Kernel) * 3 JDK: Oracle JDK 7 RAM: 32GB Cores 4 (+4 HT) Reporter: Loic Lambiel Assignee: Ariel Weisberg Fix For: 2.1.x Attachments: cassandra-env.sh, cassandra.yaml, cfstats.txt, os_load.png, pending_compactions.png, read_latency.png, schema.txt, system.log, write_latency.png
[jira] [Commented] (CASSANDRA-9683) Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 2.1.7
[ https://issues.apache.org/jira/browse/CASSANDRA-9683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14619227#comment-14619227 ] Loic Lambiel commented on CASSANDRA-9683: - I'm going to compile 2.1.8-tentative without CASSANDRA-9592 to see whether it's related. Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 2.1.7 -- Key: CASSANDRA-9683 URL: https://issues.apache.org/jira/browse/CASSANDRA-9683 Project: Cassandra Issue Type: Bug Environment: Ubuntu 12.04 (3.13 Kernel) * 3 JDK: Oracle JDK 7 RAM: 32GB Cores 4 (+4 HT) Reporter: Loic Lambiel Assignee: Ariel Weisberg Fix For: 2.1.x Attachments: cassandra-env.sh, cassandra.yaml, cfstats.txt, os_load.png, pending_compactions.png, read_latency.png, schema.txt, system.log, write_latency.png
[jira] [Commented] (CASSANDRA-9683) Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 2.1.7
[ https://issues.apache.org/jira/browse/CASSANDRA-9683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612874#comment-14612874 ] Loic Lambiel commented on CASSANDRA-9683: - Yes, that is correct. Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 2.1.7 -- Key: CASSANDRA-9683 URL: https://issues.apache.org/jira/browse/CASSANDRA-9683 Project: Cassandra Issue Type: Bug Environment: Ubuntu 12.04 (3.13 Kernel) * 3 JDK: Oracle JDK 7 RAM: 32GB Cores 4 (+4 HT) Reporter: Loic Lambiel Assignee: Ariel Weisberg Fix For: 2.1.x Attachments: cassandra.yaml, cfstats.txt, os_load.png, pending_compactions.png, read_latency.png, schema.txt, system.log, write_latency.png
[jira] [Updated] (CASSANDRA-9683) Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 2.1.7
[ https://issues.apache.org/jira/browse/CASSANDRA-9683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Loic Lambiel updated CASSANDRA-9683: Attachment: schema.txt cfstats.txt Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 2.1.7 -- Key: CASSANDRA-9683 URL: https://issues.apache.org/jira/browse/CASSANDRA-9683 Project: Cassandra Issue Type: Bug Environment: Ubuntu 12.04 (3.13 Kernel) * 3 JDK: Oracle JDK 7 RAM: 32GB Cores 4 (+4 HT) Reporter: Loic Lambiel Assignee: Ariel Weisberg Fix For: 2.1.x Attachments: cassandra.yaml, cfstats.txt, os_load.png, pending_compactions.png, read_latency.png, schema.txt, system.log, write_latency.png
[jira] [Commented] (CASSANDRA-9683) Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 2.1.7
[ https://issues.apache.org/jira/browse/CASSANDRA-9683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612053#comment-14612053 ] Loic Lambiel commented on CASSANDRA-9683: - I've attached the schema and cfstats so you can get a better idea. I haven't had the opportunity yet to test it on a single-node config. FYI, our setup has logs on SSD and data on 8 JBOD SATA disks. This cluster serves our S3-compatible object storage (http://pithos.io). Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 2.1.7 -- Key: CASSANDRA-9683 URL: https://issues.apache.org/jira/browse/CASSANDRA-9683 Project: Cassandra Issue Type: Bug Environment: Ubuntu 12.04 (3.13 Kernel) * 3 JDK: Oracle JDK 7 RAM: 32GB Cores 4 (+4 HT) Reporter: Loic Lambiel Assignee: Ariel Weisberg Fix For: 2.1.x Attachments: cassandra.yaml, cfstats.txt, os_load.png, pending_compactions.png, read_latency.png, schema.txt, system.log, write_latency.png
[jira] [Created] (CASSANDRA-9683) Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 2.1.7
Loic Lambiel created CASSANDRA-9683: --- Summary: Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 2.1.7 Key: CASSANDRA-9683 URL: https://issues.apache.org/jira/browse/CASSANDRA-9683 Project: Cassandra Issue Type: Bug Environment: Ubuntu 12.04 (3.13 Kernel) * 3 JDK: Oracle JDK 7 RAM: 32GB Cores 4 (+4 HT) Reporter: Loic Lambiel Priority: Critical Attachments: os_load.png, pending_compactions.png, read_latency.png, system.log, write_latency.png After upgrading our Cassandra staging cluster from 2.1.6 to 2.1.7, the average load grew from 0.1-0.3 to 1.8, and latencies increased as well. We see an increase in pending compactions, probably due to CASSANDRA-9592. This cluster has almost no workload (staging environment).
[jira] [Updated] (CASSANDRA-9683) Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 2.1.7
[ https://issues.apache.org/jira/browse/CASSANDRA-9683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Loic Lambiel updated CASSANDRA-9683: Attachment: cassandra.yaml Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 2.1.7 -- Key: CASSANDRA-9683 URL: https://issues.apache.org/jira/browse/CASSANDRA-9683 Project: Cassandra Issue Type: Bug Environment: Ubuntu 12.04 (3.13 Kernel) * 3 JDK: Oracle JDK 7 RAM: 32GB Cores 4 (+4 HT) Reporter: Loic Lambiel Priority: Critical Attachments: cassandra.yaml, os_load.png, pending_compactions.png, read_latency.png, system.log, write_latency.png
[jira] [Commented] (CASSANDRA-9683) Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 2.1.7
[ https://issues.apache.org/jira/browse/CASSANDRA-9683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14609210#comment-14609210 ] Loic Lambiel commented on CASSANDRA-9683: - I did a build from the 2.1 branch and upgraded the cluster. I don't see much change: the load is still high (1.5 to 2.2) with no workload, and there are still pending compactions in the queue. Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 2.1.7 -- Key: CASSANDRA-9683 URL: https://issues.apache.org/jira/browse/CASSANDRA-9683 Project: Cassandra Issue Type: Bug Environment: Ubuntu 12.04 (3.13 Kernel) * 3 JDK: Oracle JDK 7 RAM: 32GB Cores 4 (+4 HT) Reporter: Loic Lambiel Assignee: Philip Thompson Fix For: 2.1.x Attachments: cassandra.yaml, os_load.png, pending_compactions.png, read_latency.png, system.log, write_latency.png
[jira] [Commented] (CASSANDRA-8316) Did not get positive replies from all endpoints error on incremental repair
[ https://issues.apache.org/jira/browse/CASSANDRA-8316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14229587#comment-14229587 ] Loic Lambiel commented on CASSANDRA-8316: - Hi guys, any chance of getting this issue fixed for 2.1.3? On our side, we hit it on almost every incremental repair. Did not get positive replies from all endpoints error on incremental repair -- Key: CASSANDRA-8316 URL: https://issues.apache.org/jira/browse/CASSANDRA-8316 Project: Cassandra Issue Type: Bug Components: Core Environment: cassandra 2.1.2 Reporter: Loic Lambiel Assignee: Alan Boudreault Attachments: CassandraDaemon-2014-11-25-2.snapshot.tar.gz, test.sh Hi, I've got an issue with incremental repairs on our 15-node 2.1.2 production cluster (new cluster, not yet loaded, RF=3). After successfully performing an incremental repair (-par -inc) on 3 nodes, I started receiving "Repair failed with error Did not get positive replies from all endpoints." from nodetool on all the remaining nodes:
[2014-11-14 09:12:36,488] Starting repair command #3, repairing 108 ranges for keyspace (seq=false, full=false)
[2014-11-14 09:12:47,919] Repair failed with error Did not get positive replies from all endpoints.
All the nodes are up and running, and the local system log only shows that the repair commands were started. I've also noticed that soon after the repair, several nodes started showing higher CPU load indefinitely, without any particular reason (no tasks or queries, nothing in the logs). I then restarted C* on these nodes and retried the repair on several nodes, which succeeded until the issue appeared again. I tried to reproduce it on our 3-node preproduction cluster, without success. It looks like I'm not the only one having this issue: http://www.mail-archive.com/user%40cassandra.apache.org/msg39145.html Any idea? Thanks Loic
[jira] [Commented] (CASSANDRA-8316) Did not get positive replies from all endpoints error on incremental repair
[ https://issues.apache.org/jira/browse/CASSANDRA-8316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212055#comment-14212055 ] Loic Lambiel commented on CASSANDRA-8316: - Nope, nothing special noticed on the other nodes (except the load on a few of them). Did not get positive replies from all endpoints error on incremental repair -- Key: CASSANDRA-8316 URL: https://issues.apache.org/jira/browse/CASSANDRA-8316 Project: Cassandra Issue Type: Bug Components: Core Environment: cassandra 2.1.2 Reporter: Loic Lambiel
[jira] [Commented] (CASSANDRA-8316) Did not get positive replies from all endpoints error on incremental repair
[ https://issues.apache.org/jira/browse/CASSANDRA-8316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212103#comment-14212103 ] Loic Lambiel commented on CASSANDRA-8316: - I forgot to mention that I'm using LCS, in case it matters. Did not get positive replies from all endpoints error on incremental repair -- Key: CASSANDRA-8316 URL: https://issues.apache.org/jira/browse/CASSANDRA-8316 Project: Cassandra Issue Type: Bug Components: Core Environment: cassandra 2.1.2 Reporter: Loic Lambiel Assignee: Ryan McGuire