[jira] [Updated] (CASSANDRA-13948) Reload compaction strategies when JBOD disk boundary changes

2017-11-08 Thread Loic Lambiel (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Loic Lambiel updated CASSANDRA-13948:
-
Attachment: threaddump-cleanup.txt

Compactions are now processing as expected with your latest patch :)

However, I'm facing an issue with nodetool cleanup; I don't know whether it is 
related.

Starting a cleanup cancels the ongoing compactions (which is expected, as far 
as I understand) and then gets stuck: it neither performs any cleanup nor 
processes the pending compactions. One thread uses 100% of a core the whole 
time. The log doesn't show any error. I've attached a thread dump 
(threaddump-cleanup.txt).

I'm happy to open another Jira if it's not related.
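
To narrow down which thread is pinning a core, as in the cleanup case described above, per-thread CPU time can be sampled twice and compared. The following is only a minimal sketch using the standard java.lang.management API; the class name BusyThreads and the method busiest are hypothetical, and the code inspects the JVM it runs in. For a live node the same ThreadMXBean would have to be read over JMX, which is not shown here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Cassandra.
final class BusyThreads {
    /** Names of up to n threads that used the most CPU during the interval, busiest first. */
    static List<String> busiest(long intervalMillis, int n) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> names = new ArrayList<>();
        if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            return names; // this JVM cannot report per-thread CPU time
        }
        // First sample: cumulative CPU nanoseconds per thread id.
        Map<Long, Long> before = new HashMap<>();
        for (long id : mx.getAllThreadIds()) {
            before.put(id, mx.getThreadCpuTime(id));
        }
        try {
            Thread.sleep(intervalMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return names;
        }
        // Second sample: keep only threads that were alive at both points.
        Map<Long, Long> delta = new HashMap<>();
        for (long id : mx.getAllThreadIds()) {
            long now = mx.getThreadCpuTime(id); // -1 if the thread has died
            Long then = before.get(id);
            if (now >= 0 && then != null && then >= 0) {
                delta.put(id, now - then);
            }
        }
        List<Long> ids = new ArrayList<>(delta.keySet());
        ids.sort((a, b) -> Long.compare(delta.get(b), delta.get(a)));
        for (long id : ids) {
            if (names.size() == n) break;
            ThreadInfo info = mx.getThreadInfo(id);
            if (info != null) names.add(info.getThreadName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Sample for one second and print the three busiest threads of this JVM.
        System.out.println(busiest(1000, 3));
    }
}
```

The thread name reported this way can then be matched against the frames in a thread dump to see what the spinning thread is executing.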

> Reload compaction strategies when JBOD disk boundary changes
> 
>
> Key: CASSANDRA-13948
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13948
> Project: Cassandra
> Issue Type: Bug
> Components: Compaction
> Reporter: Paulo Motta
> Assignee: Paulo Motta
> Fix For: 3.11.x, 4.x
>
> Attachments: debug.log, threaddump-cleanup.txt, threaddump.txt, trace.log
>
>
> The thread dump below shows a race between an sstable replacement by the 
> {{IndexSummaryRedistribution}} and 
> {{AbstractCompactionTask.getNextBackgroundTask}}:
> {noformat}
> Thread 94580: (state = BLOCKED)
>  - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information 
> may be imprecise)
>  - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, 
> line=175 (Compiled frame)
>  - 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() 
> @bci=1, line=836 (Compiled frame)
>  - 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(java.util.concurrent.locks.AbstractQueuedSynchronizer$Node,
>  int) @bci=67, line=870 (Compiled frame)
>  - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(int) 
> @bci=17, line=1199 (Compiled frame)
>  - java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock() @bci=5, 
> line=943 (Compiled frame)
>  - 
> org.apache.cassandra.db.compaction.CompactionStrategyManager.handleListChangedNotification(java.lang.Iterable,
>  java.lang.Iterable) @bci=359, line=483 (Interpreted frame)
>  - 
> org.apache.cassandra.db.compaction.CompactionStrategyManager.handleNotification(org.apache.cassandra.notifications.INotification,
>  java.lang.Object) @bci=53, line=555 (Interpreted frame)
>  - 
> org.apache.cassandra.db.lifecycle.Tracker.notifySSTablesChanged(java.util.Collection,
>  java.util.Collection, org.apache.cassandra.db.compaction.OperationType, 
> java.lang.Throwable) @bci=50, line=409 (Interpreted frame)
>  - 
> org.apache.cassandra.db.lifecycle.LifecycleTransaction.doCommit(java.lang.Throwable)
>  @bci=157, line=227 (Interpreted frame)
>  - 
> org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.commit(java.lang.Throwable)
>  @bci=61, line=116 (Compiled frame)
>  - 
> org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.commit()
>  @bci=2, line=200 (Interpreted frame)
>  - 
> org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.finish()
>  @bci=5, line=185 (Interpreted frame)
>  - 
> org.apache.cassandra.io.sstable.IndexSummaryRedistribution.redistributeSummaries()
>  @bci=559, line=130 (Interpreted frame)
>  - 
> org.apache.cassandra.db.compaction.CompactionManager.runIndexSummaryRedistribution(org.apache.cassandra.io.sstable.IndexSummaryRedistribution)
>  @bci=9, line=1420 (Interpreted frame)
>  - 
> org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries(org.apache.cassandra.io.sstable.IndexSummaryRedistribution)
>  @bci=4, line=250 (Interpreted frame)
>  - 
> org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries() 
> @bci=30, line=228 (Interpreted frame)
>  - org.apache.cassandra.io.sstable.IndexSummaryManager$1.runMayThrow() 
> @bci=4, line=125 (Interpreted frame)
>  - org.apache.cassandra.utils.WrappedRunnable.run() @bci=1, line=28 
> (Interpreted frame)
>  - 
> org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run()
>  @bci=4, line=118 (Compiled frame)
>  - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 
> (Compiled frame)
>  - java.util.concurrent.FutureTask.runAndReset() @bci=47, line=308 (Compiled 
> frame)
>  - 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask)
>  @bci=1, line=180 (Compiled frame)
>  - java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run() 
> @bci=37, line=294 (Compiled frame)
>  - 
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
>  @bci=95, line=1149 (Compiled frame)

[jira] [Commented] (CASSANDRA-13948) Reload compaction strategies when JBOD disk boundary changes

2017-11-06 Thread Loic Lambiel (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16240341#comment-16240341
 ] 

Loic Lambiel commented on CASSANDRA-13948:
--

Thanks [~krummas]. The patches from [~pauloricardomg] have already reduced the 
startup time by at least 10x.


[jira] [Updated] (CASSANDRA-13948) Reload compaction strategies when JBOD disk boundary changes

2017-11-03 Thread Loic Lambiel (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Loic Lambiel updated CASSANDRA-13948:
-
Attachment: threaddump.txt

> Reload compaction strategies when JBOD disk boundary changes
> 
>
> Key: CASSANDRA-13948
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13948
> Project: Cassandra
> Issue Type: Bug
> Components: Compaction
> Reporter: Paulo Motta
> Assignee: Paulo Motta
> Priority: Major
> Fix For: 3.11.x, 4.x
>
> Attachments: debug.log, threaddump.txt, trace.log
>
>
> The thread dump below shows a race between an sstable replacement by the 
> {{IndexSummaryRedistribution}} and 
> {{AbstractCompactionTask.getNextBackgroundTask}}:
> {noformat}
> Thread 94580: (state = BLOCKED)
>  - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information 
> may be imprecise)
>  - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, 
> line=175 (Compiled frame)
>  - 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() 
> @bci=1, line=836 (Compiled frame)
>  - 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(java.util.concurrent.locks.AbstractQueuedSynchronizer$Node,
>  int) @bci=67, line=870 (Compiled frame)
>  - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(int) 
> @bci=17, line=1199 (Compiled frame)
>  - java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock() @bci=5, 
> line=943 (Compiled frame)
>  - 
> org.apache.cassandra.db.compaction.CompactionStrategyManager.handleListChangedNotification(java.lang.Iterable,
>  java.lang.Iterable) @bci=359, line=483 (Interpreted frame)
>  - 
> org.apache.cassandra.db.compaction.CompactionStrategyManager.handleNotification(org.apache.cassandra.notifications.INotification,
>  java.lang.Object) @bci=53, line=555 (Interpreted frame)
>  - 
> org.apache.cassandra.db.lifecycle.Tracker.notifySSTablesChanged(java.util.Collection,
>  java.util.Collection, org.apache.cassandra.db.compaction.OperationType, 
> java.lang.Throwable) @bci=50, line=409 (Interpreted frame)
>  - 
> org.apache.cassandra.db.lifecycle.LifecycleTransaction.doCommit(java.lang.Throwable)
>  @bci=157, line=227 (Interpreted frame)
>  - 
> org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.commit(java.lang.Throwable)
>  @bci=61, line=116 (Compiled frame)
>  - 
> org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.commit()
>  @bci=2, line=200 (Interpreted frame)
>  - 
> org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.finish()
>  @bci=5, line=185 (Interpreted frame)
>  - 
> org.apache.cassandra.io.sstable.IndexSummaryRedistribution.redistributeSummaries()
>  @bci=559, line=130 (Interpreted frame)
>  - 
> org.apache.cassandra.db.compaction.CompactionManager.runIndexSummaryRedistribution(org.apache.cassandra.io.sstable.IndexSummaryRedistribution)
>  @bci=9, line=1420 (Interpreted frame)
>  - 
> org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries(org.apache.cassandra.io.sstable.IndexSummaryRedistribution)
>  @bci=4, line=250 (Interpreted frame)
>  - 
> org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries() 
> @bci=30, line=228 (Interpreted frame)
>  - org.apache.cassandra.io.sstable.IndexSummaryManager$1.runMayThrow() 
> @bci=4, line=125 (Interpreted frame)
>  - org.apache.cassandra.utils.WrappedRunnable.run() @bci=1, line=28 
> (Interpreted frame)
>  - 
> org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run()
>  @bci=4, line=118 (Compiled frame)
>  - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 
> (Compiled frame)
>  - java.util.concurrent.FutureTask.runAndReset() @bci=47, line=308 (Compiled 
> frame)
>  - 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask)
>  @bci=1, line=180 (Compiled frame)
>  - java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run() 
> @bci=37, line=294 (Compiled frame)
>  - 
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
>  @bci=95, line=1149 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=624 
> (Interpreted frame)
>  - 
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(java.lang.Runnable)
>  @bci=1, line=81 (Interpreted frame)
>  - org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$8.run() @bci=4 
> (Interpreted frame)
>  - java.lang.Thread.run() @bci=11, line=748 (Compiled frame)
> {noformat}
> {noformat}
> Thread 94573: (state = IN_JAVA)
>  - 

[jira] [Commented] (CASSANDRA-13948) Reload compaction strategies when JBOD disk boundary changes

2017-11-03 Thread Loic Lambiel (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16238384#comment-16238384
 ] 

Loic Lambiel commented on CASSANDRA-13948:
--

I've deployed the patch on a few big nodes. I haven't seen the error pop up 
so far.

However, I'm still facing issues with compactions. These are big nodes with 
a big CF holding many SSTables and pending compactions. According to the thread 
dump, it seems to be stuck around getNextBackgroundTask. Compactions are still 
being processed for the other keyspace. Besides that, the node is running 
normally. Some nodetool commands, such as compactionstats, take a long time to 
complete. The debug log doesn't show any errors.
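
For illustration only, the kind of contention described here can be reproduced in isolation: a writer parks on a ReentrantReadWriteLock for as long as some other thread holds the read side. The sketch below is not Cassandra code. It assumes that the background-task path holds the read lock of the same lock that the sstable-change notification needs for writing, and the class and method names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class, not part of Cassandra.
final class WriterStarvationSketch {
    /**
     * A reader holds the read lock for readerHoldMillis (standing in for a slow
     * background-task selection). Returns true if a writer manages to take the
     * write lock within timeoutMillis.
     */
    static boolean writerAcquires(long readerHoldMillis, long timeoutMillis) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        CountDownLatch readerHasLock = new CountDownLatch(1);
        Thread reader = new Thread(() -> {
            lock.readLock().lock();
            try {
                readerHasLock.countDown();
                Thread.sleep(readerHoldMillis); // simulated long-running work under the read lock
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.readLock().unlock();
            }
        }, "reader");
        reader.start();
        try {
            readerHasLock.await();
            // A plain lock() would park here until the reader is done;
            // tryLock with a timeout makes the wait observable instead.
            boolean acquired = lock.writeLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
            if (acquired) {
                lock.writeLock().unlock();
            }
            reader.interrupt(); // cleanup only: end the simulated work early
            reader.join();
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("long reader, short wait: " + writerAcquires(2000, 100));
        System.out.println("short reader, long wait: " + writerAcquires(50, 2000));
    }
}
```

With a long-running reader the writer times out; with a short one it gets the lock. A writer using an untimed lock() would simply stay parked until every reader has released.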

{code:java}
CREATE TABLE blobstore.block (
inode uuid,
version timeuuid,
block bigint,
offset bigint,
chunksize int,
payload blob,
PRIMARY KEY ((inode, version, block), offset)
) WITH CLUSTERING ORDER BY (offset ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 
'org.apache.cassandra.db.compaction.LeveledCompactionStrategy', 'enabled': 
'true', 'tombstone_compaction_interval': '60', 'tombstone_threshold': '0.2', 
'unchecked_tombstone_compaction': 'false'}
AND compression = {'chunk_length_in_kb': '64', 'class': 
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 172000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
{code}


{code:java}
Keyspace : blobstore
Read Count: 97019
Read Latency: 2.4842547026871027 ms.
Write Count: 472590
Write Latency: 0.060107954040500226 ms.
Pending Flushes: 0
Table: block
SSTable count: 43373
SSTables in each level: [18890/4, 115/10, 198/100, 1905/1000, 9451, 12814, 0, 0, 0]
Space used (live): 4839933810943
Space used (total): 4839933815913
Space used by snapshots (total): 0
Off heap memory used (total): 3273703284
SSTable Compression Ratio: 0.9416884172984209
Number of partitions (estimate): 2925826
Memtable cell count: 41542
Memtable data size: 2631688187
Memtable off heap memory used: 2638649871
Memtable switch count: 7
Local read count: 87281
Local read latency: 2.186 ms
Local write count: 465591
Local write latency: 0.124 ms
Pending flushes: 0
Percent repaired: 4.01
Bloom filter false positives: 297882
Bloom filter false ratio: 0.69198
Bloom filter space used: 5111208
Bloom filter off heap memory used: 4764232
Index summary off heap memory used: 3360917
Compression metadata off heap memory used: 626928264
Compacted partition minimum bytes: 61
Compacted partition maximum bytes: 186563160
Compacted partition mean bytes: 1797922
Average live cells per slice (last five minutes): 8.641592920353983
Maximum live cells per slice (last five minutes): 258
Average tombstones per slice (last five minutes): 1.0
Maximum tombstones per slice (last five minutes): 1
Dropped Mutations: 0

{code}
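
The "SSTables in each level" line above can be checked mechanically. The sketch below assumes that an entry written as count/limit marks a level holding more SSTables than the strategy expects, while a bare number marks a level within its limit; this reading is inferred from the output above rather than taken from a specification. LevelBacklog and overfullLevels are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Cassandra or nodetool.
final class LevelBacklog {
    /** Returns the indices of levels printed as count/limit with count above limit. */
    static List<Integer> overfullLevels(String levels) {
        String body = levels.trim();
        if (body.startsWith("[")) body = body.substring(1);
        if (body.endsWith("]")) body = body.substring(0, body.length() - 1);
        List<Integer> over = new ArrayList<>();
        String[] parts = body.split(",");
        for (int level = 0; level < parts.length; level++) {
            String entry = parts[level].trim();
            int slash = entry.indexOf('/');
            if (slash < 0) continue; // bare count: assumed to be within the limit
            long count = Long.parseLong(entry.substring(0, slash).trim());
            long limit = Long.parseLong(entry.substring(slash + 1).trim());
            if (count > limit) over.add(level);
        }
        return over;
    }

    public static void main(String[] args) {
        String line = "[18890/4, 115/10, 198/100, 1905/1000, 9451, 12814, 0, 0, 0]";
        System.out.println(overfullLevels(line)); // prints [0, 1, 2, 3]
    }
}
```

Applied to the output above, levels 0 to 3 are all over their limit, with level 0 holding 18890 SSTables against a limit of 4.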

{code:java}
nodetool compactionstats
pending tasks: 3362
- blobstore.block: 3362
{code}


[jira] [Updated] (CASSANDRA-13948) Reload compaction strategies when JBOD disk boundary changes

2017-11-03 Thread Loic Lambiel (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Loic Lambiel updated CASSANDRA-13948:
-
Attachment: trace.log

OK, I was able to reproduce it on one node. I've attached the trace log. It's 
unfiltered, since I didn't manage to restrict it to 
org.apache.cassandra.db.compaction.






[jira] [Commented] (CASSANDRA-13948) Reload compaction strategies when JBOD disk boundary changes

2017-11-02 Thread Loic Lambiel (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236514#comment-16236514
 ] 

Loic Lambiel commented on CASSANDRA-13948:
--

I did additional testing and wasn't able to reproduce it :-/

I'll try the patch on more representative nodes in the coming days and report 
back on any issues.


[jira] [Commented] (CASSANDRA-13948) Reload compaction strategies when JBOD disk boundary changes

2017-11-01 Thread Loic Lambiel (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234902#comment-16234902
 ] 

Loic Lambiel commented on CASSANDRA-13948:
--

The strategy is LCS, on a node with 9 data locations. The error repeats 
frequently. I'll attach a debug log tomorrow and run some additional tests.


[jira] [Comment Edited] (CASSANDRA-13948) Reload compaction strategies when JBOD disk boundary changes

2017-11-01 Thread Loic Lambiel (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234460#comment-16234460
 ] 

Loic Lambiel edited comment on CASSANDRA-13948 at 11/1/17 5:55 PM:
---

Yes [~krummas], build including the latest patch.


was (Author: llambiel):
Yes @krummas, build including the latest patch.

> Reload compaction strategies when JBOD disk boundary changes
> 
>
> Key: CASSANDRA-13948
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13948
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction
>Reporter: Paulo Motta
>Assignee: Paulo Motta
>Priority: Major
> Fix For: 3.11.x, 4.x
>
> Attachments: debug.log
>
>

[jira] [Comment Edited] (CASSANDRA-13948) Reload compaction strategies when JBOD disk boundary changes

2017-11-01 Thread Loic Lambiel (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234460#comment-16234460
 ] 

Loic Lambiel edited comment on CASSANDRA-13948 at 11/1/17 5:44 PM:
---

Yes @krummas, build including the latest patch.


was (Author: llambiel):
Yes, build including the latest patch.

> Reload compaction strategies when JBOD disk boundary changes
> 
>
> Key: CASSANDRA-13948
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13948
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction
>Reporter: Paulo Motta
>Assignee: Paulo Motta
>Priority: Major
> Fix For: 3.11.x, 4.x
>
> Attachments: debug.log
>
>

[jira] [Commented] (CASSANDRA-13948) Reload compaction strategies when JBOD disk boundary changes

2017-11-01 Thread Loic Lambiel (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234460#comment-16234460
 ] 

Loic Lambiel commented on CASSANDRA-13948:
--

Yes, build including the latest patch.

> Reload compaction strategies when JBOD disk boundary changes
> 
>
> Key: CASSANDRA-13948
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13948
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction
>Reporter: Paulo Motta
>Assignee: Paulo Motta
>Priority: Major
> Fix For: 3.11.x, 4.x
>
> Attachments: debug.log
>
>
> The thread dump below shows a race between an sstable replacement by the 
> {{IndexSummaryRedistribution}} and 
> {{AbstractCompactionTask.getNextBackgroundTask}}:
> {noformat}
> Thread 94580: (state = BLOCKED)
>  - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
>  - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=175 (Compiled frame)
>  - java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() @bci=1, line=836 (Compiled frame)
>  - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(java.util.concurrent.locks.AbstractQueuedSynchronizer$Node, int) @bci=67, line=870 (Compiled frame)
>  - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(int) @bci=17, line=1199 (Compiled frame)
>  - java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock() @bci=5, line=943 (Compiled frame)
>  - org.apache.cassandra.db.compaction.CompactionStrategyManager.handleListChangedNotification(java.lang.Iterable, java.lang.Iterable) @bci=359, line=483 (Interpreted frame)
>  - org.apache.cassandra.db.compaction.CompactionStrategyManager.handleNotification(org.apache.cassandra.notifications.INotification, java.lang.Object) @bci=53, line=555 (Interpreted frame)
>  - org.apache.cassandra.db.lifecycle.Tracker.notifySSTablesChanged(java.util.Collection, java.util.Collection, org.apache.cassandra.db.compaction.OperationType, java.lang.Throwable) @bci=50, line=409 (Interpreted frame)
>  - org.apache.cassandra.db.lifecycle.LifecycleTransaction.doCommit(java.lang.Throwable) @bci=157, line=227 (Interpreted frame)
>  - org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.commit(java.lang.Throwable) @bci=61, line=116 (Compiled frame)
>  - org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.commit() @bci=2, line=200 (Interpreted frame)
>  - org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.finish() @bci=5, line=185 (Interpreted frame)
>  - org.apache.cassandra.io.sstable.IndexSummaryRedistribution.redistributeSummaries() @bci=559, line=130 (Interpreted frame)
>  - org.apache.cassandra.db.compaction.CompactionManager.runIndexSummaryRedistribution(org.apache.cassandra.io.sstable.IndexSummaryRedistribution) @bci=9, line=1420 (Interpreted frame)
>  - org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries(org.apache.cassandra.io.sstable.IndexSummaryRedistribution) @bci=4, line=250 (Interpreted frame)
>  - org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries() @bci=30, line=228 (Interpreted frame)
>  - org.apache.cassandra.io.sstable.IndexSummaryManager$1.runMayThrow() @bci=4, line=125 (Interpreted frame)
>  - org.apache.cassandra.utils.WrappedRunnable.run() @bci=1, line=28 (Interpreted frame)
>  - org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run() @bci=4, line=118 (Compiled frame)
>  - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 (Compiled frame)
>  - java.util.concurrent.FutureTask.runAndReset() @bci=47, line=308 (Compiled frame)
>  - java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask) @bci=1, line=180 (Compiled frame)
>  - java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run() @bci=37, line=294 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1149 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=624 (Interpreted frame)
>  - org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(java.lang.Runnable) @bci=1, line=81 (Interpreted frame)
>  - org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$8.run() @bci=4 (Interpreted frame)
>  - java.lang.Thread.run() @bci=11, line=748 (Compiled frame)
> {noformat}
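
Editorial sketch, not part of the original report: thread 94580 above is parked in ReentrantReadWriteLock$WriteLock.lock(). The snippet below illustrates only the general mechanism, assuming (this excerpt does not show it) that some other thread is holding the read lock of the same lock without releasing it. The class and method names are hypothetical and are not Cassandra code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class WriteLockBlockedSketch {

    // Returns true only if the write lock could be taken within timeoutMillis
    // while another thread is still holding the read lock.
    static boolean writerGetsLockWhileReaderHolds(long timeoutMillis) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        CountDownLatch readerHoldsLock = new CountDownLatch(1);
        CountDownLatch releaseReader = new CountDownLatch(1);

        // Stand-in for a thread that keeps working under the read lock.
        Thread reader = new Thread(() -> {
            lock.readLock().lock();
            try {
                readerHoldsLock.countDown();
                releaseReader.await(); // holds the read lock until told to stop
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.readLock().unlock();
            }
        });
        reader.start();

        try {
            readerHoldsLock.await();
            // A plain lock() would park here indefinitely, as in the dump;
            // tryLock with a timeout lets the sketch terminate.
            boolean acquired = lock.writeLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
            if (acquired) {
                lock.writeLock().unlock();
            }
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            releaseReader.countDown(); // let the reader finish
        }
    }

    public static void main(String[] args) {
        System.out.println("writer acquired lock: " + writerGetsLockWhileReaderHolds(200));
    }
}
```

With the reader never releasing, the writer cannot make progress, which is consistent with a BLOCKED writer next to a busy IN_JAVA thread; whether that is what happened here has to be confirmed from the full thread dump.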
> {noformat}
> Thread 94573: (state = IN_JAVA)
>  - 

[jira] [Commented] (CASSANDRA-13948) Reload compaction strategies when JBOD disk boundary changes

2017-11-01 Thread Loic Lambiel (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234385#comment-16234385
 ] 

Loic Lambiel commented on CASSANDRA-13948:
--

I tried your patch on 3.11.2 and got the following errors:


{code:java}
ERROR [Reference-Reaper:1] 2017-11-01 17:51:35,397 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@51f52b91) to class org.apache.cassandra.io.util.MmappedRegions$Tidier@1358582595:/var/lib/cassandra/data/datadisk7/blobstore/block-ad8329f0740d11e68fe6cba3b122d983/mc-103504-big-Data.db was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2017-11-01 17:51:35,413 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@70a08046) to class org.apache.cassandra.io.util.MmappedRegions$Tidier@632323950:/var/lib/cassandra/data/datadisk7/blobstore/block-ad8329f0740d11e68fe6cba3b122d983/mc-103503-big-Data.db was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2017-11-01 17:51:35,413 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@5d6161ea) to class org.apache.cassandra.io.util.FileHandle$Cleanup@1594052942:/var/lib/cassandra/data/datadisk7/blobstore/block-ad8329f0740d11e68fe6cba3b122d983/mc-103502-big-Index.db was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2017-11-01 17:51:35,429 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@2bd55858) to class org.apache.cassandra.io.util.MmappedRegions$Tidier@230164803:/var/lib/cassandra/data/datadisk7/blobstore/block-ad8329f0740d11e68fe6cba3b122d983/mc-103502-big-Data.db was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2017-11-01 17:51:35,429 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@5b00472f) to class org.apache.cassandra.io.util.SafeMemory$MemoryTidy@508355616:Memory@[7f6b54130b10..7f6b54136f10) was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2017-11-01 17:51:35,429 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@1f5a7829) to class org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$Tidy@1390774416:[Memory@[0..20), Memory@[0..240)] was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2017-11-01 17:51:35,430 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@b8594dc) to class org.apache.cassandra.io.util.FileHandle$Cleanup@1913719912:/var/lib/cassandra/data/datadisk7/blobstore/block-ad8329f0740d11e68fe6cba3b122d983/mc-103503-big-Index.db was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2017-11-01 17:51:35,430 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@3ec6a933) to class org.apache.cassandra.io.util.SafeMemory$MemoryTidy@1083770739:Memory@[7f6b5453ff30..7f6b54546330) was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2017-11-01 17:51:35,430 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@671d48c8) to class org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$Tidy@496335375:[Memory@[0..20), Memory@[0..240)] was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2017-11-01 17:51:35,430 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@1611d7bf) to class org.apache.cassandra.io.util.SafeMemory$MemoryTidy@515635345:Memory@[7f6b540f7dc0..7f6b540fe1c0) was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2017-11-01 17:51:35,430 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@136db886) to class org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$Tidy@622788070:[Memory@[0..20), Memory@[0..240)] was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2017-11-01 17:51:35,430 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@3daa7ad5) to class org.apache.cassandra.io.util.FileHandle$Cleanup@2090103425:/var/lib/cassandra/data/datadisk7/blobstore/block-ad8329f0740d11e68fe6cba3b122d983/mc-103504-big-Index.db was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2017-11-01 17:51:35,430 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@2435c1d) to class org.apache.cassandra.io.util.SafeMemory$MemoryTidy@1348493438:Memory@[7f6b546ebd80..7f6b546f2180) was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2017-11-01 

[jira] [Commented] (CASSANDRA-13980) Compaction deadlock

2017-10-29 Thread Loic Lambiel (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16224401#comment-16224401
 ] 

Loic Lambiel commented on CASSANDRA-13980:
--

Yes, 4.

> Compaction deadlock
> ---
>
> Key: CASSANDRA-13980
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13980
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction
> Environment: Cassandra 3.11.2
> 48 nodes cluster using LCS, JBOD
> Big nodes with many SSTables
>Reporter: Loic Lambiel
>Priority: Critical
> Fix For: 3.11.x
>
> Attachments: threaddump.log
>
>
> While upgrading the cluster from 2.1.16 to 3.11.2, after a few hours most
> of the upgraded nodes entered an infinite compaction loop, logging
> many events like the one below (always for the same SSTable):
> {code:java}
> INFO  [CompactionExecutor:4] 2017-10-29 00:28:31,480 LeveledManifest.java:474 - Adding high-level (L5) BigTableReader(path='/var/lib/cassandra/data/datadisk4/blobstore/block-1d63273065b911e49cd7ef0972cffde6/blobstore-block-ka-201694-Data.db') to candidates
> {code}
> Since the log gets spammed at a huge rate, I'm unable to retrieve any earlier
> events.
> I tried restarts and sstablescrub -m without success. The only workaround that
> seems to work (so far) is sstablelevelreset.
> I've attached the thread dump.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-13980) Compaction deadlock

2017-10-29 Thread Loic Lambiel (JIRA)
Loic Lambiel created CASSANDRA-13980:


 Summary: Compaction deadlock
 Key: CASSANDRA-13980
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13980
 Project: Cassandra
  Issue Type: Bug
  Components: Compaction
 Environment: Cassandra 3.11.2
48 nodes cluster using LCS, JBOD
Big nodes with many SSTables
Reporter: Loic Lambiel
Priority: Critical
 Fix For: 3.11.x
 Attachments: threaddump.log

While upgrading the cluster from 2.1.16 to 3.11.2, after a few hours most of
the upgraded nodes entered an infinite compaction loop, logging many
events like the one below (always for the same SSTable):

{code:java}
INFO  [CompactionExecutor:4] 2017-10-29 00:28:31,480 LeveledManifest.java:474 - Adding high-level (L5) BigTableReader(path='/var/lib/cassandra/data/datadisk4/blobstore/block-1d63273065b911e49cd7ef0972cffde6/blobstore-block-ka-201694-Data.db') to candidates
{code}

Since the log gets spammed at a huge rate, I'm unable to retrieve any earlier events.

I tried restarts and sstablescrub -m without success. The only workaround that
seems to work (so far) is sstablelevelreset.

I've attached the thread dump.








[jira] [Updated] (CASSANDRA-9625) GraphiteReporter not reporting

2016-12-19 Thread Loic Lambiel (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Loic Lambiel updated CASSANDRA-9625:

Attachment: thread-dump2.log

New thread dump attached

> GraphiteReporter not reporting
> --
>
> Key: CASSANDRA-9625
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9625
> Project: Cassandra
>  Issue Type: Bug
> Environment: Debian Jessie, 7u79-2.5.5-1~deb8u1, Cassandra 2.1.3
>Reporter: Eric Evans
>Assignee: Stefan Podkowinski
> Attachments: Screen Shot 2016-04-13 at 10.40.58 AM.png, metrics.yaml, 
> thread-dump.log, thread-dump2.log
>
>
> When upgrading from 2.1.3 to 2.1.6, the Graphite metrics reporter stops 
> working.  The usual startup is logged, and one batch of samples is sent, but 
> the reporting interval comes and goes, and no other samples are ever sent.  
> The logs are free from errors.
> Frustratingly, metrics reporting works in our smaller (staging) environment 
> on 2.1.6; We are able to reproduce this on all 6 of production nodes, but not 
> on a 3 node (otherwise identical) staging cluster (maybe it takes a certain 
> level of concurrency?).
> Attached is a thread dump, and our metrics.yaml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9625) GraphiteReporter not reporting

2016-12-19 Thread Loic Lambiel (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15760731#comment-15760731
 ] 

Loic Lambiel commented on CASSANDRA-9625:
-

Well, after running the patch for 4 days, most of the graphite reporter threads
are dead. The patch does not seem to fix the issue. :-(

> GraphiteReporter not reporting
> --
>
> Key: CASSANDRA-9625
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9625
> Project: Cassandra
>  Issue Type: Bug
> Environment: Debian Jessie, 7u79-2.5.5-1~deb8u1, Cassandra 2.1.3
>Reporter: Eric Evans
>Assignee: Stefan Podkowinski
> Attachments: Screen Shot 2016-04-13 at 10.40.58 AM.png, metrics.yaml, 
> thread-dump.log
>
>
> When upgrading from 2.1.3 to 2.1.6, the Graphite metrics reporter stops 
> working.  The usual startup is logged, and one batch of samples is sent, but 
> the reporting interval comes and goes, and no other samples are ever sent.  
> The logs are free from errors.
> Frustratingly, metrics reporting works in our smaller (staging) environment 
> on 2.1.6; We are able to reproduce this on all 6 of production nodes, but not 
> on a 3 node (otherwise identical) staging cluster (maybe it takes a certain 
> level of concurrency?).
> Attached is a thread dump, and our metrics.yaml.





[jira] [Commented] (CASSANDRA-9625) GraphiteReporter not reporting

2016-12-16 Thread Loic Lambiel (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15753883#comment-15753883
 ] 

Loic Lambiel commented on CASSANDRA-9625:
-

The patch has been running flawlessly on our 2.1.16 production cluster for 24
hours, many thanks!

> GraphiteReporter not reporting
> --
>
> Key: CASSANDRA-9625
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9625
> Project: Cassandra
>  Issue Type: Bug
> Environment: Debian Jessie, 7u79-2.5.5-1~deb8u1, Cassandra 2.1.3
>Reporter: Eric Evans
>Assignee: T Jake Luciani
> Attachments: Screen Shot 2016-04-13 at 10.40.58 AM.png, metrics.yaml, 
> thread-dump.log
>
>
> When upgrading from 2.1.3 to 2.1.6, the Graphite metrics reporter stops 
> working.  The usual startup is logged, and one batch of samples is sent, but 
> the reporting interval comes and goes, and no other samples are ever sent.  
> The logs are free from errors.
> Frustratingly, metrics reporting works in our smaller (staging) environment 
> on 2.1.6; We are able to reproduce this on all 6 of production nodes, but not 
> on a 3 node (otherwise identical) staging cluster (maybe it takes a certain 
> level of concurrency?).
> Attached is a thread dump, and our metrics.yaml.





[jira] [Commented] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions

2016-11-11 Thread Loic Lambiel (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15657238#comment-15657238
 ] 

Loic Lambiel commented on CASSANDRA-9754:
-

Any update on your ongoing tests, [~mkjellman]?

> Make index info heap friendly for large CQL partitions
> --
>
> Key: CASSANDRA-9754
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9754
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: Michael Kjellman
>Priority: Minor
> Fix For: 4.x
>
> Attachments: 0f8e28c220fd5af6c7b5dd2d3dab6936c4aa4b6b.patch, 
> gc_collection_times_with_birch.png, gc_collection_times_without_birch.png, 
> gc_counts_with_birch.png, gc_counts_without_birch.png, 
> perf_cluster_1_with_birch_read_latency_and_counts.png, 
> perf_cluster_1_with_birch_write_latency_and_counts.png, 
> perf_cluster_2_with_birch_read_latency_and_counts.png, 
> perf_cluster_2_with_birch_write_latency_and_counts.png, 
> perf_cluster_3_without_birch_read_latency_and_counts.png, 
> perf_cluster_3_without_birch_write_latency_and_counts.png
>
>
> Looking at a heap dump of a 2.0 cluster, I found that the majority of the objects
> are IndexInfo and its ByteBuffers. This is especially bad on endpoints with
> large CQL partitions. If a CQL partition is, say, 6.4 GB, it will have 100K
> IndexInfo objects and 200K ByteBuffers. This will create a lot of churn for
> GC. Can this be improved by not creating so many objects?
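
Editorial sketch, not part of the original report: the object counts quoted above can be reproduced with simple arithmetic. This reads the partition size in the description as 6.4 GB (decimal units) and assumes one IndexInfo entry per column index block at the default column_index_size_in_kb of 64, with two ByteBuffers per entry. The class and method names are hypothetical.

```java
final class IndexInfoEstimate {

    // Assumption: one IndexInfo entry per column index block of 64 KB
    // (default column_index_size_in_kb), using decimal units for round numbers.
    static final long COLUMN_INDEX_BLOCK_BYTES = 64_000L;

    // Assumption: each IndexInfo keeps two ByteBuffers (first and last name in its block).
    static final long BYTE_BUFFERS_PER_INDEX_INFO = 2L;

    static long indexInfoCount(long partitionBytes) {
        return partitionBytes / COLUMN_INDEX_BLOCK_BYTES;
    }

    static long byteBufferCount(long partitionBytes) {
        return indexInfoCount(partitionBytes) * BYTE_BUFFERS_PER_INDEX_INFO;
    }

    public static void main(String[] args) {
        long partitionBytes = 6_400_000_000L; // 6.4 GB partition from the description
        System.out.println(indexInfoCount(partitionBytes));  // prints 100000
        System.out.println(byteBufferCount(partitionBytes)); // prints 200000
    }
}
```

Under these assumptions the counts grow linearly with partition size, so a partition ten times larger would hold ten times as many index objects.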





[jira] [Commented] (CASSANDRA-9625) GraphiteReporter not reporting

2016-10-19 Thread Loic Lambiel (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15588224#comment-15588224
 ] 

Loic Lambiel commented on CASSANDRA-9625:
-

Same issue with 2.1.16

> GraphiteReporter not reporting
> --
>
> Key: CASSANDRA-9625
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9625
> Project: Cassandra
>  Issue Type: Bug
> Environment: Debian Jessie, 7u79-2.5.5-1~deb8u1, Cassandra 2.1.3
>Reporter: Eric Evans
>Assignee: T Jake Luciani
> Attachments: Screen Shot 2016-04-13 at 10.40.58 AM.png, metrics.yaml, 
> thread-dump.log
>
>
> When upgrading from 2.1.3 to 2.1.6, the Graphite metrics reporter stops 
> working.  The usual startup is logged, and one batch of samples is sent, but 
> the reporting interval comes and goes, and no other samples are ever sent.  
> The logs are free from errors.
> Frustratingly, metrics reporting works in our smaller (staging) environment 
> on 2.1.6; We are able to reproduce this on all 6 of production nodes, but not 
> on a 3 node (otherwise identical) staging cluster (maybe it takes a certain 
> level of concurrency?).
> Attached is a thread dump, and our metrics.yaml.





[jira] [Commented] (CASSANDRA-9625) GraphiteReporter not reporting

2016-09-02 Thread Loic Lambiel (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15457863#comment-15457863
 ] 

Loic Lambiel commented on CASSANDRA-9625:
-

How can we move forward on fixing this bug?

We're running Cassandra 2.1.13, and it happens randomly when a certain number of 
compactions are queued or running on the nodes. There's nothing in the log at the 
time it stops reporting metrics. It also happens when no repair is in 
progress.

> GraphiteReporter not reporting
> --
>
> Key: CASSANDRA-9625
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9625
> Project: Cassandra
>  Issue Type: Bug
> Environment: Debian Jessie, 7u79-2.5.5-1~deb8u1, Cassandra 2.1.3
>Reporter: Eric Evans
>Assignee: T Jake Luciani
> Attachments: Screen Shot 2016-04-13 at 10.40.58 AM.png, metrics.yaml, 
> thread-dump.log
>
>
> When upgrading from 2.1.3 to 2.1.6, the Graphite metrics reporter stops 
> working.  The usual startup is logged, and one batch of samples is sent, but 
> the reporting interval comes and goes, and no other samples are ever sent.  
> The logs are free from errors.
> Frustratingly, metrics reporting works in our smaller (staging) environment 
> on 2.1.6; We are able to reproduce this on all 6 of production nodes, but not 
> on a 3 node (otherwise identical) staging cluster (maybe it takes a certain 
> level of concurrency?).
> Attached is a thread dump, and our metrics.yaml.





[jira] [Commented] (CASSANDRA-9683) Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 2.1.7

2015-07-17 Thread Loic Lambiel (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14631408#comment-14631408
 ] 

Loic Lambiel commented on CASSANDRA-9683:
-

Those stats come from OpsCenter.

We're using durable_writes = false in the blobstore keyspace, where most data 
is written. This may explain the low write latency.

I'm going to try to reproduce this on a new single-node setup, as I don't want 
to kill this cluster. I'll do it in the coming days, as soon as I have the pipe.

We're using Cassandra as the backend for our object storage service, which is 
based on our pithos (http://pithos.io) API frontend. Data can then be uploaded 
using any S3-compatible tool such as s3cmd. I don't know if you want to go as 
far as such a setup (we could help, or do it remotely if needed). I also don't 
know how to reproduce the data pattern and usage without it.

 Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 
 2.1.7
 --

 Key: CASSANDRA-9683
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9683
 Project: Cassandra
  Issue Type: Bug
 Environment: Ubuntu 12.04 (3.13 Kernel) * 3
 JDK: Oracle JDK 7
 RAM: 32GB
 Cores 4 (+4 HT)
Reporter: Loic Lambiel
Assignee: Ariel Weisberg
 Fix For: 2.1.x

 Attachments: cassandra-env.sh, cassandra.yaml, cfstats.txt, 
 os_load.png, pending_compactions.png, read_latency.png, schema.txt, 
 system.log, write_latency.png


 After upgrading our cassandra staging cluster version from 2.1.6 to 2.1.7, 
 the average load grows from 0.1-0.3 to 1.8.
 Latencies did increase as well.
 We see an increase of pending compactions, probably due to CASSANDRA-9592.
 This cluster has almost no workload (staging environment)





[jira] [Commented] (CASSANDRA-9683) Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 2.1.7

2015-07-14 Thread Loic Lambiel (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625905#comment-14625905
 ] 

Loic Lambiel commented on CASSANDRA-9683:
-

2.1.8 has the same performance issue, which is again worked around by nodetool 
disableautocompaction.

2.1.8 compiled without https://issues.apache.org/jira/browse/CASSANDRA-9592 
does not have the performance issue.

So the issue is related to 
https://issues.apache.org/jira/browse/CASSANDRA-9592

 Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 
 2.1.7
 --

 Key: CASSANDRA-9683
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9683
 Project: Cassandra
  Issue Type: Bug
 Environment: Ubuntu 12.04 (3.13 Kernel) * 3
 JDK: Oracle JDK 7
 RAM: 32GB
 Cores 4 (+4 HT)
Reporter: Loic Lambiel
Assignee: Ariel Weisberg
 Fix For: 2.1.x

 Attachments: cassandra-env.sh, cassandra.yaml, cfstats.txt, 
 os_load.png, pending_compactions.png, read_latency.png, schema.txt, 
 system.log, write_latency.png


 After upgrading our cassandra staging cluster version from 2.1.6 to 2.1.7, 
 the average load grows from 0.1-0.3 to 1.8.
 Latencies did increase as well.
 We see an increase of pending compactions, probably due to CASSANDRA-9592.
 This cluster has almost no workload (staging environment)





[jira] [Commented] (CASSANDRA-9683) Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 2.1.7

2015-07-09 Thread Loic Lambiel (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620044#comment-14620044
 ] 

Loic Lambiel commented on CASSANDRA-9683:
-

2.1.8-tentative compiled without 
https://issues.apache.org/jira/browse/CASSANDRA-9592 is running just fine.

What would the next steps be?

 Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 
 2.1.7
 --

 Key: CASSANDRA-9683
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9683
 Project: Cassandra
  Issue Type: Bug
 Environment: Ubuntu 12.04 (3.13 Kernel) * 3
 JDK: Oracle JDK 7
 RAM: 32GB
 Cores 4 (+4 HT)
Reporter: Loic Lambiel
Assignee: Ariel Weisberg
 Fix For: 2.1.x

 Attachments: cassandra-env.sh, cassandra.yaml, cfstats.txt, 
 os_load.png, pending_compactions.png, read_latency.png, schema.txt, 
 system.log, write_latency.png


 After upgrading our cassandra staging cluster version from 2.1.6 to 2.1.7, 
 the average load grows from 0.1-0.3 to 1.8.
 Latencies did increase as well.
 We see an increase of pending compactions, probably due to CASSANDRA-9592.
 This cluster has almost no workload (staging environment)





[jira] [Commented] (CASSANDRA-9683) Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 2.1.7

2015-07-08 Thread Loic Lambiel (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618168#comment-14618168
 ] 

Loic Lambiel commented on CASSANDRA-9683:
-

Sorry for the response delay.

We're running a 3-node cluster with the C* 2.1.8-tentative version in 1 DC. The 
hosts are physical and connected to the same network switch:

- Ubuntu 12.04 (3.13 Kernel)
- RAM 32GB
- 4 Cores (+4 HT)
- SSD for OS and logs
- 8* JBOD SATA drives for data
- Java(TM) SE Runtime Environment (build 1.7.0_80-b15)

cat /etc/default/cassandra:
MAX_HEAP_SIZE=8G
HEAP_NEWSIZE=800m
JVM_OPTS="$JVM_OPTS -Dcassandra.metricsReporterConfigFile=metrics-reporter-config.yaml"

We're using the stock cassandra-env.sh (attached).

The consistency level is QUORUM.

On this cluster we're doing the following:

- Write a few KB to a CF, read the data back, and then delete it, every minute
- Write and delete ~300 MB once a day

I'm going to take some thread dumps to see if I can find anything.

Thanks for your support !

 Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 
 2.1.7
 --

 Key: CASSANDRA-9683
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9683
 Project: Cassandra
  Issue Type: Bug
 Environment: Ubuntu 12.04 (3.13 Kernel) * 3
 JDK: Oracle JDK 7
 RAM: 32GB
 Cores 4 (+4 HT)
Reporter: Loic Lambiel
Assignee: Ariel Weisberg
 Fix For: 2.1.x

 Attachments: cassandra.yaml, cfstats.txt, os_load.png, 
 pending_compactions.png, read_latency.png, schema.txt, system.log, 
 write_latency.png


 After upgrading our cassandra staging cluster version from 2.1.6 to 2.1.7, 
 the average load grows from 0.1-0.3 to 1.8.
 Latencies did increase as well.
 We see an increase of pending compactions, probably due to CASSANDRA-9592.
 This cluster has almost no workload (staging environment)





[jira] [Updated] (CASSANDRA-9683) Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 2.1.7

2015-07-08 Thread Loic Lambiel (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Loic Lambiel updated CASSANDRA-9683:

Attachment: cassandra-env.sh

 Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 
 2.1.7
 --

 Key: CASSANDRA-9683
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9683
 Project: Cassandra
  Issue Type: Bug
 Environment: Ubuntu 12.04 (3.13 Kernel) * 3
 JDK: Oracle JDK 7
 RAM: 32GB
 Cores 4 (+4 HT)
Reporter: Loic Lambiel
Assignee: Ariel Weisberg
 Fix For: 2.1.x

 Attachments: cassandra-env.sh, cassandra.yaml, cfstats.txt, 
 os_load.png, pending_compactions.png, read_latency.png, schema.txt, 
 system.log, write_latency.png


 After upgrading our cassandra staging cluster version from 2.1.6 to 2.1.7, 
 the average load grows from 0.1-0.3 to 1.8.
 Latencies did increase as well.
 We see an increase of pending compactions, probably due to CASSANDRA-9592.
 This cluster has almost no workload (staging environment)





[jira] [Commented] (CASSANDRA-9683) Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 2.1.7

2015-07-08 Thread Loic Lambiel (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618809#comment-14618809
 ] 

Loic Lambiel commented on CASSANDRA-9683:
-

I ran nodetool disableautocompaction on all nodes, and the load and 
latencies went back down to their previous 2.1.6 levels.

Is this issue related to https://issues.apache.org/jira/browse/CASSANDRA-9592 ?
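
For anyone repeating this test, here is a minimal sketch of applying the same nodetool subcommand to every node. The host names are hypothetical placeholders (not this cluster), and the helper names are made up for illustration; only `nodetool -h <host> disableautocompaction` itself is the real command.

```python
# Sketch: build (and optionally run) one nodetool subcommand per node.
# Host names are hypothetical placeholders, not the cluster in this ticket.
import subprocess


def nodetool_commands(hosts, subcommand):
    """Return one argv list per host."""
    return [["nodetool", "-h", host, subcommand] for host in hosts]


def run_on_all(hosts, subcommand, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in nodetool_commands(hosts, subcommand):
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


# Dry run: prints the commands without touching any node.
run_on_all(["node1", "node2", "node3"], "disableautocompaction")
```

Running the same helper with "enableautocompaction" turns automatic compaction back on once the comparison is done.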

 Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 
 2.1.7
 --

 Key: CASSANDRA-9683
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9683
 Project: Cassandra
  Issue Type: Bug
 Environment: Ubuntu 12.04 (3.13 Kernel) * 3
 JDK: Oracle JDK 7
 RAM: 32GB
 Cores 4 (+4 HT)
Reporter: Loic Lambiel
Assignee: Ariel Weisberg
 Fix For: 2.1.x

 Attachments: cassandra-env.sh, cassandra.yaml, cfstats.txt, 
 os_load.png, pending_compactions.png, read_latency.png, schema.txt, 
 system.log, write_latency.png


 After upgrading our cassandra staging cluster version from 2.1.6 to 2.1.7, 
 the average load grows from 0.1-0.3 to 1.8.
 Latencies did increase as well.
 We see an increase of pending compactions, probably due to CASSANDRA-9592.
 This cluster has almost no workload (staging environment)





[jira] [Commented] (CASSANDRA-9683) Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 2.1.7

2015-07-08 Thread Loic Lambiel (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619227#comment-14619227
 ] 

Loic Lambiel commented on CASSANDRA-9683:
-

I'm going to compile 2.1.8-tentative without CASSANDRA-9592 to see whether it's 
related.

 Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 
 2.1.7
 --

 Key: CASSANDRA-9683
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9683
 Project: Cassandra
  Issue Type: Bug
 Environment: Ubuntu 12.04 (3.13 Kernel) * 3
 JDK: Oracle JDK 7
 RAM: 32GB
 Cores 4 (+4 HT)
Reporter: Loic Lambiel
Assignee: Ariel Weisberg
 Fix For: 2.1.x

 Attachments: cassandra-env.sh, cassandra.yaml, cfstats.txt, 
 os_load.png, pending_compactions.png, read_latency.png, schema.txt, 
 system.log, write_latency.png


 After upgrading our cassandra staging cluster version from 2.1.6 to 2.1.7, 
 the average load grows from 0.1-0.3 to 1.8.
 Latencies did increase as well.
 We see an increase of pending compactions, probably due to CASSANDRA-9592.
 This cluster has almost no workload (staging environment)





[jira] [Commented] (CASSANDRA-9683) Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 2.1.7

2015-07-03 Thread Loic Lambiel (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612874#comment-14612874
 ] 

Loic Lambiel commented on CASSANDRA-9683:
-

Yes, that is correct.

 Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 
 2.1.7
 --

 Key: CASSANDRA-9683
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9683
 Project: Cassandra
  Issue Type: Bug
 Environment: Ubuntu 12.04 (3.13 Kernel) * 3
 JDK: Oracle JDK 7
 RAM: 32GB
 Cores 4 (+4 HT)
Reporter: Loic Lambiel
Assignee: Ariel Weisberg
 Fix For: 2.1.x

 Attachments: cassandra.yaml, cfstats.txt, os_load.png, 
 pending_compactions.png, read_latency.png, schema.txt, system.log, 
 write_latency.png


 After upgrading our cassandra staging cluster version from 2.1.6 to 2.1.7, 
 the average load grows from 0.1-0.3 to 1.8.
 Latencies did increase as well.
 We see an increase of pending compactions, probably due to CASSANDRA-9592.
 This cluster has almost no workload (staging environment)





[jira] [Updated] (CASSANDRA-9683) Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 2.1.7

2015-07-02 Thread Loic Lambiel (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Loic Lambiel updated CASSANDRA-9683:

Attachment: schema.txt
cfstats.txt

 Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 
 2.1.7
 --

 Key: CASSANDRA-9683
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9683
 Project: Cassandra
  Issue Type: Bug
 Environment: Ubuntu 12.04 (3.13 Kernel) * 3
 JDK: Oracle JDK 7
 RAM: 32GB
 Cores 4 (+4 HT)
Reporter: Loic Lambiel
Assignee: Ariel Weisberg
 Fix For: 2.1.x

 Attachments: cassandra.yaml, cfstats.txt, os_load.png, 
 pending_compactions.png, read_latency.png, schema.txt, system.log, 
 write_latency.png


 After upgrading our cassandra staging cluster version from 2.1.6 to 2.1.7, 
 the average load grows from 0.1-0.3 to 1.8.
 Latencies did increase as well.
 We see an increase of pending compactions, probably due to CASSANDRA-9592.
 This cluster has almost no workload (staging environment)





[jira] [Commented] (CASSANDRA-9683) Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 2.1.7

2015-07-02 Thread Loic Lambiel (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612053#comment-14612053
 ] 

Loic Lambiel commented on CASSANDRA-9683:
-

I've attached the schema and cfstats to give you a better idea.

I haven't had the opportunity to test it on a single-node config yet.

FYI, our config is logs on SSD and data on 8 JBOD SATA disks. This cluster 
serves our S3-compatible object storage (http://pithos.io).



 Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 
 2.1.7
 --

 Key: CASSANDRA-9683
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9683
 Project: Cassandra
  Issue Type: Bug
 Environment: Ubuntu 12.04 (3.13 Kernel) * 3
 JDK: Oracle JDK 7
 RAM: 32GB
 Cores 4 (+4 HT)
Reporter: Loic Lambiel
Assignee: Ariel Weisberg
 Fix For: 2.1.x

 Attachments: cassandra.yaml, cfstats.txt, os_load.png, 
 pending_compactions.png, read_latency.png, schema.txt, system.log, 
 write_latency.png


 After upgrading our cassandra staging cluster version from 2.1.6 to 2.1.7, 
 the average load grows from 0.1-0.3 to 1.8.
 Latencies did increase as well.
 We see an increase of pending compactions, probably due to CASSANDRA-9592.
 This cluster has almost no workload (staging environment)





[jira] [Created] (CASSANDRA-9683) Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 2.1.7

2015-06-30 Thread Loic Lambiel (JIRA)
Loic Lambiel created CASSANDRA-9683:
---

 Summary: Get mucher higher load and latencies after upgrading from 
2.1.6 to cassandra 2.1.7
 Key: CASSANDRA-9683
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9683
 Project: Cassandra
  Issue Type: Bug
 Environment: Ubuntu 12.04 (3.13 Kernel) * 3
JDK: Oracle JDK 7
RAM: 32GB
Cores 4 (+4 HT)

Reporter: Loic Lambiel
Priority: Critical
 Attachments: os_load.png, pending_compactions.png, read_latency.png, 
system.log, write_latency.png

After upgrading our cassandra staging cluster version from 2.1.6 to 2.1.7, the 
average load grows from 0.1-0.3 to 1.8.

Latencies did increase as well.

We see an increase of pending compactions, probably due to CASSANDRA-9592.

This cluster has almost no workload (staging environment)





[jira] [Updated] (CASSANDRA-9683) Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 2.1.7

2015-06-30 Thread Loic Lambiel (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Loic Lambiel updated CASSANDRA-9683:

Attachment: cassandra.yaml

 Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 
 2.1.7
 --

 Key: CASSANDRA-9683
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9683
 Project: Cassandra
  Issue Type: Bug
 Environment: Ubuntu 12.04 (3.13 Kernel) * 3
 JDK: Oracle JDK 7
 RAM: 32GB
 Cores 4 (+4 HT)
Reporter: Loic Lambiel
Priority: Critical
 Attachments: cassandra.yaml, os_load.png, pending_compactions.png, 
 read_latency.png, system.log, write_latency.png


 After upgrading our cassandra staging cluster version from 2.1.6 to 2.1.7, 
 the average load grows from 0.1-0.3 to 1.8.
 Latencies did increase as well.
 We see an increase of pending compactions, probably due to CASSANDRA-9592.
 This cluster has almost no workload (staging environment)





[jira] [Commented] (CASSANDRA-9683) Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 2.1.7

2015-06-30 Thread Loic Lambiel (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609210#comment-14609210
 ] 

Loic Lambiel commented on CASSANDRA-9683:
-

I did a build against the 2.1 branch and upgraded the cluster. I don't see 
much change: the load is still high (1.5 to 2.2) without any workload.

There are also still pending compactions in the queue.

 Get mucher higher load and latencies after upgrading from 2.1.6 to cassandra 
 2.1.7
 --

 Key: CASSANDRA-9683
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9683
 Project: Cassandra
  Issue Type: Bug
 Environment: Ubuntu 12.04 (3.13 Kernel) * 3
 JDK: Oracle JDK 7
 RAM: 32GB
 Cores 4 (+4 HT)
Reporter: Loic Lambiel
Assignee: Philip Thompson
 Fix For: 2.1.x

 Attachments: cassandra.yaml, os_load.png, pending_compactions.png, 
 read_latency.png, system.log, write_latency.png


 After upgrading our cassandra staging cluster version from 2.1.6 to 2.1.7, 
 the average load grows from 0.1-0.3 to 1.8.
 Latencies did increase as well.
 We see an increase of pending compactions, probably due to CASSANDRA-9592.
 This cluster has almost no workload (staging environment)





[jira] [Commented] (CASSANDRA-8316) Did not get positive replies from all endpoints error on incremental repair

2014-12-01 Thread Loic Lambiel (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14229587#comment-14229587
 ] 

Loic Lambiel commented on CASSANDRA-8316:
-

Hi guys,

Any chance of getting this issue fixed in 2.1.3? On our side, we hit this issue 
on almost all incremental repairs.

  Did not get positive replies from all endpoints error on incremental repair
 --

 Key: CASSANDRA-8316
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8316
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: cassandra 2.1.2
Reporter: Loic Lambiel
Assignee: Alan Boudreault
 Attachments: CassandraDaemon-2014-11-25-2.snapshot.tar.gz, test.sh


 Hi,
 I've got an issue with incremental repairs on our production 15 nodes 2.1.2 
 (new cluster, not yet loaded, RF=3)
 After having successfully performed an incremental repair (-par -inc) on 3 
 nodes, I started receiving Repair failed with error Did not get positive 
 replies from all endpoints. from nodetool on all remaining nodes :
 [2014-11-14 09:12:36,488] Starting repair command #3, repairing 108 ranges 
 for keyspace  (seq=false, full=false)
 [2014-11-14 09:12:47,919] Repair failed with error Did not get positive 
 replies from all endpoints.
 All the nodes are up and running and the local system log shows that the 
 repair commands got started and that's it.
 I've also noticed that soon after the repair, several nodes started having 
 more cpu load indefinitely without any particular reason (no tasks / queries, 
 nothing in the logs). I then restarted C* on these nodes and retried the 
 repair on several nodes, which were successful until facing the issue again.
 I tried to repro on our 3 nodes preproduction cluster without success
 It looks like I'm not the only one having this issue: 
 http://www.mail-archive.com/user%40cassandra.apache.org/msg39145.html
 Any idea?
 Thanks
 Loic





[jira] [Commented] (CASSANDRA-8316) Did not get positive replies from all endpoints error on incremental repair

2014-11-14 Thread Loic Lambiel (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212055#comment-14212055
 ] 

Loic Lambiel commented on CASSANDRA-8316:
-

Nope, I noticed nothing special on the other nodes (except load on a few nodes).

  Did not get positive replies from all endpoints error on incremental repair
 --

 Key: CASSANDRA-8316
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8316
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: cassandra 2.1.2
Reporter: Loic Lambiel

 Hi,
 I've got an issue with incremental repairs on our production 15 nodes 2.1.2 
 (new cluster, not yet loaded, RF=3)
 After having successfully performed an incremental repair (-par -inc) on 3 
 nodes, I started receiving Repair failed with error Did not get positive 
 replies from all endpoints. from nodetool on all remaining nodes :
 [2014-11-14 09:12:36,488] Starting repair command #3, repairing 108 ranges 
 for keyspace  (seq=false, full=false)
 [2014-11-14 09:12:47,919] Repair failed with error Did not get positive 
 replies from all endpoints.
 All the nodes are up and running and the local system log shows that the 
 repair commands got started and that's it.
 I've also noticed that soon after the repair, several nodes started having 
 more cpu load indefinitely without any particular reason (no tasks / queries, 
 nothing in the logs). I then restarted C* on these nodes and retried the 
 repair on several nodes, which were successful until facing the issue again.
 I tried to repro on our 3 nodes preproduction cluster without success
 It looks like I'm not the only one having this issue: 
 http://www.mail-archive.com/user%40cassandra.apache.org/msg39145.html
 Any idea?
 Thanks
 Loic





[jira] [Commented] (CASSANDRA-8316) Did not get positive replies from all endpoints error on incremental repair

2014-11-14 Thread Loic Lambiel (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212103#comment-14212103
 ] 

Loic Lambiel commented on CASSANDRA-8316:
-

I forgot to mention that I'm using LCS, in case it matters.

  Did not get positive replies from all endpoints error on incremental repair
 --

 Key: CASSANDRA-8316
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8316
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: cassandra 2.1.2
Reporter: Loic Lambiel
Assignee: Ryan McGuire

 Hi,
 I've got an issue with incremental repairs on our production 15 nodes 2.1.2 
 (new cluster, not yet loaded, RF=3)
 After having successfully performed an incremental repair (-par -inc) on 3 
 nodes, I started receiving Repair failed with error Did not get positive 
 replies from all endpoints. from nodetool on all remaining nodes :
 [2014-11-14 09:12:36,488] Starting repair command #3, repairing 108 ranges 
 for keyspace  (seq=false, full=false)
 [2014-11-14 09:12:47,919] Repair failed with error Did not get positive 
 replies from all endpoints.
 All the nodes are up and running and the local system log shows that the 
 repair commands got started and that's it.
 I've also noticed that soon after the repair, several nodes started having 
 more cpu load indefinitely without any particular reason (no tasks / queries, 
 nothing in the logs). I then restarted C* on these nodes and retried the 
 repair on several nodes, which were successful until facing the issue again.
 I tried to repro on our 3 nodes preproduction cluster without success
 It looks like I'm not the only one having this issue: 
 http://www.mail-archive.com/user%40cassandra.apache.org/msg39145.html
 Any idea?
 Thanks
 Loic


