Re: Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS
On Wed, Sep 12, 2012 at 9:38 AM, Janne Jalkanen janne.jalka...@ecyrd.com wrote:

> OK, so what's the worst case here? Data loss? Bad performance?

Low performance is for sure a side effect. I can't comment on data loss (and I'm curious about it as well) because it depends on how data from an out-of-order sstable was being indexed and served prior to Cassandra 1.1.1 (when the bug became apparent), which is essential for counter repairs, for example.

-- Omid

>> The fix was released on 1.1.3 (LCS fix) and 1.1.4 (offline scrub) and I agree it would be helpful to have it on NEWS.txt.
>
> I'll file a bug on this, unless someone can get to it first :)
>
> /Janne
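For readers unfamiliar with the terms used in this message, here is a minimal sketch of what "out-of-order sstable" and "overlap" mean. It is not Cassandra code: the function names are hypothetical, and plain integers stand in for the tokens Cassandra actually sorts by. It assumes that keys inside one sstable should be strictly ascending and that sstables within one level above L0 should cover disjoint ranges.

```python
def is_out_of_order(keys):
    """True if the keys of one sstable are not strictly ascending."""
    return any(a >= b for a, b in zip(keys, keys[1:]))


def find_overlaps(level):
    """Report sstables in one level whose range intersects an earlier one.

    `level` maps a (hypothetical) sstable name to its (first, last) range.
    Each offending sstable is paired with the earlier sstable that reaches
    furthest to the right.
    """
    ordered = sorted(level.items(), key=lambda item: item[1][0])
    overlaps = []
    widest_name, widest_last = None, None
    for name, (first, last) in ordered:
        if widest_name is not None and first <= widest_last:
            overlaps.append((widest_name, name))
        if widest_name is None or last > widest_last:
            widest_name, widest_last = name, last
    return overlaps


healthy = {"a": (0, 9), "b": (10, 19), "c": (20, 29)}
broken = {"a": (0, 15), "b": (10, 19), "c": (20, 29)}
print(find_overlaps(healthy))
print(find_overlaps(broken))
```

A level that reports any overlap no longer satisfies the property LCS relies on, which is the situation the offline scrub is meant to repair.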
Re: Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS
On 12 Sep 2012, at 00:50, Omid Aladini wrote:

> On Tue, Sep 11, 2012 at 8:33 PM, Janne Jalkanen janne.jalka...@ecyrd.com wrote:
>
>> Does this mean that LCS on 1.0.x should be considered unsafe to use? I'm using them for semi-wide frequently-updated CounterColumns and they're performing much better on LCS than on STCS.
>
> That's true. Unsafe in the sense that your data might not be in the right shape with respect to order of keys in sstables and LCS's properties and you might need to offline-scrub when you upgrade to the latest 1.1.x.

OK, so what's the worst case here? Data loss? Bad performance?

> The fix was released on 1.1.3 (LCS fix) and 1.1.4 (offline scrub) and I agree it would be helpful to have it on NEWS.txt.

I'll file a bug on this, unless someone can get to it first :)

/Janne
Re: Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS
Which version of Cassandra was your data initially created with?

A bug in Cassandra 1.1.2 and earlier could cause out-of-order sstables and inter-level overlaps in CFs with Leveled Compaction. Your sstables generated with 1.1.3 and later should not have this issue [1] [2].

In case you have old Leveled-compacted sstables (generated with 1.1.2 or earlier, including 1.0.x), you need to run offline scrub using Cassandra 1.1.4 or later via the /bin/sstablescrub command so it'll fix out-of-order sstables and inter-level overlaps caused by previous versions of LCS. You need to take nodes down in order to run offline scrub.

> After 3 hours the job is done and there are 11390 compaction tasks pending. My question: Can these assertions be ignored? Or do I need to worry about it?

They can't be ignored, since pending compactions raise the upper bound on the number of disk seeks you need to make to read a row and you don't get the nice guarantees of leveled compaction.

Cheers,
Omid

[1] https://issues.apache.org/jira/browse/CASSANDRA-4411
[2] https://issues.apache.org/jira/browse/CASSANDRA-4321

On Mon, Sep 10, 2012 at 6:37 PM, Rudolf van der Leeden rudolf.vanderlee...@scoreloop.com wrote:

> Hi,
>
> I'm getting 5 identical assertions while running 'nodetool cleanup' on a Cassandra 1.1.4 node with Load=104G and 80m keys.
>
> From system.log:
>
> ERROR [CompactionExecutor:576] 2012-09-10 11:25:50,265 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[CompactionExecutor:576,1,main]
> java.lang.AssertionError
>     at org.apache.cassandra.db.compaction.LeveledManifest.promote(LeveledManifest.java:214)
>     at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.handleNotification(LeveledCompactionStrategy.java:158)
>     at org.apache.cassandra.db.DataTracker.notifySSTablesChanged(DataTracker.java:531)
>     at org.apache.cassandra.db.DataTracker.replaceCompactedSSTables(DataTracker.java:254)
>     at org.apache.cassandra.db.ColumnFamilyStore.replaceCompactedSSTables(ColumnFamilyStore.java:992)
>     at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:200)
>     at org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
>     at org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:154)
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
>
> After 3 hours the job is done and there are 11390 compaction tasks pending. My question: Can these assertions be ignored? Or do I need to worry about it?
>
> Thanks for your help and best regards,
> -Rudolf.
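The remark about disk seeks can be illustrated with a rough model. It assumes the usual leveled layout: sstables in L0 may overlap each other, while each higher level holds non-overlapping sstables, so a row read touches at most one sstable per higher level. The function name and the level counts below are hypothetical, and the model ignores bloom filters and caches.

```python
def max_sstables_per_read(level_counts):
    """Worst-case number of sstables consulted to read one row.

    level_counts[i] is the number of sstables currently in level i.
    Every L0 sstable may hold the row; each non-empty higher level
    contributes at most one sstable.
    """
    if not level_counts:
        return 0
    l0 = level_counts[0]
    higher = sum(1 for count in level_counts[1:] if count > 0)
    return l0 + higher


# A caught-up node versus one with a large backlog waiting in L0.
print(max_sstables_per_read([0, 10, 100, 1000]))
print(max_sstables_per_read([400, 10, 100, 1000]))
```

Under these assumptions, a backlog of pending compactions shows up as extra sstables in L0, and the worst-case read cost grows with it.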
Re: Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS
> Which version of Cassandra was your data initially created with?
>
> A bug in Cassandra 1.1.2 and earlier could cause out-of-order sstables and inter-level overlaps in CFs with Leveled Compaction. Your sstables generated with 1.1.3 and later should not have this issue [1] [2].
>
> In case you have old Leveled-compacted sstables (generated with 1.1.2 or earlier, including 1.0.x), you need to run offline scrub using Cassandra 1.1.4 or later via the /bin/sstablescrub command so it'll fix out-of-order sstables and inter-level overlaps caused by previous versions of LCS. You need to take nodes down in order to run offline scrub.

The data was originally created on a 1.1.2 cluster with STCS (i.e. NOT leveled compaction). After the upgrade to 1.1.4 we changed from STCS to LCS w/o problems. Then we ran more tests and created more and very big keys with millions of columns. The assertion only shows up with one particular CF containing these big keys. So, from your explanation, I don't think an offline scrub will help.

Thanks,
-Rudolf.
Re: Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS
Could you, as Aaron suggested, open a ticket?

-- Omid

On Tue, Sep 11, 2012 at 2:35 PM, Rudolf van der Leeden rudolf.vanderlee...@scoreloop.com wrote:

>> Which version of Cassandra was your data initially created with?
>>
>> A bug in Cassandra 1.1.2 and earlier could cause out-of-order sstables and inter-level overlaps in CFs with Leveled Compaction. Your sstables generated with 1.1.3 and later should not have this issue [1] [2].
>>
>> In case you have old Leveled-compacted sstables (generated with 1.1.2 or earlier, including 1.0.x), you need to run offline scrub using Cassandra 1.1.4 or later via the /bin/sstablescrub command so it'll fix out-of-order sstables and inter-level overlaps caused by previous versions of LCS. You need to take nodes down in order to run offline scrub.
>
> The data was originally created on a 1.1.2 cluster with STCS (i.e. NOT leveled compaction). After the upgrade to 1.1.4 we changed from STCS to LCS w/o problems. Then we ran more tests and created more and very big keys with millions of columns. The assertion only shows up with one particular CF containing these big keys. So, from your explanation, I don't think an offline scrub will help.
>
> Thanks,
> -Rudolf.
Re: Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS
> Could you, as Aaron suggested, open a ticket?

Done: https://issues.apache.org/jira/browse/CASSANDRA-4644
Re: Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS
> A bug in Cassandra 1.1.2 and earlier could cause out-of-order sstables and inter-level overlaps in CFs with Leveled Compaction. Your sstables generated with 1.1.3 and later should not have this issue [1] [2].

Does this mean that LCS on 1.0.x should be considered unsafe to use? I'm using them for semi-wide frequently-updated CounterColumns and they're performing much better on LCS than on STCS.

> In case you have old Leveled-compacted sstables (generated with 1.1.2 or earlier, including 1.0.x), you need to run offline scrub using Cassandra 1.1.4 or later via the /bin/sstablescrub command so it'll fix out-of-order sstables and inter-level overlaps caused by previous versions of LCS. You need to take nodes down in order to run offline scrub.

The 1.1.5 README does not mention this. Should it?

/Janne
Re: Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS
On Tue, Sep 11, 2012 at 8:33 PM, Janne Jalkanen janne.jalka...@ecyrd.com wrote:

>> A bug in Cassandra 1.1.2 and earlier could cause out-of-order sstables and inter-level overlaps in CFs with Leveled Compaction. Your sstables generated with 1.1.3 and later should not have this issue [1] [2].
>
> Does this mean that LCS on 1.0.x should be considered unsafe to use? I'm using them for semi-wide frequently-updated CounterColumns and they're performing much better on LCS than on STCS.

That's true. Unsafe in the sense that your data might not be in the right shape with respect to order of keys in sstables and LCS's properties and you might need to offline-scrub when you upgrade to the latest 1.1.x.

>> In case you have old Leveled-compacted sstables (generated with 1.1.2 or earlier, including 1.0.x), you need to run offline scrub using Cassandra 1.1.4 or later via the /bin/sstablescrub command so it'll fix out-of-order sstables and inter-level overlaps caused by previous versions of LCS. You need to take nodes down in order to run offline scrub.
>
> The 1.1.5 README does not mention this. Should it?

The fix was released on 1.1.3 (LCS fix) and 1.1.4 (offline scrub) and I agree it would be helpful to have it on NEWS.txt.

Cheers,
Omid

> /Janne
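The version thresholds mentioned in this reply can be written down as a small check. This is only a sketch of the rule as stated in the thread (LCS bug present up to 1.1.2, offline scrub available from 1.1.4); the function names are hypothetical and it assumes plain dotted numeric version strings.

```python
def parse_version(text):
    """Turn a dotted version such as '1.1.2' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# Thresholds as stated in the thread.
LAST_AFFECTED = parse_version("1.1.2")     # LCS bug present up to this release
FIRST_WITH_SCRUB = parse_version("1.1.4")  # offline scrub available from here


def needs_offline_scrub(written_with, used_lcs):
    """True if sstables were written under LCS by an affected release."""
    return used_lcs and parse_version(written_with) <= LAST_AFFECTED


def can_run_offline_scrub(current):
    """True if the running release is recent enough to offer offline scrub."""
    return parse_version(current) >= FIRST_WITH_SCRUB


print(needs_offline_scrub("1.0.10", used_lcs=True))
print(needs_offline_scrub("1.1.2", used_lcs=False))
print(can_run_offline_scrub("1.1.4"))
```

The second call mirrors the situation Rudolf describes elsewhere in the thread: data written on 1.1.2 but with STCS, so the rule does not apply and the scrub is not expected to help.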
Re: Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS
Based on the steps outlined here
https://issues.apache.org/jira/browse/CASSANDRA-4644?focusedCommentId=13453156&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13453156
it seems that LCS was not used until after 1.1.4 and they were able to do a full repair, cleanup, compact cycle on 1.1.4 before running into problems. I don't see any major bugfixes for LCS in 1.1.5 either, so this appears to be a legitimate bug if the timeline is correct.

On Tue, Sep 11, 2012 at 2:50 PM, Omid Aladini omidalad...@gmail.com wrote:

> On Tue, Sep 11, 2012 at 8:33 PM, Janne Jalkanen janne.jalka...@ecyrd.com wrote:
>
>>> A bug in Cassandra 1.1.2 and earlier could cause out-of-order sstables and inter-level overlaps in CFs with Leveled Compaction. Your sstables generated with 1.1.3 and later should not have this issue [1] [2].
>>
>> Does this mean that LCS on 1.0.x should be considered unsafe to use? I'm using them for semi-wide frequently-updated CounterColumns and they're performing much better on LCS than on STCS.
>
> That's true. Unsafe in the sense that your data might not be in the right shape with respect to order of keys in sstables and LCS's properties and you might need to offline-scrub when you upgrade to the latest 1.1.x.
>
>>> In case you have old Leveled-compacted sstables (generated with 1.1.2 or earlier, including 1.0.x), you need to run offline scrub using Cassandra 1.1.4 or later via the /bin/sstablescrub command so it'll fix out-of-order sstables and inter-level overlaps caused by previous versions of LCS. You need to take nodes down in order to run offline scrub.
>>
>> The 1.1.5 README does not mention this. Should it?
>
> The fix was released on 1.1.3 (LCS fix) and 1.1.4 (offline scrub) and I agree it would be helpful to have it on NEWS.txt.
>
> Cheers,
> Omid
>
>> /Janne
Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS
Hi,

I'm getting 5 identical assertions while running 'nodetool cleanup' on a Cassandra 1.1.4 node with Load=104G and 80m keys.

From system.log:

ERROR [CompactionExecutor:576] 2012-09-10 11:25:50,265 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[CompactionExecutor:576,1,main]
java.lang.AssertionError
    at org.apache.cassandra.db.compaction.LeveledManifest.promote(LeveledManifest.java:214)
    at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.handleNotification(LeveledCompactionStrategy.java:158)
    at org.apache.cassandra.db.DataTracker.notifySSTablesChanged(DataTracker.java:531)
    at org.apache.cassandra.db.DataTracker.replaceCompactedSSTables(DataTracker.java:254)
    at org.apache.cassandra.db.ColumnFamilyStore.replaceCompactedSSTables(ColumnFamilyStore.java:992)
    at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:200)
    at org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
    at org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:154)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

After 3 hours the job is done and there are 11390 compaction tasks pending. My question: Can these assertions be ignored? Or do I need to worry about it?

Thanks for your help and best regards,
-Rudolf.
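When checking whether the assertions in a log are really identical, it can help to group them by the frame that raised them. The sketch below is one way to do that; the function name is hypothetical, and it assumes the log layout where `java.lang.AssertionError` sits on its own line followed by `at ...` frames, which may differ in other logging setups.

```python
from collections import Counter


def count_assertion_sites(lines):
    """Count AssertionErrors by the first stack frame that follows them."""
    sites = Counter()
    awaiting_frame = False
    for raw in lines:
        line = raw.strip()
        if line.startswith("java.lang.AssertionError"):
            awaiting_frame = True
        elif awaiting_frame:
            if line.startswith("at "):
                sites[line[3:]] += 1
            awaiting_frame = False
    return sites


# Shortened sample in the assumed layout, repeated to mimic 5 occurrences.
sample = """ERROR [CompactionExecutor:576] 2012-09-10 11:25:50,265 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[CompactionExecutor:576,1,main]
java.lang.AssertionError
\tat org.apache.cassandra.db.compaction.LeveledManifest.promote(LeveledManifest.java:214)
\tat org.apache.cassandra.db.compaction.LeveledCompactionStrategy.handleNotification(LeveledCompactionStrategy.java:158)
""".splitlines()

for site, count in count_assertion_sites(sample * 5).items():
    print(count, site)
```

For a real log, the same function can be fed an open file object, since it only iterates over lines.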
Re: Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS
> My question: Can these assertions be ignored? Or do I need to worry about it?

That looks like a problem. Can you raise a ticket on https://issues.apache.org/jira/browse/CASSANDRA ?

May be good to include information on:

* how long you've been using Levelled Compaction.
* Is this all CFs or just one?
* If you can identify the CF can you include the .json file that is kept on disk. It contains information about levelled compaction.

Cheers

- Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 11/09/2012, at 4:37 AM, Rudolf van der Leeden rudolf.vanderlee...@scoreloop.com wrote:

> Hi,
>
> I'm getting 5 identical assertions while running 'nodetool cleanup' on a Cassandra 1.1.4 node with Load=104G and 80m keys.
>
> From system.log:
>
> ERROR [CompactionExecutor:576] 2012-09-10 11:25:50,265 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[CompactionExecutor:576,1,main]
> java.lang.AssertionError
>     at org.apache.cassandra.db.compaction.LeveledManifest.promote(LeveledManifest.java:214)
>     at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.handleNotification(LeveledCompactionStrategy.java:158)
>     at org.apache.cassandra.db.DataTracker.notifySSTablesChanged(DataTracker.java:531)
>     at org.apache.cassandra.db.DataTracker.replaceCompactedSSTables(DataTracker.java:254)
>     at org.apache.cassandra.db.ColumnFamilyStore.replaceCompactedSSTables(ColumnFamilyStore.java:992)
>     at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:200)
>     at org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
>     at org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:154)
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
>
> After 3 hours the job is done and there are 11390 compaction tasks pending. My question: Can these assertions be ignored? Or do I need to worry about it?
>
> Thanks for your help and best regards,
> -Rudolf.