Re: Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS

2012-09-13 Thread Omid Aladini
On Wed, Sep 12, 2012 at 9:38 AM, Janne Jalkanen
janne.jalka...@ecyrd.com wrote:
 OK, so what's the worst case here? Data loss? Bad performance?

Lower performance is certainly a side effect. I can't comment on data
loss (I'm curious about that as well), because it depends on how data
from an out-of-order sstable was indexed and served prior to
Cassandra 1.1.1 (when the bug became apparent), which is essential for
counter repairs, for example.

-- Omid

 The fix was released on 1.1.3 (LCS fix) and 1.1.4 (offline scrub) and
 I agree it would be helpful to have it on NEWS.txt.

 I'll file a bug on this, unless someone can get to it first :)

 /Janne


Re: Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS

2012-09-12 Thread Janne Jalkanen

On 12 Sep 2012, at 00:50, Omid Aladini wrote:

 On Tue, Sep 11, 2012 at 8:33 PM, Janne Jalkanen
 janne.jalka...@ecyrd.com wrote:
 
 Does this mean that LCS on 1.0.x should be considered unsafe to
 use? I'm using them for semi-wide frequently-updated CounterColumns
 and they're performing much better on LCS than on STCS.
 
 That's true. Unsafe in the sense that your data might not be in the
 right shape with respect to order of keys in sstables and LCS's
 properties and you might need to offline-scrub when you upgrade to the
 latest 1.1.x.

OK, so what's the worst case here? Data loss? Bad performance?

 The fix was released on 1.1.3 (LCS fix) and 1.1.4 (offline scrub) and
 I agree it would be helpful to have it on NEWS.txt.

I'll file a bug on this, unless someone can get to it first :)

/Janne


Re: Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS

2012-09-11 Thread Omid Aladini
Which version of Cassandra was your data initially created with?

A bug in Cassandra 1.1.2 and earlier could cause out-of-order sstables
and inter-level overlaps in CFs with Leveled Compaction. Your sstables
generated with 1.1.3 and later should not have this issue [1] [2].

If you have old leveled-compacted sstables (generated with 1.1.2
or earlier, including 1.0.x), you need to run an offline scrub using
Cassandra 1.1.4 or later via the /bin/sstablescrub command, which fixes
out-of-order sstables and inter-level overlaps caused by previous
versions of LCS. You need to take nodes down in order to run the offline
scrub.
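
As an illustration only, here is a minimal Python sketch of the two
conditions mentioned above. It is not Cassandra code and not what
sstablescrub actually does. It assumes that "out-of-order" means keys
inside one sstable are not strictly ascending, and that the overlap
problem means two sstables in the same level above L0 cover
intersecting key ranges. SSTableSummary, is_out_of_order and
overlapping_pairs are hypothetical names, and tokens are modeled as
plain integers.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SSTableSummary:
    """Hypothetical stand-in for one sstable: a name plus its keys in on-disk order."""
    name: str
    keys: List[int]  # tokens modeled as plain integers (an assumption)


def is_out_of_order(sstable: SSTableSummary) -> bool:
    """True if any key is not strictly greater than the key before it."""
    return any(b <= a for a, b in zip(sstable.keys, sstable.keys[1:]))


def overlapping_pairs(level: List[SSTableSummary]) -> List[Tuple[str, str]]:
    """Report each sstable whose key range overlaps an earlier one in the level."""
    non_empty = [s for s in level if s.keys]
    ordered = sorted(non_empty, key=lambda s: min(s.keys))
    pairs: List[Tuple[str, str]] = []
    widest: Optional[SSTableSummary] = None  # sstable with the largest key seen so far
    for current in ordered:
        if widest is not None and max(widest.keys) >= min(current.keys):
            pairs.append((widest.name, current.name))
        if widest is None or max(current.keys) > max(widest.keys):
            widest = current
    return pairs
```

An empty result from overlapping_pairs for every level above L0, and
False from is_out_of_order for every sstable, would correspond to the
healthy state described above.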

 After 3 hours the job is done and there are 11390 compaction tasks pending.
 My question: Can these assertions be ignored? Or do I need to worry about
 it?

They can't be ignored: pending compactions raise the upper bound on
the number of disk seeks needed to read a row, so you lose the
guarantees of leveled compaction.
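
A rough sketch of why a compaction backlog matters for reads, under
the usual description of leveled compaction: a row can be in at most
one sstable per level above L0 (ranges within a level are disjoint),
but in any number of L0 sstables, since those may overlap. The
function name and the example counts are hypothetical and are not
taken from this cluster.

```python
from typing import Sequence


def worst_case_sstables_per_read(sstables_per_level: Sequence[int]) -> int:
    """Upper bound on the sstables a single row read may have to consult.

    sstables_per_level[0] is the L0 count; later entries are L1, L2, ...
    """
    if not sstables_per_level:
        return 0
    l0_count = sstables_per_level[0]
    # Each populated level above L0 contributes at most one sstable.
    populated_higher_levels = sum(1 for count in sstables_per_level[1:] if count > 0)
    return l0_count + populated_higher_levels
```

With an empty L0 and two populated levels the bound is 2; with 500
sstables waiting in L0 it rises to 502. Bloom filters usually avoid
most of these lookups in practice, so this is a bound and not an
expected cost.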

Cheers,
Omid

[1] https://issues.apache.org/jira/browse/CASSANDRA-4411
[2] https://issues.apache.org/jira/browse/CASSANDRA-4321

On Mon, Sep 10, 2012 at 6:37 PM, Rudolf van der Leeden
rudolf.vanderlee...@scoreloop.com wrote:
 Hi,

 I'm getting 5 identical assertions while running 'nodetool cleanup' on a
 Cassandra 1.1.4 node with Load=104G and 80m keys.
 From  system.log :

 ERROR [CompactionExecutor:576] 2012-09-10 11:25:50,265
 AbstractCassandraDaemon.java (line 134) Exception in thread
 Thread[CompactionExecutor:576,1,main]
 java.lang.AssertionError
 at
 org.apache.cassandra.db.compaction.LeveledManifest.promote(LeveledManifest.java:214)
 at
 org.apache.cassandra.db.compaction.LeveledCompactionStrategy.handleNotification(LeveledCompactionStrategy.java:158)
 at
 org.apache.cassandra.db.DataTracker.notifySSTablesChanged(DataTracker.java:531)
 at
 org.apache.cassandra.db.DataTracker.replaceCompactedSSTables(DataTracker.java:254)
 at
 org.apache.cassandra.db.ColumnFamilyStore.replaceCompactedSSTables(ColumnFamilyStore.java:992)
 at
 org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:200)
 at
 org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
 at
 org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:154)
 at
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
 at
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
 at
 java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)

 After 3 hours the job is done and there are 11390 compaction tasks pending.
 My question: Can these assertions be ignored? Or do I need to worry about
 it?

 Thanks for your help and best regards,
 -Rudolf.



Re: Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS

2012-09-11 Thread Rudolf van der Leeden

 Which version of Cassandra has your data been created initially with?
 A bug in Cassandra 1.1.2 and earlier could cause out-of-order sstables
 and inter-level overlaps in CFs with Leveled Compaction. Your sstables
 generated with 1.1.3 and later should not have this issue [1] [2].
 In case you have old Leveled-compacted sstables (generated with 1.1.2
 or earlier. including 1.0.x) you need to run offline scrub using
 Cassandra 1.1.4 or later via /bin/sstablescrub command so it'll fix
 out-of-order sstables and inter-level overlaps caused by previous
 versions of LCS. You need to take nodes down in order to run offline
 scrub.


The data was originally created on a 1.1.2 cluster with STCS (i.e. NOT
leveled compaction).
After the upgrade to 1.1.4 we changed from STCS to LCS w/o problems.
Then we ran more tests and created more and very big keys with millions of
columns.
The assertion only shows up with one particular CF containing these big
keys.
So, from your explanation, I don't think an offline scrub will help.

Thanks,
-Rudolf.


Re: Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS

2012-09-11 Thread Omid Aladini
Could you, as Aaron suggested, open a ticket?

-- Omid

On Tue, Sep 11, 2012 at 2:35 PM, Rudolf van der Leeden
rudolf.vanderlee...@scoreloop.com wrote:
 Which version of Cassandra has your data been created initially with?
 A bug in Cassandra 1.1.2 and earlier could cause out-of-order sstables
 and inter-level overlaps in CFs with Leveled Compaction. Your sstables
 generated with 1.1.3 and later should not have this issue [1] [2].
 In case you have old Leveled-compacted sstables (generated with 1.1.2
 or earlier. including 1.0.x) you need to run offline scrub using
 Cassandra 1.1.4 or later via /bin/sstablescrub command so it'll fix
 out-of-order sstables and inter-level overlaps caused by previous
 versions of LCS. You need to take nodes down in order to run offline
 scrub.


 The data was originally created on a 1.1.2 cluster with STCS (i.e. NOT
 leveled compaction).
 After the upgrade to 1.1.4 we changed from STCS to LCS w/o problems.
 Then we ran more tests and created more and very big keys with millions of
 columns.
 The assertion only shows up with one particular CF containing these big
 keys.
 So, from your explanation, I don't think an offline scrub will help.

 Thanks,
 -Rudolf.



Re: Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS

2012-09-11 Thread Rudolf van der Leeden
 Could you, as Aaron suggested, open a ticket?


Done:  https://issues.apache.org/jira/browse/CASSANDRA-4644


Re: Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS

2012-09-11 Thread Janne Jalkanen

 A bug in Cassandra 1.1.2 and earlier could cause out-of-order sstables
 and inter-level overlaps in CFs with Leveled Compaction. Your sstables
 generated with 1.1.3 and later should not have this issue [1] [2].

Does this mean that LCS on 1.0.x should be considered unsafe to use? I'm using 
them for semi-wide frequently-updated CounterColumns and they're performing 
much better on LCS than on STCS.

 In case you have old Leveled-compacted sstables (generated with 1.1.2
 or earlier. including 1.0.x) you need to run offline scrub using
 Cassandra 1.1.4 or later via /bin/sstablescrub command so it'll fix
 out-of-order sstables and inter-level overlaps caused by previous
 versions of LCS. You need to take nodes down in order to run offline
 scrub.

The 1.1.5 README does not mention this. Should it?

/Janne



Re: Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS

2012-09-11 Thread Omid Aladini
On Tue, Sep 11, 2012 at 8:33 PM, Janne Jalkanen
janne.jalka...@ecyrd.com wrote:

 A bug in Cassandra 1.1.2 and earlier could cause out-of-order sstables
 and inter-level overlaps in CFs with Leveled Compaction. Your sstables
 generated with 1.1.3 and later should not have this issue [1] [2].

 Does this mean that LCS on 1.0.x should be considered unsafe to
 use? I'm using them for semi-wide frequently-updated CounterColumns
 and they're performing much better on LCS than on STCS.

That's true. Unsafe in the sense that your data might not be in the
right shape with respect to order of keys in sstables and LCS's
properties and you might need to offline-scrub when you upgrade to the
latest 1.1.x.

 In case you have old Leveled-compacted sstables (generated with 1.1.2
 or earlier. including 1.0.x) you need to run offline scrub using
 Cassandra 1.1.4 or later via /bin/sstablescrub command so it'll fix
 out-of-order sstables and inter-level overlaps caused by previous
 versions of LCS. You need to take nodes down in order to run offline
 scrub.

 The  1.1.5 README does not mention this. Should it?

The fix was released on 1.1.3 (LCS fix) and 1.1.4 (offline scrub) and
I agree it would be helpful to have it on NEWS.txt.

Cheers,
Omid

 /Janne



Re: Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS

2012-09-11 Thread Mikhail Panchenko
Based on the steps outlined here
https://issues.apache.org/jira/browse/CASSANDRA-4644?focusedCommentId=13453156&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13453156
it seems that LCS was not used until after 1.1.4 and they were able to do
a full repair/cleanup/compact cycle on 1.1.4 before running into problems.

I don't see any major bugfixes for LCS in 1.1.5 either, so this appears to
be a legitimate bug if the timeline is correct.

On Tue, Sep 11, 2012 at 2:50 PM, Omid Aladini omidalad...@gmail.com wrote:

 On Tue, Sep 11, 2012 at 8:33 PM, Janne Jalkanen
 janne.jalka...@ecyrd.com wrote:
 
  A bug in Cassandra 1.1.2 and earlier could cause out-of-order sstables
  and inter-level overlaps in CFs with Leveled Compaction. Your sstables
  generated with 1.1.3 and later should not have this issue [1] [2].
 
  Does this mean that LCS on 1.0.x should be considered unsafe to
  use? I'm using them for semi-wide frequently-updated CounterColumns
  and they're performing much better on LCS than on STCS.

 That's true. Unsafe in the sense that your data might not be in the
 right shape with respect to order of keys in sstables and LCS's
 properties and you might need to offline-scrub when you upgrade to the
 latest 1.1.x.

  In case you have old Leveled-compacted sstables (generated with 1.1.2
  or earlier. including 1.0.x) you need to run offline scrub using
  Cassandra 1.1.4 or later via /bin/sstablescrub command so it'll fix
  out-of-order sstables and inter-level overlaps caused by previous
  versions of LCS. You need to take nodes down in order to run offline
  scrub.
 
  The  1.1.5 README does not mention this. Should it?

 The fix was released on 1.1.3 (LCS fix) and 1.1.4 (offline scrub) and
 I agree it would be helpful to have it on NEWS.txt.

 Cheers,
 Omid

  /Janne
 



Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS

2012-09-10 Thread Rudolf van der Leeden
Hi,

I'm getting 5 identical assertions while running 'nodetool cleanup' on a
Cassandra 1.1.4 node with Load=104G and 80m keys.
From  system.log :

ERROR [CompactionExecutor:576] 2012-09-10 11:25:50,265
AbstractCassandraDaemon.java (line 134) Exception in thread
Thread[CompactionExecutor:576,1,main]
java.lang.AssertionError
at
org.apache.cassandra.db.compaction.LeveledManifest.promote(LeveledManifest.java:214)
at
org.apache.cassandra.db.compaction.LeveledCompactionStrategy.handleNotification(LeveledCompactionStrategy.java:158)
at
org.apache.cassandra.db.DataTracker.notifySSTablesChanged(DataTracker.java:531)
at
org.apache.cassandra.db.DataTracker.replaceCompactedSSTables(DataTracker.java:254)
at
org.apache.cassandra.db.ColumnFamilyStore.replaceCompactedSSTables(ColumnFamilyStore.java:992)
at
org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:200)
at
org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
at
org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:154)
at
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

After 3 hours the job is done and there are 11390 compaction tasks pending.
My question: Can these assertions be ignored? Or do I need to worry about
it?

Thanks for your help and best regards,
-Rudolf.


Re: Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS

2012-09-10 Thread aaron morton
 My question: Can these assertions be ignored? Or do I need to worry about it?
That looks like a problem.

Can you raise a ticket on https://issues.apache.org/jira/browse/CASSANDRA ?

It may be good to include information on:
* How long you've been using Levelled Compaction.
* Is this all CFs or just one?
* If you can identify the CF, can you include the .json file that is kept on
disk? It contains information about levelled compaction.

Cheers
  
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 11/09/2012, at 4:37 AM, Rudolf van der Leeden 
rudolf.vanderlee...@scoreloop.com wrote:

 Hi,
 
 I'm getting 5 identical assertions while running 'nodetool cleanup' on a 
 Cassandra 1.1.4 node with Load=104G and 80m keys.
 From  system.log :
 
 ERROR [CompactionExecutor:576] 2012-09-10 11:25:50,265 
 AbstractCassandraDaemon.java (line 134) Exception in thread 
 Thread[CompactionExecutor:576,1,main]
 java.lang.AssertionError
 at 
 org.apache.cassandra.db.compaction.LeveledManifest.promote(LeveledManifest.java:214)
 at 
 org.apache.cassandra.db.compaction.LeveledCompactionStrategy.handleNotification(LeveledCompactionStrategy.java:158)
 at 
 org.apache.cassandra.db.DataTracker.notifySSTablesChanged(DataTracker.java:531)
 at 
 org.apache.cassandra.db.DataTracker.replaceCompactedSSTables(DataTracker.java:254)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.replaceCompactedSSTables(ColumnFamilyStore.java:992)
 at 
 org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:200)
 at 
 org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
 at 
 org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:154)
 at 
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 
 After 3 hours the job is done and there are 11390 compaction tasks pending.
 My question: Can these assertions be ignored? Or do I need to worry about it?
 
 Thanks for your help and best regards,
 -Rudolf.