> 1) What did I wrong? - why cassandra was throwing exceptions on first startup? In 1.0.X the history of schema changes was replayed to the node when it rejoined the cluster. If the node is receiving traffic while this is going on it will log those errors until the schema mutation that created 1012 is replayed.
> 2) Why the keyspace data was invalidated ? Is it expected? The data will have remained on the disk. The load is calculated based on the CF's in the schema, this can mean that the load will not return to full until the schema is fully replayed. Did you lose data ? > 3) If answer to #2 is "yes it's expected" then that's the point in doing > http://wiki.apache.org/cassandra/FAQ#schema_disagreement > then all keyspace data is lost anyway? It makes more sense to just do > http://wiki.apache.org/cassandra/Operations#Replacing_a_Dead_Node Answer as no. Checking, did you delete just the Schema-* and Migration-* files or all of the files in data/system? Also in the first log there is a log of commit log mutation being skipped because the schema is not there. Drain should have removed these, but it can take a little time (I think). > 4) afaiu i could also stop cassandra again move old sstables from snapshot > back to keyspace data dir and run repair for all keyspace CFs? So that it > finishes faster > and makes less load than running a repair which has no previous keyspace data > at all? The approach you followed was the correct one. I've updated the wiki to say the errors are expected. Cheers ----------------- Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 19/05/2012, at 6:34 AM, Piavlo wrote: > Hi, > > I had a schema disagreement problem in cassandra 1.0.9 cluster, where one > node had different schema version. > So I followed the faq at > http://wiki.apache.org/cassandra/FAQ#schema_disagreement > disabled gossip, disabled thrift, drained and finally stopped the cassandra > process, on startup > noticed > INFO [main] 2012-05-18 16:23:11,879 DatabaseDescriptor.java (line 467) > Couldn't detect any schema definitions in local storage. > in the log, and after > INFO [main] 2012-05-18 16:23:15,463 StorageService.java (line 619) > Bootstrap/Replace/Move completed! Now serving reads. > it started throwing Fatal exceptions for all read/write operations endlessly. > > I had to stop cassandra process again(no draining was done) > > On second start it did came up ok immediately loading the correct cluster > schema version > INFO [main] 2012-05-18 16:54:44,303 DatabaseDescriptor.java (line 499) > Loading schema version 9db34ef0-a0be-11e1-0000-f9687e034cf7 > > But now this node appears to have started with no data from keyspace which > had schema disagreement. > The original keyspace sstables now appear under snapshots dir. > > # nodetool -h localhost ring > Address DC Rack Status State Load Owns > Token > > 141784319550391026443072753096570088106 > 10.49.127.4 eu-west 1a Up Normal 8.19 GB 16.67% > 0 > 10.241.29.65 eu-west 1b Up Normal 8.18 GB 16.67% > 28356863910078205288614550619314017621 > 10.59.46.236 eu-west 1c Up Normal 8.22 GB 16.67% > 56713727820156410577229101238628035242 > 10.50.33.232 eu-west 1a Up Normal 8.2 GB 16.67% > 85070591730234615865843651857942052864 > 10.234.71.33 eu-west 1b Up Normal 8.15 GB 16.67% > 113427455640312821154458202477256070485 > 10.58.249.118 eu-west 1c Up Normal 660.98 MB 16.67% > 141784319550391026443072753096570088106 > # > > The node is the one with 660.98 MB data( which is opscenter keyspace data > which was not invalidated) > > So i have some questions: > > 1) What did I wrong? - why cassandra was throwing exceptions on first startup? > 2) Why the keyspace data was invalidated ? Is it expected? > 3) If answer to #2 is "yes it's expected" then that's the point in doing > http://wiki.apache.org/cassandra/FAQ#schema_disagreement > then all keyspace data is lost anyway? It makes more sense to just do > http://wiki.apache.org/cassandra/Operations#Replacing_a_Dead_Node > 4) afaiu i could also stop cassandra again move old sstables from snapshot > back to keyspace data dir and run repair for all keyspace CFs? So that it > finishes faster > and makes less load than running a repair which has no previous keyspace data > at all? > > The first startup log is below: > > INFO [main] 2012-05-18 16:23:07,367 AbstractCassandraDaemon.java (line 105) > Logging initialized > INFO [main] 2012-05-18 16:23:07,382 AbstractCassandraDaemon.java (line 126) > JVM vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.6.0_24 > INFO [main] 2012-05-18 16:23:07,383 AbstractCassandraDaemon.java (line 127) > Heap size: 2600468480/2600468480 > INFO [main] 2012-05-18 16:23:07,383 AbstractCassandraDaemon.java (line 128) > Classpath: > /etc/cassandra/conf:/usr/share/java/jna.jar:/usr/share/java/mx4j-tools.jar:/usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/apache-cassandra-1.0.9.jar:/usr/share/cassandra/lib/apache-cassandra-clientutil-1.0.9.jar:/usr/share/cassandra/lib/apache-cassandra-thrift-1.0.9.jar:/usr/share/cassandra/lib/avro-1.4.0-fixes.jar:/usr/share/cassandra/lib/avro-1.4.0-sources-fixes.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang-2.4.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.2.jar:/usr/share/cassandra/lib/guava-r08.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.4.0.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.4.0.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jline-0.9.94.jar:/usr/share/cassandra/lib/joda-time-1.6.2.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.6.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.6.1.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.6.1.jar:/usr/share/cassandra/lib/snakeyaml-1.6.jar:/usr/share/cassandra/lib/snappy-java-1.0.4.1.jar:/usr/share/cassandra//lib/jamm-0.2.5.jar > INFO [main] 2012-05-18 16:23:10,661 CLibrary.java (line 109) JNA mlockall > successful > INFO [main] 2012-05-18 16:23:10,692 DatabaseDescriptor.java (line 114) > Loading settings from file:/etc/cassandra/ssa/cassandra.yaml > INFO [main] 2012-05-18 16:23:10,868 DatabaseDescriptor.java (line 168) > DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap > INFO [main] 2012-05-18 16:23:10,911 DatabaseDescriptor.java (line 233) Global > memtable threshold is enabled at 826MB > INFO [main] 2012-05-18 16:23:10,970 Ec2Snitch.java (line 63) EC2Snitch using > region: eu-west, zone: 1c. > INFO [SSTableBatchOpen:1] 2012-05-18 16:23:11,652 SSTableReader.java (line > 153) Opening /var/lib/cassandra/data/system/Versions-hc-5 (247 bytes) > INFO [SSTableBatchOpen:1] 2012-05-18 16:23:11,767 SSTableReader.java (line > 153) Opening /var/lib/cassandra/data/system/IndexInfo-hc-2 (128 bytes) > INFO [SSTableBatchOpen:2] 2012-05-18 16:23:11,768 SSTableReader.java (line > 153) Opening /var/lib/cassandra/data/system/IndexInfo-hc-3 (124 bytes) > INFO [SSTableBatchOpen:1] 2012-05-18 16:23:11,802 SSTableReader.java (line > 153) Opening /var/lib/cassandra/data/system/IndexInfo-hc-1 (124 bytes) > INFO [SSTableBatchOpen:1] 2012-05-18 16:23:11,825 SSTableReader.java (line > 153) Opening /var/lib/cassandra/data/system/LocationInfo-hc-35 (89 bytes) > INFO [SSTableBatchOpen:2] 2012-05-18 16:23:11,825 SSTableReader.java (line > 153) Opening /var/lib/cassandra/data/system/LocationInfo-hc-34 (558 bytes) > INFO [SSTableBatchOpen:1] 2012-05-18 16:23:11,835 SSTableReader.java (line > 153) Opening /var/lib/cassandra/data/system/HintsColumnFamily-hc-56 (235748 > bytes) > INFO [main] 2012-05-18 16:23:11,879 DatabaseDescriptor.java (line 467) > Couldn't detect any schema definitions in local storage. > INFO [main] 2012-05-18 16:23:11,880 DatabaseDescriptor.java (line 493) Found > table data in data directories. Consider using the CLI to define your schema. > INFO [main] 2012-05-18 16:23:11,914 CommitLogSegment.java (line 60) Creating > new commitlog segment /var/lib/cassandra/commitlog/CommitLog-1337358191914.log > INFO [main] 2012-05-18 16:23:11,928 CommitLog.java (line 172) Replaying > /var/lib/cassandra/commitlog/CommitLog-1337356780099.log > INFO [main] 2012-05-18 16:23:15,237 CommitLog.java (line 355) Finished > reading /var/lib/cassandra/commitlog/CommitLog-1337356780099.log > INFO [main] 2012-05-18 16:23:15,238 CommitLog.java (line 360) Skipped 77886 > mutations from unknown (probably removed) CF with id 1016 > INFO [main] 2012-05-18 16:23:15,238 CommitLog.java (line 360) Skipped 37804 > mutations from unknown (probably removed) CF with id 1008 > INFO [main] 2012-05-18 16:23:15,238 CommitLog.java (line 360) Skipped 3 > mutations from unknown (probably removed) CF with id 1024 > INFO [main] 2012-05-18 16:23:15,239 CommitLog.java (line 360) Skipped 67759 > mutations from unknown (probably removed) CF with id 1009 > INFO [main] 2012-05-18 16:23:15,239 CommitLog.java (line 360) Skipped 51268 > mutations from unknown (probably removed) CF with id 1010 > INFO [main] 2012-05-18 16:23:15,239 CommitLog.java (line 360) Skipped 59319 > mutations from unknown (probably removed) CF with id 1011 > INFO [main] 2012-05-18 16:23:15,239 CommitLog.java (line 360) Skipped 61110 > mutations from unknown (probably removed) CF with id 1012 > INFO [main] 2012-05-18 16:23:15,239 CommitLog.java (line 360) Skipped 182908 > mutations from unknown (probably removed) CF with id 1013 > INFO [main] 2012-05-18 16:23:15,240 CommitLog.java (line 360) Skipped 10038 > mutations from unknown (probably removed) CF with id 1014 > INFO [main] 2012-05-18 16:23:15,240 CommitLog.java (line 360) Skipped 14783 > mutations from unknown (probably removed) CF with id 1001 > INFO [main] 2012-05-18 16:23:15,240 CommitLog.java (line 360) Skipped 4256 > mutations from unknown (probably removed) CF with id 1000 > INFO [main] 2012-05-18 16:23:15,240 CommitLog.java (line 360) Skipped 746 > mutations from unknown (probably removed) CF with id 1003 > INFO [main] 2012-05-18 16:23:15,241 CommitLog.java (line 360) Skipped 3013 > mutations from unknown (probably removed) CF with id 1002 > INFO [main] 2012-05-18 16:23:15,241 CommitLog.java (line 179) Log replay > complete, 0 replayed mutations > INFO [main] 2012-05-18 16:23:15,259 StorageService.java (line 400) Cassandra > version: 1.0.9 > INFO [main] 2012-05-18 16:23:15,260 StorageService.java (line 401) Thrift API > version: 19.20.0 > INFO [main] 2012-05-18 16:23:15,260 StorageService.java (line 414) Loading > persisted ring state > INFO [main] 2012-05-18 16:23:15,288 StorageService.java (line 494) Starting > up server gossip > INFO [main] 2012-05-18 16:23:15,292 ColumnFamilyStore.java (line 705) > Enqueuing flush of Memtable-LocationInfo@189237805(29/36 serialized/live > bytes, 1 ops) > INFO [FlushWriter:1] 2012-05-18 16:23:15,293 Memtable.java (line 246) Writing > Memtable-LocationInfo@189237805(29/36 serialized/live bytes, 1 ops) > INFO [FlushWriter:1] 2012-05-18 16:23:15,350 Memtable.java (line 283) > Completed flushing /var/lib/cassandra/data/system/LocationInfo-hc-36-Data.db > (80 bytes) > INFO [main] 2012-05-18 16:23:15,365 Ec2Snitch.java (line 113) Ec2Snitch > adding ApplicationState ec2region=eu-west ec2zone=1c > INFO [main] 2012-05-18 16:23:15,379 MessagingService.java (line 269) Starting > Messaging Service on port 7000 > INFO [main] 2012-05-18 16:23:15,399 StorageService.java (line 609) Using > saved token 141784319550391026443072753096570088106 > INFO [main] 2012-05-18 16:23:15,403 ColumnFamilyStore.java (line 705) > Enqueuing flush of Memtable-LocationInfo@559001765(53/66 serialized/live > bytes, 2 ops) > INFO [FlushWriter:1] 2012-05-18 16:23:15,404 Memtable.java (line 246) Writing > Memtable-LocationInfo@559001765(53/66 serialized/live bytes, 2 ops) > INFO [FlushWriter:1] 2012-05-18 16:23:15,451 Memtable.java (line 283) > Completed flushing /var/lib/cassandra/data/system/LocationInfo-hc-37-Data.db > (163 bytes) > INFO [main] 2012-05-18 16:23:15,454 StorageService.java (line 989) Node > dsc2c.internal/10.58.249.118 state jump to normal > INFO [CompactionExecutor:1] 2012-05-18 16:23:15,461 CompactionTask.java (line > 115) Compacting > [SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo-hc-37-Data.db'), > > SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo-hc-35-Data.db'), > > SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo-hc-34-Data.db'), > > SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo-hc-36-Data.db')] > INFO [main] 2012-05-18 16:23:15,463 StorageService.java (line 619) > Bootstrap/Replace/Move completed! Now serving reads. > INFO [CompactionExecutor:1] 2012-05-18 16:23:15,507 CompactionTask.java (line > 223) Compacted to > [/var/lib/cassandra/data/system/LocationInfo-hc-38-Data.db,]. 890 to 558 > (~62% of original) bytes for 4 keys at 0.015651MB/s. Time: 34ms. > ERROR [MutationStage:1] 2012-05-18 16:23:15,664 RowMutationVerbHandler.java > (line 61) Error in row mutation > org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=1012 > at > org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:126) > at > org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:401) > at > org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:409) > at org.apache.cassandra.db.RowMutation.fromBytes(RowMutation.java:357) > at > org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:42) > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > ERROR [MutationStage:2] 2012-05-18 16:23:15,666 RowMutationVerbHandler.java > (line 61) Error in row mutation > org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=1010 > at > org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:126) > at > org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:401) > at > org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:409) > at org.apache.cassandra.db.RowMutation.fromBytes(RowMutation.java:357) > at > org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:42) > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > ERROR [ReadStage:1] 2012-05-18 16:23:15,668 AbstractCassandraDaemon.java > (line 139) Fatal exception in thread Thread[ReadStage:1,5,main] > java.lang.IllegalArgumentException: Unknown ColumnFamily UserData in keyspace > PRODUCTION > at org.apache.cassandra.config.Schema.getComparator(Schema.java:223) > at > org.apache.cassandra.db.ColumnFamily.getComparatorFor(ColumnFamily.java:320) > at > org.apache.cassandra.db.ReadCommand.getComparator(ReadCommand.java:92) > at > org.apache.cassandra.db.SliceByNamesReadCommand.<init>(SliceByNamesReadCommand.java:44) > at > org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:106) > at > org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:74) > at > org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:132) > at > org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:51) > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > ERROR [ReadStage:3] 2012-05-18 16:23:15,669 AbstractCassandraDaemon.java > (line 139) Fatal exception in thread Thread[ReadStage:3,5,main] > java.lang.IllegalArgumentException: Unknown ColumnFamily UserFraudParameters > in keyspace PRODUCTION > at org.apache.cassandra.config.Schema.getComparator(Schema.java:223) > at > org.apache.cassandra.db.ColumnFamily.getComparatorFor(ColumnFamily.java:320) > at > org.apache.cassandra.db.ReadCommand.getComparator(ReadCommand.java:92) > at > org.apache.cassandra.db.SliceByNamesReadCommand.<init>(SliceByNamesReadCommand.java:44) > at > org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:106) > at > org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:74) > at > org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:132) > at > org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:51) > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > ... > and so endlessly on > > > Thanks > Alex > > >
