Hi,

We just tried to migrate our production cluster from C* 1.1.6 to 1.2.2.
This has been a disaster. I switched just one node to 1.2.2, updated its configuration (cassandra.yaml / cassandra-env.sh) and restarted it. This caused errors on all 5 remaining 1.1.6 nodes:

ERROR [RequestResponseStage:2] 2013-03-14 09:53:25,750 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[RequestResponseStage:2,5,main]
java.io.IOError: java.io.EOFException
    at org.apache.cassandra.service.AbstractRowResolver.preprocess(AbstractRowResolver.java:71)
    at org.apache.cassandra.service.ReadCallback.response(ReadCallback.java:155)
    at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:45)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readFully(DataInputStream.java:180)
    at org.apache.cassandra.db.ReadResponseSerializer.deserialize(ReadResponse.java:100)
    at org.apache.cassandra.db.ReadResponseSerializer.deserialize(ReadResponse.java:81)
    at org.apache.cassandra.service.AbstractRowResolver.preprocess(AbstractRowResolver.java:64)
    ... 6 more

I got this error many times, and our entire cluster became unreachable for all 4 of our clients (phpCassa, Hector, Cassie, Helenus). I decommissioned the 1.2.2 node to get the cluster answering queries again, and that worked. Then I tried to replace this node with a new C* 1.1.6 node using the same token as the decommissioned one. The node joined the ring and switched to Normal status before receiving any data.
On all the other nodes I got:

ERROR [MutationStage:8] 2013-03-14 10:21:01,288 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[MutationStage:8,5,main]
java.lang.AssertionError
    at org.apache.cassandra.locator.TokenMetadata.getToken(TokenMetadata.java:304)
    at org.apache.cassandra.service.StorageProxy$5.runMayThrow(StorageProxy.java:371)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

So I decommissioned this new 1.1.6 node as well. We are now running on 5 servers that are unbalanced around the ring, with no way to add nodes or upgrade the C* version.

We are quite desperate over here. If anyone has an idea of what could have happened and how to stabilize the cluster, it would be very much appreciated. This is an emergency, since we can't add nodes and are under heavy load.