We have it set to 0.0.0.0, but anyway, as said before, I don't think our problem comes from this bug.
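For reference, the topology file Michal mentions maps each node's IP to a data center and rack; a minimal sketch of what such a file looks like (the IPs, DC and rack names below are hypothetical, not taken from this cluster):

```properties
# conf/cassandra-topology.properties (read by PropertyFileSnitch)
# Per Michal's point: a node whose address does not appear in this
# mapping can trigger the behaviour he describes.
10.0.0.1=DC1:RAC1
10.0.0.2=DC1:RAC2
# Fallback for nodes not listed above
default=DC1:RAC1
```

This only applies to PropertyFileSnitch; with Ec2Snitch (as used here) the DC/rack mapping is derived from EC2 region and availability zone instead.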
2013/3/14 Michal Michalski <mich...@opera.com>

>> It will happen if your rpc_address is set to 0.0.0.0.
>
> Ops, it's not what I meant ;-)
> It will happen if your rpc_address is set to an IP that is not defined in
> your cluster's config (e.g. in cassandra-topology.properties for
> PropertyFileSnitch).
>
> M.
>
>> M.
>>
>> On 14.03.2013 13:03, Alain RODRIGUEZ wrote:
>>
>>> Thanks for this pointer, but I don't think this is the source of our
>>> problem, since we use one data center and Ec2Snitch.
>>>
>>> 2013/3/14 Jean-Armel Luce <jaluc...@gmail.com>
>>>
>>>> Hi Alain,
>>>>
>>>> Maybe it is due to
>>>> https://issues.apache.org/jira/browse/CASSANDRA-5299
>>>>
>>>> A patch is provided with this ticket.
>>>>
>>>> Regards.
>>>>
>>>> Jean Armel
>>>>
>>>> 2013/3/14 Alain RODRIGUEZ <arodr...@gmail.com>
>>>>
>>>>> Hi,
>>>>>
>>>>> We just tried to migrate our production cluster from C* 1.1.6 to 1.2.2.
>>>>>
>>>>> This has been a disaster. I switched one node to 1.2.2, updated its
>>>>> configuration (cassandra.yaml / cassandra-env.sh) and restarted it.
>>>>>
>>>>> It resulted in errors on all 5 remaining 1.1.6 nodes:
>>>>>
>>>>> ERROR [RequestResponseStage:2] 2013-03-14 09:53:25,750
>>>>> AbstractCassandraDaemon.java (line 135) Exception in thread
>>>>> Thread[RequestResponseStage:2,5,main]
>>>>> java.io.IOError: java.io.EOFException
>>>>>     at org.apache.cassandra.service.AbstractRowResolver.preprocess(AbstractRowResolver.java:71)
>>>>>     at org.apache.cassandra.service.ReadCallback.response(ReadCallback.java:155)
>>>>>     at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:45)
>>>>>     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>>     at java.lang.Thread.run(Thread.java:662)
>>>>> Caused by: java.io.EOFException
>>>>>     at java.io.DataInputStream.readFully(DataInputStream.java:180)
>>>>>     at org.apache.cassandra.db.ReadResponseSerializer.deserialize(ReadResponse.java:100)
>>>>>     at org.apache.cassandra.db.ReadResponseSerializer.deserialize(ReadResponse.java:81)
>>>>>     at org.apache.cassandra.service.AbstractRowResolver.preprocess(AbstractRowResolver.java:64)
>>>>>     ... 6 more
>>>>>
>>>>> I got this many times, and our entire cluster wasn't reachable by our
>>>>> 4 clients (phpCassa, Hector, Cassie, Helenus).
>>>>>
>>>>> I decommissioned the 1.2.2 node to get our cluster answering queries
>>>>> again. It worked.
>>>>>
>>>>> Then I tried to replace this node with a new C* 1.1.6 node with the
>>>>> same token as the decommissioned one. The node joined the ring and
>>>>> switched to normal status before getting any data.
>>>>>
>>>>> On all the other nodes I had:
>>>>>
>>>>> ERROR [MutationStage:8] 2013-03-14 10:21:01,288
>>>>> AbstractCassandraDaemon.java (line 135) Exception in thread
>>>>> Thread[MutationStage:8,5,main]
>>>>> java.lang.AssertionError
>>>>>     at org.apache.cassandra.locator.TokenMetadata.getToken(TokenMetadata.java:304)
>>>>>     at org.apache.cassandra.service.StorageProxy$5.runMayThrow(StorageProxy.java:371)
>>>>>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>>>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>>>>>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>>     at java.lang.Thread.run(Thread.java:662)
>>>>>
>>>>> So I decommissioned this new 1.1.6 node, and we are now running with
>>>>> 5 servers, not balanced along the ring, with no possibility of adding
>>>>> nodes or upgrading the C* version.
>>>>>
>>>>> We are quite desperate over here.
>>>>>
>>>>> If someone has any idea of what could have happened and how to
>>>>> stabilize the cluster, it would be very much appreciated.
>>>>>
>>>>> It's quite an emergency, since we can't add nodes and are under
>>>>> heavy load.
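For completeness, the usual way to pin a replacement node to a decommissioned node's position on the ring is to set its token explicitly before the first start; a minimal sketch of that configuration (the token value below is a placeholder, not a real token from this cluster):

```yaml
# cassandra.yaml on the replacement 1.1.6 node
# initial_token should equal the token of the node being replaced
# (hypothetical value for illustration only)
initial_token: 85070591730234615865843651857942052864
auto_bootstrap: true
```

Depending on the exact 1.1.x version, starting the node with the `-Dcassandra.replace_token=<token>` JVM option is the documented way to take over a dead node's token while streaming its data; check the release's documentation before relying on either approach, since behaviour differs across versions.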