We have it set to 0.0.0.0, but anyway, as said before, I don't think our problem comes from this bug.
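For reference, the topology file Michal mentions maps each node's IP to a data center and rack; a minimal sketch of what such a file looks like (the IPs, DC and rack names below are hypothetical, not taken from this cluster):

```properties
# conf/cassandra-topology.properties (read by PropertyFileSnitch)
# Per Michal's point: a node whose address does not appear in this
# mapping can trigger the behaviour he describes.
10.0.0.1=DC1:RAC1
10.0.0.2=DC1:RAC2
# Fallback for nodes not listed above
default=DC1:RAC1
```

This only applies to PropertyFileSnitch; with Ec2Snitch (as used here) the DC/rack mapping is derived from EC2 region and availability zone instead.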
2013/3/14 Michal Michalski <mich...@opera.com>

>> It will happen if your rpc_address is set to 0.0.0.0.
>
> Ops, it's not what I meant ;-)
> It will happen if your rpc_address is set to an IP that is not defined in
> your cluster's config (e.g. in cassandra-topology.properties for
> PropertyFileSnitch).
>
> M.
>
>> M.
>>
>> On 14.03.2013 13:03, Alain RODRIGUEZ wrote:
>>
>>> Thanks for this pointer, but I don't think this is the source of our
>>> problem, since we use one data center and Ec2Snitch.
>>>
>>> 2013/3/14 Jean-Armel Luce <jaluc...@gmail.com>
>>>
>>>> Hi Alain,
>>>>
>>>> Maybe it is due to
>>>> https://issues.apache.org/jira/browse/CASSANDRA-5299
>>>>
>>>> A patch is provided with this ticket.
>>>>
>>>> Regards.
>>>>
>>>> Jean Armel
>>>>
>>>> 2013/3/14 Alain RODRIGUEZ <arodr...@gmail.com>
>>>>
>>>>> Hi,
>>>>>
>>>>> We just tried to migrate our production cluster from C* 1.1.6 to 1.2.2.
>>>>>
>>>>> This has been a disaster. I switched one node to 1.2.2, updated its
>>>>> configuration (cassandra.yaml / cassandra-env.sh) and restarted it.
>>>>>
>>>>> It resulted in errors on all 5 remaining 1.1.6 nodes:
>>>>>
>>>>> ERROR [RequestResponseStage:2] 2013-03-14 09:53:25,750
>>>>> AbstractCassandraDaemon.java (line 135) Exception in thread
>>>>> Thread[RequestResponseStage:2,5,main]
>>>>> java.io.IOError: java.io.EOFException
>>>>>     at org.apache.cassandra.service.AbstractRowResolver.preprocess(AbstractRowResolver.java:71)
>>>>>     at org.apache.cassandra.service.ReadCallback.response(ReadCallback.java:155)
>>>>>     at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:45)
>>>>>     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>>     at java.lang.Thread.run(Thread.java:662)
>>>>> Caused by: java.io.EOFException
>>>>>     at java.io.DataInputStream.readFully(DataInputStream.java:180)
>>>>>     at org.apache.cassandra.db.ReadResponseSerializer.deserialize(ReadResponse.java:100)
>>>>>     at org.apache.cassandra.db.ReadResponseSerializer.deserialize(ReadResponse.java:81)
>>>>>     at org.apache.cassandra.service.AbstractRowResolver.preprocess(AbstractRowResolver.java:64)
>>>>>     ... 6 more
>>>>>
>>>>> I got this many times, and our entire cluster wasn't reachable by our
>>>>> 4 clients (phpCassa, Hector, Cassie, Helenus).
>>>>>
>>>>> I decommissioned the 1.2.2 node to get our cluster answering queries
>>>>> again. It worked.
>>>>>
>>>>> Then I tried to replace this node with a new C* 1.1.6 node with the
>>>>> same token as the decommissioned one. The node joined the ring and
>>>>> switched to normal status before getting any data.
>>>>>
>>>>> On all the other nodes I had:
>>>>>
>>>>> ERROR [MutationStage:8] 2013-03-14 10:21:01,288
>>>>> AbstractCassandraDaemon.java (line 135) Exception in thread
>>>>> Thread[MutationStage:8,5,main]
>>>>> java.lang.AssertionError
>>>>>     at org.apache.cassandra.locator.TokenMetadata.getToken(TokenMetadata.java:304)
>>>>>     at org.apache.cassandra.service.StorageProxy$5.runMayThrow(StorageProxy.java:371)
>>>>>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>>>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>>>>>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>>     at java.lang.Thread.run(Thread.java:662)
>>>>>
>>>>> So I decommissioned this new 1.1.6 node, and we are now running with
>>>>> 5 servers, not balanced along the ring, with no possibility of adding
>>>>> nodes or upgrading the C* version.
>>>>>
>>>>> We are quite desperate over here.
>>>>>
>>>>> If someone has any idea of what could have happened and how to
>>>>> stabilize the cluster, it would be very much appreciated.
>>>>>
>>>>> It's quite an emergency, since we can't add nodes and are under
>>>>> heavy load.
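For completeness, the usual way to pin a replacement node to a decommissioned node's position on the ring is to set its token explicitly before the first start; a minimal sketch of that configuration (the token value below is a placeholder, not a real token from this cluster):

```yaml
# cassandra.yaml on the replacement 1.1.6 node
# initial_token should equal the token of the node being replaced
# (hypothetical value for illustration only)
initial_token: 85070591730234615865843651857942052864
auto_bootstrap: true
```

Depending on the exact 1.1.x version, starting the node with the `-Dcassandra.replace_token=<token>` JVM option is the documented way to take over a dead node's token while streaming its data; check the release's documentation before relying on either approach, since behaviour differs across versions.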