Well, it seems I have nothing like this when I run a $ grep "Unknown host" /var/log/cassandra/system.log.
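
For reference, this is roughly what I checked on each node (a quick sketch; the cassandra.yaml path below is just the package-default location on our machines, adjust if your install differs), including the rpc_address setting mentioned below since that is what reportedly triggers the bug:

  # look for the startup exception Michal reported
  grep "Unknown host" /var/log/cassandra/system.log

  # double-check whether we actually run with rpc_address: 0.0.0.0
  grep -E "^(rpc_address|listen_address):" /etc/cassandra/cassandra.yaml
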
This issue was reported in 1.2.1 and the fix was committed to trunk. It may have been fixed in 1.2.2, even though I can't see the fix version in the JIRA ticket or in the changelog. Thanks again, even if I am still in trouble.

2013/3/14 Michal Michalski <mich...@opera.com>

> Just to make it clear: this bug will occur on a single-DC configuration too.
>
> In our case it resulted in an exception like this at the very end of node startup:
>
> ERROR [WRITE-/<SOME-IP>] 2013-02-27 12:14:55,433 CassandraDaemon.java (line 133) Exception in thread Thread[WRITE-/<SOME-IP>,5,main]
> java.lang.RuntimeException: Unknown host /0.0.0.0 with no default configured
>
> It will happen if your rpc_address is set to 0.0.0.0.
>
> M.
>
> On 14.03.2013 13:03, Alain RODRIGUEZ wrote:
>
>> Thanks for this pointer, but I don't think this is the source of our problem, since we use 1 data center and Ec2Snitch.
>>
>> 2013/3/14 Jean-Armel Luce <jaluc...@gmail.com>
>>
>>> Hi Alain,
>>>
>>> Maybe it is due to https://issues.apache.org/jira/browse/CASSANDRA-5299
>>>
>>> A patch is provided with this ticket.
>>>
>>> Regards.
>>>
>>> Jean Armel
>>>
>>> 2013/3/14 Alain RODRIGUEZ <arodr...@gmail.com>
>>>
>>>> Hi,
>>>>
>>>> We just tried to migrate our production cluster from C* 1.1.6 to 1.2.2.
>>>>
>>>> This has been a disaster. I just switched one node to 1.2.2, updated its configuration (cassandra.yaml / cassandra-env.sh) and restarted it.
>>>>
>>>> It resulted in errors on all 5 remaining 1.1.6 nodes:
>>>>
>>>> ERROR [RequestResponseStage:2] 2013-03-14 09:53:25,750 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[RequestResponseStage:2,5,main]
>>>> java.io.IOError: java.io.EOFException
>>>>         at org.apache.cassandra.service.AbstractRowResolver.preprocess(AbstractRowResolver.java:71)
>>>>         at org.apache.cassandra.service.ReadCallback.response(ReadCallback.java:155)
>>>>         at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:45)
>>>>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>         at java.lang.Thread.run(Thread.java:662)
>>>> Caused by: java.io.EOFException
>>>>         at java.io.DataInputStream.readFully(DataInputStream.java:180)
>>>>         at org.apache.cassandra.db.ReadResponseSerializer.deserialize(ReadResponse.java:100)
>>>>         at org.apache.cassandra.db.ReadResponseSerializer.deserialize(ReadResponse.java:81)
>>>>         at org.apache.cassandra.service.AbstractRowResolver.preprocess(AbstractRowResolver.java:64)
>>>>         ... 6 more
>>>>
>>>> I got this many times, and our entire cluster wasn't reachable by any of our 4 clients (phpCassa, Hector, Cassie, Helenus).
>>>>
>>>> I decommissioned the 1.2.2 node to get our cluster answering queries again. It worked.
>>>>
>>>> Then I tried to replace this node with a new C* 1.1.6 one using the same token as the decommissioned node. The node joined the ring and, before getting any data, switched to normal status.
>>>> In all the other nodes I had:
>>>>
>>>> ERROR [MutationStage:8] 2013-03-14 10:21:01,288 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[MutationStage:8,5,main]
>>>> java.lang.AssertionError
>>>>         at org.apache.cassandra.locator.TokenMetadata.getToken(TokenMetadata.java:304)
>>>>         at org.apache.cassandra.service.StorageProxy$5.runMayThrow(StorageProxy.java:371)
>>>>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>>>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>>>>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>         at java.lang.Thread.run(Thread.java:662)
>>>>
>>>> So I decommissioned this new 1.1.6 node, and we are now running with 5 servers, not balanced along the ring, with no possibility of adding nodes or upgrading the C* version.
>>>>
>>>> We are quite desperate over here.
>>>>
>>>> If someone has any idea of what could have happened and how to stabilize the cluster, it would be very much appreciated.
>>>>
>>>> It's quite an emergency since we can't add nodes and are under heavy load.
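
In case it helps to spot a mistake on my side: this is roughly what "switching one node to 1.2.2" looked like for us, as a sketch only (we use a package install managed through service; the drain / upgradesstables lines are the usual upgrade-guide steps, included for completeness rather than something I claim changes the outcome):

  nodetool -h localhost drain            # flush memtables and stop accepting writes on this node
  sudo service cassandra stop
  # install the 1.2.2 package, then merge our settings into the new
  # cassandra.yaml and cassandra-env.sh shipped with 1.2
  sudo service cassandra start
  nodetool -h localhost upgradesstables  # rewrite sstables into the new on-disk format once the node is back up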