Re: Error when bringing up nodes during failure testing

2011-03-08 Thread aaron morton
It looks like the node is sending out its application state and waiting for the required time, after which it expects to know about all other nodes in the cluster.
INFO [main] 2011-03-07 17:04:06,660 StorageService.java (line 399) Joining: sleeping 3 ms for pending range setup
For some reason

Re: Error when bringing up nodes during failure testing

2011-03-08 Thread Jonathan Ellis
Is he trying to bootstrap? What does that have to do with failure recovery? Doesn't make sense to me.
On Tue, Mar 8, 2011 at 2:33 AM, aaron morton aa...@thelastpickle.com wrote:
It looks like the node is sending out its application state and waiting for the required time, after which it expects to

Re: Error when bringing up nodes during failure testing

2011-03-08 Thread mcasandra
I turned auto_bootstrap off and it worked fine. I don't think it's a connectivity or network issue at all. I am very confused about what's going on here. Can you please let me know if this is a bug that I am facing? Also, what are the disadvantages of turning off auto bootstrap? Do I need to

Re: Error when bringing up nodes during failure testing

2011-03-08 Thread Peter Schuller
Also, what are the disadvantages of turning off auto bootstrap? Do I need to do anything after the fact?
Inserting a new node into a ring without auto_bootstrap implies that it will join the ring, but will not contain any of the data for which it is supposedly responsible. A 'nodetool repair' should

Re: Error when bringing up nodes during failure testing

2011-03-08 Thread Peter Schuller
2) When I brought 2 nodes down (out of 3), I was able to start one node (with 66% load below) even though auto_bootstrap is set to true. Shouldn't it have failed for the same reason?
This is a good point/question. As far as I can tell, a node being bootstrapped would need to receive data from

Re: Error when bringing up nodes during failure testing

2011-03-08 Thread mcasandra
I am as clear as mud with what is happening here :) But with some suggestions I can try to start my test from scratch and post results in that order.

Re: Error when bringing up nodes during failure testing

2011-03-07 Thread aaron morton
It's failing because when the node bootstraps it does not know about enough nodes to support the RF...
replication factor (3) exceeds number of endpoints (2)
I *think* the normal workaround is to disable auto_bootstrap, bring the nodes up, then run nodetool join or StorageService.joinRing()
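To illustrate the error quoted in the message above, here is a rough sketch of the kind of check that would produce it. This is not Cassandra's code: the function name check_replication and the exception type are hypothetical, made up for this example, and only the wording of the message is taken from the thread.

```python
class InsufficientEndpointsError(Exception):
    """Hypothetical error type, used only for this illustration."""


def check_replication(replication_factor, known_endpoints):
    """Raise if fewer distinct endpoints are known than the replication factor.

    Assumption: a bootstrapping node can only place replicas on endpoints
    it has already learned about, so RF must not exceed that count.
    """
    endpoint_count = len(set(known_endpoints))
    if replication_factor > endpoint_count:
        raise InsufficientEndpointsError(
            f"replication factor ({replication_factor}) exceeds "
            f"number of endpoints ({endpoint_count})"
        )
    return endpoint_count


if __name__ == "__main__":
    # A node that has learned about only two endpoints, with RF = 3.
    try:
        check_replication(3, ["10.0.0.1", "10.0.0.2"])
    except InsufficientEndpointsError as err:
        print(err)
```

With three known endpoints the same call would pass, which fits the suggestion of bringing the nodes up first and joining the ring afterwards.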

Re: Error when bringing up nodes during failure testing

2011-03-07 Thread aaron morton
1) yes
2) um, not sure. The nodetool output below looks like there are only 2 nodes in that cluster, i.e. there are no down nodes.
Aaron
On 8/03/2011, at 2:11 PM, mcasandra wrote:
aaron morton wrote: It's failing because when the node bootstraps it does not know about enough nodes to

Re: Error when bringing up nodes during failure testing

2011-03-07 Thread mcasandra
aaron morton wrote: 2) um, not sure. The nodetool output below looks like there are only 2 nodes in that cluster, i.e. there are no down nodes.
There are actually 3 nodes. Not sure why it's not showing the other node in the output which is currently down. The error I am getting is from the

Re: Error when bringing up nodes during failure testing

2011-03-05 Thread aaron morton
) at org.apache.cassandra.locator.AbstractReplicationStrategy.getRangeAddresses(AbstractReplicationStrategy.java:204)

Re: Error when bringing up nodes during failure testing

2011-03-05 Thread mcasandra
aaron morton wrote: Can you include the full error stack? It's failing because of the reason stated. But I need some more info to understand what part of the startup process it's stuck at.
Thanks for responding! I'll send it as soon as I can get on my network. But you mentioned that