Re: Simulating a failed node

Watanabe Maki Sat, 27 Oct 2012 21:37:45 -0700

What RF and CL are you using?
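
For context on why these two settings matter: Cassandra raises UnavailableException when fewer replicas of a key are alive than the consistency level requires. Below is a minimal Python sketch of that check, assuming simple replication in a single datacenter. The function names are hypothetical and are not part of Cassandra or the stress tool.

```python
def required_replicas(rf: int, cl: str) -> int:
    """Number of live replicas a request needs at the given consistency level.

    rf is the keyspace replication factor; cl is one of ONE, QUORUM, ALL.
    (Only these three levels are modelled in this sketch.)
    """
    cl = cl.upper()
    if cl == "ONE":
        return 1
    if cl == "QUORUM":
        return rf // 2 + 1
    if cl == "ALL":
        return rf
    raise ValueError("unsupported consistency level: " + cl)


def is_available(rf: int, cl: str, live_replicas: int) -> bool:
    """True if enough replicas of the key are alive to satisfy cl."""
    return live_replicas >= required_replicas(rf, cl)
```

For example, with RF 1 a key whose only replica lives on the downed node has zero live replicas, so `is_available(1, "ONE", 0)` is false and retrying cannot help. With RF 3 and one node down, `is_available(3, "QUORUM", 2)` is true.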


On 2012/10/28, at 13:13, Andrew Bialecki <andrew.biale...@gmail.com> wrote:

> Hey everyone,
> 
> I'm trying to simulate what happens when a node goes down to make sure my 
> cluster can gracefully handle node failures. My setup is a 3 node cluster 
> running 1.1.5. I'm running the stress tool included in 1.1.5 from an 
> external server with the following arguments:
> 
> tools/bin/cassandra-stress -d <server1>,<server2>,<server3> -n 1000000
> 
> I start the stress test and then take down one of the nodes. The stress test 
> fails immediately with errors like the following (the same error, reported 
> from different threads):
> 
>           ...
> Operation [158320] retried 10 times - error inserting key 0158320 ((UnavailableException))
> Operation [158429] retried 10 times - error inserting key 0158429 ((UnavailableException))
> Operation [158439] retried 10 times - error inserting key 0158439 ((UnavailableException))
> Operation [158470] retried 10 times - error inserting key 0158470 ((UnavailableException))
> 158534,0,0,NaN,43
> FAILURE
> 
> I'm sure my naive setup is flawed in some way, but what I was hoping for was 
> that when the node went down, it would fail to write to the downed node and 
> instead write to one of the other nodes in the cluster. So the question is: 
> why are writes failing even after a retry? It might be that the stress client 
> doesn't pool connections (I took a quick look, but may not have looked deeply 
> enough); however, I also tried specifying only the first two server nodes and 
> then downing the third, with the same failure.
> 
> Thanks in advance.
> 
> Andrew
