What RF and CL are you using?
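
The reason I ask: as far as I understand it, a write raises UnavailableException when fewer replicas for that key are alive than the consistency level requires. The coordinator does not redirect the write to a node that is not a replica for the key. Here is a rough sketch of that check, not Cassandra's actual code. The function names are made up for illustration, and it ignores CL ANY and hinted handoff. I also believe the stress tool creates its keyspace with RF=1 unless told otherwise, but please verify that on your setup.

```python
def required_replicas(cl: str, rf: int) -> int:
    """Number of replica acknowledgements a consistency level needs.

    Hypothetical helper for illustration; covers only the common levels.
    """
    levels = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": rf // 2 + 1,  # majority of the replicas
        "ALL": rf,
    }
    return levels[cl]


def is_available(cl: str, rf: int, live_replicas: int) -> bool:
    """True if enough replicas for the key are alive to satisfy the CL."""
    return live_replicas >= required_replicas(cl, rf)


if __name__ == "__main__":
    # RF=1: a key whose only replica is the downed node cannot be written.
    print(is_available("ONE", rf=1, live_replicas=0))
    # RF=3 with one node down: QUORUM still succeeds, ALL does not.
    print(is_available("QUORUM", rf=3, live_replicas=2))
    print(is_available("ALL", rf=3, live_replicas=2))
```

If that matches your setup (RF=1 on 3 nodes), roughly a third of the keys would have their only replica on the downed node, and retrying would not help until it comes back.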

On 2012/10/28, at 13:13, Andrew Bialecki <andrew.biale...@gmail.com> wrote:

> Hey everyone,
>
> I'm trying to simulate what happens when a node goes down to make sure my
> cluster can gracefully handle node failures. For my setup I have a 3 node
> cluster running 1.1.5. I'm then using the stress tool included in 1.1.5
> coming from an external server and running it with the following arguments:
>
> tools/bin/cassandra-stress -d <server1>,<server2>,<server3> -n 1000000
>
> I start up the stress test and then down one of the nodes. The stress test
> instantly fails with the following errors (which of course are the same error
> from different threads) looking like:
>
> ...
> Operation [158320] retried 10 times - error inserting key 0158320 ((UnavailableException))
> Operation [158429] retried 10 times - error inserting key 0158429 ((UnavailableException))
> Operation [158439] retried 10 times - error inserting key 0158439 ((UnavailableException))
> Operation [158470] retried 10 times - error inserting key 0158470 ((UnavailableException))
> 158534,0,0,NaN,43
> FAILURE
>
> I'm sure my naive setup is flawed in some way, but what I was hoping for was
> when the node went down it would fail to write to the downed node and instead
> write to one of the other nodes in the clusters. So question is why are
> writes failing even after a retry? It might be the stress client doesn't pool
> connections (I took a quick look, but might've not looked deeply enough),
> however I also tried only specifying the first two server nodes and then
> downing the third with the same failure.
>
> Thanks in advance.
>
> Andrew