Hi Boris,

Thanks for the suggestion, I didn't know there was such a setting. I believe I have finally figured it out, and it turns out my last two questions are related.
First, my batch loading was ignoring a bunch of rows when reading the first file, so it took hundreds of potential mutations for the problem to show up. Second, the ReplicateOnWriteStage error was generated by the batch mutations themselves, and it explains the TimedOutException: I was doing multiple mutations on the same key in one batch.

2011/8/8 Boris Yen <yulin...@gmail.com>

> Maybe you could try to adjust the setting "cassandraThriftSocketTimeout"
> of hector. https://github.com/rantav/hector/wiki/User-Guide
>
> On Mon, Aug 8, 2011 at 6:54 AM, Philippe <watche...@gmail.com> wrote:
>
>> Quick followup.
>> I have pushed the RPC timeout to 30s. Using Hector, I'm doing 1 thread
>> doing batches of 10 mutates at a time so that's even slower than when I was
>> doing 16 threads in parallel doing non-batched mutations.
>> After a couple hundred execute() calls, I get a timeout for every node; I
>> have a 15 second grace period between retries. tpstats indicate no pendings
>> on any of the nodes. I never recover from that.
>>
>> I then set the batch size to one and it seems to work a lot better. The
>> only difference I note is that the Mutator.execute() method returns a result
>> that sometimes has a null host and 0 microsecond time in the batch sizes of
>> ten but never in batch sizes of 1.
>>
>> I'm stumped! Any ideas?
>>
>> Thanks
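
For reference, here is a minimal sketch of one way to keep a row key from appearing more than once in a single batch, which is the situation described at the top of this message. It is only an illustration under assumptions: `KeySafeBatcher` and `PendingMutation` are hypothetical names, not Hector types, and the sketch makes no Hector or Cassandra calls. It only decides how pending mutations are grouped; each resulting group would then be sent with its own execute() call.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class KeySafeBatcher {

    /** Hypothetical stand-in for one pending mutation; not a Hector class. */
    public static final class PendingMutation {
        public final String rowKey;
        public final String column;
        public final long value;

        public PendingMutation(String rowKey, String column, long value) {
            this.rowKey = rowKey;
            this.column = column;
            this.value = value;
        }
    }

    /**
     * Splits mutations into batches, preserving their order, so that each
     * batch holds at most maxBatchSize entries and no row key twice.
     */
    public static List<List<PendingMutation>> split(
            List<PendingMutation> pending, int maxBatchSize) {
        if (maxBatchSize < 1) {
            throw new IllegalArgumentException("maxBatchSize must be >= 1");
        }
        List<List<PendingMutation>> batches = new ArrayList<>();
        List<PendingMutation> current = new ArrayList<>();
        Set<String> keysInCurrent = new HashSet<>();

        for (PendingMutation m : pending) {
            boolean full = current.size() >= maxBatchSize;
            boolean duplicateKey = keysInCurrent.contains(m.rowKey);
            // Close the current batch when it is full or already contains this key.
            if (full || duplicateKey) {
                batches.add(current);
                current = new ArrayList<>();
                keysInCurrent.clear();
            }
            current.add(m);
            keysInCurrent.add(m.rowKey);
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

With a batch size of 10 and pending row keys a, b, a, c, c, this grouping yields three batches: (a, b), (a, c) and (c). Whether splitting like this or merging the mutations for a key into one is the better fix depends on the data model, so treat it as a starting point only.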