Re: nodetool repair fails after expansion

Dave Cowen Fri, 04 Oct 2013 14:30:30 -0700

I should clarify that we are running Cassandra 1.1.12.

Dave



On Fri, Oct 4, 2013 at 2:08 PM, Dave Cowen <[email protected]> wrote:

> We're testing expanding a 4-node cluster into an 8-node cluster, and we
> keep running into issues with the repair process near the end.
>
> We're bringing up nodes 1-by-1 into the cluster, retokening nodes for an
> 8-node configuration, running nodetool cleanup on the nodes after each
> retokening, and then increasing the replication factor to 5. This all works
> without issue, and the cluster appears to be healthy in that 8-node
> configuration with a replication factor of 5.
>
> However, when we then run nodetool repair on the nodes, it will at some
> point stall, even when being run on one of the new nodes.
>
> It doesn't appear to stall while it's performing a compaction or
> transferring CF data. We've monitored compactionstats and netstats closely,
> and things always stall when a repair command is started, i.e.:
>
> [2013-10-02 23:19:39,254] Starting repair command #9, repairing 5 ranges
> for keyspace ourkeyspace
>
> The last message from AntiEntropyService is usually something to the
> effect of:
>
> <190>Oct  3 00:01:02 myhost.com 1970947950 [AntiEntropySessions:24] INFO
>  org.apache.cassandra.service.AntiEntropyService  - [repair
> #9b17d310-2bbd-11e3-0000-e06ec6c436ff] session completed successfully
>
> ... and then the next repair never starts. Nothing in the logs looks
> related.
>
> Where this occurs is arbitrary. If I run on individual CFs within
> ourkeyspace, some will succeed, and some will fail, but if we start over
> and do the 4-node to 8-node expansion again, things will fail at a
> different place.
>
> Any advice on what to look at next?
>
> Thanks,
>
> Dave
>
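
For reference, the retokening step described above can be sketched as follows. This is only an illustration, not the commands actually used: it assumes the cluster runs RandomPartitioner (token space 0 to 2**127), which the message does not state, and the helper name balanced_tokens is hypothetical.

```python
# Sketch: evenly spaced initial tokens for an N-node ring.
# Assumption: RandomPartitioner, whose token space is [0, 2**127).
# The thread does not say which partitioner is in use.

RING_SIZE = 2 ** 127


def balanced_tokens(node_count):
    """Return node_count evenly spaced tokens around the ring (hypothetical helper)."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [i * RING_SIZE // node_count for i in range(node_count)]


if __name__ == "__main__":
    old_tokens = set(balanced_tokens(4))
    for index, token in enumerate(balanced_tokens(8)):
        # Mark tokens that already exist in the balanced 4-node layout.
        note = "already in 4-node layout" if token in old_tokens else "new position"
        print("node %d: %d (%s)" % (index, token, note))
```

Under that assumption, the balanced 4-node tokens are a subset of the balanced 8-node tokens, so doubling a balanced ring only requires placing new nodes at the midpoints. Whether that matches the retokening performed here depends on the original token assignments.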
