> Short answer, yes it's safe to kill Cassandra during a repair. It's one of
> the nice things about never mutating data.
>
> Longer answer: If nodetool compactionstats says there are no Validation
> compactions running (and the compaction queue is empty) and netstats says
> there is nothing streaming, there is a good chance the repair is finished
> or dead. If a neighbour dies during a repair, the node it was started on will
> wait for 48 hours(?) until it times out. Check the logs on the machines for
> errors, particularly from the AntiEntropyService. And see what
> compactionstats is saying on all the nodes involved in the repair.

Thanks Aaron. One of the neighboring nodes did go down due to running out of memory, so I will make sure the repair is dead and start it again per column family.
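
For anyone who wants to script the check described above, here is a rough Python sketch. It is not an official tool. The helper names are made up for illustration, and the strings it looks for (the "pending tasks: N" line, the word "Validation", and the streaming markers) are assumptions about what nodetool prints; the wording differs between Cassandra versions, so compare against real output from your own cluster first.

```python
import re
import subprocess

# Assumption: line prefixes that appear in netstats output while streams are
# active. The exact wording varies by Cassandra version; adjust as needed.
STREAM_MARKERS = ("Streaming to", "Streaming from", "Receiving", "Sending")


def validation_or_pending(compactionstats_text):
    """True if the text mentions a Validation compaction or pending tasks > 0."""
    if "Validation" in compactionstats_text:
        return True
    # Assumption: the output contains a line such as "pending tasks: 3".
    match = re.search(r"pending tasks:\s*(\d+)", compactionstats_text)
    return bool(match) and int(match.group(1)) > 0


def is_streaming(netstats_text, markers=STREAM_MARKERS):
    """True if any line, ignoring indentation, starts with a streaming marker."""
    return any(
        line.strip().startswith(markers) for line in netstats_text.splitlines()
    )


def repair_looks_idle(compactionstats_text, netstats_text):
    """True if neither output shows repair-related activity on this node."""
    return not validation_or_pending(compactionstats_text) and not is_streaming(
        netstats_text
    )


def nodetool(host, command):
    """Run `nodetool -h <host> <command>` and return its standard output."""
    result = subprocess.run(
        ["nodetool", "-h", host, command],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Something like `repair_looks_idle(nodetool(h, "compactionstats"), nodetool(h, "netstats"))` would need to be run against every node involved in the repair. An idle result only means the repair is finished or dead, so the logs still need checking to tell which.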

> Even Longer: um, 3 TB of data is *way* too much data per node; generally
> happy people have up to about 200 to 300 GB per node. The reason for this
> recommendation is so that things like repair, compaction, node moves, etc.
> are manageable and because the loss of a single node has less of an impact.
> I would not recommend running a live system with that much data per node.

Thanks for the advice, and this can be a separate discussion, but that would make a Cassandra cluster way too costly: we would have to buy 16 systems for the same amount of data, as opposed to the 4 that we have now, and my IT director will strangle me.

-Adi