| That would work, but until CASSANDRA-6961 [1] there is no way to prevent this node from having a long window where it may serve stale
| reads at CLs below QUORUM, until the rebuild completes.
Thanks Robert, this makes perfect sense.

Do you know if CASSANDRA-6961 will be ported to 1.2.x?

And apologies if these appear to be dumb questions, but is a repair more suitable than a rebuild because the rebuild only contacts 1 replica (per range), which may itself contain stale data?

Thanks
Matt

On Wed, Jun 4, 2014 at 11:03 AM, Robert Coli <rc...@eventbrite.com> wrote:

> On Tue, Jun 3, 2014 at 3:48 PM, Matthew Allen <matthew.j.al...@gmail.com>
> wrote:
>
>> Just out of curiosity, for a dead node, would it be possible to just
>>
>> - replace the node (no data in data/commit dirs), same IP Address, same
>> hostname.
>> - restore the cassandra.yaml (initial_token etc)
>> - set auto_bootstrap:false
>> - start it up and then run a nodetool rebuild?
>>
>> Or would the Host ID value change with the new node?
>>
>
> That would work, but until CASSANDRA-6961 [1] there is no way to prevent
> this node from having a long window where it may serve stale reads at CLs
> below QUORUM, until the rebuild completes.
>
> "rebuild" gets you exactly one replica's worth of data, just like
> bootstrap does. If you want to actually sync a node with all of its
> replicas and RF>2, you want "repair" and not "rebuild." I wish "rebuild"
> had been named something else, because people seem to think it does
> something it doesn't do. This property of decreasing what I call "unique
> replica count" is why people like me prefer to back up their nodes with
> something like tablesnap [2], so that losing a node does not decrease the
> "unique replica count." A simpler solution if you want to avoid the chance
> of inconsistency is to operate with CL.QUORUM instead of CL.ONE.
>
> You'd be better off leaving auto_bootstrap set to true and setting
> -Dcassandra.replace_address, which bootstraps you (from a single-replica
> source per range) to the token owned by the dead node. This is exactly like
> your process above, except that you don't serve stale reads while doing so.
>
> That said, the single-replica source thing is why people want to first
> bootstrap (which does the same single-replica source thing as "rebuild" but
> does not serve writes while it does so) and then repair and then, finally,
> join the ring. Note that if writes are incoming, this does not actually
> *close* the race window for stale reads at ONE, it just makes it much
> shorter.
>
> =Rob
> [1] https://issues.apache.org/jira/browse/CASSANDRA-6961
> [2] https://github.com/JeremyGrosser/tablesnap
>
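Rob's point that "rebuild" gets you exactly one replica's worth of data, while "repair" syncs a node with all of its replicas, can be illustrated with a toy model. The sketch below is a hypothetical Python simulation, not Cassandra code: the `Replica` class and the `read`, `rebuild`, and `repair` functions are invented for illustration. It assumes a single key, three replicas, last-write-wins reconciliation by timestamp, that a read at ONE may be answered by the empty replacement node, and that rebuild streams each range from one source replica that happens to have missed a write.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical toy model, NOT Cassandra's API.
# A cell is (value, write_timestamp); the newest timestamp wins.
Cell = Tuple[str, int]


@dataclass
class Replica:
    name: str
    data: Dict[str, Cell] = field(default_factory=dict)

    def write(self, key: str, value: str, ts: int) -> None:
        # Keep the incoming cell only if it is newer than what we hold.
        current = self.data.get(key)
        if current is None or ts > current[1]:
            self.data[key] = (value, ts)


def reconcile(cells: List[Optional[Cell]]) -> Optional[Cell]:
    """Return the newest cell among those found, or None if none were found."""
    present = [c for c in cells if c is not None]
    return max(present, key=lambda c: c[1]) if present else None


def read(replicas: List[Replica], key: str, cl: int) -> Optional[Cell]:
    """Toy read: ask the first `cl` replicas and keep the newest answer."""
    return reconcile([r.data.get(key) for r in replicas[:cl]])


def rebuild(target: Replica, source: Replica) -> None:
    """Toy rebuild/bootstrap: stream everything from ONE source replica."""
    for key, (value, ts) in source.data.items():
        target.write(key, value, ts)


def repair(replicas: List[Replica]) -> None:
    """Toy repair: every replica ends up with the newest cell for every key."""
    keys = set().union(*(r.data.keys() for r in replicas))
    for key in keys:
        newest = reconcile([r.data.get(key) for r in replicas])
        if newest is not None:
            for r in replicas:
                r.write(key, newest[0], newest[1])


# Scenario: a and b both got v1; only a got the later v2 (b missed it).
a, b = Replica("a"), Replica("b")
for r in (a, b):
    r.write("k", "v1", ts=1)
a.write("k", "v2", ts=2)

c = Replica("c")                      # replacement node with empty data dirs
stale_one = read([c, a, b], "k", cl=1)
print(stale_one)                      # None: the read at ONE hit the empty node

rebuild(c, source=b)
after_rebuild = c.data["k"]
print(after_rebuild)                  # ('v1', 1): one replica's worth, and it was stale

repair([a, b, c])
after_repair = c.data["k"]
print(after_repair)                   # ('v2', 2): synced against all replicas
```

The single-source `rebuild` is only as good as the replica it streams from, which is the hazard Matt asks about; the toy `repair` compares all replicas and converges on the newest value.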