In a few words:
Bootstrap one node at a time
Wait for the bootstrap to complete
Move on to the next node
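You can check the "wait for the bootstrap to complete" step with `nodetool status`. A node that is still bootstrapping is listed as UJ (Up/Joining), and it switches to UN (Up/Normal) once it has joined the ring. Below is a minimal Python sketch. It assumes `nodetool` is on the PATH of the machine running the script and that the new node is identified by the address shown in the status output. The function names, polling interval and timeout are hypothetical choices and are not part of Cassandra's tooling.

```python
import re
import subprocess
import time
from typing import Callable, Dict

# Matches data rows of `nodetool status`, e.g. "UN  10.0.0.5  1.2 GB  256 ...".
# First letter: U(p)/D(own); second: N(ormal)/L(eaving)/J(oining)/M(oving).
_ROW = re.compile(r"^([UD][NLJM])\s+(\S+)")


def parse_status(output: str) -> Dict[str, str]:
    """Map each node address to its two-letter status/state code."""
    states: Dict[str, str] = {}
    for line in output.splitlines():
        m = _ROW.match(line.strip())
        if m:
            states[m.group(2)] = m.group(1)
    return states


def run_nodetool_status() -> str:
    """Return the raw text of `nodetool status` (assumes nodetool is on PATH)."""
    return subprocess.run(
        ["nodetool", "status"], capture_output=True, text=True, check=True
    ).stdout


def wait_until_normal(
    address: str,
    fetch: Callable[[], str] = run_nodetool_status,
    poll_seconds: float = 30.0,
    timeout_seconds: float = 24 * 3600,
) -> None:
    """Block until `address` is listed as UN (Up/Normal), i.e. it has joined."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = parse_status(fetch()).get(address)
        if state == "UN":
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{address} still in state {state!r}")
        time.sleep(poll_seconds)
```

For example, `wait_until_normal("10.0.0.12")` (a hypothetical address) blocks until that node shows as UN. Only then would you start Cassandra on the next new node. The `fetch` parameter exists so the parsing and waiting logic can be exercised without a live cluster.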
More details: datastax.com/docs (C* 2.0)

Before decommissioning: run nodetool cleanup.
Don't forget to run repairs (one node at a time); this should be a regular admin task anyway.

> On 17.08.2014 at 15:46, Maxime <maxim...@gmail.com> wrote:
>
> Is there some unwritten wisdom with regard to the use of 'nodetool compact'
> before bootstrapping new nodes and decommissioning old ones?
>
> TL;DR:
> I've spent the last few days trying to move a cluster from DigitalOcean
> 2GB machines to 4GB machines (same provider). To do so I wanted to create the
> new nodes, bootstrap them, then decommission the old ones (one by one seems
> to be the only available option).
>
> Bootstrapping kept failing; eventually I figured out it was somehow
> related to the TombstoneOverwhelmingException on the new nodes. I issued a
> 'nodetool compact' on the entire cluster to try to minimize the number of
> tombstones. Once that was done I was able to bootstrap all my new nodes.
>
> Now it is time to decommission. Starting with the very first node I tried to
> decommission, one node keeps dying after an almost endless loop of
> "GC for ConcurrentMarkSweep" messages showing the heap getting fuller and fuller.
> On one node I was able to bump MAX_HEAP_SIZE by 400MB and get it to work
> (it was a 4GB node), but now I'm getting the same symptoms on a 2GB node
> where the heap is already as big as it can be without the OS itself running
> out of RAM, so I can't increase MAX_HEAP_SIZE. It would seem I have really
> painted myself into a scrap-the-cluster kind of corner.
>
> Not knowing the inner workings of Cassandra's bootstrap and decommission
> mechanisms, all I can do is make an educated guess that perhaps running
> another 'nodetool compact' on the nodes I'm about to decommission might help.
> However, I have not found any wisdom or documentation relating to this,
> which I find surprising, as I can't be the first to have had this
> problem.
>
> BOTTOM LINE:
> Does anyone have a real-world production process for efficiently and reliably
> bootstrapping and decommissioning nodes in a cluster? It seems it might look like
> <compact all>, <bootstrap one-by-one>, <compact all>, <decommission
> one-by-one (really?!?)>. Or are all my problems due to my running on
> "hardware" that doesn't have resources (RAM, CPU) to spare in the first place?
>
> Thanks
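The question proposes a cycle of <compact all> steps. The order recommended at the top of this reply is different: bootstrap one node at a time, run cleanup before decommissioning, then decommission one node at a time. That order can be written down as an explicit plan. Below is a minimal Python sketch under several stated assumptions. The node names, the step format and the helper names are hypothetical. Cleanup is assumed to run on the nodes that were in the ring before the new ones joined, which is the usual DataStax advice after adding nodes. `nodetool -h <host>` is assumed to reach each node over JMX. Repair is left out because the reply treats it as a regular maintenance task, not a migration step.

```python
from typing import List, Tuple

# A step is (kind, host, detail); kinds are "start", "wait_normal", "nodetool".
# This step format is a hypothetical convention for this sketch only.
Step = Tuple[str, str, str]


def migration_plan(old_nodes: List[str], new_nodes: List[str]) -> List[Step]:
    """Order the operations for replacing old_nodes with new_nodes."""
    steps: List[Step] = []
    # 1. Bootstrap new nodes strictly one at a time: start one, wait until it
    #    is listed as UN in `nodetool status`, then move on to the next.
    for node in new_nodes:
        steps.append(("start", node, "start Cassandra (auto_bootstrap at its default, true)"))
        steps.append(("wait_normal", node, "poll nodetool status until the node is UN"))
    # 2. Before decommissioning: cleanup on the previously existing nodes.
    for node in old_nodes:
        steps.append(("nodetool", node, "cleanup"))
    # 3. Decommission old nodes one at a time; steps are listed in sequence so
    #    each one can be run to completion before the next starts.
    for node in old_nodes:
        steps.append(("nodetool", node, "decommission"))
    return steps


def to_command(step: Step) -> List[str]:
    """Turn a nodetool step into an argv list (assumes JMX reachable via -h)."""
    kind, host, detail = step
    if kind != "nodetool":
        raise ValueError(f"{kind!r} steps are not nodetool commands")
    return ["nodetool", "-h", host, detail]
```

Only the nodetool steps map to commands. Starting a node is left as a manual step because in C* 2.0 bootstrap is triggered by starting Cassandra on the new machine, not by a nodetool command.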