> 2) When I brought 2 nodes down (out of 3), I was able to start one node
> (with 66 % load below) even though auto_bootstrap is set to true. Shouldn't
> it have failed for the same reason?

This is a good question. As far as I can tell, a node being bootstrapped needs to receive data from enough replicas to satisfy the maximum consistency level that the application(s) use; otherwise it could violate the consistency guarantees that clients expect. Not knowing what the application expects, that would imply a quorum of nodes.

I just checked the code, and my reading (untested) is that the intent is to receive data from all nodes responsible for the part of the ring that is being taken over, which would satisfy the above requirement.

However, that reading is inconsistent with your test, which suggests you were able to bootstrap with two nodes missing out of three. Is your nodetool output from the new node or from the pre-existing online node? It only lists two nodes, rather than 3 or 4 (with some being Down). If the only remaining node doesn't know about the other two that are down, that may explain it.

I may be misreading the code, because it's suddenly unclear to me how this is supposed to work when nodes are down (supposing a node is truly down, forever, and needs to be replaced).

Anyone?

--
/ Peter Schuller
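
P.S. For reference, a rough sketch of the quorum arithmetic behind the argument above. This is not Cassandra's bootstrap code, only an illustration: the replication factor of 3 is an assumption (it has not been stated in this thread), and the function names are made up for the example.

```python
# Illustration only: hypothetical helpers, not part of Cassandra.

def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def can_satisfy(level: str, replication_factor: int, live_replicas: int) -> bool:
    """Return True if enough replicas are alive for the given consistency level."""
    required = {
        "ONE": 1,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }[level]
    return live_replicas >= required


# Assumed scenario: RF = 3, two of the three nodes down, so one live replica.
rf = 3
live = 1
for level in ("ONE", "QUORUM", "ALL"):
    print(level, can_satisfy(level, rf, live))
```

Under that assumption a single live replica covers ONE but not QUORUM or ALL, which is why I would have expected the bootstrap to be refused if it really required a quorum (or all) of the source replicas.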