Hi,

We are currently implementing Solr Cloud and, as part of this effort, we are investigating which failure modes may occur between Solr and Zookeeper.

We have found quite a lot of articles describing the "happy path" failure, where ZK stops (loses its majority) and the Solr cluster ceases to serve write requests (while reads continue to work as expected). Once the ZK cluster is reconciled and a majority is achieved again, everything continues working as expected.

What we have not been able to find is what happens when the ZK cluster fails catastrophically and loses its data, either completely (scenario A) or by being restarted from a backup (scenario B).

So now the questions:

1) Scenario A - Is an existing Solr Cloud cluster able to start against a clean Zookeeper and reconstruct all the ZK data from its internal state (using some kind of emergency recovery; it may take a long time)?

2) Scenario B - What is the worst-case backup/restore scenario? For example, when:
   a. ZK is backed up
   b. The cluster performs some transition between states "X -> Y" (such as a shard commit, a new leader election, etc.)
   c. ZK fails completely
   d. ZK is restored from the backup created in step a
   e. Solr Cloud is in state "Y", while ZK is in state "X"

Thanks in advance,
Pavel