Hi,

There are no votes, no elections and there are no leader nodes.
CouchDB chooses availability over consistency and will accept reads and writes even if only one node hosting the shard ranges being read or written is up. In a 3-node, 3-replica cluster, where every node hosts a copy of every shard, any single node being up is enough for all reads and writes to succeed.

Every node in the cluster can coordinate a read or write. The coordinator creates N concurrent, independent requests and sends them to the nodes that the shard map indicates for that document id. It then waits for a quorum of replies, up to the request timeout, before merging them into the HTTP response to the client. For writes, a 201 is returned if quorum was reached, and a 202 if at least one node accepted the write but quorum was not reached; reads return 200 whether quorum was reached or not, the difference being that you get a faster reply when quorum is reached and otherwise wait for the timeout. When CouchDB believes nodes to be down, the quorum is implicitly lowered to avoid that latency penalty. (There is a small client-side sketch of how those status codes show up appended below the quoted message.)

In your scenario the two offline nodes would not get the writes at the time, for obvious reasons, but once they are up again they will receive those writes from the surviving nodes, restoring the expected N level of redundancy.

B.

> On 14 Jun 2023, at 07:11, Luca Morandini <luca.morandi...@gmail.com> wrote:
> 
> Folks,
> 
> A student (I teach CouchDB as part of a Cloud Computing course) pointed
> out that, on a 4-node, 3-replica cluster, the database should stop
> accepting requests when 2 nodes are down.
> 
> His rationale is: the quorum (assuming its default value of 2) in principle
> can be reached, but since some of the shards are not present on both nodes,
> the quorum of replicas cannot be reached even when there are still 2 nodes
> standing.
> 
> This did not chime with my experience, hence I did a little experiment:
> - set up a cluster with 4 nodes and cluster parameters set to
>   "q":8,"n":3,"w":2,"r":2;
> - created a database;
> - added a few documents;
> - stopped 2 nodes out of 4;
> - added another 10,000 documents without a hiccup.
> 
> I checked the two surviving nodes, and there were 6 shard files
> representing the 8 shards on each node: 4 shards were replicated, and 4
> were not.
> Therefore, about 5,000 of the write operations must have hit the
> un-replicated shards.
> 
> In other words, who has the vote in a quorum election: all the nodes, or
> only the nodes that host the shard with the sought document?
> 
> Cheers,
> 
> Luca Morandini
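As a concrete illustration (not from the original thread), here is a minimal Python sketch of how those status codes surface to a client. It assumes the "requests" library, a cluster reachable at 127.0.0.1:5984 with admin/secret credentials, and a database called "demo" — all made-up details for the example; the behaviour it checks (201 on a quorum write, 202 when at least one replica accepted the write, 200 on reads) is the one described above.

    import requests

    # Assumed endpoint and credentials for this sketch only.
    BASE = "http://admin:secret@127.0.0.1:5984"

    # Write: the coordinator fans the request out to the replica nodes for
    # this document's shard range and answers once it has enough replies.
    resp = requests.put(f"{BASE}/demo/example-doc", json={"value": 42})
    if resp.status_code == 201:
        print("stored; write quorum (w) was met")
    elif resp.status_code == 202:
        print("accepted by at least one replica; write quorum not met")
    else:
        print("write failed:", resp.status_code, resp.text)

    # Read: 200 whether or not the read quorum was met; the visible
    # difference is only latency when replica nodes are slow or down.
    resp = requests.get(f"{BASE}/demo/example-doc")
    print(resp.status_code, resp.json())

Against a healthy cluster you should only ever see 201 here; 202 shows up when enough of the replica nodes for that shard range are down or slow that the write quorum cannot be met before the coordinator stops waiting.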