Perhaps it wouldn't be too hard to make CouchDB scalable:

a) When creating a view: add an option for a "key-range map",
   maybe via a callback: "byte range -> shard". Something like:
   from "aa" to "bb" -> node-0; and so on.
b) During "indexing": store the emitted view index on the shard
   according to the range map. Similar to a partition key, but
   scalable.

c) During a "query", e.g. startkey=fooAAA&endkey=fooBBB&limit=10:
   map-reduce queries "fooAAA" from shard-8 and "fooBBB" from shard-9.

CouchDB scales out to 10000+ nodes, but the query only asks 1, maybe 2
shards.

On Wednesday, 2024-06-12 at 16:47 -0400, Nick Vatamaniuc wrote:
> On Wed, Jun 12, 2024 at 3:54 PM Markus Doppelbauer
> <doppelba...@gmx.net.invalid> wrote:
> >
> > Dear Nick,
> >
> > This means:
> > If Q and N are large enough to distribute the data to all nodes,
> > e.g. 120 nodes, Q=40, N=3,
> > then the view query startkey=foobar&endkey=foobaz
> > has to ask all 120 nodes?
>
> In the initial startup phase it could query all 120 nodes; then it
> will pick 40 shard workers and will stream from those only.
>
> You do have some control over where each of the 3 copies of shards
> ends up:
> https://docs.couchdb.org/en/stable/cluster/databases.html#placing-a-database-on-specific-nodes
>
> > If I use a partition key, is it possible to distribute the
> > partitioned view among multiple nodes?
> > The docs say it should stay under 10GB.
>
> A partition cannot be split across multiple shard ranges. A single
> shard range can contain multiple partitions. So, for a given
> partition key, it would pick a particular shard range from Q, and
> then it would pick one of the N copies (usually 3) to stream from.
>
> Cheers,
>
> > Markus
> >
> > On Wednesday, 2024-06-12 at 13:54 -0400, Nick Vatamaniuc wrote:
> > > Another feature related to efficient view querying is partitioned
> > > databases:
> > > https://docs.couchdb.org/en/stable/partitioned-dbs/index.html
> > > It's a bit of a niche, as you'd need to have a good partition
> > > key, but aside from that, it can speed up your queries as
> > > responses would be coming from a single shard only instead of Q
> > > shards.
> > > On Wed, Jun 12, 2024 at 1:30 PM Markus Doppelbauer
> > > <doppelba...@gmx.net.invalid> wrote:
> > > > Hi Nick,
> > > > Thank you very much for your reply. This is exactly what we are
> > > > looking for. There are so many DBs that store the secondary
> > > > index locally (Cassandra, Aerospike, ScyllaDB, ...).
> > > > Thanks again for the answer.
> > > > Marcus
> > > >
> > > > On Wednesday, 2024-06-12 at 13:23 -0400, Nick Vatamaniuc wrote:
> > > > > Hi Marcus,
> > > > > The node handling the request only queries the nodes with
> > > > > shard copies of that database. In a 100-node cluster the
> > > > > shards for that particular database might be present on only
> > > > > 6 nodes, depending on the Q and N sharding factors, so it
> > > > > will query 6 out of 100 nodes. For instance, for N=3 and Q=2
> > > > > sharding factors, it will first send N*Q=6 requests, and wait
> > > > > until it gets at least one response for each of the Q=2 shard
> > > > > ranges. This happens very quickly. Then, for the duration of
> > > > > the response, it will only stream responses from those Q=2
> > > > > workers. So, to summarize: for a Q=2 database, it will be a
> > > > > streaming response from 2 workers; for Q=4, from 4 workers,
> > > > > etc.
> > > > > Cheers,
> > > > > -Nick
> > > > >
> > > > > On Wed, Jun 12, 2024 at 1:00 PM Markus Doppelbauer
> > > > > <doppelba...@gmx.net.invalid> wrote:
> > > > > > Hello,
> > > > > > Is the CouchDB view a "global" or "local" index? For
> > > > > > example, if a cluster has 100 nodes, would the query ask a
> > > > > > single node - or 100 nodes?
> > > > > > /.../_view/posts?startkey="foobar"&endkey="foobaz"
> > > > > > Best wishes
> > > > > > Marcus
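To make the range-map idea in steps a)-c) at the top of this message
concrete, here is a rough Python sketch. The RANGE_MAP layout, the
shard names, and both helper functions are invented for illustration;
this is not an existing CouchDB API, just the routing logic under the
proposed scheme:

```python
# Sketch of the proposed "key-range map": a sorted list of
# (upper_bound, shard) pairs routes each emitted view key to one
# shard, so a range query only touches the shards covering the range.
import bisect

# Hypothetical range map: keys up to "bb" -> shard-0,
# up to "dd" -> shard-1, everything else -> shard-2.
RANGE_MAP = [("bb", "shard-0"), ("dd", "shard-1"), ("\xff", "shard-2")]
BOUNDS = [bound for bound, _ in RANGE_MAP]

def shard_for_key(key: str) -> str:
    """Pick the shard whose byte range contains `key` (step b, indexing)."""
    return RANGE_MAP[bisect.bisect_left(BOUNDS, key)][1]

def shards_for_range(startkey: str, endkey: str) -> list[str]:
    """Shards a query must contact for [startkey, endkey] (step c, query)."""
    lo = bisect.bisect_left(BOUNDS, startkey)
    hi = bisect.bisect_left(BOUNDS, endkey)
    return [shard for _, shard in RANGE_MAP[lo:hi + 1]]

print(shard_for_key("aa"))           # shard-0
print(shards_for_range("ba", "bc"))  # ['shard-0', 'shard-1']
```

A query whose start and end keys fall inside one byte range contacts a
single shard; one that straddles a boundary contacts two, which is why
the proposal claims "1, maybe 2 shards" regardless of cluster size.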
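Nick's description of worker selection further down the thread can
also be modeled as a tiny simulation. The shard-range names, node
names, and the first_responders() helper are made up, but the logic
follows his description: the coordinator fans out N*Q requests, then
keeps only the first responder per shard range and streams from those
Q workers:

```python
# Rough model of the worker selection described in the thread:
# fan out to all N copies of each of the Q shard ranges, then keep
# only the first node that answers for each range.
N, Q = 3, 2  # replication and sharding factors from Nick's example

def first_responders(responses):
    """responses: list of (shard_range, node) tuples in arrival order.
    Returns the first node to answer for each of the Q shard ranges."""
    chosen = {}
    for shard_range, node in responses:
        chosen.setdefault(shard_range, node)  # later copies are dropped
        if len(chosen) == Q:
            break  # one streaming worker per shard range is enough
    return chosen

# N*Q = 6 requests went out; three responses arrive in this order:
arrivals = [("00-7f", "node-4"), ("00-7f", "node-1"), ("80-ff", "node-9")]
print(first_responders(arrivals))  # {'00-7f': 'node-4', '80-ff': 'node-9'}
```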