Perhaps it wouldn't be too hard to make CouchDB views scalable:

a) When creating a view: add an option for a "key-range map",
   maybe via a callback: "byte-range -> shard",
   something like: from "aa" to "bb" -> node-0; and so on

b) During "indexing": store the emitted view index on the shard
   that the "range-map" points to -
   similar to a partition key, but scalable

c) During "query": e.g. startkey=fooAAA&endkey=fooBBB&limit=10
   would map-reduce "fooAAA" from shard-8 and "fooBBB" from
   shard-9 only

CouchDB could then scale out to 10,000+ nodes, while a query
only has to ask 1, maybe 2 shards. (Rough sketch below.)
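Here is a rough Python sketch of how (a) and (c) might fit together;
every name in it (RANGE_MAP, shard_for_key, shards_for_range) is
invented for illustration and is not existing CouchDB API:

    # Hypothetical "key-range map" routing - not CouchDB code.
    import bisect

    # (a) Sorted (lower-bound, shard) pairs; a key belongs to the last
    #     boundary that is <= the key.
    RANGE_MAP = [("", "shard-0"), ("aa", "shard-1"), ("bb", "shard-2"),
                 ("foo", "shard-8"), ("fooB", "shard-9"), ("zz", "shard-10")]
    BOUNDS = [b for b, _ in RANGE_MAP]

    def shard_for_key(key):
        # (b) At indexing time, an emitted key would be stored here.
        return RANGE_MAP[bisect.bisect_right(BOUNDS, key) - 1][1]

    def shards_for_range(startkey, endkey):
        # (c) At query time, only shards overlapping [startkey, endkey]
        #     have to be asked.
        lo = bisect.bisect_right(BOUNDS, startkey) - 1
        hi = bisect.bisect_right(BOUNDS, endkey) - 1
        return [shard for _, shard in RANGE_MAP[lo:hi + 1]]

    print(shards_for_range("fooAAA", "fooBBB"))  # ['shard-8', 'shard-9']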



On Wednesday, 2024-06-12 at 16:47 -0400, Nick Vatamaniuc wrote:
> On Wed, Jun 12, 2024 at 3:54 PM Markus Doppelbauer
> <doppelba...@gmx.net.invalid> wrote:
>
> > Dear Nick,
> >
> > This means:
> > If Q and N are large enough to distribute the data across all
> > nodes, e.g. 120 nodes with Q=40, N=3,
> > does the view query startkey=foobar&endkey=foobaz
> > have to ask all 120 nodes?
> >
>
> In the initial startup phase it could query all 120 nodes; then it
> will pick 40 shard workers and stream from those only.
>
> You do have some control over where each of the 3 copies of the
> shards ends up:
> https://docs.couchdb.org/en/stable/cluster/databases.html#placing-a-database-on-specific-nodes
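(For reference, the zone-based placement described on that page can be
driven over HTTP roughly like this - a sketch only: node name, zone
names and credentials are made up, and editing the system _nodes
database needs server-admin rights:)

    # Sketch of zone-based shard placement per the docs linked above.
    import requests

    BASE = "http://admin:password@localhost:5984"

    # Tag a node with a zone attribute in the system _nodes database
    # (hypothetical node name and zone).
    url = BASE + "/_nodes/couchdb@node1.example.com"
    node = requests.get(url).json()
    node["zone"] = "metro-dc-a"
    requests.put(url, json=node)

    # Tell the cluster how many shard copies to place in each zone.
    requests.put(BASE + "/_node/_local/_config/cluster/placement",
                 json="metro-dc-a:2,metro-dc-b:1")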
>
>
>
> > If I use a partition key, is it possible to distribute the
> > partitioned view among multiple nodes?
> > The docs say a partition should stay under 10 GB.
> >
>
> A partition cannot be split across multiple shard ranges. A single
> shard range can contain multiple partitions. So, for a given
> partition key, it would pick a particular shard range from Q, and
> then it would pick one of the N copies (usually 3) to stream from.
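(So only the partition key is hashed when picking the shard range,
which is why a whole partition always lands in a single range. A toy
model - CouchDB's real hash is internal, crc32 merely stands in:)

    # Toy model: hash only the partition part of the doc ID.
    from zlib import crc32

    Q = 2  # shard ranges

    def shard_range(doc_id):
        partition_key = doc_id.split(":", 1)[0]  # "sensor-1:r42" -> "sensor-1"
        return crc32(partition_key.encode()) % Q

    # Every doc in partition "sensor-1" maps to the same range:
    print(shard_range("sensor-1:r42") == shard_range("sensor-1:r99"))  # True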
>
> > Cheers,
> > Markus
> >
> >
> >
> > On Wednesday, 2024-06-12 at 13:54 -0400, Nick Vatamaniuc wrote:
> > > Another feature related to efficient view querying is partitioned
> > > databases:
> > > https://docs.couchdb.org/en/stable/partitioned-dbs/index.html
> > > It's a bit of a niche, as you'd need to have a good partition key,
> > > but aside from that, it can speed up your queries, as responses
> > > would be coming from a single shard only instead of Q shards.
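(For completeness, such a partitioned view is queried through the
per-partition endpoint described in those docs; the database, design
doc and view names below are made up:)

    # Query a view inside a single partition - one shard range answers.
    import requests

    resp = requests.get(
        "http://localhost:5984/mydb/_partition/sensor-1"
        "/_design/app/_view/by_time",
        params={"startkey": '"2024-06-01"', "endkey": '"2024-06-30"',
                "limit": 10},
        auth=("admin", "password"),
    )
    print(resp.json())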
> > >
> > >
> > > On Wed, Jun 12, 2024 at 1:30 PM Markus Doppelbauer
> > > <doppelba...@gmx.net.invalid> wrote:
> > > > Hi Nick,
> > > >
> > > > Thank you very much for your reply. This is exactly what we are
> > > > looking for. There are so many DBs that store the secondary
> > > > index locally (Cassandra, Aerospike, ScyllaDB, ...).
> > > >
> > > > Thanks again for the answer
> > > > Marcus
> > > >
> > > > On Wednesday, 2024-06-12 at 13:23 -0400, Nick Vatamaniuc wrote:
> > > > > Hi Marcus,
> > > > >
> > > > > The node handling the request only queries the nodes with
> > > > > shard copies of that database. In a 100-node cluster the
> > > > > shards for that particular database might be present on only
> > > > > 6 nodes, depending on the Q and N sharding factors, so it
> > > > > will query 6 out of 100 nodes. For instance, for N=3 and Q=2
> > > > > sharding factors, it will first send N*Q=6 requests, and wait
> > > > > until it gets at least one response for each of the Q=2 shard
> > > > > ranges. This happens very quickly. Then, for the duration of
> > > > > the response, it will only stream responses from those Q=2
> > > > > workers. So, to summarize: for a Q=2 database, it will be a
> > > > > streaming response from 2 workers; for Q=4, from 4 workers,
> > > > > etc.
> > > > >
> > > > > Cheers,
> > > > > -Nick
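(A toy model of that fan-out - pure illustration, not CouchDB code:
send N*Q requests, keep the first responder per shard range, then
stream from those Q winners:)

    import random

    N, Q = 3, 2
    workers = [(rng, copy) for rng in range(Q) for copy in range(N)]  # N*Q = 6

    random.shuffle(workers)    # responses arrive in effectively random order
    winners = {}
    for rng, copy in workers:  # first response per shard range wins
        winners.setdefault(rng, copy)

    print(f"sent {len(workers)} requests, streaming from {len(winners)} workers")
    # -> sent 6 requests, streaming from 2 workers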
> > > > > On Wed, Jun 12, 2024 at 1:00 PM Markus Doppelbauer
> > > > > <doppelba...@gmx.net.invalid> wrote:
> > > > > > Hello,
> > > > > >
> > > > > > Is the CouchDB view a "global" or a "local" index?
> > > > > > For example, if a cluster has 100 nodes, would the query
> > > > > > ask a single node - or 100 nodes?
> > > > > >
> > > > > > /.../_view/posts?startkey="foobar"&endkey="foobaz"
> > > > > >
> > > > > > Best wishes
> > > > > > Marcus
