Hey,
Thanks for the reply, it is clearer now! Only one point remains
"mysterious" to me:
> - quorum applies to document operations (read, write, delete), but
not view queries (R is inherently 1)
This means that queries might be inconsistent. View queries aren't
necessarily different from reads, so I don't get why it's OK in this case.
On 13/06/2024 14:40, Robert Newson wrote:
Hi,
Right. A database is divided into Q shards, each of which covers part of a
numeric range from 0 to 2^32-1. The doc id is run through a hashing function
that outputs a number in that range, and the document is then stored inside the
shard whose range contains that number.
As this effectively randomises which doc goes in which shard, when we need to
perform a query across the database we have to ask each shard range to perform
the query and then combine the results.
This is good in some ways (the shards can be stored across multiple machines
and thus benefit from more cores/disks/memory/etc.), but bad in others: the
number of workers that need to respond goes up as Q goes up.
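The mapping described above can be sketched as follows. CRC32 is used purely as a stand-in hash for illustration (CouchDB's actual hash function may differ); it conveniently outputs values in the 0..2^32-1 range mentioned above.

```python
import zlib

def shard_range(doc_id: str, q: int) -> int:
    """Map a doc id to one of q contiguous ranges covering 0..2**32-1.

    CRC32 is an illustrative stand-in for CouchDB's real hash.
    """
    h = zlib.crc32(doc_id.encode("utf-8"))  # yields 0 .. 2**32 - 1
    width = 2**32 // q
    return min(h // width, q - 1)  # clamp in case 2**32 % q != 0
```

The key property is that the mapping is deterministic but effectively random across doc ids, which is why a global query has to touch every range.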
"Partition" is just our word (and arguably a misleading word) for allowing users to control which
shard range their documents end up in. Only part of the doc id is passed through the hashing function(doc id
"foo:bar" would be partition foo, and we'd hash only "foo"). Thus the user can ensure
that a group of documents lands in the same shard range. The benefit is as Nick described, when querying
_within_ a given partition we know that only one shard range is in play, so we do not consult shards for
other ranges (we already know they will have no results).
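The partitioned variant, again with CRC32 as an illustrative stand-in hash (not CouchDB's real function): only the text before the first ":" is hashed, so every doc in a partition maps to the same range.

```python
import zlib

def shard_range_partitioned(doc_id: str, q: int) -> int:
    # Hash only the partition key: "foo:bar" -> "foo"
    partition = doc_id.split(":", 1)[0]
    h = zlib.crc32(partition.encode("utf-8"))
    return min(h // (2**32 // q), q - 1)

# "foo:bar" and "foo:baz" share partition "foo", hence the same shard range
assert shard_range_partitioned("foo:bar", 8) == shard_range_partitioned("foo:baz", 8)
```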
So, to your specific list of questions:
- yes
- yes, but the reason to keep partitions small is the fixed assignment to a
given shard range. The default scheme would load up each shard range fairly
evenly; it's about disk space management and predictable performance.
- when querying a global view we need Q responses (and will issue Q*N requests,
but cancel Q*(N-1) of those once the first response from each range has been
received, though they may have already done some, or all, of their work before
that happens). For a partitioned view we need 1 response (but will issue N
requests).
- quorum applies to document operations (read, write, delete), but not view
queries (R is inherently 1)
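Put another way, document operations wait for a majority of the N copies by default, while a view query streams from a single worker per shard range; a toy calculation of the two cases:

```python
def doc_quorum(n: int) -> int:
    # Majority of the N copies: the (3+1)/2 = 2 from the thread for N=3
    return n // 2 + 1

VIEW_R = 1  # view queries stream from one worker per shard range
```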
B.
On 13 Jun 2024, at 09:01, Matthieu Rakotojaona
<matthieu.rakotojaona-rainimangav...@imt-atlantique.fr> wrote:
Hello, I'm still a bit confused about shards versus partitions; is this correct?
- A database is composed of Q shards; shards split the database "automatically"
- A shard is composed of 1 or more partitions; partitions split the database
according to the user's choice. A partition should thus be made so that it is
not huge, in fact it must be under (total size/Q)
- When querying a view, the whole database needs to be processed, which means
all Q shards
- When querying a view only on documents inside a partition, since one
partition is on 1 shard, only one shard is queried
- Whatever the number of shards, quorum still needs to be verified, i.e. for Q=2
and N=3 I still need to wait for (3+1)/2 = 2 replies for each shard
On 12/06/2024 19:54, Nick Vatamaniuc wrote:
Another feature related to efficient view querying is partitioned
databases: https://docs.couchdb.org/en/stable/partitioned-dbs/index.html.
It's a bit of a niche, as you'd need to have a good partition key, but
aside from that, it can speed up your queries as responses would be coming
from a single shard only instead of Q shards.
On Wed, Jun 12, 2024 at 1:30 PM Markus Doppelbauer
<doppelba...@gmx.net.invalid> wrote:
Hi Nick,
Thank you very much for your reply. This is exactly what we are looking
for. There are so many DBs that store the secondary index locally
(Cassandra, Aerospike, ScyllaDB, ...)
Thanks again for the answer
Marcus
Am Mittwoch, dem 12.06.2024 um 13:23 -0400 schrieb Nick Vatamaniuc:
Hi Marcus,
The node handling the request only queries the nodes with shard copies of
that database. In a 100 node cluster the shards for that particular
database might be present on only 6 nodes, depending on the Q and N
sharding factors, so it will query 6 out of 100 nodes. For instance, for
N=3 and Q=2 sharding factors, it will first send N*Q=6 requests, and wait
until it gets at least one response for each of the Q=2 shard ranges. This
happens very quickly. Then, for the duration of the response, it will only
stream responses from those Q=2 workers. So, to summarize, for a Q=2
database it will be a streaming response from 2 workers. For Q=4, from 4
workers, etc.
Cheers,
-Nick
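Nick's request arithmetic, as a quick sketch:

```python
def global_view_fanout(q: int, n: int) -> tuple[int, int]:
    # A global view query fans out q*n worker requests, then streams from
    # the first responder in each of the q shard ranges (rest cancelled).
    return q * n, q

requests, streamers = global_view_fanout(2, 3)  # N=3, Q=2: 6 requests, 2 workers
```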
On Wed, Jun 12, 2024 at 1:00 PM Markus Doppelbauer<
doppelba...@gmx.net.invalid> wrote:
Hello,
Is the CouchDB view a "global" or "local" index? For example, if a
cluster has 100 nodes, would the query ask a single node, or all
100 nodes?
/.../_view/posts?startkey="foobar"&endkey="foobaz"
Best wishes
Marcus