Hi Ulrich,

Assuming userdb-XYZ already existed, yes, you should have needed a revision ID 
to issue a successful PUT request there.
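
For reference, here is roughly the sequence I would expect to work against the 
node-local port (the database name is just the one from your example, and you 
may need to supply admin credentials):

    # Fetch the current shard map; the response includes the current _rev.
    curl http://localhost:5986/_dbs/userdb-XYZ > shardmap.json

    # Edit shardmap.json (by_node, by_range, changelog) while keeping the
    # _rev field intact, then write it back.
    curl -X PUT http://localhost:5986/_dbs/userdb-XYZ \
         -H 'Content-Type: application/json' -d @shardmap.json

    # A PUT that omits the current _rev should be rejected with a 409
    # conflict once the document already exists.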

The statement in the docs is about the availability of the system for reads 
and writes. The cluster will continue to function in that degraded state with 
two nodes down, but you generally wouldn’t want to conduct planned maintenance 
in a way that lowers the number of live replicas for a shard below your 
desired replica count.
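
One way to sanity-check this before taking a node offline is to look at the 
shard map itself; a quick sketch, again using the database from your example:

    curl http://localhost:5986/_dbs/userdb-XYZ

    # In the response, "by_node" lists the shard ranges hosted on each node
    # and "by_range" lists the nodes hosting each range. If taking the node
    # offline would leave any range with fewer live copies than n, move
    # those shards (the files plus the map entries) to another node first.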

Appreciate the feedback about the docs being vague in this respect. I suspect 
part of the problem is a relative lack of open source tooling and documentation 
around this specific process, and so folks are left to try to infer the best 
practice from other parts of the docs.

The number of unique shards per database has little to no bearing on the 
durability and availability of the system. Rather, it affects the overall 
throughput achievable for that database. More shards means higher throughput 
for writes, view indexing, and document lookups that specify the _id directly.
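
For what it’s worth, q is fixed when the database is created in 2.x, so if you 
want a different shard count you choose it up front on the clustered port, e.g.:

    # Create a database with 16 shards and 3 replicas instead of relying on
    # the defaults (q=8, n=3). The values here are just for illustration.
    curl -X PUT 'http://localhost:5984/userdb-XYZ?q=16&n=3'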

Cheers, Adam

> On Jan 14, 2018, at 7:56 AM, Ulrich Mayring <[email protected]> wrote:
> 
> Hi Adam,
> 
> this is interesting, I was able to send PUT requests to 
> http://localhost:5986/_dbs/userdb-XYZ without giving a revision. Is this 
> intended or should I try to reproduce the issue and file a bug report?
> 
> If I understand you correctly, then with default settings (replica level of 3 
> and 8 shards) I cannot remove a node from a 3-node cluster, else I would lose 
> some shards. Including perhaps users from _users database. So I would need 4 
> or 5 nodes, then I could remove one?
> 
> The docs might be a little confusing (to me) in that regard. They say:
> 
> n=3 Any 2 nodes can be down
> 
> I believe this is only true if you have as many nodes as shards (8 per 
> default)?
> 
> Ulrich
> 
> Am 12.01.18 um 03:11 schrieb Adam Kocoloski:
>> Hi Ulrich, sharding is indeed per-database. This allows for an important 
>> degree of flexibility but it does introduce maintenance overhead when you 
>> have a lot of databases. The system databases you mentioned do have their 
>> own sharding documents which can be modified if you want to redistribute 
>> them across the cluster. Note that this is not required as you scale the 
>> cluster; nodes can still access the information in those databases 
>> regardless of the presence of a “local” shard. Of course if you’re planning 
>> on removing a node hosting shards of those databases you should move the 
>> shards first to preserve the replica level.
>> The sharding document is a normal document and absolutely does have 
>> revisions. We found the changelog to be a useful asset when resolving any 
>> merge conflicts introduced in a concurrent rebalancing exercise. Cheers,
>> Adam
>>> On Jan 7, 2018, at 6:08 AM, Ulrich Mayring <[email protected]> wrote:
>>> 
>>> Hello,
>>> 
>>> I haven't quite understood the 2.1.1 documentation for sharding in one 
>>> aspect: it is described how to get the sharding document for one database, 
>>> how to edit it by e.g. adding a node to it and how to upload it again. 
>>> I've tried that and it works fine.
>>> 
>>> However, if I have the couch_per_user feature turned on, then there are 
>>> potentially thousands of databases. Suppose I add a new node to the 
>>> cluster, do I then need to follow this procedure for all databases in order 
>>> to balance data? Or is it enough to do it for one database? I suppose an 
>>> equivalent question would be: are the shards per database or per cluster?
>>> 
>>> And, somewhat related: what about the _users, _global_changes and 
>>> _replicator databases? Do I need to edit their sharding document as well, 
>>> whenever I add or remove a cluster node?
>>> 
>>> I also find it interesting that the sharding document has no revisions and 
>>> instead relies on changelog entries.
>>> 
>>> many thanks in advance for any enlightenment,
>>> 
>>> Ulrich
>>> 
> 
> 
