Re: Decision on Number of shards and collection

Shawn Heisey Wed, 11 Apr 2018 08:37:51 -0700

On 4/11/2018 4:15 AM, neotorand wrote:
> I believe heterogeneous data can be indexed into the same collection, and I can
> have multiple shards so that the index is partitioned. So what is the need for a
> second collection? Yes, when the collection size grows I should look at more
> collections. What exactly is that size? What KPI drives the decision to have
> more collections? Any pointers or links to best practices would help.


There are no hard rules.  Many factors affect these decisions.

https://lucidworks.com/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Creating multiple collections should be done when there is a logical or
business reason for keeping different sets of data separate from each
other.  If there's never any need for people to query all the data at
once, then it might make sense to use separate collections.  Or you
might want to put them together just for convenience, and use data in
the index to filter the results to only the information that the user is
allowed to access.
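
As a rough illustration of the "filter the results" approach, here is a
minimal Python sketch that builds the query string for a /select request.
The field name "tenant_id" is purely hypothetical; substitute whatever
field your schema uses to record who may see a document. Only the
standard q and fq request parameters are assumed.

```python
from urllib.parse import urlencode


def build_select_query(user_query, tenant_id):
    """Build the query string for a Solr /select request.

    "tenant_id" is a hypothetical field name used for illustration;
    it is not something Solr defines.
    """
    # Escape backslashes and double quotes so the value stays inside the phrase.
    safe_id = tenant_id.replace("\\", "\\\\").replace('"', '\\"')
    params = [
        ("q", user_query),
        # fq narrows the result set without changing relevance scores.
        ("fq", 'tenant_id:"{}"'.format(safe_id)),
    ]
    return urlencode(params)


if __name__ == "__main__":
    print(build_select_query("laptop", "acme"))
```

The filter should be added on the server side by the application, not
taken from anything the end user can edit.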

> When should I go for multiple shards?
> Yes, when shard size grows, right? What is the size, and how do I benchmark?

Some indexes function really well with 300 million documents or more per
shard.  Other indexes struggle with less than a million per shard.  It's
impossible to give you any specific number.  It depends on a bunch of
factors.

If query rate is very high, then you want to keep the shard count low. 
Using one shard might not be possible due to index size, but it should
be as low as you can make it.  You're also going to want to have a lot
of replicas to handle the load.
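
For the high-query-rate case, a collection with few shards and several
replicas could be requested through the Collections API roughly like
this. This is only a sketch: the base URL, the collection name, and the
numbers are made-up values, and the right numbers for your data can only
come from testing.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor):
    """Build a Collections API CREATE URL (the request is not sent here)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    return base_url.rstrip("/") + "/admin/collections?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical values: one shard, four replicas of it.
    print(create_collection_url("http://localhost:8983/solr", "products", 1, 4))
```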

If query rate is extremely low, then sharding the index can actually
*improve* performance, because there will be idle CPU capacity that can
run the per-shard subqueries in parallel.
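
The trade-off between those two cases comes down to simple arithmetic. A
distributed query sends at least one subrequest to every shard, so the
number of shard-level requests grows with both query rate and shard
count. The sketch below is a back-of-the-envelope lower bound, not a
capacity model: it ignores any additional per-shard requests a query may
need, and the rates shown are invented for illustration.

```python
def shard_requests_per_second(query_rate, num_shards):
    """Lower bound on shard-level requests per second.

    Assumes each incoming query fans out to exactly one subrequest
    per shard, which is a simplification.
    """
    return query_rate * num_shards


if __name__ == "__main__":
    # High query rate: fan-out multiplies an already heavy load.
    print(shard_requests_per_second(500, 8))
    # Very low query rate: the extra subrequests are cheap and run in parallel.
    print(shard_requests_per_second(2, 8))
```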

Thanks,
Shawn
