Jilani, you said "My team needs that option if at all possible". My
first response would be "why?".  Why do they want to limit the number of
documents per shard? What's the rationale/use case behind that
requirement?  Once we understand that, we can explain why it's a bad idea. :)

I suspect I'm reiterating Jack's comments, but why are you sharding in the
first place? 8 shards split across 4 machines means 2 shards per machine.
But you have 2 replicas of each shard, so you have 16 Solr cores, and hence
4 Solr cores per machine?  Since you need an instance of all 8 shards to be
up in order to service requests, you can get away with everything on 2
machines, but you still have 8 Solr cores to manage in order to have a
fully functioning system.  What's the benefit of sharding in this
scenario?  Sharding adds complexity, so you normally only add sharding if
your search times are too slow without it.
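
To make the arithmetic above concrete, here is a rough sketch in Python. The function name and parameters are made up for illustration, and it assumes cores are spread evenly across machines; it is not anything Solr itself provides.

```python
def solr_core_layout(shards, replicas_per_shard, machines):
    """Return (total_cores, cores_per_machine), assuming an even spread.

    Hypothetical helper for illustration only; not a Solr API.
    """
    total_cores = shards * replicas_per_shard
    cores_per_machine = total_cores / machines
    return total_cores, cores_per_machine


# The numbers from this thread: 8 shards, 2 replicas of each, 4 machines.
total, per_machine = solr_core_layout(shards=8, replicas_per_shard=2, machines=4)
print(total, per_machine)  # 16 4.0
```

Whatever the layout, one live core for each of the 8 shards is the minimum needed to answer a query.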

You need to work out how much disk space all 20m docs are going to
take (maybe index 1m or 5m docs and extrapolate, if they are all roughly
equal in size), then split it across 4 machines.  But as Erick points out, you
need to allow for merges to occur, so whatever the size of the "static"
data set, you need to allow for double that from time to time while background
merges are happening.
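
The extrapolation could be sketched like this in Python. The 2.5 GB sample size is a made-up figure, the function name is hypothetical, and the sketch assumes index size grows linearly with document count and that the data is split evenly across machines; measure your own sample index before trusting any number.

```python
def estimate_disk_gb(sample_docs, sample_size_gb, total_docs, machines,
                     merge_headroom=2.0):
    """Extrapolate per-machine disk needs from a sample index.

    Assumes size grows linearly with document count and the data is
    split evenly. merge_headroom=2.0 reflects the "allow for double"
    advice for background merges. Hypothetical helper, not a Solr API.
    """
    full_index_gb = sample_size_gb * total_docs / sample_docs
    static_per_machine_gb = full_index_gb / machines
    peak_per_machine_gb = static_per_machine_gb * merge_headroom
    return full_index_gb, static_per_machine_gb, peak_per_machine_gb


# Hypothetical measurement: a 1m-doc sample index takes 2.5 GB on disk.
full, static, peak = estimate_disk_gb(
    sample_docs=1_000_000, sample_size_gb=2.5,
    total_docs=20_000_000, machines=4)
print(full, static, peak)  # 50.0 12.5 25.0
```

This counts a single copy of the data only; each extra replica held on a machine adds its own share on top.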


On 7 May 2015 at 16:05, Jack Krupansky <jack.krupan...@gmail.com> wrote:

> A leader is also a replica - SolrCloud is not a master/slave architecture.
> Any replica can be elected to be the leader, but that is only temporary and
> can change over time.
>
> You can place multiple shards on a single node, but was that really your
> intention?
>
> Generally, number of nodes equals number of shards times the replication
> factor. But then divided by shards per node if you do place more than one
> shard per node.
>
> -- Jack Krupansky
>
> On Thu, May 7, 2015 at 1:29 AM, Jilani Shaik <jilani24...@gmail.com>
> wrote:
>
> > Hi,
> >
> > Is it possible to restrict number of documents per shard in Solr cloud?
> >
> > Lets say we have Solr cloud with 4 nodes, and on each node we have one
> > leader and one replica. Like wise total we have 8 shards that includes
> > replicas. Now I need to index my documents in such a way that each shard
> > will have only 5 million documents. Total documents in Solr cloud should
> > be 20 million documents.
> >
> >
> > Thanks,
> > Jilani
> >
>
