Hard to say. I've seen 20M docs be the point where you need to consider
sharding/SolrCloud, and I've seen 300M docs be the point where you need to
start sharding. That said, I'm quite sure you'll need to shard before you get
to 2B, so there's no good reason to delay that process.

You'll have to do something about the join issue, though; that's the problem
you might want to solve first. The new streaming aggregation stuff might help
there, but you'll have to figure that out for your own data.
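
To give a rough idea of the shape (untested against your setup, and the
collection names, fields, and the innerJoin decorator itself are assumptions
you'd need to check against whatever Solr version you end up on), a
streaming-expression join against the /stream handler looks something like
this, with both sides sorted on the join key:

    curl --data-urlencode 'expr=innerJoin(
        search(people, q="*:*",      fl="personId,name",    sort="personId asc", qt="/export"),
        search(pets,   q="type:cat", fl="personId,petName", sort="personId asc", qt="/export"),
        on="personId")' \
      http://localhost:8983/solr/people/stream

Each search() stream exports matching docs sorted by personId, and innerJoin
emits the tuples where the keys line up, so the join happens at query time
without denormalizing anything. Note that the /export handler needs docValues
on the exported and sorted fields.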

The first thing I'd explore is whether you can denormalize your way out of
the need to join, or whether you can use block joins instead.
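
If block joins look feasible, the idea is to index parent and child documents
together as a single block and then use the block join query parsers. A minimal
sketch (field names like doc_type and child_field are made up for illustration):

    <add>
      <doc>
        <field name="id">parent-1</field>
        <field name="doc_type">parent</field>
        <doc>
          <field name="id">child-1</field>
          <field name="child_field">red</field>
        </doc>
      </doc>
    </add>

and then a query like

    q={!parent which="doc_type:parent"}child_field:red

returns the parent documents whose children match. The catch is that a parent
and its children always have to be indexed (and re-indexed) together as one block.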

Best,
Erick

On Wed, May 27, 2015 at 11:15 AM, Vishal Swaroop <vishal....@gmail.com> wrote:
> Currently, we have Solr configured on a single Linux server (24 GB physical
> memory) with multiple cores.
> We are using Solr joins (https://wiki.apache.org/solr/Join) across cores on
> this single server.
>
> But, as the data will grow to ~2 billion documents, we need to assess whether
> we'll need to run SolrCloud, since "In a DistributedSearch environment, you
> can not Join across cores on multiple nodes".
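
For reference, the single-node cross-core join being described is the {!join}
query parser with a fromIndex parameter, something like the following (core
and field names made up):

    q={!join from=user_id to=id fromIndex=usercore}state:active

which is exactly the construct that stops working once the cores are spread
across multiple nodes.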
>
> Please suggest at what point or index size we should start considering
> running SolrCloud?
>
> Regards
