Hard to say. I've seen 20M docs be the point where you need to consider sharding/SolrCloud, and I've seen 300M docs be that point. That said, I'm quite sure you'll need to shard before you get to 2B docs, so there's no good reason to delay that process.
You'll have to do something about the join issue, though; that's the problem you might want to solve first. The new streaming aggregation stuff might help there, but you'll have to evaluate it for your case. The first thing I'd explore is whether you can denormalize your way out of the need to join, or whether you can use block joins instead (rough sketches below the quoted message).

Best,
Erick

On Wed, May 27, 2015 at 11:15 AM, Vishal Swaroop <vishal....@gmail.com> wrote:
> Currently, we have SOLR configured on a single Linux server (24 GB physical
> memory) with multiple cores.
> We are using SOLR joins (https://wiki.apache.org/solr/Join) across cores on
> this single server.
>
> But, as the data will grow to ~2 billion documents, we need to assess
> whether we'll need to run SolrCloud, since "In a DistributedSearch
> environment, you can not Join across cores on multiple nodes".
>
> Please suggest at what point or index size we should start considering
> running SolrCloud.
>
> Regards
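To illustrate the denormalization idea: suppose the cross-core join today looks like the first query below, finding child docs whose parent matches a filter. All field and core names here (parent_core, parent_id, state) are hypothetical, not taken from the thread. Denormalizing means copying the needed parent fields onto each child document at index time, so the join disappears and the query distributes fine in SolrCloud:

  # Cross-core join as it works today (single node only; names hypothetical):
  q={!join from=id to=parent_id fromIndex=parent_core}state:CA

  # After denormalizing (a parent_state field copied onto every child doc
  # at index time), no join is needed at query time:
  q=parent_state:CA

The cost, of course, is index size and the need to reindex children when parent data changes, which is the usual denormalization trade-off.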
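And a minimal block-join sketch, again with assumed field names: block joins require that parents and their children be indexed together in a single block, after which the parent/child query parsers can relate them without a cross-core join:

  Indexing (children nested inside the parent, sent as one block):
    <add>
      <doc>
        <field name="id">parent-1</field>
        <field name="doc_type">parent</field>
        <field name="state">CA</field>
        <doc>
          <field name="id">child-1</field>
          <field name="color">blue</field>
        </doc>
      </doc>
    </add>

  Return parents whose children match:
    q={!parent which="doc_type:parent"}color:blue

  Return children whose parent matches:
    q={!child of="doc_type:parent"}state:CA

Note that "which"/"of" must identify the complete set of parent documents (here via the hypothetical doc_type field), and blocks must be reindexed as a unit when anything in them changes.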