Still, 50M is not excessive for a single shard, though it's getting into the range where I'd want proof that my hardware etc. is adequate before committing to it. I've seen up to 300M docs on a single machine, though admittedly they were tweets. YMMV based on hardware and index complexity, of course. Here's a long blog post about sizing:
https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
In this case I'd be pretty comfortable creating a test harness (using JMeter or the like) and faking the extra 30M documents by re-indexing the current corpus but assigning new IDs (<uniqueKey>). Keep doing this until your target machine breaks (i.e. either blows up by exhausting memory or the response slows unacceptably) and that'll give you a good upper bound. Note that you should plan on a couple of rounds of tuning/testing when you start to have problems. I'll warn you up front, though, that unless you have an existing app to mine for _real_ user queries, generating say 5,000 "typical" queries is more of a challenge than you might expect ;)...

Now, all that said, all is not lost if you do go with a single shard. Let's say that 6 months down the road your requirements change. Or the initial estimate was off. Or.... There are a couple of options:
1> create a new collection with more shards and re-index from scratch
2> use the SPLITSHARD Collections API call to, well, split the shard. In this latter case, a shard is split into two pieces of roughly equal size, which does mean that you can only grow your shard count by powers of 2.

And even if you do have a single shard, using SolrCloud is still a good thing, as failover is handled automagically assuming you have more than one replica...

Best,
Erick

On Mon, Mar 7, 2016 at 4:05 PM, shamik <sham...@gmail.com> wrote:
> Thanks a lot, Erick. You are right, it's a tad small with around 20 million
> documents, but the growth projection is around 50 million in the next 6-8
> months. It'll continue to grow, but maybe not at the same rate. From the
> index size point of view, the index can grow up to half a TB from its
> current state. Honestly, my perception of a "big" index is still vague :-) .
> All I'm trying to make sure is that the decision I take is scalable in the
> long term and will be able to sustain the growth without compromising
> performance.
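For what it's worth, the fake-document trick Erick describes (re-index the current corpus with new uniqueKey values until the machine breaks) can be sketched roughly like this. A minimal sketch only: the collection URL, the "id" uniqueKey field, and the "_fakeN" suffix are assumptions for illustration, not anything from this thread.

```python
# Sketch: inflate a test index by re-posting existing docs under new IDs.
# Assumes a collection reachable at SOLR with uniqueKey field "id".
import json
from urllib.request import urlopen, Request

SOLR = "http://localhost:8983/solr/mycoll"  # assumed collection URL


def clone_docs(docs, pass_num):
    """Return copies of docs with a fresh uniqueKey per pass.

    Drops Solr's internal _version_ field so the re-post isn't rejected
    by optimistic concurrency checks.
    """
    out = []
    for d in docs:
        d = dict(d)
        d.pop("_version_", None)
        d["id"] = f'{d["id"]}_fake{pass_num}'  # new uniqueKey
        out.append(d)
    return out


def inflate_once(pass_num, rows=1000):
    """Fetch a page of real docs and re-index them under new IDs."""
    resp = json.load(urlopen(f"{SOLR}/select?q=*:*&rows={rows}&wt=json"))
    body = json.dumps(clone_docs(resp["response"]["docs"], pass_num)).encode()
    urlopen(Request(f"{SOLR}/update?commit=true", data=body,
                    headers={"Content-Type": "application/json"}))
```

You'd loop this (paging with start/rows, or cursorMark on a real corpus) while driving JMeter queries at the box, and watch for the memory or latency cliff.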
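As for option 2>, SPLITSHARD is a single Collections API request. The sketch below just builds and (optionally) fires it; the collection and shard names are assumptions for illustration.

```python
# Sketch: split a shard via the Collections API's SPLITSHARD action.
# "mycoll" and "shard1" are assumed names, not from this thread.
from urllib.parse import urlencode
from urllib.request import urlopen


def splitshard_url(base, collection, shard):
    """Build the SPLITSHARD request URL for the Collections API."""
    params = urlencode({"action": "SPLITSHARD",
                        "collection": collection,
                        "shard": shard})
    return f"{base}/admin/collections?{params}"


# Against a live cluster you'd issue the request, e.g.:
# urlopen(splitshard_url("http://localhost:8983/solr", "mycoll", "shard1"))
```

After the split completes, the two child shards take over serving and the parent shard goes inactive, so you'd typically clean the parent up afterwards.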
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-Cloud-sharding-strategy-tp4262274p4262304.html
> Sent from the Solr - User mailing list archive at Nabble.com.