Still, 50M is not excessive for a single shard, although it's getting
into the range where I'd like proof that my hardware etc. is adequate
before committing to it. I've seen up to 300M docs on a single
machine, though admittedly they were tweets, i.e. tiny documents. YMMV
based on hardware and index complexity of course. Here's a long blog
about sizing:
https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

In this case I'd be pretty comfortable creating a test harness
(using JMeter or the like) and faking the extra 30M documents by
re-indexing the current corpus but assigning new IDs (<uniqueKey>).
Keep doing this until your target machine breaks (i.e. either blows up
by exhausting memory or response times slow unacceptably) and that'll
give you a good upper bound. Note that you should plan on a couple of
rounds of tuning/testing when you start to have problems.
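
To make the re-indexing trick concrete, here's a minimal sketch in
Python (names are placeholders: it assumes a collection "test" at
localhost:8983, a <uniqueKey> of "id", and that all fields are stored
so documents can be round-tripped from their stored values):

import uuid
import requests

SOLR = "http://localhost:8983/solr/test"  # placeholder URL/collection

cursor = "*"
while True:
    # Page through the existing corpus; cursorMark needs a sort on the
    # uniqueKey. Assumes no autoSoftCommit, so the copies we add below
    # don't become visible mid-pass.
    rsp = requests.get(SOLR + "/select", params={
        "q": "*:*", "rows": 1000, "sort": "id asc",
        "cursorMark": cursor, "wt": "json",
    }).json()
    docs = rsp["response"]["docs"]
    for doc in docs:
        doc["id"] = str(uuid.uuid4())  # fresh uniqueKey -> a "new" document
        doc.pop("_version_", None)     # drop internal fields before re-adding
    if docs:
        requests.post(SOLR + "/update", json=docs)
    if rsp["nextCursorMark"] == cursor:  # cursor stops advancing at the end
        break
    cursor = rsp["nextCursorMark"]

requests.get(SOLR + "/update", params={"commit": "true"})

Each pass adds another copy of the corpus under new IDs, so you can
just rerun it until the machine falls over.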

I'll warn you up front, though, that unless you have an existing app
to mine for _real_ user queries, generating, say, 5,000 "typical"
queries is more of a challenge than you might expect ;)...

Now, all that said, all is not lost if you do go with a single shard.
Let's say that 6 months down the road your requirements change. Or the
initial estimate was off. Or...

There are a couple of options:
1> create a new collection with more shards and re-index from scratch
2> use the SPLITSHARD Collections API call to, well, split the shard.

In this latter case, a shard is split into two pieces of roughly equal
size, which does mean that you can only grow your shard count by
powers of 2.
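
For reference, the split itself is a single Collections API call.
A sketch (same placeholder collection/shard names; after the split the
parent shard is left inactive and can be cleaned up with DELETESHARD):

import requests

# Split shard1 of "test" into two roughly equal sub-shards.
requests.get("http://localhost:8983/solr/admin/collections", params={
    "action": "SPLITSHARD",
    "collection": "test",
    "shard": "shard1",
})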

And even if you do have a single shard, using SolrCloud is still a
good thing, as failover is handled automagically, assuming you have
more than one replica...
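
E.g., a sketch of creating a 1-shard collection that can still fail
over (placeholder names; assumes a configset "conf1" has already been
uploaded to ZooKeeper):

import requests

# One shard, two replicas: no extra capacity, but SolrCloud can keep
# serving if one of the two nodes dies.
requests.get("http://localhost:8983/solr/admin/collections", params={
    "action": "CREATE",
    "name": "test",
    "numShards": 1,
    "replicationFactor": 2,
    "collection.configName": "conf1",
})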

Best,
Erick

On Mon, Mar 7, 2016 at 4:05 PM, shamik <sham...@gmail.com> wrote:
> Thanks a lot, Erick. You are right, it's a tad small with around 20 million
> documents, but the growth projection is around 50 million in the next 6-8
> months. It'll continue to grow, but maybe not at the same rate. From the
> index size point of view, the size can grow up to half a TB from its current
> state. Honestly, my perception of a "big" index is still vague :-) . All I'm
> trying to make sure is that the decision I take is scalable in the long term
> and will be able to sustain the growth without compromising performance.
