On 10/2/2018 9:33 AM, Rekha wrote:
Dear Solr Team, I need the following clarifications from you; please check
and advise me.
1. I want to store and search 200 billion documents (each document contains
16 fields). Can I achieve this with SolrCloud?
2. How many shards and nodes will be needed for my case?
3. Can I increase the number of nodes and shards in the future?
Thanks, Rekha Karthick
In a nutshell: it's not possible to give generic advice. The contents of
the fields, the nature of the queries you send, the query rate, and the
overall size of the index (disk space as well as document count) will all
affect exactly what you need.
In the "not very helpful" department, but I promise this is absolute
truth, there's this blog post:
https://lucidworks.com/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
To handle 200 billion documents *in a single collection*, you're
probably going to want at least 200 shards, and there are good reasons
to go with even more shards than that. But you need to be warned that
there can be serious scalability problems when SolrCloud must keep track
of that many different indexes. Here's an issue I filed for scalability
problems with thousands of collections ... there can be similar problems
with lots of shards as well. This issue says it is fixed, but no code
changes that I am aware of were ever made related to the issue, and as
far as I can tell, it's still a problem even in the latest version:
https://issues.apache.org/jira/browse/SOLR-7191
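To see why 200 shards is a floor rather than a comfortable target, here is
a quick back-of-envelope calculation (a sketch only: the 2,147,483,647
figure is Lucene's hard per-index document limit, and the 200-million
per-shard target is an arbitrary illustration, not a recommendation):

```python
# Back-of-envelope shard math for 200 billion documents.
TOTAL_DOCS = 200_000_000_000
LUCENE_MAX_DOCS_PER_INDEX = 2_147_483_647  # hard Lucene limit per shard core

docs_per_shard_at_200 = TOTAL_DOCS // 200
print(docs_per_shard_at_200)  # 1,000,000,000 docs in every shard

# One billion docs fits under the Lucene limit, but such shards are enormous.
# Pick a more comfortable per-shard target and see how many shards that needs:
target_per_shard = 200_000_000  # assumed target, purely for illustration
shards_needed = -(-TOTAL_DOCS // target_per_shard)  # ceiling division
print(shards_needed)  # 1000 shards
```

Even the minimal 200-shard layout leaves every shard holding a billion
documents, which is why there are good reasons to go higher, and why the
SolrCloud bookkeeping overhead described above becomes a real concern.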
With that many shards/replicas in one collection, you will likely need to
increase ZooKeeper's maximum znode size (the jute.maxbuffer system
property), because the JSON structure describing the collection will
probably exceed the default limit of about one megabyte.
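As an illustration of what that change looks like: jute.maxbuffer is a real
JVM system property, but the 10 MB value and the file names below are just
examples. It must be set to the same value on every ZooKeeper server and
every Solr node, or the two sides will disagree about the size limit:

```shell
# On each ZooKeeper server, e.g. in conf/zookeeper-env.sh:
SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Djute.maxbuffer=10485760"  # 10 MB vs ~1 MB default

# On each Solr node, e.g. in solr.in.sh:
SOLR_OPTS="$SOLR_OPTS -Djute.maxbuffer=10485760"
```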
As for how many machines you'll need ... absolutely no idea. If the query
rate is insanely high, you'll want a dedicated machine for each shard
replica, and you may need many replicas, which is going to mean hundreds,
possibly thousands, of servers. If the query rate is really low and/or
each document is very small, you might be able to house more than one
shard per server. But you should know that handling 200 billion documents
is going to require a lot of hardware even if it turns out that you're not
handling tons of data (per document) or tons of queries.
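The server math above can be sketched like this (every number here is an
assumed input for illustration, not a sizing recommendation):

```python
import math

num_shards = 200           # from the shard discussion above
replication_factor = 3     # assumed: replicas needed to absorb the query load
cores_per_server = 2       # assumed: shard replicas one machine can host

total_cores = num_shards * replication_factor
servers_needed = math.ceil(total_cores / cores_per_server)
print(total_cores, servers_needed)  # 600 cores across 300 servers
```

Change any one input and the server count moves dramatically, which is
exactly why no generic answer is possible.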
Thanks,
Shawn