I've seen single nodes handle 10M docs using 64G of heap (using Zing). I've seen 300M in 12G of memory. There's absolutely no way to tell.
See https://lucidworks.com/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
for a methodology to answer the question with _your_ data and _your_ query pattern...

Best,
Erick

On Thu, Mar 23, 2017 at 5:08 AM, Emir Arnautovic <emir.arnauto...@sematext.com> wrote:
> Hi Vrindavda,
>
> It is hard to say anything without testing, and without details on what/how
> is indexed, how it will be queried, and what the latency/throughput
> requirements are.
>
> 25M or 12.5M documents per shard might be too much if you have strict
> latency requirements, but testing is the only way to tell. I would suggest
> that you set up an index with a single shard and see how many documents you
> can put into it while still meeting latency requirements under the expected
> load (if you plan to have 2 replicas, that is roughly half of the expected
> load). Leave some room for distributed-query overhead. Once you have that
> number, you can see how many shards you need.
>
> HTH,
> Emir
>
> On 23.03.2017 09:46, vrindavda wrote:
>> Hello,
>>
>> My production index is expected to contain 50 million documents, with the
>> addition of around 1 million every year.
>>
>> Should I go for 64GB RAM (4 shards / 4 replicas) or 128GB (2 shards / 2
>> replicas)?
>>
>> Please suggest if the above assumptions are incorrect. What parameters
>> should I consider?
>>
>> Thank you,
>> Vrinda Davda
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Architecture-suggestions-tp4326436.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
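The 25M and 12.5M docs-per-shard figures Emir mentions fall straight out of the numbers in the original question. A minimal sketch of that arithmetic, assuming an even hash-based distribution across shards (the 5-year planning horizon is an assumption added for illustration):

```python
# Back-of-the-envelope per-shard sizing for the two layouts in the thread.
# The 50M-document figure and ~1M/year growth come from the original
# question; the 5-year horizon is an assumption for illustration.

def docs_per_shard(total_docs: int, num_shards: int) -> float:
    """Documents each shard holds, assuming even hash-based routing."""
    return total_docs / num_shards

initial_docs = 50_000_000
yearly_growth = 1_000_000
years = 5  # assumed planning horizon

projected = initial_docs + yearly_growth * years

for shards in (2, 4):
    print(f"{shards} shards: {docs_per_shard(projected, shards):,.0f} docs/shard")
```

Note that with growth this slow (~2% per year), the choice is dominated by the initial 50M documents rather than by the yearly additions.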
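Emir's single-shard capacity test can be sketched as a simple loop: keep adding documents and re-measuring query latency under expected load until the latency target is missed, then derive the shard count from the capacity you found. Everything below is a sketch under stated assumptions: `measure_p95_ms` is a purely illustrative stand-in for a real load test (in practice you would index real documents and replay your own query logs against a single-shard Solr index), and the 100 ms target and 1M-document batch size are made-up numbers.

```python
# Sketch of the capacity test Emir describes. measure_p95_ms is a stand-in
# latency model, NOT a real benchmark -- replace it with real indexing plus
# real queries against a single-shard index.

LATENCY_TARGET_MS = 100   # assumed SLA, not from the thread
BATCH = 1_000_000         # documents added per iteration (assumed)

def measure_p95_ms(doc_count: int) -> float:
    """Stand-in for a real load test; assumes latency grows with index size."""
    return 20 + doc_count / 250_000  # purely illustrative model

def max_docs_per_shard(target_ms: float) -> int:
    """Largest document count whose measured latency stays within target."""
    docs = 0
    while measure_p95_ms(docs + BATCH) <= target_ms:
        docs += BATCH
    return docs

capacity = max_docs_per_shard(LATENCY_TARGET_MS)
# Ceiling division: shards needed for the projected ~55M documents.
shards_needed = -(-55_000_000 // capacity)
print(capacity, shards_needed)
```

As Emir notes, the measured capacity should then be discounted for distributed-query overhead, and the test load should be halved per node if two replicas will share the query traffic.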