Short form: You really have to prototype. Here's the long form: https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
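For the prototyping itself, a minimal SolrJ driver along the lines of the
sketch below is usually enough to measure raw indexing throughput and leave
you with an index whose on-disk size you can measure. The ZooKeeper address,
collection name, and field names are placeholders, and it assumes the
SolrJ 5.x API; feed it documents shaped like your real ones or the numbers
won't mean much.

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class SizingProbe {
        public static void main(String[] args) throws Exception {
            // Placeholder ZooKeeper ensemble and collection -- substitute your own.
            CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181");
            client.setDefaultCollection("sizing_test");

            final int total = 10_000_000; // 10M-doc sample, as Susheel suggests below
            final int batchSize = 1_000;  // batch adds; one doc per request is far slower
            List<SolrInputDocument> batch = new ArrayList<>(batchSize);

            long start = System.nanoTime();
            for (int i = 0; i < total; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", Integer.toString(i));
                doc.addField("body_txt", "representative document text of realistic size");
                batch.add(doc);
                if (batch.size() == batchSize) {
                    client.add(batch); // send the batch; commit separately
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) {
                client.add(batch);
            }
            client.commit(); // one commit at the end of the run

            double secs = (System.nanoTime() - start) / 1e9;
            System.out.printf("indexed %d docs in %.0fs (~%.0f docs/sec)%n",
                    total, secs, total / secs);
            client.close();
        }
    }

Run that against a small test cluster, watch heap and disk, and you have real
numbers to extrapolate from instead of guesses.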
I've seen between 20M and 200M docs fit on a single piece of hardware, so
you'll absolutely have to shard.

And the other thing you haven't told us is whether you plan on _adding_ 2B
docs a day or whether that number is the total corpus size and you are
re-indexing the 2B docs/day. IOW, if you are adding 2B docs/day, 30 days
later do you have 2B docs or 60B docs in your corpus?

Best,
Erick

On Mon, Feb 8, 2016 at 8:09 AM, Susheel Kumar <susheel2...@gmail.com> wrote:
> Also consider whether you expect to index the 2 billion docs as NRT or
> whether it will happen offline (during off hours etc.). For more accurate
> sizing you may also want to index, say, 10 million documents, which will
> give you an idea of your index size; you can then extrapolate from that to
> come up with the memory requirements.
>
> Thanks,
> Susheel
>
> On Mon, Feb 8, 2016 at 11:00 AM, Emir Arnautovic <
> emir.arnauto...@sematext.com> wrote:
>
>> Hi Mark,
>> Can you give us a bit more detail: size of docs, query types, are the
>> docs grouped somehow, are they time-sensitive, will they be updated or is
>> the index rebuilt every time, etc.?
>>
>> Thanks,
>> Emir
>>
>>
>> On 08.02.2016 16:56, Mark Robinson wrote:
>>
>>> Hi,
>>> We have a requirement where we would need to index around 2 billion
>>> docs in a day.
>>> The queries against this indexed data set can be around 80K queries per
>>> second at peak and around 12K queries per second during off-peak hours.
>>>
>>> Can Solr handle such huge volumes?
>>>
>>> If so, assuming we have no budget constraints, what would a recommended
>>> Solr setup look like (number of shards, number of Solr instances, etc.)?
>>>
>>> Thanks!
>>> Mark
>>>
>>
>> --
>> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
>> Solr & Elasticsearch Support * http://sematext.com/
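Back-of-envelope, purely to frame the scale of the question (the per-node
figures are placeholder guesses; only a prototype like the one above pins
down the real ones):

    2,000,000,000 docs/day / 86,400 sec/day   ~= 23,000 docs/sec sustained indexing
    2,000,000,000 docs / ~100M docs per node  ~= 20 shards before any replication
    80,000 QPS peak / ~1,000 QPS per replica  ~= 80 full copies of the index for query load

However those placeholder numbers move, this is a large SolrCloud cluster,
which is why the per-node figures have to come from a real test rather than
a formula.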