Short form: You really have to prototype. Here's the long form:

https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

I've seen between 20M and 200M docs fit on a single piece of hardware,
so you'll absolutely have to shard.
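
As a very rough back-of-envelope (not a substitute for prototyping, and
the docs-per-shard figure below is an assumption you'd have to verify
against your own data and queries), in Python:

    # Rough shard-count estimate. 100M docs/shard is a placeholder
    # assumption somewhere in the 20M-200M range mentioned above.
    corpus_docs = 2_000_000_000       # if 2B is the whole corpus
    docs_per_shard = 100_000_000      # must be validated by prototyping
    replication_factor = 2            # assumed, for query load and failover

    shards = -(-corpus_docs // docs_per_shard)   # ceiling division
    cores = shards * replication_factor
    print(f"{shards} shards x {replication_factor} replicas = {cores} cores")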

And the other thing you haven't told us is whether you plan on
_adding_ 2B docs a day or whether that number is the total corpus size
and you are re-indexing the 2B docs/day. IOW, if you are adding 2B
docs/day, 30 days later do you have 2B docs or 60B docs in your
corpus?
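
The answer changes the arithmetic completely. A quick sketch of both
readings (numbers taken straight from your description):

    # Sustained ingest rate is the same either way:
    docs_per_day = 2_000_000_000
    print(docs_per_day / 86_400)   # ~23,148 docs/sec, around the clock

    # Corpus size after 30 days under the two interpretations:
    print(docs_per_day)            # 2B  -- re-indexing the same corpus daily
    print(docs_per_day * 30)       # 60B -- adding 2B new docs every day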

Best,
Erick

On Mon, Feb 8, 2016 at 8:09 AM, Susheel Kumar <susheel2...@gmail.com> wrote:
> Also consider whether you expect to index the 2 billion docs as NRT or
> whether it will be done offline (during off hours etc.). For more accurate
> sizing you may also want to index, say, 10 million documents, which should
> give you an idea of your index size; then use that for extrapolation to
> come up with memory requirements.
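>
> Something like this for the extrapolation, in Python (the sample figures
> here are hypothetical; plug in what you actually measure):
>
>     # Extrapolate index size from a measured 10M-doc sample.
>     sample_docs = 10_000_000
>     sample_index_gb = 15.0        # hypothetical; measure your real sample
>     target_docs = 2_000_000_000
>
>     ratio = target_docs / sample_docs
>     print(f"projected index size: ~{sample_index_gb * ratio:,.0f} GB")
>
> Linear extrapolation is only a first approximation, but it beats guessing.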
>
> Thanks,
> Susheel
>
> On Mon, Feb 8, 2016 at 11:00 AM, Emir Arnautovic <
> emir.arnauto...@sematext.com> wrote:
>
>> Hi Mark,
>> Can you give us a bit more detail: size of docs, query types, are docs
>> grouped somehow, are they time sensitive, will they be updated or is the
>> index rebuilt every time, etc.?
>>
>> Thanks,
>> Emir
>>
>>
>> On 08.02.2016 16:56, Mark Robinson wrote:
>>
>>> Hi,
>>> We have a requirement where we would need to index around 2 billion docs
>>> in a day.
>>> The queries against this indexed data set can be around 80K queries per
>>> second during peak time and around 12K queries per second during
>>> non-peak hours.
>>>
>>> Can Solr handle such huge volumes?
>>>
>>> If so, assuming we have no budget constraints, what would be a
>>> recommended Solr setup (number of shards, number of Solr instances,
>>> etc.)?
>>>
>>> Thanks!
>>> Mark
>>>
>>>
>> --
>> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
>> Solr & Elasticsearch Support * http://sematext.com/
>>
>>
