Here's a blog post outlining why this is so hard to answer:
http://searchhub.org/dev/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Just one example from your post: you mention index size as
a metric. It's often useless. Stored data (stored="true") is placed
in files with special extensions (*.fdt and *.fdx). These have
virtually no effect on search requirements, yet they can occupy
10% of your on-disk space or 90% of it.
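
To get a rough feel for how much of an index directory is stored data, something like the sketch below could help. It only sums file sizes by extension, treating *.fdt and *.fdx as stored data per the description above. The function name and the command-line usage are made up for illustration, and it assumes the index is not in compound-file format (where these files would be bundled and not visible separately).

```python
import os
import sys

# Extensions that hold stored-field data, as named in the text above.
STORED_EXTENSIONS = {".fdt", ".fdx"}


def stored_fraction(index_dir):
    """Return (stored_bytes, total_bytes, fraction) for files in index_dir.

    Hypothetical helper: it looks only at regular files directly inside
    index_dir and classifies them purely by file extension.
    """
    stored = 0
    total = 0
    for name in os.listdir(index_dir):
        path = os.path.join(index_dir, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories and anything that is not a file
        size = os.path.getsize(path)
        total += size
        if os.path.splitext(name)[1].lower() in STORED_EXTENSIONS:
            stored += size
    fraction = stored / total if total else 0.0
    return stored, total, fraction


if __name__ == "__main__" and len(sys.argv) > 1:
    # The index directory path is supplied by the caller.
    stored, total, fraction = stored_fraction(sys.argv[1])
    print(f"stored: {stored} bytes of {total} ({fraction:.0%})")
```

This says nothing about search performance by itself; it only shows how misleading the raw on-disk size can be.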

Gotta prototype and measure....

Best
Erick

On Wed, Aug 29, 2012 at 5:45 PM, Michael Della Bitta
<michael.della.bi...@appinions.com> wrote:
> Unfortunately the answer for this can vary quite a bit based on a
> number of factors:
>
> 1. Whether or not fields are stored,
> 2. Document size,
> 3. Total term count,
> 4. Solr version
>
> etc.
>
> We have two major indexes, one for servicing online queries, and one
> for batch processing. Our batch index is performance critical and
> therefore is optimized for throughput, is stored in RAM, and has
> fewer stored fields than the online query one. The batch index shards
> are 25 GB or less, and we're trending toward smaller and more numerous
> shards. This is with 1.4, and I'm just finishing up our migration
> to 3.6.1.
>
> Michael Della Bitta
>
> P.S. Why'd you CC honeybadger? Honeybadger don't care...
>
> ------------------------------------------------
> Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
> www.appinions.com
> Where Influence Isn’t a Game
>
>
> On Wed, Aug 29, 2012 at 5:17 PM, Michael Brandt
> <michael.j.bra...@colorado.edu> wrote:
>> Hi all,
>>
>> I am looking for information on how many documents may be indexed by a
>> single instance of Solr (not using shards) before performance issues are
>> encountered. In searching the internet I've come across some varying
>> answers: one answer suggests 50 GB is problematic
>> <http://lucene.472066.n3.nabble.com/Can-Apache-Solr-Handle-TeraByte-Large-Data-tp3656484p3656848.html>;
>> this blog post on sharding Solr in AWS
>> <http://harish11g.blogspot.com/2012/02/apache-solr-sharding-amazon-ec2.html>
>> says sharding is not necessary until you have
>> "millions of records," but is no more specific.
>>
>> What experiences have you had with this? At what point did you find it
>> necessary to scale up Solr, in terms of both number of records and size of
>> index (whether MB, GB, etc.)?
>>
>> Thanks,
>> Michael Brandt
