Some anecdotal information: Alfresco is a document management system that
uses Solr. We did scale testing with documents meant to simulate typical
office documents. We found that, with larger documents, 50 million documents
and 500 GB of index size per shard provided acceptable performance.

But you will need to experiment with your document set and performance
requirements to find your optimal shard size.
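
As a rough starting point for that experiment, here is a minimal sketch of
the arithmetic. The function name, the assumption that 800 MB arrives per
day, the 100 million document count, and the assumption that index size
roughly equals raw size are all hypothetical; only the 50 million documents
and 500 GB per shard figures come from our testing, and they may not hold
for your data.

```python
import math

def estimate_shards(total_docs, total_index_gb,
                    max_docs_per_shard=50_000_000,
                    max_gb_per_shard=500):
    """Smallest shard count that keeps each shard under both limits.

    The default limits are the anecdotal figures from the scale test,
    not hard Solr limits.
    """
    by_docs = math.ceil(total_docs / max_docs_per_shard)
    by_size = math.ceil(total_index_gb / max_gb_per_shard)
    return max(1, by_docs, by_size)

# Assumption: 800 MB of raw data per day, kept for 5 years.
raw_gb = 800 * 365 * 5 / 1000          # 1460 GB of raw data

# Assumption: index size is about the same as raw size. The real ratio
# is only known after indexing a sample of your documents.
index_gb = raw_gb

# Assumption: 100 million documents in total (hypothetical figure).
shards = estimate_shards(100_000_000, index_gb)
print(shards)
```

Treat the result as a first guess to test against, then adjust the limits
once you have measured query latency on a real shard.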

Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, Sep 23, 2016 at 5:16 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 9/23/2016 2:33 PM, Jeffery Yuan wrote:
> > In our application, every day there is about 800 MB of raw data. We are
> > going to store this data for 5 years, so it's about 1 or 2 TB of data.
>
> As long as the filesystem can do it, Solr can handle that much data.
> Getting good performance with that much data is the hard part.
>
> > I am wondering whether Solr can support this much data. Usually, how
> > much data do we store per node, how many nodes can we have in SolrCloud,
> > and what hardware configuration should each node have?
>
> It's nearly impossible to give you generic advice about hardware
> configurations.
>
> https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>
> In general, there are no problems with having terabytes of data in Solr.
> There may be some scalability challenges, and it will probably cost more
> to build than you may have planned.
>
> Query performance will be greatly affected by the ratio of index data
> size to memory size.  Good performance with Solr requires sufficient
> memory for the operating system to effectively cache the index data.
> This is over and above the Java heap memory required for Solr itself to
> run.
>
> Without actually attempting to build it, you won't really know how large
> your Solr index will be with 1-2TB of raw input data.
>
> You may be *very* surprised by the amount of memory that's required for
> good Solr performance.  See this page for a discussion about memory and
> Solr:
>
> https://wiki.apache.org/solr/SolrPerformanceProblems
>
> The challenges that a large-scale Solr install entails will be similar
> with other search products, assuming that they have a similar
> configuration and similar capabilities to Solr.
>
> As mentioned by the first link above, generic advice about memory isn't
> really possible.  There are simply too many variables that can affect
> minimum requirements.
>
> Thanks,
> Shawn
>
>
