Re: Whether solr can support 2 TB data?
Jeffery Yuan wrote:
> In our application, every day there is about 800mb raw data, we are going
> to store this data for 5 years, then it's about 1 or 2 TB data.
> I am wondering whether solr can support this much data?

Yes it can. Or rather: You could probably construct a scenario where it is not feasible, but you would have to be very creative.

> Usually how much data we store per node, how many nodes we can have in
> solr cloud, what hardware configuration each node should be?

As Shawn states, it is very hard to give advice on hardware (and I applaud him for refraining from giving the usual "free RAM == index size" advice). However, we love to guesstimate, but to do that you really need to provide more details.

2TB of index that has hundreds of concurrent users, thousands of updates per second and heavy aggregations (grouping, faceting, streaming...) is a task that takes experimentation and beefy hardware. 2TB of index that is rarely updated and accessed by a few people at a time, who are okay with multi-second response times, can be handled by a desktop-class machine with SSDs.

Tell us about query types, update rates, latency requirements, document types and concurrent users. Then we can begin to guess.

- Toke Eskildsen
Re: Whether solr can support 2 TB data?
Some anecdotal information: Alfresco is a document management system that uses Solr. We did scale testing with documents meant to simulate typical office documents. We found that, with larger documents, 50 million documents and 500 GB of index size per shard provided acceptable performance. But you will need to experiment with your document set and performance requirements to find your optimal shard size.

Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, Sep 23, 2016 at 5:16 PM, Shawn Heisey wrote:
> On 9/23/2016 2:33 PM, Jeffery Yuan wrote:
> > In our application, every day there is about 800mb raw data, we are
> > going to store this data for 5 years, then it's about 1 or 2 TB data.
>
> As long as the filesystem can do it, Solr can handle that much data.
> Getting good performance with that much data is the hard part.
>
> > I am wondering whether solr can support this much data? Usually how
> > much data we store per node, how many nodes we can have in solr cloud,
> > what hardware configuration each node should be?
>
> It's nearly impossible to give you generic advice about hardware
> configurations.
>
> https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>
> In general, there are no problems with having terabytes of data in Solr.
> There may be some scalability challenges, and it will probably cost more
> to build than you may have planned.
>
> Query performance will be greatly affected by the ratio of index data
> size to memory size. Good performance with Solr requires sufficient
> memory for the operating system to effectively cache the index data.
> This is over and above the Java heap memory required for Solr itself to
> run.
>
> Without actually attempting to build it, you won't really know how large
> your Solr index will be with 1-2TB of raw input data.
>
> You may be *very* surprised by the amount of memory that's required for
> good Solr performance. See this page for a discussion about memory and
> Solr:
>
> https://wiki.apache.org/solr/SolrPerformanceProblems
>
> The challenges that a large-scale Solr install entails will be similar
> with other search products, assuming that they have a similar
> configuration and similar capabilities to Solr.
>
> As mentioned by the first link above, generic advice about memory isn't
> really possible. There are simply too many variables that can affect
> minimum requirements.
>
> Thanks,
> Shawn
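Joel's anecdotal ceiling (about 50 million documents and 500 GB of index per shard) can be turned into a rough shard-count estimate. The sketch below is illustrative only; the 120M-document / 1.5 TB example corpus is an assumption, not a number from this thread, and as Joel says the real limits depend on your documents and performance requirements.

```python
import math

# Joel's anecdotal per-shard comfort limits from the Alfresco scale test.
DOCS_PER_SHARD = 50_000_000   # documents per shard
GB_PER_SHARD = 500            # index size per shard, in GB

def min_shards(total_docs: int, total_index_gb: float) -> int:
    """Return the shard count needed to stay under both per-shard limits."""
    by_docs = math.ceil(total_docs / DOCS_PER_SHARD)
    by_size = math.ceil(total_index_gb / GB_PER_SHARD)
    return max(by_docs, by_size)

# Hypothetical corpus: 120 million documents, 1.5 TB (1500 GB) of index.
print(min_shards(120_000_000, 1500))  # -> 3
```

Whichever constraint (document count or index size) is tighter for your data determines the shard count; treating both as hard ceilings keeps each shard inside the tested envelope.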
Re: Whether solr can support 2 TB data?
On 9/23/2016 2:33 PM, Jeffery Yuan wrote:
> In our application, every day there is about 800mb raw data, we are
> going to store this data for 5 years, then it's about 1 or 2 TB data.

As long as the filesystem can do it, Solr can handle that much data. Getting good performance with that much data is the hard part.

> I am wondering whether solr can support this much data? Usually how
> much data we store per node, how many nodes we can have in solr cloud,
> what hardware configuration each node should be?

It's nearly impossible to give you generic advice about hardware configurations.

https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

In general, there are no problems with having terabytes of data in Solr. There may be some scalability challenges, and it will probably cost more to build than you may have planned.

Query performance will be greatly affected by the ratio of index data size to memory size. Good performance with Solr requires sufficient memory for the operating system to effectively cache the index data. This is over and above the Java heap memory required for Solr itself to run.

Without actually attempting to build it, you won't really know how large your Solr index will be with 1-2TB of raw input data.

You may be *very* surprised by the amount of memory that's required for good Solr performance. See this page for a discussion about memory and Solr:

https://wiki.apache.org/solr/SolrPerformanceProblems

The challenges that a large-scale Solr install entails will be similar with other search products, assuming that they have a similar configuration and similar capabilities to Solr.

As mentioned by the first link above, generic advice about memory isn't really possible. There are simply too many variables that can affect minimum requirements.

Thanks,
Shawn
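Shawn's point that the OS page cache needs memory over and above the JVM heap can be sketched as simple arithmetic. Both the cache fraction and the example node sizes below are assumptions for illustration; as the linked articles stress, there is no universal ratio.

```python
# Back-of-envelope RAM estimate per node: the Solr JVM heap plus a cushion
# of OS page cache sized as a fraction of the on-disk index. The 0.5
# default cache fraction is an illustrative assumption, not a rule.
def ram_needed_gb(index_gb: float, heap_gb: float, cache_fraction: float = 0.5) -> float:
    """Estimate total RAM: JVM heap + page cache for part of the index."""
    return heap_gb + index_gb * cache_fraction

# Hypothetical node holding 500 GB of index with an 8 GB Solr heap,
# aiming to cache half the index:
print(ram_needed_gb(500, 8))  # -> 258.0
```

The takeaway matches Shawn's warning: even a modest fraction of a terabyte-scale index dwarfs the JVM heap, so memory sizing is dominated by the index, not by Solr itself.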
Re: Whether solr can support 2 TB data?
You can only put about 2 billion documents in one core (a hard Lucene limit), so I would suggest using SolrCloud. You need to calculate how many Solr documents your data will produce and then decide how many shards to use. You can find many useful resources on the web; here is one:

http://www.slideshare.net/anshumg/best-practices-for-highly-available-and-large-scale-solrcloud

2016-09-23 13:33 GMT-07:00 Jeffery Yuan <yuanyun...@gmail.com>:
> Hi, Dear all:
>
> In our application, every day there is about 800mb raw data, we are going
> to store this data for 5 years, then it's about 1 or 2 TB data.
>
> I am wondering whether solr can support this much data?
> Usually how much data we store per node, how many nodes we can have in
> solr cloud, what hardware configuration each node should be?
>
> Thanks very much for your help.
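The "2 billion documents" ceiling mentioned above is Lucene's per-index document-ID limit of 2^31 - 1. A quick sketch of the minimum shard count that this limit alone forces (the example corpus sizes are hypothetical):

```python
import math

# Lucene addresses documents with a signed 32-bit int, so a single
# core/index can hold at most 2^31 - 1 (~2.14 billion) documents.
LUCENE_MAX_DOCS = 2**31 - 1

def shards_for_doc_limit(total_docs: int) -> int:
    """Minimum shards forced by the per-core document limit alone."""
    return max(1, math.ceil(total_docs / LUCENE_MAX_DOCS))

print(shards_for_doc_limit(1_500_000_000))  # -> 1 (fits in one core)
print(shards_for_doc_limit(5_000_000_000))  # -> 3
```

In practice you would shard well before hitting this ceiling for performance reasons, as the earlier replies explain; this is only the absolute lower bound.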
Whether solr can support 2 TB data?
Hi, Dear all:

In our application, every day there is about 800mb raw data, we are going to store this data for 5 years, then it's about 1 or 2 TB data.

I am wondering whether solr can support this much data? Usually how much data we store per node, how many nodes we can have in solr cloud, what hardware configuration each node should be?

Thanks very much for your help.

--
View this message in context: http://lucene.472066.n3.nabble.com/Whether-solr-can-support-2-TB-data-tp4297790.html
Sent from the Solr - User mailing list archive at Nabble.com.
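The 1-2 TB figure in the question follows directly from the stated ingest rate, as a quick sanity check shows:

```python
# Sanity check of the question's estimate: ~800 MB of raw data per day,
# retained for 5 years, lands at the low end of the stated 1-2 TB range.
MB_PER_DAY = 800
DAYS = 5 * 365

total_tb = MB_PER_DAY * DAYS / 1_000_000  # decimal TB
print(round(total_tb, 2))  # -> 1.46
```

Note this is raw input size; as Shawn's reply points out, the resulting Solr index may be larger or smaller, and you won't know until you build it.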