Re: Whether solr can support 2 TB data?

2016-09-24 Thread Toke Eskildsen
Jeffery Yuan  wrote:
>  In our application, every day there is about 800 MB of raw data; we are
> going to store this data for 5 years, so it's about 1 or 2 TB of data.

>  I am wondering whether Solr can support this much data?

Yes it can.

Or rather: You could probably construct a scenario where it is not feasible, 
but you would have to be very creative.

>  Usually, how much data do we store per node, how many nodes can we have
> in SolrCloud, and what hardware configuration should each node be?

As Shawn states, it is very hard to give advice on hardware (and I applaud him
for refraining from giving the usual "free RAM == index size" advice).
However, we love to guesstimate; to do that, you really need to provide more
details.


2 TB of index serving hundreds of concurrent users, thousands of updates per
second, and heavy aggregations (grouping, faceting, streaming...) is a task
that takes experimentation and beefy hardware.

2 TB of index that is rarely updated and accessed by a few people at a time,
who are okay with multi-second response times, can be handled by a
desktop-class machine with SSDs.


Tell us about query types, update rates, latency requirements, document types 
and concurrent users. Then we can begin to guess.

- Toke Eskildsen


Re: Whether solr can support 2 TB data?

2016-09-23 Thread Joel Bernstein
Some anecdotal information: Alfresco is a document management system that
uses Solr. We did scale testing with documents meant to simulate typical
office documents. We found that, with larger documents, 50 million documents
and 500 GB of index size per shard provided acceptable performance.

But you will need to experiment with your document set and performance
requirements to find your optimal shard size.
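As a rough illustration of those anecdotal numbers, here is a back-of-envelope
sketch in Python. The raw-data volume and index-to-raw ratio below are
placeholder assumptions; measure both on a sample of your own corpus before
trusting the result:

import math

# Back-of-envelope shard count, using the anecdotal ceiling above of
# ~500 GB of index per shard. All inputs are placeholder assumptions.
raw_data_gb = 2 * 1024          # worst case from the question: 2 TB of raw data
index_to_raw_ratio = 1.0        # assumption; a real index may be smaller or larger
index_size_gb = raw_data_gb * index_to_raw_ratio

max_gb_per_shard = 500          # from the Alfresco scale test above
shards = math.ceil(index_size_gb / max_gb_per_shard)
print(f"~{index_size_gb:.0f} GB index -> at least {shards} shards")
# ~2048 GB index -> at least 5 shards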

Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, Sep 23, 2016 at 5:16 PM, Shawn Heisey  wrote:

> [full quote of Shawn's reply trimmed; see his message below]


Re: Whether solr can support 2 TB data?

2016-09-23 Thread Shawn Heisey
On 9/23/2016 2:33 PM, Jeffery Yuan wrote:
> In our application, every day there is about 800 MB of raw data; we are
> going to store this data for 5 years, so it's about 1 or 2 TB of data.

As long as the filesystem can do it, Solr can handle that much data. 
Getting good performance with that much data is the hard part.

> I am wondering whether Solr can support this much data? Usually, how
> much data do we store per node, how many nodes can we have in SolrCloud,
> and what hardware configuration should each node be?

It's nearly impossible to give you generic advice about hardware
configurations.

https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

In general, there's no problem with having terabytes of data in Solr.
There may be some scalability challenges, and it will probably cost more
to build than you may have planned.

Query performance will be greatly affected by the ratio of index data
size to memory size.  Good performance with Solr requires sufficient
memory for the operating system to effectively cache the index data. 
This is over and above the Java heap memory required for Solr itself to run.
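To make that concrete, here is a small, purely illustrative Python sketch of a
per-node memory budget; the heap size, per-node index size, and page-cache
fraction are assumptions to replace with your own numbers, not recommendations:

# Illustrative per-node memory budget; all figures are hypothetical.
index_on_node_gb = 500      # index data stored on this node
solr_heap_gb = 16           # JVM heap for Solr itself (workload dependent)
cache_fraction = 0.5        # goal: fit roughly half the index in the OS page cache

page_cache_gb = index_on_node_gb * cache_fraction
total_ram_gb = solr_heap_gb + page_cache_gb
print(f"~{total_ram_gb:.0f} GB RAM = {solr_heap_gb} GB heap "
      f"+ ~{page_cache_gb:.0f} GB for the OS page cache")
# ~266 GB RAM = 16 GB heap + ~250 GB for the OS page cache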

Without actually attempting to build it, you won't really know how large
your Solr index will be with 1-2TB of raw input data.

You may be *very* surprised by the amount of memory that's required for
good Solr performance.  See this page for a discussion about memory and
Solr:

https://wiki.apache.org/solr/SolrPerformanceProblems

The challenges that a large-scale Solr install entails will be similar for
other search products, assuming that they have a similar configuration and
similar capabilities to Solr.

As mentioned by the first link above, generic advice about memory isn't
really possible.  There are simply too many variables that can affect
minimum requirements.

Thanks,
Shawn



Re: Whether solr can support 2 TB data?

2016-09-23 Thread Ray Niu
You can only put about 2 billion documents in one core (a hard Lucene
limit), so I would suggest using SolrCloud. You need to calculate how many
Solr documents your data will produce and then decide how many shards to
use. You can find many useful resources on the web; I'll provide one here:
http://www.slideshare.net/anshumg/best-practices-for-highly-available-and-large-scale-solrcloud
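For illustration, a short Python sketch of that calculation; the
documents-per-day figure is hypothetical, and the only hard number is Lucene's
per-core cap of roughly Integer.MAX_VALUE documents:

import math

# Sanity check against Lucene's hard per-core document limit.
LUCENE_MAX_DOCS = 2**31 - 1          # per-core cap (Lucene enforces slightly less)

docs_per_day = 1_000_000             # hypothetical; count your real Solr documents
total_docs = docs_per_day * 365 * 5  # 5-year retention

min_shards = math.ceil(total_docs / LUCENE_MAX_DOCS)
print(f"{total_docs:,} docs -> at least {min_shards} shard(s); "
      "in practice, shard well below the hard limit for performance")
# 1,825,000,000 docs -> at least 1 shard(s); ...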


2016-09-23 13:33 GMT-07:00 Jeffery Yuan <yuanyun...@gmail.com>:

> Hi, Dear all:
>
>   In our application, every day there is about 800 MB of raw data; we are
> going to store this data for 5 years, so it's about 1 or 2 TB of data.
>
>   I am wondering whether Solr can support this much data?
>   Usually, how much data do we store per node, how many nodes can we have
> in SolrCloud, and what hardware configuration should each node be?
>
> Thanks very much for your help.


Whether solr can support 2 TB data?

2016-09-23 Thread Jeffery Yuan
Hi, Dear all:

  In our application, every day there is about 800 MB of raw data; we are
going to store this data for 5 years, so it's about 1 or 2 TB of data.

  I am wondering whether Solr can support this much data?
  Usually, how much data do we store per node, how many nodes can we have in
SolrCloud, and what hardware configuration should each node be?

Thanks very much for your help.