Re: Whether SolrCloud can support 2 TB data?

2016-09-24 Thread Erick Erickson
John: The MapReduceIndexerTool (in contrib) is intended for bulk indexing in a Hadoop ecosystem. This doesn't preclude home-grown setups of course, but it's available OOB. The only tricky bit is at the end. Either you have your Solr indexes on HDFS in which case MRIT can merge them into a live

Re: Whether SolrCloud can support 2 TB data?

2016-09-24 Thread Toke Eskildsen
Regarding a 12TB index: Yago Riveiro wrote: > Our cluster is small for the data we hold (12 machines with SSD and 32G of RAM), but we don't need sub-second queries, we need faceting with high cardinality (in worst-case scenarios we aggregate 5M unique string values)
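
A minimal SolrJ sketch of the kind of high-cardinality facet request discussed above, for illustration only: the ZooKeeper address (zk1:2181), collection name (logs) and field name (user_id_s) are hypothetical, and the CloudSolrClient.Builder signature shown is the SolrJ 7.x/8.x one.

import java.util.Collections;
import java.util.Optional;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

public class HighCardinalityFacetSketch {
    public static void main(String[] args) throws Exception {
        try (CloudSolrClient client = new CloudSolrClient.Builder(
                Collections.singletonList("zk1:2181"), Optional.empty()).build()) {
            client.setDefaultCollection("logs");   // hypothetical collection

            SolrQuery q = new SolrQuery("*:*");
            q.setRows(0);                          // only the aggregation is wanted, not documents
            q.setFacet(true);
            q.addFacetField("user_id_s");          // hypothetical high-cardinality string field
            q.setFacetLimit(1000);                 // cap the number of buckets returned
            q.setFacetMinCount(1);

            QueryResponse rsp = client.query(q);
            FacetField ff = rsp.getFacetField("user_id_s");
            System.out.println("facet buckets returned: " + ff.getValueCount());
        }
    }
}

With millions of unique values the aggregation itself is still expensive, but rows=0 and a facet limit at least keep the response size bounded.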

Re: Whether SolrCloud can support 2 TB data?

2016-09-24 Thread Toke Eskildsen
John Bickerstaff wrote: > As an aside - I just spoke with someone the other day who is using Hadoop for re-indexing in order to save a lot of time. If you control which documents go into which shards, then that is certainly a possibility. We have a collection with long
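
One way to "control which documents go into which shards" is the default compositeId router, which hashes the part of the id before the "!" separator. A hedged SolrJ sketch; the ZooKeeper address, collection name, id prefix and field names are all hypothetical:

import java.util.Collections;
import java.util.Optional;

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class RoutedIndexingSketch {
    public static void main(String[] args) throws Exception {
        try (CloudSolrClient client = new CloudSolrClient.Builder(
                Collections.singletonList("zk1:2181"), Optional.empty()).build()) {
            client.setDefaultCollection("events");   // hypothetical collection

            SolrInputDocument doc = new SolrInputDocument();
            // With the default compositeId router, every id sharing the "2016-09!"
            // prefix hashes to the same shard, so that slice of the data can be
            // rebuilt offline (e.g. in Hadoop) and re-indexed as a unit.
            doc.addField("id", "2016-09!event-42");
            doc.addField("message_t", "example payload");
            client.add(doc);
            client.commit();
        }
    }
}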

Re: Whether SolrCloud can support 2 TB data?

2016-09-24 Thread John Bickerstaff
As an aside - I just spoke with someone the other day who is using Hadoop for re-indexing in order to save a lot of time. I don't know the details, but I assume they're using Hadoop to call Lucene code and index documents using the map-reduce approach... This was made in their own shop - I don't

Re: Whether SolrCloud can support 2 TB data?

2016-09-24 Thread Yago Riveiro
"LucidWorks achieved 150k docs/second" This is only valid is you don't have replication, I don't know your use case, but a realistic use case normally use some type of redundancy to not lost data in a hardware failure, at least 2 replicas, more implicates a reduction of throughput. Also

Re: Whether SolrCloud can support 2 TB data?

2016-09-24 Thread S G
Hey Yago, 12 TB is very impressive. Can you also share some numbers about the shards, replicas, machine count/specs and docs/second for your case? I assume you are not running a single 12 TB index either, so some details on that would be really helpful too.

Re: Whether SolrCloud can support 2 TB data?

2016-09-23 Thread Yago Riveiro
In my company we have a SolrCloud cluster with 12T. My advice: be generous with CPU, you will need it at some point (very important if you have no control over the kind of queries sent to the cluster; clients are greedy, they want all results at the same time). SSD and memory (as much as you can afford
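
One way to defend against clients that "want all results at the same time" is to force paging through a cursor instead of huge rows= values. A hedged SolrJ sketch, again with hypothetical connection details and collection name:

import java.util.Collections;
import java.util.Optional;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.CursorMarkParams;

public class CursorPagingSketch {
    public static void main(String[] args) throws Exception {
        try (CloudSolrClient client = new CloudSolrClient.Builder(
                Collections.singletonList("zk1:2181"), Optional.empty()).build()) {
            client.setDefaultCollection("logs");      // hypothetical collection

            SolrQuery q = new SolrQuery("*:*");
            q.setRows(500);                           // small pages instead of everything at once
            q.setSort("id", SolrQuery.ORDER.asc);     // cursors require a sort that includes the uniqueKey

            String cursor = CursorMarkParams.CURSOR_MARK_START;
            boolean done = false;
            while (!done) {
                q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
                QueryResponse rsp = client.query(q);
                // ... process rsp.getResults() here ...
                String next = rsp.getNextCursorMark();
                done = cursor.equals(next);           // the cursor stops moving once the result set is exhausted
                cursor = next;
            }
        }
    }
}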

Re: Whether SolrCloud can support 2 TB data?

2016-09-23 Thread Pushkar Raste
Solr is RAM hungry. Make sure that you have enough RAM to hold most of the index of a core in RAM itself. You should also consider using really good SSDs. That would be a good start. Like others said, test and verify your setup. --Pushkar Raste On Sep 23, 2016 4:58 PM, "Jeffery Yuan"
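
A back-of-envelope sketch of the "index in RAM" sizing rule; every figure in it is an illustrative assumption, not a measurement from this thread:

public class RamBackOfEnvelope {
    public static void main(String[] args) {
        // Every figure below is an illustrative assumption, not a measurement.
        double indexSizeGb       = 2048; // ~2 TB of index on disk, before replication
        int    replicationFactor = 2;    // two copies of every shard
        int    nodes             = 12;   // machines in the cluster
        double heapPerNodeGb     = 16;   // JVM heap reserved for Solr itself
        double cacheFraction     = 0.5;  // share of the local index to keep in the OS page cache

        double indexPerNodeGb = indexSizeGb * replicationFactor / nodes;
        double suggestedRamGb = heapPerNodeGb + indexPerNodeGb * cacheFraction;

        System.out.printf("index per node: ~%.0f GB%n", indexPerNodeGb);
        System.out.printf("suggested RAM per node: ~%.0f GB (heap + page cache)%n", suggestedRamGb);
    }
}

Lowering the cache fraction and leaning on fast SSDs instead shrinks the RAM estimate, which is in line with the SSD advice above.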

Re: Whether SolrCloud can support 2 TB data?

2016-09-23 Thread Jeffery Yuan
Thanks so much for your prompt reply. We are definitely going to use SolrCloud. I am just wondering whether SolrCloud can scale to multi-TB data levels and what kind of hardware configuration that would require. Thanks.