Solrcloud Batch Indexing

Bin Wang Mon, 07 Mar 2016 12:29:41 -0800

Hi there,

I have a fairly big data set that I need to index quickly into SolrCloud.

I have done some research, and none of the options I found looked very good to me.

(1) Kite Morphline: I managed to get it working, and the MapReduce job finished in
a few minutes, which is good. However, the go-live phase, which merges the
indexes into SolrCloud, took a really long time: about one hour for 60 million
documents.

(2) MapReduce using a SolrCloud server:
<http://techuserhadoop.blogspot.com/2014/09/mapreduce-job-for-indexing-documents-to.html>
This approach is pretty straightforward; however, every document has to funnel
through the Solr server, which is really not optimized for bulk loading.
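
When documents do have to go through the server, a common mitigation is to send them in batches rather than one request per document. The sketch below is a minimal, stdlib-only illustration of that buffering; `BatchingSender` and its `sink` callback are hypothetical names, and the sink merely stands in for a bulk call such as SolrJ's `add(Collection<SolrInputDocument>)`. The batch size is an assumption to tune, not a recommended value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: buffers documents and hands them to a sink in fixed-size batches. */
public final class BatchingSender<T> implements AutoCloseable {
    private final int batchSize;
    private final Consumer<List<T>> sink; // stand-in for a bulk "add" call to the server
    private final List<T> buffer;
    private int batchesSent = 0;

    public BatchingSender(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
        this.buffer = new ArrayList<>(batchSize);
    }

    /** Queue one document; send a batch as soon as the buffer is full. */
    public void add(T doc) {
        buffer.add(doc);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Send whatever is buffered (a possibly partial batch). */
    public void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer)); // hand over a copy, then reuse the buffer
        buffer.clear();
        batchesSent++;
    }

    public int batchesSent() {
        return batchesSent;
    }

    /** Flush the final partial batch when the task finishes. */
    @Override
    public void close() {
        flush();
    }
}
```

In a mapper or reducer, one sender per task would be created in setup and closed in cleanup, so the last partial batch is not lost. This reduces per-request overhead, but every document still goes through the server, so it does not address the bulk-loading concern itself.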

Here is what I am thinking: is it possible to use MapReduce to create a few
Lucene indexes first, for example using 3 reducers to write three indexes, and
then create a Solr collection with three shards pointing to the generated
indexes? Can Solr easily pick up generated indexes?
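
One detail this plan depends on: each reducer must receive exactly the documents SolrCloud would route to its shard. Otherwise a later update or delete by ID would be routed to a different shard than the one holding the document, which can leave duplicates. Assuming the default compositeId router and plain IDs without a `!` routing prefix, Solr hashes the ID's UTF-8 bytes with MurmurHash3 (x86, 32-bit, seed 0), and each shard owns a contiguous range of that 32-bit hash space. The class and method names below are hypothetical, and `shardFor` uses an even split that only approximates Solr's own range boundaries; a real job should read each shard's range from the collection state.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: pick the shard (reducer) for a document ID. */
public final class ShardRouting {

    /** MurmurHash3, x86 32-bit variant. */
    public static int murmur3x86_32(byte[] data, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h1 = seed;
        int len = data.length;
        int roundedEnd = len & 0xfffffffc; // length rounded down to whole 4-byte blocks

        for (int i = 0; i < roundedEnd; i += 4) {
            // little-endian load of one 4-byte block
            int k1 = (data[i] & 0xff) | ((data[i + 1] & 0xff) << 8)
                    | ((data[i + 2] & 0xff) << 16) | (data[i + 3] << 24);
            k1 *= c1;
            k1 = Integer.rotateLeft(k1, 15);
            k1 *= c2;
            h1 ^= k1;
            h1 = Integer.rotateLeft(h1, 13);
            h1 = h1 * 5 + 0xe6546b64;
        }

        // mix in the 1 to 3 trailing bytes, if any (cases fall through on purpose)
        int k1 = 0;
        switch (len & 0x03) {
            case 3:
                k1 = (data[roundedEnd + 2] & 0xff) << 16;
            case 2:
                k1 |= (data[roundedEnd + 1] & 0xff) << 8;
            case 1:
                k1 |= data[roundedEnd] & 0xff;
                k1 *= c1;
                k1 = Integer.rotateLeft(k1, 15);
                k1 *= c2;
                h1 ^= k1;
        }

        // finalization mix
        h1 ^= len;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    /** Hash a plain document ID as UTF-8 with seed 0. */
    public static int idHash(String id) {
        return murmur3x86_32(id.getBytes(StandardCharsets.UTF_8), 0);
    }

    /** Even split of the signed 32-bit hash space into numShards contiguous ranges. */
    public static int shardFor(int hash, int numShards) {
        long offset = (long) hash - Integer.MIN_VALUE; // shift to 0 .. 2^32 - 1
        return (int) ((offset * numShards) >>> 32);
    }

    public static void main(String[] args) {
        String[] ids = {"doc-1", "doc-2", "doc-3", "doc-4"};
        for (String id : ids) {
            System.out.println(id + " -> shard " + shardFor(idHash(id), 3));
        }
    }
}
```

In a MapReduce job, this logic would sit in a custom partitioner so that reducer i writes only shard i's documents.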

I am really new to Solr and wondering whether this is feasible, and whether any
work has already been done on it. I am not really interested in being on the
cutting edge, so any pointers to existing work would be appreciated!

Best regards,

Bin
