Hi there,

I have a fairly large data set that I need to index quickly into SolrCloud.
I have done some research, but none of the options I found looked very good to me:

(1) Kite Morphline: I managed to get it working, and the MapReduce job finished in a few minutes, which is good. However, the go-live step, which merges the indexes into SolrCloud, took a very long time: about an hour (for 60 million).

(2) MapReduce using SolrCloud server (<http://techuserhadoop.blogspot.com/2014/09/mapreduce-job-for-indexing-documents-to.html>): this approach is pretty straightforward, but every document has to funnel through the SolrServer, which is not really optimized for bulk loading.

Here is what I am thinking: is it possible to use MapReduce to create a few Lucene indexes first, for example using 3 reducers to write three indexes, and then create a Solr collection with three shards pointing to the generated indexes? Can Solr easily pick up indexes generated this way?

I am really new to Solr and wondering whether this is feasible, and whether any work along these lines has already been done. I am not really interested in being on the cutting edge, so any pointers to existing work would be appreciated!

Best regards,
Bin
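P.S. To make the reducer idea concrete, here is a rough Java sketch of the partitioning step I have in mind. It is only an illustration under my own assumptions: the class and method names are hypothetical, it splits the signed 32-bit hash space into N contiguous ranges (one per reducer/shard), which as far as I understand is similar in spirit to how SolrCloud's compositeId router assigns hash ranges to shards, and it uses String.hashCode() purely as a stand-in hash. Solr actually hashes document ids with MurmurHash3, so this would not reproduce the placement of a real collection. Whatever builds the shard indexes would have to hash ids exactly as Solr's router does; otherwise later updates for a document could be routed to a shard that does not hold it.

```java
import java.util.ArrayList;
import java.util.List;

public class ShardPartitionSketch {

    // Hypothetical helper: split the full signed 32-bit hash space into
    // numShards contiguous, non-overlapping ranges [start, end].
    static int[][] hashRanges(int numShards) {
        long min = Integer.MIN_VALUE;
        long span = 1L << 32; // number of distinct 32-bit hash values
        int[][] ranges = new int[numShards][2];
        for (int i = 0; i < numShards; i++) {
            ranges[i][0] = (int) (min + span * i / numShards);
            ranges[i][1] = (int) (min + span * (i + 1) / numShards - 1);
        }
        return ranges;
    }

    // ASSUMPTION: String.hashCode() is only a stand-in. Solr's compositeId
    // router uses MurmurHash3, so real shard placement would differ.
    static int hash(String docId) {
        return docId.hashCode();
    }

    // Return the index of the shard (= reducer) whose range contains the id's hash.
    static int shardFor(String docId, int[][] ranges) {
        int h = hash(docId);
        for (int i = 0; i < ranges.length; i++) {
            if (h >= ranges[i][0] && h <= ranges[i][1]) {
                return i;
            }
        }
        throw new IllegalStateException("ranges do not cover hash " + h);
    }

    public static void main(String[] args) {
        int numShards = 3; // e.g. 3 reducers -> 3 indexes -> 3 shards
        int[][] ranges = hashRanges(numShards);

        // Group some example ids into one bucket per shard/reducer.
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numShards; i++) {
            buckets.add(new ArrayList<>());
        }
        for (int n = 0; n < 12; n++) {
            String id = "doc-" + n;
            buckets.get(shardFor(id, ranges)).add(id);
        }

        for (int i = 0; i < numShards; i++) {
            System.out.printf("shard%d [%08x, %08x]: %s%n",
                    i + 1, ranges[i][0], ranges[i][1], buckets.get(i));
        }
    }
}
```

Each bucket would then be written into its own Lucene index by one reducer, so each generated index only contains documents whose ids fall into that shard's hash range.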