SOLR-1045 it is. More details will be available in that issue.

Marc, you can check out Hadoop contrib/index which builds a Lucene index using Hadoop MapReduce. However, it does not handle duplicate detection.
Cheers,
Ning

On Mon, Mar 2, 2009 at 4:25 PM, Marc Sturlese <[email protected]> wrote:
>
> I am doing some research about creating lucene/solr index using hadoop but
> there's not so much info around, would be great to see some code!!! (I am
> experiencing problems specially in duplication detection)
> Thanks
>
> Shalin Shekhar Mangar wrote:
>>
>> On Mon, Mar 2, 2009 at 11:24 PM, Ning Li <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I wonder if there is interest in a contrib module that builds Solr
>>> index using Hadoop MapReduce?
>>>
>>
>> Absolutely!
>>
>>> It is different from the Solr support in Nutch. The Solr support in
>>> Nutch sends a document to a Solr server in a reduce task. Here, I aim
>>> at building/updating Solr index within map/reduce tasks. Also, it
>>> achieves better parallelism when the number of map tasks is greater
>>> than the number of reduce tasks, which is usually the case.
>>>
>>> I worked out a very simple initial version. But I want to check if
>>> there is any interest before proceeding. If so, I'll open a Jira
>>> issue.
>>>
>>
>> +1
>>
>> Please do. It'd be great to see this in Solr.
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
>
> --
> View this message in context:
> http://www.nabble.com/Build-Solr-index-using-Hadoop-MapReduce-tp22293172p22296832.html
> Sent from the Solr - Dev mailing list archive at Nabble.com.
