Re: Build Solr index using Hadoop MapReduce

Ning Li Mon, 02 Mar 2009 16:48:09 -0800

SOLR-1045 it is. More details will be available in that issue.

Marc, you can check out Hadoop contrib/index, which builds a Lucene
index using Hadoop MapReduce. However, it does not handle duplicate
detection.
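
For duplicate detection, one common approach in a MapReduce job is to key
each document on a unique id so that all copies reach the same reduce call,
which then keeps a single one. Below is a minimal sketch in plain Python
that only simulates the map, shuffle, and reduce steps. It does not use the
Hadoop or contrib/index API, and the field names `id` and `timestamp` as
well as the "keep the newest copy" rule are assumptions for illustration.

```python
from collections import defaultdict


def map_doc(doc):
    # Emit (unique key, document) so that duplicates share the same key.
    # Assumption: every document carries an "id" field.
    yield doc["id"], doc


def reduce_docs(key, docs):
    # All documents with the same key arrive together; keep one of them.
    # Assumption: the copy with the largest "timestamp" wins.
    return max(docs, key=lambda d: d["timestamp"])


def dedupe(docs):
    # The dictionary of lists stands in for the shuffle phase.
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_doc(doc):
            groups[key].append(value)
    # One reduce call per key, in sorted key order.
    return [reduce_docs(key, values) for key, values in sorted(groups.items())]
```

In a real job the surviving document from each reduce call would be the
one added to the index, so the number of reduce tasks bounds the
parallelism of this step.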


Cheers,
Ning


On Mon, Mar 2, 2009 at 4:25 PM, Marc Sturlese <[email protected]> wrote:
>
> I am researching how to build a Lucene/Solr index using Hadoop, but there
> is not much information around. It would be great to see some code! (I am
> having problems especially with duplicate detection.)
> Thanks
>
> Shalin Shekhar Mangar wrote:
>>
>> On Mon, Mar 2, 2009 at 11:24 PM, Ning Li <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I wonder if there is interest in a contrib module that builds Solr
>>> index using Hadoop MapReduce?
>>>
>>
>> Absolutely!
>>
>>
>>> It is different from the Solr support in Nutch. The Solr support in
>>> Nutch sends a document to a Solr server in a reduce task. Here, I aim
>>> at building/updating Solr index within map/reduce tasks. Also, it
>>> achieves better parallelism when the number of map tasks is greater
>>> than the number of reduce tasks, which is usually the case.
>>>
>>> I worked out a very simple initial version. But I want to check if
>>> there is any interest before proceeding. If so, I'll open a Jira
>>> issue.
>>>
>>
>> +1
>>
>> Please do. It'd be great to see this in Solr.
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
>>
>
> --
> View this message in context: 
> http://www.nabble.com/Build-Solr-index-using-Hadoop-MapReduce-tp22293172p22296832.html
> Sent from the Solr - Dev mailing list archive at Nabble.com.
>
>
