Re: Distributed Indexer?

Andrzej Bialecki Fri, 21 Mar 2008 00:31:09 -0700

[EMAIL PROTECTED] wrote:

Hi,


I see Andrzej is getting busy with JIRA.... new release in April?


That's the idea ... I could use some help, too ;)


I just noticed something nice in Hadoop's svn repo - distributed
Lucene indexer.  Is anyone thinking about providing support for that
in Nutch?  Or do people think this is not needed because in the end
people tend to create a number of relatively small indices (5-10M
docs) as opposed to one larger index?

It caught my eye, too. I think it's nice that this tool uses low-level knowledge of Lucene segments to minimize the index churn - however, if I understand it correctly, the resulting indexes it maintains are still located on HDFS. Also, this doesn't address the extended concept of "shard" in Nutch, which consists not only of a Lucene index but also contains binary content, parse data and parse text ...

From our point of view it would be useful if it were to move two steps further, i.e. include the management of other binary data (no longer so trivial, eh?), and then offer a functionality to do this transparently so that the shards end up on local filesystems of search servers, and the low-level segment management is done there ...



--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
