Tim Patton wrote:
I'm working on a project that uses pieces of Nutch to store a Lucene index in Hadoop (basically I'm using FsDirectory and related classes). When trying to write to an index I got an unsupported-operation exception, since FsDirectory doesn't support "seek" (which Lucene uses when closing an IndexWriter); the file system is write-once. After looking through the Nutch code I saw that an index is built or merged locally and then transferred into the DFS when finished. I just wanted to check that I understood this correctly.
Yes, this is correct.
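A rough sketch of the build-locally-then-copy pattern described above, using only plain `java.nio` as a stand-in for the real Lucene and Hadoop APIs (the class name, directory layout, and `copyIndex` helper are all made up for illustration):

```java
import java.io.IOException;
import java.nio.file.*;

public class LocalThenCopy {
    // Hypothetical sketch: the index is built in a local directory,
    // then each finished file is copied into the (write-once) DFS
    // in a single sequential pass -- no seeks are ever needed.
    static void copyIndex(Path localIndex, Path dfsIndex) throws IOException {
        Files.createDirectories(dfsIndex);
        try (DirectoryStream<Path> files = Files.newDirectoryStream(localIndex)) {
            for (Path f : files) {
                // Sequential write of a complete file: the only operation
                // a write-once filesystem supports.
                Files.copy(f, dfsIndex.resolve(f.getFileName()));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Local temp dirs stand in for the local disk and the DFS.
        Path local = Files.createTempDirectory("local-index");
        Path dfs = Files.createTempDirectory("dfs").resolve("index");
        Files.writeString(local.resolve("segments"), "seg data");
        copyIndex(local, dfs);
        System.out.println(Files.readString(dfs.resolve("segments")));
    }
}
```

The point of the pattern is that Lucene gets the random-access local filesystem it needs while writing, and the DFS only ever sees complete, immutable files.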
If I were to work on a multi-gigabyte index I would need that much free space on my local drive to stage the index, and it would take a while to copy each way. How does this work for the really huge indexes people want to build with Nutch? Would there be many smaller Lucene indexes in the DFS, since obviously one huge terabyte index couldn't be downloaded? I'm just trying to get a better understanding of how Nutch works.
Terabyte indexes aren't actually very useful, since they take too long to search. So with big collections (>100M pages) one will keep multiple indexes and use distributed search to search them all in parallel.
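The distributed-search idea above can be sketched in a few lines: each shard (one of the smaller per-machine indexes) answers the query independently, and a frontend merges the per-shard hits by score. This is a simplified illustration, not Nutch's actual search code; the `Hit` record and the in-memory "shards" are invented for the example:

```java
import java.util.*;
import java.util.concurrent.*;

public class ShardSearch {
    // Hypothetical hit type: a document id plus a relevance score.
    record Hit(String doc, double score) {}

    // Stand-in for a per-shard Lucene search; here each shard just
    // returns its precomputed hit list.
    static List<Hit> searchShard(List<Hit> shard, String query) {
        return shard;
    }

    // Query every shard in parallel, then merge and keep the top k.
    static List<Hit> distributedSearch(List<List<Hit>> shards, String query, int k)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(shards.size());
        List<Future<List<Hit>>> futures = new ArrayList<>();
        for (List<Hit> s : shards)
            futures.add(pool.submit(() -> searchShard(s, query)));
        List<Hit> all = new ArrayList<>();
        for (Future<List<Hit>> f : futures)
            all.addAll(f.get());
        pool.shutdown();
        // Merge by descending score, as a frontend would.
        all.sort((a, b) -> Double.compare(b.score(), a.score()));
        return all.subList(0, Math.min(k, all.size()));
    }

    public static void main(String[] args) throws Exception {
        List<List<Hit>> shards = List.of(
            List.of(new Hit("a", 0.9), new Hit("b", 0.4)),
            List.of(new Hit("c", 0.7)));
        for (Hit h : distributedSearch(shards, "query", 2))
            System.out.println(h.doc() + " " + h.score());
    }
}
```

Because each shard is searched in parallel, query latency is roughly that of the largest shard rather than of the whole collection, which is why many small indexes beat one terabyte index.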
Doug

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
