Thanks, that's exactly what I was thinking. Do you have any recommendations on maximum index size (obviously we'd be testing ourselves, but it's good to get an idea)?
Tim

-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]
Sent: Thursday, March 02, 2006 7:34 PM
To: [email protected]
Subject: Re: Question about Index Writing/Merging

Tim Patton wrote:
> I'm working on a project that uses pieces of Nutch to store a Lucene index
> in Hadoop (basically I am using the FsDirectory and related classes). When
> trying to write to an index I got an unsupported exception since FsDirectory
> doesn't support "seek" which Lucene uses on closing an IndexWriter, the file
> system is write-once. After looking through the Nutch code I saw that an
> index is worked on locally, either with writing or merging, then transferred
> into the dfs when finished. I just was checking to make sure I understood
> this correctly.

Yes, this is correct.

> If I was to work on a multi-gigabyte index I would need
> that much free space on my local drive to transfer the index to and it would
> take a while to copy each way. How does this work for the really huge
> indexes people want to build with Nutch? Would there be many smaller Lucene
> indexes in the dfs, since obviously one huge terabyte index couldn't be
> downloaded? I'm just trying to have a better understanding of how Nutch
> works.

Terabyte indexes aren't actually very useful, since they take too long to search. So with big collections (>100M pages) one will keep multiple indexes and use distributed search to search them all in parallel.

Doug

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
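
For anyone reading this thread later, the "multiple indexes searched in parallel" idea Doug describes can be sketched roughly as below. This is only a toy illustration in Python, not Nutch's or Lucene's actual API: the in-memory `SHARDS`, the term-count scoring, and the names `search_shard` and `distributed_search` are all made up for the example. The point is just the shape of the approach: each smaller index is queried independently, and the per-index top hits are merged into one global ranking.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq

# Hypothetical stand-in for several small indexes: each shard maps doc id -> text.
# A real deployment would hold a separate Lucene index per shard instead.
SHARDS = [
    {"a1": "hadoop stores large files", "a2": "lucene index merging"},
    {"b1": "nutch crawls the web", "b2": "lucene index search with nutch"},
    {"c1": "distributed search over many lucene index shards"},
]

def search_shard(shard, query, k):
    """Return the top-k (score, doc_id) hits of one shard (toy term-count scoring)."""
    terms = query.lower().split()
    hits = []
    for doc_id, text in shard.items():
        words = text.lower().split()
        score = sum(words.count(t) for t in terms)
        if score > 0:
            hits.append((score, doc_id))
    return heapq.nlargest(k, hits)

def distributed_search(shards, query, k=3):
    """Query all shards in parallel, then merge the per-shard hits into a global top-k."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = pool.map(lambda s: search_shard(s, query, k), shards)
        merged = [hit for partial in partials for hit in partial]
    return heapq.nlargest(k, merged)

if __name__ == "__main__":
    for score, doc_id in distributed_search(SHARDS, "lucene nutch", k=2):
        print(doc_id, score)
```

Because each shard only has to return its own top k hits, the merge step stays cheap no matter how large the individual indexes are, which is why keeping several moderate indexes tends to scale better than one huge one.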
