I'm working on a project that uses pieces of Nutch to store a Lucene index in Hadoop (basically I am using FsDirectory and related classes). When trying to write to an index, I got an unsupported-operation exception, since FsDirectory doesn't support seek(), which Lucene uses when closing an IndexWriter; the file system is write-once. After looking through the Nutch code, I saw that an index is worked on locally, whether for writing or merging, and then transferred into the DFS when finished.

I just wanted to check that I understand this correctly. If I were to work on a multi-gigabyte index, I would need that much free space on my local drive to hold it, and copying it each way would take a while. How does this work for the really huge indexes people want to build with Nutch? Would there be many smaller Lucene indexes in the DFS, since obviously one huge terabyte index couldn't be downloaded? I'm just trying to get a better understanding of how Nutch works.
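In case it helps clarify the question, here is a minimal sketch of the build-locally-then-copy pattern as I understand it. The class name and both paths are made up, and I'm assuming Lucene 2.x-era and early Hadoop APIs, so adjust to your versions:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class LocalIndexThenCopy {
  public static void main(String[] args) throws Exception {
    String localDir = "/tmp/index-build";               // hypothetical local scratch dir
    Path dfsDir = new Path("/user/tim/indexes/part-0"); // hypothetical DFS target

    // 1. Build the index on local disk, where Lucene can seek freely.
    IndexWriter writer = new IndexWriter(localDir, new StandardAnalyzer(), true);
    Document doc = new Document();
    doc.add(new Field("url", "http://example.com/",
                      Field.Store.YES, Field.Index.UN_TOKENIZED));
    writer.addDocument(doc);
    writer.optimize();
    writer.close(); // closing the writer is where the seek happened for me

    // 2. Copy the finished, now-immutable index into the DFS.
    FileSystem fs = FileSystem.get(new Configuration());
    fs.copyFromLocalFile(new Path(localDir), dfsDir);
  }
}

The copy step is purely sequential writes, which is all a write-once file system supports, so it avoids the seek problem entirely.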
Thanks, Tim
