The Fetcher can also fetch using the "file" protocol. However, this is not as efficient as it could be, because you still need to go through the full crawling cycle. It would be more efficient to write a special crawler that starts from a submitted path and follows all subdirectories and files.
Such a crawler could also be used for efficient crawling of SMB, FTP, and WebDAV resources.

-- Sami Siren

2006/8/27, Sandy Polanski <[EMAIL PROTECTED]>:
This may be more of a straight Lucene task, but I thought I'd ask anyway. Rather than using Nutch as a crawler, I'd rather just point the Nutch parser and indexer at a directory on my server and have it detect content-type by the file extension. I'd prefer to skip the whole crawling part, since all of my data is local, and increase the reliability of getting all of my proper data indexed. Is this possible?
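As a rough sketch of the directory-walking approach suggested above: the enumeration step just recurses from a submitted path and guesses the content type from the file extension, with no fetch cycle involved. Python is used here purely for illustration (an actual Nutch crawler or protocol plugin would be written in Java), and the root path is a made-up example.

```python
import mimetypes
import os

def walk_local_files(root):
    """Yield (path, content_type) for every regular file under root,
    following all subdirectories, without any network fetch cycle."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Detect content type from the file extension, as the
            # original poster requested; fall back to a generic type.
            content_type, _encoding = mimetypes.guess_type(path)
            yield path, content_type or "application/octet-stream"

if __name__ == "__main__":
    # "/data/docs" is a hypothetical local root, not a Nutch default.
    for path, ctype in walk_local_files("/data/docs"):
        print(path, ctype)
```

Each yielded (path, type) pair could then be handed straight to the parsing and indexing stages, which is essentially what skipping the crawl buys you.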
