The Fetcher can also fetch using the "file" protocol. However, this is not as efficient as it could be, because you still need to go through the full crawling cycle. It would be more efficient to write a special crawler that starts from a submitted path and follows all subdirectories and files.
Such a crawler could also be used for efficient crawling of SMB, FTP, and WebDAV resources.

-- Sami Siren

2006/8/27, Sandy Polanski <[EMAIL PROTECTED]>:
This may be more of a straight Lucene task, but I thought I'd ask anyway. Rather than using Nutch as a crawler, I'd rather just point the Nutch parser and indexer at a directory on my server and have it detect content-type by the file extension. I'd prefer to skip the whole crawling part, since all of my data is local, and increase the reliability of getting all of my proper data indexed. Is this possible?
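As a rough sketch of the directory-walking approach suggested above: the enumeration step just recurses from a submitted path and guesses the content type from the file extension, with no fetch cycle involved. Python is used here purely for illustration (an actual Nutch crawler or protocol plugin would be written in Java), and the root path is a made-up example.

```python
import mimetypes
import os

def walk_local_files(root):
    """Yield (path, content_type) for every regular file under root,
    following all subdirectories, without any network fetch cycle."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Detect content type from the file extension, as the
            # original poster requested; fall back to a generic type.
            content_type, _encoding = mimetypes.guess_type(path)
            yield path, content_type or "application/octet-stream"

if __name__ == "__main__":
    # "/data/docs" is a hypothetical local root, not a Nutch default.
    for path, ctype in walk_local_files("/data/docs"):
        print(path, ctype)
```

Each yielded (path, type) pair could then be handed straight to the parsing and indexing stages, which is essentially what skipping the crawl buys you.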
