Bruno Thiel wrote:
All,
I want to get nutch to index the file system. My first approach was to
nfs-mount the file system and let nutch crawl through the hierarchy over
http/Apache. This turned out to be fairly slow, ~3,000 fetches per hour.
Next approach was to go via file:/// and to generate a file list
to be crawled. This file list is fairly big ~200,000 entries, and with the
current 0.8.1 release of nutch the fetcher just freezes right at the end of
a crawl.
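
For reference, a file list like the one described above could be generated roughly as follows. This is only a sketch, not the script actually used: the helper names file_urls and write_seed_list are hypothetical, and it assumes the crawler accepts a plain text file with one file:/// URL per line.

```python
import os
from pathlib import Path


def file_urls(root):
    """Yield a file:// URL for every file found under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            # resolve() makes the path absolute, which as_uri() requires;
            # as_uri() also percent-encodes characters such as spaces.
            yield (Path(dirpath) / name).resolve().as_uri()


def write_seed_list(root, out_path):
    """Write one URL per line to out_path and return how many were written."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for url in file_urls(root):
            out.write(url + "\n")
            count += 1
    return count
```

The output file should live outside the tree being listed, otherwise it ends up in its own list.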
What exactly happens when your fetcher freezes? 200,000 entries is not a
big list to fetch.
--
Sami Siren