I am a new Nutch user and hopeful future dev, but so far I am mainly focused
on learning to use Nutch; delving into the code comes later.

I am using the Nutch 0.8.1 release under Red Hat Enterprise Linux 4.

I am curious what the effects are of running a stage of the crawl process
more than once. I ask because several times now I have started a restricted
internet crawl, only to find several days later that it has crashed for an
unknown reason during the map-reduce job at the end of the fetch cycle. The
logs do not indicate the cause of the crash, and the intermediate files (the
cached pages) are lost. I'd like to restart the fetch from the last
iteration, but I am worried that the partial fetch may have damaged the
crawldb.

Basically, I'd like to know the effects of restarting the cycle (generating,
fetching, etc.) when the previous run did not complete all the way through
to updating the crawldb.
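
For context, each iteration I have been running looks roughly like the
following (the crawl directory layout and the way I pick the newest segment
are just my own setup, so treat the exact paths and the -topN value as
placeholders):

    # generate a fetchlist from the crawldb into a new segment
    bin/nutch generate crawl/crawldb crawl/segments -topN 1000

    # pick up the segment that was just created
    segment=`ls -d crawl/segments/* | tail -1`

    # fetch the segment, then fold the results back into the crawldb
    bin/nutch fetch $segment
    bin/nutch updatedb crawl/crawldb $segment

It is that final updatedb step that is never reached when the fetch crashes,
which is why I am unsure about the state of the crawldb.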

Thanks,

-Charlie Williams