I'm having several problems recrawling with Nutch 0.9. Here are two questions. :-)
Right now I'm using the script I found here ( http://today.java.net/pub/a/today/2006/02/16/introduction-to-nutch-2.html ), and I think I'm close to a workable solution, but the recrawl doesn't respect crawl-urlfilter.txt. Is there a way to specify this configuration for the recrawl?

Our final implementation will be a single-site crawl with close-to-real-time search results (ideally, we'll recrawl about every 30 minutes to 1 hour). In that regard, is there any way to have Nutch honor HTTP cache validation responses (304 Not Modified) instead of relying on the fetch interval set in the configuration file?

Thanks!
-Josh Pavel