Hi,

When I run the crawl a second time, it always complains that 'crawled' already
exists, and I have to remove that directory with 'hadoop dfs -rm crawled'
before I can run again.
Is there some way to avoid this error and tell Nutch that it's a recrawl?

bin/nutch crawl urls -dir crawled -depth 1  2>&1 | tee /tmp/foo.log


Exception in thread "main" java.lang.RuntimeException: crawled already exists.
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:85)
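For now my workaround is just to script the delete before each run; a rough
sketch of what I do (simply the two commands from above chained, with the log
path being whatever I happen to use):

# clear the previous output dir first, otherwise Crawl.main bails out
hadoop dfs -rm crawled
# then start the crawl again from the same seed list
bin/nutch crawl urls -dir crawled -depth 1 2>&1 | tee /tmp/foo.log

It works, but I'd rather not have to wipe the directory every time if Nutch
can be told to reuse it.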

Thanks,

Manoj.

-- 
Tired of reading blogs? Listen to your favorite blogs at
http://www.blogbard.com !!!!
