Hi,
When I run the crawl a second time, it always complains that 'crawled' already
exists, and I have to remove that directory with 'hadoop dfs -rm crawled'
before it will run again.
Is there some way to avoid this error and tell Nutch that it's a recrawl?
bin/nutch crawl urls -dir crawled -depth 1 2>&1 | tee /tmp/foo.log
Exception in thread "main" java.lang.RuntimeException: crawled already exists.
at org.apache.nutch.crawl.Crawl.main(Crawl.java:85)
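
For reference, this is roughly the wrapper I run today to get around it. The
directory name and log path are just the ones from the command above, and
depending on the Hadoop version the recursive '-rmr' form may be needed
instead of '-rm' for a non-empty directory:

    #!/bin/sh
    # Workaround sketch: wipe the previous output directory, then crawl again.
    CRAWL_DIR=crawled

    # Remove the output of the previous run so Crawl.java doesn't abort
    # ('hadoop dfs -rmr' instead of '-rm' may be required on some versions).
    hadoop dfs -rm "$CRAWL_DIR"

    # Re-run the crawl exactly as before.
    bin/nutch crawl urls -dir "$CRAWL_DIR" -depth 1 2>&1 | tee /tmp/foo.log

I'd prefer not to delete the old data every time, so any pointer on the
proper recrawl procedure would be appreciated.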
Thanks,
Manoj.