Hi,
When I run the crawl a second time, it always complains that 'crawled' already
exists, and I have to remove that directory with 'hadoop dfs -rm crawled'
before it will run again.
Is there some way to avoid this error and tell Nutch that it's a recrawl?
bin/nutch crawl urls -dir crawled -depth 1 2>&1 | tee /tmp/foo.log
Exception in thread "main" java.lang.RuntimeException: crawled already exists.
at org.apache.nutch.crawl.Crawl.main(Crawl.java:85)
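
For reference, this is roughly the wrapper I run today to get around it. The
directory name and log path are just the ones from the command above, and
depending on the Hadoop version the recursive '-rmr' form may be needed
instead of '-rm' for a non-empty directory:

    #!/bin/sh
    # Workaround sketch: wipe the previous output directory, then crawl again.
    CRAWL_DIR=crawled

    # Remove the output of the previous run so Crawl.java doesn't abort
    # ('hadoop dfs -rmr' instead of '-rm' may be required on some versions).
    hadoop dfs -rm "$CRAWL_DIR"

    # Re-run the crawl exactly as before.
    bin/nutch crawl urls -dir "$CRAWL_DIR" -depth 1 2>&1 | tee /tmp/foo.log

I'd prefer not to delete the old data every time, so any pointer on the
proper recrawl procedure would be appreciated.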
Thanks,
Manoj.