You can try the crawl script: http://wiki.apache.org/nutch/Crawl

It runs the individual steps (inject, generate, fetch, updatedb) against the existing crawl directory instead of the one-shot "bin/nutch crawl" command, so it does not hit the "already exists" check on a recrawl.
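In case that wiki page moves, here is a minimal sketch of the same idea. The step commands are the standard Nutch 0.x tools; the directory names, the depth, and the use of a plain 'ls' to find the newest segment are my assumptions (segments are timestamped, so the lexicographically last one is the newest; on HDFS you would list with 'hadoop dfs -ls' instead):

#!/bin/sh
# Sketch of a recrawl loop using the step-by-step Nutch tools rather than
# the one-shot "bin/nutch crawl", which refuses to reuse its -dir.
CRAWL=crawled   # assumed: same directory as in your command
DEPTH=1         # assumed: same depth as in your command

# Inject the seed URLs; the crawldb deduplicates, so rerunning is safe.
bin/nutch inject $CRAWL/crawldb urls

i=0
while [ $i -lt $DEPTH ]; do
  bin/nutch generate $CRAWL/crawldb $CRAWL/segments
  # Pick the segment that generate just created (the newest directory).
  # Assumes a local filesystem; on HDFS, use 'hadoop dfs -ls' here.
  SEGMENT=$CRAWL/segments/`ls $CRAWL/segments | tail -1`
  bin/nutch fetch $SEGMENT
  bin/nutch updatedb $CRAWL/crawldb $SEGMENT
  i=`expr $i + 1`
done

After the loop you would normally rebuild the link database and the index (invertlinks, index), as the wiki script does.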
Regards,
Susam Pal

On Jan 13, 2008 8:36 AM, Manoj Bist <[EMAIL PROTECTED]> wrote:
> Hi,
>
> When I run the crawl a second time, it always complains that 'crawled'
> already exists, and I need to remove this directory using 'hadoop dfs -rm
> crawled' to get going again. Is there some way to avoid this error and
> tell Nutch that it's a recrawl?
>
> bin/nutch crawl urls -dir crawled -depth 1 2>&1 | tee /tmp/foo.log
>
> Exception in thread "main" java.lang.RuntimeException: crawled already exists.
>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:85)
>
> Thanks,
>
> Manoj.
