You can try the crawl script: http://wiki.apache.org/nutch/Crawl

It runs the individual steps (inject, generate, fetch, updatedb) against the existing crawl directory instead of the one-shot "bin/nutch crawl" command, so it does not hit the "already exists" check on a recrawl.
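In case that wiki page moves, here is a minimal sketch of the same idea. The step commands are the standard Nutch 0.x tools; the directory names, the depth, and the use of a plain 'ls' to find the newest segment are my assumptions (segments are timestamped, so the lexicographically last one is the newest; on HDFS you would list with 'hadoop dfs -ls' instead):

#!/bin/sh
# Sketch of a recrawl loop using the step-by-step Nutch tools rather than
# the one-shot "bin/nutch crawl", which refuses to reuse its -dir.
CRAWL=crawled   # assumed: same directory as in your command
DEPTH=1         # assumed: same depth as in your command

# Inject the seed URLs; the crawldb deduplicates, so rerunning is safe.
bin/nutch inject $CRAWL/crawldb urls

i=0
while [ $i -lt $DEPTH ]; do
  bin/nutch generate $CRAWL/crawldb $CRAWL/segments
  # Pick the segment that generate just created (the newest directory).
  # Assumes a local filesystem; on HDFS, use 'hadoop dfs -ls' here.
  SEGMENT=$CRAWL/segments/`ls $CRAWL/segments | tail -1`
  bin/nutch fetch $SEGMENT
  bin/nutch updatedb $CRAWL/crawldb $SEGMENT
  i=`expr $i + 1`
done

After the loop you would normally rebuild the link database and the index (invertlinks, index), as the wiki script does.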
Regards,
Susam Pal

On Jan 13, 2008 8:36 AM, Manoj Bist <[EMAIL PROTECTED]> wrote:
> Hi,
>
> When I run the crawl a second time, it always complains that 'crawled'
> already exists, and I need to remove this directory using 'hadoop dfs -rm
> crawled' to get going again. Is there some way to avoid this error and
> tell Nutch that it's a recrawl?
>
> bin/nutch crawl urls -dir crawled -depth 1 2>&1 | tee /tmp/foo.log
>
> Exception in thread "main" java.lang.RuntimeException: crawled already exists.
>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:85)
>
> Thanks,
>
> Manoj.
