Re: Recrawling without deleting crawl directory

Susam Pal Fri, 14 Mar 2008 09:39:33 -0700

The recrawl patch in https://issues.apache.org/jira/browse/NUTCH-601
got committed today. So if you check out the latest trunk, you can
recrawl without deleting the crawl directory.

However, if you are using an older version, you may use the script at:
http://wiki.apache.org/nutch/Crawl

Regards,
Susam Pal

On Fri, Mar 14, 2008 at 3:48 AM, Bradford Stephens
<[EMAIL PROTECTED]> wrote:
> Greetings,
>
>  A coworker and I are experimenting with Nutch in anticipation of a
>  pretty large rollout at our company. However, we seem to be stuck on
>  something -- after the crawler is finished, we can't manually re-crawl
>  into the same directory/index! It says "Directory already exists" when
>  we try to initiate a new crawl. Any ideas?
>
>  Cheers,
>  Bradford
>
