Hi Lee,

Thanks for the feedback. The script posted in the mailing list has
some bugs. Please use the latest script from
http://wiki.apache.org/nutch/Crawl

I have also made some minor changes to make it work with Nutch 1.0-dev
in trunk. I have tested this with Nutch 1.0-dev. I believe this should
work fine for Nutch 0.9 too.

We had a discussion on re-crawling for Nutch 1.0-dev here:
http://www.mail-archive.com/[email protected]/msg09514.html

Please try this script for re-crawling with Nutch-0.9 and let us know
how it goes.
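
For reference, here is a rough sketch of one way to avoid the
"Target crawl/index/merge-output already exists" error. That message
suggests the merge was pointed at the live crawl/index directory, which
still held output from an earlier run. The sketch merges into a fresh
directory and swaps it into place only after the merge succeeds. It is
an illustration under stated assumptions, not a copy of the wiki script:
the function name swap_in_new_index, the MERGE_CMD variable, and the
NEWindex and OLDindex directory names are hypothetical, and the default
command assumes the script is run from the Nutch home directory.

```shell
#!/bin/sh
# Sketch only. MERGE_CMD is an assumed, overridable variable; the default
# assumes the current directory is the Nutch home.
: "${MERGE_CMD:=bin/nutch merge}"

# swap_in_new_index CRAWL_DIR
# Merges CRAWL_DIR/NEWindexes into a fresh CRAWL_DIR/NEWindex, then
# rotates it into CRAWL_DIR/index, keeping the previous index as OLDindex.
swap_in_new_index() {
    crawl_dir=$1
    new_index="$crawl_dir/NEWindex"

    # Clear leftovers from an earlier failed run so the target is fresh.
    rm -rf "$new_index"

    # Merge into the fresh target, never into the live index directory.
    $MERGE_CMD "$new_index" "$crawl_dir/NEWindexes" || return 1

    # Rotate only after a successful merge.
    rm -rf "$crawl_dir/OLDindex"
    if [ -d "$crawl_dir/index" ]; then
        mv "$crawl_dir/index" "$crawl_dir/OLDindex"
    fi
    mv "$new_index" "$crawl_dir/index"
}
```

Calling swap_in_new_index crawl after the indexing step would leave the
previous index in crawl/OLDindex, so a failed merge never touches the
index that is currently being served.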

Regards,
Susam Pal

On Nov 20, 2007 2:11 AM, Moore, Lee C <[EMAIL PROTECTED]> wrote:
>
>
> Hello:
>
> I am trying to do recrawling with Nutch-0.9.  I have done some Google searching
> but I haven't found an answer that works.
>
> I had hopes for the script located at:
>
>     http://www.mail-archive.com/[email protected]/msg09096.html
>
> I tried this script for re-crawling and it has the same problem after a
> couple of re-crawls:
>
> ----- Merge Indexes (Step 7 of 8) -----
> merging indexes to: crawl/index
> Adding crawl/NEWindexes/part-00000
> IndexMerger: java.io.IOException: Target crawl/index/merge-output already
> exists
> (Also, this script has an unrelated bug: it references the variable $rank,
> but $rank is not defined. I guess this is supposed to be topN.)
>
> Has anybody found the solution to successfully re-crawling with 0.9?
>
> thanks,
>
>  -Lee
>
