Hi Susam Pal,

Thanks for the pointer to the latest crawl/recrawl script. It has worked
very well with Nutch-0.9. It is the answer to my problem!

Thanks again!

 -Lee

-----Original Message-----
From: Susam Pal [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, November 20, 2007 1:39 AM
To: [email protected]
Subject: Re: http://www.mail-archive.com/[email protected]/msg09096.html

Hi Lee,

Thanks for the feedback. The script posted in the mailing list has
some bugs. Please use the latest script from
http://wiki.apache.org/nutch/Crawl

I have also made some minor changes to make it work with Nutch 1.0-dev
in trunk. I have tested this with Nutch 1.0-dev. I believe this should
work fine for Nutch 0.9 too.

We had a discussion on re-crawling for Nutch 1.0-dev here:
http://www.mail-archive.com/[email protected]/msg09514.html

Please try this script for re-crawling with Nutch-0.9 and let us know
how it goes.

Regards,
Susam Pal

On Nov 20, 2007 2:11 AM, Moore, Lee C <[EMAIL PROTECTED]> wrote:
>
>
> Hello:
>
> I am trying to do recrawling with Nutch-0.9.  I have done some Google
> searching but I haven't found an answer that works.
>
> I had hopes for the script located at:
>
> http://www.mail-archive.com/[email protected]/msg09096.html
>
> I tried this script for re-crawling and it has the same problem after
> a couple of re-crawls:
>
> ----- Merge Indexes (Step 7 of 8) -----
> merging indexes to: crawl/index
> Adding crawl/NEWindexes/part-00000
> IndexMerger: java.io.IOException: Target crawl/index/merge-output already exists
>
> (Also, this script has an unrelated bug: it references the variable
> $rank, but $rank is not defined. I guess this is supposed to be topN.)
>
> Has anybody found a solution for successfully re-crawling with 0.9?
>
> thanks,
>
>  -Lee
>
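
For readers who hit the same "Target crawl/index/merge-output already
exists" error: one way to avoid it is to merge into a fresh directory
and swap it in afterwards, instead of merging into the existing
crawl/index. The sketch below only illustrates that idea and is not
copied from the wiki script. The function name merge_indexes, the
NUTCH variable, and the NEWindex/OLDindex directory names are
assumptions made for the example. It also assumes that "bin/nutch merge"
takes the output index first and the directory of part indexes second.

```shell
#!/bin/sh
# Hypothetical helper, not taken from the wiki script.
# NUTCH and the NEWindex/OLDindex names are assumptions for this sketch.
NUTCH=${NUTCH:-bin/nutch}

merge_indexes() {
    crawl=$1

    # Start from a clean target so a leftover directory from an earlier
    # run cannot trigger the "already exists" error.
    rm -rf "$crawl/NEWindex"

    # Merge the new part indexes into the fresh directory.
    "$NUTCH" merge "$crawl/NEWindex" "$crawl/NEWindexes" || return 1

    # Swap the new index in only after the merge has succeeded,
    # keeping the previous index as a backup.
    rm -rf "$crawl/OLDindex"
    if [ -d "$crawl/index" ]; then
        mv "$crawl/index" "$crawl/OLDindex" || return 1
    fi
    mv "$crawl/NEWindex" "$crawl/index"
}
```

It would be called from the Nutch home directory as "merge_indexes crawl".
If the merge fails, the old crawl/index is left untouched.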
