Hi Lee,

Thanks for the feedback. The script posted on the mailing list has some bugs. Please use the latest script from http://wiki.apache.org/nutch/Crawl instead.

I have also made some minor changes to make it work with Nutch 1.0-dev in trunk, and I have tested it with Nutch 1.0-dev. I believe it should work fine with Nutch 0.9 too. We had a discussion on re-crawling for Nutch 1.0-dev here:

http://www.mail-archive.com/[email protected]/msg09514.html

Please try this script for re-crawling with Nutch 0.9 and let us know how it goes.

Regards,
Susam Pal

On Nov 20, 2007 2:11 AM, Moore, Lee C <[EMAIL PROTECTED]> wrote:
>
> Hello:
>
> I am trying to do re-crawling with Nutch-0.9. I have done some Google
> searching but I haven't found an answer that works.
>
> I had hopes for the script located at:
>
> http://www.mail-archive.com/[email protected]/msg09096.html
>
> I tried this script for re-crawling and it has the same problem after a
> couple of re-crawls:
>
> ----- Merge Indexes (Step 7 of 8) -----
> merging indexes to: crawl/index
> Adding crawl/NEWindexes/part-00000
> IndexMerger: java.io.IOException: Target crawl/index/merge-output already exists
>
> (Also, this script has an unrelated bug: it references the variable $rank,
> but $rank is not defined. I guess this is supposed to be topN.)
>
> Has anybody found the solution to successfully re-crawling with 0.9?
>
> thanks,
>
> -Lee
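A note on the "Target crawl/index/merge-output already exists" error quoted above: it appears to come from merging into an index directory left over from an earlier run. One general way to avoid that kind of collision is to build the new index under a different name and swap it into place once the merge has finished. The snippet below is only a minimal sketch of that swap step. The helper name swap_in_new_dir and the directory name crawl/NEWindex are hypothetical, chosen for illustration; they are not taken from Nutch or from the wiki script, and the merge command itself is deliberately left out.

```shell
#!/bin/sh
# Hypothetical helper, not part of Nutch: replace the live directory ($1)
# with a freshly built one ($2), keeping the previous copy as "$1.old".
swap_in_new_dir() {
    live="$1"
    fresh="$2"

    # Do nothing if the new directory was never produced.
    [ -d "$fresh" ] || { echo "missing $fresh" >&2; return 1; }

    # Drop the backup left by an earlier run, then set the live copy aside.
    rm -rf "$live.old"
    if [ -d "$live" ]; then
        mv "$live" "$live.old"
    fi

    # The live name is now free, so this move cannot collide with old data.
    mv "$fresh" "$live"
}

# Example call (assumed layout): swap_in_new_dir crawl/index crawl/NEWindex
```

Keeping the old directory as a backup means a failed re-crawl can be rolled back by renaming it again, at the cost of temporarily holding two copies of the index on disk.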
