Re: Nutch recrawl script for 0.9 doesn't work with trunk. Help

Tomislav Poljak Thu, 20 Sep 2007 03:41:00 -0700

Hi,
I had the same problem using the re-crawl scripts from the wiki. They all
work fine with nutch versions up to and including 0.9, but with
nutch-1.0-dev (from trunk) they break at the index merge. The reason is
that the merge step in the nutch-0.9 re-crawl scripts:

bin/nutch merge crawl/indexes crawl/NEWindexes

merged the old indexes from crawl/indexes with the new indexes from
crawl/NEWindexes and stored the result in crawl/indexes. But with
nutch-1.0-dev (from trunk), merge requires a new output folder, one that
does not exist yet.

A solution that works (I have tried it) is the following:

bin/nutch merge crawl/index crawl/indexes crawl/NEWindexes

where crawl/index is the new (output) folder, crawl/indexes holds the old
indexes, and crawl/NEWindexes holds the new indexes. Note that you can
merge as many indexes as you want this way (one per re-crawl); you only
have to do:

bin/nutch merge crawl/index crawl/indexes1 crawl/indexes2 ...

but crawl/index must not exist beforehand (delete it or back it up first).
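
For a re-crawl script, the steps above could be wrapped roughly like this.
It is only a sketch: the merge_indexes function, the NUTCH variable and the
.bak naming are my own illustration, not part of nutch; the only nutch
command used is the merge call shown above.

```shell
#!/bin/sh
# Path to the nutch launcher; bin/nutch as in the commands above.
# The NUTCH variable itself is an assumption of this sketch.
NUTCH="${NUTCH:-bin/nutch}"

# Hypothetical helper: merge_indexes OUTPUT INPUT [INPUT ...]
merge_indexes() {
    out="$1"
    shift
    # The trunk merge needs an output folder that does not exist yet,
    # so move any earlier merged index out of the way first.
    if [ -e "$out" ]; then
        mv "$out" "$out.bak.$(date +%Y%m%d%H%M%S)" || return 1
    fi
    # Same call as above: output folder first, then the indexes to merge.
    "$NUTCH" merge "$out" "$@"
}
```

It would be called as: merge_indexes crawl/index crawl/indexes crawl/NEWindexes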

The Nutch search web application will use the merged index from
crawl/index; this is from my web application log:

2007-09-09 20:30:58,949 INFO  searcher.NutchBean - creating new bean
2007-09-09 20:30:59,128 INFO  searcher.NutchBean - opening merged index in /home/nutch/test/trunk/crawl/index

Hope this will help,

Tomislav

On Thu, 2007-09-20 at 14:54 +0800, Lyndon Maydwell wrote:
> /nutch mergesegs $merged_segment -dir $segments
> if [ $? -ne 0 ]
