Hi,
I had the same problem using the re-crawl scripts from the wiki. They all
work fine with nutch versions up to and including 0.9, but with
nutch-1.0-dev (from trunk) they break at the index merge. The reason is
that merge in nutch-0.9 (as used in the re-crawl scripts):
bin/nutch merge crawl/indexes crawl/NEWindexes
merged the old indexes from crawl/indexes with the new indexes from
crawl/NEWindexes and stored the result in crawl/indexes. With
nutch-1.0-dev (from trunk), merge requires a new (empty) output folder.
A solution that works (I have tried it) is the following:
bin/nutch merge crawl/index crawl/indexes crawl/NEWindexes
where crawl/index is the new (output) folder, crawl/indexes holds the old
indexes and crawl/NEWindexes holds the new ones. Note that you can do
this with as many indexes as you want to merge (as many re-crawls as you
have done); you only have to run:
bin/nutch merge crawl/index crawl/indexes1 crawl/indexes2 ...
but crawl/index must not exist beforehand (delete it or back it up first).
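
The steps above could be wrapped in a small shell function, roughly like
the sketch below. It is only an illustration: the function name
merge_indexes, the NUTCH variable and the ".bak" backup suffix are
assumptions made here, not part of nutch or of the wiki scripts. The only
nutch call used is the "merge <output> <indexes...>" form shown above.

```shell
#!/bin/sh
# Sketch only: merge_indexes, NUTCH and the ".bak" suffix are hypothetical
# names chosen for this example, not something nutch provides.
NUTCH="${NUTCH:-bin/nutch}"

# usage: merge_indexes <output-dir> <index-dir> [<index-dir> ...]
merge_indexes() {
    out="$1"
    shift
    # merge in nutch-1.0-dev wants an output folder that is not there yet,
    # so move any previous merged index out of the way first.
    if [ -e "$out" ]; then
        rm -rf "$out.bak"
        mv "$out" "$out.bak" || return 1
    fi
    # Same call as above: output folder first, then the indexes to merge.
    "$NUTCH" merge "$out" "$@"
}
```

It would then be called as "merge_indexes crawl/index crawl/indexes
crawl/NEWindexes". Keeping a backup instead of deleting means the previous
merged index is still there if the merge fails.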
The Nutch search web application will use the merged index from
crawl/index; this is from my web application log:
2007-09-09 20:30:58,949 INFO searcher.NutchBean - creating new bean
2007-09-09 20:30:59,128 INFO searcher.NutchBean - opening merged index in /home/nutch/test/trunk/crawl/index
Hope this helps,
Tomislav
On Thu, 2007-09-20 at 14:54 +0800, Lyndon Maydwell wrote:
> /nutch mergesegs $merged_segment -dir $segments
> if [ $? -ne 0 ]