Hi all,

Has anybody else been having problems with the script at
http://wiki.apache.org/nutch/MergeCrawl with nutch-0.9?

I am doing simple crawls like this:

  bin/nutch crawl url1 -dir crawl1 -depth 2
  bin/nutch crawl url2 -dir crawl2 -depth 2

  # cwd is /nutch/search; absolute paths are used since mergecrawl requires them
  bin/mergecrawl /nutch/search/merged /nutch/search/crawl1 /nutch/search/crawl2

The individual crawl results were fine, but the merged result was not. I suspect the problem is in the final stage of the merge, where the index is built, since if I manually reindex with:

  bin/nutch index merged/indexes merged/crawldb merged/linkdb merged/segments/<the_merged_segmentid>

then it works perfectly fine (i.e. the merged crawl is searchable via the Nutch searcher).

How would one go about debugging this? Is there any way to read the index, similar to readdb for reading the crawldb?

Many thanks in advance,
boris
