I am crawling daily with the "nutch crawl" command, putting each new crawl into its own directory.
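For context, each daily crawl is started with something along these lines (the URL dir, output dir, depth, and topN below are only illustrative placeholders, not my exact settings):

    bin/nutch crawl urls -dir crawls/crawl-YYYYMMDD -depth 3 -topN 50000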
Let's say I build up a batch of 20+ crawls (793 MB each). I can merge all of their segments into one large segment (3.4 GB) in roughly 3 hours with no problem. The problem starts when I add one more crawl: if I try to merge the segments of a new individual crawl (793 MB) into the existing merged segment, the merge takes 13+ hours and uses almost 500 GB of disk, at which point I usually cancel because I believe something is wrong.

I would appreciate any insight into what is going wrong here, and any tips on how to improve or fix it. The merge command I am running is sketched at the bottom of this message.

Thanks,
Mina
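(For reference, the merge step is the standard mergesegs/SegmentMerger tool, invoked roughly like this; the output directory name is only an example:)

    bin/nutch mergesegs crawl/segments_merged -dir crawl/segments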