Hi, I have been crawling with Nutch. My understanding is that each crawl cycle creates a segment, and that the next cycle uses the outlinks from the previous segment to generate and fetch the next set of URLs, which are then stored in a new segment.
Once a new segment has been generated, are the previous segments still of any use, or can they be deleted? I also see a command-line tool, mergesegs <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=122916832>. Does it make sense to use it to merge the old segments into the new segment before deleting them? And when we then start a fresh crawl cycle, how do we instruct Nutch to use the merged segment, or does it automatically pick up the newest segment as its starting point?

Thanks,
Sachin