Re: what happens to older segments

2019-10-22 Sebastian Nagel
Hi Sachin,

> does mergesegs by default update the
> crawldb once it merges all the segments?

No, it does not. That's already evident from the command-line help (no CrawlDb is passed as a parameter):

    $> bin/nutch mergesegs
    SegmentMerger output_dir (-dir segments | seg1 seg2 ...) [-filter] ...

> Or d
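The difference between the two commands can be sketched as a dry run that only prints the invocations. This is an illustration, not taken from the thread: the `crawl/...` paths and the function names are hypothetical, and the order in which the two steps are run is not something the quoted reply specifies. The point is only that `updatedb` takes the CrawlDb as its first argument, while `mergesegs` has no CrawlDb argument at all.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of executing them.
# CRAWL_DIR and the merged_segments output path are hypothetical names.
CRAWL_DIR="crawl"

# mergesegs: output directory first, then the segments to merge.
# No CrawlDb appears anywhere in the argument list.
merge_cmd() {
  echo "bin/nutch mergesegs $CRAWL_DIR/merged_segments -dir $CRAWL_DIR/segments"
}

# updatedb: the CrawlDb is the first argument, followed by the segments
# whose fetch results should be applied to it.
updatedb_cmd() {
  echo "bin/nutch updatedb $CRAWL_DIR/crawldb -dir $CRAWL_DIR/segments"
}

merge_cmd
updatedb_cmd
```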

Re: what happens to older segments

2019-10-22 Sachin Mittal
Ok, understood. I had one question, though: does mergesegs by default update the crawldb once it merges all the segments? Or do we have to call the updatedb command on the merged segment to update the crawldb, so that it has all the information for the next cycle?

Thanks
Sachin

On Tue, Oct 22

Re: what happens to older segments

2019-10-22 Sebastian Nagel
Hi Sachin,

> I want to know once a new segment is generated is there any use of
> previous segments and can they be deleted?

As soon as a segment is indexed and the CrawlDb is updated from this segment, you may delete it. But keeping older segments allows
- reindexing in case something went wron
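As a rough illustration of the deletion rule above, the following sketch lists segment directories that are older than the newest few. It is not part of the original reply. The function name `list_old_segments` and the keep count are hypothetical, and the sketch assumes segment directories carry Nutch's default timestamp names (yyyyMMddHHmmss), which sort chronologically. It only prints candidates; it does not check whether a segment has actually been indexed and applied to the CrawlDb, which remains the precondition for deleting it.

```shell
#!/bin/sh
# list_old_segments DIR KEEP
# Prints the names of all segment directories in DIR except the KEEP newest.
# Assumes timestamp-style names (yyyyMMddHHmmss), so a plain sort is chronological.
# Nothing is deleted here; the output is only a list of candidates.
list_old_segments() {
  dir="$1"
  keep="$2"
  # Count the entries; tr strips padding that some wc implementations add.
  total=$(ls -1 "$dir" | wc -l | tr -d ' ')
  drop=$((total - keep))
  # Nothing to report if there are KEEP or fewer segments.
  [ "$drop" -gt 0 ] || return 0
  # Oldest names sort first, so the first $drop entries are the candidates.
  ls -1 "$dir" | sort | head -n "$drop"
}
```

For example, `list_old_segments crawl/segments 3` would print every segment except the three most recent, assuming a hypothetical `crawl/segments` directory.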

what happens to older segments

2019-10-21 Sachin Mittal
Hi,

I have been crawling using Nutch. What I have understood is that for each crawl cycle it creates a segment, and for the next crawl cycle it uses the outlinks from the previous segment to generate and fetch the next set of URLs to crawl. Then it creates a new segment with those URLs. I want to know once