cybercouf wrote:
> If I'm not wrong, segments are used by Nutch to store parsed data, then to
> update the crawldb, and finally to build an index.
>
> But when the crawl is finished, for the next recrawl Nutch only needs the
> last crawldb? So not my old segments.
> And for building the new index, it only needs my new indexes and the old
> index, not the old segments.
> (And it seems that for the search engine part, segments are used just for
> "show page cache copy"?)
>
> It could be a nice space saving to delete the segments, but is my argument
> right?
>

Well, your argument is actually not correct. The crawldb only holds information about the crawl status of each URL, not the contents. And in the index, the contents of the URL are not stored, just indexed. So how would you give summaries without the segments? You can delete the segments only if you do not need them for cached results or summaries.
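To make the split concrete, here is a toy model in Python. It is not Nutch code and uses no Nutch API: the class `ToyCrawl`, its methods, and the example URL are all made up for illustration. It only mimics the division described above: the crawldb keeps status, the index keeps terms, and the segments keep the text.

```python
# Toy model only (hypothetical names, not Nutch's API): shows which store
# can answer which request once the segments are gone.
from dataclasses import dataclass, field


@dataclass
class ToyCrawl:
    crawldb: dict = field(default_factory=dict)   # url -> crawl status only
    index: dict = field(default_factory=dict)     # term -> set of urls (indexed, not stored)
    segments: dict = field(default_factory=dict)  # url -> parsed text

    def add_page(self, url, text):
        """Record a fetched page in all three stores."""
        self.crawldb[url] = "fetched"
        self.segments[url] = text
        for term in text.lower().split():
            self.index.setdefault(term, set()).add(url)

    def search(self, term):
        """Answer a query from the index alone."""
        return sorted(self.index.get(term.lower(), set()))

    def summary(self, url, length=40):
        """Build a summary from the segment text; None if the text is gone."""
        text = self.segments.get(url)
        return None if text is None else text[:length]


crawl = ToyCrawl()
crawl.add_page("http://example.com/", "Nutch stores parsed text in segments")

hits = crawl.search("segments")
before = crawl.summary(hits[0])

crawl.segments.clear()  # simulate deleting the old segments

after_hits = crawl.search("segments")  # searching still works
after = crawl.summary(after_hits[0])   # but there is no text left to summarize
```

In this sketch the search still returns the hit after the segments are cleared, but the summary (and likewise any cached copy) can no longer be produced.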
