Hi all,

I would like to know when and how to delete segments (directories) in Nutch 1.0. I searched the mailing list archive but could not find an answer.

Here is my background. My crawl-fetch-index process runs once a day as a scheduled job. My "db.fetch.interval.max" is 1, so I expect URLs to be fetched and indexed every day. I am not merging segments in this process because I cannot afford the storage space and RAM. (Merging segments seems to be a popular topic on this list.)

On the first day I have 6 folders in /segments/ (because I crawl to a depth of 6), about 1 GiB in total. On the second day I get another 6 folders, a little over 1 GiB, for a total of about 2 GiB. On the third day I get a little over 1 GiB more, so I now have more than 3 GiB.

My questions: when can I remove the old folders from /segments/, and how do I remove them?

I tried deleting an earlier segment (e.g. from the first day) with the Linux "rm" command. The folders are gone, but the searcher no longer works.

I saw this suggestion in one entry: "segments are no longer being referenced by indexes which are using in searches, simply delete the segments/xxxxxxxxxx directory." Is that correct? If so, how exactly do I do it?

Thanks for your time,
YT Thet

--
View this message in context: http://lucene.472066.n3.nabble.com/When-and-how-properly-to-delete-segments-directory-Nutch-1-0-tp1890600p1890600.html
Sent from the Nutch - User mailing list archive at Nabble.com.
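To make the question concrete, the kind of cleanup being asked about could be sketched roughly as below. This is only an illustration under assumptions, not a confirmed procedure: it assumes segment folders are named by their creation timestamp (yyyyMMddHHmmss), the function names and the `keep_days` parameter are made up for this example, and it only lists candidates without deleting anything. Whether a listed segment is really safe to remove (i.e. no longer referenced by the index the searcher uses) is exactly the open question in this post.

```python
from datetime import datetime, timedelta
from pathlib import Path

# Assumption: segment folders are named by creation time, e.g. 20101112093000.
SEGMENT_NAME_FORMAT = "%Y%m%d%H%M%S"


def select_old_segments(names, now, keep_days):
    """Return the segment names created more than keep_days before now.

    Hypothetical helper for illustration; it inspects names only.
    """
    cutoff = now - timedelta(days=keep_days)
    old = []
    for name in names:
        # Skip anything that does not look like a 14-digit timestamp name.
        if len(name) != 14 or not name.isdigit():
            continue
        try:
            created = datetime.strptime(name, SEGMENT_NAME_FORMAT)
        except ValueError:
            continue  # 14 digits, but not a valid date and time
        if created < cutoff:
            old.append(name)
    return sorted(old)


def list_old_segments(segments_dir, keep_days, now=None):
    """List candidate segment folders under segments_dir; deletes nothing."""
    now = now or datetime.now()
    names = [p.name for p in Path(segments_dir).iterdir() if p.is_dir()]
    return select_old_segments(names, now, keep_days)
```

Calling `list_old_segments("/path/to/segments", keep_days=2)` (the path is a placeholder) would return the folder names older than two days. Keeping the selection separate from any actual deletion means the list can be checked against the index currently in use before running "rm" on anything.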

