Why are you so afraid of the segment merger? It appears to be the only "official" way to get rid of excess folders. Of course it is time- and resource-consuming, but is your system really that heavily loaded?
Also, I might be wrong, but if you are not planning to return summaries and content from Nutch, then you can remove the folders with rm. And you can get rid of segments completely by using the Solr indexer: after you perform indexing, you can delete the fetched segments. I presume this is what you saw in other threads.

Best Regards
Alexander Aristov

On 12 November 2010 21:27, ytthet <[email protected]> wrote:

>
> Hi All,
>
> I like to know when and how to delete segments (directories) in Nutch 1.0.
>
> I searched through mailing list archive, but I can't find the answers.
>
> Following is my background information.
>
> My crawl-fetch-index process is executed once a day by scheduled job. My
> "db.fetch.interval.max" is 1, so I am expecting urls to be fetched and
> indexed everyday. I am not merging segments in my crawl-fetch-index process
> because I can't afford Storage Space and RAM. (Merging segment is one of the
> popular discussion in this thread I guess).
>
> On First day, I have 6 folders in /segments/ (because I crawled 6 depth).
> Total of 1 GiB. Second day I have another 6 more folders worth of 1 GiB++.
> Now I have total of 2 GiB. Third day, 1 GiB++ and now I have around 3 GiB++.
>
> My question is when can I remove those old folder from /segments/? And how
> do I remove it?
>
> I tried deleting previous segment (e.g. from first day) by linux "rm" command
> and they are gone. But searcher no longer works.
>
> I saw suggestion on one entry "segments are no longer being referenced by
> indexes which are using in searches, simply delete the
> segments/xxxxxxxxxx directory." Is that correct?
>
> If so how exactly?
>
> Thanks for your time,
>
> YT Thet
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/When-and-how-properly-to-delete-segments-directory-Nutch-1-0-tp1890600p1890600.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>
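For illustration only, here is a minimal Python sketch of the rm-style cleanup suggested in the reply above. It assumes that segment directories carry the usual timestamp names (yyyyMMddHHmmss, e.g. 20101112212700), that the segments have already been indexed into Solr, and that the searcher does not need summaries or cached content from Nutch. The function names, the keep_days parameter and the example path are hypothetical and not part of Nutch.

```python
import shutil
from datetime import datetime, timedelta
from pathlib import Path

# Assumption: segment directory names are creation timestamps, e.g. 20101112212700.
SEGMENT_NAME_FORMAT = "%Y%m%d%H%M%S"


def stale_segments(segments_dir, keep_days, now=None):
    """Return segment directories whose name timestamp is older than keep_days."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=keep_days)
    stale = []
    for entry in sorted(Path(segments_dir).iterdir()):
        if not entry.is_dir():
            continue
        try:
            created = datetime.strptime(entry.name, SEGMENT_NAME_FORMAT)
        except ValueError:
            # Not a timestamp-named directory: leave it alone.
            continue
        if created < cutoff:
            stale.append(entry)
    return stale


def delete_segments(paths, dry_run=True):
    """Delete the given directories; with dry_run=True only print what would go."""
    for path in paths:
        if dry_run:
            print(f"would delete {path}")
        else:
            shutil.rmtree(path)
```

A dry run such as delete_segments(stale_segments("crawl/segments", keep_days=1)) only prints the candidates; pass dry_run=False once the list looks right. If the searcher still reads summaries from the segments, deleting them will break it, as described in the original question.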

