Hi All,

I would like to know when and how to delete segments (directories) in Nutch 1.0.

I searched the mailing list archive but couldn't find an answer.

Here is some background.

My crawl-fetch-index process runs once a day as a scheduled job. My
"db.fetch.interval.max" is 1, so I expect URLs to be fetched and
indexed every day. I am not merging segments in my crawl-fetch-index process
because I can't afford the storage space and RAM. (Merging segments is one of
the popular discussions on this list, I guess.)

On the first day, I have 6 folders in /segments/ (because I crawled to depth 6),
1 GiB in total. On the second day I get 6 more folders, worth a little over
1 GiB, so I have about 2 GiB in total. On the third day I add another 1 GiB or
so, and now I have a little over 3 GiB.

My question is: when can I remove those old folders from /segments/, and how
do I remove them?

I tried deleting an earlier segment (e.g. from the first day) with the Linux
"rm" command. The directories are gone, but the searcher no longer works.

I saw this suggestion in one post: "segments are no longer being referenced by
indexes which are using in searches, simply delete the segments/xxxxxxxxxx
directory." Is that correct?

If so how exactly?
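
To make the question concrete, here is a rough sketch of the kind of cleanup I have in mind. It assumes the segment directories are named with a yyyyMMddHHmmss timestamp (e.g. 20101112093000) and that the index used by the searcher has already been rebuilt so that it no longer references the segments being removed; I don't know whether that second assumption is enough, which is really my question. The names find_old_segments, delete_segments and keep_days are my own, not part of Nutch.

```python
import shutil
from datetime import datetime, timedelta
from pathlib import Path

# Assumption: segment directories are named like 20101112093000.
SEGMENT_NAME_FORMAT = "%Y%m%d%H%M%S"


def find_old_segments(segments_dir, keep_days, now=None):
    """Return segment directories whose timestamp name is older than keep_days."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=keep_days)
    old = []
    for entry in sorted(Path(segments_dir).iterdir()):
        # Skip files and anything that does not look like a 14-digit segment name.
        if not entry.is_dir() or len(entry.name) != 14 or not entry.name.isdigit():
            continue
        try:
            created = datetime.strptime(entry.name, SEGMENT_NAME_FORMAT)
        except ValueError:
            continue
        if created < cutoff:
            old.append(entry)
    return old


def delete_segments(paths, dry_run=True):
    """Print what would be deleted; remove the directories only if dry_run is False."""
    for path in paths:
        if dry_run:
            print(f"would delete {path}")
        else:
            shutil.rmtree(path)
```

I would run it with dry_run=True first, for example delete_segments(find_old_segments("crawl/segments", keep_days=2)), where "crawl/segments" stands in for my actual segments path, and only switch to dry_run=False once I know the searcher no longer needs those directories.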

Thanks for your time,

YT Thet

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/When-and-how-properly-to-delete-segments-directory-Nutch-1-0-tp1890600p1890600.html
Sent from the Nutch - User mailing list archive at Nabble.com.
