Hi Chetan,

I like the idea. But I wonder: how do you envision this new index cleanup locating the indexes in the content-addressed DataStore?
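For context, the way I picture the full mark-and-sweep that such a tool would bypass is roughly the following. This is only a sketch against made-up interfaces, not Oak's real GC code; the point is that the mark phase has to traverse the whole repository, because in a content-addressed store the blob ids alone cannot tell an index file from any other binary:

    import java.util.HashSet;
    import java.util.Set;

    /**
     * Sketch of mark-and-sweep blob GC. All interfaces here are
     * hypothetical stand-ins, not Oak's actual API.
     */
    public class MarkSweepSketch {

        /** Hypothetical read-only view of the repository tree. */
        interface Node {
            Iterable<Node> children();
            /** Ids (content hashes) of binaries referenced by this node. */
            Iterable<String> blobReferences();
        }

        /** Hypothetical content-addressed blob store: keys are hashes. */
        interface BlobStore {
            Iterable<String> listAllBlobIds(); // full scan, potentially huge
            long lastModified(String blobId);
            void delete(String blobId);
        }

        /**
         * Mark phase: walk the repository and record every referenced
         * blob id. Nothing in the id itself says "Lucene index file".
         */
        static Set<String> mark(Node root) {
            Set<String> referenced = new HashSet<>();
            collect(root, referenced);
            return referenced;
        }

        private static void collect(Node node, Set<String> referenced) {
            for (String id : node.blobReferences()) {
                referenced.add(id);
            }
            for (Node child : node.children()) {
                collect(child, referenced);
            }
        }

        /**
         * Sweep phase: delete every blob that no repository marked and
         * that is older than a safety margin. With a shared DataStore,
         * *every* sharing repository must have contributed its marks.
         */
        static void sweep(BlobStore store, Set<String> referenced,
                long maxAgeMillis) {
            long cutoff = System.currentTimeMillis() - maxAgeMillis;
            for (String id : store.listAllBlobIds()) {
                if (!referenced.contains(id)
                        && store.lastModified(id) < cutoff) {
                    store.delete(id);
                }
            }
        }
    }

In a shared DataStore the sweep can only run once every repository sharing the store has contributed its marked references, which is the manual coordination step you mention. A dedicated index cleanup would need some cheaper way to find just the index binaries (perhaps starting from the index definition nodes?), which is what my question above is getting at.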
Michael

> On 10 Mar 2015, at 07:46, Chetan Mehrotra <chetan.mehro...@gmail.com> wrote:
>
> Hi Team,
>
> With the storing of Lucene index files within the DataStore, our
> usage pattern of the DataStore has changed between JR2 and Oak.
>
> With JR2 the writes were mostly application driven, i.e. if the
> application stored a pdf/image file then that would go into the
> DataStore; JR2 itself would by default not write anything to the
> DataStore. Further, in deployments with a large amount of binary
> content, systems tend to share the DataStore to avoid duplicated
> storage. In such cases running Blob GC is a non-trivial task, as it
> involves a manual step and coordination across multiple deployments.
> Because of this, systems tend to run GC infrequently.
>
> Now with Oak, apart from the application, the Oak system itself
> *actively* uses the DataStore to store the index files for Lucene,
> and there the churn can be much higher, i.e. index files are created
> and deleted far more frequently. This accelerates the rate of
> garbage generation and thus puts a lot more pressure on the
> DataStore storage requirements.
>
> Any thoughts on how to avoid/reduce the need to increase the
> frequency of Blob GC?
>
> One possible approach would be a special cleanup tool which can look
> for such old Lucene index files and delete them directly, without
> going through the full-fledged MarkAndSweep logic.
>
> Thoughts?
>
> Chetan Mehrotra