Hi Chetan,

I like the idea, but I wonder: how do you envision this new index
cleanup locating the index files in the content-addressed DataStore?
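
To make the question concrete: in the default DataStore implementations
the record identifier is just a SHA-1 digest of the binary itself, so
the identifier alone carries no hint that a blob belongs to a Lucene
index. A minimal illustration of what such an identifier is (plain
Java, not Oak API):

    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.security.DigestInputStream;
    import java.security.MessageDigest;

    // Computes the kind of identifier a content-addressed DataStore
    // files a record under: a digest of the content, with no
    // back-pointer to the repository nodes (index or otherwise)
    // that reference it.
    public class ContentAddressedId {

        static String identifierFor(Path file) throws Exception {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] buf = new byte[8192];
            try (InputStream in =
                    new DigestInputStream(Files.newInputStream(file), sha1)) {
                while (in.read(buf) != -1) {
                    // reading drives the digest; nothing else to do
                }
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : sha1.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        }
    }

Given only identifiers like these, a cleanup tool cannot tell index
blobs from application blobs by inspection; it would have to start from
the index nodes in the repository and collect the ids they reference.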

Michael

> On 10 Mar 2015, at 07:46, Chetan Mehrotra <chetan.mehro...@gmail.com> wrote:
> 
> Hi Team,
> 
> With Lucene index files now being stored in the DataStore, our usage
> pattern of the DataStore has changed between JR2 and Oak.
> 
> With JR2 the writes were mostly application driven, i.e. if the
> application stored a PDF or image file, that file would end up in the
> DataStore; JR2 itself would not write to the DataStore by default.
> Further, in deployments with a large amount of binary content, systems
> tend to share the DataStore to avoid duplicating storage. In such
> setups running Blob GC is a non-trivial task, as it involves a manual
> step and coordination across multiple deployments, so systems tend to
> run GC less frequently.
> 
> Now with Oak, apart from the application, the Oak system itself
> *actively* uses the DataStore to store the Lucene index files, and
> there the churn can be much higher, i.e. index files are created and
> deleted far more often. This accelerates the rate at which garbage is
> generated and thus puts a lot more pressure on the DataStore's storage
> requirements.
> 
> Any thoughts on how we could avoid, or at least reduce, the need to
> run Blob GC more frequently?
> 
> One possible approach would be a special cleanup tool that looks for
> such old Lucene index files and deletes them directly, without going
> through the full-fledged mark-and-sweep logic.
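> 
> As a rough sketch, assuming we had a way of tracking which blob ids
> were written on behalf of the index (nothing like this exists today;
> both interfaces below are hypothetical stand-ins, not Oak API):
> 
>     import java.util.Set;
> 
>     // Hypothetical stand-ins: the sketch assumes index blob ids are
>     // tracked separately from application blob ids.
>     interface IndexBlobTracker {
>         Set<String> liveIndexBlobIds(); // referenced by current index
>         Set<String> allIndexBlobIds();  // every id written for the index
>     }
> 
>     interface BlobDeleter {
>         void delete(String blobId);
>     }
> 
>     public class LuceneIndexBlobCleanup {
> 
>         // Deletes index blobs that were written at some point but are
>         // no longer referenced, skipping the repository-wide mark phase.
>         public static int run(IndexBlobTracker tracker, BlobDeleter deleter) {
>             Set<String> live = tracker.liveIndexBlobIds();
>             int deleted = 0;
>             for (String id : tracker.allIndexBlobIds()) {
>                 if (!live.contains(id)) {
>                     deleter.delete(id);
>                     deleted++;
>                 }
>             }
>             return deleted;
>         }
>     }
> 
> One open point: with deduplication an index blob can be byte-identical
> to an application blob, so a direct delete would also need to rule out
> or detect such sharing.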
> 
> Thoughts?
> 
> Chetan Mehrotra
