To avoid missing this issue I have opened OAK-2808. Data collected from recent runs suggests that this aspect will need to be looked into going forward.

Chetan Mehrotra
On Tue, Mar 10, 2015 at 9:49 PM, Thomas Mueller <muel...@adobe.com> wrote:
> Hi,
>
> I think removing binaries directly, without going through the GC logic, is
> dangerous, because we can't be sure whether there are other references. There
> is one exception: if each file is guaranteed to be unique. For that,
> we could, for example, append a unique UUID to each file. The Lucene file
> system implementation would need to be changed for that (write the UUID,
> but ignore it when reading and when reading the file size).
>
> Even in that case, there is still a risk, for example if the binary
> _reference_ is copied, or if an old revision is accessed. How do we ensure
> this does not happen?
>
> Regards,
> Thomas
>
>
> On 10/03/15 07:46, "Chetan Mehrotra" <chetan.mehro...@gmail.com> wrote:
>
>>Hi Team,
>>
>>With the storing of Lucene index files within the DataStore, our usage
>>pattern of the DataStore has changed between JR2 and Oak.
>>
>>With JR2 the writes were mostly application-driven, i.e. if the application
>>stored a PDF/image file, that file would be stored in the DataStore. JR2 by
>>default would not write its own data to the DataStore. Further, in
>>deployments where a large amount of binary content is present, systems tend
>>to share the DataStore to avoid duplication of storage. In such cases,
>>running Blob GC is a non-trivial task, as it involves a manual step and
>>coordination across multiple deployments. Because of this, systems tend to
>>reduce the frequency of GC.
>>
>>Now with Oak, apart from the application, the Oak system itself *actively*
>>uses the DataStore to store the index files for Lucene, and there the
>>churn can be much higher, i.e. the frequency of creation and deletion of
>>index files is a lot higher. This accelerates the rate of garbage
>>generation and thus puts a lot more pressure on the DataStore storage
>>requirements.
>>
>>Any thoughts on how to avoid/reduce the need to increase the
>>frequency of Blob GC?
>>
>>One possible approach would be to provide a special cleanup tool which can
>>look for such old Lucene index files and delete them directly, without
>>going through the full-fledged MarkAndSweep logic.
>>
>>Thoughts?
>>
>>Chetan Mehrotra
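The UUID-suffix scheme Thomas describes above can be sketched roughly as follows. This is a minimal illustration only, not Oak's actual Lucene directory implementation; the file layout and function names are assumptions made for the sketch:

```python
import os
import uuid

UUID_LEN = 16  # a version-4 UUID is 16 bytes

def write_unique_file(path, data):
    """Append a random UUID to the payload so every stored file is unique.

    If each stored file is guaranteed unique, it can be deleted directly
    once the index discards it, without a full mark-and-sweep over all
    repository references.
    """
    with open(path, "wb") as f:
        f.write(data)
        f.write(uuid.uuid4().bytes)

def read_unique_file(path):
    """Read the payload back, ignoring the trailing UUID."""
    with open(path, "rb") as f:
        blob = f.read()
    return blob[:-UUID_LEN]

def logical_size(path):
    """Report the file size excluding the UUID suffix."""
    return os.path.getsize(path) - UUID_LEN
```

Two writes of identical content would then yield distinct blobs, so a direct delete cannot accidentally remove a binary shared by another reference, at the cost of losing DataStore-level deduplication for these files.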
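For contrast, the full mark-and-sweep Blob GC that such a cleanup tool would bypass looks, in outline, like this. This is a simplified sketch over assumed in-memory structures, not the actual Oak garbage collector:

```python
import time

def blob_gc(datastore, referenced_blob_ids, max_age_seconds):
    """Simplified mark-and-sweep over a blob store.

    datastore: dict mapping blob_id -> (content, created_at_epoch_seconds)
    referenced_blob_ids: iterable of blob ids reachable from the repository
    Only blobs older than max_age_seconds are swept, so blobs written by
    in-flight commits are not deleted prematurely.
    """
    marked = set(referenced_blob_ids)          # mark phase
    cutoff = time.time() - max_age_seconds
    swept = []
    for blob_id, (_, created_at) in list(datastore.items()):
        if blob_id not in marked and created_at < cutoff:
            del datastore[blob_id]             # sweep phase
            swept.append(blob_id)
    return swept
```

When the DataStore is shared between repositories, the mark phase must union the references from every repository before any sweep runs, which is exactly the coordination step that makes frequent GC operationally expensive.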