To avoid losing track of this I have opened OAK-2808. Data collected
from recent runs suggests that this aspect would need to be looked into
going forward.
Chetan Mehrotra


On Tue, Mar 10, 2015 at 9:49 PM, Thomas Mueller <muel...@adobe.com> wrote:
> Hi,
>
> I think removing binaries directly without going through the GC logic is
> dangerous, because we can't be sure if there are other references. There
> is one exception: if each file is guaranteed to be unique. For that,
> we could for example append a unique UUID to each file. The Lucene file
> system implementation would need to be changed for that (write the UUID,
> but ignore it when reading the data and when reporting the file size).
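[For illustration, a minimal sketch of the append-a-UUID idea, kept
outside the actual Lucene Directory SPI; the class and method names are
made up. Each file gets a random 16-byte footer on write, and readers
report the length without it:]

import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.UUID;

// Hypothetical helper, not the actual Lucene Directory code: appends a
// random UUID so that two files with identical index content still map
// to different DataStore records, and hides the footer again on read.
public class UniqueFooterFile {

    static final int FOOTER_LEN = 16; // two longs = one UUID

    // Write the payload followed by a random UUID footer.
    static void write(File file, byte[] payload) throws IOException {
        UUID uuid = UUID.randomUUID();
        try (DataOutputStream out =
                new DataOutputStream(new FileOutputStream(file))) {
            out.write(payload);
            out.writeLong(uuid.getMostSignificantBits());
            out.writeLong(uuid.getLeastSignificantBits());
        }
    }

    // Logical length: physical length minus the hidden footer.
    static long length(File file) {
        return file.length() - FOOTER_LEN;
    }

    // Read only the payload, ignoring the trailing UUID.
    static byte[] read(File file) throws IOException {
        byte[] payload = new byte[(int) length(file)];
        try (RandomAccessFile in = new RandomAccessFile(file, "r")) {
            in.readFully(payload);
        }
        return payload;
    }
}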
>
> Even in that case, there is still a risk, for example if the binary
> _reference_ is copied, or if an old revision is accessed. How do we ensure
> this does not happen?
>
> Regards,
> Thomas
>
>
> On 10/03/15 07:46, "Chetan Mehrotra" <chetan.mehro...@gmail.com> wrote:
>
>>Hi Team,
>>
>>With Lucene index files now being stored within the DataStore, our
>>usage pattern of the DataStore has changed between JR2 and Oak.
>>
>>With JR2 the writes were mostly application driven, i.e. if the
>>application stored a PDF/image file then that file would end up in the
>>DataStore; JR2 itself would not write anything to the DataStore by
>>default. Further, in deployments with a large amount of binary content,
>>systems tend to share the DataStore to avoid duplicating storage. In
>>such cases running Blob GC is a non-trivial task, as it involves a
>>manual step and coordination across multiple deployments. Due to this,
>>systems tend to run GC less frequently.
>>
>>Now with Oak, apart from the application, the Oak system itself
>>*actively* uses the DataStore to store the index files for Lucene, and
>>there the churn can be much higher, i.e. index files are created and
>>deleted at a much higher rate. This accelerates the rate of garbage
>>generation and thus puts a lot more pressure on the DataStore storage
>>requirements.
>>
>>Any thoughts on how to avoid/reduce the need to run Blob GC more
>>frequently?
>>
>>One possible way would be to provide a special cleanup tool which can
>>look for such old Lucene index files and delete them directly, without
>>going through the full-fledged mark-and-sweep logic.
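[For illustration, a rough sketch of what such a tool could look like
against the Jackrabbit DataStore API. "LuceneBlobCleaner" is a made-up
name, and computing the set of orphaned identifiers safely is exactly
the open question raised in this thread:]

import java.util.Set;

import org.apache.jackrabbit.core.data.DataIdentifier;
import org.apache.jackrabbit.core.data.DataStore;
import org.apache.jackrabbit.core.data.DataStoreException;
import org.apache.jackrabbit.core.data.MultiDataStoreAware;

// Hypothetical cleanup tool: deletes blobs that some out-of-band check
// (e.g. the unique-UUID scheme discussed earlier in this thread) has
// already proven to be orphaned Lucene index files, skipping the full
// mark-and-sweep collection.
public class LuceneBlobCleaner {

    private final DataStore store;

    public LuceneBlobCleaner(DataStore store) {
        this.store = store;
    }

    // 'orphans' must contain only identifiers known to be unreferenced;
    // building that set safely is the hard part, not shown here.
    public int clean(Set<DataIdentifier> orphans) throws DataStoreException {
        int deleted = 0;
        for (DataIdentifier id : orphans) {
            // single-record deletion is exposed via MultiDataStoreAware
            if (store instanceof MultiDataStoreAware) {
                ((MultiDataStoreAware) store).deleteRecord(id);
                deleted++;
            }
        }
        return deleted;
    }
}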
>>
>>Thoughts?
>>
>>Chetan Mehrotra
>
