Thanks Amit for your insights.

Is it documented that DS GC is ineffective if no prior tar compaction
is performed? IMHO we should make this as clear as possible, because
the behaviour deviates from JR2 and thus has the potential to throw
lots of users. Possibly even mention it as a possible reason in the
log message if DS GC was ineffective.

Would it be possible to improve the heuristic without traversing the
node tree? I.e. do the segment tar files contain sufficient
information in their indexes to safely determine that some binary
references are dead? I'm looking for no false positives but possibly
many false negatives.

Regards
Julian


On Mon, Oct 3, 2016 at 10:37 AM, Amit Jain <am...@ieee.org> wrote:
> Hi,
>
> On Mon, Oct 3, 2016 at 1:29 PM, Julian Sedding <jsedd...@apache.org> wrote:
>
>> I just became aware that on a system configured with SegmentNodeStore
>> and FileDatastore a Datastore garbage collection can only free up
>> space *after* a Tar Compaction was run.
>>
>>
> Yes that is a pre-requisite.
>
>
>> I would like to discuss whether it is desirable to require a Tar
>> Compaction prior to a DS GC. If someone knows about the rationale
>> behind this behaviour, I would also appreciate these insights!
>>
>> The alternative behaviour, which I would have expected, is to collect
>> only binaries that are referenced from the root NodeState or any of
>> the checkpoint's root NodeStates (i.e. "live" NodeStates).
>>
>> From an implementation perspective, I assume that the current
>> behaviour can be implemented with better performance than a solution
>> that checks only "live" NodeStates. However, IMHO that should not be
>> the only relevant factor in the discussion.
>>
>
> I believe the performance impact of loading all nodes to check whether the
> node has a binary property
> is quite high. What you are referring to was how it is implemented in
> Jackrabbit and
> the reference collection phase took days on larger repositories. But with
> the NodeStore specific implementation for
> blob reference collection this phase takes only a few hours. For example
> there is also an enhancement already implemented in oak-segment-tar
> to have the index of binary reference OAK-4201.
>
> Thanks
> Amit

Reply via email to