Hi,

On Mon, Oct 3, 2016 at 1:29 PM, Julian Sedding <[email protected]> wrote:

> I just became aware that on a system configured with SegmentNodeStore
> and FileDatastore a Datastore garbage collection can only free up
> space *after* a Tar Compaction was run.
>
>
Yes, that is a prerequisite.


> I would like to discuss whether it is desirable to require a Tar
> Compaction prior to a DS GC. If someone knows about the rationale
> behind this behaviour, I would also appreciate these insights!
>
> The alternative behaviour, which I would have expected, is to collect
> only binaries that are referenced from the root NodeState or any of
> the checkpoint's root NodeStates (i.e. "live" NodeStates).
>
> From an implementation perspective, I assume that the current
> behaviour can be implemented with better performance than a solution
> that checks only "live" NodeStates. However, IMHO that should not be
> the only relevant factor in the discussion.
>

I believe the performance impact of loading every node to check whether it
has a binary property is quite high. The approach you are referring to is
how it was implemented in Jackrabbit, where the reference collection phase
took days on larger repositories. With the NodeStore-specific
implementation of blob reference collection, this phase takes only a few
hours. There is also an enhancement already implemented in oak-segment-tar
(OAK-4201) that maintains an index of binary references.
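To make the trade-off concrete, here is a minimal sketch contrasting the two
strategies: a full-tree traversal that inspects every node (the
Jackrabbit-style approach) versus reading a pre-built per-segment index of
binary references (the idea behind OAK-4201). The `Node` class and the
index map are illustrative assumptions, not Oak APIs.

```java
import java.util.*;

public class BlobRefSketch {
    // Illustrative node: a blobId if the node has a binary property,
    // plus child nodes. Not an Oak API.
    static class Node {
        String blobId;
        List<Node> children = new ArrayList<>();
        Node(String blobId) { this.blobId = blobId; }
    }

    // Strategy 1: walk the whole tree, checking every node for a
    // binary property. Cost grows with the total node count, which is
    // why this took days on large repositories.
    static Set<String> collectByTraversal(Node root) {
        Set<String> refs = new HashSet<>();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            if (n.blobId != null) refs.add(n.blobId);
            n.children.forEach(stack::push);
        }
        return refs;
    }

    // Strategy 2: read a pre-built index of binary references kept per
    // storage segment, skipping the node-by-node walk entirely.
    static Set<String> collectFromIndex(Map<String, List<String>> segmentIndex) {
        Set<String> refs = new HashSet<>();
        segmentIndex.values().forEach(refs::addAll);
        return refs;
    }

    public static void main(String[] args) {
        // Tiny tree: root -> a("blob-1") -> b("blob-2")
        Node root = new Node(null);
        Node a = new Node("blob-1");
        a.children.add(new Node("blob-2"));
        root.children.add(a);

        // Hypothetical per-segment index covering the same references.
        Map<String, List<String>> index = new HashMap<>();
        index.put("segment-0", List.of("blob-1", "blob-2"));

        // Both strategies must yield the same reference set.
        System.out.println(collectByTraversal(root).equals(collectFromIndex(index)));
    }
}
```

Either way, the resulting reference set is what DataStore GC subtracts from
the set of all blobs to decide what can be deleted; the index just makes
building it much cheaper.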

Thanks
Amit
