I haven't worked with Oak's internals yet, so I will first have to do some
initial knowledge gathering on how to get the record ID of a specific JCR node,
but I guess that won't be the hardest part.
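Once per-node sizes are available, they still need to be summed per subtree before author and publish can be compared. Below is a minimal sketch of that roll-up. It assumes the per-node byte counts have already been collected into a map from JCR path to bytes. That input format is hypothetical, since Oak does not produce such a map directly, and `SubtreeUsage`/`rollUp` are illustrative names, not Oak APIs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class SubtreeUsage {

    /**
     * Rolls per-node byte counts up to every ancestor path, so each entry in
     * the result is the total size of the subtree rooted at that path.
     * Input (hypothetical format): absolute JCR path -> bytes attributed to
     * that node alone.
     */
    static Map<String, Long> rollUp(Map<String, Long> ownBytes) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map.Entry<String, Long> e : ownBytes.entrySet()) {
            String path = e.getKey();
            long bytes = e.getValue();
            // Credit this node, then walk up the ancestors until the root "/".
            while (true) {
                totals.merge(path, bytes, Long::sum);
                if (path.equals("/")) {
                    break;
                }
                int slash = path.lastIndexOf('/');
                path = slash <= 0 ? "/" : path.substring(0, slash);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // Illustrative numbers only, not measurements from a real repository.
        Map<String, Long> own = new HashMap<>();
        own.put("/content", 100L);
        own.put("/content/site", 400L);
        own.put("/var/eventing/jobs", 2000L);
        rollUp(own).forEach((p, b) -> System.out.println(p + "\t" + b));
    }
}
```

Intermediate nodes without their own entry (here `/var/eventing`) still show up with their subtree total, which is what makes a large job or event tree visible at a glance.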
> On 7 Mar 2018, at 15:47, Michael Dürig <mdue...@apache.org> wrote:
> I just came across the
> org.apache.jackrabbit.oak.segment.RecordUsageAnalyser class in Oak,
> which I had completely forgotten about. I think you can use it to
> parse nodes and have it list some statistics about them. Alternatively,
> you should be able to come up with your own tooling relatively easily
> based on org.apache.jackrabbit.oak.segment.SegmentParser (which is also
> the base class of RecordUsageAnalyser). Please take care, though: these
> tools are not very thoroughly tested, and any results obtained with them
> should be treated with scrutiny.
> On 04.03.18 15:22, Roy Teeuwen wrote:
>> Hey guys,
>> I am using Oak 1.6.6 with an authoring system and a few publish systems. We
>> are using the latest TarMK available on the 1.6.6 branch, with a separate
>> file datastore rather than binaries embedded in the segment store.
>> What I have noticed so far is that the author's segment store is 16 GB with
>> a 165 GB datastore, while the publish instances have 1.5 GB segment stores
>> with only 50 GB datastores. I would like to investigate where this big
>> difference between the two systems comes from, since nearly all content
>> nodes are published. Offline compaction runs daily, so that can't be the
>> problem, and online compaction is enabled as well. Are there any tools or
>> methods available to list the disk usage of every node, both in the
>> segment store and in the related datastore files? I could guess that it is,
>> for example, Sling event/job nodes and the like, but I would like some real
>> numbers.
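To pin down where the author and publish instances diverge, per-subtree totals from both sides can be diffed and ranked by growth. This is a minimal sketch assuming two such reports (JCR path to bytes) already exist; `UsageDiff`, `topDifferences` and the sample paths and sizes are hypothetical and only illustrate the comparison. For the datastore side, keep in mind that a single datastore binary can be referenced from several nodes, so naively attributing file sizes per node can count the same file more than once.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class UsageDiff {

    /** One comparison row: a path and its size on each instance. */
    static final class Row {
        final String path;
        final long author;
        final long publish;

        Row(String path, long author, long publish) {
            this.path = path;
            this.author = author;
            this.publish = publish;
        }

        String path() { return path; }

        long delta() { return author - publish; }
    }

    /** Returns up to {@code limit} paths with the largest author-minus-publish size. */
    static List<Row> topDifferences(Map<String, Long> author, Map<String, Long> publish, int limit) {
        // Union of paths: a subtree may exist on only one of the two instances.
        Set<String> paths = new HashSet<>(author.keySet());
        paths.addAll(publish.keySet());
        List<Row> rows = new ArrayList<>();
        for (String p : paths) {
            rows.add(new Row(p, author.getOrDefault(p, 0L), publish.getOrDefault(p, 0L)));
        }
        // Largest growth on author first; ties broken by path for stable output.
        rows.sort(Comparator.comparingLong(Row::delta).reversed().thenComparing(Row::path));
        return new ArrayList<>(rows.subList(0, Math.min(limit, rows.size())));
    }

    public static void main(String[] args) {
        // Illustrative numbers only.
        Map<String, Long> author = Map.of("/content", 500L, "/var/eventing/jobs", 9000L);
        Map<String, Long> publish = Map.of("/content", 480L, "/var/eventing/jobs", 200L);
        for (Row r : topDifferences(author, publish, 10)) {
            System.out.printf("%-20s author=%d publish=%d delta=%d%n",
                    r.path, r.author, r.publish, r.delta());
        }
    }
}
```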