Hey Michael,

Cool, thanks!

I haven't worked with the internals of Oak itself yet, so I will have to do 
some initial knowledge gathering on how to get the record id of a specific 
JCR node, but I guess that won't be the hardest part.
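
After a quick look at the oak-segment-tar API, I'm guessing something along 
these lines would get me there (completely untested, and the store path and 
content path below are just placeholders):

    import java.io.File;

    import org.apache.jackrabbit.oak.commons.PathUtils;
    import org.apache.jackrabbit.oak.segment.RecordId;
    import org.apache.jackrabbit.oak.segment.SegmentNodeState;
    import org.apache.jackrabbit.oak.segment.SegmentNodeStore;
    import org.apache.jackrabbit.oak.segment.SegmentNodeStoreBuilders;
    import org.apache.jackrabbit.oak.segment.file.FileStore;
    import org.apache.jackrabbit.oak.segment.file.FileStoreBuilder;
    import org.apache.jackrabbit.oak.spi.state.NodeState;

    public class RecordIdLookup {

        public static void main(String[] args) throws Exception {
            // Open the segment store; this should run offline or against a
            // copy, since a running instance holds a lock on the tar files.
            FileStore store = FileStoreBuilder
                    .fileStoreBuilder(new File("/path/to/segmentstore"))
                    .build();
            try {
                SegmentNodeStore nodeStore =
                        SegmentNodeStoreBuilders.builder(store).build();

                // Walk down from the root to the node of interest.
                NodeState node = nodeStore.getRoot();
                for (String name : PathUtils.elements("/content/my-site")) {
                    node = node.getChildNode(name);
                }

                // In the segment store every existing node state is a
                // SegmentNodeState, which exposes the record id it is
                // stored under (the cast fails if the path does not exist).
                RecordId id = ((SegmentNodeState) node).getRecordId();
                System.out.println("record id: " + id);
            } finally {
                store.close();
            }
        }
    }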

Thanks,
Roy

> On 7 Mar 2018, at 15:47, Michael Dürig <mdue...@apache.org> wrote:
> 
> Hi,
> 
> I just came across the
> org.apache.jackrabbit.oak.segment.RecordUsageAnalyser class in Oak,
> which I had completely forgotten about. I think you can use that one to
> parse nodes and have it list some statistics about them. Alternatively,
> you should be able to come up with your own tooling relatively easily,
> based on org.apache.jackrabbit.oak.segment.SegmentParser (which is also
> the base class for RecordUsageAnalyser). Please take care though: these
> tools are not very deeply tested, and any results obtained from them
> should be placed under scrutiny.
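> 
> A rough sketch of how I would expect it to be used, once you have a
> FileStore open and the RecordId of the node you want to analyse (both
> are placeholders below); this is untested and from memory, so please
> double check the exact constructor and method names:
> 
>     // the analyser walks the records reachable from the given node and
>     // sums up their sizes per record type
>     RecordUsageAnalyser analyser =
>             new RecordUsageAnalyser(fileStore.getReader());
>     analyser.analyseNode(nodeRecordId);
> 
>     // toString() prints the collected statistics
>     System.out.println(analyser);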
> 
> Michael
> 
> On 04.03.18 15:22, Roy Teeuwen wrote:
>> Hey guys,
>> 
>> I am using Oak 1.6.6 with an authoring system and a few publish systems. We 
>> are running the latest TarMK available on the 1.6.6 branch and use a 
>> separate file datastore instead of embedding binaries in the segment store.
>> 
>> What I have noticed so far is that the segment store of the author is 16GB 
>> with a 165GB datastore, while the publish instances are 1.5GB with only a 
>> 50GB datastore. I would like to investigate where the big difference 
>> between those two systems comes from, seeing as nearly all content nodes 
>> are published. Offline compaction runs daily, so that can't be the problem, 
>> and online compaction is enabled as well. Are there any tools or methods 
>> available to list the disk usage of every node, both in the segment store 
>> and the related datastore files? I can make wild guesses, for example Sling 
>> event / job nodes and the like, but I would like some real numbers.
>> 
>> Thanks!
>> Roy
>> 
