Hi,

I just came across the org.apache.jackrabbit.oak.segment.RecordUsageAnalyser class in Oak, which I had completely forgotten about. I think you can use it to parse nodes and have it list some statistics about them. Alternatively, you should be able to come up with your own tooling relatively easily, based on org.apache.jackrabbit.oak.segment.SegmentParser (which is also the base class of RecordUsageAnalyser). Please take care though: these tools are not very thoroughly tested, so any results obtained from them should be treated with scrutiny.

Michael

On 04.03.18 15:22, Roy Teeuwen wrote:
> Hey guys,
>
> I am using Oak 1.6.6 with an authoring system and a few publish systems. We
> are using the latest TarMK that is available on the 1.6.6 branch and also
> using the separate file datastore instead of embedded in the segment store.
>
> What I have noticed so far is that the segment store of the author is 16GB
> with 165GB datastore while the publishes are 1.5GB with only 50GB datastore.
> I would like to investigate where the big difference is between those two
> systems, seeing as all the content nodes are as good as all published. The
> offline compaction happens daily so that can't be the problem, also the
> online compaction is enabled. Are there any tools / methods available to list
> out what the disk usage is of every node? This being both in the segmentstore
> and the related datastore files? I can make wild guesses as to it being for
> example sling event / job nodes and stuff like that but I would like some
> real numbers.
>
> Thanks!
> Roy
>