Hey Michael,

Cool, thanks!

I haven't used the internals of Oak itself yet, so I will have to do some initial knowledge gathering on how to get the record id of a specific JCR node, but I guess that won't be the hardest part. I've put a first rough sketch of what I have in mind below the quoted mail.

Thanks,
Roy

> On 7 Mar 2018, at 15:47, Michael Dürig <mdue...@apache.org> wrote:
>
> Hi,
>
> I just came across the org.apache.jackrabbit.oak.segment.RecordUsageAnalyser
> class in Oak, which I had completely forgotten about. I think you can use
> that one to parse nodes and have it list some statistics about them.
> Alternatively, you should be able to come up with your own tooling
> relatively easily based on org.apache.jackrabbit.oak.segment.SegmentParser
> (which is also the base for RecordUsageAnalyser). Please take care though:
> these tools are not very deeply tested, and any results obtained by them
> should be placed under scrutiny.
>
> Michael
>
> On 04.03.18 15:22, Roy Teeuwen wrote:
>> Hey guys,
>>
>> I am using Oak 1.6.6 with an authoring system and a few publish systems.
>> We are using the latest TarMK available on the 1.6.6 branch, and also a
>> separate file datastore instead of one embedded in the segment store.
>>
>> What I have noticed so far is that the segment store of the author is 16GB
>> with a 165GB datastore, while the publishes are 1.5GB with only a 50GB
>> datastore. I would like to investigate where the big difference between
>> those two systems comes from, seeing as practically all content nodes are
>> published. Offline compaction happens daily, so that can't be the problem,
>> and online compaction is enabled as well. Are there any tools / methods
>> available to list the disk usage of every node, both in the segment store
>> and in the related datastore files? I could make wild guesses, for example
>> Sling event / job nodes and the like, but I would like some real numbers.
>>
>> Thanks!
>> Roy
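
PS, for the archives: here is roughly what I'm thinking of as a first step. This
is an untested sketch against the oak-segment-tar API as I understand it from
the sources; FileStoreBuilder.buildReadOnly(), SegmentNodeState.getRecordId()
and the RecordUsageAnalyser constructor / analyseNode() signature are my
assumptions and may differ slightly on the 1.6 branch:

    import java.io.File;

    import org.apache.jackrabbit.oak.segment.RecordId;
    import org.apache.jackrabbit.oak.segment.RecordUsageAnalyser;
    import org.apache.jackrabbit.oak.segment.SegmentNodeState;
    import org.apache.jackrabbit.oak.segment.file.FileStoreBuilder;
    import org.apache.jackrabbit.oak.segment.file.ReadOnlyFileStore;
    import org.apache.jackrabbit.oak.spi.state.NodeState;

    public class NodeUsage {

        // args[0] = path to the segmentstore directory,
        // args[1..n] = path elements of the node to analyse, e.g. "content" "mysite"
        public static void main(String[] args) throws Exception {
            // open the store read-only so a live instance cannot be damaged
            ReadOnlyFileStore fileStore = FileStoreBuilder
                    .fileStoreBuilder(new File(args[0]))
                    .buildReadOnly();
            try {
                // walk from the head state down to the node of interest
                NodeState node = fileStore.getHead();
                for (int i = 1; i < args.length; i++) {
                    node = node.getChildNode(args[i]);
                }

                // every node state read from a segment store should be a
                // SegmentNodeState, which exposes the record id we need
                RecordId id = ((SegmentNodeState) node).getRecordId();

                // let RecordUsageAnalyser walk the records below that node;
                // its toString() formats the collected statistics
                RecordUsageAnalyser analyser =
                        new RecordUsageAnalyser(fileStore.getReader());
                analyser.analyseNode(id);
                System.out.println(analyser);
            } finally {
                fileStore.close();
            }
        }
    }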
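
And if that turns out to be too coarse, custom tooling along the lines Michael
suggested might look like the following. Again only a sketch, assuming the
SegmentParser callbacks behave as they appear to in the RecordUsageAnalyser
source, i.e. parseNode() drives the recursion and reports back through the
on*() hooks:

    import org.apache.jackrabbit.oak.segment.RecordId;
    import org.apache.jackrabbit.oak.segment.SegmentParser;
    import org.apache.jackrabbit.oak.segment.SegmentReader;

    // hypothetical example, not part of Oak: tallies the number of node
    // records and their cumulative size below a given node
    public class NodeSizeParser extends SegmentParser {

        private long nodeCount;
        private long totalSize;

        public NodeSizeParser(SegmentReader reader) {
            super(reader);
        }

        @Override
        protected void onNode(RecordId parentId, RecordId nodeId) {
            // parseNode() parses the node record and recurses into its
            // children, calling back into onNode()/onProperty()/... on the way
            NodeInfo info = parseNode(nodeId);
            nodeCount++;
            totalSize += info.size;
        }

        public String summary() {
            return nodeCount + " nodes, " + totalSize + " bytes in node records";
        }
    }

Kicking it off would then be new NodeSizeParser(fileStore.getReader()).parseNode(id)
with the record id from the first sketch, and the other on*() hooks (maps,
properties, values, blobs) could presumably be overridden in the same way to
break the sizes down further.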