I think you could do the same via a Groovy script. Depending on how
deep you want to dig into the lower layers, you would need to hack your
way through, though. The tooling I started building aims to simplify
this (but hasn't fully succeeded at it yet).
On 6 March 2018 at 22:06, Roy Teeuwen <r...@teeuwen.be> wrote:
> Hey Michael,
> Thanks for the info! I will check whether I can still run the script on
> Oak 1.6.6, who knows :).
> Can you tell me what the difference would be in writing a Groovy script and
> running it in oak-run? Are there things you can't do there that you can in
> the Scala Ammonite shell you use?
>> On 5 Mar 2018, at 13:59, Michael Dürig <mic...@gmail.com> wrote:
>> Unfortunately there is no good tooling at this point in time.
>> In the past I hacked something together, which might serve as a
>> starting point: https://github.com/mduerig/script-oak. This tooling
>> allows you to fire arbitrary queries at the segment store from the
>> Ammonite shell (a Scala REPL). Since this relies on a lot of
>> implementation details that keep changing, the tooling is usually out
>> of sync with Oak. There are plans to improve this (see
>> https://issues.apache.org/jira/browse/OAK-6584), but so far not much
>> commitment to making this happen. Patches welcome though!
>> On 4 March 2018 at 15:22, Roy Teeuwen <r...@teeuwen.be> wrote:
>>> Hey guys,
>>> I am using Oak 1.6.6 with an authoring system and a few publish systems. We
>>> are using the latest TarMK that is available on the 1.6.6 branch and also
>>> using the separate file datastore instead of embedded in the segment store.
>>> What I have noticed so far is that the segment store of the author is 16GB
>>> with 165GB datastore while the publishes are 1.5GB with only 50GB
>>> datastore. I would like to investigate where the big difference between
>>> those two systems comes from, seeing as nearly all content nodes are
>>> published. Offline compaction runs daily, so that can't be the problem,
>>> and online compaction is enabled as well. Are there any tools or methods
>>> available to list the disk usage of every node, both in the segment
>>> store and in the related datastore files? I can make
>>> wild guesses as to it being for example sling event / job nodes and stuff
>>> like that but I would like some real numbers.
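
For the per-node disk usage question above, here is a minimal sketch of the
aggregation step only, assuming you have already obtained a size in bytes per
node path by some other means (a Groovy script, script-oak, or similar). It
does not touch any Oak API. The class name SubtreeSizes, the methods
ancestorAt and rollUp, and all paths and byte counts are hypothetical and made
up for illustration.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class SubtreeSizes {

    // Returns the ancestor of `path` that is `depth` segments below the root,
    // e.g. ancestorAt("/var/eventing/jobs/x", 2) yields "/var/eventing".
    // Paths shorter than `depth` are returned as-is; depth 0 yields "/".
    static String ancestorAt(String path, int depth) {
        StringBuilder sb = new StringBuilder();
        int taken = 0;
        for (String segment : path.split("/")) {
            if (segment.isEmpty()) {
                continue; // skip the empty segment before the leading slash
            }
            if (taken == depth) {
                break;
            }
            sb.append('/').append(segment);
            taken++;
        }
        return sb.length() == 0 ? "/" : sb.toString();
    }

    // Sums the per-node sizes into buckets keyed by the ancestor at `depth`.
    static Map<String, Long> rollUp(Map<String, Long> sizesByPath, int depth) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map.Entry<String, Long> e : sizesByPath.entrySet()) {
            totals.merge(ancestorAt(e.getKey(), depth), e.getValue(), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up sample input: node path -> bytes attributed to that node.
        Map<String, Long> sizes = new HashMap<>();
        sizes.put("/content/site/en/home", 4_096L);
        sizes.put("/content/dam/images/a.jpg", 2_000_000L);
        sizes.put("/var/eventing/jobs/job-1", 512L);
        sizes.put("/var/eventing/jobs/job-2", 768L);

        // Print the buckets at depth 2, largest first.
        rollUp(sizes, 2).entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()))
                .forEach(e -> System.out.println(e.getValue() + "\t" + e.getKey()));
    }
}
```

Running the same roll-up on the author and on a publish instance and comparing
the buckets would show which subtrees account for the difference. How to get
the per-node sizes out of the segment store and the datastore is the part the
thread says has no good tooling yet.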