[ https://issues.apache.org/jira/browse/OAK-10577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17793295#comment-17793295 ]
Thomas Mueller commented on OAK-10577:
--------------------------------------
PR (work in progress): https://github.com/apache/jackrabbit-oak/pull/1247
> Advanced repository statistics
> ------------------------------
>
> Key: OAK-10577
> URL: https://issues.apache.org/jira/browse/OAK-10577
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: oak-run
> Reporter: Thomas Mueller
> Assignee: Thomas Mueller
> Priority: Major
>
> Currently, we collect very few metrics per repository, and most of them cover
> the whole repository: total size, total index size, and datastore size. The only
> metric we collect per path is the approximate number of nodes per path.
> I would like to collect more data, first via a "flat file store" (a sorted list
> of node data), e.g.
> * Approximate number of nodes per path.
> * Approximate size of binaries per path.
> * Histograms of binary sizes.
> * The same, but for a filtered set of binaries.
> * Approximate number and size of distinct binaries.
> * Number of distinct values per (indexed) property, and the top values. This
> is useful to improve cost estimation (the "weight" property of indexes) and
> to estimate index sizes.
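
As a rough illustration of the first items in the list above, here is a minimal Java sketch that counts nodes per path prefix while streaming over a sorted flat file. It is not the code from the PR. It assumes each line has the form `path|json`; the class name `PathStats`, its methods, and the depth cutoff are all hypothetical. The `sizeBucket` helper shows one possible way to bucket binary sizes for a histogram (by power of two); extracting the sizes from the node data is not shown.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not the implementation from the PR.
public class PathStats {

    // Truncate a path to at most `depth` segments, e.g. depth 2:
    // "/content/dam/a.png" becomes "/content/dam".
    static String ancestorAtDepth(String path, int depth) {
        int idx = 0;
        for (int i = 0; i < depth; i++) {
            int next = path.indexOf('/', idx + 1);
            if (next < 0) {
                return path; // path is not deeper than `depth`
            }
            idx = next;
        }
        return idx == 0 ? "/" : path.substring(0, idx);
    }

    // Count nodes per truncated path. Assumes each line is "path|json"
    // (an assumption about the flat file format); other lines are skipped.
    static Map<String, Long> countNodesPerPath(Iterable<String> lines, int depth) {
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            int sep = line.indexOf('|');
            if (sep < 0) {
                continue;
            }
            String key = ancestorAtDepth(line.substring(0, sep), depth);
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }

    // Histogram bucket for a binary size: the bit length of the size,
    // so bucket k (k >= 1) covers sizes from 2^(k-1) up to 2^k - 1.
    static int sizeBucket(long size) {
        return 64 - Long.numberOfLeadingZeros(size);
    }
}
```

This sketch keeps exact counts in memory, one entry per truncated path. For the "approximate" variants listed in the issue, the exact map would presumably be replaced by a bounded or probabilistic structure; which one is left open here.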
--
This message was sent by Atlassian Jira
(v8.20.10#820010)