[ https://issues.apache.org/jira/browse/OAK-10577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thomas Mueller updated OAK-10577:
---------------------------------
Description:
Currently, we have very few metrics per repository, and most are for the whole
repository: total size, the total index sizes, datastore size. The only metric
we collect per path is the approximate number of nodes per path.
I would like to collect more data, first via a "flat file store" (sorted list
of node data), e.g.
* Approximate number of nodes per path.
* Approximate size of binaries per path.
* Histograms of binary sizes.
* The same, but for a filtered set of binaries.
* Number and size of distinct binaries.
* Number of distinct values per (indexed) property, and the top values. This is
useful to improve cost estimation (the "weight" property of indexes) and
estimate index sizes.
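
To make the list above more concrete, here is a rough sketch of how three of these statistics (nodes per path, a binary size histogram, and number and size of distinct binaries) could be computed in a single pass. This is only an illustration, not the proposed implementation: the class name RepoStatsSketch, the method names, and the input shapes (plain path strings, binary sizes in bytes, and id/size pairs) are all assumptions, and it uses exact in-memory maps where a real tool would probably use approximate data structures.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; names and input formats are assumptions, not Oak APIs.
class RepoStatsSketch {

    /** Counts each node toward the root and toward its ancestors, up to maxDepth path segments. */
    static Map<String, Long> nodesPerPath(List<String> paths, int maxDepth) {
        Map<String, Long> counts = new TreeMap<>();
        for (String path : paths) {
            counts.merge("/", 1L, Long::sum);
            String[] segments = path.split("/"); // "/a/b" -> ["", "a", "b"]; "/" -> []
            StringBuilder prefix = new StringBuilder();
            for (int i = 1; i < segments.length && i <= maxDepth; i++) {
                prefix.append('/').append(segments[i]);
                counts.merge(prefix.toString(), 1L, Long::sum);
            }
        }
        return counts;
    }

    /** Power-of-two bucket: 0 for size 0, otherwise k where 2^(k-1) <= size < 2^k. */
    static int sizeBucket(long size) {
        return size <= 0 ? 0 : 64 - Long.numberOfLeadingZeros(size);
    }

    /** Histogram of binary sizes, keyed by sizeBucket. */
    static Map<Integer, Long> sizeHistogram(List<Long> sizes) {
        Map<Integer, Long> histogram = new TreeMap<>();
        for (long size : sizes) {
            histogram.merge(sizeBucket(size), 1L, Long::sum);
        }
        return histogram;
    }

    /** Returns {number of distinct binaries, total size of distinct binaries}. */
    static long[] distinctCountAndSize(List<Map.Entry<String, Long>> binaries) {
        Map<String, Long> seen = new HashMap<>();
        for (Map.Entry<String, Long> b : binaries) {
            seen.putIfAbsent(b.getKey(), b.getValue()); // the same id is counted once
        }
        long total = 0;
        for (long size : seen.values()) {
            total += size;
        }
        return new long[] { seen.size(), total };
    }

    public static void main(String[] args) {
        List<String> paths = List.of("/", "/content", "/content/a", "/content/a/b", "/var");
        System.out.println(nodesPerPath(paths, 2));
        System.out.println(sizeHistogram(List.of(0L, 1L, 1023L, 1024L)));
        long[] d = distinctCountAndSize(List.of(
                Map.entry("id1", 10L), Map.entry("id2", 20L), Map.entry("id1", 10L)));
        System.out.println(d[0] + " distinct binaries, " + d[1] + " bytes");
    }
}
```

The exact maps keep the sketch short; for a large repository, counting distinct binaries or distinct property values this way would likely need too much memory, so an approximate counter would be the more realistic choice.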
was:
Currently, we have very few metrics per repository, and most are for the whole
repository: total size, the total index sizes, datastore size. The only metric
we collect per path is the approximate number of nodes per path.
I would like to collect more data, first via a "flat file store" (sorted list
of node data), e.g.
* Approximate number of nodes per path.
* Approximate size of binaries per path.
* Histograms of binary sizes.
* The same, but for a filtered set of binaries.
* Size of distinct binaries.
> Advanced repository statistics
> ------------------------------
>
> Key: OAK-10577
> URL: https://issues.apache.org/jira/browse/OAK-10577
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: oak-run
> Reporter: Thomas Mueller
> Assignee: Thomas Mueller
> Priority: Major