[jira] [Updated] (OAK-10577) Advanced repository statistics

Thomas Mueller (Jira) Mon, 04 Dec 2023 08:40:11 -0800


     [ https://issues.apache.org/jira/browse/OAK-10577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Thomas Mueller updated OAK-10577:
---------------------------------
    Description: 
Currently, we have very few metrics per repository, and most are for the whole 
repository: total size, the total index sizes, datastore size. The only metric 
we collect per path is the approximate number of nodes per path.

I would like to collect more data, first via a "flat file store" (sorted list 
of node data), e.g.

* Approximate number of nodes per path.
* Approximate size of binaries per path.
* Histograms of binary sizes.
* The same, but for a filtered set of binaries.
* Number and size of distinct binaries.
* Number of distinct values per (indexed) property, and the top values. This is 
useful to improve cost estimation (the "weight" property of indexes) and 
estimate index sizes.


  was:
Currently, we have very few metrics per repository, and most are for the whole 
repository: total size, the total index sizes, datastore size. The only metric 
we collect per path is the approximate number of nodes per path.

I would like to collect more data, first via a "flat file store" (sorted list 
of node data), e.g.

* Approximate number of nodes per path.
* Approximate size of binaries per path.
* Histograms of binary sizes.
* The same, but for a filtered set of binaries.
* Size of distinct binaries.


> Advanced repository statistics
> ------------------------------
>
>                 Key: OAK-10577
>                 URL: https://issues.apache.org/jira/browse/OAK-10577
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: oak-run
>            Reporter: Thomas Mueller
>            Assignee: Thomas Mueller
>            Priority: Major
>
> Currently, we have very few metrics per repository, and most are for the 
> whole repository: total size, the total index sizes, datastore size. The only 
> metric we collect per path is the approximate number of nodes per path.
> I would like to collect more data, first via a "flat file store" (sorted list 
> of node data), e.g.
> * Approximate number of nodes per path.
> * Approximate size of binaries per path.
> * Histograms of binary sizes.
> * The same, but for a filtered set of binaries.
> * Number and size of distinct binaries.
> * Number of distinct values per (indexed) property, and the top values. This 
> is useful to improve cost estimation (the "weight" property of indexes) and 
> estimate index sizes.
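
To make the listed metrics concrete, here is a minimal sketch of how a few of them could be computed in one pass over node data. It is an illustration only, not the oak-run implementation: the record shape (a path plus a list of binary sizes in bytes), the function names, and the power-of-two bucketing are all assumptions made for this example. The counts are exact, whereas the issue asks for approximate figures, which at repository scale would likely need sketch-style data structures instead.

```python
from collections import Counter


def ancestor_at_depth(path, depth):
    """Return the ancestor of `path` truncated to at most `depth` segments."""
    segments = [s for s in path.split("/") if s]
    return "/" + "/".join(segments[:depth])


def nodes_per_path(paths, depth=2):
    """Count nodes under each ancestor path of the given depth."""
    return Counter(ancestor_at_depth(p, depth) for p in paths)


def binary_size_per_path(records, depth=2):
    """Sum binary sizes (bytes) under each ancestor path.

    `records` is an iterable of (path, [binary sizes]) pairs; this shape is
    hypothetical and much simpler than a real flat file store entry.
    """
    totals = Counter()
    for path, sizes in records:
        totals[ancestor_at_depth(path, depth)] += sum(sizes)
    return totals


def size_histogram(sizes):
    """Bucket sizes by power of two.

    Bucket b holds sizes whose bit length is b, i.e. 2**(b-1) <= size < 2**b
    (bucket 0 holds empty binaries).
    """
    return Counter(int(size).bit_length() for size in sizes)


def distinct_and_top(values, k=3):
    """Return (number of distinct values, the k most frequent values with counts)."""
    counts = Counter(values)
    return len(counts), counts.most_common(k)


# Hypothetical sample data, already sorted by path as in a flat file store.
records = [
    ("/content/a/x", [100, 2000]),
    ("/content/a/y", []),
    ("/content/b", [5]),
    ("/var/tmp", [2000]),
]

if __name__ == "__main__":
    print(nodes_per_path(path for path, _ in records))
    print(binary_size_per_path(records))
    print(size_histogram(s for _, sizes in records for s in sizes))
    print(distinct_and_top(["a", "b", "a", "c", "a", "b"], k=2))
```

Because the input is sorted by path, a real implementation could flush the per-path totals as soon as the path prefix changes instead of keeping every prefix in memory; the sketch keeps everything in a Counter for brevity.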



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
