[
https://issues.apache.org/jira/browse/OAK-11478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17926377#comment-17926377
]
Thomas Mueller commented on OAK-11478:
--------------------------------------
Usage:
{noformat}
java -cp oak-run-1.77-SNAPSHOT.jar
org.apache.jackrabbit.oak.index.indexer.document.flatfile.analysis.StatsBuilder
--fileName <treeStoreFileName> --treeStore
{noformat}
where the tree store file name is, e.g., "r194f8dfc943-0-1.merged-tree.lz4".
Output: first, progress is reported in millions of nodes (with the path). Then
the following sections are printed:
* NodeCount: number of nodes if more than one million.
* PropertyStats: for each property the count, approximate distinct values, avg
and max size.
* NodeTypeCount: number of nodes with the various primary types and mixins.
* BinarySize: references (in blob store) and embedded (small binaries), in GB,
per path
* BinarySizeHistogram: histogram of binary sizes (approximation) for references
and embedded
* TopLargestBinaries: top 10 largest binaries
* DistinctBinarySizeHistogram: histogram for approximate counts of binaries,
and approximate distinct counts
* DistinctBinarySize: approximate counts of binaries, and approximate distinct
counts
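To make the PropertyStats columns concrete, here is a minimal sketch of how count, distinct values, average and maximum size per property could be collected. This is not the StatsBuilder code: the class and method names are hypothetical, the size is assumed to be the UTF-8 byte length of the value, and distinct values are counted exactly with a set, whereas the tool reports an approximation.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration only; not the actual StatsBuilder implementation.
class PropertyStatsSketch {

    static class Stats {
        long count;      // number of values seen for this property
        long totalSize;  // sum of value sizes in bytes
        long maxSize;    // largest single value in bytes
        // Exact distinct values (assumption: the real tool approximates this).
        final Set<String> distinct = new HashSet<>();

        long avgSize() {
            return count == 0 ? 0 : totalSize / count;
        }
    }

    // Statistics per property name, sorted by name for stable output.
    final Map<String, Stats> byName = new TreeMap<>();

    void add(String propertyName, String value) {
        Stats s = byName.computeIfAbsent(propertyName, k -> new Stats());
        // Assumption: size is measured as the UTF-8 byte length.
        long size = value.getBytes(StandardCharsets.UTF_8).length;
        s.count++;
        s.totalSize += size;
        s.maxSize = Math.max(s.maxSize, size);
        s.distinct.add(value);
    }

    public static void main(String[] args) {
        PropertyStatsSketch p = new PropertyStatsSketch();
        p.add("jcr:primaryType", "nt:file");
        p.add("jcr:primaryType", "nt:folder");
        p.add("jcr:primaryType", "nt:file");
        for (Map.Entry<String, Stats> e : p.byName.entrySet()) {
            Stats s = e.getValue();
            System.out.println(e.getKey() + " count " + s.count
                    + " distinct " + s.distinct.size()
                    + " avg " + s.avgSize() + " max " + s.maxSize);
        }
    }
}
```

An exact set is only practical for small inputs; for a full repository, a fixed-size approximate counter would be needed, which is presumably why the output is labelled approximate.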
The following output means there are around 4 GB of binary references in
total: 2 GB under /content/dam and 2 GB in the version store. DistinctBinarySize
shows that only ~2 GB are distinct binaries (multiple references can point to
the same binary):
{noformat}
BinarySize references in GB (resolution: 100000000)
/: 4
/content: 2
/content/dam: 2
/content/dam/projects: 1
/content/dam/projects/translation: 1
/jcr:system: 2
/jcr:system/jcr:versionStorage: 2
DistinctBinarySize
total distinct count: 33866
total distinct size GiB: 2
total reference count: 117717
total reference size GiB: 4
{noformat}
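The relation between the per-path sizes and the distinct totals above can be sketched as follows. This is an illustration under assumptions, not the StatsBuilder implementation: the class name, the add method and the blob ids are hypothetical, and distinct binaries are tracked exactly with a set instead of the approximation the tool uses. Each reference adds its size to the node's path and to every ancestor up to the root; the size counts towards the distinct total only the first time a blob id is seen.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration only; not the actual StatsBuilder implementation.
class BinarySizeRollup {

    // Bytes of binary references per path, including all descendants.
    final Map<String, Long> bytesByPath = new TreeMap<>();
    // Blob ids seen so far (exact; assumption: the real tool approximates this).
    final Set<String> distinctIds = new HashSet<>();
    long referenceCount;
    long referenceBytes;
    long distinctBytes;

    void add(String path, String blobId, long size) {
        referenceCount++;
        referenceBytes += size;
        // Count the size as distinct only for the first reference to this blob.
        if (distinctIds.add(blobId)) {
            distinctBytes += size;
        }
        // Add the size to the path itself and to every ancestor up to "/".
        String p = path;
        while (true) {
            bytesByPath.merge(p, size, Long::sum);
            if (p.equals("/")) {
                break;
            }
            int i = p.lastIndexOf('/');
            p = i == 0 ? "/" : p.substring(0, i);
        }
    }

    public static void main(String[] args) {
        BinarySizeRollup r = new BinarySizeRollup();
        // The same binary referenced from the DAM and from the version store.
        r.add("/content/dam/a.jpg", "blob-1", 2_000_000_000L);
        r.add("/jcr:system/jcr:versionStorage/v1", "blob-1", 2_000_000_000L);
        for (Map.Entry<String, Long> e : r.bytesByPath.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
        System.out.println("total distinct count: " + r.distinctIds.size());
        System.out.println("total distinct size: " + r.distinctBytes);
        System.out.println("total reference count: " + r.referenceCount);
        System.out.println("total reference size: " + r.referenceBytes);
    }
}
```

With these two made-up references, the root accumulates twice the size of the single distinct binary, which mirrors the 4 GB versus 2 GB pattern in the output above.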
> Node store statistics: support the tree store
> ---------------------------------------------
>
> Key: OAK-11478
> URL: https://issues.apache.org/jira/browse/OAK-11478
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: indexing
> Reporter: Thomas Mueller
> Assignee: Thomas Mueller
> Priority: Major
>
> There is a statistics collector in oak-run-common that I use sometimes.
> It is currently missing support for tree stores.
> This issue is about adding support for this.