[ https://issues.apache.org/jira/browse/OAK-9811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17562678#comment-17562678 ]
Thomas Mueller edited comment on OAK-9811 at 7/15/22 1:43 PM:
--------------------------------------------------------------
{noformat}
/oak:index/statistics @resolution = 1000 (default)
/oak:index/statistics/index/1/propertiesCountMinSketch { @p0_0, @p0_1, @p1_0, @p1_1 }
/oak:index/statistics/index/1/props
/oak:index/statistics/index/1/props/jcr:createdBy { count: 10240, uniqueHLL: 454543 }
/oak:index/statistics/index/1/props/jcr:primaryType { ... }
/oak:index/statistics/index/1/props/hidden
/oak:index/statistics/index/2/props
/oak:index/statistics/index/2/props/jcr:createdBy { count: 10240, uniqueHLL: 454543 }
/oak:index/statistics/index/2/props/jcr:primaryType { ... }
/oak:index/statistics/index/2/props/hidden
{noformat}
The "index" node will later be renamed to ":index" in order to hide it from users.
For now it is left visible, because that makes debugging easier.
* resolution: same as for the counter index
* propertiesCountMinSketch: count-min sketch data structure for the counts
* props: properties (children are individual properties)
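To make the propertiesCountMinSketch entry more concrete, here is a minimal sketch of a count-min sketch in Java. It is only an illustration of the general technique, not the code used in Oak: the class name, the hash function (FNV-1a with an extra mixing step) and the table dimensions are assumptions, and it does not model how the @p0_0 ... @p1_1 properties above are laid out.

```java
import java.nio.charset.StandardCharsets;

/**
 * Illustrative count-min sketch (hypothetical class, not Oak's implementation).
 * A depth x width table of counters; each key touches one counter per row.
 */
final class CountMinSketch {
    private final long[][] table;
    private final int depth;
    private final int width;

    CountMinSketch(int depth, int width) {
        if (depth <= 0 || width <= 0) {
            throw new IllegalArgumentException("depth and width must be positive");
        }
        this.depth = depth;
        this.width = width;
        this.table = new long[depth][width];
    }

    /** Adds a non-negative count for the given key. */
    void add(String key, long count) {
        if (count < 0) {
            throw new IllegalArgumentException("count must not be negative");
        }
        long h = hash64(key);
        for (int row = 0; row < depth; row++) {
            table[row][index(h, row)] += count;
        }
    }

    /** Returns the minimum over all rows: an estimate that never undercounts. */
    long estimate(String key) {
        long h = hash64(key);
        long min = Long.MAX_VALUE;
        for (int row = 0; row < depth; row++) {
            min = Math.min(min, table[row][index(h, row)]);
        }
        return min;
    }

    /** Derives a per-row column from one 64-bit hash (double hashing). */
    private int index(long h, int row) {
        int h1 = (int) h;
        int h2 = (int) (h >>> 32) | 1; // odd step, so rows differ
        return Math.floorMod(h1 + row * h2, width);
    }

    /** FNV-1a over the UTF-8 bytes, followed by a 64-bit finalizer mix. */
    private static long hash64(String key) {
        long h = 0xcbf29ce484222325L;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        return h;
    }
}
```

Usage would look like `sketch.add("jcr:createdBy", 1)` per observed property, and `sketch.estimate("jcr:createdBy")` when the query engine needs a count. Because counters are only ever incremented and the estimate is the minimum over the rows, hash collisions can inflate an estimate but never reduce it below the true count.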
was (Author: tmueller):
{noformat}
/oak:index/statistics @resolution = 1000 (default)
/oak:index/statistics/index/1 @propertiesCountMinSketch
/oak:index/statistics/index/1/props
/oak:index/statistics/index/1/props/jcr:createdBy { count: 10240, uniqueHLL: 454543 }
/oak:index/statistics/index/1/props/jcr:primaryType { ... }
/oak:index/statistics/index/1/props/hidden
/oak:index/statistics/index/2/props
/oak:index/statistics/index/2/props/jcr:createdBy { count: 10240, uniqueHLL: 454543 }
/oak:index/statistics/index/2/props/jcr:primaryType { ... }
/oak:index/statistics/index/2/props/hidden
{noformat}
The "index" node will later be renamed to ":index" in order to hide it from users.
For now it is left visible, because that makes debugging easier.
* resolution: same as for the counter index
* propertiesCountMinSketch: count-min sketch data structure for the counts
* props: properties (children are individual properties)
> Statistics index
> ----------------
>
> Key: OAK-9811
> URL: https://issues.apache.org/jira/browse/OAK-9811
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: indexing, query
> Reporter: Thomas Mueller
> Assignee: Thomas Mueller
> Priority: Major
>
> Queries should be as fast as possible:
> * They should read as little data as possible (low I/O)
> * Network roundtrips should be reduced (see also OAK-9780)
> * In-memory processing should be fast (low CPU usage)
> To do that:
> * Queries need to _have_ the right indexes. Indexes may need to be added
> (which might be a manual task, or semi-automated, or fully automated).
> For a developer, it would also be good to know how fast a query could be if
> an index were added.
> * Queries should _use_ the right indexes. Sometimes multiple indexes can be
> used.
> * Queries should use the right execution plan (for example: a join can be
> evaluated in multiple ways).
> For this, it helps to have accurate statistics. We currently have
> statistics about the number of nodes per path ([approximate
> counter|https://github.com/apache/jackrabbit-oak/tree/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/index/counter]),
> and document statistics for Lucene and Elastic indexes.
> But we currently don't have statistics for _unindexed_ data. That would be
> good to have: how common is each property (by property name)? How many
> distinct values are there per property? What is the histogram? And so on.
> For this, something like the counter index could be added, which is updated
> using a streaming algorithm. We need to ensure the number of writes to this
> index is low (e.g. less than 1% of the overall writes), and memory usage is
> very low. There are a number of such libraries, but arguably we could
> implement this ourselves, as our use case is atypical (reduced number of
> writes, reduced memory usage). https://github.com/thomasmueller/tinyStats and
> related libraries could be used as a starting point.
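
The low-write requirement in the description can be illustrated with a sampled counter. This is a minimal sketch under stated assumptions, not Oak's counter index code: the class name and methods are hypothetical, and the assumption is that a "resolution" of N means roughly one stored update per N events, each adding N to the stored value.

```java
import java.util.Random;

/**
 * Illustrative sampled counter (hypothetical class, not an Oak API).
 * Only about 1 in `resolution` events causes a write; each write adds
 * `resolution`, so the stored value is correct in expectation.
 */
final class SampledCounter {
    private final int resolution;
    private final Random random;
    private long storedCount; // the value that would be persisted
    private long writes;      // how often the persisted value changed

    SampledCounter(int resolution, Random random) {
        if (resolution <= 0) {
            throw new IllegalArgumentException("resolution must be positive");
        }
        this.resolution = resolution;
        this.random = random;
    }

    /** Records one event; most calls do not touch the stored value. */
    void increment() {
        if (random.nextInt(resolution) == 0) {
            storedCount += resolution;
            writes++;
        }
    }

    /** Approximate number of events recorded so far. */
    long estimate() {
        return storedCount;
    }

    /** Number of writes that were needed to maintain the estimate. */
    long writes() {
        return writes;
    }
}
```

With a resolution of 1000, a million calls to `increment()` lead to about a thousand writes, which is well below the 1% target mentioned above, at the cost of an estimate that is only accurate for large counts. A resolution of 1 degenerates into an exact counter that writes on every event.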