[ https://issues.apache.org/jira/browse/OAK-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068411#comment-16068411 ]
Thomas Mueller commented on OAK-6381:
-------------------------------------
We should check if the Luke tool can be used:
* https://jackrabbit.apache.org/oak/docs/query/lucene.html#Analyzing_created_Lucene_Index
* https://code.google.com/archive/p/luke/
> Improved index analysis tools
> -----------------------------
>
> Key: OAK-6381
> URL: https://issues.apache.org/jira/browse/OAK-6381
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Reporter: Thomas Mueller
> Assignee: Thomas Mueller
> Fix For: 1.8
>
>
> It would be good to have more tools to analyze indexes:
> * For Lucene indexes, get a histogram of sampled terms. We have
> "getFieldInfo", which shows how common each field is, but we have nothing
> comparable for terms. For example, the /oak:index/lucene index contains
> 1 million fulltext fields and node names for 1 million nodes, but I wonder
> why, what typical node names are, and whether the fulltext for most nodes
> is actually empty. Maybe a new method "getTermHistogram(int sampleCount)"
> or similar.
> * For property indexes, the number of updated nodes per second or so. Right
> now we can only analyze the counts per key, but some indexes / keys are very
> volatile (they see many short-lived entries).
> * For Lucene indexes, writes per second or so (in MB).
> * How indexes are used (approximate nodes read / MB per hour).
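
A minimal sketch of what the proposed "getTermHistogram(int sampleCount)" could compute, assuming the index terms are available as a plain iterable of strings. The class name, the extra "terms" and "random" parameters, and the choice of reservoir sampling are assumptions for illustration only; none of this is an existing Oak or Lucene API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

final class TermHistogramSketch {

    // Hypothetical helper (not an Oak or Lucene API): draws a uniform sample of
    // up to sampleCount terms via reservoir sampling, then counts how often
    // each distinct term appears in that sample.
    static Map<String, Integer> getTermHistogram(
            Iterable<String> terms, int sampleCount, Random random) {
        if (sampleCount <= 0) {
            throw new IllegalArgumentException("sampleCount must be positive");
        }
        List<String> reservoir = new ArrayList<>();
        long seen = 0;
        for (String term : terms) {
            seen++;
            if (reservoir.size() < sampleCount) {
                // Fill the reservoir with the first sampleCount terms.
                reservoir.add(term);
            } else {
                // Keep the new term with probability sampleCount / seen.
                long slot = (long) (random.nextDouble() * seen);
                if (slot < sampleCount) {
                    reservoir.set((int) slot, term);
                }
            }
        }

        // Count occurrences of each sampled term.
        Map<String, Integer> counts = new HashMap<>();
        for (String term : reservoir) {
            counts.merge(term, 1, Integer::sum);
        }

        // Order by descending count, then by term, so common terms come first.
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        Map<String, Integer> histogram = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : entries) {
            histogram.put(entry.getKey(), entry.getValue());
        }
        return histogram;
    }
}
```

Sampling keeps memory bounded by sampleCount even for an index with millions of terms, at the cost of possibly missing rare terms. A real implementation would need to read the terms from the Lucene index itself, which this sketch deliberately leaves out.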