[
https://issues.apache.org/jira/browse/OAK-7300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Davide Giannella updated OAK-7300:
----------------------------------
Fix Version/s: 1.16.0
> Lucene Index: per-column selectivity to improve cost estimation
> ---------------------------------------------------------------
>
> Key: OAK-7300
> URL: https://issues.apache.org/jira/browse/OAK-7300
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: lucene, query
> Reporter: Thomas Mueller
> Assignee: Thomas Mueller
> Priority: Major
> Fix For: 1.14.0, 1.16.0
>
>
> In OAK-6735 we improved cost estimation for Lucene indexes. However, the
> following case still does not work as expected: a very common property is
> indexed (many nodes have that property), and each value of that property is
> more or less unique. In this case, the estimated cost is currently the total
> number of documents that contain that property. For the condition
> "property is not null" this is correct, but for the common case "property
> = x" the estimated cost is far too high.
> A known workaround is to set the "costPerEntry" for the given index to a low
> value, for example 0.2. However, this isn't a good solution, as it affects
> all properties and all queries that use that index.
> It would be good to be able to set the selectivity per property, for example
> by specifying the number of distinct values, or (better yet) the average
> number of entries for a given key (1 for unique values; 2 meaning that for
> each distinct value there are two documents on average).
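
A minimal sketch of how such a per-property value could enter the cost
estimate. This is not Oak's implementation: the class and method names are
hypothetical, and the rule "equality conditions cost about the average number
of entries per key" is an assumption taken from the description above.

```java
// Hypothetical sketch, not part of Oak. All names are made up for illustration.
final class SelectivityCostSketch {

    // Average number of documents per distinct value of a property.
    // Falls back to the document count if the distinct count is unknown (<= 0).
    static double averageEntriesPerKey(long docsWithProperty, long distinctValues) {
        if (distinctValues <= 0) {
            return docsWithProperty;
        }
        // An average below 1 makes no sense, so clamp it.
        return Math.max(1.0, (double) docsWithProperty / distinctValues);
    }

    // Estimated number of entries a condition on the property will return.
    // "property is not null" matches every document that has the property;
    // "property = x" is assumed to match roughly one key's worth of entries.
    static double estimateEntries(long docsWithProperty, boolean equalityCondition,
            double averageEntriesPerKey) {
        if (!equalityCondition) {
            return docsWithProperty;
        }
        return Math.min(docsWithProperty, Math.max(1.0, averageEntriesPerKey));
    }

    public static void main(String[] args) {
        long docs = 1_000_000;
        // "property is not null": all documents with the property
        System.out.println(estimateEntries(docs, false, 1.0));
        // "property = x" on a unique property: about one entry
        System.out.println(estimateEntries(docs, true, 1.0));
    }
}
```

With one million documents and unique values, the sketch estimates one
million entries for "is not null" and one entry for "= x", without touching
"costPerEntry" for the whole index.
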
> That value can be set manually (cost override), or it can be set
> automatically, e.g. when building the index, or updated from time to time
> during the index update, using a cardinality estimation algorithm. The
> estimate doesn't have to be accurate; we could use a rough approximation
> such as HyperBitBit.
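
To illustrate the automatic variant, here is a rough sketch of a distinct-value
estimate computed while indexing. Note that it uses linear counting (a plain
bitmap), not HyperBitBit, because it is shorter to show; the class name, the
bitmap size, and the hash mixing step are all assumptions and not Oak code.

```java
import java.util.BitSet;

// Hypothetical sketch, not part of Oak: linear counting over a fixed bitmap.
final class LinearCountingSketch {

    private final BitSet bits;
    private final int size;

    LinearCountingSketch(int size) {
        this.size = size;
        this.bits = new BitSet(size);
    }

    // Record one indexed value; duplicates set the same bit again.
    void add(String value) {
        bits.set(Math.floorMod(mix(value.hashCode()), size));
    }

    // Linear counting estimate: n ~ -m * ln(zeroBits / m).
    long estimateDistinct() {
        int zeroBits = size - bits.cardinality();
        if (zeroBits == 0) {
            // Bitmap is saturated; the real count is at least this large.
            return size;
        }
        return Math.round(-size * Math.log((double) zeroBits / size));
    }

    // Average entries per key derived from the estimate (at least 1).
    static double averageEntriesPerKey(long docs, long distinctEstimate) {
        if (distinctEstimate <= 0) {
            return docs;
        }
        return Math.max(1.0, (double) docs / distinctEstimate);
    }

    // Bit mixing (the 32-bit finalizer used by MurmurHash3) to spread
    // String.hashCode() values more evenly over the bitmap.
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    public static void main(String[] args) {
        LinearCountingSketch sketch = new LinearCountingSketch(16384);
        for (int i = 0; i < 1000; i++) {
            // each distinct value is indexed twice
            sketch.add("value-" + i);
            sketch.add("value-" + i);
        }
        long distinct = sketch.estimateDistinct();
        System.out.println(distinct + " distinct (estimated), "
                + averageEntriesPerKey(2000, distinct) + " entries per key");
    }
}
```

The bitmap needs to be sized relative to the expected number of distinct
values; once it saturates, the estimate is only a lower bound. A sketch with
smaller, fixed memory would be needed for very large indexes.
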