[
https://issues.apache.org/jira/browse/OAK-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vikas Saurabh updated OAK-6735:
-------------------------------
Attachment: OAK-6735.patch
Attaching implementation patch [^OAK-6735.patch]. It also fixes a few tests
which weren't adhering to the new cost estimation based on actual data.
Apart from the estimated entry count now being the minimum of the doc counts
for the property restriction fields, there are 2 notable differences:
* costPerEntry override is gone now - it's 1+sortOrder.size() after the patch
* the weight property override (OAK-5899) can be used to down-scale the cost
estimation - the default is "essentially" 1, where the cost for a given field =
number of documents indexed for that field. That's a worst case number (since
we'd likely be selecting only some term, and that numDocs doesn't subtract
deleted docs) - hence, down-scaling should be the only option required.
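For reviewers, here is a rough sketch of the estimation described above. It is
not the code from the patch: the class, method and parameter names are
hypothetical, and it assumes that down-scaling means dividing a field's doc
count by its weight and that a field without statistics is simply skipped.

```java
import java.util.List;
import java.util.Map;

public class FieldCostSketch {

    // Hypothetical helper: doc count for one field, scaled down by its weight.
    // Assumption: a weight of 1 or less leaves the count unchanged.
    static long weightedDocCount(long numDocsForField, int weight) {
        return weight <= 1 ? numDocsForField : numDocsForField / weight;
    }

    // Estimated entry count = minimum weighted doc count over the restricted
    // fields, never more than the number of documents in the whole index.
    static long estimatedEntryCount(Map<String, Long> numDocsPerField,
                                    Map<String, Integer> weightPerField,
                                    List<String> restrictedFields,
                                    long numDocsInIndex) {
        long min = numDocsInIndex;
        for (String field : restrictedFields) {
            Long numDocs = numDocsPerField.get(field);
            if (numDocs == null) {
                continue; // assumption: no statistics for this field, skip it
            }
            int weight = weightPerField.getOrDefault(field, 1);
            min = Math.min(min, weightedDocCount(numDocs, weight));
        }
        return min;
    }

    // Cost per entry as described in the comment: 1 + sortOrder.size().
    static int costPerEntry(int sortOrderSize) {
        return 1 + sortOrderSize;
    }

    public static void main(String[] args) {
        Map<String, Long> numDocs = Map.of("title", 1000L, "status", 200L);
        Map<String, Integer> weights = Map.of("title", 10);
        long entries = estimatedEntryCount(
                numDocs, weights, List.of("title", "status"), 5000L);
        System.out.println(entries + " entries, cost per entry " + costPerEntry(1));
    }
}
```

In this made-up example "title" is down-scaled from 1000 to 100 and so wins
over "status" at 200; without the weight, "status" would give the minimum.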
TODO: Add tests
[~chetanm], [~tmueller], can you guys please take a quick look?
> Lucene Index: improved cost estimation by using document count per field
> ------------------------------------------------------------------------
>
> Key: OAK-6735
> URL: https://issues.apache.org/jira/browse/OAK-6735
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: lucene, query
> Affects Versions: 1.7.4
> Reporter: Thomas Mueller
> Assignee: Vikas Saurabh
> Fix For: 1.8
>
> Attachments: IndexReadPattern.txt, LuceneIndexReadPattern.java,
> OAK-6735.patch
>
>
> The cost estimation of the Lucene index is somewhat inaccurate because (by
> default) it just uses the number of documents in the index (as of Oak 1.7.4,
> due to OAK-6333).
> Instead, it should use the number of documents for the given fields (the
> minimum, if there are multiple fields with restrictions), divided by the
> number of restrictions (as we do now already).