[jira] [Updated] (OAK-6735) Lucene Index: improved cost estimation by using document count per field

Vikas Saurabh (JIRA) Mon, 30 Oct 2017 19:35:12 -0700

     [ 
https://issues.apache.org/jira/browse/OAK-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Vikas Saurabh updated OAK-6735:
-------------------------------
    Attachment: OAK-6735.patch

Attaching implementation patch [^OAK-6735.patch]. It also fixes a few tests 
that didn't hold up under the new cost estimation, which is based on actual 
data.

Apart from the estimated entry count now being the minimum of the per-field 
doc counts (over the fields with property restrictions), there are 2 notable 
differences:
* The costPerEntry override is gone now - it's 1+sortOrder.size() after the 
patch.
* The override weight property (OAK-5899) now down-scales the cost estimate - 
the default is "essentially" 1, where the cost for a given field = number of 
documents indexed for that field. That's a worst-case number (since we'd likely 
be selecting some term, and that numDocs doesn't subtract deleted docs) - 
hence, down-scaling should be the only option required.
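
For illustration only, here is a rough sketch of the estimate described above. 
It is not the code from the patch: the class and method names are made up, the 
per-field doc counts are passed in as a plain map rather than read from Lucene, 
and dividing the doc count by the weight is an assumption about how the 
down-scaling is applied. The division by the number of restrictions mentioned 
in the issue description is left out.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names come from Oak or from the patch.
final class CostSketch {

    // Estimated entry count: the minimum, over the restricted fields, of the
    // per-field doc count scaled down by that field's weight.
    // ASSUMPTION: down-scaling is modelled as docCount / weight, default weight 1.
    static long estimatedEntryCount(Map<String, Integer> docCountPerField,
                                    Map<String, Integer> weightPerField,
                                    List<String> restrictedFields,
                                    long totalDocs) {
        if (restrictedFields.isEmpty()) {
            // No property restriction: fall back to the total doc count.
            return totalDocs;
        }
        long min = Long.MAX_VALUE;
        for (String field : restrictedFields) {
            long docs = docCountPerField.getOrDefault(field, 0);
            int weight = Math.max(1, weightPerField.getOrDefault(field, 1));
            min = Math.min(min, docs / weight);
        }
        return min;
    }

    // Cost per entry as stated above: 1 + sortOrder.size().
    static int costPerEntry(int sortOrderSize) {
        return 1 + sortOrderSize;
    }
}
```

With doc counts of 1000 for one restricted field and 200 for another, and a 
weight of 2 on the second, this sketch estimates min(1000, 200/2) = 100 
entries.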

TODO: Add tests

[~chetanm], [~tmueller], could you please take a quick look?

> Lucene Index: improved cost estimation by using document count per field
> ------------------------------------------------------------------------
>
>                 Key: OAK-6735
>                 URL: https://issues.apache.org/jira/browse/OAK-6735
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene, query
>    Affects Versions: 1.7.4
>            Reporter: Thomas Mueller
>            Assignee: Vikas Saurabh
>             Fix For: 1.8
>
>         Attachments: IndexReadPattern.txt, LuceneIndexReadPattern.java, 
> OAK-6735.patch
>
>
> The cost estimation of the Lucene index is somewhat inaccurate because (by 
> default as of Oak 1.7.4, due to OAK-6333) it just uses the number of 
> documents in the index.
> Instead, it should use the number of documents for the given fields (the 
> minimum, if there are multiple fields with restrictions), divided by the 
> number of restrictions (as we do now already).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
