[ 
https://issues.apache.org/jira/browse/OAK-6333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16044086#comment-16044086
 ] 

Chetan Mehrotra commented on OAK-6333:
--------------------------------------

Fixing this is simple, but we need to decide whether to change the default 
behaviour or make it depend on a flag. For now I propose:

# Trunk - Change the default behaviour
# Branches - Provide a flag to enable this behaviour

[~tmueller] [~catholicon] [~amitj_76] Thoughts?

> IndexPlanner should use actual entryCount instead of limiting it to 1000
> ------------------------------------------------------------------------
>
>                 Key: OAK-6333
>                 URL: https://issues.apache.org/jira/browse/OAK-6333
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>             Fix For: 1.8
>
>
> Currently IndexPlanner uses the following logic to estimate the entryCount:
> # If the index has fulltext indexing enabled:
> ## If {{entryCount}} is defined, use min(entryCount, numDocs)
> ## If not, use {{numDocs}}, i.e. the actual entry count
> # If the index is a pure property index, i.e. none of the property definitions 
> have {{analyzed}} set to true:
> ## If {{entryCount}} is defined, use min(entryCount, numDocs)
> ## Else use min(1000, numDocs)
> Revisiting the logic for #2, it appears that in the 1.0.x days (OAK-2200) we 
> capped it to 1000 because cost estimation for property indexes was inaccurate 
> (they used to report low values, causing the Lucene index to lose). 
> With support for Counters, the cost estimation for property indexes has 
> improved, and we should now remove this cap and let it use numDocs.
> One area where this causes issues is when we have two indexes where one is a 
> superset of the other, e.g. /oak:index/asset and 
> /content/en/oak:index/asset, where both have some matching properties. 
> Logically, if the query can be handled by the sub index then that index should 
> get picked, but currently either of them can be picked, making the query plan 
> non-deterministic.
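
The estimation rules described above, together with the proposed change, can be sketched as follows. This is only an illustration and not the actual IndexPlanner code: the class name, the method signature and the capPropertyIndex flag are hypothetical, and a null entryCount stands for "entryCount not defined".

```java
// Illustrative sketch only; names here are assumptions, not Oak's API.
public class EntryCountEstimate {

    // Cap applied to pure property indexes since OAK-2200.
    static final long LEGACY_CAP = 1000;

    /**
     * @param fulltextEnabled  true if the index has fulltext indexing enabled
     * @param entryCount       configured entryCount, or null if not defined
     * @param numDocs          actual number of documents in the index
     * @param capPropertyIndex hypothetical flag: true keeps the legacy cap,
     *                         false uses numDocs as proposed in this issue
     */
    static long estimate(boolean fulltextEnabled, Long entryCount,
                         long numDocs, boolean capPropertyIndex) {
        if (entryCount != null) {
            // Same rule for fulltext and property indexes.
            return Math.min(entryCount, numDocs);
        }
        if (fulltextEnabled || !capPropertyIndex) {
            // Use the actual entry count.
            return numDocs;
        }
        // Legacy behaviour for pure property indexes.
        return Math.min(LEGACY_CAP, numDocs);
    }

    public static void main(String[] args) {
        long numDocs = 50_000;
        System.out.println("legacy:   " + estimate(false, null, numDocs, true));
        System.out.println("proposed: " + estimate(false, null, numDocs, false));
    }
}
```

With the cap in place, two property indexes that both hold more than 1000 documents report the same estimate regardless of their real size, which is consistent with the superset/sub index case above where either index may be picked.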



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
