[ https://issues.apache.org/jira/browse/LUCENE-7897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand resolved LUCENE-7897.
----------------------------------
       Resolution: Fixed
    Fix Version/s: 7.1
                   master (8.0)

> RangeQuery optimization in IndexOrDocValuesQuery 
> -------------------------------------------------
>
>                 Key: LUCENE-7897
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7897
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>    Affects Versions: trunk, 7.0
>            Reporter: Murali Krishna P
>             Fix For: master (8.0), 7.1
>
>         Attachments: LUCENE-7897.patch
>
>
> For range queries, Lucene uses either points or doc values based on cost 
> estimation 
> (https://lucene.apache.org/core/6_5_0/core/org/apache/lucene/search/IndexOrDocValuesQuery.html).
> The scorer is chosen based on the minCost here: 
> https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/search/Boolean2ScorerSupplier.java#L16
> However, the cost calculations for TermQuery and IndexOrDocValuesQuery seem to 
> carry the same weight. Essentially, the cost depends on the docFreq in the terms 
> dictionary, the number of points visited, and the number of doc values. When the 
> docFreq is not very restrictive, this means a lot of doc-values lookups, and 
> using points would have been better.
> The following query, with 1M matches, takes 60ms with doc values but only 27ms 
> with points. If I change the query to "message:*", which matches all docs, it 
> chooses points (since the cost is the same), but with message:xyz it chooses 
> doc values even though the doc frequency is 1 million, which results in many 
> doc-values fetches. Would it make sense to make the cost of the doc-values query 
> higher, or to use points if the docFreq of the term query is too high (i.e. find an 
> optimum threshold where points cost < doc-values cost)?
> {noformat}
> {
>   "query": {
>     "bool": {
>       "must": [
>         {
>           "query_string": {
>             "query": "message:xyz"
>           }
>         },
>         {
>           "range": {
>             "@timestamp": {
>               "gte": 1498652400000,
>               "lte": 1498905000000,
>               "format": "epoch_millis"
>             }
>           }
>         }
>       ]
>     }
>   }
> }
> {noformat}
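To make the trade-off in the description concrete, here is a simplified, hypothetical model in Java of the two decision rules being discussed. It is not Lucene's code: the class and method names, the cost numbers, and the 8x doc-values penalty are all assumptions chosen for illustration. `chooseByMinCost` approximates the behaviour the reporter describes, where the cheapest clause leads iteration and the range clause falls back to doc-values lookups whenever it is not the cheapest. `chooseWithPenalty` sketches the proposed threshold, under which doc values are only used when the lead clause is much more selective than the range.

```java
// Hypothetical, simplified model of how a conjunction could decide whether the
// range clause of an IndexOrDocValuesQuery runs on points or on doc values.
// None of these names are Lucene APIs; costs are abstract "matching docs" estimates.
public final class RangeStrategyModel {

    enum Strategy { POINTS, DOC_VALUES }

    // Behaviour described in the issue: the cheapest clause leads iteration.
    // If the range clause is (or ties for) the cheapest, it iterates its points;
    // otherwise it only verifies the lead's candidates, one doc-values lookup each.
    static Strategy chooseByMinCost(long termCost, long rangeCost) {
        long minCost = Math.min(termCost, rangeCost);
        return rangeCost <= minCost ? Strategy.POINTS : Strategy.DOC_VALUES;
    }

    // Proposed alternative: doc values are penalized by a constant factor, so points
    // stay in use unless the lead clause is much more selective than the range.
    // Dividing instead of multiplying avoids overflow for very large costs.
    // The penalty factor is an illustrative assumption, not a tuned value.
    static Strategy chooseWithPenalty(long leadCost, long rangeCost, long dvPenalty) {
        if (dvPenalty <= 0) {
            throw new IllegalArgumentException("dvPenalty must be positive");
        }
        return rangeCost / dvPenalty <= leadCost ? Strategy.POINTS : Strategy.DOC_VALUES;
    }

    public static void main(String[] args) {
        long rangeCost = 1_200_000; // hypothetical cost estimate of the @timestamp range

        // A term matching ~1M docs is cheaper than the range, so it leads and the
        // range is checked with ~1M doc-values lookups.
        System.out.println("message:xyz, min-cost rule:  " + chooseByMinCost(1_000_000, rangeCost));

        // A match-all term is not cheaper than the range, so the range leads on points.
        System.out.println("message:*,   min-cost rule:  " + chooseByMinCost(10_000_000, rangeCost));

        // With an 8x doc-values penalty, the 1M-doc term no longer forces doc values.
        System.out.println("message:xyz, penalized rule: " + chooseWithPenalty(1_000_000, rangeCost, 8));

        // A genuinely selective term still makes doc-values verification worthwhile.
        System.out.println("rare term,   penalized rule: " + chooseWithPenalty(1_000, rangeCost, 8));
    }
}
```

The penalty is expressed as a ratio rather than an absolute cutoff so that the decision scales with index size; the issue itself leaves the choice of threshold open.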



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
