David Smiley created SOLR-5093:
----------------------------------

             Summary: Rewrite field:* to use the filter cache
                 Key: SOLR-5093
                 URL: https://issues.apache.org/jira/browse/SOLR-5093
             Project: Solr
          Issue Type: New Feature
          Components: query parsers
            Reporter: David Smiley


Sometimes people write a query that includes something like {{field:*}}, which 
matches all documents that have an indexed value in that field.  That can be 
particularly expensive for tokenized text, numeric, and spatial fields.  The 
expert advice is to index a separate boolean field and use it in place of 
these query clauses, but that's annoying to do and it can take users a while to 
realize that's what they need to do.

I propose that Solr's query parser rewrite such queries to return a query 
backed by Solr's filter cache.  The underlying query runs once (and it's 
slow that first time), and then the result is cached, after which it's very 
fast to reuse.  Unfortunately, Solr's filter cache is currently index-global, 
not per-segment; that's being handled in a separate issue.
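
To make the idea concrete, here is a minimal toy model of the proposed 
behavior in plain Java.  It is only a sketch: the class, the method names and 
the in-memory "index" are all made up for illustration, and none of them are 
Solr or Lucene APIs.  The map stands in for the filter cache, and the counter 
shows that the expensive scan runs only on the first request for a field.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: none of these names are Solr or Lucene APIs.
class FieldExistsCacheSketch {
    // Each "document" is just the set of field names that have an indexed value.
    private final List<Set<String>> docs;
    // Stand-in for the filter cache: field name -> bitset of matching doc ids.
    private final Map<String, BitSet> filterCache = new HashMap<>();
    // Counts how often the expensive scan actually runs.
    int slowScans = 0;

    FieldExistsCacheSketch(List<Set<String>> docs) {
        this.docs = docs;
    }

    // Stand-in for executing the costly underlying field:* query.
    private BitSet slowScan(String field) {
        slowScans++;
        BitSet bits = new BitSet(docs.size());
        for (int docId = 0; docId < docs.size(); docId++) {
            if (docs.get(docId).contains(field)) {
                bits.set(docId);
            }
        }
        return bits;
    }

    // What a rewritten field:* clause would do: consult the cache first and
    // fall back to the slow scan only on a miss.
    BitSet docsWithField(String field) {
        return filterCache.computeIfAbsent(field, this::slowScan);
    }
}
```

Calling {{docsWithField("geo")}} twice on the same instance performs one scan 
and one cache hit.  The toy cache is never invalidated; a real index-global 
cache would have to be rebuilt whenever the index changes, which is where the 
per-segment point above matters.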

Related to this, it may be worth considering whether Solr should, behind the 
scenes, index a field that records which fields have indexed values; it could 
then use this indexed data to power these queries so they are always fast to 
execute.  Open-ended {{\[\* TO \*\]}} range queries could use this as well.
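
As a rough illustration of that second idea, the following plain-Java sketch 
records, at index time, which fields each document populated, so that an 
existence query becomes a single lookup.  Everything here is hypothetical 
(the class, its methods, and the choice to treat an empty string as "no 
value"); it does not reflect how Solr stores or would store this data.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Toy model only: the class and its methods are hypothetical, not Solr APIs.
class FieldNamesIndexSketch {
    // Postings for the hidden bookkeeping field:
    // field name -> ids of the documents that have a value in that field.
    private final Map<String, BitSet> fieldNamePostings = new HashMap<>();
    private int nextDocId = 0;

    // Index a document and, behind the scenes, record which fields it populated.
    int addDocument(Map<String, String> fields) {
        int docId = nextDocId++;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            String value = e.getValue();
            // Assumption for this sketch: null or empty means "no indexed value".
            if (value != null && !value.isEmpty()) {
                fieldNamePostings
                    .computeIfAbsent(e.getKey(), k -> new BitSet())
                    .set(docId);
            }
        }
        return docId;
    }

    // field:* or field:[* TO *] becomes one postings lookup instead of a scan.
    BitSet docsWithField(String field) {
        BitSet bits = fieldNamePostings.get(field);
        // Return a copy so callers cannot mutate the stored postings.
        return bits == null ? new BitSet() : (BitSet) bits.clone();
    }
}
```

The cost moves to index time: every document pays a small bookkeeping charge 
so that no existence query ever has to enumerate the terms of the real field.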

For an example of how a user bumped into this, see:
http://lucene.472066.n3.nabble.com/Performance-question-on-Spatial-Search-tt4081150.html
