[jira] [Commented] (SOLR-5093) Rewrite field:* to use the filter cache
[ https://issues.apache.org/jira/browse/SOLR-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729536#comment-13729536 ]

David Smiley commented on SOLR-5093:

I guess I've changed my mind; the veto arguments are good. Mikhail, I like your idea of making a sub-clause filter-cacheable. But I don't think it should be a separate query parser, because caching is orthogonal to how the query is parsed. Perhaps a special local param, filterCache=true. Your example would become:

{noformat}
q=bee:blah OR {! filterCache=true}foo:bar OR {! filterCache=true}foo:bar
{noformat}

A key thing to document would be not only that this clause is cached in the filter cache, but also that it is constant-scored.

Rewrite field:* to use the filter cache
---

Key: SOLR-5093
URL: https://issues.apache.org/jira/browse/SOLR-5093
Project: Solr
Issue Type: New Feature
Components: query parsers
Reporter: David Smiley

Sometimes people write a query including something like {{field:*}}, which matches all documents that have an indexed value in that field. That can be particularly expensive for tokenized text, numeric, and spatial fields. The expert advice is to index a separate boolean field and use it in place of these query clauses, but that's annoying to do, and it can take users a while to realize that's what they need to do.

I propose that Solr's query parser rewrite such queries to return a query backed by Solr's filter cache. The underlying query runs once (slowly, that first time) and is then cached, after which it's super-fast to reuse. Unfortunately, Solr's filter cache is currently index-global, not per-segment; that's being handled in a separate issue.

Related to this, it may be worth considering whether Solr should, behind the scenes, index a field that records which fields have indexed values; it could then use this indexed data to power these queries so they are always fast to execute. Open-ended range queries like {{\[\* TO \*\]}} could use this too.

For an example of how a user bumped into this, see: http://lucene.472066.n3.nabble.com/Performance-question-on-Spatial-Search-tt4081150.html

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
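A minimal Python sketch of the proposed behavior, under toy assumptions: documents are plain dicts, and `DOCS`, `FILTER_CACHE`, `EXISTS_CLAUSE` and `exists_query` are hypothetical names, not Solr APIs. It recognizes a pure `field:*` or open-ended `field:[* TO *]` clause, computes the matching doc-id set once (the slow part), caches it, and gives every match the same constant score, as David's comment says a filter-cached clause would.

```python
import re

# Toy corpus: doc id -> field values (hypothetical data, not a real index).
DOCS = {
    0: {"title": "red fox", "pp": [1.5, 2.0]},
    1: {"title": "blue whale"},
    2: {"pp": [3.0]},
}

# Hypothetical stand-in for Solr's filter cache, keyed by clause string
# (Solr keys its filterCache by Query objects; a string is enough here).
FILTER_CACHE = {}

# A pure-wildcard clause (field:*) or an open-ended range (field:[* TO *]).
EXISTS_CLAUSE = re.compile(r"^(\w+):(?:\*|\[\* TO \*\])$")


def exists_query(clause, docs=DOCS):
    """Return {doc_id: score} for docs that have any value in the field.

    The matching doc-id set is computed once per distinct clause and then
    reused from the cache; every match gets the same constant score.
    """
    m = EXISTS_CLAUSE.match(clause)
    if m is None:
        raise ValueError(f"not an existence clause: {clause!r}")
    if clause not in FILTER_CACHE:
        field = m.group(1)
        # The slow part: scan every document, but only on a cache miss.
        FILTER_CACHE[clause] = frozenset(
            doc_id for doc_id, fields in docs.items() if fields.get(field)
        )
    return {doc_id: 1.0 for doc_id in sorted(FILTER_CACHE[clause])}


print(exists_query("pp:*"))            # {0: 1.0, 2: 1.0}
print(exists_query("title:[* TO *]"))  # {0: 1.0, 1: 1.0}
```

A second call with the same clause skips the scan entirely; whether the query parser should do this automatically is what the rest of the thread debates.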
[jira] [Commented] (SOLR-5093) Rewrite field:* to use the filter cache
[ https://issues.apache.org/jira/browse/SOLR-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729793#comment-13729793 ]

Robert Muir commented on SOLR-5093:

I don't think that would work:

{quote}
There may only be one LocalParams prefix per argument, preventing the need for any escaping of the original argument.
{quote}

http://wiki.apache.org/solr/LocalParams
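To illustrate the quoted rule, here is a toy Python splitter (the helper `split_local_params` is hypothetical, not Solr's parser) that treats only a leading `{!...}` block as local params and returns the rest of the argument untouched. In this model, a second `{!filterCache=true}` in the middle of `q` is not parsed as local params at all; it stays inside the query text.

```python
def split_local_params(arg):
    """Split one leading '{!...}' local-params prefix off a query argument.

    Toy model of the quoted rule: only a prefix at the very start is parsed,
    and everything after its closing '}' is returned verbatim, unescaped.
    Quoted values, nested braces and $param references are not handled.
    Raises ValueError if the prefix has no closing '}'.
    """
    if not arg.startswith("{!"):
        return {}, arg
    end = arg.index("}")
    params = {}
    for token in arg[2:end].split():
        key, sep, value = token.partition("=")
        if sep:
            params[key] = value
        else:
            # A bare token such as 'lucene' is treated as type=lucene.
            params["type"] = key
    return params, arg[end + 1:]


params, rest = split_local_params("{!lucene q.op=AND}bee:blah OR {!filterCache=true}foo:bar")
print(params)  # {'type': 'lucene', 'q.op': 'AND'}
print(rest)    # bee:blah OR {!filterCache=true}foo:bar
```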
[jira] [Commented] (SOLR-5093) Rewrite field:* to use the filter cache
[ https://issues.apache.org/jira/browse/SOLR-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729873#comment-13729873 ]

Mikhail Khludnev commented on SOLR-5093:

[~rcmuir] I think that would work, given SOLR-4093.

Can anyone confess to {! sep=true}, which is backed by ExtendedQuery.getCacheSep()? Isn't it somehow related to the challenge under discussion?
[jira] [Commented] (SOLR-5093) Rewrite field:* to use the filter cache
[ https://issues.apache.org/jira/browse/SOLR-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729879#comment-13729879 ]

Yonik Seeley commented on SOLR-5093:

bq. Can anyone confess to {! sep=true}

It's a placeholder that currently does nothing (and is undocumented)... ignore it, or remove it if it bothers people ;-)
[jira] [Commented] (SOLR-5093) Rewrite field:* to use the filter cache
[ https://issues.apache.org/jira/browse/SOLR-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725094#comment-13725094 ]

Mikhail Khludnev commented on SOLR-5093:

I agree with the vetoes. But in rare cases users need q=bee:blah *OR* pp:\*. There is also a JIRA issue to handle fq disjunction, like fq=foo:bar OR foo:baz. We could deliver a simple qparser and use it like:

q=bee:blah OR _query_:{!fq}foo:bar OR _query_:{!fq}foo:bar

It keeps the syntax suitably crazy, which is great. Would you accept it? Afterwards, we can allow BS (BooleanScorer) in Solr to handle disjunctions of filters efficiently.
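A toy sketch of why a disjunction over cached filters can be cheap, assuming each cached filter is a bitset over doc ids (modeled as a Python int, where bit i set means doc i matches). The `cache` contents and helper names are hypothetical. Once each clause's doc set is cached, `fq=foo:bar OR foo:baz` reduces to a bitwise OR of the stored sets, with no clause re-executed.

```python
def bitset(doc_ids):
    """Build an int bitset: bit i is set when doc i matches."""
    bits = 0
    for d in doc_ids:
        bits |= 1 << d
    return bits


def doc_ids(bits):
    """List the doc ids whose bits are set, in increasing order."""
    out, d = [], 0
    while bits:
        if bits & 1:
            out.append(d)
        bits >>= 1
        d += 1
    return out


# Hypothetical cached filter bitsets (clause string -> bitset).
cache = {
    "foo:bar": bitset([1, 4, 7]),
    "foo:baz": bitset([2, 4, 9]),
}


def cached_disjunction(clauses, cache):
    """Union of cached filters: one bitwise OR per clause, no re-execution."""
    result = 0
    for clause in clauses:
        result |= cache[clause]
    return result


print(doc_ids(cached_disjunction(["foo:bar", "foo:baz"], cache)))  # [1, 2, 4, 7, 9]
```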
[jira] [Commented] (SOLR-5093) Rewrite field:* to use the filter cache
[ https://issues.apache.org/jira/browse/SOLR-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724393#comment-13724393 ]

Robert Muir commented on SOLR-5093:

Err, this user already had this in their fq, so if they had a filterCache, they'd be using it. They should pull that slow piece out into a separate fq so it's cached by itself. I don't understand why the query parser needs to do anything else here (especially any trappy auto-caching).
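For concreteness, a request following this advice sends the slow clause as its own `fq` parameter. The sketch below only builds the query string with the standard library's `urllib.parse.urlencode`; the field names and clauses are made up for illustration, since the user's actual query isn't reproduced in this thread.

```python
from urllib.parse import parse_qsl, urlencode


def build_params(q, filters):
    """Encode q plus one fq parameter per filter (fq may be repeated)."""
    return urlencode([("q", q)] + [("fq", f) for f in filters])


# Before: the existence check is buried in one combined filter,
# so it is only cached as part of that whole expression.
combined = build_params("*:*", ["type:store AND pp:*"])

# After: pp:* is its own fq, so it gets its own reusable filterCache entry.
split = build_params("*:*", ["type:store", "pp:*"])

print(parse_qsl(split))  # [('q', '*:*'), ('fq', 'type:store'), ('fq', 'pp:*')]
```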
[jira] [Commented] (SOLR-5093) Rewrite field:* to use the filter cache
[ https://issues.apache.org/jira/browse/SOLR-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724405#comment-13724405 ]

Jack Krupansky commented on SOLR-5093:

Some time ago I suggested a related approach: LUCENE-4386 - Query parser should generate FieldValueFilter for pure wildcard terms to boost query performance. There were objections from the Lucene guys, but now that the Solr query parser is divorced from Lucene, maybe it could be reconsidered. I can't testify as to the relative merits of using the filter cache vs. the FieldValueFilter.
[jira] [Commented] (SOLR-5093) Rewrite field:* to use the filter cache
[ https://issues.apache.org/jira/browse/SOLR-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724422#comment-13724422 ]

Robert Muir commented on SOLR-5093:

Those same Lucene guys are not afraid to object here either. This user just has to pull out AND pp:* into another fq of pp:*

{quote}
(Each filter is executed and cached separately. When it's time to use them to limit the number of results returned by a query, this is done using set intersections.)
{quote}

http://wiki.apache.org/solr/SolrCaching#filterCache
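The quoted SolrCaching behavior can be modeled in a few lines: each fq is looked up (or executed once and stored) separately, and the query's matches are intersected with each cached set. `INDEX`, `run_filter` and `search` are hypothetical stand-ins with made-up data, not Solr APIs; the `executions` list shows that `pp:*` runs only once even when later requests use a different `q`.

```python
# Hypothetical precomputed results for each filter clause.
INDEX = {
    "pp:*": {0, 2, 3, 5},
    "type:store": {0, 1, 2, 5, 6},
}

filter_cache = {}
executions = []  # records which filters actually had to be executed


def run_filter(fq):
    """Return the doc set for fq, executing it only on a cache miss."""
    if fq not in filter_cache:
        executions.append(fq)
        filter_cache[fq] = frozenset(INDEX[fq])
    return filter_cache[fq]


def search(query_matches, fqs):
    """Intersect the query's matches with each separately cached filter."""
    result = set(query_matches)
    for fq in fqs:
        result &= run_filter(fq)
    return sorted(result)


first = search({0, 1, 2, 3, 4, 5}, ["type:store", "pp:*"])
second = search({2, 3, 6}, ["pp:*"])
print(first, second)  # [0, 2, 5] [2, 3]
print(executions)     # ['type:store', 'pp:*']
```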
[jira] [Commented] (SOLR-5093) Rewrite field:* to use the filter cache
[ https://issues.apache.org/jira/browse/SOLR-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724428#comment-13724428 ]

David Smiley commented on SOLR-5093:

Rob, you're right for the particular user's use case that I mentioned; I overlooked that aspect of his query. Nonetheless, I don't think that negates the usefulness of what I propose in this issue. If you consider auto-caching trappy, then you probably don't like Solr very much at all.
[jira] [Commented] (SOLR-5093) Rewrite field:* to use the filter cache
[ https://issues.apache.org/jira/browse/SOLR-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724430#comment-13724430 ]

Jack Krupansky commented on SOLR-5093:

bq. This user just has to pull out AND pp:* into another fq of pp:*

Exactly! That's what we (non-Lucene guys) are trying to do: eliminate the need for users to do that kind of manual optimization. We want Solr to behave as optimally as possible OOTB.
[jira] [Commented] (SOLR-5093) Rewrite field:* to use the filter cache
[ https://issues.apache.org/jira/browse/SOLR-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724443#comment-13724443 ]

Robert Muir commented on SOLR-5093:

Solr today doesn't auto-cache. You can specify that you intend for a query to act only as a filter with fqs, control the caching behavior of these fqs, and so on. So there is no need to add any additional auto-caching in the query parser.

Things like LUCENE-4386 would just cause filter cache insanity, where it's cached in duplicate places (in FieldCache.docsWithField as well as in fq bitsets). Auto-caching things in the query can easily pollute the cache with entries that aren't actually intended to be reused: then it doesn't really work at all.
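The pollution point can be seen with a small LRU cache. Solr's filterCache was typically configured as an LRU-style cache, but the class below is a hypothetical toy, not Solr's implementation, and the clause strings are made up. If every clause of every request is auto-cached, one-off entries evict the one filter that actually gets reused.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fixed-size LRU cache that counts hits and misses."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = self.misses = 0

    def get_or_compute(self, key, compute):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        value = compute()
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return value


def run(requests, capacity=3):
    """Cache every clause of every request; return the cache for inspection."""
    cache = LRUCache(capacity)
    for clauses in requests:
        for clause in clauses:
            cache.get_or_compute(clause, frozenset)  # placeholder for running it
    return cache


# pp:* is reused by every request; the other clauses are one-offs.
requests = [["pp:*", f"id:{i}", f"name:u{i}", f"ts:{i}"] for i in range(5)]

only_fq = run([["pp:*"] for _ in requests])  # cache only the intended filter
auto = run(requests)                         # auto-cache every clause
print(only_fq.hits, only_fq.misses)  # 4 1
print(auto.hits, auto.misses)        # 0 20
```

With auto-caching, the three one-off clauses per request fill the cache and push `pp:*` out before the next request can reuse it.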
[jira] [Commented] (SOLR-5093) Rewrite field:* to use the filter cache
[ https://issues.apache.org/jira/browse/SOLR-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724446#comment-13724446 ]

Hoss Man commented on SOLR-5093:

I can see the argument for making field:* parse as equivalent to field:[* TO *] if the latter is in fact more efficient, but I agree with Rob that we shouldn't try to make the parser pull out individual clauses and construct special query objects that are backed by the filterCache.

If I have an fq in my solrconfig that looks like this...

{noformat}
<str name="fq">X AND Y AND Z</str>
{noformat}

...that entire BooleanQuery should be cached as a single entity in the filterCache regardless of what X, Y, and Z really are, because that's what I asked for: a single filter query. It would suck if the query parser looked at the specifics of each of those clauses and said "I'm going to try to be smart and make each of these clauses a special query backed by the filterCache", because then I'd have 4 queries in my filterCache instead of just 1, and 3 of them would never be used.
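Both of Hoss's points can be made concrete with a toy sketch. The helper names are hypothetical, and the naive split on " AND " is purely illustrative, not how Solr's query parser builds a BooleanQuery. `rewrite_exists` shows the purely syntactic `field:*` to `field:[* TO *]` rewrite he finds acceptable; the two `cache_keys_*` functions count filterCache entries for a single fq under each caching strategy.

```python
import re


def rewrite_exists(clause):
    """Rewrite a pure wildcard 'field:*' into the range form 'field:[* TO *]'."""
    return re.sub(r"^(\w+):\*$", r"\1:[* TO *]", clause)


def cache_keys_whole(fq):
    """Cache the whole fq as one entry: what the user asked for."""
    return {fq}


def cache_keys_per_clause(fq):
    """Hypothetical auto-caching parser: also caches each top-level AND clause."""
    clauses = [c.strip() for c in fq.split(" AND ")]
    return {fq, *clauses}


fq = "X AND Y AND Z"
print(rewrite_exists("pp:*"))          # pp:[* TO *]
print(len(cache_keys_whole(fq)))       # 1
print(len(cache_keys_per_clause(fq)))  # 4
```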