[jira] [Commented] (SOLR-5093) Rewrite field:* to use the filter cache

2013-08-05 Thread David Smiley (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13729536#comment-13729536
]

David Smiley commented on SOLR-5093:

I guess I've changed my mind; the veto arguments are good.

Mikhail, I like your idea on making a sub-clause be filter-cache'able. But I
don't think it should be a separate query parser because it's an orthogonal
issue to how the query is parsed. Perhaps a special local-param
filterCache=true. Your example would become:

{noformat}
q=bee:blah OR {! filterCache=true}foo:bar OR {! filterCache=true}foo:bar
{noformat}

A key thing to document would not only be that this clause would be cached in
the filter-cache, but also that it would constant-score.

Rewrite field:* to use the filter cache
---

Key: SOLR-5093
URL: https://issues.apache.org/jira/browse/SOLR-5093
Project: Solr
Issue Type: New Feature
Components: query parsers
Reporter: David Smiley

Sometimes people write a query including something like {{field:*}} which
matches all documents that have an indexed value in that field. That can be
particularly expensive for tokenized text, numeric, and spatial fields. The
expert advice is to index a separate boolean field that is used in place of
these query clauses, but that's annoying to do and it can take users a while
to realize that's what they need to do.
I propose that Solr's query parser rewrite such queries to return a query
backed by Solr's filter cache. The underlying query happens once (and it's
slow this time) and then it's cached after which it's super-fast to reuse.
Unfortunately Solr's filter cache is currently index global, not per-segment;
that's being handled in a separate issue.
Related to this, it may be worth considering if Solr should behind the scenes
index a field that records which fields have indexed values, and then it
could use this indexed data to power these queries so they are always fast to
execute. Likewise, {{\[\* TO \*\]}} open-ended range queries could similarly
use this.
For an example of how a user bumped into this, see:
http://lucene.472066.n3.nabble.com/Performance-question-on-Spatial-Search-tt4081150.html
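
As an illustration of the "separate boolean field" workaround described above, here is a minimal sketch of a client-side step that adds presence flags before documents are sent for indexing. The `has_` prefix, the function name, and the sample document are assumptions made for this example; they are not Solr conventions.

```python
def add_presence_flags(doc, fields):
    """Return a copy of doc with a has_<field> boolean for each listed field."""
    flagged = dict(doc)
    for name in fields:
        value = doc.get(name)
        # Treat a missing value, an empty string, and an empty list as "no value".
        flagged["has_" + name] = value not in (None, "", [])
    return flagged


# Hypothetical document: pp has a value, title is empty.
doc = {"id": "1", "pp": "10,20", "title": ""}
print(add_presence_flags(doc, ["pp", "title"]))
```

With flags like these indexed, a clause such as {{pp:*}} could be replaced by a cheap term query on the flag field (here {{has_pp:true}}).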

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-5093) Rewrite field:* to use the filter cache

2013-08-05 Thread Robert Muir (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13729793#comment-13729793
 ] 

Robert Muir commented on SOLR-5093:
---

I don't think that would work:

{quote}
There may only be one LocalParams prefix per argument, preventing the need for 
any escaping of the original argument.
{quote}

http://wiki.apache.org/solr/LocalParams



[jira] [Commented] (SOLR-5093) Rewrite field:* to use the filter cache

2013-08-05 Thread Mikhail Khludnev (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13729873#comment-13729873
]

Mikhail Khludnev commented on SOLR-5093:

[~rcmuir] I think that would SOLR-4093

Can anyone confess to {! sep=true} which is backed by
ExtendedQuery.getCacheSep()? Isn't it somehow related to the discussed
challenge?

[jira] [Commented] (SOLR-5093) Rewrite field:* to use the filter cache

2013-08-05 Thread Yonik Seeley (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13729879#comment-13729879
]

Yonik Seeley commented on SOLR-5093:

bq. Can anyone confess to {! sep=true}

It's a placeholder that currently does nothing (and is undocumented)... ignore
it, or remove it if it bothers people ;-)

[jira] [Commented] (SOLR-5093) Rewrite field:* to use the filter cache

2013-07-31 Thread Mikhail Khludnev (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725094#comment-13725094
]

Mikhail Khludnev commented on SOLR-5093:

I agree with the vetoes.
But in rare cases users need q=bee:blah *OR* pp:\*, and there is also a JIRA
issue to handle fq disjunction like fq=foo:bar OR foo:baz. We could deliver a
simple qparser and use it like
q=bee:blah OR _query_:{!fq}foo:bar OR _query_:{!fq}foo:bar
It keeps the syntax crazy enough; that's great.
Would you accept that?

Afterwards, we can allow BS in Solr to handle filter disjunctions efficiently.

[jira] [Commented] (SOLR-5093) Rewrite field:* to use the filter cache

2013-07-30 Thread Robert Muir (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724393#comment-13724393
]

Robert Muir commented on SOLR-5093:
---

Err, this user already had this in their FQ. So if they had a filterCache,
they'd be using it.

They should pull that slow piece into a separate FQ so it's cached by itself. I
don't understand why the query parser needs to do anything else here
(especially any trappy auto-caching).

[jira] [Commented] (SOLR-5093) Rewrite field:* to use the filter cache

2013-07-30 Thread Jack Krupansky (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724405#comment-13724405
]

Jack Krupansky commented on SOLR-5093:
--

Some time ago I had suggested a related approach: LUCENE-4386 - Query parser
should generate FieldValueFilter for pure wildcard terms to boost query
performance.

There were objections from the Lucene guys, but now that the Solr query parser
is divorced from Lucene, maybe it could be reconsidered.

I couldn't testify as to the relative merits of using the filter cache vs. the
FieldValueFilter.

[jira] [Commented] (SOLR-5093) Rewrite field:* to use the filter cache

2013-07-30 Thread Robert Muir (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724422#comment-13724422
 ] 

Robert Muir commented on SOLR-5093:
---

Those same Lucene guys are not afraid to object here either.

This user just has to pull out AND pp:* into another fq of pp:*

{quote}
(Each filter is executed and cached separately. When it's time to use them to 
limit the number of results returned by a query, this is done using set 
intersections.) 
{quote}
http://wiki.apache.org/solr/SolrCaching#filterCache
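
The suggestion above (move the slow clause into its own fq) can be sketched as request parameters. This is only an illustration: the main query `*:*` and the helper name are assumptions, while `bee:blah` and `pp:*` are the clauses used elsewhere in this thread.

```python
from urllib.parse import parse_qsl, urlencode


def build_query_string(main_query, filters):
    """Encode q plus one fq parameter per filter, preserving order."""
    params = [("q", main_query)]
    params.extend(("fq", f) for f in filters)
    return urlencode(params)


# One combined filter: pp:* is cached only as part of this exact combination.
combined = build_query_string("*:*", ["bee:blah AND pp:*"])

# Separate filters: pp:* gets its own cache entry that other requests can reuse.
separate = build_query_string("*:*", ["bee:blah", "pp:*"])

print(parse_qsl(separate))
```

Because each fq is cached separately and then intersected, the second form lets the expensive {{pp:*}} set be computed once and shared by any request that sends the same fq string.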

[jira] [Commented] (SOLR-5093) Rewrite field:* to use the filter cache

2013-07-30 Thread David Smiley (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724428#comment-13724428
]

David Smiley commented on SOLR-5093:

Rob,
You're right for this particular user's use-case that I mentioned. I
overlooked that aspect of his query. Nonetheless, I don't think that negates
the usefulness of what I propose in this issue.

If you consider auto-caching trappy, then you probably don't like Solr very
much at all.

[jira] [Commented] (SOLR-5093) Rewrite field:* to use the filter cache

2013-07-30 Thread Jack Krupansky (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724430#comment-13724430
]

Jack Krupansky commented on SOLR-5093:
--

bq. This user just has to pull out AND pp:* into another fq of pp:*

Exactly! That's what we (non-Lucene guys) are trying to do - eliminate the need
for users to have to do that kind of manual optimization.

We want Solr to behave as optimally as possible OOTB.

[jira] [Commented] (SOLR-5093) Rewrite field:* to use the filter cache

2013-07-30 Thread Robert Muir (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724443#comment-13724443
]

Robert Muir commented on SOLR-5093:
---

Solr today doesn't auto-cache. You can specify that you intend for a query to
act only as a filter with fqs, control the caching behavior of these fqs, and
so on.

So there is no need to add any additional auto-caching in the queryparser.
Things like LUCENE-4386 would just cause filter cache insanity where it's
cached in duplicate places (on FieldCache.docsWithField as well as in fq
bitsets).

Auto-caching things in the query can easily pollute the cache with stuff that's
not actually intended to be reused: then it doesn't really work at all.
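
The pollution argument can be made concrete with a toy model. The sketch below is a tiny LRU cache keyed by filter string; it is not Solr's filterCache implementation, and the class name, cache size, and filter strings are all made up for the example. The `cache` flag loosely mirrors the per-fq opt-out that Solr exposes through the {{cache=false}} local param.

```python
from collections import OrderedDict


class ToyFilterCache:
    """Tiny LRU cache keyed by filter string; a toy, not Solr's implementation."""

    def __init__(self, size):
        self.size = size
        self.entries = OrderedDict()
        self.hits = 0

    def get(self, key, compute, cache=True):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.entries[key]
        value = compute()
        if cache:
            self.entries[key] = value
            if len(self.entries) > self.size:
                self.entries.popitem(last=False)  # evict least recently used
        return value


def run(auto_cache_one_offs):
    """Simulate 5 requests and return how often the reused filter was a hit."""
    cache = ToyFilterCache(size=2)
    for i in range(5):
        # A filter the application reuses on every request.
        cache.get("type:book", lambda: {1, 2, 3})
        # Two one-off clauses per request that never appear again.
        for j in range(2):
            key = "one_off_%d_%d:*" % (i, j)
            cache.get(key, lambda: {1}, cache=auto_cache_one_offs)
    return cache.hits


print(run(True), run(False))
```

In this toy setup, caching every one-off clause evicts the reused filter before each request, while leaving the one-offs uncached keeps it resident. Real cache sizes and workloads will differ, so treat this as an illustration of the mechanism only.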

[jira] [Commented] (SOLR-5093) Rewrite field:* to use the filter cache

2013-07-30 Thread Hoss Man (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724446#comment-13724446
 ] 

Hoss Man commented on SOLR-5093:


I can see the argument for making field:* parse as equivalent to field:[* TO
*] if the latter is in fact more efficient, but I agree with Rob that we
shouldn't try to make the parser pull out individual clauses and construct
special query objects that are backed by the filterCache. If I have an fq in my
solrconfig that looks like this...

{noformat}
<str name="fq">X AND Y AND Z</str>
{noformat}

...that entire BooleanQuery should be cached as a single entity in the
filterCache regardless of what X, Y, and Z really are, because that's what I
asked for: a single filter query.

It would suck if the query parser looked at the specifics of each of those
clauses and said "I'm going to try to be smart and make each of these clauses
a special query backed by the filterCache", because now I have 4 queries in my
filterCache instead of just 1, and 3 of them will never be used.
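
The entry count in that argument can be shown with a small model. The function below is purely illustrative: `split_clauses=True` stands for the hypothetical parser behaviour being objected to, and splitting on the literal string " AND " is a simplification, not how a query parser works.

```python
def cache_keys(fq, split_clauses):
    """Return the cache keys one filter query would create in this toy model."""
    keys = [fq]  # the whole filter query is always cached as one entry
    if split_clauses:
        # Hypothetical behaviour: each top-level AND clause is cached as well.
        keys.extend(clause.strip() for clause in fq.split(" AND "))
    return keys


print(cache_keys("X AND Y AND Z", split_clauses=False))
print(cache_keys("X AND Y AND Z", split_clauses=True))
```

If X, Y, and Z are never sent as filters on their own, the three extra entries in the second case take up cache space without ever producing a hit.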


