Re: filter cache and negative filter query
> lookups to work with an arbitrary query, you would either need to change
> the cache structure from Query=>DocSet to a mapping of
> Query=>[DocSet,inversionBit] and store the same cache value
> with two keys -- both the positive and the negative; or you keep the

Well, I don't know how it works right now, but I guess that, since only the positive version is stored, looking up a negative query already poses a similar lookup problem: either you store two keys for the same value, or you transform the negative query into a positive "canonical" one before looking it up. The same could be done in this case, with the difference that, yes, you would need to store an inversion bit too. The double-lookup option sounds worse, though it would have to be benchmarked to know for sure.

Would this optimization affect only memory usage, or are smaller sets also faster to intersect, for example? In any case, the memory saved could be used to speed up the application in other ways, for example with bigger caches.
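To make the "canonical key plus inversion bit" idea concrete, here is a minimal sketch in Python. It is only a toy model of the proposal discussed in this thread, not how Solr's filter cache is implemented; the `Query` and `InvertibleFilterCache` names and the use of plain frozensets as doc sets are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Query:
    """Hypothetical filter query: a term plus a negation flag."""
    term: str
    negated: bool = False


class InvertibleFilterCache:
    """Toy cache: one entry per canonical (positive) query, plus an inversion bit."""

    def __init__(self, all_docs: FrozenSet[int]) -> None:
        self.all_docs = all_docs
        # canonical term -> (stored doc set, True if the stored set is the complement)
        self.entries: Dict[str, Tuple[FrozenSet[int], bool]] = {}

    def put(self, query: Query, docs: FrozenSet[int]) -> None:
        # Normalise to the positive form of the query first.
        positive = self.all_docs - docs if query.negated else docs
        complement = self.all_docs - positive
        # Keep whichever side is smaller and remember which one it was.
        if len(complement) < len(positive):
            self.entries[query.term] = (complement, True)
        else:
            self.entries[query.term] = (positive, False)

    def get(self, query: Query) -> Optional[FrozenSet[int]]:
        entry = self.entries.get(query.term)  # single lookup on the canonical key
        if entry is None:
            return None
        stored, inverted = entry
        # The stored bit and the request's negation cancel out when both are set.
        if inverted != query.negated:
            return self.all_docs - stored
        return stored


all_docs = frozenset(range(10))
cache = InvertibleFilterCache(all_docs)
cache.put(Query("history"), frozenset(range(8)))  # "history" matches 8 of 10 docs
stored, inverted = cache.entries["history"]
print(sorted(stored), inverted)
print(sorted(cache.get(Query("history", negated=True))))
```

In this sketch only the two non-matching docs are stored for "history", and both `history` and `-history` are answered from the same entry with a single lookup. The cost is an extra complement computation whenever the stored bit and the request disagree, which is the CPU-versus-memory trade-off that would need benchmarking.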
Re: filter cache and negative filter query
> : query that in fact returns the "negative" results. As a simple example,
> : I believe that, for a boolean field, -field:true is exactly the same as
> : +field:false, but the former is a negative query and the latter is a
>
> that's not strictly true in all cases...
>
> * if the field is multivalued=true, a doc may contain both "false" and
> "true" in "field", in which case it would match +field:false but it
> would not match -field:true
>
> * if the field is multivalued=false, and required=false, a doc
> may not contain any value, in which case it would match -field:true but
> it would not match +field:false

You're totally right. But it was just an example. I just didn't think to specify that the field is single-valued and required.

I did some testing yesterday on how filters are cached, using the admin interface. I noticed that if I perform a facet.query on a boolean field, testing it to be true or false, it always seems to add two entries to the query cache. Maybe it also adds an entry to test for the nonexistence of the value? And if I perform a facet.field on the same boolean field, three new entries are inserted into the filter cache. Maybe one for true, one for false and one for nonexistence? I really don't know exactly what it's doing, but at first sight it doesn't look like very optimal behaviour...

I'm testing on the LucidWorks distribution of Solr 1.4.1, using the boolean field inStock of its example schema, with its example data.
Re: filter cache and negative filter query
: What I don't like is that it systematically uses the positive version.
: Sometimes the negative version will give many less results (for example,
: in some cases I filter by documents not having a given field, and there
: are very few of them). I think it would be much better that solr

the "positive" version of the filter is the only one that can be executed, so it's the one that gets cached today, but the principle you are describing is still sound -- in fact I'm pretty sure there is a note in the code about this exact idea as a possible performance enhancement: if the cardinality of a filter is very large (regardless of whether the query was "positive" or "negative"), its negation relative to the set of all docs could be cached in its place to save space...

...but...

...the complication comes later when doing lookups -- for cache lookups to work with an arbitrary query, you would either need to change the cache structure from Query=>DocSet to a mapping of Query=>[DocSet,inversionBit] and store the same cache value with two keys -- both the positive and the negative; or you keep the current cache structure, store whichever Query=>DocSet pair has the smallest cardinality, but then every logical cache lookup requires a second actual cache lookup under the covers (for the negation of the query) if the first one doesn't match anything.

it would require some benchmarking and hard decisions about whether the (hypothetical) memory savings are worth the (hypothetical) CPU cost.

: query that in fact returns the "negative" results. As a simple example,
: I believe that, for a boolean field, -field:true is exactly the same as
: +field:false, but the former is a negative query and the latter is a

that's not strictly true in all cases...

* if the field is multivalued=true, a doc may contain both "false" and "true" in "field", in which case it would match +field:false but it would not match -field:true

* if the field is multivalued=false, and required=false, a doc may not contain any value, in which case it would match -field:true but it would not match +field:false

-Hoss
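The two exceptions above can be checked with plain sets. The following is a small sketch over a made-up four-document index (the doc ids and the `matches` helper are hypothetical, and nothing here touches Solr itself); it only illustrates why the two filters stop being equivalent once a doc can hold both values or none.

```python
# Hypothetical toy index: doc id -> values stored in a boolean field.
docs = {
    1: ["true"],
    2: ["false"],
    3: ["true", "false"],  # only possible if the field is multivalued
    4: [],                 # only possible if the field is not required
}


def matches(value):
    """Return the ids of docs whose field contains the given value."""
    return {doc_id for doc_id, values in docs.items() if value in values}


all_docs = set(docs)
positive_false = matches("false")            # +field:false
negative_true = all_docs - matches("true")   # -field:true

print(sorted(positive_false))  # [2, 3]
print(sorted(negative_true))   # [2, 4]
```

Doc 3 matches only the positive filter and doc 4 matches only the negative one, so the two filters agree only when every doc carries exactly one value.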
Re: filter cache and negative filter query
Mmm... I had wondered whether Solr reused filters this way (not keeping both the positive and negative versions) and I'm glad to see that it does indeed reuse them. What I don't like is that it systematically uses the positive version. Sometimes the negative version will give far fewer results (for example, in some cases I filter by documents not having a given field, and there are very few of them). I think it would be much better if Solr performed exactly the query requested and, if more than 50% of the documents match the query, just stored the negated one. I think (knowing almost nothing about how things are implemented) this shouldn't be a problem. Is there any place where one can post a suggestion for improvement? :)

Anyway, it would be very useful to know exactly how the current versions work (I think the info in the message I'm answering is about version 1.1 and could have changed), because, knowing that, one can sometimes manage to write, for example, a "positive" query that in fact returns the "negative" results. As a simple example, I believe that, for a boolean field, -field:true is exactly the same as +field:false, but the former is a negative query and the latter is a positive one. So knowing the exact behaviour of Solr can help you write optimized filters when you know that one version will give far fewer hits than the other.

On 18/05/2011, at 00:26, Yonik Seeley wrote:

> On Tue, May 17, 2011 at 6:17 PM, Markus Jelsma wrote:
>> I'm not sure. The filter cache uses your filter as a key and a negation is a
>> different key. You can check this easily in a controlled environment by
>> issuing these queries and watching the filter cache statistics.
>
> Gotta hate crossing emails ;-)
> Anyway, this goes back to Solr 1.1
>
> 5. SOLR-80: Negative queries are now allowed everywhere. Negative queries
>    are generated and cached as their positive counterpart, speeding
>    generation and generally resulting in smaller sets to cache.
>    Set intersections in SolrIndexSearcher are more efficient,
>    starting with the smallest positive set, subtracting all negative
>    sets, then intersecting with all other positive sets. (yonik)
>
> -Yonik
> http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
> 25-26, San Francisco
>
>>> If I have a query with a filter query such as : "q=art&fq=history" and
>>> then run a second query "q=art&fq=-history", will Solr realize that it
>>> can use the cached results of the previous filter query "history" (in the
>>> filter cache) or will it not realize this and have to actually do a second
>>> filter query against the index for "not history"?
>>>
>>> Tom
Re: filter cache and negative filter query
That's not what I see while testing right now. Queries with -one OR -two return fewer documents than either operand does on its own; this is with LuceneQParser. I haven't done extensive testing since I rarely use boolean algebra in Lucene or Solr.

> Oops, you're right, I had misremembered --- Solr 1.4.1 "lucene" qp
> handles pure negative fine, it's Solr 1.4.1 _dismax_ that does not.
>
> Although, here's one, not actually related to this thread, that DOESN'T
> work in Solr 1.4.1 lucene query parser. Curious if it's been fixed in
> Solr 3.1.
>
> &defType=lucene&q=-one OR -two
>
> That one does NOT work as expected in solr 1.4.1, although I can't
> explain exactly what it's doing, it's not right. (It returns FEWER
> results than "-one" alone, which can't be right algebraically). I think.
> So there are still some kinds of negative queries that do weird things.
>
> On 5/17/2011 6:29 PM, Markus Jelsma wrote:
> > Such a negation works just as one would expect.
> >
> > q=*:*
> >
> > q=*:*&fq=-type:text/html
> >
> > q=*:*&fq=type:text/html
> >
> > Well, that adds up, doesn't it ;)
> >
> >> 1. I don't think Solr will re-use the filter cache in that situation,
> >> although I'm not sure. But I comment anyway because, not what you asked
> >> but something else that will trip you up with your example:
> >>
> >> 2. In fact, a pure-negative query like that doesn't work _at all_ in the
> >> default solr/lucene query parser used for 'fq', at least in Solr 1.4.1.
> >> Not sure if it's been improved in 3.1, but I don't think so. It will
> >> always return 0 hits, the solr/lucene query parser can't generate a
> >> proper lucene query from a pure negative query like that.
> >>
> >> To get around this, you can find a variation of the query that means the
> >> same thing but isn't that form. Here's a really ugly one I use, with a
> >> nested dismax -- dismax ALSO has trouble with pure negatives, although I
> >> think maybe edismax can handle em? But this weird as heck combo works,
> >> maybe there's a better way.
> >>
> >> NOT _query_:"{!dismax qf=something}history"
> >>
> >> And to come around full circle, I have NO idea what effect nested
> >> queries have on the filter cache. I think that STILL won't re-use the
> >> filter cache but I wonder if it'll re-use the _query_ cache for
> >> "history"? I forget even more how the query cache works though.
> >>
> >> On 5/17/2011 6:07 PM, Burton-West, Tom wrote:
> >>> If I have a query with a filter query such as : "q=art&fq=history" and
> >>> then run a second query "q=art&fq=-history", will Solr realize that it
> >>> can use the cached results of the previous filter query "history" (in
> >>> the filter cache) or will it not realize this and have to actually do a
> >>> second filter query against the index for "not history"?
> >>>
> >>> Tom
Re: filter cache and negative filter query
> On Tue, May 17, 2011 at 6:17 PM, Markus Jelsma wrote:
> > I'm not sure. The filter cache uses your filter as a key and a negation
> > is a different key. You can check this easily in a controlled
> > environment by issuing these queries and watching the filter cache
> > statistics.
>
> Gotta hate crossing emails ;-)

I love it, it gives me a smile :)

> Anyway, this goes back to Solr 1.1
>
> 5. SOLR-80: Negative queries are now allowed everywhere. Negative queries
>    are generated and cached as their positive counterpart, speeding
>    generation and generally resulting in smaller sets to cache.
>    Set intersections in SolrIndexSearcher are more efficient,
>    starting with the smallest positive set, subtracting all negative
>    sets, then intersecting with all other positive sets. (yonik)
>
> -Yonik
> http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
> 25-26, San Francisco
>
> >> If I have a query with a filter query such as : "q=art&fq=history" and
> >> then run a second query "q=art&fq=-history", will Solr realize that it
> >> can use the cached results of the previous filter query "history" (in
> >> the filter cache) or will it not realize this and have to actually do a
> >> second filter query against the index for "not history"?
> >>
> >> Tom
Re: filter cache and negative filter query
Oops, you're right, I had misremembered --- Solr 1.4.1 "lucene" qp handles pure negative fine, it's Solr 1.4.1 _dismax_ that does not.

Although, here's one, not actually related to this thread, that DOESN'T work in Solr 1.4.1 lucene query parser. Curious if it's been fixed in Solr 3.1.

&defType=lucene&q=-one OR -two

That one does NOT work as expected in solr 1.4.1, although I can't explain exactly what it's doing, it's not right. (It returns FEWER results than "-one" alone, which can't be right algebraically). I think. So there are still some kinds of negative queries that do weird things.

On 5/17/2011 6:29 PM, Markus Jelsma wrote:
> Such a negation works just as one would expect.
>
> q=*:*
>
> q=*:*&fq=-type:text/html
>
> q=*:*&fq=type:text/html
>
> Well, that adds up, doesn't it ;)
>
>> 1. I don't think Solr will re-use the filter cache in that situation,
>> although I'm not sure. But I comment anyway because, not what you asked
>> but something else that will trip you up with your example:
>>
>> 2. In fact, a pure-negative query like that doesn't work _at all_ in the
>> default solr/lucene query parser used for 'fq', at least in Solr 1.4.1.
>> Not sure if it's been improved in 3.1, but I don't think so. It will
>> always return 0 hits, the solr/lucene query parser can't generate a
>> proper lucene query from a pure negative query like that.
>>
>> To get around this, you can find a variation of the query that means the
>> same thing but isn't that form. Here's a really ugly one I use, with a
>> nested dismax -- dismax ALSO has trouble with pure negatives, although I
>> think maybe edismax can handle em? But this weird as heck combo works,
>> maybe there's a better way.
>>
>> NOT _query_:"{!dismax qf=something}history"
>>
>> And to come around full circle, I have NO idea what effect nested
>> queries have on the filter cache. I think that STILL won't re-use the
>> filter cache but I wonder if it'll re-use the _query_ cache for
>> "history"? I forget even more how the query cache works though.
>>
>> On 5/17/2011 6:07 PM, Burton-West, Tom wrote:
>>> If I have a query with a filter query such as : "q=art&fq=history" and
>>> then run a second query "q=art&fq=-history", will Solr realize that it
>>> can use the cached results of the previous filter query "history" (in
>>> the filter cache) or will it not realize this and have to actually do a
>>> second filter query against the index for "not history"?
>>>
>>> Tom
Re: filter cache and negative filter query
Using q works just as one would expect as well ;) I think you are confusing this with using negation to find documents that don't _have_ a specific field. In that case, a simple negation indeed doesn't work.

> Wait, will a pure negative filter query actually work then, even though
> a pure negative lucene 'q' won't?
>
> WOAH, it WILL. Okay, ignore my last message. But, okay, can someone
> explain THAT one to me?
>
> How come &q=-history does NOT work with Solr 1.4.1 lucene query parser,
> but &q=something&fq=-history DOES work, even though that fq is still
> using the same lucene query parser, no?
>
> On 5/17/2011 6:14 PM, Yonik Seeley wrote:
> > On Tue, May 17, 2011 at 6:07 PM, Burton-West, Tom wrote:
> >> If I have a query with a filter query such as : "q=art&fq=history" and
> >> then run a second query "q=art&fq=-history", will Solr realize that it
> >> can use the cached results of the previous filter query "history" (in
> >> the filter cache)
> >
> > Yep.
> >
> > You should be able to verify with the filterCache section of the stats
> > admin page.
> >
> > -Yonik
> > http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
> > 25-26, San Francisco
Re: filter cache and negative filter query
Such a negation works just as one would expect.

q=*:*

q=*:*&fq=-type:text/html

q=*:*&fq=type:text/html

Well, that adds up, doesn't it ;)

> 1. I don't think Solr will re-use the filter cache in that situation,
> although I'm not sure. But I comment anyway because, not what you asked
> but something else that will trip you up with your example:
>
> 2. In fact, a pure-negative query like that doesn't work _at all_ in the
> default solr/lucene query parser used for 'fq', at least in Solr 1.4.1.
> Not sure if it's been improved in 3.1, but I don't think so. It will
> always return 0 hits, the solr/lucene query parser can't generate a
> proper lucene query from a pure negative query like that.
>
> To get around this, you can find a variation of the query that means the
> same thing but isn't that form. Here's a really ugly one I use, with a
> nested dismax -- dismax ALSO has trouble with pure negatives, although I
> think maybe edismax can handle em? But this weird as heck combo works,
> maybe there's a better way.
>
> NOT _query_:"{!dismax qf=something}history"
>
> And to come around full circle, I have NO idea what effect nested
> queries have on the filter cache. I think that STILL won't re-use the
> filter cache but I wonder if it'll re-use the _query_ cache for
> "history"? I forget even more how the query cache works though.
>
> On 5/17/2011 6:07 PM, Burton-West, Tom wrote:
> > If I have a query with a filter query such as : "q=art&fq=history" and
> > then run a second query "q=art&fq=-history", will Solr realize that it
> > can use the cached results of the previous filter query "history" (in
> > the filter cache) or will it not realize this and have to actually do a
> > second filter query against the index for "not history"?
> >
> > Tom
Re: filter cache and negative filter query
On Tue, May 17, 2011 at 6:17 PM, Markus Jelsma wrote:
> I'm not sure. The filter cache uses your filter as a key and a negation is a
> different key. You can check this easily in a controlled environment by
> issuing these queries and watching the filter cache statistics.

Gotta hate crossing emails ;-)
Anyway, this goes back to Solr 1.1

5. SOLR-80: Negative queries are now allowed everywhere. Negative queries
   are generated and cached as their positive counterpart, speeding
   generation and generally resulting in smaller sets to cache.
   Set intersections in SolrIndexSearcher are more efficient,
   starting with the smallest positive set, subtracting all negative
   sets, then intersecting with all other positive sets. (yonik)

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco

>> If I have a query with a filter query such as : "q=art&fq=history" and
>> then run a second query "q=art&fq=-history", will Solr realize that it
>> can use the cached results of the previous filter query "history" (in the
>> filter cache) or will it not realize this and have to actually do a second
>> filter query against the index for "not history"?
>>
>> Tom
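For readers who want to see the ordering from the SOLR-80 note spelled out, here is a rough Python sketch. It models doc sets as plain Python sets and is not the SolrIndexSearcher code; the `apply_filters` function, its parameters, and the sample doc ids are all assumptions made for illustration.

```python
from typing import Iterable, List, Set


def apply_filters(
    positives: List[Set[int]],
    negatives: Iterable[Set[int]],
    all_docs: Set[int],
) -> Set[int]:
    """Combine cached doc sets in the order described in the SOLR-80 note."""
    if positives:
        # Start with the smallest positive set so later steps touch fewer docs.
        ordered = sorted(positives, key=len)
        result = set(ordered[0])
        rest = ordered[1:]
    else:
        # Pure negative filters: start from every doc (an assumption of this sketch).
        result = set(all_docs)
        rest = []
    # Subtract every negative filter, using its cached *positive* doc set.
    for negative in negatives:
        result -= negative
    # Intersect with the remaining positive sets.
    for positive in rest:
        result &= positive
    return result


all_docs = set(range(1, 11))
art = {1, 2, 3, 4, 5, 6}
recent = {1, 2, 3}
history = {2, 4, 6, 8}  # cached once, usable for fq=history and fq=-history

print(sorted(apply_filters([art, recent], [history], all_docs)))  # [1, 3]
print(sorted(apply_filters([], [history], all_docs)))             # [1, 3, 5, 7, 9, 10]
```

The point of the sketch is that `history` is stored once: a request for `-history` reuses the same set and subtracts it, which matches the "cached as their positive counterpart" wording in the changelog entry.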
Re: filter cache and negative filter query
Wait, will a pure negative filter query actually work then, even though a pure negative lucene 'q' won't?

WOAH, it WILL. Okay, ignore my last message. But, okay, can someone explain THAT one to me?

How come &q=-history does NOT work with Solr 1.4.1 lucene query parser, but &q=something&fq=-history DOES work, even though that fq is still using the same lucene query parser, no?

On 5/17/2011 6:14 PM, Yonik Seeley wrote:
> On Tue, May 17, 2011 at 6:07 PM, Burton-West, Tom wrote:
>> If I have a query with a filter query such as : "q=art&fq=history" and
>> then run a second query "q=art&fq=-history", will Solr realize that it
>> can use the cached results of the previous filter query "history" (in
>> the filter cache)
>
> Yep.
>
> You should be able to verify with the filterCache section of the stats
> admin page.
>
> -Yonik
> http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
> 25-26, San Francisco
Re: filter cache and negative filter query
1. I don't think Solr will re-use the filter cache in that situation, although I'm not sure. But I comment anyway because, not what you asked but something else that will trip you up with your example:

2. In fact, a pure-negative query like that doesn't work _at all_ in the default solr/lucene query parser used for 'fq', at least in Solr 1.4.1. Not sure if it's been improved in 3.1, but I don't think so. It will always return 0 hits, the solr/lucene query parser can't generate a proper lucene query from a pure negative query like that.

To get around this, you can find a variation of the query that means the same thing but isn't that form. Here's a really ugly one I use, with a nested dismax -- dismax ALSO has trouble with pure negatives, although I think maybe edismax can handle em? But this weird as heck combo works, maybe there's a better way.

NOT _query_:"{!dismax qf=something}history"

And to come around full circle, I have NO idea what effect nested queries have on the filter cache. I think that STILL won't re-use the filter cache but I wonder if it'll re-use the _query_ cache for "history"? I forget even more how the query cache works though.

On 5/17/2011 6:07 PM, Burton-West, Tom wrote:
> If I have a query with a filter query such as : "q=art&fq=history" and
> then run a second query "q=art&fq=-history", will Solr realize that it
> can use the cached results of the previous filter query "history" (in
> the filter cache) or will it not realize this and have to actually do a
> second filter query against the index for "not history"?
>
> Tom
Re: filter cache and negative filter query
I'm not sure. The filter cache uses your filter as a key and a negation is a different key. You can check this easily in a controlled environment by issuing these queries and watching the filter cache statistics.

> If I have a query with a filter query such as : "q=art&fq=history" and
> then run a second query "q=art&fq=-history", will Solr realize that it
> can use the cached results of the previous filter query "history" (in the
> filter cache) or will it not realize this and have to actually do a second
> filter query against the index for "not history"?
>
> Tom
Re: filter cache and negative filter query
On Tue, May 17, 2011 at 6:07 PM, Burton-West, Tom wrote: > If I have a query with a filter query such as : " q=art&fq=history" and then > run a second query "q=art&fq=-history", will Solr realize that it can use > the cached results of the previous filter query "history" (in the filter > cache) Yep. You should be able to verify with the filterCache section of the stats admin page. -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
filter cache and negative filter query
If I have a query with a filter query such as : " q=art&fq=history" and then run a second query "q=art&fq=-history", will Solr realize that it can use the cached results of the previous filter query "history" (in the filter cache) or will it not realize this and have to actually do a second filter query against the index for "not history"? Tom