Re: filter cache and negative filter query

2011-05-19 Thread Juan Antonio Farré Basurte
> lookups to work with an arbitrary query, you would either need to change 
> the cache structure from Query=>DocSet to a mapping of 
> Query=>[DocSet,inversionBit] and store the same cache value 
> with two keys -- both the positive and the negative; or you keep the 

Well, I don't know how it works right now, but I guess that, since only the 
positive version is stored, looking up a negative query already poses a 
similar problem: either you store two keys for the same value, or you 
transform the negative query into a positive "canonical" one before looking 
it up. The same could be done in this case, with the difference that, yes, 
you need to store an inversion bit too. The double lookup option sounds 
worse, though benchmarking would be needed to know for sure.
Would this optimization only affect memory usage, or are smaller sets also 
faster to intersect, for example? In any case, the memory saved could be 
used to speed up the application in other ways, for example with bigger 
caches.

Re: filter cache and negative filter query

2011-05-19 Thread Juan Antonio Farré Basurte
> : query that in fact returns the "negative" results. As a simple example, 
> : I believe that, for a boolean field, -field:true is exactly the same as 
> : +field:false, but the former is a negative query and the latter is a 
> 
> that's not strictly true in all cases... 
> 
> * if the field is multivalued=true, a doc may contain both "false" and 
>   "true" in "field", in which case it would match +field:false but it 
>   would not match -field:true
> 
> * if the field is multivalued=false, and required=false, a doc
>   may not contain any value, in which case it would match -field:true but 
>   it would not match +field:false

You're totally right. But it was just an example. I just didn't think about 
specifying the field to be single valued and required.

I did some testing yesterday on how filters are cached, using the admin 
interface.
I noticed that if I perform a facet.query on a boolean field, testing it to be 
true or false, it always seems to add two entries to the query cache. Maybe it 
also adds an entry to test for nonexistence of the value?
And if I perform a facet.field on the same boolean field, three new entries are 
inserted into the filter cache. Maybe one for true, one for false and one for 
nonexistence? I really don't know exactly what it's doing, but it doesn't 
look, at first sight, like very optimal behaviour...
I'm testing on the 1.4.1 LucidWorks version of Solr, using the boolean field 
inStock of its example schema, with its example data.

Re: filter cache and negative filter query

2011-05-18 Thread Chris Hostetter

: What I don't like is that it systematically uses the positive version. 
: Sometimes the negative version will give many less results (for example, 
: in some cases I filter by documents not having a given field, and there 
: are very few of them). I think it would be much better that solr 

the "positive" version of the filter is the only one that can be executed, 
so it's the one that gets cached today, but the principle you are 
describing is still sound -- in fact I'm pretty sure there is a note in 
the code about this exact idea as a possible performance enhancement:

if the cardinality of a filter is very large (regardless of whether the 
query was "positive" or "negative") its negation relative to the set of all 
docs could be cached in its place to save space...

...but...

...the complication would come later when doing lookups -- for cache 
lookups to work with an arbitrary query, you would either need to change 
the cache structure from Query=>DocSet to a mapping of 
Query=>[DocSet,inversionBit] and store the same cache value 
with two keys -- both the positive and the negative; or you keep the 
current cache structure, store whichever Query=>DocSet pair has the 
smallest cardinality, but then every logical cache lookup requires a 
second actual cache lookup under the covers (for the negation of the 
query) if the first one doesn't match anything.
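
A minimal sketch of the inversion-bit idea, for illustration only. It is not Solr code: the class names, the string-based query keys, and the "leading '-' means negation" rule are all assumptions made for this example. It uses a single canonical positive key per filter (rather than two keys), stores whichever of the set or its complement is smaller, and records an inversion bit so that either form of the query can be answered from one entry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachedFilter:
    docs: frozenset   # the doc ids actually stored
    inverted: bool    # True if `docs` is the complement of the positive query's matches


class InvertibleFilterCache:
    """Hypothetical cache: canonical positive query -> (DocSet, inversion bit)."""

    def __init__(self, all_docs):
        self.all_docs = frozenset(all_docs)
        self.entries = {}

    @staticmethod
    def canonical(query):
        # Assumption: a leading '-' marks a pure negative query.
        if query.startswith("-"):
            return query[1:], True
        return query, False

    def put(self, query, matches):
        positive, negated = self.canonical(query)
        matches = frozenset(matches)
        # Normalize to the matches of the positive form of the query.
        positive_matches = self.all_docs - matches if negated else matches
        complement = self.all_docs - positive_matches
        # Store whichever set is smaller and remember which one it is.
        if len(complement) < len(positive_matches):
            self.entries[positive] = CachedFilter(complement, True)
        else:
            self.entries[positive] = CachedFilter(positive_matches, False)

    def get(self, query):
        positive, negated = self.canonical(query)
        entry = self.entries.get(positive)
        if entry is None:
            return None
        # Flip the stored set when exactly one of "stored inverted" and
        # "query negated" holds.
        if entry.inverted != negated:
            return self.all_docs - entry.docs
        return entry.docs


cache = InvertibleFilterCache(range(10))
cache.put("history", {0, 1, 2, 3, 4, 5, 6, 7})   # large set: complement is stored
print(sorted(cache.get("-history")))             # [8, 9]
```

Note that answering a lookup from an inverted entry costs a set difference against all docs, which is part of the CPU cost that would have to be benchmarked against the memory savings.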

it would require some benchmarking and hard decisions about whether the 
(hypothetical) memory savings are worth the (hypothetical) CPU cost.

: query that in fact returns the "negative" results. As a simple example, 
: I believe that, for a boolean field, -field:true is exactly the same as 
: +field:false, but the former is a negative query and the latter is a 

that's not strictly true in all cases... 

 * if the field is multivalued=true, a doc may contain both "false" and 
   "true" in "field", in which case it would match +field:false but it 
   would not match -field:true

 * if the field is multivalued=false, and required=false, a doc
   may not contain any value, in which case it would match -field:true but 
   it would not match +field:false
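
The two cases above can be illustrated with plain sets. This is only a toy model with made-up document ids and values, not a Solr query; it assumes a document matches field:X when X is among its values for that field.

```python
# Hypothetical documents: doc id -> values of the boolean field "field".
docs = {
    1: ["true"],
    2: ["false"],
    3: ["true", "false"],  # multivalued field holding both values
    4: [],                 # non-required field with no value at all
}

all_ids = set(docs)
has_true = {i for i, vals in docs.items() if "true" in vals}
has_false = {i for i, vals in docs.items() if "false" in vals}

negative = all_ids - has_true   # -field:true
positive = has_false            # +field:false

print(sorted(negative))  # [2, 4]
print(sorted(positive))  # [2, 3]
```

Doc 3 matches only +field:false and doc 4 matches only -field:true, so the two queries agree only when every document has exactly one value.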


-Hoss


Re: filter cache and negative filter query

2011-05-18 Thread Juan Antonio Farré Basurte
Mmm... I had wondered whether Solr reused filters this way (not storing both 
the positive and negative versions) and I'm glad to see it does indeed reuse 
them.
What I don't like is that it systematically uses the positive version. 
Sometimes the negative version will give far fewer results (for example, in 
some cases I filter by documents not having a given field, and there are very 
few of them).
I think it would be much better if Solr performed exactly the query requested 
and, if more than 50% of the documents match the query, stored the negated 
one instead. I think (knowing almost nothing about how things are 
implemented) this shouldn't be a problem.
Is there any place where one can post a suggestion for improvement? :)
Anyway, it would be very useful to know exactly how the current versions work 
(I think the info in the message I'm answering is about version 1.1 and could 
have changed), because knowing it, one can sometimes manage to write, for 
example, a "positive" query that in fact returns the "negative" results. As a 
simple example, I believe that, for a boolean field, -field:true is exactly the 
same as +field:false, but the former is a negative query and the latter is a 
positive one.
So, knowing the exact behaviour of Solr can help you write optimized filters 
when you know that one version will give far fewer hits than the other.

On 18/05/2011, at 00:26, Yonik Seeley wrote:

> On Tue, May 17, 2011 at 6:17 PM, Markus Jelsma wrote:
>> I'm not sure. The filter cache uses your filter as a key and a negation is a
>> different key. You can check this easily in a controlled environment by
> issuing these queries and watching the filter cache statistics.
> 
> Gotta hate crossing emails ;-)
> Anyway, this goes back to Solr 1.1
> 
> 5. SOLR-80: Negative queries are now allowed everywhere.  Negative queries
> are generated and cached as their positive counterpart, speeding
> generation and generally resulting in smaller sets to cache.
> Set intersections in SolrIndexSearcher are more efficient,
> starting with the smallest positive set, subtracting all negative
> sets, then intersecting with all other positive sets.  (yonik)
> 
> -Yonik
> http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
> 25-26, San Francisco
> 
> 
> 
>>> If I have a query with a filter query such as : " q=art&fq=history" and
>>> then run a second query  "q=art&fq=-history", will Solr realize that it
>>> can use the cached results of the previous filter query "history"  (in the
>>> filter cache) or will it not realize this and have to actually do a second
>>> filter query against the index  for "not history"?
>>> 
>>> Tom
>> 



Re: filter cache and negative filter query

2011-05-17 Thread Markus Jelsma
That's not what I see while testing right now. Queries with -one OR -two 
return fewer documents than either operand does on its own; this is with 
LuceneQParser. I haven't done extensive testing since I rarely use boolean 
algebra in Lucene or Solr.

> Oops, you're right, I had misremembered --- Solr 1.4.1 "lucene" qp
> handles pure negative fine, it's Solr 1.4.1 _dismax_ that does not.
> 
> Although, here's one, not actually related to this thread,  that DOESN'T
> work in Solr 1.4.1 lucene query parser. Curious if it's been fixed in
> Solr 3.1.
> 
> &defType=lucene&q=-one OR -two
> 
> That one does NOT work as expected in solr 1.4.1, although I can't
> explain exactly what it's doing, it's not right. (It returns FEWER
> results than "-one" alone, which can't be right algebraically). I think.
> So there are still some kinds of negative queries that do weird things.
> 
> On 5/17/2011 6:29 PM, Markus Jelsma wrote:
> > Such a negation works just as one would expect.
> > 
> > q=*:*
> > 
> > 
> > q=*:*&fq=-type:text/html
> > 
> > 
> > q=*:*&fq=type:text/html
> > 
> > 
> > Well, that adds up, doesn't it ;)
> > 
> >> 1. I don't think Solr will re-use the filter cache in that situation,
> >> although I'm not sure. But I comment anyway because, not what you asked
> >> but something else that will trip you up with your example:
> >> 
> >> 2. In fact, a pure-negative query like that doesn't work _at all_ in the
> >> default solr/lucene query parser used for 'fq', at least in Solr 1.4.1.
> >> Not sure if it's been improved in 3.1, but I don't think so.  It will
> >> always return 0 hits, the solr/lucene query parser can't generate a
> >> proper lucene query from a pure negative query like that.
> >> 
> >> To get around this, you can find a variation of the query that means the
> >> same thing but isn't that form. Here's a really ugly one I use, with a
> >> nested dismax -- dismax ALSO has trouble with pure negatives, although I
> >> think maybe edismax can handle em? But this weird as heck combo works,
> >> maybe there's a better way.
> >> 
> >> NOT _query_:"{!dismax qf=something}history"
> >> 
> >> And to come around full circle, I have NO idea what effect nested
> >> queries have on the filter cache. I think that STILL won't re-use the
> >> filter cache but I wonder if it'll re-use the _query_ cache for
> >> "history"?  I forget even more how the query cache works though.
> >> 
> >> On 5/17/2011 6:07 PM, Burton-West, Tom wrote:
> >>> If I have a query with a filter query such as : " q=art&fq=history" and
> >>> then run a second query  "q=art&fq=-history", will Solr realize that it
> >>> can use the cached results of the previous filter query "history"  (in
> >>> the filter cache) or will it not realize this and have to actually do a
> >>> second filter query against the index  for "not history"?
> >>> 
> >>> Tom


Re: filter cache and negative filter query

2011-05-17 Thread Markus Jelsma

> On Tue, May 17, 2011 at 6:17 PM, Markus Jelsma wrote:
> > I'm not sure. The filter cache uses your filter as a key and a negation
> > is a different key. You can check this easily in a controlled
> > environment by issuing these queries and watching the filter cache
> > statistics.
> 
> Gotta hate crossing emails ;-)

I love it, it gives me a smile :)

> Anyway, this goes back to Solr 1.1
> 
>  5. SOLR-80: Negative queries are now allowed everywhere.  Negative queries
> are generated and cached as their positive counterpart, speeding
> generation and generally resulting in smaller sets to cache.
> Set intersections in SolrIndexSearcher are more efficient,
> starting with the smallest positive set, subtracting all negative
> sets, then intersecting with all other positive sets.  (yonik)
> 
> -Yonik
> http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
> 25-26, San Francisco
> 
> >> If I have a query with a filter query such as : " q=art&fq=history" and
> >> then run a second query  "q=art&fq=-history", will Solr realize that it
> >> can use the cached results of the previous filter query "history"  (in
> >> the filter cache) or will it not realize this and have to actually do a
> >> second filter query against the index  for "not history"?
> >> 
> >> Tom


Re: filter cache and negative filter query

2011-05-17 Thread Jonathan Rochkind
Oops, you're right, I had misremembered --- Solr 1.4.1 "lucene" qp 
handles pure negative fine, it's Solr 1.4.1 _dismax_ that does not.


Although, here's one, not actually related to this thread,  that DOESN'T 
work in Solr 1.4.1 lucene query parser. Curious if it's been fixed in 
Solr 3.1.


&defType=lucene&q=-one OR -two

That one does NOT work as expected in solr 1.4.1, although I can't 
explain exactly what it's doing, it's not right. (It returns FEWER 
results than "-one" alone, which can't be right algebraically). I think. 
So there are still some kinds of negative queries that do weird things.
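
The algebraic expectation can be checked with plain sets. This is only a sketch of the set logic with made-up doc ids, not of what the Solr 1.4.1 parser actually does: if each negative clause is read as "all docs minus the matches", then the OR of two negatives is a union of complements and can never be smaller than either one.

```python
all_docs = set(range(1, 7))
one = {1, 2, 3}   # hypothetical docs matching "one"
two = {3, 4}      # hypothetical docs matching "two"

not_one = all_docs - one       # -one
not_two = all_docs - two       # -two
either = not_one | not_two     # what "-one OR -two" should mean

# The union contains both operands, so it is at least as large as each.
assert not_one <= either and not_two <= either
# De Morgan: it equals "all docs minus those matching both terms".
assert either == all_docs - (one & two)

print(len(not_one), len(not_two), len(either))  # 3 4 5
```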


On 5/17/2011 6:29 PM, Markus Jelsma wrote:

Such a negation works just as one would expect.

q=*:*


q=*:*&fq=-type:text/html


q=*:*&fq=type:text/html


Well, that adds up, doesn't it ;)


1. I don't think Solr will re-use the filter cache in that situation,
although I'm not sure. But I comment anyway because, not what you asked
but something else that will trip you up with your example:

2. In fact, a pure-negative query like that doesn't work _at all_ in the
default solr/lucene query parser used for 'fq', at least in Solr 1.4.1.
Not sure if it's been improved in 3.1, but I don't think so.  It will
always return 0 hits, the solr/lucene query parser can't generate a
proper lucene query from a pure negative query like that.

To get around this, you can find a variation of the query that means the
same thing but isn't that form. Here's a really ugly one I use, with a
nested dismax -- dismax ALSO has trouble with pure negatives, although I
think maybe edismax can handle em? But this weird as heck combo works,
maybe there's a better way.

NOT _query_:"{!dismax qf=something}history"

And to come around full circle, I have NO idea what effect nested
queries have on the filter cache. I think that STILL won't re-use the
filter cache but I wonder if it'll re-use the _query_ cache for
"history"?  I forget even more how the query cache works though.

On 5/17/2011 6:07 PM, Burton-West, Tom wrote:

If I have a query with a filter query such as : " q=art&fq=history" and
then run a second query  "q=art&fq=-history", will Solr realize that it
can use the cached results of the previous filter query "history"  (in
the filter cache) or will it not realize this and have to actually do a
second filter query against the index  for "not history"?

Tom


Re: filter cache and negative filter query

2011-05-17 Thread Markus Jelsma
Using q works just as one would expect as well ;)

I think you're confusing this with using negation to find documents that 
don't _have_ a specific field. In that case, a simple negation indeed 
doesn't work.

> Wait, will a pure negative filter query actually work then, even though
> a pure negative lucene 'q' won't?
> 
> WOAH, it WILL.  Okay, ignore my last message. But, okay, can someone
> explain THAT one to me?
> 
> How come &q=-history does NOT work with Solr 1.4.1 lucene query parser,
> but &q=something&fq=-history DOES work, even though that fq is still
> using the same lucene query parser, no?
> 
> On 5/17/2011 6:14 PM, Yonik Seeley wrote:
> > On Tue, May 17, 2011 at 6:07 PM, Burton-West, Tom wrote:
> >> If I have a query with a filter query such as : " q=art&fq=history" and
> >> then run a second query  "q=art&fq=-history", will Solr realize that it
> >> can use the cached results of the previous filter query "history"  (in
> >> the filter cache)
> > 
> > Yep.
> > 
> > You should be able to verify with the filterCache section of the stats
> > admin page.
> > 
> > -Yonik
> > http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
> > 25-26, San Francisco


Re: filter cache and negative filter query

2011-05-17 Thread Markus Jelsma
Such a negation works just as one would expect.

q=*:*


q=*:*&fq=-type:text/html


q=*:*&fq=type:text/html


Well, that adds up, doesn't it ;)

> 1. I don't think Solr will re-use the filter cache in that situation,
> although I'm not sure. But I comment anyway because, not what you asked
> but something else that will trip you up with your example:
> 
> 2. In fact, a pure-negative query like that doesn't work _at all_ in the
> default solr/lucene query parser used for 'fq', at least in Solr 1.4.1.
> Not sure if it's been improved in 3.1, but I don't think so.  It will
> always return 0 hits, the solr/lucene query parser can't generate a
> proper lucene query from a pure negative query like that.
> 
> To get around this, you can find a variation of the query that means the
> same thing but isn't that form. Here's a really ugly one I use, with a
> nested dismax -- dismax ALSO has trouble with pure negatives, although I
> think maybe edismax can handle em? But this weird as heck combo works,
> maybe there's a better way.
> 
> NOT _query_:"{!dismax qf=something}history"
> 
> And to come around full circle, I have NO idea what effect nested
> queries have on the filter cache. I think that STILL won't re-use the
> filter cache but I wonder if it'll re-use the _query_ cache for
> "history"?  I forget even more how the query cache works though.
> 
> On 5/17/2011 6:07 PM, Burton-West, Tom wrote:
> > If I have a query with a filter query such as : " q=art&fq=history" and
> > then run a second query  "q=art&fq=-history", will Solr realize that it
> > can use the cached results of the previous filter query "history"  (in
> > the filter cache) or will it not realize this and have to actually do a
> > second filter query against the index  for "not history"?
> > 
> > Tom


Re: filter cache and negative filter query

2011-05-17 Thread Yonik Seeley
On Tue, May 17, 2011 at 6:17 PM, Markus Jelsma wrote:
> I'm not sure. The filter cache uses your filter as a key and a negation is a
> different key. You can check this easily in a controlled environment by
> issuing these queries and watching the filter cache statistics.

Gotta hate crossing emails ;-)
Anyway, this goes back to Solr 1.1

 5. SOLR-80: Negative queries are now allowed everywhere.  Negative queries
are generated and cached as their positive counterpart, speeding
generation and generally resulting in smaller sets to cache.
Set intersections in SolrIndexSearcher are more efficient,
starting with the smallest positive set, subtracting all negative
sets, then intersecting with all other positive sets.  (yonik)
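
As a rough illustration of the ordering described in that changelog entry (this is not the actual SolrIndexSearcher code; the function name and the use of Python sets as DocSets are assumptions), a negative filter such as fq=-history is kept as its cached positive set and subtracted, rather than being materialized as a large "everything except history" set:

```python
def apply_filters(all_docs, positives, negatives):
    """Combine filter sets: start from the smallest positive set,
    subtract every negative set, then intersect the remaining positives.

    `negatives` holds the cached *positive* counterparts of negative filters.
    """
    if positives:
        ordered = sorted(positives, key=len)
        result = set(ordered[0])
        rest = ordered[1:]
    else:
        # Only negative filters: start from the set of all docs.
        result = set(all_docs)
        rest = []
    for neg in negatives:
        result -= neg
    for pos in rest:
        result &= pos
    return result


all_docs = set(range(10))
art = {0, 1, 2, 3, 4, 5}   # hypothetical cached set for "art"
recent = {1, 2, 3}         # hypothetical cached set for another positive filter
history = {2, 7}           # cached positive set, reused for fq=-history

print(sorted(apply_filters(all_docs, [art, recent], [history])))  # [1, 3]
```

Starting from the smallest positive set keeps every intermediate result small, which is presumably why the changelog calls the intersections "more efficient".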

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco



>> If I have a query with a filter query such as : " q=art&fq=history" and
>> then run a second query  "q=art&fq=-history", will Solr realize that it
>> can use the cached results of the previous filter query "history"  (in the
>> filter cache) or will it not realize this and have to actually do a second
>> filter query against the index  for "not history"?
>>
>> Tom
>


Re: filter cache and negative filter query

2011-05-17 Thread Jonathan Rochkind
Wait, will a pure negative filter query actually work then, even though 
a pure negative lucene 'q' won't?


WOAH, it WILL.  Okay, ignore my last message. But, okay, can someone 
explain THAT one to me?


How come &q=-history does NOT work with Solr 1.4.1 lucene query parser, 
but &q=something&fq=-history DOES work, even though that fq is still 
using the same lucene query parser, no?


On 5/17/2011 6:14 PM, Yonik Seeley wrote:

On Tue, May 17, 2011 at 6:07 PM, Burton-West, Tom  wrote:

If I have a query with a filter query such as : " q=art&fq=history" and then run a second query  
"q=art&fq=-history", will Solr realize that it can use the cached results of the previous filter query 
"history"  (in the filter cache)

Yep.

You should be able to verify with the filterCache section of the stats
admin page.

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco



Re: filter cache and negative filter query

2011-05-17 Thread Jonathan Rochkind
1. I don't think Solr will re-use the filter cache in that situation, 
although I'm not sure. But I comment anyway because, not what you asked 
but something else that will trip you up with your example:


2. In fact, a pure-negative query like that doesn't work _at all_ in the 
default solr/lucene query parser used for 'fq', at least in Solr 1.4.1. 
Not sure if it's been improved in 3.1, but I don't think so.  It will 
always return 0 hits, the solr/lucene query parser can't generate a 
proper lucene query from a pure negative query like that.


To get around this, you can find a variation of the query that means the 
same thing but isn't that form. Here's a really ugly one I use, with a 
nested dismax -- dismax ALSO has trouble with pure negatives, although I 
think maybe edismax can handle em? But this weird as heck combo works, 
maybe there's a better way.


NOT _query_:"{!dismax qf=something}history"

And to come around full circle, I have NO idea what effect nested 
queries have on the filter cache. I think that STILL won't re-use the 
filter cache but I wonder if it'll re-use the _query_ cache for 
"history"?  I forget even more how the query cache works though.




On 5/17/2011 6:07 PM, Burton-West, Tom wrote:

If I have a query with a filter query such as : " q=art&fq=history" and then run a second query  
"q=art&fq=-history", will Solr realize that it can use the cached results of the previous filter query 
"history"  (in the filter cache) or will it not realize this and have to actually do a second filter query against 
the index  for "not history"?

Tom




Re: filter cache and negative filter query

2011-05-17 Thread Markus Jelsma
I'm not sure. The filter cache uses your filter as a key and a negation is a 
different key. You can check this easily in a controlled environment by 
issuing these queries and watching the filter cache statistics.

> If I have a query with a filter query such as : " q=art&fq=history" and
> then run a second query  "q=art&fq=-history", will Solr realize that it
> can use the cached results of the previous filter query "history"  (in the
> filter cache) or will it not realize this and have to actually do a second
> filter query against the index  for "not history"?
> 
> Tom


Re: filter cache and negative filter query

2011-05-17 Thread Yonik Seeley
On Tue, May 17, 2011 at 6:07 PM, Burton-West, Tom  wrote:
> If I have a query with a filter query such as : " q=art&fq=history" and then 
> run a second query  "q=art&fq=-history", will Solr realize that it can use 
> the cached results of the previous filter query "history"  (in the filter 
> cache)

Yep.

You should be able to verify with the filterCache section of the stats
admin page.

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco


filter cache and negative filter query

2011-05-17 Thread Burton-West, Tom
If I have a query with a filter query such as : " q=art&fq=history" and then 
run a second query  "q=art&fq=-history", will Solr realize that it can use the 
cached results of the previous filter query "history"  (in the filter cache) or 
will it not realize this and have to actually do a second filter query against 
the index  for "not history"?

Tom