Re: edismax inconsistency -- AND/OR

2010-12-22 Thread Shawn Heisey

On 12/22/2010 8:25 AM, Dyer, James wrote:

I'm using SOLR 1.4.1 with SOLR-1553 applied (edismax query parser).  I'm 
experiencing inconsistent behavior with terms grouped in parentheses.  
Sometimes they are AND'ed and sometimes OR'ed together.

1. q=Title:(life)&defType=edismax  285 results
2. q=Title:(hope)&defType=edismax  34 results

3. q=Title:(life AND hope)&defType=edismax  1 result
4. q=Title:(life OR hope)&defType=edismax  318 results
5. q=Title:(life hope)&defType=edismax  1 result (life, hope are being AND'ed 
together)

6. q=Title:(life AND hope) AND Title:(life)&defType=edismax  1 result
7. q=Title:(life OR hope) AND Title:(life)&defType=edismax  285 results
8. q=Title:(life hope) AND Title:(life)&defType=edismax  285 results (life, 
hope are being OR'ed together)

See how in #5, the two terms get AND'ed, but by adding the additional 
(nonsense) clause in #8, the first two terms get OR'ed.  Is this a feature or 
a bug?  Am I likely doing something wrong?


The dismax parser doesn't pay any attention to the default query 
operator.  In the absence of explicit operators in the actual query, 
edismax likely doesn't either.  What matters is the value of the mm 
parameter, also known as minimum 'should' match.  If your mm value is 
50%, which is a common value to see in dismax examples, I believe it 
would behave exactly like you are seeing.
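
To make the arithmetic concrete, here is a rough Python sketch of how a 
plain mm value turns into a required clause count.  It is not Solr's code.  
It assumes only the two simplest forms (a non-negative integer such as 2, 
or a non-negative percentage such as 50%), rounds percentages down as the 
min-should-match documentation describes, and the function name 
min_should_match is made up for illustration.

```python
def min_should_match(optional_clauses: int, mm: str) -> int:
    """Simplified model: how many optional clauses must match.

    Handles only a non-negative integer ("2") or a non-negative
    percentage ("50%"). Negative and conditional specs are not modeled.
    """
    mm = mm.strip()
    if mm.endswith("%"):
        percent = int(mm[:-1])
        # Percentages are rounded down to a whole number of clauses.
        required = (optional_clauses * percent) // 100
    else:
        required = int(mm)
    # Never require more clauses than exist, and never fewer than zero.
    return max(0, min(required, optional_clauses))


if __name__ == "__main__":
    for clauses, spec in [(2, "50%"), (2, "100%"), (3, "50%"), (3, "100%")]:
        print(clauses, spec, "->", min_should_match(clauses, spec))
```

With two optional clauses, 50% requires one of them (OR-like) and 100% 
requires both (AND-like).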


This is a complex little beast.  Just a couple of weeks ago, Chris 
Hostetter said that although he wrote the code and the syntax for mm, 
the explanation for the parameter that's in the Smiley and Pugh Solr 
book (pages 138-140) is the clearest he's ever seen.


Here's some detailed documentation on it.  I can't find my copy of the 
book right now, so I don't know if this is as good as what's in it:


http://lucene.apache.org/solr/api/org/apache/solr/util/doc-files/min-should-match.html

Hopefully this is applicable to you, and not something you already 
thought of!


Shawn



RE: edismax inconsistency -- AND/OR

2010-12-22 Thread Dyer, James
Shawn,

Thank you for the reply.  The URL you gave was helpful and Smiley & Pugh even 
more so.  On Smiley & Pugh page 140, they indicate that mm=100% using dismax is 
analogous to Standard's q.op=AND.  This is exactly what I need.

However...testing with these queries and edismax, I get different # of results:

q=Title:(life hope) AND Title:(life)&q.op=AND  (STANDARD Q.P.) - 1 result
q=Title:(life AND hope) AND Title:(life)&defType=edismax - 1 result
q=Title:(life hope) AND Title:(life)&defType=edismax&mm=100% - 285 results 
(uh-oh.  looks like the first 2 get OR'ed)

The dismax parser seems to behave as documented:

q=life hope life&defType=dismax&rows=0&qf=Title&mm=0% - 285 results (results 
are OR'ed as expected)
q=life hope life&defType=dismax&rows=0&qf=Title&mm=100% - 1 result (results are 
AND'ed as expected)
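
For anyone who wants to repeat the two dismax requests above from a 
script, here is a small Python sketch that builds the URLs.  The host and 
path in SOLR_SELECT are an assumption (a local default-style install), and 
the helper name dismax_url is made up.  The main thing it shows is that 
the % in the mm value has to be URL-encoded as %25.

```python
from urllib.parse import urlencode

# Assumption: a local Solr instance; adjust to your own host and path.
SOLR_SELECT = "http://localhost:8983/solr/select"


def dismax_url(q: str, mm: str, qf: str = "Title",
               def_type: str = "dismax", rows: int = 0) -> str:
    """Build a select URL with properly encoded query parameters."""
    params = {"q": q, "defType": def_type, "qf": qf, "mm": mm, "rows": rows}
    # urlencode escapes spaces as "+" and "%" as "%25".
    return SOLR_SELECT + "?" + urlencode(params)


if __name__ == "__main__":
    print(dismax_url("life hope life", "0%"))
    print(dismax_url("life hope life", "100%"))
```

Only rows=0 is needed here because the comparison is on the result count, 
not the documents.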

Unfortunately I need to be able to combine the use of pf with key:value 
syntax, wildcards, etc, so I need to use edismax, I think.

With a quick glance at ExtendedDismaxQParserPlugin, I'm finding...
 - MM is ignored if there are any of these operators in the query (OR NOT + -)  
... but AND is ok (line 227)
 - MM is ignored if the parse method did not return a BooleanQuery instance 
(line 244)
 - MM is used after all regardless of operators used in the query, so long as 
it's a BooleanQuery (line 286)
 - The default MM value is 100% if not specified in the query parameters 
(lines 241, 283)
Given the apparent contradiction here, my very quick analysis is surely missing 
something!  But if this is accurate, then the trick is to formulate the query 
in such a way so that parse returns an instance of BooleanQuery, right?  
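
To check my reading of the first point against the test queries, here is a 
toy Python model.  It is only my interpretation of that one observation (OR, 
NOT, + or - switch MM off, AND does not), not the parser's actual logic, and 
the name mm_applies is made up.

```python
import re

# Tokens that, on this reading, cause MM to be ignored:
# the words OR / NOT, or a leading + / - on a term. AND is deliberately absent.
_MM_DISABLING = re.compile(
    r"(?:^|[\s(])(?:OR|NOT)(?=[\s)]|$)"   # standalone OR / NOT
    r"|(?:^|[\s(])[+-](?=\S)"             # +term or -term
)


def mm_applies(q: str) -> bool:
    """Return True if, under this toy model, MM would still be honored."""
    return _MM_DISABLING.search(q) is None


if __name__ == "__main__":
    for q in ["Title:(life hope) AND Title:(life)",
              "Title:(life OR hope) AND Title:(life)",
              "+life -hope"]:
        print(q, "->", mm_applies(q))
```

Under this model the mm=100% query above should still have MM applied, 
which is why the 285 results look wrong to me.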

Any more advice anyone can give is appreciated!  For the client I'm responsible 
for, I'm just inserting explicit operators between all of the user's query 
terms.  But for the client I'm not responsible for, I would love to have a 
workaround for the other developers!  I think they'd appreciate it...

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: Wednesday, December 22, 2010 4:08 PM
To: solr-user@lucene.apache.org
Subject: Re: edismax inconsistency -- AND/OR
