minimum should match, can't explain the amount of hits

2015-12-16 Thread Ron van der Vegt

Hi,

I'm currently searching with the following query: q="sony+led+tv".
The minimum should match setting is set to: mm=2<65%.
So when there are more than two terms, at least 65% of the terms should 
match.

I'm not using the StopFilterFactory.

When turning on debug, this is the parsedquery_toString:

+(((categoryName_snf:sony^5.0 | name:sony^6.0 | productTypeName:sony^8.0 
| breadcrumbpath_snf:sony^2.0 | text:sony | title:sony^14.0 | 
salestext:sony | brand:sony^8.0 | salesText_snf:sony)~0.15 
(categoryName_snf:led^5.0 | name:led^6.0 | productTypeName:led^8.0 | 
breadcrumbpath_snf:led^2.0 | text:led | title:led^14.0 | salestext:led | 
brand:led^8.0 | salesText_snf:led)~0.15 (categoryName_snf:tv^5.0 | 
name:tv^6.0 | productTypeName:tv^8.0 | breadcrumbpath_snf:tv^2.0 | 
text:tv | title:tv^14.0 | salestext:tv | brand:tv^8.0 | 
salesText_snf:tv)~0.15)~1) (title:"sony led tv"~10)~0.15


While I expect that at least two terms should match because of the 65%, 
I'm also getting hits for documents which seem to match only one of 
the terms. Below is the explain output of a hit that shouldn't be there:


2.6449876 = sum of:
  2.6449876 = sum of:
    2.6449876 = max plus 0.15 times others of:
      2.6449876 = weight(text:led in 10143) [BM25Similarity], result of:
        2.6449876 = score(doc=10143,freq=1.0 = termFreq=1.0
        ), product of:
          2.6449876 = idf(docFreq=3254, maxDocs=45833)
          1.0 = tfNorm, computed from:
            1.0 = termFreq=1.0
            1.0 = parameter k1
            0.0 = parameter b (norms omitted for field)

When I change the mm to 2<67%, I get the number of results that I 
expected with 65%. But if I understand correctly, all three terms should 
then have to match (33.33% + 33.33% = 66.66%, which is less than 67%). 
Did I miss something, or is there something else that could affect the 
minimum should match setting?


Thanks in advance!

Ron


Re: minimum should match, can't explain the amount of hits

2015-12-16 Thread Ron van der Vegt

Thanks! This makes sense, I will change my configuration to 2<-35%

On 16-12-15 13:11, Binoy Dalal wrote:

The edismax documentation confirms that when a positive % value is
provided, Solr will round down. If you want Solr to round up, set your
parameter value to '-35%'.
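The rounding described in this thread can be checked with a small
sketch. This is not Solr's code: min_should_match is a hypothetical
helper written for illustration, and it assumes that the value computed
from a percentage is truncated toward zero, that a negative value gives
the number of clauses allowed to be missing, and that a conditional
spec such as 2<65% only applies when there are more than 2 optional
clauses.

```python
def min_should_match(clause_count: int, spec: str) -> int:
    """Hypothetical sketch: required number of optional clauses for an mm spec."""
    spec = spec.strip()
    required = clause_count  # default: every clause is required

    if "<" in spec:
        # Conditional spec(s), e.g. "2<65%": the part after "<" only
        # applies when there are more clauses than the bound before it.
        for part in spec.split():
            bound, _, rest = part.partition("<")
            if clause_count <= int(bound):
                return required
            required = min_should_match(clause_count, rest)
        return required

    if spec.endswith("%"):
        percent = int(spec[:-1])
        # int() truncates toward zero: 1.95 -> 1, -1.05 -> -1
        calc = int(clause_count * percent / 100)
        # Negative percentage: calc is the (negative) number of clauses
        # that may be missing, so add it to the total.
        required = clause_count + calc if percent < 0 else calc
    else:
        n = int(spec)
        required = clause_count + n if n < 0 else n

    # Clamp to the valid range 0..clause_count.
    return max(0, min(clause_count, required))


for spec in ("2<65%", "2<67%", "2<-35%"):
    print(spec, "->", min_should_match(3, spec))
```

Under these assumptions, the three terms of "sony led tv" give 1
required term for 2<65% (1.95 truncated), 2 for 2<67% (2.01 truncated)
and 2 for 2<-35% (1.05 truncated to 1 term that may be missing), which
matches the ~1 in the parsed query above.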

On Wed, 16 Dec 2015, 17:28 Binoy Dalal <binoydala...@gmail.com> wrote:


My guess is that Solr is rounding down while calculating the number of
mandatory terms.
In your case there are 3 terms. 65% of 3 is 1.95, which rounded down
is 1, but 67% of 3 is 2.01, which rounded down is 2. That agrees with the
results you're seeing.

Maybe someone else can confirm this.

On Wed, 16 Dec 2015, 16:56 Ron van der Vegt <ron.van.der.v...@openindex.io>
wrote:




--
Regards,
Binoy Dalal