Re: strange behavior of scores and term proximity use

2011-11-25 Thread Erick Erickson
You might try a less "fraught" search phrase;
"to be or not to be" is a classic query that may consist
entirely of stop words.
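
A quick way to check the stop-word point is to filter the phrase against a stop list. This is only a rough sketch: the list below is assumed to match Lucene's default English stop set, the helper name non_stop_terms is made up for illustration, and whether the ab_main_title_l0 field removes stop words at all depends on its analyzer in schema.xml, which is not shown in this thread.

```python
# Assumption: this mirrors Lucene's default English stop word set.
ENGLISH_STOP_WORDS = frozenset(
    "a an and are as at be but by for if in into is it no not of on or "
    "such that the their then there these they this to was will with".split()
)


def non_stop_terms(phrase, stop_words=ENGLISH_STOP_WORDS):
    """Return the lowercased, whitespace-split terms that are not stop words."""
    return [term for term in phrase.lower().split() if term not in stop_words]


# The bare phrase has no surviving terms; the anchored phrase keeps its two markers.
print(non_stop_terms("to be or not to be"))
print(non_stop_terms("og54ct8n to be or not to be 5w8ojsx2"))
```

If the first call returns an empty list, a field that strips stop words would have nothing left to match for the bare phrase.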

Otherwise, I'm clueless.



Re: strange behavior of scores and term proximity use

2011-11-23 Thread Ariel Zerbib
I tested with version 4.0-2011-11-04_09-29-42.

Ariel




Re: strange behavior of scores and term proximity use

2011-11-17 Thread Erick Erickson
Hmmm, I'm not seeing similar behavior on a trunk build from today.
When did you get your copy?

Erick



strange behavior of scores and term proximity use

2011-11-16 Thread Ariel Zerbib
Hi,

For this term proximity query: ab_main_title_l0:"to be or not to be"~1000

http://localhost:/solr/select?q=ab_main_title_l0%3A%22og54ct8n+to+be+or+not+to+be+5w8ojsx2%22~1000&sort=score+desc&start=0&rows=3&fl=ab_main_title_l0%2Cscore%2Cid&debugQuery=true
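
For reference, the query string of the request above can be rebuilt from its individual parameters. This is a minimal sketch using only the Python standard library; the parameter values are copied from the URL, and the host and port are deliberately left out.

```python
from urllib.parse import urlencode

# Parameters of the select request, in the same order as in the URL.
params = {
    "q": 'ab_main_title_l0:"og54ct8n to be or not to be 5w8ojsx2"~1000',
    "sort": "score desc",
    "start": 0,
    "rows": 3,
    "fl": "ab_main_title_l0,score,id",
    "debugQuery": "true",
}

# urlencode escapes ':' '"' and ',' and turns spaces into '+'.
query_string = urlencode(params)
print("/solr/select?" + query_string)
```

Note that the slop operator (~1000) sits outside the closing quote of the phrase, so it applies to the whole phrase.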

The first three results are the following:




responseHeader:
  status: 0
  QTime: 5

result:
  doc:
    id: 2315190010001021
    ab_main_title_l0:
      og54ct8n To be or not to be a Jew. 5w8ojsx2
    score: 3.0814114
  doc:
    id: 2313006480001021
    ab_main_title_l0:
      og54ct8n To be or not to be 5w8ojsx2
    score: 3.0814114
  doc:
    id: 2356410250001021
    ab_main_title_l0:
      og54ct8n Rumspringa : to be or not to be Amish / 5w8ojsx2
    score: 3.0814114

debug:
  rawquerystring: ab_main_title_l0:"og54ct8n to be or not to be 5w8ojsx2"~1000
  querystring: ab_main_title_l0:"og54ct8n to be or not to be 5w8ojsx2"~1000
  parsedquery: PhraseQuery(ab_main_title_l0:"og54ct8n to be or not to be 5w8ojsx2"~1000)
  parsedquery_toString: ab_main_title_l0:"og54ct8n to be or not to be 5w8ojsx2"~1000
  explain:

5.337161 = (MATCH) weight(ab_main_title_l0:"og54ct8n to be or not to be 5w8ojsx2"~1000 in 378403) [DefaultSimilarity], result of:
  5.337161 = fieldWeight in 378403, product of:
    0.57735026 = tf(freq=0.3334), with freq of:
      0.3334 = phraseFreq=0.3334
    29.581549 = idf(), sum of:
      1.0012436 = idf(docFreq=3297332, maxDocs=3301436)
      3.0405464 = idf(docFreq=429046, maxDocs=3301436)
      5.3583193 = idf(docFreq=42257, maxDocs=3301436)
      4.3826413 = idf(docFreq=112108, maxDocs=3301436)
      6.3982043 = idf(docFreq=14937, maxDocs=3301436)
      3.0405464 = idf(docFreq=429046, maxDocs=3301436)
      5.3583193 = idf(docFreq=42257, maxDocs=3301436)
      1.0017256 = idf(docFreq=3295743, maxDocs=3301436)
    0.3125 = fieldNorm(doc=378403)

9.244234 = (MATCH) weight(ab_main_title_l0:"og54ct8n to be or not to be 5w8ojsx2"~1000 in 482807) [DefaultSimilarity], result of:
  9.244234 = fieldWeight in 482807, product of:
    1.0 = tf(freq=1.0), with freq of:
      1.0 = phraseFreq=1.0
    29.581549 = idf(), sum of:
      1.0012436 = idf(docFreq=3297332, maxDocs=3301436)
      3.0405464 = idf(docFreq=429046, maxDocs=3301436)
      5.3583193 = idf(docFreq=42257, maxDocs=3301436)
      4.3826413 = idf(docFreq=112108, maxDocs=3301436)
      6.3982043 = idf(docFreq=14937, maxDocs=3301436)
      3.0405464 = idf(docFreq=429046, maxDocs=3301436)
      5.3583193 = idf(docFreq=42257, maxDocs=3301436)
      1.0017256 = idf(docFreq=3295743, maxDocs=3301436)
    0.3125 = fieldNorm(doc=482807)

5.337161 = (MATCH) weight(ab_main_title_l0:"og54ct8n to be or not to be 5w8ojsx2"~1000 in 1317563) [DefaultSimilarity], result of:
  5.337161 = fieldWeight in 1317563, product of:
    0.57735026 = tf(freq=0.3334), with freq of:
      0.3334 = phraseFreq=0.3334
    29.581549 = idf(), sum of:
      1.0012436 = idf(docFreq=3297332, maxDocs=3301436)
      3.0405464 = idf(docFreq=429046, maxDocs=3301436)
      5.3583193 = idf(docFreq=42257, maxDocs=3301436)
      4.3826413 = idf(docFreq=112108, maxDocs=3301436)
      6.3982043 = idf(docFreq=14937, maxDocs=3301436)
      3.0405464 = idf(docFreq=429046, maxDocs=3301436)
      5.3583193 = idf(docFreq=42257, maxDocs=3301436)
      1.0017256 = idf(docFreq=3295743, maxDocs=3301436)
    0.3125 = fieldNorm(doc=1317563)
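
The numbers in the explain output above can be re-derived by hand. The sketch below assumes the classic DefaultSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), sloppy phrase frequency = 1 / (distance + 1)) and assumes the two lower-scoring documents match at a distance of 2, which is what a phraseFreq of about one third would suggest. The function names are illustrative only; they are not Lucene API calls.

```python
import math

# Values copied from the explain output, in the order listed there.
MAX_DOCS = 3301436
DOC_FREQS = [3297332, 429046, 42257, 112108, 14937, 429046, 42257, 3295743]
FIELD_NORM = 0.3125


def idf(doc_freq, max_docs=MAX_DOCS):
    """Assumed classic idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def tf(freq):
    """Assumed classic tf: square root of the (phrase) frequency."""
    return math.sqrt(freq)


def sloppy_freq(distance):
    """Assumed sloppy phrase frequency: 1 / (distance + 1)."""
    return 1.0 / (distance + 1)


def field_weight(phrase_freq, doc_freqs=DOC_FREQS, field_norm=FIELD_NORM):
    """fieldWeight = tf * (sum of the term idfs) * fieldNorm, as in the explain tree."""
    idf_sum = sum(idf(df) for df in doc_freqs)
    return tf(phrase_freq) * idf_sum * field_norm


exact = field_weight(sloppy_freq(0))   # exact phrase match
sloppy = field_weight(sloppy_freq(2))  # assumed match distance of 2
print(f"exact phrase match:  {exact:.4f}")
print(f"match at distance 2: {sloppy:.4f}")
```

Under these assumptions the two values come out at roughly 9.244 and 5.337, which agrees with the explain entries but not with the 3.0814114 reported for every document in the result list.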



The version used is a 4.0 snapshot from October.

I have 2 questions about the result:
- Why are the scores in the debug output different from the scores in
the results?
- What is the expected behavior of this kind of term proximity query?
  - The debug scores seem to be correctly ordered, but the result scores
seem to be wrong.


Thanks,
Ariel