Re: need help understanding an issue with scoring

2012-08-28 Thread Chris Hostetter

: What is your query and qf?

FYI: these are both included in the original message (which was also 
quoted in the reply below)

As Jack points out, the difference in score comes from the difference 
in which fields are matched on.


Your high scoring example doc matches on *both* the 
itemNo and itemNoExactMatchStr fields, but your low scoring example doc 
matches only on the itemNo field.  And you have a (relatively) huge boost 
on the itemNoExactMatchStr field compared to itemNo.

These queries are fairly simple, so the explain output isn't very 
complicated, and it's easy to see from the match -- but it may help to 
prune out some of the small details, and just look at the top level 
calculations...

:  <str name="9030,0046,046">
: 12.014634 = (MATCH) max of:
:  0.20737723 = (MATCH) weight(itemNo:9030^0.9 in 2308681), product of:
:  12.014634 = (MATCH) fieldWeight(itemNoExactMatchStr:9030 in 2308681),
: </str>

...vs...

:  <str name="90302   ,0046,046">
: 0.20737723 = (MATCH) max of:
:  0.20737723 = (MATCH) weight(itemNo:9030^0.9 in 1796597), product of:
: </str>

You specified a huge boost on itemNoExactMatchStr, so the doc that matches 
on that field is going to score a lot higher than the doc that only 
matches on itemNo...

:  <str name="qf">itemNoExactMatchStr^30 itemNo^.9 divProductTypeDesc^.8
: brand^.5</str>
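
To make the top-level arithmetic concrete, here is a rough Python sketch (not Solr code) that rebuilds the itemNo clause from the factors shown in the explain output and then takes the max across the matching fields. The function names are made up for illustration, the itemNoExactMatchStr value is copied from the explain output rather than recomputed, and tie=0 is an assumption based on the explain saying "max of" and the handler not setting a tie parameter.

```python
def clause_weight(boost, idf, query_norm, tf, field_norm):
    """Rebuild one explain clause: queryWeight * fieldWeight."""
    query_weight = boost * idf * query_norm   # "queryWeight(...), product of:"
    field_weight = tf * idf * field_norm      # "fieldWeight(...), product of:"
    return query_weight * field_weight


def dismax(scores, tie=0.0):
    """Best clause score plus tie times the remaining clause scores."""
    best = max(scores)
    return best + tie * (sum(scores) - best)


# Factors copied from the explain output for itemNo:9030^0.9
item_no = clause_weight(boost=0.9, idf=9.11329, query_norm=0.0027743944,
                        tf=1.0, field_norm=1.0)

# Value copied as reported for itemNoExactMatchStr (not recomputed here)
exact_match = 12.014634

high_doc = dismax([item_no, exact_match])  # matches both fields
low_doc = dismax([item_no])                # matches itemNo only

print(high_doc, low_doc)
```

With these inputs the doc that matches both fields keeps the itemNoExactMatchStr score, while the doc that matches only itemNo is left with the small itemNo score, which is the gap seen in the results.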


-Hoss


Re: need help understanding an issue with scoring

2012-08-28 Thread geeky2
Chris, Jack,

thank you for the detailed replies and help ;)






--
View this message in context: 
http://lucene.472066.n3.nabble.com/need-help-understanding-an-issue-with-scoring-tp4002897p4003782.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: need help understanding an issue with scoring

2012-08-23 Thread geeky2
update:

as an experiment - i changed the query to a wildcard (9030*) instead of an
explicit value (9030)

example:

QUERY="http://$SERVER.intra.searshc.com:${PORT}/solrpartscat/core1/select?qt=itemNoProductTypeBrandSearch&q=9030*&rows=2000&debugQuery=on&fl=*,score"

this resulted in a results list that appears much more rational from a sort
order perspective -

however - the wildcard query is not acceptable from a performance
standpoint.

any input or illumination would be appreciated ;)

thank you

itemNo, score, rankNo, partCnt

[9030],1.0,10353,1
[90302   ],1.0,6849,1
[9030P   ],1.0,444,1
[903093  ],1.0,51,1
[9030430 ],1.0,47,1
[9030],1.0,37,1
[903057-9010 ],1.0,26,1
[903061-9010 ],1.0,20,1
[903046-9010 ],1.0,18,1
[903056-9010 ],1.0,14,1
[903095  ],1.0,14,1
[90303-MR1-000   ],1.0,14,1
[903097-9050 ],1.0,12,1
[903046-9011 ],1.0,12,1
[903097-9010 ],1.0,11,1
[903097-9040 ],1.0,11,1
[903063-9100 ],1.0,6,1
[903066-9011 ],1.0,6,1
[903098  ],1.0,3,1






Re: need help understanding an issue with scoring

2012-08-23 Thread geeky2
looks like the original complete list of the results did not get attached to
this thread 

here is a snippet of the list.

what i am trying to demonstrate, is the difference in scoring and
ultimately, sorting - and the breadth of documents (a few hundred) between
the two documents of interest (9030 and 90302)

thank you,

itemNo, score, rankNo, partCnt

[9030],12.014701,10353,1
[9030],12.014701,37,1
[9030],12.014701,1,1
[9030   ],12.014701,0,167
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[PC-9030],7.509188,0,169
[58-9030 ],7.509188,0,1
[9030-1R ],7.509188,0,1
[903028-9030 ],7.509188,0,1
[903139-9030 ],7.509188,0,1
[903091-9030 ],7.509188,0,1
[903099-9030 ],7.509188,0,1
[903153-9030 ],7.509188,0,1
[031-9030],7.509188,0,1
[308-9030],7.509188,0,1
[9030-6010   ],7.509188,0,1
[9030-6010   ],7.509188,0,1
[9030-6006   ],7.509188,0,1
[9030-6008   ],7.509188,0,1
[9030-6008   ],7.509188,0,1
[9030-6001   ],7.509188,0,1
[9030-6003   ],7.509188,0,1
[9030-6006   ],7.509188,0,1
[208568-9030 ],7.509188,0,1
[79-9030 ],7.509188,0,1
[33-9030 ],7.509188,0,1
[M-9030  ],7.509188,0,1

... a few hundred more ...

[LGQ9030PQ1 ],0.41475832,0,150
[LEQ9030PQ0 ],0.41475832,0,124
[LEQ9030PQ1 ],0.41475832,0,123
[CWE9030BCE ],0.41475832,0,115
[PJDS9030Z   ],0.29327843,0,1
[8A-CT9-030-010  ],0.29327843,0,1
[RDT9030A],0.29327843,0,1
[PJDG9030Z   ],0.29327843,0,1
[90302   ],0.20737916,6849,1





need help understanding an issue with scoring

2012-08-23 Thread geeky2
hello,

i am trying to understand the debug output from a query, and specifically
- how scores for two (2) documents are derived and why they are so far
apart.

the user is entering 9030 for the search

the search is rightfully returning the top document, however - the question
is why is the document with id 90302 so far down on the list.  

i have attached a text file i generated with xslt, pulling the document
information.  the text file has the itemNo, the rankNo and the partCnt.  the
sort order of the response handler is:

  <str name="sort">score desc, rankNo desc, partCnt desc</str>
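
For reference, a sort spec like this compares score first and only falls back to rankNo (and then partCnt) when scores are tied. A small Python sketch of that ordering, using a few rows taken from the results posted in this thread (the tuple layout and the sort_key helper are just for illustration, not anything Solr exposes):

```python
# (itemNo, score, rankNo, partCnt), values copied from the posted results
docs = [
    ("90302", 0.20737916, 6849, 1),
    ("PC-9030", 7.509188, 0, 169),
    ("9030", 12.014701, 10353, 1),
    ("58-9030", 7.509188, 0, 1),
]


def sort_key(doc):
    """Mimic 'score desc, rankNo desc, partCnt desc' with a tuple key."""
    _, score, rank_no, part_cnt = doc
    return (-score, -rank_no, -part_cnt)


ranked = sorted(docs, key=sort_key)
order = [item_no for item_no, _, _, _ in ranked]
print(order)
```

Here 90302 ends up last even though its rankNo is 6849, because rankNo is only consulted between documents whose scores are equal.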



if you look at the text file - you will see that 90302 is 174th on the
list!  90302 has a rankNo of 6849 - and i would think that would drive it
much higher on the list and therefore much closer to 9030.

what is happening from a business perspective - is - 9030 is one of our top
selling parts as is 90302.  they need to be closer together in the results
instead of separated by 170+ documents that have a rankNo of 0.

i have also copied and pasted the request handler that is being used - below

can someone help me understand the scoring so i can correct this?

this is the scoring for the two documents:

  <str name="9030,0046,046">
12.014634 = (MATCH) max of:
  0.20737723 = (MATCH) weight(itemNo:9030^0.9 in 2308681), product of:
    0.022755474 = queryWeight(itemNo:9030^0.9), product of:
      0.9 = boost
      9.11329 = idf(docFreq=2565, maxDocs=8566704)
      0.0027743944 = queryNorm
    9.11329 = (MATCH) fieldWeight(itemNo:9030 in 2308681), product of:
      1.0 = tf(termFreq(itemNo:9030)=1)
      9.11329 = idf(docFreq=2565, maxDocs=8566704)
      1.0 = fieldNorm(field=itemNo, doc=2308681)
  12.014634 = (MATCH) fieldWeight(itemNoExactMatchStr:9030 in 2308681), product of:
    1.0 = tf(termFreq(itemNoExactMatchStr:9030)=1)
    12.014634 = idf(docFreq=140, maxDocs=8566704)
    1.0 = fieldNorm(field=itemNoExactMatchStr, doc=2308681)
</str>




  <str name="90302   ,0046,046">
0.20737723 = (MATCH) max of:
  0.20737723 = (MATCH) weight(itemNo:9030^0.9 in 1796597), product of:
    0.022755474 = queryWeight(itemNo:9030^0.9), product of:
      0.9 = boost
      9.11329 = idf(docFreq=2565, maxDocs=8566704)
      0.0027743944 = queryNorm
    9.11329 = (MATCH) fieldWeight(itemNo:9030 in 1796597), product of:
      1.0 = tf(termFreq(itemNo:9030)=1)
      9.11329 = idf(docFreq=2565, maxDocs=8566704)
      1.0 = fieldNorm(field=itemNo, doc=1796597)
</str>


  <requestHandler name="itemNoProductTypeBrandSearch" class="solr.SearchHandler" default="false">
    <lst name="defaults">
      <str name="defType">edismax</str>
      <str name="echoParams">all</str>
      <int name="rows">10</int>
      <str name="qf">itemNoExactMatchStr^30 itemNo^.9 divProductTypeDesc^.8 brand^.5</str>
      <str name="q.alt">*:*</str>
      <str name="sort">score desc, rankNo desc, partCnt desc</str>
      <str name="facet">true</str>
      <str name="facet.field">itemDescFacet</str>
      <str name="facet.field">brandFacet</str>
      <str name="facet.field">divProductTypeIdFacet</str>
    </lst>
    <lst name="appends">
    </lst>
    <lst name="invariants">
    </lst>
  </requestHandler>
 
thank you for any help






Re: need help understanding an issue with scoring

2012-08-23 Thread Jack Krupansky

What is your query and qf?

The first doc gets its high score due to a match on the 
itemNoExactMatchStr field which the second doc doesn't have:


12.014634 = (MATCH) fieldWeight(itemNoExactMatchStr:9030 in 2308681),

With a low document frequency (inverts to high inverse document frequency):

12.014634 = idf(docFreq=140, maxDocs=8566704)
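
As a quick check on those idf numbers, here is a small Python sketch using the classic Lucene TF-IDF formula, idf = 1 + ln(maxDocs / (docFreq + 1)). It assumes the index uses the default similarity, which the thread does not state explicitly; the function name is just for illustration.

```python
import math


def idf(doc_freq, max_docs):
    """Classic Lucene TF-IDF idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


MAX_DOCS = 8566704  # maxDocs from the explain output

idf_exact = idf(140, MAX_DOCS)     # itemNoExactMatchStr:9030, docFreq=140
idf_item_no = idf(2565, MAX_DOCS)  # itemNo:9030, docFreq=2565

print(idf_exact, idf_item_no)
```

Both values agree with the explain output to about four decimal places: the term that appears in only 140 docs gets the higher idf.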

-- Jack Krupansky




Re: need help understanding an issue with scoring

2012-08-23 Thread geeky2
hello,


this is the query i am using:

 cat goquery.sh
#!/bin/bash

SERVER=$1
PORT=$2


# quote the URL so the shell does not interpret '&' or '*'
QUERY="http://$SERVER.blah.blah.com:${PORT}/solrpartscat/core1/select?qt=itemNoProductTypeBrandSearch&q=9030&rows=2000&debugQuery=on&fl=*,score"

curl -v "$QUERY"
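
If it helps to avoid quoting problems with '&' and '*', the same request URL can also be assembled programmatically. A minimal Python sketch, assuming the same handler and parameters as the script above; the build_query_url helper, the "myhost" server name and port 8983 are placeholders, not values from this thread:

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_query_url(server, port, q):
    """Assemble the select URL with properly encoded query parameters."""
    params = {
        "qt": "itemNoProductTypeBrandSearch",
        "q": q,
        "rows": 2000,
        "debugQuery": "on",
        "fl": "*,score",
    }
    base = f"http://{server}.blah.blah.com:{port}/solrpartscat/core1/select"
    return f"{base}?{urlencode(params)}"


# "myhost" and 8983 are placeholder values for illustration only
url = build_query_url("myhost", 8983, "9030")
decoded = parse_qs(urlsplit(url).query)
print(url)
```

urlencode percent-encodes characters such as '*' and ',', so the URL can be passed to curl or an HTTP client without extra shell quoting concerns.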



