Re: need help understanding an issue with scoring
: What is your query and qf?

FYI: these are both included in the original message (which was also quoted in the reply below).

As Jack points out, the difference in score comes from the difference in which fields are matched on. Your high scoring example doc matches on *both* the itemNo and itemNoExactMatchStr fields, but your low scoring example doc matches only on the itemNo field. And you have a (relatively) huge boost on the itemNoExactMatchStr field compared to itemNo.

These queries are fairly simple, so the explain output isn't very complicated, and it's easy to see from the match -- but it may help to prune out some of the small details, and just look at the top level calculations...

: <str name="9030,0046,046">
: 12.014634 = (MATCH) max of:
:   0.20737723 = (MATCH) weight(itemNo:9030^0.9 in 2308681), product of:
:   12.014634 = (MATCH) fieldWeight(itemNoExactMatchStr:9030 in 2308681),
: </str>

...vs...

: <str name="90302 ,0046,046">
: 0.20737723 = (MATCH) max of:
:   0.20737723 = (MATCH) weight(itemNo:9030^0.9 in 1796597), product of:
: </str>

You specified a huge boost on itemNoExactMatchStr, so the doc that matches on that field is going to score a lot higher than the doc that only matches on itemNo...

: <str name="qf">itemNoExactMatchStr^30 itemNo^.9 divProductTypeDesc^.8
: brand^.5</str>

-Hoss
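To make the top level calculation concrete, here is a minimal Python sketch (not Solr code) that plugs in the per-field numbers from the explain output quoted above and takes the maximum, as the "max of:" line does. The dict layout and the helper name top_level_score are invented for illustration only.

```python
# Per-field scores copied from the explain output for each document.
# The dict layout and the helper name are illustrative, not a Solr API.
doc_9030 = {
    "itemNo": 0.20737723,
    "itemNoExactMatchStr": 12.014634,
}
doc_90302 = {
    "itemNo": 0.20737723,  # no match on itemNoExactMatchStr
}


def top_level_score(field_scores):
    """Mimic the 'max of:' line: the best-scoring field wins."""
    return max(field_scores.values())


high = top_level_score(doc_9030)
low = top_level_score(doc_90302)

# The ratio shows how far apart the two documents end up.
print(high, low, round(high / low, 1))
```

Because only the best field counts here, the itemNo match contributes nothing extra to the 9030 doc, and the 90302 doc has nothing but the itemNo match to offer.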
Re: need help understanding an issue with scoring
Chris, Jack,

thank you for the detailed replies and help ;)

--
View this message in context: http://lucene.472066.n3.nabble.com/need-help-understanding-an-issue-with-scoring-tp4002897p4003782.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: need help understanding an issue with scoring
update:

as an experiment - i changed the query to a wildcard (9030*) instead of an explicit value (9030)

example:

QUERY="http://$SERVER.intra.searshc.com:${PORT}/solrpartscat/core1/select?qt=itemNoProductTypeBrandSearch&q=9030*&rows=2000&debugQuery=on&fl=*,score";

this resulted in a results list that appears much more rational from a sort order perspective - however - the wildcard query is not acceptable from a performance standpoint.

any input or illumination would be appreciated ;)

thank you

itemNo, score, rankNo, partCnt
[9030],1.0,10353,1
[90302 ],1.0,6849,1
[9030P ],1.0,444,1
[903093 ],1.0,51,1
[9030430 ],1.0,47,1
[9030],1.0,37,1
[903057-9010 ],1.0,26,1
[903061-9010 ],1.0,20,1
[903046-9010 ],1.0,18,1
[903056-9010 ],1.0,14,1
[903095 ],1.0,14,1
[90303-MR1-000 ],1.0,14,1
[903097-9050 ],1.0,12,1
[903046-9011 ],1.0,12,1
[903097-9010 ],1.0,11,1
[903097-9040 ],1.0,11,1
[903063-9100 ],1.0,6,1
[903066-9011 ],1.0,6,1
[903098 ],1.0,3,1
Re: need help understanding an issue with scoring
looks like the original complete list of the results did not get attached to this thread.

here is a snippet of the list.

what i am trying to demonstrate is the difference in scoring and, ultimately, sorting - and the breadth of documents (a few hundred) between the two documents of interest (9030 and 90302)

thank you,

itemNo, score, rankNo, partCnt
[9030],12.014701,10353,1
[9030],12.014701,37,1
[9030],12.014701,1,1
[9030 ],12.014701,0,167
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[PC-9030],7.509188,0,169
[58-9030 ],7.509188,0,1
[9030-1R ],7.509188,0,1
[903028-9030 ],7.509188,0,1
[903139-9030 ],7.509188,0,1
[903091-9030 ],7.509188,0,1
[903099-9030 ],7.509188,0,1
[903153-9030 ],7.509188,0,1
[031-9030],7.509188,0,1
[308-9030],7.509188,0,1
[9030-6010 ],7.509188,0,1
[9030-6010 ],7.509188,0,1
[9030-6006 ],7.509188,0,1
[9030-6008 ],7.509188,0,1
[9030-6008 ],7.509188,0,1
[9030-6001 ],7.509188,0,1
[9030-6003 ],7.509188,0,1
[9030-6006 ],7.509188,0,1
[208568-9030 ],7.509188,0,1
[79-9030 ],7.509188,0,1
[33-9030 ],7.509188,0,1
[M-9030 ],7.509188,0,1
... a few hundred more ...
[LGQ9030PQ1 ],0.41475832,0,150
[LEQ9030PQ0 ],0.41475832,0,124
[LEQ9030PQ1 ],0.41475832,0,123
[CWE9030BCE ],0.41475832,0,115
[PJDS9030Z ],0.29327843,0,1
[8A-CT9-030-010 ],0.29327843,0,1
[RDT9030A],0.29327843,0,1
[PJDG9030Z ],0.29327843,0,1
[90302 ],0.20737916,6849,1
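A small Python sketch of why rankNo does not lift 90302: with a sort of "score desc, rankNo desc, partCnt desc", rankNo is only consulted when two scores are identical. The rows are copied from the list above; the tuple layout and the sort_key helper are assumptions made for illustration, not anything Solr exposes.

```python
# Rows copied from the list above: (itemNo, score, rankNo, partCnt).
rows = [
    ("90302", 0.20737916, 6849, 1),
    ("PC-9030", 7.509188, 0, 169),
    ("9030", 12.014701, 37, 1),
    ("LGQ9030PQ1", 0.41475832, 0, 150),
    ("9030", 12.014701, 10353, 1),
]


def sort_key(row):
    """Emulate 'score desc, rankNo desc, partCnt desc' by negating each value."""
    _item_no, score, rank_no, part_cnt = row
    return (-score, -rank_no, -part_cnt)


ordered = sorted(rows, key=sort_key)
for item_no, score, rank_no, part_cnt in ordered:
    print(item_no, score, rank_no, part_cnt)
```

The two 9030 rows share a score, so rankNo decides their relative order. 90302 has the lowest score of the sample, so its rankNo of 6849 never gets a chance to matter.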
need help understanding an issue with scoring
hello,

i am trying to understand the debug output from a query, and specifically - how scores for two (2) documents are derived and why they are so far apart.

the user is entering 9030 for the search.

the search is rightfully returning the top document, however - the question is why is the document with id 90302 so far down on the list.

i have attached a text file i generated with xslt, pulling the document information. the text file has the itemNo, the rankNo and the partCnt.

the sort order of the request handler is:

<str name="sort">score desc, rankNo desc, partCnt desc</str>

if you look at the text file - you will see that 90302 is 174th on the list! 90302 has a rankNo of 6849 - and i would think that would drive it much higher on the list and therefore much closer to 9030.

what is happening from a business perspective - is - 9030 is one of our top selling parts as is 90302. they need to be closer together in the results instead of separated by 170+ documents that have a rankNo of 0.

i have also copied and pasted the request handler that is being used - below.

can someone help me understand the scoring so i can correct this?
this is the scoring for the two documents:

<str name="9030,0046,046">
12.014634 = (MATCH) max of:
  0.20737723 = (MATCH) weight(itemNo:9030^0.9 in 2308681), product of:
    0.022755474 = queryWeight(itemNo:9030^0.9), product of:
      0.9 = boost
      9.11329 = idf(docFreq=2565, maxDocs=8566704)
      0.0027743944 = queryNorm
    9.11329 = (MATCH) fieldWeight(itemNo:9030 in 2308681), product of:
      1.0 = tf(termFreq(itemNo:9030)=1)
      9.11329 = idf(docFreq=2565, maxDocs=8566704)
      1.0 = fieldNorm(field=itemNo, doc=2308681)
  12.014634 = (MATCH) fieldWeight(itemNoExactMatchStr:9030 in 2308681), product of:
    1.0 = tf(termFreq(itemNoExactMatchStr:9030)=1)
    12.014634 = idf(docFreq=140, maxDocs=8566704)
    1.0 = fieldNorm(field=itemNoExactMatchStr, doc=2308681)
</str>

<str name="90302 ,0046,046">
0.20737723 = (MATCH) max of:
  0.20737723 = (MATCH) weight(itemNo:9030^0.9 in 1796597), product of:
    0.022755474 = queryWeight(itemNo:9030^0.9), product of:
      0.9 = boost
      9.11329 = idf(docFreq=2565, maxDocs=8566704)
      0.0027743944 = queryNorm
    9.11329 = (MATCH) fieldWeight(itemNo:9030 in 1796597), product of:
      1.0 = tf(termFreq(itemNo:9030)=1)
      9.11329 = idf(docFreq=2565, maxDocs=8566704)
      1.0 = fieldNorm(field=itemNo, doc=1796597)
</str>

<requestHandler name="itemNoProductTypeBrandSearch" class="solr.SearchHandler" default="false">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="echoParams">all</str>
    <int name="rows">10</int>
    <str name="qf">itemNoExactMatchStr^30 itemNo^.9 divProductTypeDesc^.8 brand^.5</str>
    <str name="q.alt">*:*</str>
    <str name="sort">score desc, rankNo desc, partCnt desc</str>
    <str name="facet">true</str>
    <str name="facet.field">itemDescFacet</str>
    <str name="facet.field">brandFacet</str>
    <str name="facet.field">divProductTypeIdFacet</str>
  </lst>
  <lst name="appends">
  </lst>
  <lst name="invariants">
  </lst>
</requestHandler>

thank you for any help
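The arithmetic in the explain output can be re-derived by hand. The Python sketch below assumes the classic Lucene TF-IDF similarity, where idf = 1 + ln(maxDocs / (docFreq + 1)); that the index uses the default similarity is an assumption, not something stated in the message. The function and variable names are made up for this example, and queryNorm is copied from the output rather than computed.

```python
import math

MAX_DOCS = 8566704  # maxDocs from the explain output


def classic_idf(doc_freq, max_docs=MAX_DOCS):
    """Classic Lucene TF-IDF idf (assumed): 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


idf_item_no = classic_idf(2565)  # explain shows 9.11329
idf_exact = classic_idf(140)     # explain shows 12.014634

query_norm = 0.0027743944        # copied from the explain output
boost = 0.9                      # itemNo^.9 from qf

query_weight = boost * idf_item_no * query_norm  # explain shows 0.022755474
field_weight = 1.0 * idf_item_no * 1.0           # tf * idf * fieldNorm
item_no_score = query_weight * field_weight      # explain shows 0.20737723

# Observation from the numbers only: the ^30 boost times the exact-match idf
# times queryNorm comes out at roughly 1.0.
exact_query_weight = 30 * idf_exact * query_norm

print(idf_item_no, idf_exact, item_no_score, exact_query_weight)
```

If that last product really is the query weight for itemNoExactMatchStr, it would be consistent with the explain output listing only a fieldWeight line for that field, but that reading is an inference from the numbers rather than something the output states.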
Re: need help understanding an issue with scoring
What is your query and qf?

The first doc gets its high score due to a match on the itemNoExactMatchStr field, which the second doc doesn't have:

12.014634 = (MATCH) fieldWeight(itemNoExactMatchStr:9030 in 2308681),

With a low document frequency (which inverts to a high inverse document frequency):

12.014634 = idf(docFreq=140, maxDocs=8566704)

-- Jack Krupansky

-----Original Message-----
From: geeky2
Sent: Thursday, August 23, 2012 11:44 AM
To: solr-user@lucene.apache.org
Subject: need help understanding an issue with scoring
Re: need help understanding an issue with scoring
hello,

this is the query i am using:

cat goquery.sh

#!/bin/bash

SERVER=$1
PORT=$2

QUERY="http://$SERVER.blah.blah.com:${PORT}/solrpartscat/core1/select?qt=itemNoProductTypeBrandSearch&q=9030&rows=2000&debugQuery=on&fl=*,score";

# quote the URL so the shell does not interpret & or expand *
curl -v "$QUERY"
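For anyone scripting the same request outside of bash, here is a hedged Python sketch that builds the equivalent URL with the standard library so the parameters are separated and encoded automatically. The function name build_select_url and the placeholder host and port are hypothetical; the path and parameter values are the ones from the script above.

```python
from urllib.parse import urlencode


def build_select_url(server, port, query):
    """Build the /select URL with properly separated and encoded parameters."""
    params = {
        "qt": "itemNoProductTypeBrandSearch",
        "q": query,
        "rows": 2000,
        "debugQuery": "on",
        "fl": "*,score",
    }
    base = f"http://{server}:{port}/solrpartscat/core1/select"
    return f"{base}?{urlencode(params)}"


# "solr.example.com" and 8983 are placeholder values, not from the thread.
url = build_select_url("solr.example.com", 8983, "9030")
print(url)
```

Note that urlencode percent-encodes the "*" and "," in the fl value, which a server decodes back to "*,score".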