Re: please help explaining debug output
The numbers on the idf line (laser=26731 jet=12685) are the frequency of each term in that field for the entire index, not for the specific document. So I believe it means that the terms are in that field for some document somewhere, but the phrase was not found in that particular document (hence tf(phraseFreq=0.0))...

Which leads me to wonder if the document is getting indexed as you expect, although there's nothing in the data that you've provided that I can point to as the culprit. It all looks like it *should* work.

If you can get a copy of Luke and look at the document in question, and/or look at the schema browser for that particular field, it might help. But frankly I'm at a loss to understand what the problem is...

Sorry I can't be of more help.

Erick

On Tue, Jul 26, 2011 at 1:04 PM, Robert Petersen rober...@buy.com wrote:

That didn't help. This seems like another case where I should get matches but don't, and this time it is only for some documents. Others with similar content match just fine. The 'explain other' section of the debug output for a non-matching document seems to say the term frequency is 0 for my problematic term, although I know it is in the content.

I ended up making a synonym to do what the analysis stack *should* be doing: splitting LaserJet on case changes. I.e., putting LaserJet, laser jet in synonyms at index time makes this work. I don't know why, though.

Question: does this debug output mean it is matching the terms, but the term frequency vector is returning 0 for the frequency of this term? I.e., does this mean the term is in the doc but not in the tf array?

    0.0 = no match on required clause (moreWords:laser jet)
      0.0 = weight(moreWords:laser jet in 32497), product of:
        0.60590804 = queryWeight(moreWords:laser jet), product of:
          14.597603 = idf(moreWords: laser=26731 jet=12685)
          0.041507367 = queryNorm
        0.0 = fieldWeight(moreWords:laser jet in 32497), product of:
          0.0 = tf(phraseFreq=0.0)
          14.597603 = idf(moreWords: laser=26731 jet=12685)
          0.078125 = fieldNorm(field=moreWords, doc=32497)
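The difference between the index-wide counts on the idf line and the per-document phraseFreq can be sketched with a toy in-memory index. The token streams below are hypothetical and are not taken from the real index; `doc_unsplit` only illustrates one way a document could contain the term "laser" while never containing the phrase "laser jet".

```python
# Hypothetical token streams for the moreWords field (assumption, not real index data).
docs = {
    "doc_split": ["hp", "laser", "jet", "p", "1102", "w", "monochrome", "laser", "printer"],
    "doc_unsplit": ["hp", "laserjet", "p", "1102", "w", "monochrome", "laser", "printer"],
    "doc_other": ["ink", "jet", "cartridge"],
}


def doc_freq(term):
    """Index-wide count: number of documents that contain the term at least once."""
    return sum(1 for tokens in docs.values() if term in tokens)


def phrase_freq(tokens, phrase):
    """Per-document count: number of places where the phrase terms are adjacent and in order."""
    n = len(phrase)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase)


phrase = ["laser", "jet"]

# Document frequencies are the same no matter which document is being explained.
print({term: doc_freq(term) for term in phrase})

# The phrase frequency is computed per document and can be zero even when
# every individual term has a non-zero document frequency.
for name, tokens in docs.items():
    print(name, phrase_freq(tokens, phrase))
```

In this sketch `doc_unsplit` gets a phrase frequency of zero, which is the same shape as the tf(phraseFreq=0.0) line in the explain output, while the document frequencies stay non-zero.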
RE: please help explaining debug output
That didn't help. This seems like another case where I should get matches but don't, and this time it is only for some documents. Others with similar content match just fine. The 'explain other' section of the debug output for a non-matching document seems to say the term frequency is 0 for my problematic term, although I know it is in the content.

I ended up making a synonym to do what the analysis stack *should* be doing: splitting LaserJet on case changes. I.e., putting LaserJet, laser jet in synonyms at index time makes this work. I don't know why, though.

Question: does this debug output mean it is matching the terms, but the term frequency vector is returning 0 for the frequency of this term? I.e., does this mean the term is in the doc but not in the tf array?

    0.0 = no match on required clause (moreWords:laser jet)
      0.0 = weight(moreWords:laser jet in 32497), product of:
        0.60590804 = queryWeight(moreWords:laser jet), product of:
          14.597603 = idf(moreWords: laser=26731 jet=12685)
          0.041507367 = queryNorm
        0.0 = fieldWeight(moreWords:laser jet in 32497), product of:
          0.0 = tf(phraseFreq=0.0)
          14.597603 = idf(moreWords: laser=26731 jet=12685)
          0.078125 = fieldNorm(field=moreWords, doc=32497)

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Monday, July 25, 2011 3:28 PM
To: solr-user@lucene.apache.org
Subject: Re: please help explaining debug output

Hmmm, I can't find a convenient 1.4.0 to download, but re-indexing is a good idea since this seems like it *should* work.

Erick
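The splitting Robert expects from the analysis stack can be approximated as below. This is a simplified stand-in written for illustration, assuming splits on lower-to-upper case changes and on letter/digit boundaries followed by lowercasing; it is not Solr's word delimiter filter, and the function names are hypothetical.

```python
import re

# Rough approximation (assumption): runs of capitals not followed by a lowercase
# letter, capitalized or lowercase words, and runs of digits each become a part.
_PARTS = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def split_word(token):
    """Split one whitespace-delimited token into lowercased sub-words."""
    return [part.lower() for part in _PARTS.findall(token)]


def analyze(title):
    """Tokenize a product title on whitespace, then split each token."""
    tokens = []
    for word in title.split():
        tokens.extend(split_word(word))
    return tokens


print(split_word("LaserJet"))
print(split_word("P1102W"))
print(analyze("HP LaserJet P1102W Monochrome Laser Printer"))
```

If the indexed tokens for the non-matching document looked like this, the phrase laser jet would be found. The index-time synonym entry achieves a similar effect by a different route, which would be consistent with (though not proof of) the split not being applied to that document when it was indexed.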
please help explaining debug output
I have three documents with the following product titles in a text field called moreWords, with an analysis stack matching the Solr example text field definition.

1. HP LaserJet P1102W Monochrome Laser Printer
   http://www.buy.com/prod/hp-laserjet-p1102w-monochrome-laser-printer/q/loc/101/213824965.html
2. HP CE285A (85A) Remanufactured Black Toner Cartridge for LaserJet M1212nf, P1102, P1102W Series
   http://www.buy.com/prod/hp-ce285a-85a-remanufactured-black-toner-cartridge-for-laserjet/q/loc/101/217145536.html
3. Black HP CE285A Toner Cartridge For LaserJet P1102W, LaserJet M1130, LaserJet M1132, LaserJet M1210
   http://www.buy.com/prod/black-hp-ce285a-toner-cartridge-for-laserjet-p1102w-laserjet-m1130/q/loc/101/222045267.html

A search for P1102W matches (2) and (3), but not (1) above. Can someone explain the debug output? It looks like I am getting a non-match on (1) because the term frequency is zero. Am I reading that right? If so, how could that be? The searched terms are equally present in all three docs. I don't get it.
    <lst name="debug">
      <str name="rawquerystring">p1102w LaserJet</str>
      <str name="querystring">p1102w LaserJet</str>
      <str name="parsedquery">+PhraseQuery(moreWords:p 1102 w) +PhraseQuery(moreWords:laser jet)</str>
      <str name="parsedquery_toString">+moreWords:p 1102 w +moreWords:laser jet</str>
      <lst name="explain">
        <str name="222045267">
    3.64852 = (MATCH) sum of:
      2.4758534 = weight(moreWords:p 1102 w in 6667236), product of:
        0.7955347 = queryWeight(moreWords:p 1102 w), product of:
          19.166107 = idf(moreWords: p=189166 1102=1135 w=445720)
          0.041507367 = queryNorm
        3.1121879 = fieldWeight(moreWords:p 1102 w in 6667236), product of:
          1.7320508 = tf(phraseFreq=3.0)
          19.166107 = idf(moreWords: p=189166 1102=1135 w=445720)
          0.09375 = fieldNorm(field=moreWords, doc=6667236)
      1.1726664 = weight(moreWords:laser jet in 6667236), product of:
        0.60590804 = queryWeight(moreWords:laser jet), product of:
          14.597603 = idf(moreWords: laser=26731 jet=12685)
          0.041507367 = queryNorm
        1.9353869 = fieldWeight(moreWords:laser jet in 6667236), product of:
          1.4142135 = tf(phraseFreq=2.0)
          14.597603 = idf(moreWords: laser=26731 jet=12685)
          0.09375 = fieldNorm(field=moreWords, doc=6667236)
        </str>
        <str name="222045265">
    2.8656518 = (MATCH) sum of:
      1.4294347 = weight(moreWords:p 1102 w in 6684158), product of:
        0.7955347 = queryWeight(moreWords:p 1102 w), product of:
          19.166107 = idf(moreWords: p=189166 1102=1135 w=445720)
          0.041507367 = queryNorm
        1.7968225 = fieldWeight(moreWords:p 1102 w in 6684158), product of:
          1.0 = tf(phraseFreq=1.0)
          19.166107 = idf(moreWords: p=189166 1102=1135 w=445720)
          0.09375 = fieldNorm(field=moreWords, doc=6684158)
      1.4362172 = weight(moreWords:laser jet in 6684158), product of:
        0.60590804 = queryWeight(moreWords:laser jet), product of:
          14.597603 = idf(moreWords: laser=26731 jet=12685)
          0.041507367 = queryNorm
        2.3703551 = fieldWeight(moreWords:laser jet in 6684158), product of:
          1.7320508 = tf(phraseFreq=3.0)
          14.597603 = idf(moreWords: laser=26731 jet=12685)
          0.09375 = fieldNorm(field=moreWords, doc=6684158)
        </str>
      </lst>
      <str name="otherQuery">sku:213824965</str>
      <lst name="explainOther">
        <str name="213824965">
    0.0 = (NON-MATCH) Failure to meet condition(s) of required/prohibited clause(s)
      1.1911955 = weight(moreWords:p 1102 w in 32497), product of:
        0.7955347 = queryWeight(moreWords:p 1102 w), product of:
          19.166107 = idf(moreWords: p=189166 1102=1135 w=445720)
          0.041507367 = queryNorm
        1.4973521 = fieldWeight(moreWords:p 1102 w in 32497), product of:
          1.0 = tf(phraseFreq=1.0)
          19.166107 = idf(moreWords: p=189166 1102=1135 w=445720)
          0.078125 = fieldNorm(field=moreWords, doc=32497)
      0.0 = no match on required clause (moreWords:laser jet)
        0.0 = weight(moreWords:laser jet in 32497), product of:
          0.60590804 = queryWeight(moreWords:laser jet), product of:
            14.597603 = idf(moreWords: laser=26731 jet=12685)
            0.041507367 = queryNorm
          0.0 = fieldWeight(moreWords:laser jet in 32497), product of:
            0.0 = tf(phraseFreq=0.0)
            14.597603 = idf(moreWords: laser=26731 jet=12685)
            0.078125 = fieldNorm(field=moreWords, doc=32497)
        </str>
      </lst>
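The arithmetic in the explain output can be checked by hand. The sketch below only reproduces the products shown in the output, using the relationships visible there (queryWeight = idf × queryNorm, fieldWeight = tf × idf × fieldNorm, and tf equal to the square root of phraseFreq, as in 1.7320508 for phraseFreq=3.0). The function names are hypothetical helpers, not Lucene API.

```python
import math

# Constants copied from the debug output.
QUERY_NORM = 0.041507367
IDF_P_1102_W = 19.166107
IDF_LASER_JET = 14.597603


def field_weight(phrase_freq, idf, field_norm):
    """fieldWeight = tf * idf * fieldNorm, with tf = sqrt(phraseFreq) as in the output."""
    return math.sqrt(phrase_freq) * idf * field_norm


def clause_weight(phrase_freq, idf, field_norm):
    """weight = queryWeight * fieldWeight, where queryWeight = idf * queryNorm."""
    query_weight = idf * QUERY_NORM
    return query_weight * field_weight(phrase_freq, idf, field_norm)


# Matching document 6667236 (sku 222045267): both required clauses contribute.
matched = clause_weight(3.0, IDF_P_1102_W, 0.09375) + clause_weight(2.0, IDF_LASER_JET, 0.09375)

# Non-matching document 32497 (sku 213824965): phraseFreq is 0 for "laser jet".
first_clause = clause_weight(1.0, IDF_P_1102_W, 0.078125)
second_clause = clause_weight(0.0, IDF_LASER_JET, 0.078125)

print(round(matched, 4))       # close to the 3.64852 reported for sku 222045267
print(round(first_clause, 4))  # close to the 1.1911955 reported for the first clause
print(second_clause)           # zero, because tf is zero
```

Because both clauses are required (the leading + in the parsed query), a zero phrase frequency for laser jet is enough to make the whole document a non-match, even though the p 1102 w clause scores normally.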
RE: please help explaining debug output
Sorry, to clarify: a search for P1102W matches all three docs, but a search for p1102w LaserJet only matches the second two. Someone asked me a question while I was typing and I got distracted; apologies for any confusion.

-Original Message-
From: Robert Petersen [mailto:rober...@buy.com]
Sent: Monday, July 25, 2011 1:42 PM
To: solr-user@lucene.apache.org
Subject: please help explaining debug output

I have three documents with the following product titles in a text field called moreWords, with an analysis stack matching the Solr example text field definition.

1. HP LaserJet P1102W Monochrome Laser Printer
   http://www.buy.com/prod/hp-laserjet-p1102w-monochrome-laser-printer/q/loc/101/213824965.html
2. HP CE285A (85A) Remanufactured Black Toner Cartridge for LaserJet M1212nf, P1102, P1102W Series
   http://www.buy.com/prod/hp-ce285a-85a-remanufactured-black-toner-cartridge-for-laserjet/q/loc/101/217145536.html
3. Black HP CE285A Toner Cartridge For LaserJet P1102W, LaserJet M1130, LaserJet M1132, LaserJet M1210
   http://www.buy.com/prod/black-hp-ce285a-toner-cartridge-for-laserjet-p1102w-laserjet-m1130/q/loc/101/222045267.html

A search for P1102W matches (2) and (3), but not (1) above. Can someone explain the debug output? It looks like I am getting a non-match on (1) because the term frequency is zero. Am I reading that right? If so, how could that be? The searched terms are equally present in all three docs. I don't get it.
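The otherQuery entry in the debug output suggests the request asked Solr to explain a specific non-matching document alongside the normal results. A minimal sketch of building such a request URL follows, assuming the standard debugQuery and explainOther request parameters; the host, port, and path are hypothetical and need to be adjusted to the actual instance.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint (assumption): replace with the real Solr select handler URL.
SOLR_SELECT = "http://localhost:8983/solr/select"


def debug_url(query, other_query):
    """Build a select URL that requests debug output plus an explain for other_query."""
    params = {
        "q": query,                   # the user query being debugged
        "debugQuery": "on",           # ask for the debug section, including explain
        "explainOther": other_query,  # also explain documents matching this query
    }
    return SOLR_SELECT + "?" + urlencode(params)


url = debug_url("p1102w LaserJet", "sku:213824965")
print(url)

# Round-trip the query string to confirm the parameters survive URL encoding.
print(parse_qs(urlsplit(url).query))
```

Using the sku of the document that should have matched as the explainOther query is what produces an explainOther section like the one for 213824965.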
Re: please help explaining debug output
Hmmm, I'm assuming that moreWords is your default text field, yes? But it works for me (tm), using 1.4.1. What version of Solr are you on?

Also, take a glance at the admin/analysis page, that might help...

Gotta run

Erick

On Mon, Jul 25, 2011 at 4:52 PM, Robert Petersen rober...@buy.com wrote:

Sorry, to clarify: a search for P1102W matches all three docs, but a search for p1102w LaserJet only matches the second two. Someone asked me a question while I was typing and I got distracted; apologies for any confusion.
RE: please help explaining debug output
I'm still on Solr 1.4.0, and the analysis page looks like they should match; other products with the same content do in fact match. I'm reindexing the non-matching ones to rule that out.

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Monday, July 25, 2011 1:58 PM
To: solr-user@lucene.apache.org
Subject: Re: please help explaining debug output

Hmmm, I'm assuming that moreWords is your default text field, yes? But it works for me (tm), using 1.4.1. What version of Solr are you on?

Also, take a glance at the admin/analysis page, that might help...

Gotta run

Erick
Re: please help explaining debug output
Hmmm, I can't find a convenient 1.4.0 to download, but re-indexing is a good idea since this seems like it *should* work.

Erick

On Mon, Jul 25, 2011 at 5:32 PM, Robert Petersen rober...@buy.com wrote:

I'm still on Solr 1.4.0, and the analysis page looks like they should match; other products with the same content do in fact match. I'm reindexing the non-matching ones to rule that out.