Re: two word phrase search using dismax
OK, why not just bump the boost on the site field way higher than you already have?

A note of caution: you'll drive yourself crazy trying to get *exact* ordering based on some arbitrary (and usually changing) set of requirements. Put what you have working in front of product management and see if it's good enough to let you go on to other, higher-value enhancements.

Best
Erick

On Mon, Dec 5, 2011 at 6:15 PM, alx...@aim.com wrote:

Hi Erick,

After reading more about the pf param I increased the pf boosts a few times, and this solved cases 2, 3 and 4 but not 1. As an example, for the phrase "newspaper latimes", latimes.com is not even in the results, so it cannot be boosted to first place. Changing the mm param to

<str name="mm">1&lt;-1 5&lt;-2 6&lt;90%</str>

solves only cases 1 and 4 but not 2 and 3.

Thanks.
Alex.

-----Original Message-----
From: Erick Erickson erickerick...@gmail.com
To: solr-user solr-user@lucene.apache.org
Sent: Mon, Dec 5, 2011 5:52 am
Subject: Re: two word phrase search using dismax

Have you looked at the pf (phrase fields) parameter of edismax?
http://wiki.apache.org/solr/DisMaxQParserPlugin#pf_.28Phrase_Fields.29

Best
Erick

On Sat, Dec 3, 2011 at 7:04 PM, alx...@aim.com wrote:

Hello,

Here is my request handler:

<requestHandler name="search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">site^1.5 content^0.5 title^1.2</str>
    <str name="pf">site^1.5 content^0.5 title^1.2</str>
    <str name="fl">id,title, site</str>
    <str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>
    <int name="ps">300</int>
    <bool name="hl">true</bool>
    <str name="q.alt">*:*</str>
    <str name="hl.fl">content</str>
    <str name="f.title.hl.fragsize">0</str>
    <str name="hl.fragsize">165</str>
    <str name="f.title.hl.alternateField">title</str>
    <str name="f.url.hl.fragsize">0</str>
    <str name="f.url.hl.alternateField">url</str>
    <str name="f.content.hl.fragmenter">regex</str>
  </lst>
</requestHandler>

I have made a few tests with debugQuery and realised that for two-word phrases, Solr takes the first word and gives it a score according to the qf param, then takes the second word and scores it, and so on, but it does not score the phrase as a whole. That is why a doc with one of the words in the title and the other in the content is given a higher score than one that has both words in the content but none in the title.

Ideally, I want to achieve the following order:

1. If one (or both) of the words are in the site field, the doc must be given a higher score.
2. Then come docs with both words in the title.
3. Next, docs with both words in the content.
4. And finally docs having either of the words in the title and content.

I tried changing the mm param to

<str name="mm">1&lt;-1 5&lt;-2 6&lt;90%</str>

This achieves 1 and 4 but not 2 and 3.

Thanks.
Alex.

-----Original Message-----
From: Chris Hostetter hossman_luc...@fucit.org
To: solr-user solr-user@lucene.apache.org
Sent: Thu, Nov 17, 2011 2:17 pm
Subject: Re: two word phrase search using dismax

: After putting the same score for title and content in qf field, docs
: with both words in content moved to fifth place. The doc in the first,
: third and fourth places still have only one of the words in content and
: title. The doc in the second place has one of the words in title and
: both words in the content but in different places not together.

details matter -- if you send further followup mails, the full details of your dismax options and the score explanations from debugQuery are necessary to be sure people understand what you are describing (a snapshot of reality is far more valuable than a vague description of reality)

off hand what you are describing sounds correct -- this is what the dismax parser is really designed to do.

even if you have given both title and content equal boosts, your title field is probably shorter than your content field, so words matching once in title are likely to score higher than the same word matching once in content due to length normalization -- and unless you set the tie param to something really high, the score contribution from the highest scoring field (in this case title) will be the dominant factor in the score (it's disjunction *max* by default ... if you make tie=1 then it's disjunction *sum*)

you haven't mentioned anything about the pf param at all, which i can only assume means you aren't using it -- the pf param is how you configure that scores should be increased if/when all of the words in the query string appear together. I would suggest putting all of the fields in your qf param in your pf param as well.

-Hoss
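Hoss's point about disjunction *max* versus disjunction *sum* can be illustrated numerically. The following is only a sketch of how the tie parameter combines one term's per-field scores (the best field score plus tie times the remaining field scores); the function name and the sample scores are made up for illustration and do not come from a real debugQuery output.

```python
def dismax_term_score(field_scores, tie=0.0):
    """Combine one term's per-field scores the way a disjunction-max query does.

    The best field score counts in full; every other field score is
    scaled by `tie`. tie=0 gives a pure max, tie=1 gives a plain sum.
    """
    if not field_scores:
        return 0.0
    best = max(field_scores)
    others = sum(field_scores) - best
    return best + tie * others


# Hypothetical per-field scores for one word: title, content, site.
scores = [0.8, 0.3, 0.1]

print(dismax_term_score(scores, tie=0.0))   # pure max
print(dismax_term_score(scores, tie=0.01))  # the tie value from the handler in this thread
print(dismax_term_score(scores, tie=1.0))   # plain sum
```

With a tie as small as 0.01 the weaker fields barely move the result, which is why the best-scoring field tends to decide the ordering.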
Re: two word phrase search using dismax
On 14.11.2011 at 21:50, alx...@aim.com wrote:

Hello,

I use Solr 3.4 and Nutch 1.3. In the request handler we have

<str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>

As far as I know, this means that for a two-word phrase search the match must be 100%. However, I noticed that in most cases documents with both words are ranked around 20th place. In the first places are documents with only one of the words in the phrase. Any ideas why this is happening, and is it possible to fix it?

Hi,

are you sure that only one of the words matched in the found documents? Have you checked all fields that are listed in the qf parameter? And did you check for stemmed versions of your search terms?

If all this is true, you may want to give an example.

And AFAIK the mm parameter does not affect the ranking.
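The mm value quoted above packs several conditions into one string. As a rough sketch of how such a spec is usually read (each `n<value` clause applies only when the query has more than n optional clauses, negative values mean "all but that many", and percentages are rounded down), the helper below computes the required number of matching clauses. It is a hypothetical Python restatement written for this thread, not Solr's own code, and it assumes there is no whitespace around `<`.

```python
def min_should_match(clause_count, spec):
    """Return how many of `clause_count` optional clauses must match.

    Sketch of the mm rules as commonly documented; not Solr's code.
    """
    def simple(count, expr):
        # "90%" / "-25%" are percentages, "3" / "-1" are absolute counts.
        if expr.endswith("%"):
            calc = int(count * int(expr[:-1]) / 100)  # truncates toward zero
        else:
            calc = int(expr)
        # Negative values mean "all but this many".
        result = count + calc if calc < 0 else calc
        return max(0, min(count, result))

    if "<" not in spec:
        return simple(clause_count, spec)

    result = clause_count  # default: every clause is required
    for part in spec.split():
        bound, expr = part.split("<", 1)
        if clause_count <= int(bound):
            return result
        result = simple(clause_count, expr)
    return result


for mm in ("2<-1 5<-2 6<90%", "1<-1 5<-2 6<90%"):
    for words in (1, 2, 3, 6, 10):
        print(mm, "|", words, "words ->", min_should_match(words, mm))
```

Under this reading, `2<-1 ...` requires both words of a two-word query, while a spec starting with `1<-1` lets a single word qualify a document. That is a statement about which documents match, not about how they are scored.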
Re: two word phrase search using dismax
Hello,

Thanks for your letter. I investigated further and found out that we have title boosted more than content in the qf param, and the docs in the first places have one of the words in the title but not both of them. The doc in first place has only one of the words in the content. Docs with both words in the content are placed after them, at around 20th place.

After putting the same score for title and content in the qf field, docs with both words in content moved to fifth place. The docs in the first, third and fourth places still have only one of the words in content and title. The doc in second place has one of the words in the title and both words in the content, but in different places, not together.

Thanks.
Alex.

-----Original Message-----
From: Michael Kuhlmann k...@solarier.de
To: solr-user solr-user@lucene.apache.org
Sent: Tue, Nov 15, 2011 12:20 am
Subject: Re: two word phrase search using dismax

On 14.11.2011 at 21:50, alx...@aim.com wrote:

Hello,

I use Solr 3.4 and Nutch 1.3. In the request handler we have

<str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>

As far as I know, this means that for a two-word phrase search the match must be 100%. However, I noticed that in most cases documents with both words are ranked around 20th place. In the first places are documents with only one of the words in the phrase. Any ideas why this is happening, and is it possible to fix it?

Hi,

are you sure that only one of the words matched in the found documents? Have you checked all fields that are listed in the qf parameter? And did you check for stemmed versions of your search terms?

If all this is true, you may want to give an example.

And AFAIK the mm parameter does not affect the ranking.
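When comparing boost settings like the ones described in this thread, it can help to override qf and pf per request and ask for score explanations, rather than editing the handler defaults each time. The sketch below only builds such a request URL with the Python standard library. The endpoint is a placeholder, the helper name is hypothetical, and the boost values are illustrative rather than the ones actually used.

```python
from urllib.parse import urlencode

# Placeholder endpoint; adjust host, port and path to the actual Solr install.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_debug_query_url(query, qf, pf):
    """Build a select URL that overrides qf/pf and requests score explanations."""
    params = {
        "q": query,
        "defType": "edismax",
        "qf": qf,            # per-term field boosts
        "pf": pf,            # phrase boosts, applied when the words appear together
        "fl": "id,title,site,score",
        "debugQuery": "true",
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


# Illustrative boost values only: equal weight for title and content.
print(build_debug_query_url(
    "newspaper latimes",
    qf="site^1.5 title^1.0 content^1.0",
    pf="site^1.5 title^1.0 content^1.0",
))
```

The explain section of the resulting response is the kind of detail Hoss asks for when he says a snapshot of reality is more valuable than a description.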