Re: Boosting Documents using the field Value
Hi Erick,

Finally made it work:

bf=if(exists(query($qqone)),one_score,0)&qqone=one_query:\"google cloud\"

Thanks a lot for guiding me, and for the reminder that this is not URL escaping.
No analyzers are used.

Regards,
Govind

On Tue, Jun 27, 2017 at 11:01 AM, govind nitk wrote:
> mentioned sow=false. and escaping as you mentioned. But still the error
> persist. - undefined field "cloud"
>
> Will get back.
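For anyone scripting the request, the working parameters from the message above could be assembled roughly as follows. This is only a sketch: the function name, the defType and q values, and the quote-escaping step are assumptions added for illustration; only the bf expression and the qqone parameter come from the thread. The point it shows is that the phrase lives in its own request parameter, so the function parser never has to deal with the space.

```python
from urllib.parse import urlencode, parse_qs


def build_exact_match_boost_params(user_query,
                                   match_field="one_query",
                                   score_field="one_score"):
    """Return request parameters that add score_field to the score when
    match_field holds exactly user_query (hypothetical helper)."""
    # Escape backslashes and double quotes so the phrase stays one quoted term.
    escaped = user_query.replace("\\", "\\\\").replace('"', '\\"')
    return {
        "defType": "dismax",   # assumption: same parser as earlier in the thread
        "q": user_query,       # assumption: the main query is the same text
        # The function only dereferences $qqone; it never sees the space.
        "bf": f"if(exists(query($qqone)),{score_field},0)",
        "qqone": f'{match_field}:"{escaped}"',
    }


params = build_exact_match_boost_params("google cloud")
query_string = urlencode(params)  # URL-encoding happens once, at the HTTP layer
```

The query string can then be appended to the core's /select URL.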
Re: Boosting Documents using the field Value
Hi Erick,

I accept, I should have mentioned what I was doing first.

Field types:
one_query is "string",
one_score is float.

So there are no explicit analyzers.

I set sow=false and escaped the space as you mentioned, but the error
still persists: undefined field "cloud"

Will get back.

Regards,
Govind

On Tue, Jun 27, 2017 at 8:44 AM, Erick Erickson wrote:
> To search against it, you need to _escape_ the space (not "url
> escape"). As in google\ cloud so that makes it through the query
> parser as a single token.
>
> As of Solr 6.5 you can also specify sow=false (Split On Whitespace),
> which may be a better option, see:
> https://issues.apache.org/jira/browse/SOLR-9185
Re: Boosting Documents using the field Value
bq: So, ultimate goal is when the exact query matches in field
one_query, apply boost of one_score

It would have been helpful to have made that statement in the first
place; it would have saved some false paths.

What is your analysis chain here? If it's anything like "text_general"
then you're going to have some trouble. I'd think about an analysis
chain like KeywordTokenizerFactory and LowerCaseFilterFactory. That'll
index the entire field as a single token. The admin/analysis page is
your friend.

To search against it, you need to _escape_ the space (not "url
escape"). As in google\ cloud so that it makes it through the query
parser as a single token.

As of Solr 6.5 you can also specify sow=false (Split On Whitespace),
which may be a better option, see:
https://issues.apache.org/jira/browse/SOLR-9185

Best,
Erick

On Mon, Jun 26, 2017 at 7:32 PM, govind nitk wrote:
> I am able to add boost through function as below:
> bf=if(termfreq(one_query,"google"),one_score,0)
>
> Problem is when I say "google cloud" as query, it gives error:
> undefined field: \"cloud\""
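The escaping Erick describes is query-parser escaping (a backslash before the space), which is separate from URL encoding. A minimal sketch, assuming only whitespace needs handling; the helper name is hypothetical, and other query-parser special characters are deliberately ignored here:

```python
import re


def escape_whitespace(term: str) -> str:
    """Put a backslash in front of every whitespace character so the
    query parser keeps the term as a single token (hypothetical helper)."""
    return re.sub(r"(\s)", r"\\\1", term)


# Field name taken from the thread; the resulting string is what the
# query parser should receive, before any URL encoding is applied.
field_query = "one_query:" + escape_whitespace("google cloud")
```

URL encoding, if needed, is applied afterwards by the HTTP client to the already-escaped string.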
Re: Boosting Documents using the field Value
Hi Developers, Erick,

I am able to add a boost through a function as below:

bf=if(termfreq(one_query,"google"),one_score,0)

The problem is that when I use "google cloud" as the query, it gives the error:

undefined field: \"cloud\""

I tried encoding the query (%20 and + for the space), but could not get it
working.

So, the ultimate goal is: when the exact query matches in field one_query,
apply a boost of one_score.

Is there any way to do this, or is a PR needed?

Regards,
Govind

On Mon, Jun 26, 2017 at 11:14 AM, govind nitk wrote:
> Hi Erick,
>
> Exactly this is what I was looking for.
> Thanks a lot.
Re: Boosting Documents using the field Value
Hi Erick,

Exactly this is what I was looking for.
Thanks a lot.

Regards,
Govind

On Mon, Jun 26, 2017 at 12:03 AM, Erick Erickson wrote:
> Take a look at function queries. You're probably looking for "field",
> "termfreq" and "if" functions or some other combination like that.
Re: Boosting Documents using the field Value
Take a look at function queries. You're probably looking for the "field",
"termfreq" and "if" functions, or some other combination like that.

On Sun, Jun 25, 2017 at 9:01 AM, govind nitk wrote:
> e.g
> qf=category^domain_ct
>
> if the current query matched in the category, the boost given will be
> domain_ct, which is present in the current matched document.
Re: Boosting Documents using the field Value
Hi Erik,

Thanks for the reply.

My intention in using domain_ct in the qf was to apply the weight stored
in that document.

e.g.
qf=category^domain_ct

If the current query matches in category, the boost given will be the
domain_ct value of the matched document.

So if I have category_1ct, category_2ct, category_3ct, category_4ct as 4
indexed categories (text_general fields), and the same document has
domain_1ct, domain_2ct, domain_3ct, domain_4ct as 4 different count
fields (int), is there any way to achieve:

qf=category_1ct^domain_1ct&qf=category_2ct^domain_2ct&qf=category_3ct^domain_3ct&qf=category_4ct^domain_4ct

?

Regards

On Sat, Jun 24, 2017 at 3:42 PM, Erik Hatcher wrote:
> With dismax use bf=domain_ct. you can also use boost=domain_ct with
> edismax.
Re: Boosting Documents using the field Value
With dismax use bf=domain_ct. You can also use boost=domain_ct with edismax.

> On Jun 23, 2017, at 23:01, govind nitk wrote:
>
> Is it possible to retrieve the documents with weight taken from the indexed
> field like:
>
> http://localhost:8983/solr/my_index/select?defType=dismax&indent=on&q=fruits&qf=category^domain_ct&qf=name^domain_ct&wt=json
>
> Is this possible to give weight from an indexed field ? Am I doing
> something wrong?
> Is there any other way of doing this?
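The two options in the reply above differ in how the field value enters the score: bf adds the function's value to the score, while the edismax boost parameter multiplies the score by it. A small sketch of both parameter sets follows; the base q, qf and wt values are taken from the original question, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

# Base request from the original question (field weights as posted there).
BASE_PARAMS = {"q": "fruits", "qf": "category^0.9 name^0.7", "wt": "json"}


def dismax_with_bf(boost_field):
    """dismax: bf adds the field's value to each document's score."""
    return {**BASE_PARAMS, "defType": "dismax", "bf": boost_field}


def edismax_with_boost(boost_field):
    """edismax: boost multiplies each document's score by the field's value."""
    return {**BASE_PARAMS, "defType": "edismax", "boost": boost_field}


additive = dismax_with_bf("domain_ct")
multiplicative = edismax_with_boost("domain_ct")
additive_query_string = urlencode(additive)
```

Either dictionary can be URL-encoded and sent to the /select handler; neither modifies the shared base parameters.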
Boosting Documents using the field Value
Hi Solr,

My index data:

id  name     category  domain                          domain_ct
1   Banana   Fruits    Home > Fruits > Banana          2
2   Orange   Fruits    Home > Fruits > Orange          4
3   Samsung  Mobile    Electronics > Mobile > Samsung  3

I am able to retrieve the documents with the dismax parser, with the weights
given as below:

http://localhost:8983/solr/my_index/select?defType=dismax&indent=on&q=fruits&qf=category^0.9&qf=name^0.7&wt=json

Is it possible to retrieve the documents with the weight taken from an indexed
field, like:

http://localhost:8983/solr/my_index/select?defType=dismax&indent=on&q=fruits&qf=category^domain_ct&qf=name^domain_ct&wt=json

Is it possible to take the weight from an indexed field? Am I doing
something wrong?
Is there any other way of doing this?

Regards
Re: Negative Boosting documents with a certain word
: Right now, I specify the boost for my request handler as:
: <requestHandler name="/select" class="solr.SearchHandler">
: .
: <str name="boost">ln(qty)</str>
:
: </requestHandler>
:
: Is there a way to specify this boost in the Solrconfig.xml?
:
: I tried: <str name="boost">(*:* -Refurbished)^10</str> and I get the
: following exception:
:
: ERROR - 2015-05-01 15:13:41.609; org.apache.solr.common.SolrException;
: org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError:
: Expected identifier at pos 0 str='(*:* -Refurbished)^10'

That's because the "boost" option on the edismax parser expects a function,
not a query...

https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser

Try adding a "bq" param...

<str name="bq">(*:* -Refurbished -foo -bar -baz)^10</str>

-Hoss
http://www.lucidworks.com/
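If the list of words to demote changes often, the bq value could be generated rather than hand-written. A minimal sketch, assuming plain single-word terms that need no escaping; the helper name and the default boost of 10 (taken from the example above) are illustrative only:

```python
def demote_words_bq(words, boost=10):
    """Build a bq value that boosts every document NOT containing any of
    the given words, which pushes matching documents down the ranking."""
    if not words:
        raise ValueError("at least one word is required")
    negated = " ".join(f"-{word}" for word in words)
    return f"(*:* {negated})^{boost}"


bq_value = demote_words_bq(["Refurbished", "Case"])
```

The resulting string goes into the bq request parameter, or into the handler defaults as shown in the message above.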
Negative Boosting documents with a certain word
Hi,

My Solr documents contain descriptions of products, similar to a BestBuy or
a NewEgg catalog. I'm wondering whether it is possible to push a product down
the ranking if it contains a certain word. By this I mean it would still
appear in the search results. However, instead of appearing near the top of
the results, it would appear further towards the bottom. (I'm assuming this
is called a negative boost.)

For example, consider the word 'Refurbished' or the word 'Case'.

If the product description contains the word 'Refurbished' (or the word
'Case'), I would like to reduce the ranking of that product. My business
logic is that I would rather sell a new laptop than a refurbished laptop, and
I would rather sell a laptop than a laptop case. So, I would like to see
if I can assign products a negative boost if they contain certain words in
their description.

Thank you in advance for all your help,
O. O.

--
View this message in context: http://lucene.472066.n3.nabble.com/Negative-Boosting-documents-with-a-certain-word-tp4203224.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Negative Boosting documents with a certain word
: My Solr documents contain descriptions of products, similar to a BestBuy or
: a NewEgg catalog. I'm wondering if it were possible to push a product down
: the ranking if it contains a certain word.

https://wiki.apache.org/solr/SolrRelevancyFAQ#How_do_I_give_a_negative_.28or_very_low.29_boost_to_documents_that_match_a_query.3F

The general principle you need to follow is to boost documents that do
*not* match your keyword...

(*:* -Refurbished)^10

-Hoss
http://www.lucidworks.com/
Re: Boosting documents by categorical preferences
Chris,

Sounds good! Thanks for the tips. I'll be glad to submit my talk to this, as
I have a writeup pretty much ready to go.

Cheers
Amit

On Tue, Jan 28, 2014 at 11:24 AM, Chris Hostetter hossman_luc...@fucit.org wrote:
> I would definitely love to see a more detailed writeup down the road
> showing off how it affects your final user metrics -- or perhaps even
> give a session on your technique at ApacheCon?
>
> http://events.linuxfoundation.org/events/apachecon-north-america/program/cfp
Re: Boosting documents by categorical preferences
: The initial results seem to be kinda promising... of course there are many
: more optimizations I could do like decay user ratings over time to indicate
: that preferences decay over time so a 5 rating a year ago doesn't count as
: much as a 5 rating today.
:
: Hope this helps others. I'll open source what I have soon and post back. If
: there is feedback or other thoughts let me know!

Hey Amit,

Glad to hear your user based boosting experiments are paying off. I would
definitely love to see a more detailed writeup down the road showing off
how it affects your final user metrics -- or perhaps even give a session
on your technique at ApacheCon?

http://events.linuxfoundation.org/events/apachecon-north-america/program/cfp

-Hoss
http://www.lucidworks.com/
Re: Boosting documents by categorical preferences
Hi Chris (and others interested in this),

Sorry for dropping off. I got sidetracked with other work, came back to
this, and finally got a V1 implemented. The final process is as follows:

1) Pre-compute the global per-category num_ratings/average/std-dev (so for
   Action the average rating may be 3.49 with a std-dev of 0.99).
2) For a given user, retrieve the last X ratings (X for me is 10) and
   compute the user's categorical affinities: take the user's average
   rating for all movies in a particular category (say Action), subtract
   the global category average, and divide by the category std-dev. Then
   multiply this z-score by the fraction of the user's ratings that fall
   in that category.
   - For example, if a user's last 10 ratings consisted of 9/10 Drama and
     1/10 Thriller, the z-score of Thriller should be discounted relative
     to that of Drama, so that the user's preference (either positive or
     negative) for Drama is more prominent.
3) Sort by the absolute value of the z-score (thanks Hossman, great
   thought).
4) Return the top 3 (an arbitrary number).
5) Modify the query to look like the following:

qq=tom hanks&q={!boost b=$b defType=edismax v=$qq}&cat1=category:Children&cat2=category:Fantasy&cat3=category:Animation&b=sum(1,sum(product(query($cat1),0.22267872),product(query($cat2),0.21630952),product(query($cat3),0.21120241)))

Basically:

b = 1 + (pref1*query(category:something1) + pref2*query(category:something2) + pref3*query(category:something3))

The initial results seem to be kinda promising. Of course there are many
more optimizations I could do, like decaying user ratings over time, so
that a 5 rating from a year ago doesn't count as much as a 5 rating today.

Hope this helps others. I'll open source what I have soon and post back. If
there is feedback or other thoughts, let me know!

Cheers
Amit

On Fri, Nov 22, 2013 at 11:38 AM, Chris Hostetter hossman_luc...@fucit.org wrote:
> ... yeah, using multiplication there would make more sense if you wanted
> to do the negative preferences as well, because then the score of any
> matching doc will be reduced if it matches on an undesired category
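The five steps in Amit's message can be sketched in a few lines of Python. This is an illustrative reading of the description, not his implementation: the function names, the input shapes (a list of (category, rating) pairs and a dict of global (mean, std-dev) per category) and the eight-decimal formatting are all assumptions.

```python
from collections import defaultdict
from statistics import mean


def category_affinities(recent_ratings, global_stats, top_n=3):
    """recent_ratings: [(category, rating), ...] for the user's last X ratings.
    global_stats: {category: (global_mean, global_std_dev)}.
    Returns the top_n (category, weighted_z_score) pairs by absolute value."""
    by_category = defaultdict(list)
    for category, rating in recent_ratings:
        by_category[category].append(rating)

    total = len(recent_ratings)
    scores = {}
    for category, ratings in by_category.items():
        global_mean, global_std = global_stats[category]
        z_score = (mean(ratings) - global_mean) / global_std
        # Discount by the share of the user's ratings in this category.
        scores[category] = z_score * (len(ratings) / total)

    ranked = sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:top_n]


def boost_params(user_query, affinities):
    """Build the qq/q/catN/b parameters in the shape shown in the message."""
    params = {"qq": user_query, "q": "{!boost b=$b defType=edismax v=$qq}"}
    terms = []
    for index, (category, weight) in enumerate(affinities, start=1):
        params[f"cat{index}"] = f"category:{category}"
        terms.append(f"product(query($cat{index}),{weight:.8f})")
    # With no affinities, fall back to a neutral multiplier of 1.
    params["b"] = f"sum(1,sum({','.join(terms)}))" if terms else "1"
    return params
```

For the 9 Drama / 1 Thriller example from step 2, Thriller's z-score is multiplied by 0.1 and Drama's by 0.9, so Drama dominates unless the Thriller rating is far from that category's average.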
Boosting documents at index time, based on payloads
Hi,

I'm not really sure how (or if) payloads work. I tried out Rafal Kuc's payload example in the Apache Solr 4 Cookbook and it did not do what I was expecting. See below for what I was expecting, and please correct me if I was looking for the wrong droid.

What I am trying to achieve is similar to the payload principle: give a certain term a boost value at index time. At query time, if that term is searched for, the boost value should influence the scoring, with docs that have bigger boost values being preferred to the ones with smaller boost values.

Can this be achieved using payloads? I expect so, but then how should this behaviour be implemented? The basic recipe failed to work, so I'm a little confused.

Thanks,
Michael

--
View this message in context: http://lucene.472066.n3.nabble.com/Boosting-documents-at-index-time-based-on-payloads-tp4110661.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Boosting documents by categorical preferences
: I thought about that but my concern/question was how. If I used the pow
: function then I'm still boosting the bad categories by a small
: amount..alternatively I could multiply by a negative number but does that
: work as expected?

I'm not sure I understand your concern: negative powers would give you values less than 1, positive powers would give you values greater than 1, and then you'd use those values as multiplicative boosts, so the values less than 1 would penalize the scores of existing matching docs in the categories the user dislikes.

Oh wait ... I see, in your original email (and in my subsequent suggested tweak to use pow()) you were talking about sum()ing up these 3 category boosts (and I cut/pasted sum() in my example as well) ... yeah, using multiplication there would make more sense if you wanted to do the negative preferences as well, because then the score of any matching doc will be reduced if it matches on an undesired category, and the amount it will be reduced will be determined by how strongly it matches on that category (ie: the base score returned by the nested query() func) and how negative the undesired preference value (ie: the pow() exponent) is:

qq=...
q={!boost b=$b v=$qq}
b=product(pow(query($cat1),$cat1z),pow(query($cat2),$cat2z),pow(query($cat3),$cat3z))
cat1=...action...
cat1z=1.48
cat2=...comedy...
cat2z=1.33
cat3=...kids...
cat3z=-1.7

-Hoss
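The arithmetic behind the multiplicative form can be checked outside Solr with a small sketch. It is plain Python rather than a Solr function query, and it rests on one assumption that the message does not state: a document that does not match a category contributes a neutral factor of 1 (in Solr this would roughly correspond to giving query() a default value of 1, since a base of 0 would zero out the whole product or blow up under a negative exponent). The function name and inputs are hypothetical.

```python
def multiplicative_boost(match_scores, exponents, default=1.0):
    """Product of score ** exponent over the preferred/disliked categories.

    match_scores: category -> score of the nested category query for one doc.
    exponents:    category -> user preference (positive = likes, negative = dislikes).
    default:      assumed neutral score for categories the doc does not match.
    """
    boost = 1.0
    for category, exponent in exponents.items():
        base = match_scores.get(category, default)
        boost *= base ** exponent
    return boost


exponents = {"action": 1.48, "comedy": 1.33, "kids": -1.7}

# A doc matching only "action" with a base score above 1 is boosted.
liked = multiplicative_boost({"action": 2.0}, exponents)

# A doc matching only "kids" with the same base score is penalized.
disliked = multiplicative_boost({"kids": 2.0}, exponents)
```

Note that the "negative exponent means penalty" reading only holds when the base score is greater than 1; a base score between 0 and 1 raised to a negative power is larger than 1, which is worth testing against real scores.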
Re: Boosting documents by categorical preferences
I thought about that, but my concern/question was how. If I used the pow function then I'm still boosting the bad categories by a small amount. Alternatively I could multiply by a negative number, but does that work as expected?

I haven't done much with negative boosting except for the sledgehammer approach of category exclusion through filters.

Thanks
Amit
Re: Boosting documents by categorical preferences
: My approach was something like:
: 1) Look at the categories that the user has preferred and compute the
: z-score
: 2) Pick the top 3 among those
: 3) Use those to boost search results.

I think that totally makes sense ... the additional bit I was suggesting that you consider is that instead of picking the highest 3 z-scores, pick the z-scores with the greatest absolute value ... that way if someone is a very boring person and their positive interests are all basically exactly the same as the mean for everyone else, but they have some very strong dis-interests, you don't bother boosting on those minuscule interests and instead you negatively boost on the things they are antagonistic towards.

-Hoss
Re: Boosting documents by categorical preferences
Hey Chris,

Sorry for the delay and thanks for your response. This was inspired by your talk on boosting and biasing that you presented way back when at a meetup. I'm glad that my general approach seems to make sense.

My approach was something like:
1) Look at the categories that the user has preferred and compute the z-score
2) Pick the top 3 among those
3) Use those to boost search results.

I'll look at using the boosts as an exponent instead of a multiplier, as I think that would make sense, and it also handles the 0 case.

This is for a prototype I am doing, but I'll share the results one day in a meetup as I think it'll be kind of interesting.

Thanks again
Amit
Re: Boosting documents by categorical preferences
: I have a question around boosting. I wanted to use the boost= to write a
: nested query that will boost a document based on categorical preferences.

You have no idea how stoked I am to see you working on this in a real world application.

: Currently I have the weights set to the z-score equivalent of a user's
: preference for that category which is simply how many standard deviations
: above the global average is this user's preference for that movie category.
:
: My question though is basically whether or not semantically the equation
: query(category:Drama)*some weight + query(category:Comedy)*some weight
: + query(category:Action)*some weight makes sense?

My gut says that your approach makes sense, but if I'm understanding you correctly, I think that you need to add 1 to all your weights: the boost is a multiplier, so if someone's rating for every category is 0 std devs above the average rating (ie: the most average person imaginable), you don't want to give every movie in every category a score of 0.

Are you picking the top 3 categories the user prefers as a cut off, or are you arbitrarily using N category boosts for however many N categories the user is above the global average in their pref for that category?

Are your preferences coming from explicit user feedback on the categories (ie: rate how much you like comedies on a scale of 1-5) or are you inferring it from user ratings of the movies themselves? (ie: rate this movie, which happens to be a scifi,action,comedy, on a scale of 1-5) ... because if it's the latter you probably want to be careful to also normalize based on how many categories the movie is in.

The other thing to consider is whether you want to include negative preferences (ie: weights less than 1) based on how many std devs the user's average is *below* the global average for a category .. in this case I *think* you'd want to divide the raw value from -1 to get a useful multiplier.

Alternatively: you could experiment with using the weights as exponents instead of multipliers...

b=sum(pow(query($cat1),1.482),pow(query($cat2),0.1199),pow(query($cat3),1.448))

...that would simplify the math you'd have to worry about both for the totally boring average user (x**0 = 1) and for the categories users hate (x**-5 = some positive fraction that will act as a penalty) ... but you'd definitely need to run some tests to see if it over boosts as the std dev variations get really high (might want to take a root first before using them as the exponent)

-Hoss
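The two properties mentioned for exponent weights (x**0 = 1 for the average user, and a fractional penalty for strongly negative weights) can be verified with a few lines of plain Python. The damped_weight helper is only one possible reading of "take a root first": it assumes a sign-preserving square root, which the message does not specify, and both function names are hypothetical.

```python
import math


def exponent_boost(score, weight):
    """Use the preference weight as an exponent; assumes score > 0."""
    return score ** weight


def damped_weight(z_score):
    """Sign-preserving square root, one way to tame very large z-scores."""
    return math.copysign(math.sqrt(abs(z_score)), z_score)


neutral = exponent_boost(2.0, 0)      # average user: no change to the score
penalty = exponent_boost(2.0, -5)     # strong dislike: a small positive fraction
```

As in the message, the penalty reading depends on the base score being greater than 1, so it is worth checking the range of scores the nested query() actually returns.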
Boosting documents by categorical preferences
Hi all,

I have a question around boosting. I wanted to use the boost= to write a nested query that will boost a document based on categorical preferences.

For a movie search, for example, say that a user likes drama, comedy, and action. I could use things like:

qq=&q={!boost%20b=$b%20defType=edismax%20v=$qq}&b=sum(product(query($cat1),1.482),product(query($cat2),0.1199),product(query($cat3),1.448))&cat1=category:Drama&cat2=category:Comedy&cat3=category:Action

where cat1=Drama, cat2=Comedy, cat3=Action.

Currently I have the weights set to the z-score equivalent of a user's preference for that category, which is simply how many standard deviations above the global average this user's preference for that movie category is.

My question though is basically whether or not, semantically, the equation

query(category:Drama)*some weight + query(category:Comedy)*some weight + query(category:Action)*some weight

makes sense?

What are some techniques people use to boost documents based on discrete things like category, manufacturer, genre etc?

Thanks!
Amit
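Hand-assembling a parameter string like this one is error-prone (separators and escaping are easy to lose), so here is a minimal sketch that builds it with Python's standard urllib.parse.urlencode. The function name and the (category, weight) input shape are hypothetical; the parameter names qq, q, b and cat1..catN are the ones used in the message.

```python
from urllib.parse import urlencode


def build_boost_query_string(user_query, weighted_categories):
    """Return a URL-encoded parameter string for a category-boosted query.

    weighted_categories: list of (category, weight) pairs, e.g. z-scores.
    """
    params = [
        ("qq", user_query),
        ("q", "{!boost b=$b defType=edismax v=$qq}"),
    ]
    terms = []
    for index, (category, weight) in enumerate(weighted_categories, start=1):
        params.append((f"cat{index}", f"category:{category}"))
        terms.append(f"product(query($cat{index}),{weight})")
    params.append(("b", f"sum({','.join(terms)})"))
    # urlencode percent-escapes spaces, braces, "$" and "=" inside values.
    return urlencode(params)


query_string = build_boost_query_string(
    "tom hanks", [("Drama", 1.482), ("Comedy", 0.1199), ("Action", 1.448)]
)
```

The resulting string can be appended after the "?" of a select request; decoding it again yields the same b, cat1, cat2 and cat3 values as in the message.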
Re: Boosting Documents
Oh thank you Chris, this is much clearer, and thank you for updating the Wiki too.

On 05/22/2013 08:29 PM, Chris Hostetter wrote:
> : NOTE: make sure norms are enabled (omitNorms="false" in the schema.xml) for
> : any fields where the index-time boost should be stored.
> :
> : In my case where I only need to boost the whole document (not a specific
> : field), do I have to activate the omitNorms="false" for all the fields
> : in the schema ?
>
> docBoost is really just syntactic sugar for a field boost on each field in
> the document; it's factored into the norm value for each field in the
> document. (I'll update the wiki to make this more clear)
>
> If you do a query that doesn't utilize any field which has norms, then the
> docBoost you specified when indexing the document never comes into play.
>
> In general, doc boosts and field boosts, and the way they come into play
> as part of the field norm, are fairly inflexible, and (in my opinion)
> antiquated. A much better way of dealing with this type of problem is also
> discussed in the section of the wiki you linked to. Immediately below...
>
> http://wiki.apache.org/solr/SolrRelevancyFAQ#index-time_boosts
>
> ...you'll find...
>
> http://wiki.apache.org/solr/SolrRelevancyFAQ#Field_Based_Boosting
>
> -Hoss
Re: Boosting Documents
Thank you for your reply bbarani,

I can't do that because I want to boost some documents over others, independently of the query.

On 05/21/2013 05:41 PM, bbarani wrote:
> Why don't you boost during query time?
>
> Something like q=superman&qf=title^2 subject
>
> You can refer to: http://wiki.apache.org/solr/SolrRelevancyFAQ
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Boosting-Documents-tp4064955p4064966.html
> Sent from the Solr - User mailing list archive at Nabble.com.
Re: Boosting Documents
Hi Oussama,

This is explained very nicely on the Solr Wiki:

http://wiki.apache.org/solr/SolrRelevancyFAQ#index-time_boosts
http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22add.22

All you need to do is something similar to the below:

<add>
  <doc boost="2.5">
    <field name="employeeId">05991</field>
    <field name="office" boost="2.0">Bridgewater</field>
  </doc>
</add>

What is not clear from your message is whether you need better scoring or better sorting. So, additionally, you can consider adding a secondary sort parameter for the docs having the same score:

http://wiki.apache.org/solr/CommonQueryParameters#sort

HTH,
Sandeep
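For anyone generating such update messages programmatically, a rough sketch with Python's standard xml.etree.ElementTree follows. The function name and its arguments are hypothetical; the element and attribute names (add, doc, field, name, boost) are the ones from the example in this message, and whether a boost attribute is honoured depends on the Solr version and on norms being enabled, as discussed elsewhere in this thread.

```python
import xml.etree.ElementTree as ET


def build_add_message(fields, doc_boost=None, field_boosts=None):
    """Build an <add><doc>...</doc></add> update message as a string.

    fields:       field name -> value
    doc_boost:    optional boost for the whole document
    field_boosts: optional field name -> boost for individual fields
    """
    field_boosts = field_boosts or {}
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    if doc_boost is not None:
        doc.set("boost", str(doc_boost))
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        if name in field_boosts:
            field.set("boost", str(field_boosts[name]))
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


message = build_add_message(
    {"employeeId": "05991", "office": "Bridgewater"},
    doc_boost=2.5,
    field_boosts={"office": 2.0},
)
```

Using an XML library rather than string concatenation also takes care of escaping characters such as "&" and "<" in field values.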
Re: Boosting Documents
Thank you Sandeep,

I did post the document like that (a minor difference is that I did not add the boost to the field, since I don't want to boost on a specific field; I boosted the whole document: '<doc boost="2.0"> </doc>'). The issue is that everything in the query results has the same score, even though the documents were indexed with different boosts, and I can't sort on another field since this is independent of any field value.

Any ideas?
Re: Boosting Documents
I don't know if this is the issue or not but, considering this note from the wiki:

NOTE: make sure norms are enabled (omitNorms="false" in the schema.xml) for any fields where the index-time boost should be stored.

In my case, where I only need to boost the whole document (not a specific field), do I have to set omitNorms="false" for all the fields in the schema?
Re: Boosting Documents
I think that is applicable only to field-level boosting and not to document-level boosting. Can you post your query, your field definitions, and the results you're expecting? I am using index-time and query-time boosting without any issues so far. Also, which version of Solr are you using?
Re: Boosting Documents
I don't know if this can help (since the document boost should be independent of any schema), but here is my schema:

<?xml version="1.0" encoding="UTF-8"?>
<schema name="" version="1.5">
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
    <fieldType name="long" class="solr.TrieLongField" sortMissingLast="true" precisionStep="0" positionIncrementGap="0" />
    <fieldType name="text" class="solr.TextField" sortMissingLast="true" omitNorms="true">
      <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.EdgeNGramFilterFactory" maxGramSize="255" />
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
      </analyzer>
    </fieldType>
  </types>
  <fields>
    <field name="Id" type="string" indexed="true" stored="true" multiValued="false" required="true" />
    <field name="Suggestion" type="text" indexed="true" stored="true" multiValued="false" required="false" />
    <field name="Type" type="string" indexed="true" stored="true" multiValued="false" required="true" />
    <field name="Sections" type="string" indexed="true" stored="true" multiValued="true" required="false" />
    <field name="_version_" type="long" indexed="true" stored="true" />
  </fields>
  <copyField source="Id" dest="Suggestion" />
  <uniqueKey>Id</uniqueKey>
  <defaultSearchField>Suggestion</defaultSearchField>
</schema>

My query is something like: Suggestion:Olive Oil

The result is 9 documents, which all have the same score of 11.287682, even though they were indexed with different boosts (I am sure of this).
Re: Boosting Documents
://wiki.apache.org/solr/CommonQueryParameters#sorthttp://wiki.apache.org/solr/**CommonQueryParameters#sort htt**p://wiki.apache.org/solr/**CommonQueryParameters#sorthttp://wiki.apache.org/solr/CommonQueryParameters#sort HTH, Sandeep On 22 May 2013 09:21, Oussama Jilal jilal.ouss...@gmail.com wrote: Thank you for your reply bbarani, I can't do that because I want to boost some documents over others, independing of the query. On 05/21/2013 05:41 PM, bbarani wrote: Why don't you boost during query time? Something like q=supermanqf=title^2 subject You can refer: http://wiki.apache.org/solr/**SolrRelevancyFAQhttp://wiki.apache.org/solr/SolrRelevancyFAQ http://**wiki.apache.org/solr/SolrRelevancyFAQhttp://wiki.apache.org/solr/**SolrRelevancyFAQ http://wiki.**apache.org/**solr/**SolrRelevancyFAQhttp://apache.org/solr/**SolrRelevancyFAQ http:/**/wiki.apache.org/solr/**SolrRelevancyFAQhttp://wiki.apache.org/solr/SolrRelevancyFAQ -- View this message in context: http://lucene.472066.n3.** nabble.com/Boosting-Documents-**tp4064955p4064966.htmlhttp://nabble.com/Boosting-Documents-tp4064955p4064966.html htt**p://nabble.com/Boosting-**Documents-**tp4064955p4064966.**htmlhttp://nabble.com/Boosting-Documents-**tp4064955p4064966.html http:**//lucene.472066.n3.**nabble.com/**Boosting-**Documents-**http://lucene.472066.n3.nabble.com/**Boosting-Documents-** tp4064955p4064966.htmlhttp://**lucene.472066.n3.nabble.com/** Boosting-Documents-**tp4064955p4064966.htmlhttp://lucene.472066.n3.nabble.com/Boosting-Documents-tp4064955p4064966.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Boosting Documents
/ field field name=office boost=2.0Bridgewater/field /doc/add What is not clear from your message is whether you need better scoring or better sorting. so, additionally, you can consider adding a secondary sort parameter for the docs having the same score. http://wiki.apache.org/solr/CommonQueryParameters#sorthttp://wiki.apache.org/solr/**CommonQueryParameters#sort htt**p://wiki.apache.org/solr/**CommonQueryParameters#sorthttp://wiki.apache.org/solr/CommonQueryParameters#sort HTH, Sandeep On 22 May 2013 09:21, Oussama Jilal jilal.ouss...@gmail.com wrote: Thank you for your reply bbarani, I can't do that because I want to boost some documents over others, independing of the query. On 05/21/2013 05:41 PM, bbarani wrote: Why don't you boost during query time? Something like q=supermanqf=title^2 subject You can refer: http://wiki.apache.org/solr/**SolrRelevancyFAQhttp://wiki.apache.org/solr/SolrRelevancyFAQ http://**wiki.apache.org/solr/SolrRelevancyFAQhttp://wiki.apache.org/solr/**SolrRelevancyFAQ http://wiki.**apache.org/**solr/**SolrRelevancyFAQhttp://apache.org/solr/**SolrRelevancyFAQ http:/**/wiki.apache.org/solr/**SolrRelevancyFAQhttp://wiki.apache.org/solr/SolrRelevancyFAQ -- View this message in context: http://lucene.472066.n3.** nabble.com/Boosting-Documents-**tp4064955p4064966.htmlhttp://nabble.com/Boosting-Documents-tp4064955p4064966.html htt**p://nabble.com/Boosting-**Documents-**tp4064955p4064966.**htmlhttp://nabble.com/Boosting-Documents-**tp4064955p4064966.html http:**//lucene.472066.n3.**nabble.com/**Boosting-**Documents-**http://lucene.472066.n3.nabble.com/**Boosting-Documents-** tp4064955p4064966.htmlhttp://**lucene.472066.n3.nabble.com/** Boosting-Documents-**tp4064955p4064966.htmlhttp://lucene.472066.n3.nabble.com/Boosting-Documents-tp4064955p4064966.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Boosting Documents
/**SolrRelevancyFAQ#** index-time_**boostshttp://wiki.apache.org/solr/**SolrRelevancyFAQ#index-time_**boosts http://wiki.apache.org/solr/SolrRelevancyFAQ#index- time_boostshttp://wiki.apache.org/**solr/SolrRelevancyFAQ#index-**time_boosts http://wiki.**apache.org/solr/**SolrRelevancyFAQ#index-time_** boostshttp://wiki.apache.org/solr/SolrRelevancyFAQ#index-time_boosts http://wiki.apache.org/solr/**UpdateXmlMessages#Optional_http://wiki.apache.org/solr/UpdateXmlMessages#Optional_** http://wiki.apache.org/solr/UpdateXmlMessages#Optional_http://wiki.apache.org/solr/**UpdateXmlMessages#Optional_** attributes_for_.22add.22http://wiki.apache.org/solr/** UpdateXmlMessages#Optional_attributes_for_.22add.22http:** //wiki.apache.org/solr/**UpdateXmlMessages#Optional_** attributes_for_.22add.22http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22add.22 All you need to do is something similar to below.. - add doc boost=2.5field name=employeeId05991/ field field name=office boost=2.0Bridgewater/**field /doc/add What is not clear from your message is whether you need better scoring or better sorting. so, additionally, you can consider adding a secondary sort parameter for the docs having the same score. http://wiki.apache.org/solr/**CommonQueryParameters#sorthttp://wiki.apache.org/solr/CommonQueryParameters#sort h**ttp://wiki.apache.org/solr/CommonQueryParameters#sorthttp://wiki.apache.org/solr/**CommonQueryParameters#sort htt**p://wiki.apache.org/**solr/**CommonQueryParameters#**sorthttp://wiki.apache.org/solr/**CommonQueryParameters#sort http://wiki.apache.org/**solr/CommonQueryParameters#**sorthttp://wiki.apache.org/solr/CommonQueryParameters#sort HTH, Sandeep On 22 May 2013 09:21, Oussama Jilal jilal.ouss...@gmail.com wrote: Thank you for your reply bbarani, I can't do that because I want to boost some documents over others, independing of the query. On 05/21/2013 05:41 PM, bbarani wrote: Why don't you boost during query time? 
Something like q=supermanqf=title^2 subject You can refer: http://wiki.apache.org/solr/ SolrRelevancyFAQhttp://wiki.apache.org/solr/**SolrRelevancyFAQ http://**wiki.apache.org/solr/**SolrRelevancyFAQhttp://wiki.apache.org/solr/SolrRelevancyFAQ http://**wiki.apache.org/**solr/SolrRelevancyFAQhttp://wiki.apache.org/solr/SolrRelevancyFAQ http**://wiki.apache.org/solr/SolrRelevancyFAQhttp://wiki.apache.org/solr/**SolrRelevancyFAQ http://wiki.**apache.org/solr/**SolrRelevancyFAQhttp://apache.org/**solr/**SolrRelevancyFAQ http:/**/apache.org/solr/SolrRelevancyFAQhttp://apache.org/solr/**SolrRelevancyFAQ http:/**/wiki.apache.org/**solr/**SolrRelevancyFAQhttp://wiki.apache.org/solr/**SolrRelevancyFAQ http:/**/wiki.apache.org/solr/**SolrRelevancyFAQhttp://wiki.apache.org/solr/SolrRelevancyFAQ -- View this message in context: http://lucene.472066.n3.** nabble.com/Boosting-Documents-tp4064955p4064966.htmlhttp://nabble.com/Boosting-Documents-**tp4064955p4064966.html h**ttp://nabble.com/Boosting-**Documents-** tp4064955p4064966.htmlhttp://nabble.com/Boosting-Documents-tp4064955p4064966.html htt**p://nabble.com/Boosting-Documents- tp4064955p4064966.**htmlhttp://nabble.com/Boosting-**Documents-**tp4064955p4064966.**html http:**//nabble.com/Boosting-**Documents-**tp4064955p4064966.** htmlhttp://nabble.com/Boosting-Documents-**tp4064955p4064966.html http:**//lucene.472066.n3.**n**abble.com/**Boosting- Documents-** http://nabble.com/**Boosting-**Documents-** http://lucene.**472066.n3.nabble.com/Boosting-Documents-**http://lucene.472066.n3.nabble.com/**Boosting-Documents-** tp4064955p4064966.htmlhttp://lucene.472066.n3.nabble.com/http://lucene.472066.n3.nabble.com/** Boosting-Documents-tp4064955p4064966.htmlhttp://** lucene.472066.n3.nabble.com/**Boosting-Documents-** tp4064955p4064966.htmlhttp://lucene.472066.n3.nabble.com/Boosting-Documents-tp4064955p4064966.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Boosting Documents
http://wiki.apache.org/solr/SolrRelevancyFAQ#index-time_boosts
http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22add.22

All you need to do is something similar to the following:

    <add><doc boost="2.5"><field name="employeeId">05991</field><field name="office" boost="2.0">Bridgewater</field></doc></add>

What is not clear from your message is whether you need better scoring or better sorting. Additionally, you can consider adding a secondary sort parameter for docs that have the same score.

http://wiki.apache.org/solr/CommonQueryParameters#sort

HTH, Sandeep

On 22 May 2013 09:21, Oussama Jilal jilal.ouss...@gmail.com wrote:
> Thank you for your reply bbarani, I can't do that because I want to boost some documents over others, independently of the query.
>
> On 05/21/2013 05:41 PM, bbarani wrote:
>> Why don't you boost during query time? Something like q=superman&qf=title^2 subject
>> You can refer to: http://wiki.apache.org/solr/SolrRelevancyFAQ
>> -- View this message in context: http://lucene.472066.n3.nabble.com/Boosting-Documents-tp4064955p4064966.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
Re: Boosting Documents
: NOTE: make sure norms are enabled (omitNorms=false in the schema.xml) for
: any fields where the index-time boost should be stored.
:
: In my case where I only need to boost the whole document (not a specific
: field), do I have to activate the omitNorms=false for all the fields
: in the schema ?

docBoost is really just syntactic sugar for a field boost on each field in the document -- it's factored into the norm value for each field in the document. (I'll update the wiki to make this more clear.)

If you do a query that doesn't utilize any field which has norms, then the docBoost you specified when indexing the document never comes into play.

In general, doc boosts and field boosts, and the way they come into play as part of the field norm, are fairly inflexible and (in my opinion) antiquated. A much better way of dealing with this type of problem is also discussed in the section of the wiki you linked to. Immediately below...

http://wiki.apache.org/solr/SolrRelevancyFAQ#index-time_boosts

...you'll find...

http://wiki.apache.org/solr/SolrRelevancyFAQ#Field_Based_Boosting

-Hoss
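As a rough sketch of the field-based alternative Hoss points to, the per-document weight can live in an ordinary numeric field and be applied at query time with a function. The field name `doc_weight`, the `qf` fields and the core URL below are assumptions made up for illustration; `defType=edismax` and its multiplicative `boost` parameter are standard request parameters.

```python
from urllib.parse import urlencode


def boosted_select_url(core_url, user_query, weight_field):
    """Return a /select URL whose scores are multiplied by weight_field."""
    params = {
        "defType": "edismax",   # edismax accepts a multiplicative `boost` function
        "q": user_query,
        "qf": "title subject",  # hypothetical searchable fields
        "boost": weight_field,  # simplest function query: the field's own value
    }
    return core_url + "/select?" + urlencode(params)


# Hypothetical core URL and field name, for illustration only.
url = boosted_select_url("http://localhost:8983/solr/mycore", "superman", "doc_weight")
print(url)
```

Because the weight is a regular field, it does not depend on norms being enabled and can be changed by reindexing only that field's value.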
Boosting Documents
Hi everyone, I have a small (I hope) issue, and I wish someone could point me in the right direction. I have been indexing some documents using Solr 4.1 and specifying different boosts for different types of documents (boost for the whole document). But when searching, I noticed that the scores are the same for all of them, and that affected the order (not what I wanted). Does anyone know if I have to configure something else? I have been using Solr for quite some time (more than a year) but I have never used the boosting feature. Thanks.
Re: Boosting Documents
Why don't you boost during query time? Something like q=superman&qf=title^2 subject
You can refer to: http://wiki.apache.org/solr/SolrRelevancyFAQ
-- View this message in context: http://lucene.472066.n3.nabble.com/Boosting-Documents-tp4064955p4064966.html
Sent from the Solr - User mailing list archive at Nabble.com.
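When sending the query-time boost above over HTTP, the parameters need to be separated by `&` and the `^` and space in `qf` need URL encoding. A minimal sketch using only the Python standard library, assuming the dismax parser is selected with `defType=dismax`:

```python
from urllib.parse import urlencode, parse_qs

# title matches count twice as much as subject matches
params = {"defType": "dismax", "q": "superman", "qf": "title^2 subject"}

query_string = urlencode(params)
print(query_string)  # defType=dismax&q=superman&qf=title%5E2+subject

# Round-trip to confirm the server would see the original qf value
decoded = parse_qs(query_string)
print(decoded["qf"][0])  # title^2 subject
```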
Re: Boosting documents with terms derived from clustering - good idea?
Hi, I would take a different approach. Track users' queries and their clicks. Aggregate the queries and start thinking of them as tags/labels, then use the top N to tag your docs. Alternatively/additionally, extract significant terms and phrases from clicked-to docs and use those to tag your docs.

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html

On Tue, May 14, 2013 at 7:04 AM, David Parks davidpark...@yahoo.com wrote:
> We have a number of queries that produce good results based on the textual data, but are contextually wrong (for example, an "SSD hard drive" search matches the music album "SSD hip hop drives us crazy"). Textually a fair match, but SSD is a term that strongly relates to technical documents. We'd like to be able to direct this query more strictly in the direction of the technical documents based on the term SSD.
>
> I am considering whether it would be worth trying to cluster all documents, thus tending to group the music with the music and tech items with the tech items; then pulling out the term vectors that define each group, doing a human review of that data, and plugging it back into the documents of each cluster as a separate search field that gets boosted. In my head it seems like a plausible way to weight terms like SSD toward the cluster of items that they most closely associate with. Should I spend the effort to find out? Yea or nay?
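The click-tracking idea in Otis's reply could be prototyped roughly as follows. This is only a sketch: the log format (pairs of query string and clicked document id) and the document ids are invented for illustration, and a real click log would need more cleaning than the lowercase normalization used here.

```python
from collections import Counter, defaultdict


def top_query_tags(click_log, n):
    """Map each doc id to its n most frequent clicked-through queries."""
    per_doc = defaultdict(Counter)
    for query, doc_id in click_log:
        per_doc[doc_id][query.strip().lower()] += 1
    return {
        doc_id: [query for query, _ in counts.most_common(n)]
        for doc_id, counts in per_doc.items()
    }


# Hypothetical click log: (query typed by the user, id of the doc they clicked)
click_log = [
    ("ssd hard drive", "tech-42"),
    ("SSD hard drive", "tech-42"),
    ("solid state disk", "tech-42"),
    ("ssd hip hop", "album-7"),
]

tags = top_query_tags(click_log, 1)
print(tags)
```

The resulting tags would then be indexed into a separate field on each document, which can be searched with a higher weight than the body text.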
Boosting documents with terms derived from clustering - good idea?
We have a number of queries that produce good results based on the textual data, but are contextually wrong (for example, an "SSD hard drive" search matches the music album "SSD hip hop drives us crazy"). Textually a fair match, but SSD is a term that strongly relates to technical documents. We'd like to be able to direct this query more strictly in the direction of the technical documents based on the term SSD.

I am considering whether it would be worth trying to cluster all documents, thus tending to group the music with the music and tech items with the tech items; then pulling out the term vectors that define each group, doing a human review of that data, and plugging it back into the documents of each cluster as a separate search field that gets boosted. In my head it seems like a plausible way to weight terms like SSD toward the cluster of items that they most closely associate with. Should I spend the effort to find out? Yea or nay?
Re: Boosting documents matching in a specific shard
Well, the simplest would be to include the shard ID in the document when you index it, then just boost on that field...

Best,
Erick

On Thu, Aug 23, 2012 at 8:33 AM, Husain, Yavar yhus...@firstam.com wrote:
> I am aware that IDF is not distributed. Suppose I have to boost or give higher rank to documents which are matching in a specific/particular shard, how can I accomplish that?
>
> ** This message may contain confidential or proprietary information intended only for the use of the addressee(s) named above or may contain information that is legally privileged. If you are not the intended addressee, or the person responsible for delivering it to the intended addressee, you are hereby notified that reading, disseminating, distributing or copying this message is strictly prohibited. If you have received this message by mistake, please immediately notify us by replying to the message and delete the original message and any copies immediately thereafter. Thank you.- ** FAFLD
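Erick's suggestion could look something like the sketch below: each document gets the shard name in a regular field at index time, and a boost query on that field favours one shard at search time. The field name `shard_id`, the shard name and the weight are assumptions for illustration; `bq` is the boost-query parameter of the dismax and edismax parsers.

```python
def tag_with_shard(docs, shard_id, field="shard_id"):
    """Return copies of docs with the shard name stored in a regular field."""
    return [{**doc, field: shard_id} for doc in docs]


def shard_boost_query(shard_id, weight, field="shard_id"):
    """Build a bq clause that favours documents from one shard."""
    return f"{field}:{shard_id}^{weight}"


# Index time: tag every document before sending it to its shard
docs = tag_with_shard([{"id": "1"}, {"id": "2"}], "shard1")

# Query time: pass this as the bq parameter alongside the user's query
bq = shard_boost_query("shard1", 5)
print(docs)
print(bq)
```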
Boosting documents matching in a specific shard
I am aware that IDF is not distributed. Suppose I have to boost or give higher rank to documents which are matching in a specific/particular shard, how can I accomplish that?
Boosting documents based on search term/phrase
Is there a way to boost documents based on the search term/phrase?
Re: Boosting documents based on search term/phrase
Do you mean besides query elevation? http://wiki.apache.org/solr/QueryElevationComponent And besides explicit boosting by the user (the ^ suffix operator after a term/phrase)? -- Jack Krupansky -Original Message- From: Donald Organ Sent: Tuesday, May 01, 2012 3:59 PM To: solr-user Subject: Boosting documents based on search term/phrase Is there a way to boost documents based on the search term/phrase?
Re: Boosting documents based on search term/phrase
Query elevation was exactly what I was talking about. Now is there a way to add this to the default query handler?

On Tue, May 1, 2012 at 4:26 PM, Jack Krupansky j...@basetechnology.com wrote:
> Do you mean besides query elevation?
> http://wiki.apache.org/solr/QueryElevationComponent
> And besides explicit boosting by the user (the ^ suffix operator after a term/phrase)?
> -- Jack Krupansky
>
> -Original Message- From: Donald Organ Sent: Tuesday, May 01, 2012 3:59 PM To: solr-user Subject: Boosting documents based on search term/phrase
> Is there a way to boost documents based on the search term/phrase?
Re: Boosting documents based on search term/phrase
Yes, you can add it in the last-components section of the default query handler:

    <arr name="last-components"><str>elevator</str></arr>

- Jeevanandam

On 02-05-2012 3:53 am, Donald Organ wrote:
> Query elevation was exactly what I was talking about. Now is there a way to add this to the default query handler?
>
> On Tue, May 1, 2012 at 4:26 PM, Jack Krupansky j...@basetechnology.com wrote:
>> Do you mean besides query elevation?
>> http://wiki.apache.org/solr/QueryElevationComponent
>> And besides explicit boosting by the user (the ^ suffix operator after a term/phrase)?
>> -- Jack Krupansky
>>
>> -Original Message- From: Donald Organ Sent: Tuesday, May 01, 2012 3:59 PM To: solr-user Subject: Boosting documents based on search term/phrase
>> Is there a way to boost documents based on the search term/phrase?
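Once the elevator component is wired into the handler, it still needs rules telling it which documents to lift for which query. As far as I understand the wiki page linked above, those rules live in an elevate.xml file made of `<query text="...">` elements with `<doc id="..."/>` children. The sketch below generates such a file with the Python standard library; the query text and document ids are made up for illustration.

```python
import xml.etree.ElementTree as ET


def build_elevate_xml(elevations):
    """Serialize {query text: [doc ids in desired order]} as elevate.xml content."""
    root = ET.Element("elevate")
    for query_text, doc_ids in elevations.items():
        query = ET.SubElement(root, "query", {"text": query_text})
        for doc_id in doc_ids:
            ET.SubElement(query, "doc", {"id": doc_id})
    return ET.tostring(root, encoding="unicode")


# Hypothetical rule: for the query "ssd", show these two docs first
elevate_xml = build_elevate_xml({"ssd": ["tech-42", "tech-17"]})
print(elevate_xml)
```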
Re: Boosting documents based on search term/phrase
Here's some doc from Lucid: http://lucidworks.lucidimagination.com/display/solr/The+Query+Elevation+Component

-- Jack Krupansky

-Original Message- From: Donald Organ Sent: Tuesday, May 01, 2012 5:23 PM To: solr-user@lucene.apache.org Subject: Re: Boosting documents based on search term/phrase

Query elevation was exactly what I was talking about. Now is there a way to add this to the default query handler?

On Tue, May 1, 2012 at 4:26 PM, Jack Krupansky j...@basetechnology.com wrote:
> Do you mean besides query elevation?
> http://wiki.apache.org/solr/QueryElevationComponent
> And besides explicit boosting by the user (the ^ suffix operator after a term/phrase)?
> -- Jack Krupansky
>
> -Original Message- From: Donald Organ Sent: Tuesday, May 01, 2012 3:59 PM To: solr-user Subject: Boosting documents based on search term/phrase
> Is there a way to boost documents based on the search term/phrase?
Re: Boosting documents based on search term/phrase
Hi, Can you please give an example of what you mean?

Otis
Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm

From: Donald Organ dor...@donaldorgan.com To: solr-user solr-user@lucene.apache.org Sent: Tuesday, May 1, 2012 3:59 PM Subject: Boosting documents based on search term/phrase
> Is there a way to boost documents based on the search term/phrase?
Re: Boosting documents based on search term/phrase
Perfect, this is working well.

On Tue, May 1, 2012 at 5:33 PM, Jeevanandam je...@myjeeva.com wrote:
> Yes, you can add it in the last-components section of the default query handler:
>
>     <arr name="last-components"><str>elevator</str></arr>
>
> - Jeevanandam
>
> On 02-05-2012 3:53 am, Donald Organ wrote:
>> Query elevation was exactly what I was talking about. Now is there a way to add this to the default query handler?
>>
>> On Tue, May 1, 2012 at 4:26 PM, Jack Krupansky j...@basetechnology.com wrote:
>>> Do you mean besides query elevation?
>>> http://wiki.apache.org/solr/QueryElevationComponent
>>> And besides explicit boosting by the user (the ^ suffix operator after a term/phrase)?
>>> -- Jack Krupansky
>>>
>>> -Original Message- From: Donald Organ Sent: Tuesday, May 01, 2012 3:59 PM To: solr-user Subject: Boosting documents based on search term/phrase
>>> Is there a way to boost documents based on the search term/phrase?
Re: Boosting documents based on the vote count
Thanks, will look into those.

Andu

On Mon, Oct 18, 2010 at 4:14 PM, Ahmet Arslan iori...@yahoo.com wrote:
>> I know but I can't figure out what functions to use. :)
>
> Oh, I see. Why not just use {!boost b=log(vote)}? Maybe scale(vote,0.5,10)?
Boosting documents based on the vote count
Hello all, I have a field in my schema which holds the number of votes a document has. How can I boost documents based on that number? Something like the one which has the maximum number has a boost of 10, the one with the smallest number has 0.5 and in between the values get calculated automatically. Thanks, Alexandru Badiu
Re: Boosting documents based on the vote count
> I have a field in my schema which holds the number of votes a document has. How can I boost documents based on that number?

You can do it with a function query: http://wiki.apache.org/solr/FunctionQuery
Re: Boosting documents based on the vote count
I know but I can't figure out what functions to use. :)

On Mon, Oct 18, 2010 at 1:38 PM, Ahmet Arslan iori...@yahoo.com wrote:
>> I have a field in my schema which holds the number of votes a document has. How can I boost documents based on that number?
>
> You can do it with a function query: http://wiki.apache.org/solr/FunctionQuery
Re: Boosting documents based on the vote count
> I know but I can't figure out what functions to use. :)

Oh, I see. Why not just use {!boost b=log(vote)}? Maybe scale(vote,0.5,10)?
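To get a feel for how the two suggested functions shape the boost, here is a small Python sketch that mimics them outside Solr. It assumes `scale` maps the smallest and largest observed values linearly onto the target range and that `log` is base 10, which matches my reading of the function query documentation. The `+ 1` inside the logarithm is a guard added for this sketch so that documents with zero votes do not hit log(0); it is not part of the suggestion above.

```python
import math


def scale(values, min_target, max_target):
    """Linearly map values so min(values) -> min_target and max(values) -> max_target."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All documents have the same vote count; avoid dividing by zero
        return [float(min_target)] * len(values)
    return [
        (v - lo) / (hi - lo) * (max_target - min_target) + min_target
        for v in values
    ]


def log_boost(votes):
    """Base-10 log of the vote count, shifted by one (sketch-only guard for 0 votes)."""
    return math.log10(votes + 1)


votes = [0, 9, 99, 999]  # hypothetical vote counts
scaled = scale(votes, 0.5, 10)
logged = [log_boost(v) for v in votes]
print(scaled)
print(logged)
```

The linear version gives exactly the 0.5 to 10 range asked for, but a single very popular document pushes everything else toward 0.5; the logarithmic version compresses large counts so mid-range documents still get a noticeable boost.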
Re: FunctionQuery and boosting documents using date arithmetic
On 8/12/07, Chris Hostetter [EMAIL PROTECTED] wrote:
> : I'm having the date boosting function as well. I'm using this function:
> : F = recip(rord(creationDate),1,1000,1000)^10. However, since I have around
> : 10,000 of documents added in one day, rord(createDate) returns very
> : different values for the same createDate. For example, the last document
>
> you may want to consider rounding dates down to the nearest day when indexing, that way everything published on the same day would have the same value and thus the same ordinal value.

Yeah, and that will save index space and a lot of memory (smaller FieldCache entry) too.

-Yonik
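Rounding down at index time, as suggested above, can be done in the indexing client before the document is sent. A minimal sketch, assuming timestamps are handled in UTC and written in the usual ISO 8601 form with a trailing Z:

```python
from datetime import datetime, timezone


def round_down_to_day(ts):
    """Drop the time-of-day part so all docs from one day share a value."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


def to_solr_date(ts):
    """Format a UTC datetime as an ISO 8601 string with a trailing Z."""
    return ts.strftime("%Y-%m-%dT%H:%M:%SZ")


morning = datetime(2007, 8, 12, 9, 15, 30, tzinfo=timezone.utc)
evening = datetime(2007, 8, 12, 22, 5, 0, tzinfo=timezone.utc)

print(to_solr_date(round_down_to_day(morning)))  # 2007-08-12T00:00:00Z
print(round_down_to_day(morning) == round_down_to_day(evening))  # True
```

With day granularity an index covering a year has at most a few hundred distinct date values, which is where the smaller FieldCache entry comes from.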
Re: FunctionQuery and boosting documents using date arithmetic
On 11/08/07, Chris Hostetter [EMAIL PROTECTED] wrote:
> i would agree with you there, this is where a more robust (ie: less efficient) DateField-ish class that supports configuration options to specify: 1) the output format 2) the input format(s) 3) the indexed format ...as SimpleDateFormat pattern strings would be handy. The ValueSource it uses could return seconds (or some other unit based on another config option) since epoch as the intValue.

That definitely sounds like a sensible and flexible approach. I'll have to take a closer look at the ValueSource and FunctionQuery classes and see what I can come up with.

> it's been discussed before, but there are a lot of tricky issues involved which is probably why no one has really tackled it.

It does seem somehow related to the issue of making the value of NOW constant during the entire execution of a query; hopefully it's not in the too-hard basket.

> be careful what you wish for. you are 100% correct that functions using the (r)ord value of a DateField aren't a function of true age, but depending on how you look at it that may be better than using the real age (i think so anyway).

I understand the problems you describe with using true age values, although I wonder how much recip() (or perhaps a logarithmic function) would be able to dampen any unpleasant side-effects created by unusual publishing patterns, not publishing on weekends, etc. Using min age sounds like a much better idea than using NOW to avoid any of the described weirdness too, but that might increase the complexity of the function. I'm still keen to get something working, at least to compare the results it generates with the current ordinal method.

Piete
Re: FunctionQuery and boosting documents using date arithmetic
Do you consistently add 10,000 documents to your index every day or does the number of new documents added per day vary?

On 11/08/07, climbingrose [EMAIL PROTECTED] wrote:
> I'm using a date boosting function as well: F = recip(rord(creationDate),1,1000,1000)^10. However, since I have around 10,000 documents added in one day, rord(creationDate) returns very different values for documents created on the same date. For example, the last document added will have rord(creationDate) = 1 while the first document added that day will have rord(creationDate) = 10,000. When rord(creationDate) > 10,000, the value of F approaches 0. Therefore, the boost query doesn't make any difference between the last document added today and the document added 10 days ago. Now if I replace 1000 in F with a large number, say 10, the boost function suddenly gives the last few documents an enormous boost and makes the other query scores irrelevant. So in my case (and many others', I believe), the true date value would be more appropriate. I'm thinking along the same line of adding a timestamp. It wouldn't add much overhead this way, would it?
Re: FunctionQuery and boosting documents using date arithmetic
I'm using a date boosting function as well: F = recip(rord(creationDate),1,1000,1000)^10. However, since I have around 10,000 documents added in one day, rord(creationDate) returns very different values for documents created on the same date. For example, the last document added will have rord(creationDate) = 1 while the first document added that day will have rord(creationDate) = 10,000. When rord(creationDate) > 10,000, the value of F approaches 0. Therefore, the boost query doesn't make any difference between the last document added today and the document added 10 days ago. Now if I replace 1000 in F with a large number, say 10, the boost function suddenly gives the last few documents an enormous boost and makes the other query scores irrelevant. So in my case (and many others', I believe), the true date value would be more appropriate. I'm thinking along the same line of adding a timestamp. It wouldn't add much overhead this way, would it?

Regards,

On 8/11/07, Chris Hostetter [EMAIL PROTECTED] wrote:
> : Actually, just thinking about this a bit more, perhaps adding a function
> : call such as parseDate() might add too much overhead to the actual query,
> : perhaps it would be better to first convert the date to a timestamp at index
> : time and store it in a field type slong? This might be more efficient but
>
> i would agree with you there, this is where a more robust (ie: less efficient) DateField-ish class that supports configuration options to specify: 1) the output format 2) the input format(s) 3) the indexed format ...as SimpleDateFormat pattern strings would be handy. The ValueSource it uses could return seconds (or some other unit based on another config option) since epoch as the intValue.
>
> it's been discussed before, but there are a lot of tricky issues involved which is probably why no one has really tackled it.
>
> : that still leaves the problem of obtaining the current timestamp to use in
> : the boost function.
>
> it would be pretty easy to write a ValueSource that just knew about now as seconds since epoch.
>
> : While it seems to work pretty well, I've realised that this may not be
> : quite as effective as i had hoped given that the calculation is based on the
> : ordinal of the field value rather than the value of the field itself. In
> : cases where the field type is 'date' and the actual field values are not
> : distributed evenly across all documents in the index, the value returned by
> : rord() is not going to give a true reflection of document age. For example,
>
> be careful what you wish for. you are 100% correct that functions using the (r)ord value of a DateField aren't a function of true age, but depending on how you look at it that may be better than using the real age (i think so anyway).
>
> While it sounds appealing to say that docA should score half as high as docB if it is twice as old, that typically isn't all that important when dealing with recent dates; and when dealing with older dates the ordinal value tends to approximate it decently well ... where a true measure of age might screw you up is when you have situations where few/no new articles get published on weekends (or late at night). it's also very confusing to people when the ordering of documents changes even though no new documents have been published -- that can easily happen if you are heavily boosting on a true age calculation but will never happen when dealing with an ordinal ranking of documents by age. (although, this could be compensated by doing all of your true age calculations relative to the min age of all articles in your index -- but you would still get really weird 'big' shifts in scores as soon as that first article gets published on monday morning.)
>
> -Hoss

-- Regards, Cuong Hoang
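The effect described in this message is easy to reproduce with a few lines of Python. The sketch below assumes the documented definition recip(x,m,a,b) = a/(m*x+b) and the poster's figure of roughly 10,000 distinct timestamps per day; the ordinals are illustrative, not measured. The trailing ^10 in F is left out because it only multiplies the result.

```python
import math


def recip(x, m, a, b):
    """Reciprocal function as used in the boost: a / (m*x + b)."""
    return a / (m * x + b)


# Reverse ordinals: 1 is the newest value in the index.
newest_today = recip(1, 1, 1000, 1000)          # last document added today
oldest_today = recip(10_000, 1, 1000, 1000)     # first document added today
ten_days_old = recip(100_000, 1, 1000, 1000)    # roughly ten days of documents back

print(newest_today, oldest_today, ten_days_old)
```

Two documents created on the same day end up roughly a factor of ten apart, which is the skew the poster is complaining about and the reason rounding dates or using true age changes the picture.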
Re: FunctionQuery and boosting documents using date arithmetic
: Actually, just thinking about this a bit more, perhaps adding a function
: call such as parseDate() might add too much overhead to the actual query,
: perhaps it would be better to first convert the date to a timestamp at index
: time and store it in a field type slong? This might be more efficient but

i would agree with you there, this is where a more robust (ie: less efficient) DateField-ish class that supports configuration options to specify:
1) the output format
2) the input format(s)
3) the indexed format
...as SimpleDateFormat pattern strings would be handy. The ValueSource it uses could return seconds (or some other unit based on another config option) since epoch as the intValue.

it's been discussed before, but there are a lot of tricky issues involved which is probably why no one has really tackled it.

: that still leaves the problem of obtaining the current timestamp to use in
: the boost function.

it would be pretty easy to write a ValueSource that just knew about now as seconds since epoch.

: While it seems to work pretty well, I've realised that this may not be
: quite as effective as i had hoped given that the calculation is based on the
: ordinal of the field value rather than the value of the field itself. In
: cases where the field type is 'date' and the actual field values are not
: distributed evenly across all documents in the index, the value returned by
: rord() is not going to give a true reflection of document age. For example,

be careful what you wish for. you are 100% correct that functions using the (r)ord value of a DateField aren't a function of true age, but depending on how you look at it that may be better than using the real age (i think so anyway).

While it sounds appealing to say that docA should score half as high as docB if it is twice as old, that typically isn't all that important when dealing with recent dates; and when dealing with older dates the ordinal value tends to approximate it decently well ... where a true measure of age might screw you up is when you have situations where few/no new articles get published on weekends (or late at night). it's also very confusing to people when the ordering of documents changes even though no new documents have been published -- that can easily happen if you are heavily boosting on a true age calculation but will never happen when dealing with an ordinal ranking of documents by age. (although, this could be compensated by doing all of your true age calculations relative to the min age of all articles in your index -- but you would still get really weird 'big' shifts in scores as soon as that first article gets published on monday morning.)

-Hoss
FunctionQuery and boosting documents using date arithmetic
I've been using a simple variation of the boost function given in the examples to boost more recent documents:

recip(rord(creationDate),1,1000,1000)^1.3

While it seems to work pretty well, I've realised that this may not be quite as effective as I had hoped, given that the calculation is based on the ordinal of the field value rather than the value of the field itself. In cases where the field type is 'date' and the actual field values are not distributed evenly across all documents in the index, the value returned by rord() is not going to give a true reflection of document age. For example, using Hoss' new date faceting feature, I can see that the rate at which documents have been added to the index I'm maintaining has been slowly but steadily increasing over the past few months, and I fear this fact will skew the boost value calculated by the function listed above.

There doesn't currently seem to be any way of performing date arithmetic or converting a date field into an integer (seconds since epoch?). Ideally I'd like to be able to do something like:

recip(intval(parseDate('NOW')-parseDate(creationDate)),1,1000,1000)^1.3

so that the function calculates the boost based on the actual document age, rather than the relative age. Does anybody have any thoughts or comments on this approach?

cheers,
Piete
Re: FunctionQuery and boosting documents using date arithmetic
Actually, just thinking about this a bit more, perhaps adding a function call such as parseDate() might add too much overhead to the actual query. Perhaps it would be better to first convert the date to a timestamp at index time and store it in a field of type slong? This might be more efficient, but that still leaves the problem of obtaining the current timestamp to use in the boost function.

On 06/08/07, Pieter Berkel [EMAIL PROTECTED] wrote:
> I've been using a simple variation of the boost function given in the examples to boost more recent documents:
>
> recip(rord(creationDate),1,1000,1000)^1.3
>
> While it seems to work pretty well, I've realised that this may not be quite as effective as I had hoped, given that the calculation is based on the ordinal of the field value rather than the value of the field itself. In cases where the field type is 'date' and the actual field values are not distributed evenly across all documents in the index, the value returned by rord() is not going to give a true reflection of document age. For example, using Hoss' new date faceting feature, I can see that the rate at which documents have been added to the index I'm maintaining has been slowly but steadily increasing over the past few months, and I fear this fact will skew the boost value calculated by the function listed above.
>
> There doesn't currently seem to be any way of performing date arithmetic or converting a date field into an integer (seconds since epoch?). Ideally I'd like to be able to do something like:
>
> recip(intval(parseDate('NOW')-parseDate(creationDate)),1,1000,1000)^1.3
>
> so that the function calculates the boost based on the actual document age, rather than the relative age. Does anybody have any thoughts or comments on this approach?
>
> cheers,
> Piete
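The index-time conversion proposed here, and the age-based boost it would enable, can be sketched in Python as follows. Everything in it is an assumption for illustration: the helper names, the choice of days as the unit, and the recip parameters (m=1, a=1, b=1, which halves the boost after one day). In a real setup the current time would have to be supplied by the client or by a custom ValueSource, as discussed in the thread.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86400


def to_epoch_seconds(solr_date):
    """Convert an ISO 8601 UTC date string to whole seconds since the epoch."""
    parsed = datetime.strptime(solr_date, "%Y-%m-%dT%H:%M:%SZ")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


def age_boost(created_epoch, now_epoch, m=1.0, a=1.0, b=1.0):
    """recip-style boost on true age in days: a / (m * age_days + b)."""
    age_days = max(now_epoch - created_epoch, 0) / SECONDS_PER_DAY
    return a / (m * age_days + b)


# Value that would be stored in the long field at index time
created = to_epoch_seconds("2007-08-05T00:00:00Z")

# "Now" fixed once per request so every document sees the same value
now = to_epoch_seconds("2007-08-06T00:00:00Z")

fresh = age_boost(now, now)
day_old = age_boost(created, now)
print(fresh, day_old)
```

Fixing `now` once per request mirrors the concern raised elsewhere in the thread about keeping NOW constant for the whole execution of a query.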