Re: SOLR ranking
If he needs faceting or something (I didn't see that specified), doing two queries won't do, of course.

--Ere

On 19.2.2016 at 2.22, Binoy Dalal wrote:
> Hi Alessandro,
> Don't get me wrong. Using mm, ps and pf can and absolutely will solve his problem. Like I said above, my solution is meant to be a quick and dirty fix. It's really not that complex and shouldn't take more than an hour to set up at the app level.
> Moreover, I suggested it because he said it was urgent for him, and setting up a proper config with mm, pf and ps might take him much longer.
> Hope this clears things up :)
>
> On Fri, 19 Feb 2016, 05:31 Alessandro Benedetti <abenede...@apache.org> wrote:
> > Hey Binoy,
> > I honestly can't understand why such complexity is needed :/
> > Can you explain to me why playing with these edismax parameters:
> > mm (the percentage of query terms you want to be in the results)
> > pf (the fields you want to be boosted if the phrase matches)
> > ps (the slop to allow)
> > should not solve the problem, instead of the two-phase query?
> >
> > Cheers
Re: SOLR ranking
Hey Binoy,
I honestly can't understand why such complexity is needed :/
Can you explain to me why playing with these edismax parameters:

mm (the percentage of query terms you want to be in the results)
pf (the fields you want to be boosted if the phrase matches)
ps (the slop to allow)

should not solve the problem, instead of the two-phase query?

Cheers

On 18 February 2016 at 18:09, Binoy Dalal <binoydala...@gmail.com> wrote:
> Here's an alternative solution that may be of some help. [...]
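A minimal sketch of the single-query approach built on the edismax parameters mm, pf and ps, assuming the core name and field names that appear elsewhere in this thread. The helper names `build_edismax_params` and `build_select_url`, the slop value of 2 and `wt=json` are illustrative assumptions, not settings anyone in the thread confirmed.

```python
from urllib.parse import urlencode


def build_edismax_params(user_query, slop=2):
    """Build edismax parameters that require every query term (mm=100%)
    and boost documents in which the terms occur as a phrase (pf, ps).

    The slop default is an assumed value; tune it for your data.
    """
    return {
        "q": user_query,
        "defType": "edismax",
        # mm=100%: all query terms must match.
        "mm": "100%",
        # qf: fields searched for the individual terms (names taken from the thread).
        "qf": "topic_title^100 subtopic_title^40 index_term^20 drug^15 content^3",
        # pf: fields whose score is boosted when the whole query matches as a phrase.
        "pf": "topTitle^200 subtopTitle^80 indTerm^40 drugString^30 tglData^6",
        # ps: slop allowed between the terms of the pf phrase.
        "ps": str(slop),
        "wt": "json",
    }


def build_select_url(base_url, params):
    """Append URL-encoded parameters to a Solr select handler URL."""
    return base_url + "?" + urlencode(params)


url = build_select_url(
    "http://localhost:8983/solr/tgl/select",
    build_edismax_params("rheumatoid arthritis"),
)
print(url)
```

Building the parameters as a dictionary and letting `urlencode` escape them keeps every parameter name attached to its value, which is easy to lose when a long URL is pasted by hand.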
Re: SOLR ranking
Here's an alternative solution that may be of some help.
Here I'm assuming that you are not directly outputting the search results to the user and have some sort of layer between the results from Solr and the presentation to the user, where some additional processing can be performed.

1) You already know that you want phrase matches to show up higher than single matches. In this case, why not do an explicit phrase match first, with some slop or as is, based on how close you want the phrase terms to be to each other.
2) Once you have the results from the first query, fire an OR query with your terms and get those results.
3) Put the results from (2) after (1) and present them to the user. This happens in the app layer.

This is essentially the same as running a query such as "Rheumatoid Arthritis"~slop OR (Rheumatoid AND Arthritis), but you don't need to worry about the ordering because you're sorting your results.

Now, this will obviously take more time since you're querying twice and then doing the additional processing in the app layer, but provided your architecture is balanced enough and can cope with a little extra load, I do not think that your performance will take that bad a hit. Moreover, since you're in a hurry, you could implement this as a quick and dirty solution to meet the project goals, provided it fits the acceptance parameters, and then later play around with the scoring/sorting and figure out the best possible setup to suit your needs.

On Thu, Feb 18, 2016 at 4:22 PM Emir Arnautovic <emir.arnauto...@sematext.com> wrote:
> Hi Nitin,
> Can you send us how your parsed query looks like (from debug output).
>
> Thanks,
> Emir
>
> On 17.02.2016 08:38, Nitin.K wrote:
> > Hi Binoy,
> >
> > We are searching for both phrases and individual words [...]

--
Regards,
Binoy Dalal
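The three steps of the two-query approach can be sketched at the app layer as follows. This is only an illustration under stated assumptions: each result is a dictionary with a unique `id` field, both result lists have already been fetched from Solr, and the function name `merge_phrase_first` is hypothetical.

```python
def merge_phrase_first(phrase_docs, term_docs, id_field="id"):
    """Return phrase matches first, followed by the remaining term matches.

    Documents found by both queries are kept only once, at their
    phrase-match position. The order inside each list is preserved.
    """
    seen = set()
    merged = []
    for doc in list(phrase_docs) + list(term_docs):
        doc_id = doc[id_field]
        if doc_id not in seen:
            seen.add(doc_id)
            merged.append(doc)
    return merged


# Results of the explicit phrase query (step 1) and the term query (step 2).
phrase_hits = [{"id": "a"}, {"id": "b"}]
term_hits = [{"id": "c"}, {"id": "a"}, {"id": "d"}]

merged_ids = [doc["id"] for doc in merge_phrase_first(phrase_hits, term_hits)]
print(merged_ids)
```

Deduplicating by id matters because any document that matches the phrase will normally match the individual terms as well. Note that paging and faceting across the two result sets are not handled by this sketch.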
Re: SOLR ranking
Hi Nitin,
Can you send us what your parsed query looks like (from the debug output)?

Thanks,
Emir

On 17.02.2016 08:38, Nitin.K wrote:
> Hi Binoy,
>
> We are searching for both phrases and individual words [...]

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/
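One way to obtain the parsed query is to add `debugQuery=true` to the request and read the `parsedquery` entry from the `debug` section of the JSON response. The sketch below assumes a JSON response already decoded into a dictionary; the helper names and the sample response are made up for illustration and do not come from the thread.

```python
from urllib.parse import urlencode


def debug_query_url(base_url, params):
    """Return a select URL with debug output enabled and JSON as response format."""
    debug_params = dict(params, debugQuery="true", wt="json")
    return base_url + "?" + urlencode(debug_params)


def extract_parsed_query(response):
    """Return the parsed query string from a decoded Solr response, or None."""
    return response.get("debug", {}).get("parsedquery")


debug_url = debug_query_url(
    "http://localhost:8983/solr/tgl/select",
    {"q": "rheumatoid arthritis", "defType": "edismax"},
)

# Hypothetical, heavily shortened response used only to exercise the helper.
sample_response = {"debug": {"parsedquery": "(+(...))"}}
print(extract_parsed_query(sample_response))
```

Posting the parsed query lets others see whether the pf, pf2 and pf3 clauses actually made it into the final query.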
Re: SOLR ranking
Hi Binoy,

We are searching for both phrases and individual words, but we want the documents that contain the phrase to come first in the order, followed by the matches on the individual words.

termPositions = true is also not working in my case.

I have also removed the string type from the copy fields. Kindly look into the changed configuration below.

Hi Emir,

I have changed the configuration as per your suggestion and added pf2 / pf3. Yes, I saw the difference, but the ranking is still not followed correctly in the case of phrases.

Changed configuration:

Copy fields again for reference:

Added the following field type:

Removed the string type from the copy fields.

Changed query:

http://localhost:8983/solr/tgl/select?q=rheumatoid%20arthritis=xml=1.0=200=AND=true=edismax=true=true=true;
pf=topTitle^200 subtopTitle^80 indTerm^40 drugString^30 tglData^6&
pf2=topTitle^200 subtopTitle^80 indTerm^40 drugString^30 tglData^6&
pf3=topTitle^200 subtopTitle^80 indTerm^40 drugString^30 tglData^6&
qf=topic_title^100 subtopic_title^40 index_term^20 drug^15 content^3

After making these changes, I get correct search results for a single term, but for a phrase search I am still not able to get the results in the correct order.

Hi Modassar,

I tried using mm=100, but the order is still the same.

Hi Alessandro,

I have not yet tried the slop parameter. By default it is taken as 1.0, as I saw in debug mode. Let me try this option too; I will definitely get back to you.

All,

Please let me know if anyone has any other suggestion on this. I have to implement it urgently and I think I am very close. Thanks to all of you; I have reached this point only because of you.

Thanks and Regards,
Nitin

--
View this message in context: http://lucene.472066.n3.nabble.com/SOLR-ranking-tp4257367p4257782.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: SOLR ranking
I just want to interject to say one thing: you *can* sort on multi-valued fields as of recent Solr 5 releases. It's done using the "field" function query with either "min" or "max" as the 2nd argument:
https://cwiki.apache.org/confluence/display/solr/Function+Queries
Of course it'd be nicer to simply sort asc/desc on the field as usual and not use this special syntax, but AFAIK that convenience hasn't been added yet.

~ David

On Mon, Feb 15, 2016 at 10:26 AM Binoy Dalal <binoydala...@gmail.com> wrote:
> I'm sorry, missed that part. It's true, you cannot sort on multivalued fields. The workaround will be pretty complex; you'll either have to find the max or min value of the fields at index time, store those in separate fields and use those to sort, or somehow come up with some function that can convert the values from your multivalued field into a single value (something like sum(field)), but it surely won't be trivial.
>
> Instead you should do what Emir's saying.
> Boost your fields at index or query time based on how you want to sort your documents.
> So in your case, give the highest boost to topic_title, then a little lower to subtopic_title, and so on. This should return your documents in the correct order.
> You will have to play around with the boost values a little to get them right, though.
>
> Alternatively, you could boost on the multivalued fields and then sort based on your single-valued fields.
>
> Either way, you'll have to experiment and see what works best for you.
>
> On Mon, Feb 15, 2016 at 8:21 PM Nitin.K <nitin.kanu...@adi-mps.com> wrote:
> > Thanks Binoy..
> >
> > Actually it is throwing following error:
> >
> > can not sort on multivalued field: index_term
> >
> > --
> > View this message in context: http://lucene.472066.n3.nabble.com/SOLR-ranking-tp4257367p4257378.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>
> --
> Regards,
> Binoy Dalal

--
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book: http://www.solrenterprisesearchserver.com
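As a rough illustration of the "field" function syntax mentioned above, the sort parameter could be assembled as in the sketch below. The helper name `multivalued_sort` is hypothetical and `index_term` is simply the field from the error message. Whether a given field qualifies (for example, whether it needs docValues) depends on the schema and Solr version, so this is an assumption to verify against the Function Queries page linked above.

```python
from urllib.parse import urlencode


def multivalued_sort(field_name, pick="min", direction="asc"):
    """Build a sort clause that orders documents by the smallest or largest
    value of a multi-valued field, using the two-argument field() function."""
    if pick not in ("min", "max"):
        raise ValueError("pick must be 'min' or 'max'")
    if direction not in ("asc", "desc"):
        raise ValueError("direction must be 'asc' or 'desc'")
    return f"field({field_name},{pick}) {direction}"


sort_clause = multivalued_sort("index_term", pick="min", direction="asc")
query_string = urlencode({"q": "*:*", "sort": sort_clause})
print(sort_clause)
print(query_string)
```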
Re: SOLR ranking
> Actually you can get it with edismax.
> Just set mm to 100% and then configure a pf field (or more).
> All the search terms will then be mandatory, and phrase matches will be boosted.
>
> Cheers
>
> On 16 February 2016 at 07:57, Emir Arnautovic <emir.arnauto...@sematext.com> wrote:
> > Hi Nitin,
> > You can use the pf parameter to boost results with the exact phrase. You can also use pf2 and pf3 to boost results with bigrams and trigrams (phrase matches with 2 or 3 words, in case the input has more than 3 words).
> >
> > Regards,
> > Emir
> >
> > On 16.02.2016 06:18, Nitin.K wrote:
> > > I am using the edismax parser with the following query:
> > >
> > > localhost:8983/solr/tgl/select?q=eating%20disorders=xml=1.0=200=AND=true=edismax=true=true=true=topic_title%5E100+subtopic_title%5E40+index_term%5E20+drug%5E15+content%5E3=topTitle%5E200+subTopTitle%5E80+indTerm%5E40+drugString%5E30+content%5E6
> > >
> > > Configuration of schema.xml
> > >
> > > [The field and copy field definitions did not survive in the archive; only attribute fragments remain: stored="true", stored="false", multiValued="true".]
> > >
> > > [Surviving fragments of the first field type: positionIncrementGap="100" omitNorms="true"; one analyzer chain with solr.StandardTokenizerFactory, a filter with ignoreCase="true" words="stopwords.txt", and solr.LowerCaseFilterFactory; a second analyzer chain with solr.StandardTokenizerFactory, a filter with ignoreCase="true" words="stopwords.txt", solr.SynonymFilterFactory (synonyms="synonyms.txt" ignoreCase="true" expand="true"), and solr.LowerCaseFilterFactory.]
> > >
> > > [Surviving fragments of the second field type: positionIncrementGap="100" omitTermFreqAndPositions="true" omitNorms="true"; solr.WhitespaceTokenizerFactory, a filter with ignoreCase="true" words="stopwords.txt", and solr.LowerCaseFilterFactory.]
> > >
> > > What I want is that, if a user searches for a phrase, the phrase always takes priority over the individual words.
> > >
> > > Example: "Eating Disorders"
> > >
> > > First it should search for "Eating Disorders" together and then for the individual words "Eating" and "Disorders", but while searching for the individual words it should always return only those documents where both words exist, for which I am already using q.op="AND" in my query.
> > >
> > > Thanks,
> > > Nitin
> > >
> > > --
> > > View this message in context: http://lucene.472066.n3.nabble.com/SOLR-ranking-tp4257367p4257510.html
> > > Sent from the Solr - User mailing list archive at Nabble.com.
> >
> > --
> > Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> > Solr & Elasticsearch Support * http://sematext.com/

--
Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
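To make the pf2 / pf3 idea concrete, the sketch below lists the word pairs and triples of a query, which are the units that get phrase-boosted. It is purely conceptual: Solr builds these phrases internally after running the field's analyzer, whereas the hypothetical `word_ngrams` helper here just splits on whitespace.

```python
def word_ngrams(query, n):
    """Return every run of n consecutive words in the query, in order."""
    words = query.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


example = "treatment of eating disorders"
bigrams = word_ngrams(example, 2)   # the kind of phrases pf2 boosts
trigrams = word_ngrams(example, 3)  # the kind of phrases pf3 boosts
print(bigrams)
print(trigrams)
```

For a two-word query such as "eating disorders" there is a single bigram and no trigram, so in that case pf3 has no three-word phrase to boost.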
Re: SOLR ranking
If I remember correctly, the pf boost is applied as a phrase query (as when you put the terms in "quotes"). So "close proximity" means a phrase match with zero tolerance: the terms must keep the same relative positions as in the query.
I believe I debugged this recently.

Cheers

On 16 February 2016 at 11:42, Modassar Ather <modather1...@gmail.com> wrote:
> @Alessandro Thanks for your insight.
> I thought that the document will be boosted if all of the terms appear in
> close proximity by setting pf. Not sure how much is meant by the close
> proximity. Checked it on dismax query parser wiki too.
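To make the "zero tolerance" point concrete, here is a toy sketch in Python. It is not Solr or Lucene code: `tokenize` is a simplified stand-in for an analyzer and `phrase_positions` is a hypothetical helper, both invented for illustration. It only shows that an exact phrase match needs the terms at consecutive positions, so a document that merely contains both words does not get the phrase boost.

```python
def tokenize(text):
    # Simplified stand-in for an analyzer: lowercase and split on whitespace.
    return text.lower().split()


def phrase_positions(tokens, phrase):
    """Return the start positions where `phrase` occurs as consecutive tokens."""
    n = len(phrase)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase]


query = tokenize("eating disorders")

# Both words are adjacent and in order: counts as a phrase match.
doc_exact = tokenize("Treatment of eating disorders in adults")
# Both words are present but not adjacent: no phrase match.
doc_loose = tokenize("Eating habits and sleep disorders")

exact_hits = phrase_positions(doc_exact, query)
loose_hits = phrase_positions(doc_loose, query)
print(exact_hits, loose_hits)
```

With q.op=AND (or mm=100%) both documents would still be returned; only the first would receive the extra phrase boost.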
Re: SOLR ranking
> Actually you can get it with the edismax.
> Just set mm to 100% and then configure a pf field (or more).
> You are going to search all the search terms mandatory and boost phrases
> match.

@Alessandro Thanks for your insight.
I thought that a document would be boosted if all of the terms appear in close proximity when pf is set. I am not sure how close "close proximity" has to be. I checked the dismax query parser wiki too.

Best,
Modassar
Re: SOLR ranking
Hi Nitin,

I am not sure whether you changed the fields you use for phrase boosting, but in the example you sent, all fields except content are "string" fields, and content is boosted by 6 while topic_title in qf is boosted by 100. Try using the same fields in pf2 that you use in qf and you should see the difference. After that you can experiment with field analysis and with which fields to use only for boosting.

Regards,
Emir

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/
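The point about "string" fields can be illustrated with a toy model in Python. This is a sketch under simplifying assumptions, not how Solr is implemented: `index_terms` and `phrase_matches` are hypothetical helpers, a string field is modelled as keeping the whole value as one verbatim term, and a text field as lowercasing and splitting on whitespace.

```python
def index_terms(value, tokenized):
    """Toy model of indexing.

    A string field keeps the whole value as a single, unmodified term.
    A tokenized text field is lowercased and split into word terms.
    """
    return value.lower().split() if tokenized else [value]


def phrase_matches(value, phrase, tokenized):
    """True if the phrase's terms occur consecutively in the field's terms."""
    terms = index_terms(value, tokenized)
    wanted = index_terms(phrase, tokenized)
    n = len(wanted)
    return any(terms[i:i + n] == wanted for i in range(len(terms) - n + 1))


title = "Eating Disorders in Adolescents"
phrase = "eating disorders"

as_text = phrase_matches(title, phrase, tokenized=True)
as_string = phrase_matches(title, phrase, tokenized=False)
print(as_text, as_string)
```

In this model the string field only matches when the entire stored value equals the phrase, which is why a phrase boost on such fields rarely contributes anything.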
Re: SOLR ranking
Nitin, have you read my reply?

> kindly let me know, how can i first search the phrase and then go to the
> individual words (i.e word-1 AND word-2)

--
Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience - 1794 England
Re: SOLR ranking
Based on a quick look at the documentation, I think that you should use termPositions=true to achieve what you want.

--
Regards,
Binoy Dalal
Re: SOLR ranking
Hi Emir,

I tried using the boost parameters for phrase search after removing omitTermFreqAndPositions from the multivalued field type, but when searching for phrases, the documents with an exact match still do not come up first. Instead, in the content field, it considers the mutual count of both terms and decides the order based on that.

Kindly let me know how I can first search for the phrase and then go to the individual words (i.e. word-1 AND word-2).

--
View this message in context: http://lucene.472066.n3.nabble.com/SOLR-ranking-tp4257367p4257556.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: SOLR ranking
You are absolutely right, Binoy!

But my problem is that we don't want term frequency to be taken into account for index term or drug (i.e. we don't want the number of occurrences of the search term to count for these two fields).

Is it possible to omit term frequency for these two fields and still index them with term positions for phrase search?

I tried using omitTermFreqAndPositions="true" together with omitPositions="false", but that is not working for me.

Thanks,
Nitin

--
View this message in context: http://lucene.472066.n3.nabble.com/SOLR-ranking-tp4257367p4257551.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: SOLR ranking
@Nitin
Why are you phrase boosting on string fields?
More often than not, it won't do anything because the phrases simply won't match the entire string.

--
Regards,
Binoy Dalal
Re: SOLR ranking
Binoy, omitTermFreqAndPositions is set only for text_ws, which is used only on the "indexed_terms" field.
The text_general fields seem fine to me.

Are you omitting norms on purpose? To be fair, norms could be relevant in title or short topic searches, to boost short field values that contain many of the terms from the query.

To respond to Modassar:

> I don't think the phrase will be searched as individual ANDed terms until
> the query has it like below.
> "Eating Disorders" OR (Eating AND Disorders).

Actually you can get it with edismax.
Just set mm to 100% and then configure a pf field (or more).
All the search terms will then be mandatory, and phrase matches will be boosted.

Cheers
Re: SOLR ranking
Hi Nitin,

You can use the pf parameter to boost results that contain the exact phrase. You can also use pf2 and pf3 to boost results that match word bigrams and trigrams (phrase matches on 2 or 3 words, for the case where the input has more than 3 words).

Regards,
Emir
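A request combining these parameters could be assembled as in the Python sketch below. The base URL and field names are taken from this thread; `build_edismax_url` is a hypothetical helper, and the phrase boosts (200, 80, 6) applied to the qf text fields are an assumption chosen for illustration, not a recommendation. Fields indexed without positions are left out of the phrase parameters, since phrase matching needs position data.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def boosts(fields):
    """Render {'field': boost} as the 'field^boost field^boost' syntax."""
    return " ".join(f"{name}^{boost}" for name, boost in fields.items())


def build_edismax_url(base_url, user_query, qf, pf, mm="100%"):
    """Build a select URL: all terms mandatory (mm), phrases boosted (pf*)."""
    params = {
        "q": user_query,
        "defType": "edismax",
        "mm": mm,             # every query term must match
        "qf": boosts(qf),     # fields searched for individual terms
        "pf": boosts(pf),     # boost when the whole query matches as a phrase
        "pf2": boosts(pf),    # boost on matching word pairs
        "pf3": boosts(pf),    # boost on matching word triples
    }
    return f"{base_url}?{urlencode(params)}"


qf = {"topic_title": 100, "subtopic_title": 40, "index_term": 20,
      "drug": 15, "content": 3}
# Assumed phrase boosts on tokenized fields that keep positions.
pf = {"topic_title": 200, "subtopic_title": 80, "content": 6}

url = build_edismax_url("http://localhost:8983/solr/tgl/select",
                        "eating disorders", qf, pf)
decoded = parse_qs(urlsplit(url).query)
print(url)
print(decoded["pf2"])
```

Building the URL with `urlencode` keeps the `^`, `%` and spaces correctly escaped, which is easy to get wrong when the query string is typed by hand.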
Re: SOLR ranking
Firstly, to do phrase searching you need to set omitTermFreqAndPositions=false. You've set this to true. Changing it will require a reindex.

Secondly, it will be helpful to check the debug query output and see how the query is parsed and searched.

--
Regards,
Binoy Dalal
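Checking the debug output could look roughly like the Python sketch below. It assumes the request is sent with debugQuery=true and wt=json and that the JSON response carries a "debug" section with "parsedquery" and "explain" entries. `debug_url` and `summarize_debug` are hypothetical helpers, and the sample response is a placeholder rather than real Solr output.

```python
from urllib.parse import urlencode


def debug_url(base_url, params):
    """Return the select URL with debugging enabled and a JSON response."""
    merged = {**params, "debugQuery": "true", "wt": "json"}
    return f"{base_url}?{urlencode(merged)}"


def summarize_debug(response):
    """Pull out the parsed query and the ids of the explained documents."""
    debug = response.get("debug", {})
    return {
        "parsedquery": debug.get("parsedquery"),
        "explained_docs": sorted(debug.get("explain", {})),
    }


url = debug_url("http://localhost:8983/solr/tgl/select",
                {"q": "eating disorders", "defType": "edismax"})

# Placeholder standing in for a decoded JSON response; not real Solr output.
sample_response = {
    "debug": {
        "parsedquery": "PLACEHOLDER parsed query",
        "explain": {"doc-2": "PLACEHOLDER score explanation",
                    "doc-1": "PLACEHOLDER score explanation"},
    }
}

summary = summarize_debug(sample_response)
print(url)
print(summary)
```

The parsed query shows whether the pf, pf2 and pf3 clauses were actually generated, and the per-document explanations show which clauses contributed to each score.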
Re: SOLR ranking
> First it will search for "Eating Disorders" together and then the
> individual words "Eating" and "Disorders"

I don't think the phrase will be searched as individual ANDed terms unless the query is written like this:
"Eating Disorders" OR (Eating AND Disorders).

Best,
Modassar
Re: SOLR ranking
I am using the edismax parser with the following query:

localhost:8983/solr/tgl/select?q=eating%20disorders=xml=1.0=200=AND=true=edismax=true=true=true=topic_title%5E100+subtopic_title%5E40+index_term%5E20+drug%5E15+content%5E3=topTitle%5E200+subTopTitle%5E80+indTerm%5E40+drugString%5E30+content%5E6

Configuration of schema.xml

If a user searches for a phrase, I want that phrase to always take priority over the individual words.

Example: "Eating Disorders"

First it should search for "Eating Disorders" together and then for the individual words "Eating" and "Disorders". When matching on the individual words, it should return only those documents in which both words exist, which is why I am already using q.op="AND" in my query.

Thanks,
Nitin

--
View this message in context: http://lucene.472066.n3.nabble.com/SOLR-ranking-tp4257367p4257510.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: SOLR ranking
You'll have to provide more information. How exactly do you want phrase search to work, and how is it not working properly?

On Tue, 16 Feb 2016, 00:08 Nitin.K <nitin.kanu...@adi-mps.com> wrote:
> Thanks Binoy..
>
> I have used the boost parameters and its working as expected.
> I also need to give the priority to the phrase search. Kindly suggest on
> this.
> I am using edismax parser right now.
> Using pf, pf2 and pf3 parameters but that too are not working properly.

--
Regards,
Binoy Dalal
Re: SOLR ranking
Thanks Binoy..

I have used the boost parameters and it's working as expected.
I also need to give priority to phrase search. Kindly advise on this.
I am using the edismax parser right now, with the pf, pf2 and pf3 parameters, but those are not working properly either.

--
View this message in context: http://lucene.472066.n3.nabble.com/SOLR-ranking-tp4257367p4257420.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: SOLR ranking
I'm sorry, I missed that part. It's true, you cannot sort on multivalued fields.

The workaround will be pretty complex: you'll either have to find the max or min value of the fields at index time, store those in separate fields and sort on them, or somehow come up with a function that converts the values of your multivalued field into a single value (something like sum(field)), but it surely won't be trivial.

Instead you should do what Emir is saying. Boost your fields at index or query time based on how you want to sort your documents. So in your case, give the highest boost to topic_title, then a little lower to subtopic_title, and so on. This should return your documents in the correct order. You will have to play around with the boost values a little to get them right, though.

Alternatively, you could boost on the multivalued fields and then sort based on your single-valued fields.

Either way, you'll have to experiment and see what works best for you.

On Mon, Feb 15, 2016 at 8:21 PM Nitin.K <nitin.kanu...@adi-mps.com> wrote:
> Thanks Binoy..
>
> Actually it is throwing following error:
>
> can not sort on multivalued field: index_term

--
Regards,
Binoy Dalal
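The first workaround above (a single-valued min or max computed at index time) could look roughly like the sketch below in a client-side indexing script. The helper name `add_sort_fields` and the `_min` / `_max` field suffixes are assumptions made for illustration; such fields would also have to be declared as single-valued in the schema.

```python
def add_sort_fields(doc, field):
    """Return a copy of doc with single-valued <field>_min and <field>_max.

    Hypothetical helper: the suffixes are illustrative, not a Solr convention.
    """
    values = doc.get(field) or []
    enriched = dict(doc)
    if values:
        # min/max over the multivalued field give one sortable value each.
        enriched[field + "_min"] = min(values)
        enriched[field + "_max"] = max(values)
    return enriched


doc = {
    "id": "1",
    "index_term": ["rheumatoid arthritis", "arthritis", "joint pain"],
}
enriched = add_sort_fields(doc, "index_term")
print(enriched["index_term_min"], "|", enriched["index_term_max"])
```

Sorting would then use the derived field (for example `index_term_min asc`) instead of the multivalued one.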
Re: SOLR ranking
Thanks Binoy.. Actually it is throwing following error: can not sort on multivalued field: index_term -- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-ranking-tp4257367p4257378.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SOLR ranking
Hi,
I'm not sure how ordering will help (maybe I'm missing part of the question), but it seems to me that simple boosting would help in your case. See
https://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_make_.22superman.22_in_the_title_field_score_higher_than_in_the_subject_field

Regards,
Emir

On 15.02.2016 14:14, Binoy Dalal wrote:
> Use the sort parameter with your query and pass the fields in the order in
> which you want to sort them.
> So if you want topic > subtopic > index > drug > content all ascending,
> your sort parameter will look like
> sort=topic asc,subtopic asc,index asc,drug asc,content asc

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/
Re: SOLR ranking
Use the sort parameter with your query and pass the fields in the order in which you want to sort them. So if you want topic > subtopic > index > drug > content, all ascending, your sort parameter will look like

sort=topic asc,subtopic asc,index asc,drug asc,content asc

On Mon, 15 Feb 2016, 18:17 Nitin.K <nitin.kanu...@adi-mps.com> wrote:
> I have five fields in SOLR
> topic_title
> subtopic_title
> index_terms - Multivalued
> drug - Multivalued
> content
>
> - Now, I want to rank the documents with all these fields; I want all those
> documents that are haivng the search term in topic_title will come first in
> the order
> then documents having search term in subtopic title and then so on.
>
> Example : If two documents are having search term in topic_title then the
> solr should look for subtopic_ title similarly
> if the search term is present in both topic_title and subtopic_title fields
> then it should look for index term and so on; to decide the ranking order
>
> - I dont want to consider the no. of occurrences in multivalued fields but
> if the two documents are having search term in topic_title, subtopic_title,
> index_term and drug then the documents
> should be ranked in the order of no. of occurrences inside the content
> field.
>
> Kindly help in this. I will be really thankful

--
Regards,
Binoy Dalal
SOLR ranking
I have five fields in SOLR:

topic_title
subtopic_title
index_terms - Multivalued
drug - Multivalued
content

- I want to rank the documents using all these fields. Documents that have the search term in topic_title should come first, then documents that have the search term in subtopic_title, and so on.

Example: If two documents both have the search term in topic_title, then Solr should look at subtopic_title; similarly, if the search term is present in both topic_title and subtopic_title, it should look at the index term, and so on, to decide the ranking order.

- I don't want to consider the number of occurrences in the multivalued fields, but if two documents both have the search term in topic_title, subtopic_title, index_term and drug, then they should be ranked by the number of occurrences inside the content field.

Kindly help with this. I will be really thankful.

--
View this message in context: http://lucene.472066.n3.nabble.com/SOLR-ranking-tp4257367.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr ranking query..
Hi Chris,

I think what you are looking for could be solved using the eDismax query parser.

https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser

1. Your query fields (qf) would be: urlKeywords^60 title^40 fulltxt^1

2. To check on adultFlag:N you could use fq=adultFlag:N

3. For the lowest domain rank within the same group to rank higher, you could use the boost parameter with a recip ( http://wiki.apache.org/solr/FunctionQuery#recip ) function query.

Hope this works for you.

On Tue, Feb 4, 2014 at 12:19 PM, Chris christu...@gmail.com wrote:
> Hi, I have a document structure that looks like the below. I would like to
> implement something like -
>
> (urlKeywords:+keyword+ AND domainRank:[3 TO 1] AND adultFlag:N)^60 +
> OR (title:+keyword+ AND domainRank:[3 TO 1] AND adultFlag:N)^20 +
> OR (title:+keyword+ AND domainRank:[10001 TO *] AND adultFlag:N)^2 +
> OR (fulltxt:+keyword+) );
>
> In case we have multiple words in keywords - A B C D then for the documents
> that have all the words should rank highest (Group1), then 3 words(Group2),
> then 2 words(Group 3) etc
> AND - Within each group (Group1, 2, 3) I would want the ones with the
> lowest domain rank value to rank higher (but within the group)
>
> How can i do this in a single query? and please advice on the fastest way
> possible, (open to implementing fq other techniques to speed it up)
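To make the three points above concrete, here is a hedged sketch that assembles the request parameters and URL-encodes them. `defType`, `q`, `qf`, `fq` and `boost` are standard eDisMax request parameters; the helper name `build_edismax_params` and the `recip` constants (1, 1000, 1000) are placeholders chosen for illustration, not values from the thread.

```python
from urllib.parse import urlencode


def build_edismax_params(keywords):
    """Assemble eDisMax request parameters for the keyword search.

    Hypothetical helper; the recip constants are illustrative assumptions.
    """
    return {
        "defType": "edismax",
        "q": keywords,
        "qf": "urlKeywords^60 title^40 fulltxt^1",  # point 1: per-field weights
        "fq": "adultFlag:N",                        # point 2: filter, not scored
        "boost": "recip(domainRank,1,1000,1000)",   # point 3: assumed constants
    }


params = build_edismax_params("north carolina hearings")
query_string = urlencode(params)
print(query_string)
```

Keeping adultFlag in `fq` rather than in `q` also lets Solr cache the filter separately, which fits the request for a fast single query.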
Re: Solr ranking query..
Dear Varun,

Thank you for your replies. I managed to get points 1 and 2 done, but I am unable to figure out the boost query. Could you be kind enough to point me to an example, or maybe advise a bit more on that one?

Thanks for your help,
Chris
Re: Solr ranking query..
Hi Chris,

An example for point 3 could be:

boost=recip(field(domainRank),0.1,1,1)

http://wiki.apache.org/solr/FunctionQuery#recip

recip(x,m,a,b) implements a/(m*x+b), where m, a and b are constants and x is any numeric field or arbitrarily complex function.

So with these values the score is multiplied by 1/(0.1*1+1), about 0.91, when domainRank is 1, by 1/(0.1*10+1) = 0.5 when domainRank is 10, and so on: the lower the domainRank, the larger the multiplier. You could choose better values of a, b and m to suit your data.

On Tue, Feb 4, 2014 at 9:04 PM, Chris christu...@gmail.com wrote:
> Dear Varun,
>
> Thank you for your replies, I managed to get point 1 2 done, but for the
> boost query, I am unable to figure it out. Could you be kind enough to
> point me to an example or maybe advice a bit more on that one?
>
> Thanks for your help,
> Chris
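A quick way to choose the constants is to evaluate the formula by hand. The sketch below reimplements recip(x,m,a,b) = a/(m*x+b) in plain Python (it is not Solr code) and evaluates it for a few sample ranks, using the constants 0.1, 1, 1 from the example and, for comparison, an assumed variant with b=0.

```python
def recip(x, m, a, b):
    """Plain-Python version of the recip function query: a / (m*x + b)."""
    return a / (m * x + b)


for rank in (1, 10, 100):
    # Constants from the example: m=0.1, a=1, b=1.
    multiplier = recip(rank, 0.1, 1, 1)
    # Assumed variant with b=0, which is undefined for rank 0.
    steeper = recip(rank, 0.1, 1, 0)
    print(rank, round(multiplier, 3), round(steeper, 3))
```

With b=1 the multiplier stays below 1 and falls off gently; with b=0 it is 10 at rank 1 and 1 at rank 10, at the cost of a division by zero if domainRank can be 0.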
Solr ranking query..
Hi,

I have a document structure that looks like the below. I would like to implement something like -

(urlKeywords:+keyword+ AND domainRank:[3 TO 1] AND adultFlag:N)^60 +
OR (title:+keyword+ AND domainRank:[3 TO 1] AND adultFlag:N)^20 +
OR (title:+keyword+ AND domainRank:[10001 TO *] AND adultFlag:N)^2 +
OR (fulltxt:+keyword+) );

In case we have multiple words in keywords (A B C D), the documents that have all the words should rank highest (Group 1), then those with 3 words (Group 2), then 2 words (Group 3), etc.

AND - Within each group (Group 1, 2, 3) I would want the documents with the lowest domain rank value to rank higher (but within the group).

How can I do this in a single query? Please advise on the fastest way possible (I am open to implementing fq or other techniques to speed it up).

Document Structure in XML - doc str name=subDomainwww/str str name=domainncoah.com/str str name=path/links.html/str str name=urlFullhttp://www.ncoah.com/links.html/str str name=titleNorth Carolina Office of Administrative Hearings - Links/str arr name=text strNorth Carolina Office of Administrative Hearings - Links/str /arr str name=relatedLinks - a href=http://www.ncoah.com/links.html; title=HearingsHearings/a - a href=http://www.ncoah.com/links.html; title=RulesRules/a - a href=http://www.ncoah.com/links.html; title=Civil RightsCivil Rights/a - a href=http://www.ncoah.com/links.html; title=WelcomeWelcome/a - a href=http://www.ncoah.com/links.html; title=General InformationGeneral Information/a - a href=http://www.ncoah.com/links.html; title=Directions to OAHDirections to OAH/a - a href=http://www.ncoah.com/links.html; title=Establishment of OAHEstablishment of OAH/a - a href=http://www.ncoah.com/links.html; title=G.S. 150BG.S.
150B/a - a href=http://www.ncoah.com/links.html; title=FormsForms/a - a href=http://www.ncoah.com/links.html; title=LinksLinks/a - a href=http://www.nc.gov/; title=Visit the North Carolina State web portalVisit the North Carolina State web portal/a - a href=http://ncinfo.iog.unc.edu/library/counties.html; title=North Carolina CountiesNorth Carolina Counties/a - a href=http://ncinfo.iog.unc.edu/library/cities.html; title=North Carolina Cities TownsNorth Carolina Cities Towns/a - a href=http://www.nccourts.org/; title=Administrative Office of the CourtsAdministrative Office of the Courts/a - a href=http://www.ncleg.net/; title=North Carolina General AssemblyNorth Carolina General Assembly/a - a href=http://www.doa.state.nc.us/; title=Department of AdministrationDepartment of Administration/a - a href=http://www.ncagr.com/; title=Department of AgricultureDepartment of Agriculture/a - a href=http://www.nccommerce.com; title=Department of CommerceDepartment of Commerce/a - a href=http://www.doc.state.nc.us/; title=Department of CorrectionDepartment of Correction/a - a href=http://www.nccrimecontrol.org/; title=Department of Crime Control Public SafetyDepartment of Crime Control Public Safety/a - a href=http://www.ncdcr.gov/; title=Department of Cultural ResourcesDepartment of Cultural Resources/a - a href=http://www.ncdenr.gov/; title=Department of Environment and Natural ResourcesDepartment of Environment and Natural Resources/a - a href=http://www.dhhs.state.nc.us; title=Department of Health and Human ServicesDepartment of Health and Human Services/a - a href=http://www.ncdoi.com/; title=Department of InsuranceDepartment of Insurance/a - a href=http://www.ncdoj.com/; title=Department of JusticeDepartment of Justice/a - a href=http://www.juvjus.state.nc.us/; title=Department of Juvenile Justice and Delinquency PreventionDepartment of Juvenile Justice and Delinquency Prevention/a - a href=http://www.nclabor.com/; title=Department of LaborDepartment of Labor/a - a 
href=http://www.dpi.state.nc.us/; title=Department of Public InstructionDepartment of Public Instruction/a - a href=http://www.dor.state.nc.us/; title=Department of RevenueDepartment of Revenue/a - a href=http://www.treasurer.state.nc.us/; title=Department of State TreasurerDepartment of State Treasurer/a - a href=http://www.ncdot.org/; title=Department of TransportationDepartment of Transportation/a - a href=http://www.secstate.state.nc.us/; title=Department of the Secretary of StateDepartment of the Secretary of State/a - a href=http://www.osp.state.nc.us/; title=Office of State PersonnelOffice of State Personnel/a - a href=http://www.governor.state.nc.us/; title=Office of the GovernorOffice of the Governor/a - a href=http://www.ltgov.state.nc.us/; title=Office of the Lt. GovernorOffice of the Lt. Governor/a - a href=http://www.ncauditor.net/; title=Office of the State AuditorOffice of the State Auditor/a - a href=http://www.osc.nc.gov/; title=Office of the State ControllerOffice of the State Controller/a - a href=http://www.ncbar.org/; title=North Carolina Bar AssociationNorth Carolina Bar Association/a - a
Re: Confused by Solr Ranking
>> I kind of suspected stemming to be the reason behind this. But I consider
>> stemming to be a good feature.
>
> This is the side effect of stemming. Stemming increases recall while
> harming precision.

This is a side effect of stemming, the way it is currently implemented in Lucene. Stemming could theoretically increase recall without hurting precision or relevancy. One way to do this would be to always store the original token, along with the stemmed token. Then, at scoring time, give a boost to matches which are closer to the original form.

-- Avi
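The idea in the last paragraph can be illustrated with a toy scorer. Everything here is an assumption made for the sketch: `toy_stem` is a crude suffix stripper (not the Porter stemmer), and the boost values 2.0 and 1.0 are arbitrary. It only shows how keeping the original token lets exact matches outrank stem-only matches; it is not how Lucene scores documents.

```python
def toy_stem(token):
    """Crude suffix stripper for illustration only (not the Porter stemmer)."""
    for suffix in ("ion", "ing", "ed"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def score(query, text, exact_boost=2.0, stem_boost=1.0):
    """Sum per-token scores: exact matches count more than stem-only matches."""
    q = query.lower()
    q_stem = toy_stem(q)
    total = 0.0
    for token in text.lower().split():
        if token == q:
            total += exact_boost      # original form matches
        elif toy_stem(token) == q_stem:
            total += stem_boost       # only the stemmed form matches
    return total


docs = ["bare impression villainy", "eager to impress the court"]
ranked = sorted(docs, key=lambda d: score("impress", d), reverse=True)
print(ranked)
```

Both documents are still returned, so recall is unchanged, but the one containing the exact word sorts first.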
Re: Confused by Solr Ranking
On 09.03.2010 16:01 Ahmet Arslan wrote:
>> I kind of suspected stemming to be the reason behind this. But I consider
>> stemming to be a good feature.
>
> This is the side effect of stemming. Stemming increases recall while
> harming precision.

But most people want the best possible combination of both, something like:

(raw_field:word OR stemmed_field:word^0.5)

It is nice that Solr allows such arrangements, but it would be even nicer to have some sort of automatic "take this field, transform the contents in a couple of ways and do some boosting in the order given". At least this would be my wish for the recent question about the one feature I would like to see.

Or even better, allow not only a hierarchy of transformations but also a hierarchy of fields (like in dismax, but with the full power of the standard request handler).

-Michael
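A small helper can generate the query shape shown above for any search word. The function name `exact_first_query` and the default field names are taken from the example or invented for this sketch; they are not an existing Solr feature, and the sketch assumes the schema copies the same content into an unstemmed and a stemmed field.

```python
def exact_first_query(word, raw_field="raw_field",
                      stemmed_field="stemmed_field", stemmed_boost=0.5):
    """Build '(raw:word OR stemmed:word^boost)' for a single search word.

    Hypothetical helper: field names and boost are illustrative defaults.
    """
    return f"({raw_field}:{word} OR {stemmed_field}:{word}^{stemmed_boost})"


print(exact_first_query("impress"))
```

A document matching in the raw field also matches in the stemmed field, so exact matches collect both clause scores and should sort ahead of stem-only matches.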
Re: Confused by Solr Ranking
Well, that's a matter of opinion, isn't it? If *your* application requires this, you could always copy the field to a non-stemmed field and apply boosts...

Erick

On Tue, Mar 9, 2010 at 9:21 AM, abhishes abhis...@gmail.com wrote:
> I kind of suspected stemming to be the reason behind this. But I consider
> stemming to be a good feature.
>
> The point is that if an exact match exists, then solr should report that
> first and then stemmed results should be reported. disabling stemming
> altogether would be a step in the wrong direction.
Re: Confused by Solr Ranking
I kind of suspected stemming to be the reason behind this. But I consider stemming to be a good feature.

The point is that if an exact match exists, then Solr should report that first, and then the stemmed results should be reported. Disabling stemming altogether would be a step in the wrong direction.

Shalin Shekhar Mangar wrote:
> The text type is configured to do stemming on the input. So I'm guessing
> that impression and impress both stem to the same form. You can remove the
> EnglishPorterFilterFactory from the text type if you don't need stemming.

--
View this message in context: http://old.nabble.com/Confused-by-Solr-Ranking-tp27834227p27836299.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Confused by Solr Ranking
> I kind of suspected stemming to be the reason behind this. But I consider stemming to be a good feature.

This is the side effect of stemming. Stemming increases recall while harming precision.
Re: Confused by Solr Ranking
On Tue, Mar 9, 2010 at 4:38 PM, abhishes abhis...@gmail.com wrote:

> I am indexing a column in a database. I have chosen the field type text for this column (this type is defined in the sample schema file that comes with the Solr example).
>
> When I search for the word impress and look at the top 3 results, I get these 3 documents:
>
> <str name="TEXT">bare desire pronounce villainy draught beasts blockish impression acquit</str>
> <str name="TEXT">bare impression villainy pronounce beasts desire blockish draught acquit</str>
> <str name="TEXT">beasts desire villainy pronounce bare acquit impression draught blockish</str>
>
> But here the TEXT doesn't really contain the word impress; it contains the word impression.
>
> Now the database does contain a few rows where the word impress is present, but those rows do not come up in the top 3 results.
>
> So my question is: why did the rows containing the word impression get ranked higher than the rows containing the word impress when I searched for impress?

The text type is configured to do stemming on the input. So I'm guessing that impression and impress both stem to the same form. You can remove the EnglishPorterFilterFactory from the text type if you don't need stemming.

--
Regards,
Shalin Shekhar Mangar.
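To illustrate the point about impression and impress stemming to the same form, here is a toy sketch. The toy_stem function is a deliberately crude stand-in invented for this example; it is not the Porter algorithm and not what EnglishPorterFilterFactory actually runs. It only shows why a search for one surface form can match documents that contain the other once both are reduced to a shared stem.

```python
def toy_stem(token):
    """Crude illustrative stemmer (NOT Porter): strip a few common suffixes."""
    if token.endswith("ss"):
        # Leave words like "impress" alone instead of chopping the final "s".
        return token
    for suffix in ("ion", "ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def matches_stemmed(query, document):
    """True if the stemmed query term equals any stemmed document token."""
    stemmed_doc = {toy_stem(t) for t in document.lower().split()}
    return toy_stem(query.lower()) in stemmed_doc


def matches_exact(query, document):
    """True only if the literal query term appears in the document."""
    return query.lower() in document.lower().split()


if __name__ == "__main__":
    doc = "bare desire pronounce villainy draught beasts blockish impression acquit"
    print(matches_stemmed("impress", doc))  # stemmed index: match
    print(matches_exact("impress", doc))    # literal tokens: no match
```

Once both forms share a stem, the index cannot tell them apart, so ranking among the matching documents comes from the usual scoring factors and not from which surface form the document contained.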
Re: Question regarding Solr ranking
: I am not really clear to what the analysis mode is supposed to give me. It
: requires me to specify a field when I specify a query. What does that do?
: Also, I don't see anything in the analyzer to explain the weighting of a
: particular document.

I think what Otis meant is that the analysis tool would help you verify that your Analyzers are doing what you expect them to be doing. If you try that with your locRvwText and the text you are asking about, you would see that RemoveDuplicatesTokenFilterFactory does not make it the same as a single instance of Pizza ... per the docs...

    Filters out any tokens which are at the same logical position in the tokenstream as a previous token with the same text.

...
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-b05ef0377d71df53b47b9dd9cc28c26d95097a0b

So it isn't removing any tokens in your situation, because they do not exist in the same logical position.

-Hoss
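The behaviour described in the docs quote above can be sketched in a few lines. This is a simplified model and not Solr's code: tokens are represented as (text, position_increment) pairs, and a token is dropped only when an identical text has already been seen at the same position. The second function assumes the field value analyzes to 21 "pizza" tokens, one per position, and counts an exact two-word phrase at every position where it starts, which would be consistent with the phraseFreq=20.0 reported in this thread.

```python
def remove_duplicates(tokens):
    """Drop a token only if the same text was already seen at the same position.

    tokens: list of (text, position_increment) pairs. An increment of 0 means
    the token is stacked on the previous token's position (e.g. a synonym).
    """
    seen_at_position = set()
    kept = []
    for text, pos_inc in tokens:
        if pos_inc > 0:
            # Moved to a new position: earlier tokens no longer count.
            seen_at_position.clear()
        if text in seen_at_position:
            continue
        seen_at_position.add(text)
        kept.append((text, pos_inc))
    return kept


def phrase_freq(tokens, first, second):
    """Count positions where `first` is immediately followed by `second`."""
    positions = {}
    pos = -1
    for text, pos_inc in tokens:
        pos += pos_inc
        positions.setdefault(pos, set()).add(text)
    return sum(
        1
        for p, texts in positions.items()
        if first in texts and second in positions.get(p + 1, set())
    )


if __name__ == "__main__":
    repeated = [("pizza", 1)] * 21              # 21 tokens at 21 positions
    stacked = [("pizza", 1), ("pizza", 0)]      # same text, same position
    print(len(remove_duplicates(repeated)))     # nothing is removed
    print(remove_duplicates(stacked))           # the stacked copy is removed
    print(phrase_freq(remove_duplicates(repeated), "pizza", "pizza"))
```

In this model the filter only collapses copies stacked on one position, which stemming or synonym expansion can produce. Repetition across positions is left alone, so it still feeds into term and phrase frequency.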
Re: Question regarding Solr ranking
Otis Gospodnetic wrote:

> It's a little hard to read that message, but if I were you I'd go to the Solr admin page, analysis section, enter your query, and see what index and query time analyzers spit out. I think that should at least give you some hints.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

I am not really clear on what the analysis mode is supposed to give me. It requires me to specify a field when I specify a query. What does that do? Also, I don't see anything in the analyzer to explain the weighting of a particular document.

Regardless, what I have narrowed it down to is my locRvwText field (defined as a multivalued text field), which has a value that looks like this:

Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... .

Solr is counting this as 20 hits, but I was under the impression that the RemoveDuplicatesTokenFilterFactory should filter this result so that it counts as just 1 hit. Am I misunderstanding what RemoveDuplicatesTokenFilterFactory does?

--
View this message in context: http://www.nabble.com/Question-regarding-Solr-ranking-tp15719752p15768743.html
Re: Question regarding Solr ranking
It's a little hard to read that message, but if I were you I'd go to the Solr admin page, analysis section, enter your query, and see what index and query time analyzers spit out. I think that should at least give you some hints.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: oleg_gnatovskiy [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Wednesday, February 27, 2008 1:33:44 PM
Subject: Re: Question regarding Solr ranking

Sorry about the previous message, I had some formatting issues. Below is the actual message!

oleg_gnatovskiy wrote:

Hello everyone. I've run into a weird problem with Solr's ranking engine. In a nutshell, the problem involves certain results getting EXTREMELY high rank scores. Here is an example:

locRvwText:"Pizza Pizza"^10 OR locName:"Pizza Pizza"^30

The way I understand it is that the locName part of the query should be boosted 3x more than the locRvwText. However, when running this query the first result is:

    <float name="score">10.8226</float>
    <str name="locName">Johnnie's New York Pizzeria</str>
    <arr name="locRvwText">
      <str>Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza...</str>
    </arr>

    10.8226 = (MATCH) product of:
      21.6452 = (MATCH) sum of:
        21.6452 = weight(locRvwText:"pizza pizza"^10.0 in 3792465), product of:
          0.3354544 = queryWeight(locRvwText:"pizza pizza"^10.0), product of:
            10.0 = boost
            14.428232 = idf(locRvwText: pizza=8156 pizza=8156)
            0.0023249863 = queryNorm
          64.52502 = fieldWeight(locRvwText:"pizza pizza" in 3792465), product of:
            4.472136 = tf(phraseFreq=20.0)
            14.428232 = idf(locRvwText: pizza=8156 pizza=8156)
            1.0 = fieldNorm(field=locRvwText, doc=3792465)
      0.5 = coord(1/2)

How come the phrase frequency for rvwText comes back as 20?

The field rvwText is defined in the following way:

    <field name="locRvwText" type="text" index="false" stored="true" required="false" multiValued="true" omitNorms="true"/>

And my text fields are defined in the following way:

    <fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- in this example, we will only use synonyms at query time -->
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldtype>

Forgive me if I am wrong, but shouldn't the RemoveDuplicatesTokenFilterFactory have the string

Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza...

count as simply one Pizza?

I'd appreciate any help I can get! Thanks!

--
View this message in context: http://www.nabble.com/Question-regarding-Solr-ranking-tp15719752p15719834.html
Question regarding Solr ranking
Hello everyone. I've run into a weird problem with Solr's ranking engine. In a nutshell, the problem involves certain results getting EXTREMELY high rank scores. Here is an example:

locRvwText:"Pizza Pizza"^10 OR locName:"Pizza Pizza"^30

The way I understand it is that the locName part of the query should be boosted 3x more than the locRvwText. However, when running this query the first result is:

    10.8226
    Johnnie's New York Pizzeria
    Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza...

    10.8226 = (MATCH) product of:
      21.6452 = (MATCH) sum of:
        21.6452 = weight(locRvwText:"pizza pizza"^10.0 in 3792465), product of:
          0.3354544 = queryWeight(locRvwText:"pizza pizza"^10.0), product of:
            10.0 = boost
            14.428232 = idf(locRvwText: pizza=8156 pizza=8156)
            0.0023249863 = queryNorm
          64.52502 = fieldWeight(locRvwText:"pizza pizza" in 3792465), product of:
            4.472136 = tf(phraseFreq=20.0)
            14.428232 = idf(locRvwText: pizza=8156 pizza=8156)
            1.0 = fieldNorm(field=locRvwText, doc=3792465)
      0.5 = coord(1/2)

How come the phrase frequency for rvwText comes back as 20?

The field rvwText is defined in the following way:

And my text fields are defined in the following way:

Forgive me if I am wrong, but shouldn't the RemoveDuplicatesTokenFilterFactory have the string

Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza...

count as simply one Pizza?

I'd appreciate any help I can get! Thanks!

--
View this message in context: http://www.nabble.com/Question-regarding-Solr-ranking-tp15719752p15719752.html
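The numbers in the explain output above can be reproduced by hand, which helps show where the large score comes from. The sketch below simply multiplies the factors as they are listed in that output, with tf taken as the square root of the phrase frequency (which matches 4.472136 for phraseFreq=20.0). The function name and parameter names are made up for this example.

```python
import math


def classic_phrase_score(boost, idf, query_norm, phrase_freq, field_norm, coord):
    """Recombine the factors shown in the explain output for one matching clause."""
    tf = math.sqrt(phrase_freq)                # 4.472136 for phraseFreq=20.0
    query_weight = boost * idf * query_norm    # 0.3354544 in the explain
    field_weight = tf * idf * field_norm       # 64.52502 in the explain
    return coord * query_weight * field_weight


if __name__ == "__main__":
    score = classic_phrase_score(
        boost=10.0,
        idf=14.428232,
        query_norm=0.0023249863,
        phrase_freq=20.0,
        field_norm=1.0,   # the explain reports fieldNorm = 1.0 for this field
        coord=0.5,        # only one of the two query clauses matched
    )
    print(round(score, 4))
```

With phrase_freq set to 1.0 and everything else unchanged, the same function gives about 2.42, so the repeated phrase accounts for a factor of roughly 4.47 here. Note also that idf enters twice, once in the query weight and once in the field weight.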
Re: Question regarding Solr ranking
Sorry about the previous message, I had some formatting issues. Below is the actual message!

oleg_gnatovskiy wrote:

Hello everyone. I've run into a weird problem with Solr's ranking engine. In a nutshell, the problem involves certain results getting EXTREMELY high rank scores. Here is an example:

locRvwText:"Pizza Pizza"^10 OR locName:"Pizza Pizza"^30

The way I understand it is that the locName part of the query should be boosted 3x more than the locRvwText. However, when running this query the first result is:

    <float name="score">10.8226</float>
    <str name="locName">Johnnie's New York Pizzeria</str>
    <arr name="locRvwText">
      <str>Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza...</str>
    </arr>

    <lst name="explain">
      <str name="id=157789,internal_docid=3792465">
    10.8226 = (MATCH) product of:
      21.6452 = (MATCH) sum of:
        21.6452 = weight(locRvwText:"pizza pizza"^10.0 in 3792465), product of:
          0.3354544 = queryWeight(locRvwText:"pizza pizza"^10.0), product of:
            10.0 = boost
            14.428232 = idf(locRvwText: pizza=8156 pizza=8156)
            0.0023249863 = queryNorm
          64.52502 = fieldWeight(locRvwText:"pizza pizza" in 3792465), product of:
            4.472136 = tf(phraseFreq=20.0)
            14.428232 = idf(locRvwText: pizza=8156 pizza=8156)
            1.0 = fieldNorm(field=locRvwText, doc=3792465)
      0.5 = coord(1/2)
      </str>
    </lst>

How come the phrase frequency for rvwText comes back as 20?

The field rvwText is defined in the following way:

    <field name="locRvwText" type="text" index="false" stored="true" required="false" multiValued="true" omitNorms="true"/>

And my text fields are defined in the following way:

    <fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- in this example, we will only use synonyms at query time -->
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldtype>

Forgive me if I am wrong, but shouldn't the RemoveDuplicatesTokenFilterFactory have the string

Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza...

count as simply one Pizza?

I'd appreciate any help I can get! Thanks!

--
View this message in context: http://www.nabble.com/Question-regarding-Solr-ranking-tp15719752p15719834.html