StopWords behavior with phrases
Hi,
We query Solr as below:
q="market and cloud" OR (market and cloud)&q.op=AND&deftype=edismax
Our intent is to get results for both the phrase match and the AND query in a single request, with Solr itself taking care of relevancy. But because the phrase contains a stopword, a gap is left in the phrase query, which gives different results compared with the keyword "market cloud".
"parsedquery_toString":"+(+(content:\"market ? cloud\" | search_field:\"market ? cloud\"))",
There are suggestions to create a separate field with no stopword filtering for phrase queries, but then we would not be able to achieve both phrase and AND matching in a single request.
Is there any way the ? can be removed from the phrase, or any other suggestion for our requirement? Please suggest.
Regards
Ashish
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
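For reference, the request above can be assembled with properly encoded parameters. This is only a minimal sketch using the Python standard library; the helper name build_phrase_and_params is hypothetical, and the parser parameter is spelled defType here, as in the other requests in this archive.

```python
from urllib.parse import urlencode, parse_qs

def build_phrase_and_params(terms):
    """Combine an exact-phrase clause and an all-terms clause in a single q."""
    text = " ".join(terms)
    q = f'"{text}" OR ({text})'
    return {"q": q, "q.op": "AND", "defType": "edismax"}

params = build_phrase_and_params(["market", "and", "cloud"])
query_string = urlencode(params)  # percent-encodes quotes, spaces and parentheses
print(query_string)
```

The encoded string can be appended to the select handler URL; adding debug=true shows how the stopword position is carried into the parsed phrase.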
Spellcheck Collations Phrase based instead of AND
Hi,
Here is a sample set of collations returned during spellcheck:
"collation",{ "collationQuery":"smart connected factory", "hits":109, "misspellingsAndCorrections":[ "smart","smart", "connected","connected", "fator","factory"]},
"collation",{ "collationQuery":"smart connected faster", "hits":325, "misspellingsAndCorrections":[ "smart","smart", "connected","connected", "fator","faster"]},
"collation",{ "collationQuery":"sparc connected factory", "hits":14, "misspellingsAndCorrections":[ "smart","sparc", "connected","connected", "fator","factory"]},
The hits for each collationQuery are computed with AND between the keywords. Is it possible to get the collations sorted by phrase hits instead of AND hits?
Regards
Ashish
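One possible client-side approach, sketched here under assumptions: re-rank the returned collations by how many documents match each collation as a quoted phrase. The function names pair_up and rerank_by_phrase_hits are hypothetical, and count_phrase_hits stands for whatever the caller uses to obtain a phrase hit count (for example a rows=0 query whose numFound is read). The phrase counts in the stub are invented for illustration and are not real results.

```python
def pair_up(flat):
    """Turn a flat [key, value, key, value, ...] list into (key, value) pairs."""
    return list(zip(flat[0::2], flat[1::2]))

def rerank_by_phrase_hits(collations, count_phrase_hits):
    """Sort collations by the hit count of their quoted phrase, highest first."""
    return sorted(
        collations,
        key=lambda c: count_phrase_hits('"%s"' % c["collationQuery"]),
        reverse=True,
    )

# Collations copied from the response in the post.
collations = [
    {"collationQuery": "smart connected factory", "hits": 109,
     "misspellingsAndCorrections": ["smart", "smart", "connected", "connected", "fator", "factory"]},
    {"collationQuery": "smart connected faster", "hits": 325,
     "misspellingsAndCorrections": ["smart", "smart", "connected", "connected", "fator", "faster"]},
    {"collationQuery": "sparc connected factory", "hits": 14,
     "misspellingsAndCorrections": ["smart", "sparc", "connected", "connected", "fator", "factory"]},
]

# Stub standing in for a real phrase query; these counts are invented.
fake_phrase_hits = {
    '"smart connected factory"': 40,
    '"smart connected faster"': 2,
    '"sparc connected factory"': 0,
}

ranked = rerank_by_phrase_hits(collations, fake_phrase_hits.get)
corrections = pair_up(ranked[0]["misspellingsAndCorrections"])
print(ranked[0]["collationQuery"], corrections)
```

This costs one extra query per collation, so it is probably only practical when the number of collations requested is small.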
Stopwords param of edismax parser not working
Hi,
We are trying to keep stopwords in the query by disabling the stop filter through the edismax parser's stopwords parameter. The documentation says:
*stopwords: A Boolean parameter indicating if the StopFilterFactory configured in the query analyzer should be respected when parsing the query. If this is set to false, then the StopFilterFactory in the query analyzer is ignored.*
https://lucene.apache.org/solr/guide/7_3/the-extended-dismax-query-parser.html
But it does not seem to be working: "of" is still missing from the parsed query.
http://Box-1:8983/solr/SalesCentralDev_4/select?q=internet of things&rows=0&defType=edismax&qf=search_field content&stopwords=false&debug=true
"parsedquery":"+(DisjunctionMaxQuery((content:internet | search_field:internet)) DisjunctionMaxQuery((content:thing | search_field:thing)))",
"parsedquery_toString":"+((content:internet | search_field:internet) (content:thing | search_field:thing))",
Are we missing something here?
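A quick way to confirm which terms survived analysis is to pull them out of the debug output. A minimal sketch, assuming the parsedquery_toString format shown above; the helper name parsed_terms is hypothetical.

```python
import re

def parsed_terms(parsed_query, field):
    """Return the terms that the parsed query searches for in one field."""
    return re.findall(r"\b%s:(\w+)" % re.escape(field), parsed_query)

# parsedquery_toString copied from the debug output in the post.
parsed = "+((content:internet | search_field:internet) (content:thing | search_field:thing))"

terms = parsed_terms(parsed, "content")
print(terms)          # ['internet', 'thing']
print("of" in terms)  # False: the stopword was dropped despite stopwords=false
```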
Re: Spellchecker -File based vs Index based
The spellcheck configuration is the default one (values as posted; the XML tags were stripped):
solr.FileBasedSpellChecker file spellings.txt UTF-8 ./spellcheckerFile default jkdefault file on true 10 5 5 true 10 true 10 5
The words are also present in the file. For example, the word "things", which gets corrected, is in the file, and so are the suggestions offered for it.
I don't want suggestions for correctly spelled words (of, things). Is there any problem with the request? I tried two combinations:
1. /spell?spellcheck.q=intnet of things&spellcheck=true&spellcheck.collateParam.q.op=AND&df=spellcontent&spellcheck.dictionary=file
2. /spell?q=intnet of things&defType=edismax&qf=spellcontent&wt=json&rows=0&&spellcheck=true&spellcheck.dictionary=file&q.op=AND
Please suggest.
Behavior of Function Query
Please see the requests and responses below.
http://Sol:8983/solr/SCSpell/select?q="internet of things"&defType=edismax&qf=spellcontent&wt=json&rows=1&fl=score,internet_of_things:query({!edismax v='"internet of things"'}),instant_of_things:query({!edismax v='"instant of things"'})
The response contains the score from the function query:
"fl":"score,internet_of_things:query({!edismax v='\"internet of things\"'}),instant_of_things:query({!edismax v='\"instant of things\"'})", "rows":"1", "wt":"json"}},
"response":{"numFound":851,"start":0,"maxScore":7.6176834,"docs":[ { "score":7.6176834, "internet_of_things":7.6176834}] }}
But if q is changed in the same request, the function query score is not returned:
http://Sol-1:8983/solr/SCSpell/select?q="wall street"&defType=edismax&qf=spellcontent&wt=json&rows=1&fl=score,internet_of_things:query({!edismax v='"internet of things"'}),instant_of_things:query({!edismax v='"instant of things"'})
"q":"\"wall street\"", "defType":"edismax", "qf":"spellcontent", "fl":"score,internet_of_things:query({!edismax v='\"internet of things\"'}),instant_of_things:query({!edismax v='\"instant of things\"'})", "rows":"1", "wt":"json"}},
"response":{"numFound":46,"start":0,"maxScore":15.670144,"docs":[ { "score":15.670144}] }}
Why does the function query result depend on q when the function queries themselves are unchanged?
Different behavior when using function queries
Can someone please explain the behavior below? The function query response differs for different q parameters, although the function queries are the same.
http://:8983/solr/SCSpell/select?q="market place"&defType=edismax&qf=spellcontent&wt=json&rows=1&fl=internet_of_things:if(exists(query({!edismax v='"internet of things"'})),true,false),instant_of_things:if(exists(query({!edismax v='"instant of things"'})),true,false)
The response contains the function query results:
"response":{"numFound":80,"start":0,"docs":[ { "internet_of_things":false, "instant_of_things":false}] }}
whereas for a different q:
http://:8983/solr/SCSpell/select?q="intent of things"&defType=edismax&qf=spellcontent&wt=json&rows=1&fl=internet_of_things:if(exists(query({!edismax v='"internet of things"'})),true,false),instant_of_things:if(exists(query({!edismax v='"instant of things"'})),true,false)
The response does not contain the function query results:
"response":{"numFound":0,"start":0,"docs":[] }}
From the results it looks like the function queries are not evaluated when q itself matches no documents.
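If the goal is to learn whether a phrase matches anything regardless of the main q, one option (a sketch, not necessarily the intended use of function queries) is to send each phrase as its own query with rows=0 and read numFound. The helper names probe_params and phrase_exists are hypothetical; the two sample responses are the ones quoted in the post.

```python
from urllib.parse import urlencode

def probe_params(phrase, qf="spellcontent"):
    """Parameters for a rows=0 query that only reports how many docs match the phrase."""
    return {"q": f'"{phrase}"', "defType": "edismax", "qf": qf, "rows": 0, "wt": "json"}

def phrase_exists(response_json):
    """True when the parsed JSON response reports at least one matching document."""
    return response_json["response"]["numFound"] > 0

probe = urlencode(probe_params("internet of things"))
print(probe)

# Response bodies copied from the post.
empty = {"response": {"numFound": 0, "start": 0, "docs": []}}
found = {"response": {"numFound": 80, "start": 0, "docs": []}}
print(phrase_exists(empty), phrase_exists(found))
```

Because the pseudo-fields in fl are attached to returned documents, an empty docs list leaves nowhere for them to appear, which matches the second response above.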
Spellchecker -File based vs Index based
Hi,
I am seeing a difference between the file-based and index-based spellcheck implementations.
Using index based:
http://:8983/solr/SCSpell/spell?q=intnet of things&defType=edismax&qf=spellcontent&wt=json&rows=0&spellcheck=true&spellcheck.dictionary=default&q.op=AND
"suggestions":[ "intnet",{ "numFound":10, "startOffset":0, "endOffset":6, "origFreq
Suggestions are built only for the misspelled word.
But when using file based, they are built for correctly spelled words too, which messes up the collations:
http://:8983/solr/SCSpell/spell?q=intnet%20of%20things&defType=edismax&qf=spellcontent&wt=json&rows=0&&spellcheck=true&spellcheck.dictionary=file&q.op=AND
"suggestion":["internet", "contnet", "intel", "intent", "intert", "intelect", "intended", "intented", "interest", "botnets"]},
"of",{ "numFound":8, "startOffset":7, "endOffset":9, "suggestion":["ofc", "off", "ohf", . "soft"]},
"things",{ "numFound":10, "startOffset":10, "endOffset":16, "suggestion":["thing", "brings", "think", "thinkers", .
Is there any property in the file-based spellchecker that I can use to fix this?
Re: Relevancy Score Calculation
Thanks. I agree.
Regards
Ashish
Relevancy Score Calculation
Hi,
Currently the score is calculated based on "Max Doc" instead of "Num Docs". Is it possible to change it to "Num Docs" (i.e. without deleted docs)? Will it require a code change or just a config change?
Regards
Ashish
Re: Solr relevancy score different on replicated nodes
Thanks Erick and everyone. We are checking on the stats cache.
I noticed the stats skew again and optimized the index to correct it. Having read
https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/
and
https://lucidworks.com/2018/06/20/solr-and-optimizing-your-index-take-ii/
I wanted to check on the points below, given that we want the stats skew corrected.
1. Once optimized, the single segment won't easily be picked up by natural merging. As we might be running a manual optimize every time, I expect that at some point in future we will have a single large segment. What impact is this large segment going to have? Our index is ~30k documents, i.e. files with content (segment size <1 GB as of now).
2. Do you recommend running optimize in these situations? It would probably be done only when the stats skew. Is it safe?
Regards
Ashish
Re: Solr relevancy score different on replicated nodes
Hi Erick,
Our business wanted the score not to be based entirely on the default relevancy algorithm, but on a mix of Solr relevancy and user metrics (80% + 20%). Each result doc's score is computed relative to the max score as a fraction of 80; the remaining 20 comes from user metrics. Finally, the sort happens on the new score.
But say we got the first page correctly, and the request for the second page goes to another replica where the max score is different. The UI may then show a wrong sort order compared with the first page. For example, the last value on page 1 is 70 and the first value on page 2 can be 72, i.e. distorted sorting. On top of that, we are not using pagination but an infinite scroll, which makes it more noticeable.
Please suggest.
Regards
Ashish
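The blend described above can be sketched as follows. This assumes the user metric is already scaled to the range 0 to 1, which the post does not state; the function name blended_score and all numbers are invented for illustration.

```python
def blended_score(solr_score, max_score, user_metric):
    """80 points from relevancy relative to maxScore, 20 points from the user metric."""
    return 80.0 * solr_score / max_score + 20.0 * user_metric

# Same document, same raw score and user metric, but each replica reports
# a different maxScore for the query (values invented).
on_replica_a = blended_score(6.0, 7.5, 0.5)
on_replica_b = blended_score(6.0, 8.0, 0.5)
print(on_replica_a, on_replica_b)
```

Since maxScore is the denominator, any difference in it between replicas shifts every blended value on that page, which is how a later page can start above the value where the previous page ended.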
Re: Solr relevancy score different on replicated nodes
Hi Erick,
To test this scenario I added the replica again, and for a few days I have been monitoring metrics like Num Docs, Max Doc and Deleted Docs in the *Overview* section of each core. I checked the *Segments Info* section too. Everything looks in sync.
http://:8983/solr/#/MyTestCollection_*shard1_replica_n7*/
http://:8983/solr/#/MyTestCollection_*4_shard1_replica_n7*/
If they go out of sync in future, I just wanted to confirm whether that would be a bug, although you mentioned:
*bq. Shouldn't both replica and leader come to same state after this much long period.
No. After that long, the docs will be the same, all the docs present on one replica will be present and searchable on the other. However, they will be in different segments so the "stats skew" will remain.*
We need these scores, so as a temporary solution we would monitor these metrics for any issues and take action (either optimize, or delete and re-add the replica) accordingly. Does that make sense?
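The monitoring idea can be sketched as a simple comparison of the per-core counters. How the counters are fetched is left out; the dictionaries below are assumed to have been filled from each core's Overview figures, the function name stats_skew is hypothetical, and the numbers are invented.

```python
def stats_skew(leader, follower, keys=("numDocs", "maxDoc", "deletedDocs")):
    """Return {metric: (leader_value, follower_value)} for every metric that differs."""
    return {k: (leader[k], follower[k]) for k in keys if leader[k] != follower[k]}

# Invented figures: same live documents, but the follower still holds deleted ones.
leader = {"numDocs": 30000, "maxDoc": 30000, "deletedDocs": 0}
follower = {"numDocs": 30000, "maxDoc": 30450, "deletedDocs": 450}

skew = stats_skew(leader, follower)
print(skew)
```

An empty result means the counters agree; a non-empty one with equal numDocs suggests the replicas hold the same documents but different amounts of deleted data.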
Re: Solr relevancy score different on replicated nodes
Hi Erick,
Your statement "*At best, I've seen UIs where they display, say, 1 to 5 stars that are just showing the percentile that the particular doc had _relative to the max score*" describes what we are trying to achieve, except that we deal in percentages rather than stars (ratings). The change in maxScore per node is messing it up.
I was wondering whether it is possible to make all requests for one term go through a single replica, i.e. tell the client which replica served the first request, and then send the subsequent paginated requests to that replica until the keyword changes. Do you think this is possible, or a good idea? If yes, is there a way in Solr to know which replica served a request?
Regards
Ashish
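One client-side way to get this stickiness, sketched under assumptions: derive the replica from the keyword itself, so every page for the same keyword lands on the same core without the client having to remember anything. The replica URLs and the function name pick_replica are hypothetical. For a single-shard collection, adding distrib=false to a request sent directly to a core should keep the query on that core.

```python
import hashlib

def pick_replica(keyword, replica_urls):
    """Map a keyword to one replica deterministically, so all its pages go to the same core."""
    normalized = keyword.strip().lower()
    digest = hashlib.sha256(normalized.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(replica_urls)
    return replica_urls[index]

# Hypothetical core URLs for the two replicas of a single-shard collection.
replicas = [
    "http://solr-a:8983/solr/MyTestCollection_shard1_replica_n1",
    "http://solr-b:8983/solr/MyTestCollection_shard1_replica_n7",
]

first_page = pick_replica("internet of things", replicas)
second_page = pick_replica("Internet of Things ", replicas)
print(first_page == second_page)
```

The trade-off is that a replica going down changes the mapping for some keywords, and load is spread by keyword rather than by request.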
Re: Solr relevancy score different on replicated nodes
Thank you Erick for explaining.
In my scenario, I stopped indexing and updates too, and waited for a day. I restarted Solr as well. Shouldn't both the replica and the leader come to the same state after such a long period?
As you said this gets corrected by segment merging, I hope it is an internal process and no manual activity is required.
For us the score matters, as we use it to display some scenarios on search, and it gave changing values. As of now we depend on a single shard replica, but in future we might need more replicas.
Will scheduling indexing and updates outside peak query hours help?
I have tried the exact stats cache while debugging score differences during sharding. It didn't help much. Anyhow, that's a different topic.
Thanks again,
Regards
Ashish Bisht
Re: Solr relevancy score different on replicated nodes
Hi Erick,
Thank you for the details, but it doesn't look like a time difference in autocommit caused this issue. As I said, if I run a match-all query or a keyword query on both servers, they return the correct number of docs; it is just the relevancy score that takes different values.
I waited for a brief period and the discrepancy was still there (with no indexing going on either). So I went ahead and deleted the follower node (thinking the leader replica should be in the correct state). After adding the new replica again, the issue no longer appears.
We will monitor whether it appears again in future.
Regards
Ashish
Re: Solr relevancy score different on replicated nodes
Hi Erick,
I have already mentioned that I am not facing this problem in a new collection.
As per 3), I can try deleting a replica and adding it again, but the confusion is which of the two I should delete (I am wondering which replica is giving the correct score for the query).
Both replicas return the same number of docs for a match-all query. It is strange that docCount and docFreq differ in the query explain output.
Regards
Ashish
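To see why differing docCount and docFreq change the score even when both replicas return the same documents, here is a small sketch of the idf term as I understand Lucene's BM25Similarity to compute it. It assumes BM25 is the similarity in use; the function name bm25_idf and all numbers are invented for illustration.

```python
import math

def bm25_idf(doc_freq, doc_count):
    """idf = ln(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

# Invented statistics: the second replica still counts deleted-but-unmerged docs.
idf_clean = bm25_idf(doc_freq=100, doc_count=30000)
idf_skewed = bm25_idf(doc_freq=120, doc_count=30450)
print(idf_clean, idf_skewed)
```

Deleted documents keep contributing to these statistics until their segments are merged, so two replicas with identical live documents can still produce different idf values and therefore different scores.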