RE: Slow Solr 8 response for long query
query takes so long to return even 10 rows but gets faster when you move the clause to a filter query, but my intuition is that there’s something else going on as well to account for the difference when you return 300 rows.

Best,
Erick

> On Sep 29, 2020, at 8:52 PM, Alexandre Rafalovitch wrote:
>
> What do the debug versions of the query show between two versions?
>
> One thing that changed is sow (split on whitespace) parameter among
> many. It is unlikely to be the cause, but I am mentioning just in
> case.
> https://lucene.apache.org/solr/guide/8_6/the-standard-query-parser.html#standard-query-parser-parameters
>
> Regards,
> Alex
>
> On Tue, 29 Sep 2020 at 20:47, Permakoff, Vadim wrote:
>>
>> Hi Solr Experts!
>> We are moving from Solr 6.5.1 to Solr 8.5.0 and having a problem with long
>> query, which has a search text plus many OR and AND conditions (all in one
>> place, the query is about 20KB long).
>> For the same set of data (about 500K docs) and the same schema the query in
>> Solr 6 return results in less than 2 sec, Solr 8 takes more than 10 sec to
>> get 10 results. If I increase the number of rows to 300, in Solr 6 it takes
>> about 10 sec, in Solr 8 it takes more than 1 min. The results are small,
>> just IDs. It looks like the relevancy scoring plays role, because if I move
>> this query to filter query - both Solr versions work pretty fast.
>> The right way should be to change the query, but unfortunately it is
>> difficult to modify the application which creates these queries, so I want
>> to find some temporary workaround.
>>
>> What was changed from Solr 6 to Solr 8 in terms of scoring with many
>> conditions, which affects the search speed negatively?
>> Is there anything to configure in Solr 8 to get the same performance for
>> such query like it was in Solr 6?
>>
>> Thank you,
>> Vadim
>>
>> This email is intended solely for the recipient. It may contain privileged,
>> proprietary or confidential information or material. If you are not the
>> intended recipient, please delete this email and any attachments and notify
>> the sender of the error.
Slow Solr 8 response for long query
Hi Solr Experts!

We are moving from Solr 6.5.1 to Solr 8.5.0 and are having a problem with a long query, which has a search text plus many OR and AND conditions (all in one place; the query is about 20KB long).

For the same set of data (about 500K docs) and the same schema, the query in Solr 6 returns results in less than 2 sec, while Solr 8 takes more than 10 sec to get 10 results. If I increase the number of rows to 300, Solr 6 takes about 10 sec and Solr 8 takes more than 1 min. The results are small, just IDs. It looks like the relevancy scoring plays a role, because if I move this query to a filter query, both Solr versions work pretty fast.

The right way would be to change the query, but unfortunately it is difficult to modify the application which creates these queries, so I want to find some temporary workaround.

What was changed from Solr 6 to Solr 8 in terms of scoring with many conditions that affects the search speed negatively?
Is there anything to configure in Solr 8 to get the same performance for such a query as in Solr 6?

Thank you,
Vadim
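The filter-query observation in this message can be tried without changing the long clause itself. Below is a minimal sketch, not a recommendation from the thread: the endpoint URL, the core name "mycore" and the helper name build_filter_request are assumptions made up for illustration, and q=*:* is only a stand-in main query. The parameters q, fq, fl and rows are standard Solr request parameters. Keep in mind that a clause sent as fq does not contribute to the score, so the relevancy ordering is lost.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; replace host and core name with your own.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycore/select"


def build_filter_request(long_query: str, rows: int = 10) -> str:
    """Build a select URL that sends the long boolean clause as a filter query.

    Filter queries restrict the result set without contributing to the score,
    so the documents come back unranked with respect to this clause.
    """
    params = {
        "q": "*:*",        # match-all main query (assumption: ranking is not needed)
        "fq": long_query,  # the long OR/AND clause, not scored
        "fl": "id",        # the original post says only IDs are returned
        "rows": rows,
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Tiny placeholder clause; the real one in the thread is about 20KB.
    print(build_filter_request("(title:foo OR title:bar) AND type:doc", rows=300))
```

For a clause as long as 20KB, the same parameters would more likely be sent in a POST body than in the URL.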
RE: Query in quotes cannot find results
Thank you Walter,
I'll look into “mm” (minimum match) parameter.

Best Regards,
Vadim Permakoff

-----Original Message-----
From: Walter Underwood
Sent: Tuesday, June 30, 2020 2:31 PM
To: solr-user@lucene.apache.org
Subject: Re: Query in quotes cannot find results

This is exactly why the “mm” (minimum match) parameter exists, to reduce the number of hits with fewer matches. Think of it as a sliding scale between OR and AND.

On the other hand, I don’t usually worry about hits with fewer matches. Those are not on the first page, so I don’t care. In general, you can either optimize more related hits or optimize fewer unrelated hits. Everything you do to reduce the unrelated hits will cause some related hits to not match.

Also, do all of your tuning with real user queries from logs. Making up queries for testing will lead to fixing problems that never occur in production and to missing problems that do occur.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
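To make the "sliding scale between OR and AND" concrete, here is a small sketch. It assumes the eDisMax query parser (mm is a DisMax/eDisMax parameter) and a positive percentage value, which Solr documents as being rounded down. The helper names and the 75% value are illustrative assumptions, not settings anyone in the thread proposed; the field name _text_ is borrowed from the thread.

```python
from urllib.parse import urlencode


def min_should_match(optional_clauses: int, percent: int) -> int:
    """Clauses required by a positive percentage mm value, rounded down."""
    return (optional_clauses * percent) // 100


def build_mm_query_string(user_query: str, mm: str = "75%") -> str:
    """URL-encoded parameters for an eDisMax request with a minimum-match rule."""
    params = {
        "defType": "edismax",  # mm is honored by the DisMax/eDisMax parsers
        "q": user_query,
        "qf": "_text_",        # field name borrowed from the thread
        "mm": mm,
    }
    return urlencode(params)


if __name__ == "__main__":
    print(build_mm_query_string("expand the methods for mailing cancellation"))
    print(min_should_match(6, 75))  # clauses required out of six at 75%
```

With mm=100% every optional clause must match (AND-like behavior); lower values move toward OR.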
RE: Query in quotes cannot find results
Hi Walter,
I'm with you, sometimes the stopwords are very important. A few years back I did, just for fun, a Solr demo for Wikipedia search, you can see - nothing is removed:
http://www.softcorporation.com/lab/solr/wiki/?sq=to+be+or+not+to+be

But with the enterprise search, sometimes you will be better off removing the stopwords, I replied to Erick why.

My question is not "Should we remove the stopwords?", my question is: "Apparently the synonyms with spaces are not working if we are removing the stopwords. Is there a way to fix it or is there a jira for it?"

Best Regards,
Vadim Permakoff

-----Original Message-----
From: Walter Underwood
Sent: Tuesday, June 30, 2020 12:50 PM
To: solr-user@lucene.apache.org
Subject: Re: Query in quotes cannot find results

Removing stopwords is a dumb requirement. “Doctor, it hurts when I shove hedgehogs up my arse.”

Part of our job as search engineers is to solve the real problem, not implement a pile of requirements from people who don’t understand how search works.

Here is an article I wrote 13 years ago about why we didn’t remove stopwords at Netflix.
https://observer.wunderwood.org/2007/05/31/do-all-stopword-queries-matter/

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
RE: Query in quotes cannot find results
Hi Erick,
Thank you for the suggestion, I should have added it. Actually, before asking this question here, I tried adding and removing the FlattenGraphFilterFactory, plus other variations, like expand / not expand, autoGeneratePhraseQueries / not autoGeneratePhraseQueries - it just does not work with this particular example. You can try it yourself.

Regarding removing the stopwords, I agree, there are many cases when you don't want to remove the stopwords, but there is one very compelling case when you want them to be removed.

Imagine you have one document with the following text:
1. "to expand the methods for mailing cancellation"
And another document with the text:
2. "to expand methods for mailing cancellation"

The user query is (without quotes): q=expand the methods for mailing cancellation
I don't want to bring all the documents with condition q.op=OR, it will find too many unrelated documents, so I want to search with q.op=AND. Unfortunately, document 2 will not be found as it has no stop word "the" in it.
What should I do now?

Best Regards,
Vadim Permakoff

-----Original Message-----
From: Erick Erickson
Sent: Tuesday, June 30, 2020 12:15 PM
To: solr-user@lucene.apache.org
Subject: Re: Query in quotes cannot find results

Well, the first thing is that you haven’t included FlattenGraphFilterFactory in the index analysis chain, see:
https://lucene.apache.org/solr/guide/7_5/filter-descriptions.html#synonym-graph-filter
IDK whether that actually pertains, but I’d reindex with that included before pursuing.

Second, “I have a requirement to remove the stopwords”. Why? Who thinks it’s necessary? Is there any evidence for this or any use-case that shows it _is_ necessary? Removing stopwords became common in the long-ago days when memory and disk capacity were vastly more constrained than now. At this point, I require proof that it’s _necessary_ to remove them before accepting this kind of requirement.

There are situations where removing stopwords is worth the difficulty it causes. But I’ve seen far too many unnecessary requirements to let that one pass without pushing back ;).

And you can hack around this by adding slop to the phrase, perhaps you can get “good enough” results by adding one slop for every stopword, i.e. if the input is “expand the methods”, detect that there’s one stopword and change it to “expand the methods”~1. That’ll introduce other problems of course.

Best,
Erick
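Erick's slop workaround in the message above (one unit of slop per stopword in the phrase) could be applied in the front-end application before the query is sent to Solr. The following is only a rough sketch under stated assumptions: the function name add_slop_for_stopwords is hypothetical, the stopword set mirrors the one-line stopwords.txt from the thread, tokenization is plain whitespace splitting, and quotes or other special characters inside the phrase are not escaped.

```python
# Mirrors the one-line stopwords.txt described in the thread.
STOPWORDS = {"the"}


def add_slop_for_stopwords(phrase: str, stopwords=STOPWORDS) -> str:
    """Return a quoted phrase query with one unit of slop per stopword found.

    Whitespace tokenization only; the phrase is assumed to contain no quotes.
    """
    tokens = phrase.lower().split()
    slop = sum(1 for token in tokens if token in stopwords)
    quoted = '"' + phrase + '"'
    # Only append ~N when at least one stopword was detected.
    return f"{quoted}~{slop}" if slop else quoted


if __name__ == "__main__":
    print(add_slop_for_stopwords("expand the methods"))
    print(add_slop_for_stopwords("expand methods"))
```

As Erick notes, slop loosens the phrase match, so a phrase with an unrelated word in the stopword's place would match as well.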
RE: Query in quotes cannot find results
Hi Erick,
That's what I did in the past, but this is an enterprise search and I have a requirement to remove the stopwords.
To have both features I can add synonyms in the front-end application, I know it will work, but I need a justification for why I have to do it in the application, as it is an additional effort.
I thought there is a bug for such a case to which I can refer, because according to the documentation it should work, right?
Anyway, there is more to it. If I add the same synonym processing to the indexing part, i.e. the configuration will be like this:

[field type XML not preserved in the archive; only attribute fragments survive: positionIncrementGap="100", autoGeneratePhraseQueries="true", ignoreCase="true", expand="true", words="stopwords.txt"]

The analysis shows the parsing is matching now for the indexing and querying paths, but the exact match result still cannot be found! This is weird.
Any thoughts?

Best Regards,
Vadim Permakoff

-----Original Message-----
From: Erick Erickson
Sent: Monday, June 29, 2020 10:19 PM
To: solr-user@lucene.apache.org
Subject: Re: Query in quotes cannot find results

Looks like you’re removing stopwords. Stopwords cause issues like this with the positions being off.

It’s becoming more and more common to _NOT_ remove stopwords, is that an option?

Best,
Erick
RE: Query in quotes cannot find results
Hi Shawn,
Many thanks for the response, I checked the field and it is correct. Let's call it _text_ to make it easier.
I believe the parsing is also correct, please see below:

- Query without quotes (works):
  "querystring":"expand the methods",
  "parsedquery":"(PhraseQuery(_text_:\"blow up\") _text_:expand) _text_:methods",

- Query with quotes (does not work):
  "querystring":"\"expand the methods\"",
  "parsedquery":"SpanNearQuery(spanNear([spanOr([spanNear([_text_:blow, _text_:up], 0, true), _text_:expand]), _text_:methods], 0, true))",

The document has text:
"to expand the methods for mailing cancellation"

The analysis on this field shows that all words are present in the index and the query, and the order is also correct, but the word "methods" is moved one position; I guess that's why the result is not found.

Best Regards,
Vadim Permakoff

-----Original Message-----
From: Shawn Heisey
Sent: Monday, June 29, 2020 6:28 PM
To: solr-user@lucene.apache.org
Subject: Re: Query in quotes cannot find results

On 6/29/2020 3:34 PM, Permakoff, Vadim wrote:
> The basic query q=expand the methods <<< finds the document,
> the query (in quotes) q="expand the methods" <<< cannot find the document
>
> Am I doing something wrong, or is it known bug (I saw similar issues
> discussed in the past, but not for exact match query) and if yes - what is
> the Jira for it?

The most helpful information will come from running both queries with debug enabled, so you can see how the query is parsed. If you add a parameter "debugQuery=true" to the URL, then the response should include the parsed query. Compare those, and see if you can tell what the differences are.

One of the most common problems for queries like this is that you're not searching the field that you THINK you're searching. I don't know whether this is the problem, I just mention it because it is a common error.

Thanks,
Shawn
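The "moved one position" observation in this message can be illustrated with a toy model. This is not Lucene code; it is a simplified sketch that assumes the index-time stop filter leaves a position gap where "the" was removed, and that the quoted query asks for "expand" and "methods" to be adjacent, as the slop value 0 in the parsed SpanNearQuery suggests. The synonym branch ("blow up") is left out, and the names index_positions and ordered_near are hypothetical.

```python
STOPWORDS = {"the"}  # mirrors the one-line stopwords.txt from the thread


def index_positions(text, stopwords=STOPWORDS):
    """Map each kept token to its positions; removed stopwords leave a gap."""
    positions = {}
    for pos, token in enumerate(text.lower().split()):
        if token not in stopwords:
            positions.setdefault(token, []).append(pos)
    return positions


def ordered_near(positions, terms, slop=0):
    """True if the terms occur in order with at most `slop` skipped positions."""
    if not terms:
        return True

    def search(i, prev, first):
        if i == len(terms):
            # Width of the match minus the positions the terms themselves need.
            return (prev - first) - (len(terms) - 1) <= slop
        return any(
            search(i + 1, p, p if first is None else first)
            for p in positions.get(terms[i], [])
            if prev is None or p > prev
        )

    return search(0, None, None)


if __name__ == "__main__":
    doc = index_positions("to expand the methods for mailing cancellation")
    print(doc)
    print(ordered_near(doc, ["expand", "methods"], slop=0))
    print(ordered_near(doc, ["expand", "methods"], slop=1))
```

In this model "expand" sits at position 1 and "methods" at position 3, so an adjacency requirement fails while a slop of 1 succeeds, which is consistent with the symptom reported above.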
Query in quotes cannot find results
Hi,
This might be a known issue, but I cannot find a reference for this specific case - searching for an exact query with synonyms and stopwords.

I have a simple configuration for catch-all field:

[field type XML not preserved in the archive]

The synonyms.txt file has only one line:
expand,blow up

The stopwords.txt file has only one line:
the

There is only one document:
{
  "id":"1",
  "title":"to expand the methods for mailing cancellation"
}

Everything else is default basic configuration. Tested with Solr 6.5.1 and Solr 8.5.2.

The basic query q=expand the methods <<< finds the document,
the query (in quotes) q="expand the methods" <<< cannot find the document

Am I doing something wrong, or is it a known bug (I saw similar issues discussed in the past, but not for exact match query) and if yes - what is the Jira for it?

Best Regards,
Vadim Permakoff