RE: Slow Solr 8 response for long query

2020-10-05 Thread Permakoff, Vadim
query takes so long to return even 10 rows but gets faster when you move 
the clause to a filter query, but my intuition is that there’s something else 
going on as well to account for the difference when you return 300 rows.

Best,
Erick

> On Sep 29, 2020, at 8:52 PM, Alexandre Rafalovitch  wrote:
>
> What do the debug versions of the query show between two versions?
>
> One thing that changed, among many others, is the sow (split on whitespace)
> parameter. It is unlikely to be the cause, but I am mentioning it just in
> case.
> https://lucene.apache.org/solr/guide/8_6/the-standard-query-parser.html#standard-query-parser-parameters
>
> Regards,
>   Alex
>
> On Tue, 29 Sep 2020 at 20:47, Permakoff, Vadim 
>  wrote:
>>
>> Hi Solr Experts!
>> We are moving from Solr 6.5.1 to Solr 8.5.0 and have a problem with a long
>> query, which has a search text plus many OR and AND conditions (all in one
>> place; the query is about 20KB long).
>> For the same set of data (about 500K docs) and the same schema, the query in
>> Solr 6 returns results in less than 2 sec, while Solr 8 takes more than 10
>> sec to get 10 results. If I increase the number of rows to 300, Solr 6 takes
>> about 10 sec and Solr 8 takes more than 1 min. The results are small,
>> just IDs. It looks like relevancy scoring plays a role, because if I move
>> this query to a filter query, both Solr versions are pretty fast.
>> The right fix would be to change the query, but unfortunately the
>> application that creates these queries is difficult to modify, so I want
>> to find a temporary workaround.
>>
>> What changed from Solr 6 to Solr 8 in the scoring of queries with many
>> conditions that would hurt search speed?
>> Is there anything to configure in Solr 8 to get the same performance for
>> such a query as in Solr 6?
>>
>> Thank you,
>> Vadim
>>
>> 
>>
>> This email is intended solely for the recipient. It may contain privileged, 
>> proprietary or confidential information or material. If you are not the 
>> intended recipient, please delete this email and any attachments and notify 
>> the sender of the error.
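Alexandre's two suggestions can be combined into one comparison: run the same query with debugQuery=true once with sow=true (the Solr 6 default) and once with sow=false (the default since Solr 7), then compare the parsedquery output. A minimal sketch, assuming a hypothetical core URL http://localhost:8983/solr/mycore/select; it only builds the request URLs and does not send them.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; replace host, port and core name with the real ones.
SOLR_SELECT = "http://localhost:8983/solr/mycore/select"


def debug_url(query: str, sow: bool, rows: int = 10) -> str:
    """Build a /select URL that asks Solr to report how `query` was parsed."""
    params = {
        "q": query,
        "rows": rows,
        "fl": "id",
        "debugQuery": "true",  # adds the parsed query and score explanations
        "sow": "true" if sow else "false",  # split on whitespace before analysis
    }
    return SOLR_SELECT + "?" + urlencode(params)


if __name__ == "__main__":
    q = "expand the methods"
    # sow=true mimics the Solr 6 default; sow=false is the Solr 7+ default.
    for sow in (True, False):
        print(debug_url(q, sow))
```

If the two parsed queries differ noticeably for the long production query, that difference is a candidate explanation worth timing separately.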



Slow Solr 8 response for long query

2020-09-29 Thread Permakoff, Vadim
Hi Solr Experts!
We are moving from Solr 6.5.1 to Solr 8.5.0 and have a problem with a long
query, which has a search text plus many OR and AND conditions (all in one
place; the query is about 20KB long).
For the same set of data (about 500K docs) and the same schema, the query in
Solr 6 returns results in less than 2 sec, while Solr 8 takes more than 10 sec
to get 10 results. If I increase the number of rows to 300, Solr 6 takes about
10 sec and Solr 8 takes more than 1 min. The results are small, just IDs. It
looks like relevancy scoring plays a role, because if I move this query to a
filter query, both Solr versions are pretty fast.
The right fix would be to change the query, but unfortunately the application
that creates these queries is difficult to modify, so I want to find a
temporary workaround.

What changed from Solr 6 to Solr 8 in the scoring of queries with many
conditions that would hurt search speed?
Is there anything to configure in Solr 8 to get the same performance for such
a query as in Solr 6?

Thank you,
Vadim
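Since both versions are fast when the expression goes into a filter query, one stop-gap is to send the free text as q and the long boolean expression as fq. A minimal sketch, assuming the application can hand over the two parts separately (the split_scoring helper and the field names type and region are hypothetical). Because fq does not contribute to the score, the boolean conditions no longer influence ranking, which may or may not be acceptable.

```python
from urllib.parse import urlencode


def split_scoring(search_text: str, conditions: str, rows: int = 10) -> dict:
    """Score only on the free text and apply the boolean conditions as a filter.

    Assumes the caller can separate search_text from conditions; that split is
    hypothetical and not something Solr does automatically.
    """
    return {
        # With no free text, match everything; all hits then score the same.
        "q": search_text.strip() or "*:*",
        # fq restricts the result set without affecting the score, and by
        # default its result is cached in the filterCache.
        "fq": conditions,
        "rows": rows,
        "fl": "id",
    }


if __name__ == "__main__":
    # Hypothetical values; the real generated expression is about 20KB.
    params = split_scoring("mailing cancellation", "(type:a OR type:b) AND region:us")
    # Form-encoded body for a POST to /select; a 20KB GET URL may exceed the
    # servlet container's request-line limits.
    print(urlencode(params))
```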





RE: Query in quotes cannot find results

2020-06-30 Thread Permakoff, Vadim
Thank you, Walter. I'll look into the “mm” (minimum match) parameter.

Best Regards,
Vadim Permakoff


-----Original Message-----
From: Walter Underwood  
Sent: Tuesday, June 30, 2020 2:31 PM
To: solr-user@lucene.apache.org
Subject: Re: Query in quotes cannot find results

This is exactly why the “mm” (minimum match) parameter exists, to reduce the 
number of hits with fewer matches. Think of it as a sliding scale between OR 
and AND.
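To make the "sliding scale" concrete, here is a simplified model of how an mm value turns into the number of optional clauses that must match. A sketch covering only the plain integer and percentage forms described in the Solr reference guide (positive, negative, and percentages rounded down); conditional specs such as 3<90% are deliberately not handled, and the function name is hypothetical.

```python
def min_should_match(total_clauses: int, spec: str) -> int:
    """Simplified model of how an mm spec maps to a required clause count.

    Supported forms (conditional specs such as "3<90%" are not handled):
      "3"    -> at least 3 optional clauses must match
      "-2"   -> all but 2 must match
      "75%"  -> 75% must match, rounded down
      "-25%" -> up to 25% may be missing, missing count rounded down
    """
    if total_clauses <= 0:
        return 0
    spec = spec.strip()
    if spec.endswith("%"):
        pct = int(spec[:-1])
        if pct >= 0:
            required = total_clauses * pct // 100
        else:
            required = total_clauses - (total_clauses * -pct) // 100
    else:
        n = int(spec)
        required = n if n >= 0 else total_clauses + n
    # The reference guide says a result below 1 or above the number of
    # optional clauses is never used, so clamp into that range.
    return max(1, min(required, total_clauses))


if __name__ == "__main__":
    # Six optional terms, e.g. "expand the methods for mailing cancellation".
    for spec in ("1", "-1", "75%", "-25%", "100%"):
        print(spec, "->", min_should_match(6, spec))
    # "1" behaves like q.op=OR; "100%" behaves like q.op=AND.
```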

On the other hand, I don’t usually worry about hits with fewer matches. Those 
are not on the first page, so I don’t care.

In general, you can either optimize for more related hits or optimize for fewer 
unrelated hits. Everything you do to reduce the unrelated hits will cause some 
related hits to not match. 
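The trade-off Walter describes is the usual precision/recall tension: "more related hits" is recall, "fewer unrelated hits" is precision. A small sketch with made-up document IDs (purely illustrative, not data from this thread) shows how tightening matching can raise one while lowering the other.

```python
from typing import Set, Tuple


def precision_recall(returned: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Precision: share of returned docs that are relevant.
    Recall: share of relevant docs that were returned."""
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    relevant = {"d1", "d2", "d3", "d4"}  # made-up relevance judgements
    loose = {"d1", "d2", "d3", "d4", "d7", "d8", "d9", "d10"}  # OR-like matching
    strict = {"d1", "d2"}  # AND-like matching
    print(precision_recall(loose, relevant))   # (0.5, 1.0)
    print(precision_recall(strict, relevant))  # (1.0, 0.5)
```

Per Walter's last point, the judged-relevant sets should come from real user queries taken from the logs rather than invented ones.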

Also, do all of your tuning with real user queries from logs. Making up queries 
for testing will lead to fixing problems that never occur in production and to 
missing problems that do occur.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/
   (my blog)

> On Jun 30, 2020, at 11:07 AM, Permakoff, Vadim  
> wrote:
> 
> Hi Erick,
> Thank you for the suggestion; I should have added it. Actually, before asking
> this question here, I tried adding and removing the FlattenGraphFilterFactory,
> plus other variations, like expand / not expand and autoGeneratePhraseQueries /
> not autoGeneratePhraseQueries - it just does not work with this particular
> example. You can try it yourself.
> 
> Regarding removing the stopwords, I agree there are many cases when you
> don't want to remove them, but there is one very compelling case when you
> do want them removed.
> 
> Imagine you have one document with the following text:
> 1. "to expand the methods for mailing cancellation"
> And another document with the text:
> 2. "to expand methods for mailing cancellation"
> 
> The user query is (without quotes): q=expand the methods for mailing cancellation
> I don't want to return all the documents with q.op=OR, as it will find too
> many unrelated documents, so I want to search with q.op=AND. Unfortunately,
> document 2 will not be found, as it does not contain the stopword "the".
> What should I do now?
> 
> Best Regards,
> Vadim Permakoff
> 
> 
> -----Original Message-----
> From: Erick Erickson 
> Sent: Tuesday, June 30, 2020 12:15 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Query in quotes cannot find results
> 
> Well, the first thing is that you haven’t included FlattenGraphFilterFactory
> in the index analysis chain, see:
> https://lucene.apache.org/solr/guide/7_5/filter-descriptions.html#synonym-graph-filter
> . IDK whether that actually pertains, but I’d reindex with that included
> before pursuing.
> 
> Second, “I have a requirement to remove the stopwords”. Why? Who thinks it’s 
> necessary? Is there any evidence for this or any use-case that shows it _is_ 
> necessary? Removing stopwords became common in the long-ago days when memory 
> and disk capacity were vastly more constrained than now. At this point, I 
> require proof that it’s _necessary_ to remove them before accepting this kind 
> of requirement.
> 
> There are situations where removing stopwords is worth the difficulty it 
> causes. But I’ve seen far too many unnecessary requirements to let that one 
> pass without pushing back ;).
> 
> And you can hack around this by adding slop to the phrase. Perhaps you can 
> get “good enough” results by adding one slop for every stopword, i.e. if the 
> input is “expand the methods”, detect that there’s one stopword and change it 
> to “expand the methods”~1. That’ll introduce other problems of course.
> 
> Best,
> Erick
> 
>> On Jun 30, 2020, at 11:56 AM, Permakoff, Vadim  
>> wrote:
>> 
>> Hi Erick,
>> That's what I did in the past, but this is an enterprise search and I have a
>> requirement to remove the stopwords.
>> To have both features, I could add synonyms in the front-end application; I
>> know it would work, but I need a justification for why I have to do it in the
>> application, as it is additional effort.
>> I thought there was a known bug for this case that I could refer to, because
>> according to the documentation it should work, right?
>> Anyway, there is more to it. If I add the same synonym processing to the 
>>

RE: Query in quotes cannot find results

2020-06-30 Thread Permakoff, Vadim
Hi Walter,
I'm with you: sometimes the stopwords are very important. A few years back, just
for fun, I built a Solr demo for Wikipedia search; as you can see, nothing is
removed:
http://www.softcorporation.com/lab/solr/wiki/?sq=to+be+or+not+to+be

But with enterprise search, sometimes you are better off removing the
stopwords; I explained why in my reply to Erick.
My question is not "Should we remove the stopwords?" My question is:
"Apparently multi-word synonyms (synonyms with spaces) do not work when we
remove the stopwords. Is there a way to fix this, or is there a JIRA issue for
it?"

Best Regards,
Vadim Permakoff


-----Original Message-----
From: Walter Underwood  
Sent: Tuesday, June 30, 2020 12:50 PM
To: solr-user@lucene.apache.org
Subject: Re: Query in quotes cannot find results

Removing stopwords is a dumb requirement. “Doctor, it hurts when I shove 
hedgehogs up my arse.”

Part of our job as search engineers is to solve the real problem, not implement 
a pile of requirements from people who don’t understand how search works.

Here is an article I wrote 13 years ago about why we didn’t remove stopwords at 
Netflix.

https://observer.wunderwood.org/2007/05/31/do-all-stopword-queries-matter/
 

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/
   (my blog)

> On Jun 30, 2020, at 8:56 AM, Permakoff, Vadim  
> wrote:
> 
> Hi Erick,
> That's what I did in the past, but this is an enterprise search and I have a
> requirement to remove the stopwords.
> To have both features, I could add synonyms in the front-end application; I
> know it would work, but I need a justification for why I have to do it in the
> application, as it is additional effort.
> I thought there was a known bug for this case that I could refer to, because
> according to the documentation it should work, right?
> Anyway, there is more to it. If I add the same synonym processing to the
> indexing part, i.e. the configuration will be like this:
> 
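> [fieldType XML partly lost in the archive; surviving attribute fragments:
>  fieldType: positionIncrementGap="100" autoGeneratePhraseQueries="true"
>  index analyzer: ignoreCase="true"/> and words="stopwords.txt"/>
>  query analyzer: ignoreCase="true" expand="true"/> and words="stopwords.txt"/>]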
> 
> The analysis now shows matching output for the index and query paths, but
> the exact-match query still finds nothing! This is weird.
> Any thoughts?
> 
> Best Regards,
> Vadim Permakoff
> 
> 
> -----Original Message-----
> From: Erick Erickson  
> Sent: Monday, June 29, 2020 10:19 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Query in quotes cannot find results
> 
> Looks like you’re removing stopwords. Stopwords cause issues like this with 
> the positions being off.
> 
> It’s becoming more and more common to _NOT_ remove stopwords. Is that an 
> option?
> 
> 
> 
> Best,
> Erick
> 
>> On Jun 29, 2020, at 7:32 PM, Permakoff, Vadim  
>> wrote:
>> 
>> Hi Shawn,
>> Many thanks for the response. I checked the field and it is correct. Let's
>> call it _text_ to make it easier.
>> I believe the parsing is also correct, please see below:
>> - Query without quotes (works):
>>   "querystring":"expand the methods",
>>   "parsedquery":"(PhraseQuery(_text_:\"blow up\") _text_:expand) 
>> _text_:methods",
>> 
>> - Query with quotes (does not work):
>>   "querystring":"\"expand the methods\"",
>>   "parsedquery":"SpanNearQuery(spanNear([spanOr([spanNear([_text_:blow, 
>> _text_:up], 0, true), _text_:expand]), _text_:methods], 0, true))",
>> 
>> The document has text:
>> "to expand the methods for mailing cancellation"
>> 
>> The analysis on this field shows that all words are present in the index and
>> the query, and the order is also correct, but the word "methods" is moved one
>> position; I guess that's why the result is not found.
>> 
>> Best Regards,
>> Vadim Permakoff
>> 
>> 
>> 
>> 
>> -----Original Message-----
>> From: Shawn Heisey 
>> Sent: Monday, June 29, 2020 6:28 PM
>> To: solr-user@lucene

RE: Query in quotes cannot find results

2020-06-30 Thread Permakoff, Vadim
Hi Erick,
Thank you for the suggestion; I should have added it. Actually, before asking
this question here, I tried adding and removing the FlattenGraphFilterFactory,
plus other variations, like expand / not expand and autoGeneratePhraseQueries /
not autoGeneratePhraseQueries - it just does not work with this particular
example. You can try it yourself.

Regarding removing the stopwords, I agree there are many cases when you don't
want to remove them, but there is one very compelling case when you do want
them removed.

Imagine you have one document with the following text:
1. "to expand the methods for mailing cancellation"
And another document with the text:
2. "to expand methods for mailing cancellation"

The user query is (without quotes): q=expand the methods for mailing cancellation
I don't want to return all the documents with q.op=OR, as it will find too
many unrelated documents, so I want to search with q.op=AND. Unfortunately,
document 2 will not be found, as it does not contain the stopword "the".
What should I do now?
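The two-document case above can be modelled with a toy AND matcher over bags of terms. This is a sketch under simplifying assumptions (lowercasing, whitespace tokenization, no positions, no synonyms), not how Solr actually analyzes or matches; it only shows why document 2 is lost under q.op=AND unless "the" is dropped on both the index and the query side.

```python
STOPWORDS = {"the"}  # same single entry as the stopwords.txt in this thread

DOCS = {
    1: "to expand the methods for mailing cancellation",
    2: "to expand methods for mailing cancellation",
}


def analyze(text: str, remove_stopwords: bool) -> set:
    """Toy analyzer: lowercase, split on whitespace, optionally drop stopwords."""
    terms = text.lower().split()
    if remove_stopwords:
        terms = [t for t in terms if t not in STOPWORDS]
    return set(terms)


def and_match(query: str, remove_stopwords: bool) -> list:
    """Return IDs of documents containing every query term (q.op=AND)."""
    q_terms = analyze(query, remove_stopwords)
    return sorted(
        doc_id
        for doc_id, text in DOCS.items()
        if q_terms <= analyze(text, remove_stopwords)
    )


if __name__ == "__main__":
    query = "expand the methods for mailing cancellation"
    print(and_match(query, remove_stopwords=False))  # [1]
    print(and_match(query, remove_stopwords=True))   # [1, 2]
```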

Best Regards,
Vadim Permakoff


-----Original Message-----
From: Erick Erickson  
Sent: Tuesday, June 30, 2020 12:15 PM
To: solr-user@lucene.apache.org
Subject: Re: Query in quotes cannot find results

Well, the first thing is that you haven’t included FlattenGraphFilterFactory in
the index analysis chain, see:
https://lucene.apache.org/solr/guide/7_5/filter-descriptions.html#synonym-graph-filter
. IDK whether that actually pertains, but I’d reindex with that included
before pursuing.

Second, “I have a requirement to remove the stopwords”. Why? Who thinks it’s 
necessary? Is there any evidence for this or any use-case that shows it _is_ 
necessary? Removing stopwords became common in the long-ago days when memory 
and disk capacity were vastly more constrained than now. At this point, I 
require proof that it’s _necessary_ to remove them before accepting this kind 
of requirement.

There are situations where removing stopwords is worth the difficulty it 
causes. But I’ve seen far too many unnecessary requirements to let that one 
pass without pushing back ;).

And you can hack around this by adding slop to the phrase. Perhaps you can get 
“good enough” results by adding one slop for every stopword, i.e. if the input 
is “expand the methods”, detect that there’s one stopword and change it to 
“expand the methods”~1. That’ll introduce other problems of course.
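The slop workaround can be sketched on the client side: count the stopwords in the user's phrase and append that many units of slop. A minimal sketch, assuming the application has a copy of the same stopword list the index uses (hard-coded here as a hypothetical set) and that the phrase contains no quotes of its own. As noted above, slop also lets other words or reorderings fill the gap, so it can admit some unintended matches.

```python
STOPWORDS = {"the"}  # assumption: mirrors the server-side stopwords.txt


def phrase_with_slop(phrase: str, stopwords=STOPWORDS) -> str:
    """Quote `phrase` and add one unit of slop per stopword it contains."""
    words = phrase.split()
    slop = sum(1 for w in words if w.lower() in stopwords)
    quoted = '"' + " ".join(words) + '"'
    # Only append ~N when there is a stopword gap to compensate for.
    return f"{quoted}~{slop}" if slop else quoted


if __name__ == "__main__":
    print(phrase_with_slop("expand the methods"))    # "expand the methods"~1
    print(phrase_with_slop("mailing cancellation"))  # "mailing cancellation"
```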

Best,
Erick

> On Jun 30, 2020, at 11:56 AM, Permakoff, Vadim  
> wrote:
> 
> Hi Erick,
> That's what I did in the past, but this is an enterprise search and I have a
> requirement to remove the stopwords.
> To have both features, I could add synonyms in the front-end application; I
> know it would work, but I need a justification for why I have to do it in the
> application, as it is additional effort.
> I thought there was a known bug for this case that I could refer to, because
> according to the documentation it should work, right?
> Anyway, there is more to it. If I add the same synonym processing to the
> indexing part, i.e. the configuration will be like this:
> 
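> [fieldType XML partly lost in the archive; surviving attribute fragments:
>  fieldType: positionIncrementGap="100" autoGeneratePhraseQueries="true"
>  index analyzer: ignoreCase="true"/> and words="stopwords.txt"/>
>  query analyzer: ignoreCase="true" expand="true"/> and words="stopwords.txt"/>]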
> 
> The analysis now shows matching output for the index and query paths, but
> the exact-match query still finds nothing! This is weird.
> Any thoughts?
> 
> Best Regards,
> Vadim Permakoff
> 
> 
> -----Original Message-----
> From: Erick Erickson  
> Sent: Monday, June 29, 2020 10:19 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Query in quotes cannot find results
> 
> Looks like you’re removing stopwords. Stopwords cause issues like this with 
> the positions being off.
> 
> It’s becoming more and more common to _NOT_ remove stopwords. Is that an 
> option?
> 
> 
> 
> Best,
> Erick
> 
>> On Jun 29, 2020, at 7:32 PM, Permakoff, Vadim  
>> wrote:
>> 
>> Hi Shawn,
>> Many thanks for the response. I checked the field and it is correct. Let's
>> call it _text_ to make it easier.
>> I believe the parsing is also correct, please see below:
>> - Query without quotes (works):
>>   "querystring":"expand the methods",
>>   "parsedquery":"(PhraseQuery(_text_:\"blow up

RE: Query in quotes cannot find results

2020-06-30 Thread Permakoff, Vadim
Hi Erick,
That's what I did in the past, but this is an enterprise search and I have a
requirement to remove the stopwords.
To have both features, I could add synonyms in the front-end application; I know
it would work, but I need a justification for why I have to do it in the
application, as it is additional effort.
I thought there was a known bug for this case that I could refer to, because
according to the documentation it should work, right?
Anyway, there is more to it. If I add the same synonym processing to the
indexing part, i.e. the configuration will be like this:

[fieldType XML not preserved in the archive]

The analysis now shows matching output for the index and query paths, but the
exact-match query still finds nothing! This is weird.
Any thoughts?

Best Regards,
Vadim Permakoff


-----Original Message-----
From: Erick Erickson  
Sent: Monday, June 29, 2020 10:19 PM
To: solr-user@lucene.apache.org
Subject: Re: Query in quotes cannot find results

Looks like you’re removing stopwords. Stopwords cause issues like this with the 
positions being off.

It’s becoming more and more common to _NOT_ remove stopwords. Is that an option?



Best,
Erick

> On Jun 29, 2020, at 7:32 PM, Permakoff, Vadim  
> wrote:
> 
> Hi Shawn,
> Many thanks for the response. I checked the field and it is correct. Let's
> call it _text_ to make it easier.
> I believe the parsing is also correct, please see below:
> - Query without quotes (works):
>"querystring":"expand the methods",
>"parsedquery":"(PhraseQuery(_text_:\"blow up\") _text_:expand) 
> _text_:methods",
> 
> - Query with quotes (does not work):
>"querystring":"\"expand the methods\"",
>"parsedquery":"SpanNearQuery(spanNear([spanOr([spanNear([_text_:blow, 
> _text_:up], 0, true), _text_:expand]), _text_:methods], 0, true))",
> 
> The document has text:
> "to expand the methods for mailing cancellation"
> 
> The analysis on this field shows that all words are present in the index and
> the query, and the order is also correct, but the word "methods" is moved one
> position; I guess that's why the result is not found.
> 
> Best Regards,
> Vadim Permakoff
> 
> 
> 
> 
> -----Original Message-----
> From: Shawn Heisey 
> Sent: Monday, June 29, 2020 6:28 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Query in quotes cannot find results
> 
> On 6/29/2020 3:34 PM, Permakoff, Vadim wrote:
>> The basic query q=expand the methods   <<< finds the document,
>> the query (in quotes) q="expand the methods"   <<< cannot find the document
>> 
>> Am I doing something wrong, or is this a known bug (I saw similar issues
>> discussed in the past, but not for exact-match queries)? If so, what is
>> the JIRA issue for it?
> 
> The most helpful information will come from running both queries with debug 
> enabled, so you can see how the query is parsed.  If you add a parameter 
> "debugQuery=true" to the URL, then the response should include the parsed 
> query.  Compare those, and see if you can tell what the differences are.
> 
> One of the most common problems for queries like this is that you're not 
> searching the field that you THINK you're searching.  I don't know whether 
> this is the problem, I just mention it because it is a common error.
> 
> Thanks,
> Shawn
> 
> 
> 



RE: Query in quotes cannot find results

2020-06-29 Thread Permakoff, Vadim
Hi Shawn,
Many thanks for the response. I checked the field and it is correct. Let's call
it _text_ to make it easier.
I believe the parsing is also correct, please see below:
 - Query without quotes (works):
"querystring":"expand the methods",
"parsedquery":"(PhraseQuery(_text_:\"blow up\") _text_:expand) 
_text_:methods",

 - Query with quotes (does not work):
"querystring":"\"expand the methods\"",
"parsedquery":"SpanNearQuery(spanNear([spanOr([spanNear([_text_:blow, 
_text_:up], 0, true), _text_:expand]), _text_:methods], 0, true))",

The document has text:
"to expand the methods for mailing cancellation"

The analysis on this field shows that all words are present in the index and 
the query, and the order is also correct, but the word "methods" is moved one 
position, I guess that's why the result is not found.
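The thread does not pin down the cause, but one plausible reading, consistent with the parsed queries above and Erick's remark about positions, is this: the stop filter drops "the" yet leaves a gap, so "methods" sits two positions after "expand" in the index. A plain phrase query carries the same gap on the query side and still matches, whereas a spanNear with slop 0 (as in the quoted parse) only requires the spans to be adjacent and so misses. The sketch below models just that difference with toy functions (positions, query_terms, matches are hypothetical names); it is a simplification, not Lucene's matching code.

```python
STOPWORDS = {"the"}  # same single entry as the thread's stopwords.txt


def positions(text):
    """Toy index analysis: map each kept term to the positions it occupies.

    Stopwords are dropped but still consume a position, mimicking the gap a
    stop filter leaves behind by default.
    """
    result = {}
    for pos, term in enumerate(text.lower().split()):
        if term not in STOPWORDS:
            result.setdefault(term, []).append(pos)
    return result


def query_terms(query, keep_gaps):
    """Return (term, offset) pairs relative to the first kept query term.

    Assumes the query contains at least one non-stopword term.
    """
    kept = [(pos, t) for pos, t in enumerate(query.lower().split()) if t not in STOPWORDS]
    if keep_gaps:
        # PhraseQuery-style: the stopword gap is part of the required offsets.
        first = kept[0][0]
        return [(t, pos - first) for pos, t in kept]
    # spanNear with slop 0: terms must simply be adjacent; gaps are ignored.
    return [(t, i) for i, (_, t) in enumerate(kept)]


def matches(doc_text, query, keep_gaps):
    """True if every query term sits at start + offset for some start position."""
    doc = positions(doc_text)
    terms = query_terms(query, keep_gaps)
    first_term = terms[0][0]
    return any(
        all(start + off in doc.get(t, []) for t, off in terms)
        for start in doc.get(first_term, [])
    )


if __name__ == "__main__":
    doc = "to expand the methods for mailing cancellation"
    query = "expand the methods"
    print(positions(doc)["expand"], positions(doc)["methods"])  # [1] [3]
    print(matches(doc, query, keep_gaps=True))   # True: gap-aware phrase
    print(matches(doc, query, keep_gaps=False))  # False: adjacency only
```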

Best Regards,
Vadim Permakoff




-----Original Message-----
From: Shawn Heisey 
Sent: Monday, June 29, 2020 6:28 PM
To: solr-user@lucene.apache.org
Subject: Re: Query in quotes cannot find results

On 6/29/2020 3:34 PM, Permakoff, Vadim wrote:
> The basic query q=expand the methods   <<< finds the document,
> the query (in quotes) q="expand the methods"   <<< cannot find the document
>
> Am I doing something wrong, or is this a known bug (I saw similar issues
> discussed in the past, but not for exact-match queries)? If so, what is
> the JIRA issue for it?

The most helpful information will come from running both queries with debug 
enabled, so you can see how the query is parsed.  If you add a parameter 
"debugQuery=true" to the URL, then the response should include the parsed 
query.  Compare those, and see if you can tell what the differences are.
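Shawn's debugging step can be scripted so that the unquoted and quoted variants are always compared side by side. A minimal sketch, assuming a hypothetical local core URL; it builds the two debug URLs and extracts debug.parsedquery from a JSON response body, leaving the HTTP call itself to whatever client the reader already uses.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint; adjust host, port and core name.
SOLR_SELECT = "http://localhost:8983/solr/mycore/select"


def debug_url(q: str) -> str:
    """URL for a JSON /select request that includes the parsed query."""
    params = {"q": q, "debugQuery": "true", "wt": "json", "fl": "id"}
    return SOLR_SELECT + "?" + urlencode(params)


def parsed_query(response_body: str) -> str:
    """Extract debug.parsedquery from a JSON /select response body."""
    return json.loads(response_body)["debug"]["parsedquery"]


if __name__ == "__main__":
    for q in ("expand the methods", '"expand the methods"'):
        print(debug_url(q))
```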

One of the most common problems for queries like this is that you're not 
searching the field that you THINK you're searching.  I don't know whether this 
is the problem, I just mention it because it is a common error.

Thanks,
Shawn





Query in quotes cannot find results

2020-06-29 Thread Permakoff, Vadim
Hi,
This might be a known issue, but I cannot find a reference for this specific
case: searching for an exact (quoted) query with synonyms and stopwords.

I have a simple configuration for a catch-all field:

[fieldType XML not preserved in the archive]

The synonyms.txt file has only one line:
expand,blow up

The stopwords.txt file has only one line:
the

There is only one document:
{
   "id":"1",
"title":"to expand the methods for mailing cancellation"
}

Everything else is the default basic configuration. Tested with Solr 6.5.1 and
Solr 8.5.2.

The basic query q=expand the methods   <<< finds the document,
the query (in quotes) q="expand the methods"   <<< cannot find the document

Am I doing something wrong, or is this a known bug (I saw similar issues
discussed in the past, but not for exact-match queries)? If so, what is the
JIRA issue for it?

Best Regards,
Vadim Permakoff



