Hi Walter,
I'm with you: sometimes stopwords are very important. A few years back I 
built a Solr demo for Wikipedia search just for fun, and as you can see, 
nothing is removed:
http://www.softcorporation.com/lab/solr/wiki/?sq=to+be+or+not+to+be

But with enterprise search you are sometimes better off removing the 
stopwords; I explained why in my reply to Erick. 
My question is not "Should we remove the stopwords?" My question is: 
"Apparently synonyms with spaces do not work when we remove the 
stopwords. Is there a way to fix it, or is there a Jira for it?"
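
To make the suspected position problem concrete, here is a minimal Python sketch (not Solr code, and only my guess at the mechanism based on the parsed query quoted further down). It assumes whitespace tokenization and a stopword list of "to", "the" and "for"; the real stopwords.txt is not shown in this thread, and the function names are made up for illustration. A removed stopword still consumes its position, so "methods" ends up two positions after "expand", while the SpanNearQuery with slop 0 asks for it directly after.

```python
# Assumed stopword list; the actual stopwords.txt is not shown in the thread.
STOPWORDS = {"to", "the", "for"}

def index_positions(text, stopwords=STOPWORDS):
    """Map each kept term to its positions, keeping holes left by stopwords."""
    positions = {}
    for pos, token in enumerate(text.lower().split()):
        if token in stopwords:
            continue  # token is dropped, but its position is still consumed
        positions.setdefault(token, []).append(pos)
    return positions

def ordered_near(positions, first, second, gap):
    """True if `second` occurs exactly `gap` positions after `first`."""
    return any(p + gap in positions.get(second, [])
               for p in positions.get(first, []))

doc = index_positions("to expand the methods for mailing cancellation")

# What the parsed SpanNearQuery asks for: "methods" directly after "expand".
span_style_match = ordered_near(doc, "expand", "methods", gap=1)

# What a gap-aware phrase would ask for: one empty position in between.
gap_aware_match = ordered_near(doc, "expand", "methods", gap=2)

print(doc)
print("slop-0 span match:", span_style_match)
print("gap-aware match:  ", gap_aware_match)
```

In this toy model only the gap-aware check matches, which is consistent with the quoted query failing while the unquoted one works.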

Best Regards,
Vadim Permakoff


-----Original Message-----
From: Walter Underwood <wun...@wunderwood.org> 
Sent: Tuesday, June 30, 2020 12:50 PM
To: solr-user@lucene.apache.org
Subject: Re: Query in quotes cannot find results

Removing stopwords is a dumb requirement. “Doctor, it hurts when I shove 
hedgehogs up my arse.”

Part of our job as search engineers is to solve the real problem, not implement 
a pile of requirements from people who don’t understand how search works.

Here is an article I wrote 13 years ago about why we didn’t remove stopwords at 
Netflix.

https://observer.wunderwood.org/2007/05/31/do-all-stopword-queries-matter/
 

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Jun 30, 2020, at 8:56 AM, Permakoff, Vadim <vadim.permak...@verisk.com> 
> wrote:
> 
> Hi Erick,
> That's what I did in the past, but this is an enterprise search and I have a 
> requirement to remove the stopwords.
> To have both features I can add the synonyms in the front-end application. I 
> know that will work, but I need a justification for doing it in the 
> application, since it is additional effort.
> I thought there might be a bug for this case that I could refer to, because 
> according to the documentation it should work, right?
> Anyway, there is more to it. If I add the same synonym processing to the 
> indexing side, i.e. the configuration looks like this:
> 
>    <fieldType name="text_test" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
>      <analyzer type="index">
>        <tokenizer class="solr.StandardTokenizerFactory"/>
>        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true"/>
>        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>      </analyzer>
>      <analyzer type="query">
>        <tokenizer class="solr.StandardTokenizerFactory"/>
>        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>      </analyzer>
>    </fieldType>
> 
> The analysis shows that the parsing now matches on the indexing and query 
> paths, but the exact-match query still cannot find the result! This is weird.
> Any thoughts?
> 
> Best Regards,
> Vadim Permakoff
> 
> 
> -----Original Message-----
> From: Erick Erickson <erickerick...@gmail.com> 
> Sent: Monday, June 29, 2020 10:19 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Query in quotes cannot find results
> 
> Looks like you’re removing stopwords. Stopwords cause issues like this with 
> the positions being off.
> 
> It’s becoming more and more common to _NOT_ remove stopwords, is that an 
> option?
> 
> 
> 
> Best,
> Erick
> 
>> On Jun 29, 2020, at 7:32 PM, Permakoff, Vadim <vadim.permak...@verisk.com> 
>> wrote:
>> 
>> Hi Shawn,
>> Many thanks for the response. I checked the field and it is correct. Let's 
>> call it _text_ to make it easier.
>> I believe the parsing is also correct, please see below:
>> - Query without quotes (works):
>>   "querystring":"expand the methods",
>>   "parsedquery":"(PhraseQuery(_text_:\"blow up\") _text_:expand) _text_:methods",
>> 
>> - Query with quotes (does not work):
>>   "querystring":"\"expand the methods\"",
>>   "parsedquery":"SpanNearQuery(spanNear([spanOr([spanNear([_text_:blow, _text_:up], 0, true), _text_:expand]), _text_:methods], 0, true))",
>> 
>> The document has text:
>> "to expand the methods for mailing cancellation"
>> 
>> The analysis on this field shows that all words are present in the index and 
>> the query, and the order is also correct, but the word "methods" is moved one 
>> position. I guess that's why the result is not found.
>> 
>> Best Regards,
>> Vadim Permakoff
>> 
>> 
>> 
>> 
>> -----Original Message-----
>> From: Shawn Heisey <apa...@elyograg.org>
>> Sent: Monday, June 29, 2020 6:28 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Query in quotes cannot find results
>> 
>> On 6/29/2020 3:34 PM, Permakoff, Vadim wrote:
>>> The basic query q=expand the methods   <<< finds the document,
>>> the query (in quotes) q="expand the methods"   <<< cannot find the document
>>> 
>>> Am I doing something wrong, or is it a known bug (I saw similar issues 
>>> discussed in the past, but not for exact match queries)? If so, what is 
>>> the Jira for it?
>> 
>> The most helpful information will come from running both queries with debug 
>> enabled, so you can see how the query is parsed.  If you add a parameter 
>> "debugQuery=true" to the URL, then the response should include the parsed 
>> query.  Compare those, and see if you can tell what the differences are.
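
As a rough illustration of the debugQuery suggestion above, the following Python sketch builds the two request URLs to compare. The host, port and collection name are placeholders that do not come from this thread, and `debug_url` is a made-up helper; `q`, `df` and `debugQuery` are standard Solr query parameters, and `_text_` is the field name used elsewhere in the thread.

```python
from urllib.parse import urlencode

# Hypothetical host and collection name; replace with your own.
SOLR_SELECT = "http://localhost:8983/solr/mycollection/select"

def debug_url(query, field="_text_"):
    """Build a select URL that asks Solr to include the parsed query."""
    params = {"q": query, "df": field, "debugQuery": "true"}
    return f"{SOLR_SELECT}?{urlencode(params)}"

unquoted = debug_url("expand the methods")
quoted = debug_url('"expand the methods"')

print(unquoted)
print(quoted)
```

Requesting both URLs and comparing the "parsedquery" entries in the debug section of each response shows how the quoted and unquoted forms are parsed differently.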
>> 
>> One of the most common problems for queries like this is that you're not 
>> searching the field that you THINK you're searching.  I don't know whether 
>> this is the problem, I just mention it because it is a common error.
>> 
>> Thanks,
>> Shawn
>> 
> 
