Hi Erick,
That's what I did in the past, but this is an enterprise search and I have a 
requirement to remove the stopwords.
To have both features I could add the synonyms in the front-end application. I 
know that will work, but it is additional effort, so I need a justification for 
doing it in the application.
I thought there might be a bug report for this case that I could refer to, 
because according to the documentation it should work, right?
Anyway, there is more to it. If I add the same synonym processing to the 
indexing part, i.e. the configuration looks like this:

    <fieldType name="text_test" class="solr.TextField"
               positionIncrementGap="100" autoGeneratePhraseQueries="true">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
                ignoreCase="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
                ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

The analysis shows that the parsing now matches for the indexing and query 
paths, but the exact match result still cannot be found! This is weird.
Any thoughts?

Best Regards,
Vadim Permakoff


-----Original Message-----
From: Erick Erickson <erickerick...@gmail.com> 
Sent: Monday, June 29, 2020 10:19 PM
To: solr-user@lucene.apache.org
Subject: Re: Query in quotes cannot find results

Looks like you’re removing stopwords. Stopwords cause issues like this with the 
positions being off.

It’s becoming more and more common to _NOT_ remove stopwords, is that an option?
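
If that is an option, a minimal sketch of such a field type might look like the 
following. It is just the configuration from this thread with the 
StopFilterFactory lines left out and synonyms applied only at query time; the 
name text_nostop is made up for illustration, and this has not been tested 
against your data.

```xml
<!-- Hypothetical field type: same analysis chain as text_test, but no stopword removal -->
<fieldType name="text_nostop" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- No StopFilterFactory: every word keeps its own position in the index -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Synonyms are expanded at query time only (an assumption of this sketch) -->
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With "the" kept on both sides, the quoted query and the document should see the 
same token positions, so there is no gap for the phrase match to trip over. A 
reindex is needed after changing the index analyzer.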



Best,
Erick

> On Jun 29, 2020, at 7:32 PM, Permakoff, Vadim <vadim.permak...@verisk.com> 
> wrote:
> 
> Hi Shawn,
> Many thanks for the response, I checked the field and it is correct. Let's 
> call it _text_ to make it easier.
> I believe the parsing is also correct, please see below:
> - Query without quotes (works):
>    "querystring":"expand the methods",
>    "parsedquery":"(PhraseQuery(_text_:\"blow up\") _text_:expand) _text_:methods",
> 
> - Query with quotes (does not work):
>    "querystring":"\"expand the methods\"",
>    "parsedquery":"SpanNearQuery(spanNear([spanOr([spanNear([_text_:blow, _text_:up], 0, true), _text_:expand]), _text_:methods], 0, true))",
> 
> The document has text:
> "to expand the methods for mailing cancellation"
> 
> The analysis on this field shows that all words are present in the index and 
> the query, and the order is also correct, but the word "methods" is moved one 
> position. I guess that's why the result is not found.
> 
> Best Regards,
> Vadim Permakoff
> 
> 
> 
> 
> -----Original Message-----
> From: Shawn Heisey <apa...@elyograg.org>
> Sent: Monday, June 29, 2020 6:28 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Query in quotes cannot find results
> 
> On 6/29/2020 3:34 PM, Permakoff, Vadim wrote:
>> The basic query q=expand the methods   <<< finds the document,
>> the query (in quotes) q="expand the methods"   <<< cannot find the document
>> 
>> Am I doing something wrong, or is it a known bug (I saw similar issues 
>> discussed in the past, but not for an exact match query)? If yes, what is 
>> the Jira for it?
> 
> The most helpful information will come from running both queries with debug 
> enabled, so you can see how the query is parsed.  If you add a parameter 
> "debugQuery=true" to the URL, then the response should include the parsed 
> query.  Compare those, and see if you can tell what the differences are.
> 
> One of the most common problems for queries like this is that you're not 
> searching the field that you THINK you're searching.  I don't know whether 
> this is the problem, I just mention it because it is a common error.
> 
> Thanks,
> Shawn
