Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

Walter Underwood Fri, 08 Nov 2019 09:03:40 -0800

I always enable phrase searching in edismax for exactly this reason.

Something like:


       <str name="qf">title^8 keywords^4 text</str>
       <str name="pf">title^16 keywords^8 text^2</str>
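
As a rough sketch, the same settings can be sent as request parameters rather than handler defaults. The Python below builds an edismax request with the qf and pf values shown above. The host, the collection name "mycollection", and the helper names are assumptions for illustration only; debug=query is included just so the parsed query can be inspected.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; replace host and collection with your own.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycollection/select"


def edismax_params(user_query):
    """Return request parameters for an edismax query with phrase boosting."""
    return {
        "defType": "edismax",
        "q": user_query,
        # Per-term matching, using the qf weights from the message.
        "qf": "title^8 keywords^4 text",
        # Phrase matching on the same fields with the weights doubled.
        "pf": "title^16 keywords^8 text^2",
        # Ask Solr to return the parsed query for inspection.
        "debug": "query",
    }


def edismax_url(user_query):
    """Return the full request URL with all parameters URL-encoded."""
    return SOLR_SELECT_URL + "?" + urlencode(edismax_params(user_query))


print(edismax_url("Lymphoid and a non-Lymphoid cell"))
```

Documents that contain the query terms next to each other in a pf field get the extra phrase boost, so they should rank above documents that only match the individual terms.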

To deal with concepts in queries, a classifier and/or named entity extractor 
can be helpful. If you have a list of concepts (“controlled vocabulary”) that 
includes “Lamin A”, and that shows up in a query, that term can be queried 
against the field matching that vocabulary.
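
A minimal sketch of that routing idea, assuming a tiny hand-made vocabulary. The field names protein_name and complex_name and the function route_concepts are hypothetical; a real system would more likely use a proper classifier or entity extractor than regular-expression matching.

```python
import re

# Hypothetical controlled vocabulary: concept phrase -> field that is
# indexed with that vocabulary. Both entries are made up for illustration.
VOCABULARY = {
    "Lamin A": "protein_name",
    "IFT A": "complex_name",
}


def route_concepts(query):
    """Split a query into field-specific phrase clauses and leftover text.

    Returns (clauses, remaining_text). Each clause is a quoted phrase
    query against the field associated with the matched concept.
    """
    remaining = query
    clauses = []
    # Try longer phrases first so a longer concept wins over a shorter one.
    for phrase in sorted(VOCABULARY, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        if pattern.search(remaining):
            clauses.append('{}:"{}"'.format(VOCABULARY[phrase], phrase))
            # Remove the matched concept from the free-text part.
            remaining = pattern.sub(" ", remaining)
    # Collapse any whitespace left behind by the removals.
    return clauses, " ".join(remaining.split())


print(route_concepts("mutations in lamin a gene"))
```

The leftover text can still go through the normal edismax fields, while each detected concept is queried as a phrase against its own field.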

This is how LinkedIn separates people, companies, and places, for example.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Nov 8, 2019, at 10:48 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> 
> Look at the “mm” parameter, try setting it to 100%. Although that’s not 
> entirely likely to do what you want either since virtually every doc will 
> have “a” in it. But at least you’d get docs that have both terms.
> 
> You may also be able to search for things like “Lamin A” _only as a phrase_ 
> and have some luck. But this is a gnarly problem in general. Some people have 
> been able to substitute synonyms and/or shingles to make this work at the 
> expense of a larger index.
> 
> This is a generic problem with context. “Lamin A” is really a “concept”, not 
> just two words that happen to be near each other. Searching as a phrase is an 
> OOB-but-naive way to try to make it more likely that the ranked results refer 
> to the _concept_ of “Lamin A”. The assumption here is “if these two words 
> appear next to each other, they’re more likely to be what I want”. I say 
> “naive” because “Lamins: A new approach to...” would _also_ be found for a 
> naive phrase search. (I have no idea whether such a title makes sense or not, 
> but you figured that out already)...
> 
> To do this well you’d have to dive in to NLP/Machine learning.
> 
> I truly wish we could have the DWIM search algorithm (Do What I Mean)….
> 
>> On Nov 8, 2019, at 11:29 AM, Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
>> 
>> Hi Walter and Paras
>> 
>> I indexed it removing all the references to StopWordFilter and I went from 
>> 121 results to near 20K as the search term q="Lymphoid and a non-Lymphoid 
>> cell" is matching entities such as "IFT A" or  "Lamin A". So I don't think 
>> removing it completely is the way to go from the scenario we have, but I 
>> appreciate the suggestion…
>> 
>> Yes the response is using fl=*
>> I am trying some combinations at the moment, but no success yet.
>> 
>> defType=edismax
>> q.alt=Lymphoid and a non-Lymphoid cell
>> Number of results=1599
>> Quite a considerable increase, even though reasonable meaningful results. 
>> 
>> I am sorry, but I didn't understand what exactly you want me to do with 
>> the lst (??) and qf and bf.
>> 
>> Thanks everyone for your input
>> 
>> 
>>> On 8 Nov 2019, at 06:45, Paras Lehana <paras.leh...@indiamart.com> wrote:
>>> 
>>> Hi Guilherme
>>> 
>>> By accident, I ended up querying using the default handler (/select) 
>>> and it worked. 
>>> 
>>> You've just found the culprit. Thanks for giving the material I requested. 
>>> Your analysis chain is working as expected. I don't see any issue in either 
>>> StopWordFilter or your boosts. I also use a boost of 50 when boosting 
>>> contextual suggestions (boosting "gold iphone" on a page of iphone) but I 
>>> take Walter's suggestion and would try to optimize my weights. I agree that 
>>> we did not research this value of 50 much either (we never faced 
>>> performance or relevance issues).  
>>> 
>>> See the major difference in both the handlers - edismax. I'm pretty sure 
>>> that your problem lies in the parsing of queries (you can confirm that from 
>>> parsedquery key in debug of both JSON responses). I hope you have provided 
>>> the response with fl=*. Replace q with q.alt in your /search handler query 
>>> and I think you should start getting responses. That's because q.alt uses 
>>> standard parser. If you want to keep using edismax, I suggest you test 
>>> the responses after removing some combination of lst (qf, bf) and find what's 
>>> keeping the documents from coming up. I'm out of office today - would have 
>>> certainly tried analyzing the field values of the document in /select 
>>> request and compare it with qf/bq in solrconfig.xml /search. Do this for me 
>>> and you'd certainly find something.  
>>> 
>>> On Thu, 7 Nov 2019 at 21:00, Walter Underwood <wun...@wunderwood.org 
>>> <mailto:wun...@wunderwood.org>> wrote:
>>> I normally use a weight of 8 for the most important field, like title. 
>>> Other fields might get a 4 or 2.
>>> 
>>> I add a “pf” field with the weights doubled, so that phrase matches have a 
>>> higher weight.
>>> 
>>> The weight of 8 comes from experience at Infoseek and Inktomi, two early 
>>> web search engines. With different relevance algorithms and totally 
>>> different evaluation and tuning systems, they settled on weights of 8 and 
>>> 7.5 for HTML titles. With the two radically different systems getting 
>>> the same number, I decided that was a property of the documents, not of the 
>>> search engines.
>>> 
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org <mailto:wun...@wunderwood.org>
>>> http://observer.wunderwood.org/ <http://observer.wunderwood.org/>  (my blog)
>>> 
>>>> On Nov 7, 2019, at 9:03 AM, Guilherme Viteri <gvit...@ebi.ac.uk 
>>>> <mailto:gvit...@ebi.ac.uk>> wrote:
>>>> 
>>>> Hi Wunder,
>>>> 
>>>> My indexer takes quite a few hours to run. I am shortening it to 
>>>> run faster, but I also need to make sure it gives what we are expecting. 
>>>> This implementation has been there for >4y and is heavily used.
>>>> 
>>>>> In your edismax handlers, weights of 20, 50, and 100 are extremely high. 
>>>>> I don’t think I’ve ever used a weight higher than 16 in a dozen years of 
>>>>> configuring Solr.
>>>> I've inherited that implementation and I am really keen to adjust it; 
>>>> what would you recommend?
>>>> 
>>>> Cheers
>>>> Guilherme
>>>> 
>>>>> On 7 Nov 2019, at 14:43, Walter Underwood <wun...@wunderwood.org 
>>>>> <mailto:wun...@wunderwood.org>> wrote:
>>>>> 
>>>>> Thanks for posting the files. Looking at schema.xml, I see that you still 
>>>>> are using StopFilterFactory. The first advice we gave you was to remove 
>>>>> that.
>>>>> 
>>>>> Remove StopFilterFactory everywhere and reindex.
>>>>> 
>>>>> You will continue to have problems matching stopwords until you do that.
>>>>> 
>>>>> In your edismax handlers, weights of 20, 50, and 100 are extremely high. 
>>>>> I don’t think I’ve ever used a weight higher than 16 in a dozen years of 
>>>>> configuring Solr.
>>>>> 
>>>>> wunder
>>>>> Walter Underwood
>>>>> wun...@wunderwood.org <mailto:wun...@wunderwood.org>
>>>>> http://observer.wunderwood.org/ <http://observer.wunderwood.org/>  (my 
>>>>> blog)
>>>>> 
>>>>>> On Nov 7, 2019, at 6:56 AM, Guilherme Viteri <gvit...@ebi.ac.uk 
>>>>>> <mailto:gvit...@ebi.ac.uk>> wrote:
>>>>>> 
>>>>>> Hi Paras, everyone
>>>>>> 
>>>>>> Thank you again for your input and suggestions. I am sorry to hear you had 
>>>>>> trouble with the attachments; I will host them somewhere and share the 
>>>>>> links. 
>>>>>> I don't tweak my index, I get the data from the graph database, create a 
>>>>>> document as they are and save to solr.
>>>>>> 
>>>>>> So, I am sending the new analysis screen querying the way you suggested. 
>>>>>> Also the results with params and solr query url.
>>>>>> 
>>>>>> During the process of querying what you asked I found something really 
>>>>>> weird (at least for me). By accident, I ended up querying using the 
>>>>>> default handler (/select) and it worked. Then, if I use the one I must 
>>>>>> use, it sadly doesn't work. I am posting both results and I will also 
>>>>>> post the handlers as well.
>>>>>> 
>>>>>> Here is the link with all the files mentioned before
>>>>>> https://www.dropbox.com/sh/fymfm1q94zum1lx/AADwU1c9EUf2A4d7FtzSKR54a?dl=0
>>>>>> If the link doesn't work www dot dropbox dot com slash sh slash 
>>>>>> fymfm1q94zum1lx/AADwU1c9EUf2A4d7FtzSKR54a ? dl equals 0
>>>>>> 
>>>>>> Thanks
>>>>>> 
>>>>>>> On 7 Nov 2019, at 05:23, Paras Lehana <paras.leh...@indiamart.com 
>>>>>>> <mailto:paras.leh...@indiamart.com>> wrote:
>>>>>>> 
>>>>>>> Hi Guilherme.
>>>>>>> 
>>>>>>> I am sending the analysis result and the json result as requested.
>>>>>>> 
>>>>>>> 
>>>>>>> Thanks for the effort. Luckily, I can see your attachments (low quality
>>>>>>> though).
>>>>>>> 
>>>>>>> From the analysis screen, the analysis is working as expected. One of
>>>>>>> the reasons for query="lymphoid and *a* non-lymphoid cell" not matching
>>>>>>> the document containing "Lymphoid and a non-Lymphoid cell" I can
>>>>>>> initially think of is: the stopword "a" is probably present
>>>>>>> post-analysis in either the query or the index. Did you tweak your
>>>>>>> index-time analysis after indexing?
>>>>>>> 
>>>>>>> Do two things:
>>>>>>> 
>>>>>>> 1. Post the analysis screen for index=*"Immunoregulatory
>>>>>>> interactions between a Lymphoid and a non-Lymphoid cell"* and
>>>>>>> query=*"lymphoid and a non-lymphoid cell"*. Try hosting the image and
>>>>>>> providing the link here.
>>>>>>> 2. Give the same JSON output as you have sent but this time with
>>>>>>> *"echoParams=all"*. Also, post the exact Solr query url.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Wed, 6 Nov 2019 at 21:07, Erick Erickson <erickerick...@gmail.com 
>>>>>>> <mailto:erickerick...@gmail.com>> wrote:
>>>>>>> 
>>>>>>>> I don’t see the attachments, maybe I deleted old e-mails or some such. 
>>>>>>>> The
>>>>>>>> Apache server is fairly aggressive about stripping attachments though, 
>>>>>>>> so
>>>>>>>> it’s also possible they didn’t make it through.
>>>>>>>> 
>>>>>>>>> On Nov 6, 2019, at 9:28 AM, Guilherme Viteri <gvit...@ebi.ac.uk 
>>>>>>>>> <mailto:gvit...@ebi.ac.uk>> wrote:
>>>>>>>>> 
>>>>>>>>> Thanks Erick.
>>>>>>>>> 
>>>>>>>>>> First, your index and analysis chains are considerably different, 
>>>>>>>>>> this
>>>>>>>> can easily be a source of problems. In particular, using two different
>>>>>>>> tokenizers is a huge red flag. I _strongly_ recommend against this 
>>>>>>>> unless
>>>>>>>> you’re totally sure you understand the consequences. Additionally, 
>>>>>>>> your use
>>>>>>>> of the length filter is suspicious, especially since your problem 
>>>>>>>> statement
>>>>>>>> is about the addition of a single letter term and the min length 
>>>>>>>> allowed on
>>>>>>>> that filter is 2. That said, it’s reasonable to suppose that the ’a’ is
>>>>>>>> filtered out in both cases, but maybe you’ve found something odd about 
>>>>>>>> the
>>>>>>>> interactions.
>>>>>>>>> I will investigate the min length and post the results later.
>>>>>>>>> 
>>>>>>>>>> Second, I have no idea what this will do. Are the equal signs typos?
>>>>>>>> Used by custom code?
>>>>>>>>> This is the URL in my application, not Solr params. That's the query 
>>>>>>>>> string.
>>>>>>>>> 
>>>>>>>>>> What does “species=“ do? That’s not Solr syntax, so it’s likely that
>>>>>>>> all the params with an equal-sign are totally ignored unless it’s just 
>>>>>>>> a
>>>>>>>> typo.
>>>>>>>>> This is part of the application. Species will be used later on in Solr
>>>>>>>>> to filter the results. That's not Solr; those are my app params.
>>>>>>>>> 
>>>>>>>>>> Third, the easiest way to see what’s happening under the covers is to
>>>>>>>> add “&debug=true” to the query and look at the parsed query. Ignore 
>>>>>>>> all the
>>>>>>>> relevance calculations for the nonce, or specify “&debug=query” to skip
>>>>>>>> that part.
>>>>>>>>> The two JSON files I've sent were run with debugQuery=on, and the
>>>>>>>>> explain tag is present.
>>>>>>>>> I will try searching the way you mentioned.
>>>>>>>>> 
>>>>>>>>> Thanks for your input
>>>>>>>>> 
>>>>>>>>> Guilherme
>>>>>>>>> 
>>>>>>>>>> On 6 Nov 2019, at 14:14, Erick Erickson <erickerick...@gmail.com 
>>>>>>>>>> <mailto:erickerick...@gmail.com>>
>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> Fwd to another server
>>>>>>>>>> 
>>>>>>>>>> First, your index and analysis chains are considerably different, 
>>>>>>>>>> this
>>>>>>>> can easily be a source of problems. In particular, using two different
>>>>>>>> tokenizers is a huge red flag. I _strongly_ recommend against this 
>>>>>>>> unless
>>>>>>>> you’re totally sure you understand the consequences. Additionally, 
>>>>>>>> your use
>>>>>>>> of the length filter is suspicious, especially since your problem 
>>>>>>>> statement
>>>>>>>> is about the addition of a single letter term and the min length 
>>>>>>>> allowed on
>>>>>>>> that filter is 2. That said, it’s reasonable to suppose that the ’a’ is
>>>>>>>> filtered out in both cases, but maybe you’ve found something odd about 
>>>>>>>> the
>>>>>>>> interactions.
>>>>>>>>>> 
>>>>>>>>>> Second, I have no idea what this will do. Are the equal signs typos?
>>>>>>>> Used by custom code?
>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
>>>>>>>>>> 
>>>>>>>>>> What does “species=“ do? That’s not Solr syntax, so it’s likely that
>>>>>>>> all the params with an equal-sign are totally ignored unless it’s just 
>>>>>>>> a
>>>>>>>> typo.
>>>>>>>>>> 
>>>>>>>>>> Third, the easiest way to see what’s happening under the covers is to
>>>>>>>> add “&debug=true” to the query and look at the parsed query. Ignore 
>>>>>>>> all the
>>>>>>>> relevance calculations for the nonce, or specify “&debug=query” to skip
>>>>>>>> that part.
>>>>>>>>>> 
>>>>>>>>>> 90% + of the time, the question “why didn’t this query do what I
>>>>>>>> expect” is answered by looking at the “&debug=query” output and the
>>>>>>>> analysis page in the admin UI. NOTE: for the analysis page be sure to 
>>>>>>>> look
>>>>>>>> at _both_ the query and index output. Also, and very important about 
>>>>>>>> the
>>>>>>>> analysis page (and this is confusing) is that this _assumes_ that what 
>>>>>>>> you
>>>>>>>> put in the text boxes have made it through the query parser intact and 
>>>>>>>> is
>>>>>>>> analyzed by the field selected. Consider the search "q=field:word1 
>>>>>>>> word2".
>>>>>>>> Now you type “word1 word2” into the analysis text box and it looks like
>>>>>>>> what you expect. That’s misleading because the query is _parsed_ as
>>>>>>>> "field:word1 default_search_field:word2”. This is where “&debug=query”
>>>>>>>> helps.
>>>>>>>>>> 
>>>>>>>>>> Best,
>>>>>>>>>> Erick
>>>>>>>>>> 
>>>>>>>>>>> On Nov 6, 2019, at 2:36 AM, Paras Lehana 
>>>>>>>>>>> <paras.leh...@indiamart.com <mailto:paras.leh...@indiamart.com>>
>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Hi Walter,
>>>>>>>>>>> 
>>>>>>>>>>> The solr.StopFilter removes all tokens that are stopwords. Those 
>>>>>>>>>>> words
>>>>>>>> will
>>>>>>>>>>>> not be in the index, so they can never match a query.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> I think the OP's concern is different results when adding a 
>>>>>>>>>>> stopword. I
>>>>>>>>>>> think he's using the filter factory correctly - the query chain
>>>>>>>> includes
>>>>>>>>>>> the filter as well so it should remove "a" while querying.
>>>>>>>>>>> 
>>>>>>>>>>> *@Guilherme*, please post results for both the query, the document 
>>>>>>>>>>> in
>>>>>>>>>>> result you are concerned about and post full result of analysis 
>>>>>>>>>>> screen
>>>>>>>> (for
>>>>>>>>>>> both query and index).
>>>>>>>>>>> 
>>>>>>>>>>> On Tue, 5 Nov 2019 at 21:38, Walter Underwood 
>>>>>>>>>>> <wun...@wunderwood.org <mailto:wun...@wunderwood.org>>
>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> No.
>>>>>>>>>>>> 
>>>>>>>>>>>> The solr.StopFilter removes all tokens that are stopwords. Those 
>>>>>>>>>>>> words
>>>>>>>>>>>> will not be in the index, so they can never match a query.
>>>>>>>>>>>> 
>>>>>>>>>>>> 1. Remove the lines with solr.StopFilter from every analysis chain 
>>>>>>>>>>>> in
>>>>>>>>>>>> schema.xml.
>>>>>>>>>>>> 2. Reload the collection, restart Solr, or whatever to read the new
>>>>>>>> config.
>>>>>>>>>>>> 3. Reindex all of the documents.
>>>>>>>>>>>> 
>>>>>>>>>>>> When indexed with the new analysis chain, the stopwords will not be
>>>>>>>>>>>> removed and they will be searchable.
>>>>>>>>>>>> 
>>>>>>>>>>>> wunder
>>>>>>>>>>>> Walter Underwood
>>>>>>>>>>>> wun...@wunderwood.org <mailto:wun...@wunderwood.org>
>>>>>>>>>>>> http://observer.wunderwood.org/ <http://observer.wunderwood.org/>  
>>>>>>>>>>>> (my blog)
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Nov 5, 2019, at 8:56 AM, Guilherme Viteri <gvit...@ebi.ac.uk 
>>>>>>>>>>>>> <mailto:gvit...@ebi.ac.uk>>
>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Ok. I am kind of lost now.
>>>>>>>>>>>>> If I open up the console > analysis and perform it, that's the 
>>>>>>>>>>>>> final
>>>>>>>>>>>> result.
>>>>>>>>>>>>> <Screenshot 2019-11-05 at 14.54.16.png>
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Your suggestion is: get rid of the <filter stopword.txt> in the
>>>>>>>>>>>> schema.xml and during index phase replaceAll("in stopwords.txt"," 
>>>>>>>>>>>> ")
>>>>>>>> then
>>>>>>>>>>>> add to solr. Is that correct ?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks David
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On 5 Nov 2019, at 14:48, David Hastings <
>>>>>>>> hastings.recurs...@gmail.com <mailto:hastings.recurs...@gmail.com>
>>>>>>>>>>>> <mailto:hastings.recurs...@gmail.com 
>>>>>>>>>>>> <mailto:hastings.recurs...@gmail.com>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Fwd to another server
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> no,
>>>>>>>>>>>>>>      <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>>>>>>>>>>>> words="stopwords.txt"/>
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> is still using stopwords and should be removed, in my opinion of
>>>>>>>> course,
>>>>>>>>>>>>>> based on your use case may be different, but i generally axe any
>>>>>>>>>>>> reference
>>>>>>>>>>>>>> to them at all
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Tue, Nov 5, 2019 at 9:47 AM Guilherme Viteri 
>>>>>>>>>>>>>> <gvit...@ebi.ac.uk <mailto:gvit...@ebi.ac.uk>
>>>>>>>>>>>> <mailto:gvit...@ebi.ac.uk <mailto:gvit...@ebi.ac.uk>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>> Haven't I done this here ?
>>>>>>>>>>>>>>> <fieldType name="text_field" class="solr.TextField"
>>>>>>>>>>>>>>> positionIncrementGap="100" omitNorms="false" >
>>>>>>>>>>>>>>>  <analyzer type="index">
>>>>>>>>>>>>>>>      <tokenizer class="solr.StandardTokenizerFactory"/>
>>>>>>>>>>>>>>>      <filter class="solr.ClassicFilterFactory"/>
>>>>>>>>>>>>>>>      <filter class="solr.LengthFilterFactory" min="2"
>>>>>>>>>>>> max="20"/>
>>>>>>>>>>>>>>>      <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>>>>>>>>>>      <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>>>>>>>>>>>>> words="stopwords.txt"/>
>>>>>>>>>>>>>>>  </analyzer>
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On 5 Nov 2019, at 14:15, David Hastings <
>>>>>>>> hastings.recurs...@gmail.com <mailto:hastings.recurs...@gmail.com>
>>>>>>>>>>>> <mailto:hastings.recurs...@gmail.com 
>>>>>>>>>>>> <mailto:hastings.recurs...@gmail.com>>>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Fwd to another server
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> The first thing you should do is remove any reference to stop
>>>>>>>> words
>>>>>>>>>>>> and
>>>>>>>>>>>>>>>> never use them, then re-index your data and try it again.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Tue, Nov 5, 2019 at 9:14 AM Guilherme Viteri <
>>>>>>>> gvit...@ebi.ac.uk <mailto:gvit...@ebi.ac.uk>
>>>>>>>>>>>> <mailto:gvit...@ebi.ac.uk <mailto:gvit...@ebi.ac.uk>>>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I am performing a search to match a name (text_field), however
>>>>>>>> this
>>>>>>>>>>>> term
>>>>>>>>>>>>>>>>> contains 'and' and 'a' and it doesn't return any records. If i
>>>>>>>> remove
>>>>>>>>>>>>>>> 'a'
>>>>>>>>>>>>>>>>> then it works.
>>>>>>>>>>>>>>>>> e.g
>>>>>>>>>>>>>>>>> Search Term: lymphoid and a non-lymphoid cell
>>>>>>>>>>>>>>>>> doesn't work:
>>>>>>>>>>>>>>>>> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Search term: lymphoid and non-lymphoid cell
>>>>>>>>>>>>>>>>> works:
>>>>>>>>>>>>>>>>> https://dev.reactome.org/content/query?q=lymphoid+and+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> interested in the first result
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> schema.xml
>>>>>>>>>>>>>>>>> <field name="name"                          type="text_field"
>>>>>>>>>>>>>>>>> indexed="true"  stored="true"   omitNorms="false"
>>>>>>>> required="true"
>>>>>>>>>>>>>>>>> multiValued="false"/>
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>  <analyzer type="query">
>>>>>>>>>>>>>>>>>      <tokenizer class="solr.PatternTokenizerFactory"
>>>>>>>>>>>>>>>>> pattern="[^a-zA-Z0-9/._:]"/>
>>>>>>>>>>>>>>>>>      <filter class="solr.PatternReplaceFilterFactory"
>>>>>>>>>>>>>>>>> pattern="^[/._:]+" replacement=""/>
>>>>>>>>>>>>>>>>>      <filter class="solr.PatternReplaceFilterFactory"
>>>>>>>>>>>>>>>>> pattern="[/._:]+$" replacement=""/>
>>>>>>>>>>>>>>>>>      <filter class="solr.PatternReplaceFilterFactory"
>>>>>>>>>>>>>>>>> pattern="[_]" replacement=" "/>
>>>>>>>>>>>>>>>>>      <filter class="solr.LengthFilterFactory" min="2"
>>>>>>>>>>>>>>> max="20"/>
>>>>>>>>>>>>>>>>>      <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>>>>>>>>>>>>      <filter class="solr.StopFilterFactory"
>>>>>>>>>>>> ignoreCase="true"
>>>>>>>>>>>>>>>>> words="stopwords.txt"/>
>>>>>>>>>>>>>>>>>  </analyzer>
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> <fieldType name="text_field" class="solr.TextField"
>>>>>>>>>>>>>>>>> positionIncrementGap="100" omitNorms="false" >
>>>>>>>>>>>>>>>>>  <analyzer type="index">
>>>>>>>>>>>>>>>>>      <tokenizer class="solr.StandardTokenizerFactory"/>
>>>>>>>>>>>>>>>>>      <filter class="solr.ClassicFilterFactory"/>
>>>>>>>>>>>>>>>>>      <filter class="solr.LengthFilterFactory" min="2"
>>>>>>>>>>>>>>> max="20"/>
>>>>>>>>>>>>>>>>>      <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>>>>>>>>>>>>      <filter class="solr.StopFilterFactory"
>>>>>>>>>>>> ignoreCase="true"
>>>>>>>>>>>>>>>>> words="stopwords.txt"/>
>>>>>>>>>>>>>>>>>  </analyzer>
>>>>>>>>>>>>>>>>>  <analyzer type="query">
>>>>>>>>>>>>>>>>>      <tokenizer class="solr.PatternTokenizerFactory"
>>>>>>>>>>>>>>>>> pattern="[^a-zA-Z0-9/._:]"/>
>>>>>>>>>>>>>>>>>      <filter class="solr.PatternReplaceFilterFactory"
>>>>>>>>>>>>>>>>> pattern="^[/._:]+" replacement=""/>
>>>>>>>>>>>>>>>>>      <filter class="solr.PatternReplaceFilterFactory"
>>>>>>>>>>>>>>>>> pattern="[/._:]+$" replacement=""/>
>>>>>>>>>>>>>>>>>      <filter class="solr.PatternReplaceFilterFactory"
>>>>>>>>>>>>>>>>> pattern="[_]" replacement=" "/>
>>>>>>>>>>>>>>>>>      <filter class="solr.LengthFilterFactory" min="2"
>>>>>>>>>>>>>>> max="20"/>
>>>>>>>>>>>>>>>>>      <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>>>>>>>>>>>>      <filter class="solr.StopFilterFactory"
>>>>>>>>>>>> ignoreCase="true"
>>>>>>>>>>>>>>>>> words="stopwords.txt"/>
>>>>>>>>>>>>>>>>>  </analyzer>
>>>>>>>>>>>>>>>>> </fieldType>
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> stopwords.txt
>>>>>>>>>>>>>>>>> #Standard english stop words taken from Lucene's StopAnalyzer
>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>> b
>>>>>>>>>>>>>>>>> c
>>>>>>>>>>>>>>>>> ....
>>>>>>>>>>>>>>>>> an
>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Running SolR 6.6.2.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Is there anything I could do to prevent this ?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>> Guilherme
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> --
>>>>>>>>>>> --
>>>>>>>>>>> Regards,
>>>>>>>>>>> 
>>>>>>>>>>> *Paras Lehana* [65871]
>>>>>>>>>>> Development Engineer, Auto-Suggest,
>>>>>>>>>>> IndiaMART Intermesh Ltd.
>>>>>>>>>>> 
>>>>>>>>>>> 8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
>>>>>>>>>>> Noida, UP, IN - 201303
>>>>>>>>>>> 
>>>>>>>>>>> Mob.: +91-9560911996
>>>>>>>>>>> Work: 01203916600 | Extn:  *8173*
>>>>>>>>>>> 
>>>>>>>>>>> --
>>>>>>>>>>> IMPORTANT:
>>>>>>>>>>> NEVER share your IndiaMART OTP/ Password with anyone.
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>>> 
>
