I always enable phrase searching in edismax for exactly this reason. Something like:

<str name="qf">title^8 keywords^4 text</str>
<str name="pf">title^16 keywords^8 text^2</str>

To deal with concepts in queries, a classifier and/or named entity extractor can be helpful. If you have a list of concepts (a “controlled vocabulary”) that includes “Lamin A”, and that concept shows up in a query, the term can be queried against the field matching that vocabulary. This is how LinkedIn separates people, companies, and places, for example.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Nov 8, 2019, at 10:48 AM, Erick Erickson <erickerick...@gmail.com> wrote:
>
> Look at the “mm” parameter, try setting it to 100%. Although that’s not
> entirely likely to do what you want either since virtually every doc will
> have “a” in it. But at least you’d get docs that have both terms.
>
> You may also be able to search for things like “Lamin A” _only as a phrase_
> and have some luck. But this is a gnarly problem in general. Some people have
> been able to substitute synonyms and/or shingles to make this work at the
> expense of a larger index.
>
> This is a generic problem with context. “Lamin A” is really a “concept”, not
> just two words that happen to be near each other. Searching as a phrase is an
> OOB-but-naive way to try to make it more likely that the ranked results refer
> to the _concept_ of “Lamin A”. The assumption here is “if these two words
> appear next to each other, they’re more likely to be what I want”. I say
> “naive” because “Lamins: A new approach to...” would _also_ be found for a
> naive phrase search. (I have no idea whether such a title makes sense or not,
> but you figured that out already)...
>
> To do this well you’d have to dive into NLP/machine learning.
>
> I truly wish we could have the DWIM search algorithm (Do What I Mean)…
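
For anyone who wants to try the qf/pf weights and the mm setting without editing solrconfig.xml first, here is a minimal sketch that passes them as request parameters. It only builds the URL and sends nothing. The host, the collection name "mycollection", and the helper name edismax_url are assumptions made up for illustration; the field names are just the ones from the qf/pf example in this thread.

```python
from urllib.parse import urlencode

# Hypothetical base URL: adjust host and collection for your own setup.
SOLR_BASE = "http://localhost:8983/solr/mycollection"


def edismax_url(user_query, handler="/select", mm=None):
    """Build a Solr request URL that uses edismax with qf/pf boosts."""
    params = {
        "defType": "edismax",
        "q": user_query,
        # Per-field weights for individual term matches.
        "qf": "title^8 keywords^4 text",
        # Same fields with doubled weights, so phrase matches rank higher.
        "pf": "title^16 keywords^8 text^2",
        # Ask Solr to include only the parsed query in the debug section.
        "debug": "query",
    }
    if mm is not None:
        # Minimum-should-match, e.g. "100%" to require every query term.
        params["mm"] = mm
    return SOLR_BASE + handler + "?" + urlencode(params)


print(edismax_url("lymphoid and a non-lymphoid cell", mm="100%"))
```

Passing the parameters per request makes it cheap to compare a few weightings side by side before committing one to the handler defaults.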
>
>> On Nov 8, 2019, at 11:29 AM, Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
>>
>> Hi Walter and Paras
>>
>> I indexed it removing all the references to StopWordFilter and I went from
>> 121 results to nearly 20K, as the search term q="Lymphoid and a non-Lymphoid
>> cell" is matching entities such as "IFT A" or "Lamin A". So I don't think
>> removing it completely is the way to go for the scenario we have, but I
>> appreciate the suggestion…
>>
>> Yes, the response is using fl=*
>> I am trying some combinations at the moment, but no success yet.
>>
>> defType=edismax
>> q.alt=Lymphoid and a non-Lymphoid cell
>> Number of results=1599
>> Quite a considerable increase, even though the results are reasonably
>> meaningful.
>>
>> I am sorry but I didn't understand what you want me to do exactly with
>> the lst (??) and qf and bf.
>>
>> Thanks everyone for their inputs
>>
>>
>>> On 8 Nov 2019, at 06:45, Paras Lehana <paras.leh...@indiamart.com> wrote:
>>>
>>> Hi Guilherme
>>>
>>> By accident, I ended up querying using the default handler (/select)
>>> and it worked.
>>>
>>> You've just found the culprit. Thanks for giving the material I requested.
>>> Your analysis chain is working as expected. I don't see any issue in either
>>> StopWordFilter or your boosts. I also use a boost of 50 when boosting
>>> contextual suggestions (boosting "gold iphone" on a page of iphone) but I
>>> take Walter's suggestion and would try to optimize my weights. I agree that
>>> this 50 thing was not researched much by us either (we never faced
>>> performance or relevance issues).
>>>
>>> See the major difference between the two handlers: edismax. I'm pretty sure
>>> that your problem lies in the parsing of queries (you can confirm that from
>>> the parsedquery key in the debug section of both JSON responses). I hope you
>>> have provided the response with fl=*. Replace q with q.alt in your /search
>>> handler query and I think you should start getting responses. That's because
>>> q.alt uses the standard parser. If you want to keep using edismax, I suggest
>>> you test the responses after removing some combination of the lst params
>>> (qf, bf) and find what's keeping the documents from coming up. I'm out of
>>> office today - would have certainly tried analyzing the field values of the
>>> document in the /select request and comparing them with qf/bq in the
>>> solrconfig.xml /search handler. Do this for me and you'd certainly find
>>> something.
>>>
>>> On Thu, 7 Nov 2019 at 21:00, Walter Underwood <wun...@wunderwood.org> wrote:
>>> I normally use a weight of 8 for the most important field, like title.
>>> Other fields might get a 4 or 2.
>>>
>>> I add a “pf” field with the weights doubled, so that phrase matches have a
>>> higher weight.
>>>
>>> The weight of 8 comes from experience at Infoseek and Inktomi, two early
>>> web search engines. With different relevance algorithms and totally
>>> different evaluation and tuning systems, they settled on weights of 8 and
>>> 7.5 for HTML titles. With the two radically different systems getting
>>> the same number, I decided that was a property of the documents, not of the
>>> search engines.
>>>
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/ (my blog)
>>>
>>>> On Nov 7, 2019, at 9:03 AM, Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
>>>>
>>>> Hi Wunder,
>>>>
>>>> My indexer takes quite a few hours to run. I am shortening it to
>>>> run faster, but I also need to make sure it gives what we are expecting.
>>>> This implementation's been there for >4y, and massively used.
>>>>
>>>>> In your edismax handlers, weights of 20, 50, and 100 are extremely high.
>>>>> I don’t think I’ve ever used a weight higher than 16 in a dozen years of
>>>>> configuring Solr.
>>>> I've inherited that implementation and I am really keen to adjust it.
>>>> What would you recommend?
>>>>
>>>> Cheers
>>>> Guilherme
>>>>
>>>>> On 7 Nov 2019, at 14:43, Walter Underwood <wun...@wunderwood.org> wrote:
>>>>>
>>>>> Thanks for posting the files. Looking at schema.xml, I see that you still
>>>>> are using StopFilterFactory. The first advice we gave you was to remove
>>>>> that.
>>>>>
>>>>> Remove StopFilterFactory everywhere and reindex.
>>>>>
>>>>> You will continue to have problems matching stopwords until you do that.
>>>>>
>>>>> In your edismax handlers, weights of 20, 50, and 100 are extremely high.
>>>>> I don’t think I’ve ever used a weight higher than 16 in a dozen years of
>>>>> configuring Solr.
>>>>>
>>>>> wunder
>>>>> Walter Underwood
>>>>> wun...@wunderwood.org
>>>>> http://observer.wunderwood.org/ (my blog)
>>>>>
>>>>>> On Nov 7, 2019, at 6:56 AM, Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
>>>>>>
>>>>>> Hi Paras, everyone
>>>>>>
>>>>>> Thank you again for your inputs and suggestions. I am sorry to hear you
>>>>>> had trouble with the attachments. I will host them somewhere and share
>>>>>> the links.
>>>>>> I don't tweak my index. I get the data from the graph database, create
>>>>>> the documents as they are and save them to Solr.
>>>>>>
>>>>>> So, I am sending the new analysis screen querying the way you suggested.
>>>>>> Also the results with params and the Solr query URL.
>>>>>>
>>>>>> During the process of querying what you asked I found something really
>>>>>> weird (at least for me). By accident, I ended up querying using the
>>>>>> default handler (/select) and it worked. Then if I use the one I must
>>>>>> use, it sadly doesn't work. I am posting both results and I will also
>>>>>> post the handlers.
>>>>>>
>>>>>> Here is the link with all the files mentioned before
>>>>>> https://www.dropbox.com/sh/fymfm1q94zum1lx/AADwU1c9EUf2A4d7FtzSKR54a?dl=0
>>>>>> If the link doesn't work: www dot dropbox dot com slash sh slash
>>>>>> fymfm1q94zum1lx/AADwU1c9EUf2A4d7FtzSKR54a ? dl equals 0
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>> On 7 Nov 2019, at 05:23, Paras Lehana <paras.leh...@indiamart.com> wrote:
>>>>>>>
>>>>>>> Hi Guilherme.
>>>>>>>
>>>>>>> I am sending the analysis result and the json result as requested.
>>>>>>>
>>>>>>>
>>>>>>> Thanks for the effort. Luckily, I can see your attachments (low quality
>>>>>>> though).
>>>>>>>
>>>>>>> From the analysis screen, the analysis is working as expected. One
>>>>>>> reason I can initially think of for query="lymphoid and *a*
>>>>>>> non-lymphoid cell" not matching a document containing "Lymphoid and a
>>>>>>> non-Lymphoid cell" is: the stopword "a" is probably present
>>>>>>> post-analysis in either the query or the index. Did you tweak your
>>>>>>> index-time analysis after indexing?
>>>>>>>
>>>>>>> Do two things:
>>>>>>>
>>>>>>> 1. Post the analysis screen for index=*"Immunoregulatory
>>>>>>> interactions between a Lymphoid and a non-Lymphoid cell"* and
>>>>>>> query=*"lymphoid and a non-lymphoid cell"*. Try hosting the image and
>>>>>>> providing the link here.
>>>>>>> 2. Give the same JSON output as you have sent but this time with
>>>>>>> *"echoParams=all"*. Also, post the exact Solr query URL.
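
The /select versus /search comparison asked for above could be sketched roughly as follows: build the same query against both handlers with echoParams=all and debug=query, then compare the effective parameters and the parsed query from each JSON response. This is only a sketch under assumptions. The base URL, the collection name "mycollection", and the helper names are hypothetical; the handler names are the two mentioned in this thread; fetching and JSON-decoding the responses is left to whatever HTTP client you already use.

```python
from urllib.parse import urlencode

# Hypothetical base URL: adjust host and collection for your own setup.
SOLR_BASE = "http://localhost:8983/solr/mycollection"


def debug_url(handler, user_query):
    """URL for one handler, echoing all params and returning the parsed query."""
    params = {
        "q": user_query,
        "fl": "*",
        "echoParams": "all",  # include handler defaults in responseHeader.params
        "debug": "query",     # include parsedquery without the score explanations
        "wt": "json",
    }
    return f"{SOLR_BASE}{handler}?{urlencode(params)}"


def summarize(response):
    """Extract the parts of a decoded Solr JSON response worth comparing."""
    return {
        "params": response.get("responseHeader", {}).get("params", {}),
        "parsedquery": response.get("debug", {}).get("parsedquery"),
        "numFound": response.get("response", {}).get("numFound"),
    }


def differing_params(a, b):
    """Return {name: (value_in_a, value_in_b)} for params that differ."""
    keys = set(a) | set(b)
    return {k: (a.get(k), b.get(k)) for k in sorted(keys) if a.get(k) != b.get(k)}


for handler in ("/select", "/search"):
    print(debug_url(handler, "lymphoid and a non-lymphoid cell"))
```

Running summarize() on both responses and then differing_params() on the two "params" dicts should surface which handler defaults (defType, qf, bf, mm and so on) explain a difference in parsedquery.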
>>>>>>>
>>>>>>> On Wed, 6 Nov 2019 at 21:07, Erick Erickson <erickerick...@gmail.com> wrote:
>>>>>>>
>>>>>>>> I don’t see the attachments, maybe I deleted old e-mails or some such.
>>>>>>>> The Apache server is fairly aggressive about stripping attachments
>>>>>>>> though, so it’s also possible they didn’t make it through.
>>>>>>>>
>>>>>>>>> On Nov 6, 2019, at 9:28 AM, Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
>>>>>>>>>
>>>>>>>>> Thanks Erick.
>>>>>>>>>
>>>>>>>>>> First, your index and query analysis chains are considerably
>>>>>>>>>> different; this can easily be a source of problems. In particular,
>>>>>>>>>> using two different tokenizers is a huge red flag. I _strongly_
>>>>>>>>>> recommend against this unless you’re totally sure you understand the
>>>>>>>>>> consequences. Additionally, your use of the length filter is
>>>>>>>>>> suspicious, especially since your problem statement is about the
>>>>>>>>>> addition of a single-letter term and the min length allowed on that
>>>>>>>>>> filter is 2. That said, it’s reasonable to suppose that the ’a’ is
>>>>>>>>>> filtered out in both cases, but maybe you’ve found something odd
>>>>>>>>>> about the interactions.
>>>>>>>>>
>>>>>>>>> I will investigate the min length and post the results later.
>>>>>>>>>
>>>>>>>>>> Second, I have no idea what this will do. Are the equal signs typos?
>>>>>>>>>> Used by custom code?
>>>>>>>>>
>>>>>>>>> This is the URL in my application, not Solr params. That's the query
>>>>>>>>> string.
>>>>>>>>>
>>>>>>>>>> What does “species=“ do? That’s not Solr syntax, so it’s likely that
>>>>>>>>>> all the params with an equal-sign are totally ignored unless it’s
>>>>>>>>>> just a typo.
>>>>>>>>>
>>>>>>>>> This is part of the application. Species will be used later on in Solr
>>>>>>>>> to filter the results. That's not Solr. Those are my app params.
>>>>>>>>>
>>>>>>>>>> Third, the easiest way to see what’s happening under the covers is to
>>>>>>>>>> add “&debug=true” to the query and look at the parsed query. Ignore
>>>>>>>>>> all the relevance calculations for the nonce, or specify
>>>>>>>>>> “&debug=query” to skip that part.
>>>>>>>>>
>>>>>>>>> The two JSON files I've sent have debugQuery=on and the explain tag
>>>>>>>>> is present.
>>>>>>>>> I will try searching the way you mentioned.
>>>>>>>>>
>>>>>>>>> Thanks for your inputs
>>>>>>>>>
>>>>>>>>> Guilherme
>>>>>>>>>
>>>>>>>>>> On 6 Nov 2019, at 14:14, Erick Erickson <erickerick...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Fwd to another server
>>>>>>>>>>
>>>>>>>>>> First, your index and query analysis chains are considerably
>>>>>>>>>> different; this can easily be a source of problems. In particular,
>>>>>>>>>> using two different tokenizers is a huge red flag. I _strongly_
>>>>>>>>>> recommend against this unless you’re totally sure you understand the
>>>>>>>>>> consequences. Additionally, your use of the length filter is
>>>>>>>>>> suspicious, especially since your problem statement is about the
>>>>>>>>>> addition of a single-letter term and the min length allowed on that
>>>>>>>>>> filter is 2. That said, it’s reasonable to suppose that the ’a’ is
>>>>>>>>>> filtered out in both cases, but maybe you’ve found something odd
>>>>>>>>>> about the interactions.
>>>>>>>>>>
>>>>>>>>>> Second, I have no idea what this will do. Are the equal signs typos?
>>>>>>>>>> Used by custom code?
>>>>>>>>>>
>>>>>>>>>>>> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
>>>>>>>>>>
>>>>>>>>>> What does “species=“ do? That’s not Solr syntax, so it’s likely that
>>>>>>>>>> all the params with an equal-sign are totally ignored unless it’s
>>>>>>>>>> just a typo.
>>>>>>>>>>
>>>>>>>>>> Third, the easiest way to see what’s happening under the covers is to
>>>>>>>>>> add “&debug=true” to the query and look at the parsed query. Ignore
>>>>>>>>>> all the relevance calculations for the nonce, or specify
>>>>>>>>>> “&debug=query” to skip that part.
>>>>>>>>>>
>>>>>>>>>> 90%+ of the time, the question “why didn’t this query do what I
>>>>>>>>>> expect” is answered by looking at the “&debug=query” output and the
>>>>>>>>>> analysis page in the admin UI. NOTE: for the analysis page be sure to
>>>>>>>>>> look at _both_ the query and index output. Also, and very important
>>>>>>>>>> about the analysis page (and this is confusing) is that this
>>>>>>>>>> _assumes_ that what you put in the text boxes has made it through the
>>>>>>>>>> query parser intact and is analyzed by the field selected. Consider
>>>>>>>>>> the search "q=field:word1 word2". Now you type “word1 word2” into the
>>>>>>>>>> analysis text box and it looks like what you expect. That’s
>>>>>>>>>> misleading because the query is _parsed_ as
>>>>>>>>>> "field:word1 default_search_field:word2". This is where
>>>>>>>>>> “&debug=query” helps.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Erick
>>>>>>>>>>
>>>>>>>>>>> On Nov 6, 2019, at 2:36 AM, Paras Lehana <paras.leh...@indiamart.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Walter,
>>>>>>>>>>>
>>>>>>>>>>>> The solr.StopFilter removes all tokens that are stopwords. Those
>>>>>>>>>>>> words will not be in the index, so they can never match a query.
>>>>>>>>>>>
>>>>>>>>>>> I think the OP's concern is different results when adding a
>>>>>>>>>>> stopword. I think he's using the filter factory correctly - the
>>>>>>>>>>> query chain includes the filter as well so it should remove "a"
>>>>>>>>>>> while querying.
>>>>>>>>>>>
>>>>>>>>>>> *@Guilherme*, please post the results for both queries, the document
>>>>>>>>>>> in the results you are concerned about, and the full output of the
>>>>>>>>>>> analysis screen (for both query and index).
>>>>>>>>>>>
>>>>>>>>>>> On Tue, 5 Nov 2019 at 21:38, Walter Underwood <wun...@wunderwood.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> No.
>>>>>>>>>>>>
>>>>>>>>>>>> The solr.StopFilter removes all tokens that are stopwords. Those
>>>>>>>>>>>> words will not be in the index, so they can never match a query.
>>>>>>>>>>>>
>>>>>>>>>>>> 1. Remove the lines with solr.StopFilter from every analysis chain
>>>>>>>>>>>> in schema.xml.
>>>>>>>>>>>> 2. Reload the collection, restart Solr, or whatever to read the new
>>>>>>>>>>>> config.
>>>>>>>>>>>> 3. Reindex all of the documents.
>>>>>>>>>>>>
>>>>>>>>>>>> When indexed with the new analysis chain, the stopwords will not be
>>>>>>>>>>>> removed and they will be searchable.
>>>>>>>>>>>>
>>>>>>>>>>>> wunder
>>>>>>>>>>>> Walter Underwood
>>>>>>>>>>>> wun...@wunderwood.org
>>>>>>>>>>>> http://observer.wunderwood.org/ (my blog)
>>>>>>>>>>>>
>>>>>>>>>>>>> On Nov 5, 2019, at 8:56 AM, Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ok. I am kind of lost now.
>>>>>>>>>>>>> If I open up the console > analysis and perform it, that's the
>>>>>>>>>>>>> final result.
>>>>>>>>>>>>> <Screenshot 2019-11-05 at 14.54.16.png>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Your suggestion is: get rid of the <filter stopword.txt> in the
>>>>>>>>>>>>> schema.xml and during index phase replaceAll("in stopwords.txt"," ")
>>>>>>>>>>>>> then add to solr. Is that correct ?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks David
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 5 Nov 2019, at 14:48, David Hastings <hastings.recurs...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Fwd to another server
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> no,
>>>>>>>>>>>>>> <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>>>>>>>>>>>> words="stopwords.txt"/>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> is still using stopwords and should be removed, in my opinion of
>>>>>>>>>>>>>> course, based on your use case may be different, but i generally
>>>>>>>>>>>>>> axe any reference to them at all
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Nov 5, 2019 at 9:47 AM Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>> Haven't I done this here ?
>>>>>>>>>>>>>>> <fieldType name="text_field" class="solr.TextField"
>>>>>>>>>>>>>>>            positionIncrementGap="100" omitNorms="false" >
>>>>>>>>>>>>>>>   <analyzer type="index">
>>>>>>>>>>>>>>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>>>>>>>>>>>>>>     <filter class="solr.ClassicFilterFactory"/>
>>>>>>>>>>>>>>>     <filter class="solr.LengthFilterFactory" min="2" max="20"/>
>>>>>>>>>>>>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>>>>>>>>>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>>>>>>>>>>>>>             words="stopwords.txt"/>
>>>>>>>>>>>>>>>   </analyzer>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 5 Nov 2019, at 14:15, David Hastings <hastings.recurs...@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Fwd to another server
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The first thing you should do is remove any reference to stop
>>>>>>>>>>>>>>>> words and never use them, then re-index your data and try it
>>>>>>>>>>>>>>>> again.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Nov 5, 2019 at 9:14 AM Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I am performing a search to match a name (text_field), however
>>>>>>>>>>>>>>>>> this term contains 'and' and 'a' and it doesn't return any
>>>>>>>>>>>>>>>>> records. If i remove 'a' then it works.
>>>>>>>>>>>>>>>>> e.g
>>>>>>>>>>>>>>>>> Search Term: lymphoid and a non-lymphoid cell
>>>>>>>>>>>>>>>>> doesn't work:
>>>>>>>>>>>>>>>>> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Search term: lymphoid and non-lymphoid cell
>>>>>>>>>>>>>>>>> works:
>>>>>>>>>>>>>>>>> https://dev.reactome.org/content/query?q=lymphoid+and+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
>>>>>>>>>>>>>>>>> interested in the first result
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> schema.xml
>>>>>>>>>>>>>>>>> <field name="name" type="text_field"
>>>>>>>>>>>>>>>>>        indexed="true" stored="true" omitNorms="false" required="true"
>>>>>>>>>>>>>>>>>        multiValued="false"/>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> <analyzer type="query">
>>>>>>>>>>>>>>>>>   <tokenizer class="solr.PatternTokenizerFactory"
>>>>>>>>>>>>>>>>>              pattern="[^a-zA-Z0-9/._:]"/>
>>>>>>>>>>>>>>>>>   <filter class="solr.PatternReplaceFilterFactory"
>>>>>>>>>>>>>>>>>           pattern="^[/._:]+" replacement=""/>
>>>>>>>>>>>>>>>>>   <filter class="solr.PatternReplaceFilterFactory"
>>>>>>>>>>>>>>>>>           pattern="[/._:]+$" replacement=""/>
>>>>>>>>>>>>>>>>>   <filter class="solr.PatternReplaceFilterFactory"
>>>>>>>>>>>>>>>>>           pattern="[_]" replacement=" "/>
>>>>>>>>>>>>>>>>>   <filter class="solr.LengthFilterFactory" min="2" max="20"/>
>>>>>>>>>>>>>>>>>   <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>>>>>>>>>>>>   <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>>>>>>>>>>>>>>>           words="stopwords.txt"/>
>>>>>>>>>>>>>>>>> </analyzer>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> <fieldType name="text_field" class="solr.TextField"
>>>>>>>>>>>>>>>>>            positionIncrementGap="100" omitNorms="false" >
>>>>>>>>>>>>>>>>>   <analyzer type="index">
>>>>>>>>>>>>>>>>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>>>>>>>>>>>>>>>>     <filter class="solr.ClassicFilterFactory"/>
>>>>>>>>>>>>>>>>>     <filter class="solr.LengthFilterFactory" min="2" max="20"/>
>>>>>>>>>>>>>>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>>>>>>>>>>>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>>>>>>>>>>>>>>>             words="stopwords.txt"/>
>>>>>>>>>>>>>>>>>   </analyzer>
>>>>>>>>>>>>>>>>>   <analyzer type="query">
>>>>>>>>>>>>>>>>>     <tokenizer class="solr.PatternTokenizerFactory"
>>>>>>>>>>>>>>>>>                pattern="[^a-zA-Z0-9/._:]"/>
>>>>>>>>>>>>>>>>>     <filter class="solr.PatternReplaceFilterFactory"
>>>>>>>>>>>>>>>>>             pattern="^[/._:]+" replacement=""/>
>>>>>>>>>>>>>>>>>     <filter class="solr.PatternReplaceFilterFactory"
>>>>>>>>>>>>>>>>>             pattern="[/._:]+$" replacement=""/>
>>>>>>>>>>>>>>>>>     <filter class="solr.PatternReplaceFilterFactory"
>>>>>>>>>>>>>>>>>             pattern="[_]" replacement=" "/>
>>>>>>>>>>>>>>>>>     <filter class="solr.LengthFilterFactory" min="2" max="20"/>
>>>>>>>>>>>>>>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>>>>>>>>>>>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>>>>>>>>>>>>>>>             words="stopwords.txt"/>
>>>>>>>>>>>>>>>>>   </analyzer>
>>>>>>>>>>>>>>>>> </fieldType>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> stopwords.txt
>>>>>>>>>>>>>>>>> #Standard english stop words taken from Lucene's StopAnalyzer
>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>> b
>>>>>>>>>>>>>>>>> c
>>>>>>>>>>>>>>>>> ....
>>>>>>>>>>>>>>>>> an
>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Running SolR 6.6.2.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Is there anything I could do to prevent this ?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>> Guilherme
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Regards,
>>>>>>>>>>>
>>>>>>>>>>> *Paras Lehana* [65871]
>>>>>>>>>>> Development Engineer, Auto-Suggest,
>>>>>>>>>>> IndiaMART Intermesh Ltd.
>>>>>>>>>>>
>>>>>>>>>>> 8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
>>>>>>>>>>> Noida, UP, IN - 201303
>>>>>>>>>>>
>>>>>>>>>>> Mob.: +91-9560911996
>>>>>>>>>>> Work: 01203916600 | Extn: *8173*
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> IMPORTANT:
>>>>>>>>>>> NEVER share your IndiaMART OTP/ Password with anyone.
>>>>>>>
>>>>>>> --
>>>>>>> Regards,
>>>>>>>
>>>>>>> *Paras Lehana* [65871]
>>>>>>> Development Engineer, Auto-Suggest,
>>>>>>> IndiaMART Intermesh Ltd.