Hi Guilherme,

Have you tried reindexing the documents and comparing the results? No problem if you cannot do that; let's try something else.

I went through the whole thread and your files. You had said:

> As soon as I add dbId or stId (regardless the boost, 1.0 or 100.0), then I
> don't get anything (which make sense).

Why did you think that getting nothing after adding dbId made sense? I'm asking because I may be missing something here. Also, what is the purpose of so many qfs?

Going through your documents and config files, I found that your dbIds are strings of digits, and I don't think you want to look for your query terms in dbId, right? Do you want to boost the score by the values in dbId? Your qf of dbId^100 boosts documents that contain the terms of q in dbId by 100x. Since your terms don't match the dbId value of any document, the score this field produces is 0, and 100x or 1x of 0 is still 0. I still need to check how these per-field scores are combined by the edismax parser, but do re-evaluate the usage of these qfs. The same goes for the other qf boosts. :)

On Fri, 15 Nov 2019 at 12:23, Guilherme Viteri <gvit...@ebi.ac.uk> wrote:

> Hi Paras
>
> No worries.
> No, I didn't find anything. This is annoying now...
> Yes! They do contain dbId. Absolutely all my docs contain dbId, and it is
> actually my key, if you check the schema.xml again.
>
> Cheers
> Guilherme
>
> On 15 Nov 2019, at 05:37, Paras Lehana <paras.leh...@indiamart.com> wrote:
>
> > Hey Guilherme,
> >
> > I was a bit busy for the past few days and couldn't read your mail. So,
> > did you find anything? Anyway, as I had expected, the culprit is
> > definitely among the qfs. Do the documents concerned contain dbId? I
> > suggest you cross-check the fields in your document against those
> > affecting the result in qf.
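On how per-field scores are combined, here is a minimal sketch. It assumes the DisjunctionMaxQuery rule that (e)dismax applies to a query term across the qf fields: the best boosted field score, plus tie times the remaining ones. The function name and the raw scores are invented for illustration; real scores come from Lucene's similarity.

```python
def dismax_score(raw_scores, boosts, tie=0.0):
    """Combine per-field scores for one query term, dismax style.

    raw_scores: field -> unboosted score (a missing field means "no match").
    boosts:     field -> qf boost, e.g. {"name": 1.0, "dbId": 100.0}.
    tie:        tie-breaker weight given to the non-best fields.
    """
    boosted = [raw_scores.get(field, 0.0) * boost for field, boost in boosts.items()]
    if not boosted:
        return 0.0
    best = max(boosted)
    return best + tie * (sum(boosted) - best)


if __name__ == "__main__":
    # The term matches in "name" only; dbId has a large boost but no match.
    print(dismax_score({"name": 2.0}, {"name": 1.0, "dbId": 100.0}))
```

Under this rule a non-matching dbId contributes nothing but does not zero the total, so a drop to 0 results is more likely a matching effect (for example mm) than a scoring one. The parsedquery from debug=query should show which.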
> > On Tue, 12 Nov 2019 at 16:14, Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
> >
>> What I can't understand is:
>> if I search for the exact term "Immunoregulatory interactions between a
>> Lymphoid *and a* non-Lymphoid cell", it doesn't work, and if I search for
>> "Immunoregulatory interactions between a Lymphoid *and* non-Lymphoid
>> cell", then it works.
>>
>> On 11 Nov 2019, at 12:24, Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
>>
>> Thanks
>>
>> > Removing stopwords is another story. I'm curious to find the reason
>> > assuming that you keep on using stopwords. In some cases, stopwords are
>> > really necessary.
>>
>> Yes. It has always made sense the way we've been using them.
>>
>> > If q.alt is giving you responses, it's confirmed that your stopwords
>> > filter is working as expected. The problem definitely lies in the
>> > configuration of edismax.
>>
>> I see.
>>
>> > *Let me explain again:* In your solrconfig.xml, look at your /search
>>
>> OK, using q now, I removed all qf, performed the search and got 23
>> results, with the one I really want at the top.
>> As soon as I add dbId or stId (regardless of the boost, 1.0 or 100.0), I
>> don't get anything (which makes sense). However, if I query name_exact, I
>> get the 23 results again, and unfortunately if I query stId^1.0
>> name_exact^10.0 I still don't get any results.
>>
>> In summary:
>> - without qf - 23 results
>> - dbId - 0 results
>> - name_exact - 16 results
>> - name - 23 results
>> - dbId^1.0 name_exact^10.0 - 0 results
>> - 0 results if any other field (stId, dbId (key)) is added on top of the
>>   name fields (name_exact, etc.)
>>
>> Definitely lost here! :-/
>>
>> On 11 Nov 2019, at 07:59, Paras Lehana <paras.leh...@indiamart.com> wrote:
>>
>> Hi
>>
>> > So I don't think removing it completely is the way to go from the
>> > scenario we have
>>
>> Removing stopwords is another story. I'm curious to find the reason
>> assuming that you keep on using stopwords. In some cases, stopwords are
>> really necessary.
>>
>> > Quite a considerable increase
>>
>> If q.alt is giving you responses, it's confirmed that your stopwords
>> filter is working as expected. The problem definitely lies in the
>> configuration of edismax.
>>
>> > I am sorry but I didn't understand what do you want me to do exactly
>> > with the lst (??) and qf and bf.
>>
>> What combinations did you try? I was referring to the field-level boosting
>> you have applied in the edismax config.
>>
>> *Let me explain again:* In your solrconfig.xml, look at your /search
>> request handler. There are many qf and some bq boosts. I want you to
>> remove all of these, check the response again (with q now), and then add
>> them back one by one while watching for the point where numFound changes
>> drastically.
>>
>> On Fri, 8 Nov 2019 at 23:47, David Hastings <hastings.recurs...@gmail.com> wrote:
>>
>> I use 3-word shingles with stopwords for my MLT ML trainer. That worked
>> pretty well for such a solution, but for a full index the size became
>> prohibitive.
>>
>> On Fri, Nov 8, 2019 at 12:13 PM Walter Underwood <wun...@wunderwood.org> wrote:
>>
>> If we had IDF for phrases, they would be super effective. The 2X weight is
>> a hack that mostly works.
>>
>> Infoseek had phrase IDF and it was a killer algorithm for relevance.
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/ (my blog)
>>
>> On Nov 8, 2019, at 11:08 AM, David Hastings <hastings.recurs...@gmail.com> wrote:
>>
>> the pf and qf fields are REALLY nice for this
>>
>> On Fri, Nov 8, 2019 at 12:02 PM Walter Underwood <wun...@wunderwood.org> wrote:
>>
>> I always enable phrase searching in edismax for exactly this reason.
>>
>> Something like:
>>
>>   <str name="qf">title^8 keywords^4 text</str>
>>   <str name="pf">title^16 keywords^8 text^2</str>
>>
>> To deal with concepts in queries, a classifier and/or named entity
>> extractor can be helpful. If you have a list of concepts (“controlled
>> vocabulary”) that includes “Lamin A”, and that shows up in a query, that
>> term can be queried against the field matching that vocabulary.
>>
>> This is how LinkedIn separates people, companies, and places, for example.
>>
>> wunder
>>
>> On Nov 8, 2019, at 10:48 AM, Erick Erickson <erickerick...@gmail.com> wrote:
>>
>> Look at the “mm” parameter, try setting it to 100%. Although that’s not
>> entirely likely to do what you want either, since virtually every doc will
>> have “a” in it. But at least you’d get docs that have both terms.
>>
>> You may also be able to search for things like “Lamin A” _only as a
>> phrase_ and have some luck. But this is a gnarly problem in general. Some
>> people have been able to substitute synonyms and/or shingles to make this
>> work at the expense of a larger index.
>>
>> This is a generic problem with context. “Lamin A” is really a “concept”,
>> not just two words that happen to be near each other. Searching as a
>> phrase is an OOB-but-naive way to try to make it more likely that the
>> ranked results refer to the _concept_ of “Lamin A”. The assumption here is
>> “if these two words appear next to each other, they’re more likely to be
>> what I want”. I say “naive” because “Lamins: A new approach to...” would
>> _also_ be found for a naive phrase search. (I have no idea whether such a
>> title makes sense or not, but you figured that out already)...
>>
>> To do this well you’d have to dive into NLP/machine learning.
>> >> I truly wish we could have the DWIM search algorithm (Do What I >> >> Mean)…. >> >> >> On Nov 8, 2019, at 11:29 AM, Guilherme Viteri <gvit...@ebi.ac.uk> >> >> wrote: >> >> >> HI Walter and Paras >> >> I indexed it removing all the references to StopWordFilter and I >> >> went >> >> from 121 results to near 20K as the search term q="Lymphoid and a >> non-Lymphoid cell" is matching entities such as "IFT A" or "Lamin A". >> >> So I >> >> don't think removing it completely is the way to go from the scenario >> >> we >> >> have, but I appreciate the suggestion… >> >> >> Yes the response is using fl=* >> I am trying some combinations at the moment, but yet no success. >> >> defType=edismax >> q.alt=Lymphoid and a non-Lymphoid cell >> Number of results=1599 >> Quite a considerable increase, even though reasonable meaningful >> >> results. >> >> >> I am sorry but I didn't understand what do you want me to do exactly >> >> with the lst (??) and qf and bf. >> >> >> Thanks everyone with their inputs >> >> >> On 8 Nov 2019, at 06:45, Paras Lehana <paras.leh...@indiamart.com> >> >> wrote: >> >> >> Hi Guilherme >> >> By accident, I ended up querying the using the default handler >> >> (/select) and it worked. >> >> >> You've just found the culprit. Thanks for giving the material I >> >> requested. Your analysis chain is working as expected. I don't see any >> issue in either StopWordFilter or your boosts. I also use a boost of >> >> 50 >> >> when boosting contextual suggestions (boosting "gold iphone" on a page >> >> of >> >> iphone) but I take Walter's suggestion and would try to optimize my >> weights. I agree that this 50 thing was not researched much about by >> >> us >> >> as >> >> well (we never faced performance or relevance issues). >> >> >> See the major difference in both the handlers - edismax. I'm pretty >> >> sure that your problem lies in the parsing of queries (you can confirm >> >> that >> >> from parsedquery key in debug of both JSON responses). 
I hope you have >> provided the response with fl=*. Replace q with q.alt in your /search >> handler query and I think you should start getting responses. That's >> because q.alt uses standard parser. If you want to keep using >> >> edisMax, I >> >> suggest you to test the responses removing some combination of lst >> >> (qf, >> >> bf) >> >> and find what's restricting the documents to come up. I'm out of >> >> office >> >> today - would have certainly tried analyzing the field values of the >> document in /select request and compare it with qf/bq in >> >> solrconfig.xml >> >> /search. Do this for me and you'd certainly find something. >> >> >> On Thu, 7 Nov 2019 at 21:00, Walter Underwood < >> >> wun...@wunderwood.org >> >> <mailto:wun...@wunderwood.org>> wrote: >> >> I normally use a weight of 8 for the most important field, like >> >> title. >> >> Other fields might get a 4 or 2. >> >> >> I add a “pf” field with the weights doubled, so that phrase matches >> >> have a higher weight. >> >> >> The weight of 8 comes from experience at Infoseek and Inktomi, two >> >> early web search engines. With different relevance algorithms and >> >> totally >> >> different evaluation and tuning systems, they settled on weights of 8 >> >> and >> >> 7.5 for HTML titles. With the the two radically different system >> >> getting >> >> the same number, I decided that was a property of the documents, not >> >> of >> >> the >> >> search engines. >> >> >> wunder >> Walter Underwood >> wun...@wunderwood.org <mailto:wun...@wunderwood.org> >> http://observer.wunderwood.org/ <http://observer.wunderwood.org/> >> >> (my blog) >> >> >> On Nov 7, 2019, at 9:03 AM, Guilherme Viteri <gvit...@ebi.ac.uk >> >> <mailto:gvit...@ebi.ac.uk>> wrote: >> >> >> Hi Wunder, >> >> My indexer takes quite a few hours to be executed I am shortening >> >> it >> >> to run faster, but I also need to make sure it gives what we are >> >> expecting. >> >> This implementation's been there for >4y, and massively used. 
>> >> >> In your edismax handlers, weights of 20, 50, and 100 are >> >> extremely >> >> high. I don’t think I’ve ever used a weight higher than 16 in a dozen >> >> years >> >> of configuring Solr. >> >> I've inherited that implementation and I am really keen to >> >> adequate >> >> it, what would you recommend ? >> >> >> Cheers >> Guilherme >> >> On 7 Nov 2019, at 14:43, Walter Underwood <wun...@wunderwood.org >> >> <mailto:wun...@wunderwood.org>> wrote: >> >> >> Thanks for posting the files. Looking at schema.xml, I see that >> >> you >> >> still are using StopFilterFactory. The first advice we gave you was to >> remove that. >> >> >> Remove StopFilterFactory everywhere and reindex. >> >> You will continue to have problems matching stopwords until you >> >> do >> >> that. >> >> >> In your edismax handlers, weights of 20, 50, and 100 are >> >> extremely >> >> high. I don’t think I’ve ever used a weight higher than 16 in a dozen >> >> years >> >> of configuring Solr. >> >> >> wunder >> Walter Underwood >> wun...@wunderwood.org <mailto:wun...@wunderwood.org> >> http://observer.wunderwood.org/ <http://observer.wunderwood.org/ >> >> >> (my blog) >> >> >> On Nov 7, 2019, at 6:56 AM, Guilherme Viteri <gvit...@ebi.ac.uk >> >> <mailto:gvit...@ebi.ac.uk>> wrote: >> >> >> Hi Paras, everyone >> >> Thank you again for your inputs and suggestions. I sorry to hear >> >> you had trouble with the attachments I will host it somewhere and >> >> share >> >> the >> >> links. >> >> I don't tweak my index, I get the data from the graph database, >> >> create a document as they are and save to solr. >> >> >> So, I am sending the new analysis screen querying the way you >> >> suggested. Also the results with params and solr query url. >> >> >> During the process of querying what you asked I found something >> >> really weird (at least for me). By accident, I ended up querying the >> >> using >> >> the default handler (/select) and it worked. 
>> Then, if I use the one I must use, it sadly doesn't work. I am posting
>> both results, and I will also post the handlers.
>>
>> Here is the link with all the files mentioned before:
>> https://www.dropbox.com/sh/fymfm1q94zum1lx/AADwU1c9EUf2A4d7FtzSKR54a?dl=0
>>
>> If the link doesn't work: www dot dropbox dot com slash sh slash
>> fymfm1q94zum1lx/AADwU1c9EUf2A4d7FtzSKR54a ? dl equals 0
>>
>> Thanks
>>
>> On 7 Nov 2019, at 05:23, Paras Lehana <paras.leh...@indiamart.com> wrote:
>>
>> Hi Guilherme.
>>
>> > I am sending the analysis result and the json result as requested.
>>
>> Thanks for the effort. Luckily, I can see your attachments (low quality
>> though).
>>
>> From the analysis screen, the analysis is working as expected. One reason
>> I can initially think of for query="lymphoid and *a* non-lymphoid cell"
>> not matching a document containing "Lymphoid and a non-Lymphoid cell" is
>> that the stopword "a" is probably still present after analysis of either
>> the query or the index. Did you tweak your index-time analysis after
>> indexing?
>>
>> Do two things:
>>
>> 1. Post the analysis screen for index=*"Immunoregulatory interactions
>>    between a Lymphoid and a non-Lymphoid cell"* and query=*"lymphoid and
>>    a non-lymphoid cell"*. Try hosting the image and providing the link
>>    here.
>> 2. Give the same JSON output as you have sent, but this time with
>>    *"echoParams=all"*. Also, post the exact Solr query URL.
>> >> >> >> On Wed, 6 Nov 2019 at 21:07, Erick Erickson < >> >> erickerick...@gmail.com <mailto:erickerick...@gmail.com>> wrote: >> >> >> I don’t see the attachments, maybe I deleted old e-mails or >> >> some >> >> such. The >> >> Apache server is fairly aggressive about stripping attachments >> >> though, so >> >> it’s also possible they didn’t make it through. >> >> On Nov 6, 2019, at 9:28 AM, Guilherme Viteri < >> >> gvit...@ebi.ac.uk >> >> <mailto:gvit...@ebi.ac.uk>> wrote: >> >> >> Thanks Erick. >> >> First, your index and analysis chains are considerably >> >> different, this >> >> can easily be a source of problems. In particular, using two >> >> different >> >> tokenizers is a huge red flag. I _strongly_ recommend against >> >> this unless >> >> you’re totally sure you understand the consequences. >> >> Additionally, your use >> >> of the length filter is suspicious, especially since your >> >> problem >> >> statement >> >> is about the addition of a single letter term and the min >> >> length >> >> allowed on >> >> that filter is 2. That said, it’s reasonable to suppose that >> >> the >> >> ’a’ is >> >> filtered out in both cases, but maybe you’ve found something >> >> odd >> >> about the >> >> interactions. >> >> I will investigate the min length and post the results later. >> >> Second, I have no idea what this will do. Are the equal >> >> signs >> >> typos? >> >> Used by custom code? >> >> This the url in my application, not solr params. That's the >> >> query string. >> >> >> What does “species=“ do? That’s not Solr syntax, so it’s >> >> likely >> >> that >> >> all the params with an equal-sign are totally ignored unless >> >> it’s >> >> just a >> >> typo. >> >> This is part of the application. Species will be used later >> >> on >> >> in solr >> >> to filter out the result. That's not solr. That my app params. 
>> >> >> Third, the easiest way to see what’s happening under the >> >> covers >> >> is to >> >> add “&debug=true” to the query and look at the parsed query. >> >> Ignore all the >> >> relevance calculations for the nonce, or specify >> >> “&debug=query” >> >> to skip >> >> that part. >> >> The two json files i've sent, they are debugQuery=on and the >> >> explain tag >> >> is present. >> >> I will try the searching the way you mentioned. >> >> Thank for your inputs >> >> Guilherme >> >> On 6 Nov 2019, at 14:14, Erick Erickson < >> >> erickerick...@gmail.com <mailto:erickerick...@gmail.com>> >> >> wrote: >> >> >> Fwd to another server >> >> First, your index and analysis chains are considerably >> >> different, this >> >> can easily be a source of problems. In particular, using two >> >> different >> >> tokenizers is a huge red flag. I _strongly_ recommend against >> >> this unless >> >> you’re totally sure you understand the consequences. >> >> Additionally, your use >> >> of the length filter is suspicious, especially since your >> >> problem >> >> statement >> >> is about the addition of a single letter term and the min >> >> length >> >> allowed on >> >> that filter is 2. That said, it’s reasonable to suppose that >> >> the >> >> ’a’ is >> >> filtered out in both cases, but maybe you’ve found something >> >> odd >> >> about the >> >> interactions. >> >> >> Second, I have no idea what this will do. Are the equal >> >> signs >> >> typos? >> >> Used by custom code? >> >> >> >> >> >> >> >> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true >> >> < >> >> >> >> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true >> >> >> >> What does “species=“ do? 
That’s not Solr syntax, so it’s >> >> likely >> >> that >> >> all the params with an equal-sign are totally ignored unless >> >> it’s >> >> just a >> >> typo. >> >> >> Third, the easiest way to see what’s happening under the >> >> covers >> >> is to >> >> add “&debug=true” to the query and look at the parsed query. >> >> Ignore all the >> >> relevance calculations for the nonce, or specify >> >> “&debug=query” >> >> to skip >> >> that part. >> >> >> 90% + of the time, the question “why didn’t this query do >> >> what I >> >> expect” is answered by looking at the “&debug=query” output >> >> and >> >> the >> >> analysis page in the admin UI. NOTE: for the analysis page be >> >> sure to look >> >> at _both_ the query and index output. Also, and very important >> >> about the >> >> analysis page (and this is confusing) is that this _assumes_ >> >> that >> >> what you >> >> put in the text boxes have made it through the query parser >> >> intact and is >> >> analyzed by the field selected. Consider the search >> >> "q=field:word1 word2". >> >> Now you type “word1 word2” into the analysis text box and it >> >> looks like >> >> what you expect. That’s misleading because the query is >> >> _parsed_ >> >> as >> >> "field:word1 default_search_field:word2”. This is where >> >> “&debug=query” >> >> helps. >> >> >> Best, >> Erick >> >> On Nov 6, 2019, at 2:36 AM, Paras Lehana < >> >> paras.leh...@indiamart.com <mailto:paras.leh...@indiamart.com>> >> >> wrote: >> >> >> Hi Walter, >> >> The solr.StopFilter removes all tokens that are stopwords. >> >> Those words >> >> will >> >> not be in the index, so they can never match a query. >> >> >> >> I think the OP's concern is different results when adding a >> >> stopword. I >> >> think he's using the filter factory correctly - the query >> >> chain >> >> includes >> >> the filter as well so it should remove "a" while querying. 
>> *@Guilherme*, please post the results for both queries, the document in
>> the result you are concerned about, and the full result of the analysis
>> screen (for both query and index).
>>
>> On Tue, 5 Nov 2019 at 21:38, Walter Underwood <wun...@wunderwood.org> wrote:
>>
>> No.
>>
>> The solr.StopFilter removes all tokens that are stopwords. Those words
>> will not be in the index, so they can never match a query.
>>
>> 1. Remove the lines with solr.StopFilter from every analysis chain in
>>    schema.xml.
>> 2. Reload the collection, restart Solr, or whatever to read the new
>>    config.
>> 3. Reindex all of the documents.
>>
>> When indexed with the new analysis chain, the stopwords will not be
>> removed and they will be searchable.
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/ (my blog)
>>
>> On Nov 5, 2019, at 8:56 AM, Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
>>
>> OK. I am kind of lost now.
>> If I open up the console > analysis and perform it, that's the final
>> result.
>>
>> <Screenshot 2019-11-05 at 14.54.16.png>
>>
>> Your suggestion is: get rid of the <filter stopword.txt> in the schema.xml
>> and, during the index phase, replaceAll("in stopwords.txt"," ") and then
>> add to Solr. Is that correct?
>>
>> Thanks David
>>
>> On 5 Nov 2019, at 14:48, David Hastings <hastings.recurs...@gmail.com> wrote:
>>
>> no,
>>
>>   <filter class="solr.StopFilterFactory" ignoreCase="true"
>>           words="stopwords.txt"/>
>>
>> is still using stopwords and should be removed. That is my opinion, of
>> course, and your use case may be different, but I generally axe any
>> reference to them at all.
>>
>> On Tue, Nov 5, 2019 at 9:47 AM Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
>>
>> Thanks.
>> Haven't I done this here?
>>
>>   <fieldType name="text_field" class="solr.TextField"
>>              positionIncrementGap="100" omitNorms="false" >
>>     <analyzer type="index">
>>       <tokenizer class="solr.StandardTokenizerFactory"/>
>>       <filter class="solr.ClassicFilterFactory"/>
>>       <filter class="solr.LengthFilterFactory" min="2" max="20"/>
>>       <filter class="solr.LowerCaseFilterFactory"/>
>>       <filter class="solr.StopFilterFactory" ignoreCase="true"
>>               words="stopwords.txt"/>
>>     </analyzer>
>>
>> On 5 Nov 2019, at 14:15, David Hastings <hastings.recurs...@gmail.com> wrote:
>>
>> The first thing you should do is remove any reference to stop words and
>> never use them, then re-index your data and try it again.
>>
>> On Tue, Nov 5, 2019 at 9:14 AM Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
>>
>> Hi,
>>
>> I am performing a search to match a name (text_field); however, this term
>> contains 'and' and 'a', and it doesn't return any records. If I remove
>> 'a', then it works.
>>
>> e.g.
>> Search term: lymphoid and a non-lymphoid cell
>> doesn't work:
>> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
>>
>> Search term: lymphoid and non-lymphoid cell
>> works:
>> https://dev.reactome.org/content/query?q=lymphoid+and+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
>> (interested in the first result)
>>
>> schema.xml
>>
>>   <field name="name" type="text_field" indexed="true" stored="true"
>>          omitNorms="false" required="true" multiValued="false"/>
>>
>>   <fieldType name="text_field" class="solr.TextField"
>>              positionIncrementGap="100" omitNorms="false" >
>>     <analyzer type="index">
>>       <tokenizer class="solr.StandardTokenizerFactory"/>
>>       <filter class="solr.ClassicFilterFactory"/>
>>       <filter class="solr.LengthFilterFactory" min="2" max="20"/>
>>       <filter class="solr.LowerCaseFilterFactory"/>
>>       <filter class="solr.StopFilterFactory" ignoreCase="true"
>>               words="stopwords.txt"/>
>>     </analyzer>
>>     <analyzer type="query">
>>       <tokenizer class="solr.PatternTokenizerFactory"
>>                  pattern="[^a-zA-Z0-9/._:]"/>
>>       <filter class="solr.PatternReplaceFilterFactory"
>>               pattern="^[/._:]+" replacement=""/>
>>       <filter class="solr.PatternReplaceFilterFactory"
>>               pattern="[/._:]+$" replacement=""/>
>>       <filter class="solr.PatternReplaceFilterFactory"
>>               pattern="[_]" replacement=" "/>
>>       <filter class="solr.LengthFilterFactory" min="2" max="20"/>
>>       <filter class="solr.LowerCaseFilterFactory"/>
>>       <filter class="solr.StopFilterFactory" ignoreCase="true"
>>               words="stopwords.txt"/>
>>     </analyzer>
>>   </fieldType>
>>
>> stopwords.txt
>>
>>   #Standard english stop words taken from Lucene's StopAnalyzer
>>   a
>>   b
>>   c
>>   ....
>>   an
>>   and
>>   are
>>
>> Running Solr 6.6.2.
>>
>> Is there anything I could do to prevent this?
>>
>> Thanks
>> Guilherme
>>
>> --
>> Regards,
>>
>> *Paras Lehana* [65871]
>> Development Engineer, Auto-Suggest,
>> IndiaMART Intermesh Ltd.
>>
>> 8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
>> Noida, UP, IN - 201303
>>
>> Mob.: +91-9560911996
>> Work: 01203916600 | Extn: *8173*
>>
>> --
>> IMPORTANT:
>> NEVER share your IndiaMART OTP/ Password with anyone.

--
Regards,

Paras Lehana
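
The qf check suggested in this thread (drop every qf, then add fields back one at a time while watching numFound) can be sketched as follows. This is only an illustration under assumptions: the base URL and core name are hypothetical, the field list is taken from the summary in the thread, and the script only builds the request URLs (rows=0, debug=query) rather than calling Solr.

```python
from urllib.parse import urlencode


def cumulative_qf_sets(fields):
    """Yield growing qf lists: the first field, then the first two, and so on."""
    for i in range(1, len(fields) + 1):
        yield fields[:i]


def search_url(base_url, q, qf_fields):
    """Build an edismax request URL that returns only numFound and the parsed query."""
    params = {
        "defType": "edismax",
        "q": q,
        "qf": " ".join(qf_fields),
        "rows": 0,          # only numFound matters for this check
        "debug": "query",   # include parsedquery in the response
        "wt": "json",
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical host and core; replace with the real ones.
    base = "http://localhost:8983/solr/mycore/select"
    fields = ["name", "name_exact^10.0", "dbId^1.0", "stId^1.0"]
    for qf in cumulative_qf_sets(fields):
        print(search_url(base, "lymphoid and a non-lymphoid cell", qf))
```

Requesting each URL in turn and comparing numFound and parsedquery between consecutive steps should point at the field whose addition makes the results disappear.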