Hi,

> Have you tried reindexing the documents and compare the results? No issues
> if you cannot do that - let's try something else. I was going through the
> whole mail and your files. You had said:

Yes, but since it hasn't worked as suggested, I kept as you suggested.

>> As soon as I add dbId or stId (regardless the boost, 1.0 or 100.0), then I
>> don't get anything (which make sense).
>
> Why did you think that not getting anything when you add dbId made sense?
> Asking because I may be missing something here.

I am searching for a text and I was searching on an ID field, which wouldn't make sense. (I will come back to this soon.)

Ok, I've been adding and removing fields in the qf and I could isolate half of the problem.
First, I have one type of field called keyword_field; I added the StopWords filter for this field and it worked.
Second, there are the fields that are ids (<fieldType name="id" class="solr.StrField" />).
Do you think I should also add the stopwords filter for the fieldType id? (I tried, and it worked, but I am not sure if this is conceptually correct; an id should remain intact, from my understanding.)

Thanks
Guilherme

> On 18 Nov 2019, at 05:37, Paras Lehana <paras.leh...@indiamart.com> wrote:
>
> Hi Guilherme,
>
> Have you tried reindexing the documents and compare the results? No issues
> if you cannot do that - let's try something else. I was going through the
> whole mail and your files. You had said:
>
>> As soon as I add dbId or stId (regardless the boost, 1.0 or 100.0), then I
>> don't get anything (which make sense).
>
> Why did you think that not getting anything when you add dbId made sense?
> Asking because I may be missing something here.
>
> Also, what is the purpose of so many qf's? Going through your documents and
> config files, I found that your dbId's are string of numbers and I don't
> think you want to find your query terms in dbId, right?
> Do you want to boost the score by the values in dbId?
>
> Your qf of dbId^100 boosts documents containing terms in q by 100x. Since
> your terms don't match with the values in dbId for any document, the score
> produced by this scoring is 0. 100x or 1x of 0 is still 0.
> I still need to see how this scoring gets added up in edismax parser but do > reevaluate the usage of these qfs. Same goes for other qf boosts. :) > > > On Fri, 15 Nov 2019 at 12:23, Guilherme Viteri <gvit...@ebi.ac.uk> wrote: > >> Hi Paras >> No worries. >> No I didn’t find anything. This is annoying now... >> Yes! They do contain dbId. Absolutely all my docs contains dbId and it is >> actually my key, if you check again the schema.xml >> >> Cheers >> Guilherme >> >> On 15 Nov 2019, at 05:37, Paras Lehana <paras.leh...@indiamart.com> wrote: >> >> >> Hey Guilherme, >> >> I was a bit busy for the past few days and couldn't read your mail. So, >> did you find anything? Anyways, as I had expected, the culprit is >> definitely among the qfs. Do the documents in concern contain dbId? I >> suggest you to cross check the fields in your document with those impacting >> the result in qf. >> >> On Tue, 12 Nov 2019 at 16:14, Guilherme Viteri <gvit...@ebi.ac.uk> wrote: >> >>> What I can't understand is: >>> I search for the exact term - "Immunoregulatory interactions between a >>> Lymphoid *and a* non-Lymphoid cell" and If i search "I search for the >>> exact term - Immunoregulatory interactions between a Lymphoid *and >>> *non-Lymphoid >>> cell" then it works >>> >>> On 11 Nov 2019, at 12:24, Guilherme Viteri <gvit...@ebi.ac.uk> wrote: >>> >>> Thanks >>> >>> Removing stopwords is another story. I'm curious to find the reason >>> assuming that you keep on using stopwords. In some cases, stopwords are >>> really necessary. >>> >>> Yes. It always make sense the way we've been using. >>> >>> If q.alt is giving you responses, it's confirmed that your stopwords >>> filter >>> is working as expected. The problem definitely lies in the configuration >>> of >>> edismax. >>> >>> I see. 
>>> >>> *Let me explain again:* In your solrconfig.xml, look at your /search >>> >>> Ok, using q now, removed all qf, performed the search and I got 23 >>> results, and the one I really want, on the top. >>> As soon as I add dbId or stId (regardless the boost, 1.0 or 100.0), then >>> I don't get anything (which make sense). However if I query name_exact, I >>> get the 23 results again, and unfortunately if I query stId^1.0 >>> name_exact^10.0 I still don't get any results. >>> >>> In summary >>> - without qf - 23 results >>> - dbId - 0 results >>> - name_exact - 16 results >>> - name - 23 results >>> - dbId^1.0 >>> name_exact^10.0 - 0 results >>> - 0 results if any other, stId, dbId (key) is added on top of the >>> name(name_exact, etc). >>> >>> Definitely lost here! :-/ >>> >>> >>> On 11 Nov 2019, at 07:59, Paras Lehana <paras.leh...@indiamart.com> >>> wrote: >>> >>> Hi >>> >>> So I don't think removing it completely is the way to go from the scenario >>> >>> we have >>> >>> >>> >>> Removing stopwords is another story. I'm curious to find the reason >>> assuming that you keep on using stopwords. In some cases, stopwords are >>> really necessary. >>> >>> >>> Quite a considerable increase >>> >>> >>> If q.alt is giving you responses, it's confirmed that your stopwords >>> filter >>> is working as expected. The problem definitely lies in the configuration >>> of >>> edismax. >>> >>> >>> >>> I am sorry but I didn't understand what do you want me to do exactly with >>> the lst (??) and qf and bf. >>> >>> >>> >>> What combinations did you try? I was referring to the field-level boosting >>> you have applied in edismax config. >>> >>> *Let me explain again:* In your solrconfig.xml, look at your /search >>> request handler. There are many qf and some bq boosts. I want you to >>> remove >>> all of these, check response again (with q now) and keep on adding them >>> again (one by one) while looking for when the numFound drastically >>> changes. 
>>> >>> On Fri, 8 Nov 2019 at 23:47, David Hastings <hastings.recurs...@gmail.com >>>> >>> wrote: >>> >>> I use 3 word shingles with stopwords for my MLT ML trainer that worked >>> pretty well for such a solution, but for a full index the size became >>> prohibitive >>> >>> On Fri, Nov 8, 2019 at 12:13 PM Walter Underwood <wun...@wunderwood.org> >>> wrote: >>> >>> If we had IDF for phrases, they would be super effective. The 2X weight >>> >>> is >>> >>> a hack that mostly works. >>> >>> Infoseek had phrase IDF and it was a killer algorithm for relevance. >>> >>> wunder >>> Walter Underwood >>> wun...@wunderwood.org >>> http://observer.wunderwood.org/ (my blog) >>> >>> On Nov 8, 2019, at 11:08 AM, David Hastings < >>> >>> hastings.recurs...@gmail.com> wrote: >>> >>> >>> the pf and qf fields are REALLY nice for this >>> >>> On Fri, Nov 8, 2019 at 12:02 PM Walter Underwood < >>> >>> wun...@wunderwood.org> >>> >>> wrote: >>> >>> I always enable phrase searching in edismax for exactly this reason. >>> >>> Something like: >>> >>> <str name="qf”>title^8 keywords^4 text</str> >>> <str name="pf”>title^16 keywords^8 text^2</str> >>> >>> To deal with concepts in queries, a classifier and/or named entity >>> extractor can be helpful. If you have a list of concepts (“controlled >>> vocabulary”) that includes “Lamin A”, and that shows up in a query, >>> >>> that >>> >>> term can be queried against the field matching that vocabulary. >>> >>> This is how LinkedIn separates people, companies, and places, for >>> >>> example. >>> >>> >>> wunder >>> Walter Underwood >>> wun...@wunderwood.org >>> http://observer.wunderwood.org/ (my blog) >>> >>> On Nov 8, 2019, at 10:48 AM, Erick Erickson <erickerick...@gmail.com >>> >>> >>> wrote: >>> >>> >>> Look at the “mm” parameter, try setting it to 100%. Although that’t >>> >>> not >>> >>> entirely likely to do what you want either since virtually every doc >>> >>> will >>> >>> have “a” in it. But at least you’d get docs that have both terms. 
>>> >>> >>> you may also be able to search for things like “Lamin A” _only as a >>> >>> phrase_ and have some luck. But this is a gnarly problem in general. >>> >>> Some >>> >>> people have been able to substitute synonyms and/or shingles to make >>> >>> this >>> >>> work at the expense of a larger index. >>> >>> >>> This is a generic problem with context. “Lamin A” is really a >>> >>> “concept”, >>> >>> not just two words that happen to be near each other. Searching as a >>> >>> phrase >>> >>> is an OOB-but-naive way to try to make it more likely that the ranked >>> results refer to the _concept_ of “Lamin A”. The assumption here is >>> >>> “if >>> >>> these two words appear next to each other, they’re more likely to be >>> >>> what I >>> >>> want”. I say “naive” because “Lamins: A new approach to...” would >>> >>> _also_ be >>> >>> found for a naive phrase search. (I have no idea whether such a title >>> >>> makes >>> >>> sense or not, but you figured that out already)... >>> >>> >>> To do this well you’d have to dive in to NLP/Machine learning. >>> >>> I truly wish we could have the DWIM search algorithm (Do What I >>> >>> Mean)…. >>> >>> >>> On Nov 8, 2019, at 11:29 AM, Guilherme Viteri <gvit...@ebi.ac.uk> >>> >>> wrote: >>> >>> >>> HI Walter and Paras >>> >>> I indexed it removing all the references to StopWordFilter and I >>> >>> went >>> >>> from 121 results to near 20K as the search term q="Lymphoid and a >>> non-Lymphoid cell" is matching entities such as "IFT A" or "Lamin A". >>> >>> So I >>> >>> don't think removing it completely is the way to go from the scenario >>> >>> we >>> >>> have, but I appreciate the suggestion… >>> >>> >>> Yes the response is using fl=* >>> I am trying some combinations at the moment, but yet no success. >>> >>> defType=edismax >>> q.alt=Lymphoid and a non-Lymphoid cell >>> Number of results=1599 >>> Quite a considerable increase, even though reasonable meaningful >>> >>> results. 
>>> >>> >>> I am sorry but I didn't understand what do you want me to do exactly >>> >>> with the lst (??) and qf and bf. >>> >>> >>> Thanks everyone with their inputs >>> >>> >>> On 8 Nov 2019, at 06:45, Paras Lehana <paras.leh...@indiamart.com> >>> >>> wrote: >>> >>> >>> Hi Guilherme >>> >>> By accident, I ended up querying the using the default handler >>> >>> (/select) and it worked. >>> >>> >>> You've just found the culprit. Thanks for giving the material I >>> >>> requested. Your analysis chain is working as expected. I don't see any >>> issue in either StopWordFilter or your boosts. I also use a boost of >>> >>> 50 >>> >>> when boosting contextual suggestions (boosting "gold iphone" on a page >>> >>> of >>> >>> iphone) but I take Walter's suggestion and would try to optimize my >>> weights. I agree that this 50 thing was not researched much about by >>> >>> us >>> >>> as >>> >>> well (we never faced performance or relevance issues). >>> >>> >>> See the major difference in both the handlers - edismax. I'm pretty >>> >>> sure that your problem lies in the parsing of queries (you can confirm >>> >>> that >>> >>> from parsedquery key in debug of both JSON responses). I hope you have >>> provided the response with fl=*. Replace q with q.alt in your /search >>> handler query and I think you should start getting responses. That's >>> because q.alt uses standard parser. If you want to keep using >>> >>> edisMax, I >>> >>> suggest you to test the responses removing some combination of lst >>> >>> (qf, >>> >>> bf) >>> >>> and find what's restricting the documents to come up. I'm out of >>> >>> office >>> >>> today - would have certainly tried analyzing the field values of the >>> document in /select request and compare it with qf/bq in >>> >>> solrconfig.xml >>> >>> /search. Do this for me and you'd certainly find something. 
>>> >>> >>> On Thu, 7 Nov 2019 at 21:00, Walter Underwood < >>> >>> wun...@wunderwood.org >>> >>> <mailto:wun...@wunderwood.org>> wrote: >>> >>> I normally use a weight of 8 for the most important field, like >>> >>> title. >>> >>> Other fields might get a 4 or 2. >>> >>> >>> I add a “pf” field with the weights doubled, so that phrase matches >>> >>> have a higher weight. >>> >>> >>> The weight of 8 comes from experience at Infoseek and Inktomi, two >>> >>> early web search engines. With different relevance algorithms and >>> >>> totally >>> >>> different evaluation and tuning systems, they settled on weights of 8 >>> >>> and >>> >>> 7.5 for HTML titles. With the the two radically different system >>> >>> getting >>> >>> the same number, I decided that was a property of the documents, not >>> >>> of >>> >>> the >>> >>> search engines. >>> >>> >>> wunder >>> Walter Underwood >>> wun...@wunderwood.org <mailto:wun...@wunderwood.org> >>> http://observer.wunderwood.org/ <http://observer.wunderwood.org/> >>> >>> (my blog) >>> >>> >>> On Nov 7, 2019, at 9:03 AM, Guilherme Viteri <gvit...@ebi.ac.uk >>> >>> <mailto:gvit...@ebi.ac.uk>> wrote: >>> >>> >>> Hi Wunder, >>> >>> My indexer takes quite a few hours to be executed I am shortening >>> >>> it >>> >>> to run faster, but I also need to make sure it gives what we are >>> >>> expecting. >>> >>> This implementation's been there for >4y, and massively used. >>> >>> >>> In your edismax handlers, weights of 20, 50, and 100 are >>> >>> extremely >>> >>> high. I don’t think I’ve ever used a weight higher than 16 in a dozen >>> >>> years >>> >>> of configuring Solr. >>> >>> I've inherited that implementation and I am really keen to >>> >>> adequate >>> >>> it, what would you recommend ? >>> >>> >>> Cheers >>> Guilherme >>> >>> On 7 Nov 2019, at 14:43, Walter Underwood <wun...@wunderwood.org >>> >>> <mailto:wun...@wunderwood.org>> wrote: >>> >>> >>> Thanks for posting the files. 
Looking at schema.xml, I see that >>> >>> you >>> >>> still are using StopFilterFactory. The first advice we gave you was to >>> remove that. >>> >>> >>> Remove StopFilterFactory everywhere and reindex. >>> >>> You will continue to have problems matching stopwords until you >>> >>> do >>> >>> that. >>> >>> >>> In your edismax handlers, weights of 20, 50, and 100 are >>> >>> extremely >>> >>> high. I don’t think I’ve ever used a weight higher than 16 in a dozen >>> >>> years >>> >>> of configuring Solr. >>> >>> >>> wunder >>> Walter Underwood >>> wun...@wunderwood.org <mailto:wun...@wunderwood.org> >>> http://observer.wunderwood.org/ <http://observer.wunderwood.org/ >>> >>> >>> (my blog) >>> >>> >>> On Nov 7, 2019, at 6:56 AM, Guilherme Viteri <gvit...@ebi.ac.uk >>> >>> <mailto:gvit...@ebi.ac.uk>> wrote: >>> >>> >>> Hi Paras, everyone >>> >>> Thank you again for your inputs and suggestions. I sorry to hear >>> >>> you had trouble with the attachments I will host it somewhere and >>> >>> share >>> >>> the >>> >>> links. >>> >>> I don't tweak my index, I get the data from the graph database, >>> >>> create a document as they are and save to solr. >>> >>> >>> So, I am sending the new analysis screen querying the way you >>> >>> suggested. Also the results with params and solr query url. >>> >>> >>> During the process of querying what you asked I found something >>> >>> really weird (at least for me). By accident, I ended up querying the >>> >>> using >>> >>> the default handler (/select) and it worked. Then If I use the one I >>> >>> must >>> >>> use, then sadly doesn't work. I am posting both results and I will >>> >>> also >>> >>> post the handlers as well. 
>>> Here is the link with all the files mentioned before
>>> https://www.dropbox.com/sh/fymfm1q94zum1lx/AADwU1c9EUf2A4d7FtzSKR54a?dl=0
>>>
>>> If the link doesn't work www dot dropbox dot com slash sh slash
>>> fymfm1q94zum1lx/AADwU1c9EUf2A4d7FtzSKR54a ? dl equals 0
>>>
>>> Thanks
>>>
>>> On 7 Nov 2019, at 05:23, Paras Lehana <paras.leh...@indiamart.com> wrote:
>>>
>>> Hi Guilherme.
>>>
>>> I am sending they analysis result and the json result as requested.
>>>
>>> Thanks for the effort. Luckily, I can see your attachments (low quality
>>> though).
>>>
>>> From the analysis screen, the analysis is working as expected. One of the
>>> reasons for query="lymphoid and *a* non-lymphoid cell" not matching
>>> document containing "Lymphoid and a non-Lymphoid cell" I can initially
>>> think of is: the stopword "a" is probably present in post-analysis either
>>> of query or index. Did you tweak your index time analysis after indexing?
>>>
>>> Do two things:
>>>
>>> 1. Post the analysis screen for and index=*"Immunoregulatory
>>> interactions between a Lymphoid and a non-Lymphoid cell"* and
>>> "query=*"lymphoid and a non-lymphoid cell"*. Try hosting the image and
>>> providing the link here.
>>> 2. Give the same JSON output as you have sent but this time with
>>> *"echoParams=all"*. Also, post the exact Solr query url.
>>> >>> >>> >>> On Wed, 6 Nov 2019 at 21:07, Erick Erickson < >>> >>> erickerick...@gmail.com <mailto:erickerick...@gmail.com>> wrote: >>> >>> >>> I don’t see the attachments, maybe I deleted old e-mails or >>> >>> some >>> >>> such. The >>> >>> Apache server is fairly aggressive about stripping attachments >>> >>> though, so >>> >>> it’s also possible they didn’t make it through. >>> >>> On Nov 6, 2019, at 9:28 AM, Guilherme Viteri < >>> >>> gvit...@ebi.ac.uk >>> >>> <mailto:gvit...@ebi.ac.uk>> wrote: >>> >>> >>> Thanks Erick. >>> >>> First, your index and analysis chains are considerably >>> >>> different, this >>> >>> can easily be a source of problems. In particular, using two >>> >>> different >>> >>> tokenizers is a huge red flag. I _strongly_ recommend against >>> >>> this unless >>> >>> you’re totally sure you understand the consequences. >>> >>> Additionally, your use >>> >>> of the length filter is suspicious, especially since your >>> >>> problem >>> >>> statement >>> >>> is about the addition of a single letter term and the min >>> >>> length >>> >>> allowed on >>> >>> that filter is 2. That said, it’s reasonable to suppose that >>> >>> the >>> >>> ’a’ is >>> >>> filtered out in both cases, but maybe you’ve found something >>> >>> odd >>> >>> about the >>> >>> interactions. >>> >>> I will investigate the min length and post the results later. >>> >>> Second, I have no idea what this will do. Are the equal >>> >>> signs >>> >>> typos? >>> >>> Used by custom code? >>> >>> This the url in my application, not solr params. That's the >>> >>> query string. >>> >>> >>> What does “species=“ do? That’s not Solr syntax, so it’s >>> >>> likely >>> >>> that >>> >>> all the params with an equal-sign are totally ignored unless >>> >>> it’s >>> >>> just a >>> >>> typo. >>> >>> This is part of the application. Species will be used later >>> >>> on >>> >>> in solr >>> >>> to filter out the result. That's not solr. That my app params. 
>>> >>> >>> Third, the easiest way to see what’s happening under the >>> >>> covers >>> >>> is to >>> >>> add “&debug=true” to the query and look at the parsed query. >>> >>> Ignore all the >>> >>> relevance calculations for the nonce, or specify >>> >>> “&debug=query” >>> >>> to skip >>> >>> that part. >>> >>> The two json files i've sent, they are debugQuery=on and the >>> >>> explain tag >>> >>> is present. >>> >>> I will try the searching the way you mentioned. >>> >>> Thank for your inputs >>> >>> Guilherme >>> >>> On 6 Nov 2019, at 14:14, Erick Erickson < >>> >>> erickerick...@gmail.com <mailto:erickerick...@gmail.com>> >>> >>> wrote: >>> >>> >>> Fwd to another server >>> >>> First, your index and analysis chains are considerably >>> >>> different, this >>> >>> can easily be a source of problems. In particular, using two >>> >>> different >>> >>> tokenizers is a huge red flag. I _strongly_ recommend against >>> >>> this unless >>> >>> you’re totally sure you understand the consequences. >>> >>> Additionally, your use >>> >>> of the length filter is suspicious, especially since your >>> >>> problem >>> >>> statement >>> >>> is about the addition of a single letter term and the min >>> >>> length >>> >>> allowed on >>> >>> that filter is 2. That said, it’s reasonable to suppose that >>> >>> the >>> >>> ’a’ is >>> >>> filtered out in both cases, but maybe you’ve found something >>> >>> odd >>> >>> about the >>> >>> interactions. >>> >>> >>> Second, I have no idea what this will do. Are the equal >>> >>> signs >>> >>> typos? >>> >>> Used by custom code? >>> >>> >>> >>> >>> >>> >>> >>> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true >>> >>> < >>> >>> >>> >>> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true >>> >>> >>> >>> What does “species=“ do? 
That’s not Solr syntax, so it’s >>> >>> likely >>> >>> that >>> >>> all the params with an equal-sign are totally ignored unless >>> >>> it’s >>> >>> just a >>> >>> typo. >>> >>> >>> Third, the easiest way to see what’s happening under the >>> >>> covers >>> >>> is to >>> >>> add “&debug=true” to the query and look at the parsed query. >>> >>> Ignore all the >>> >>> relevance calculations for the nonce, or specify >>> >>> “&debug=query” >>> >>> to skip >>> >>> that part. >>> >>> >>> 90% + of the time, the question “why didn’t this query do >>> >>> what I >>> >>> expect” is answered by looking at the “&debug=query” output >>> >>> and >>> >>> the >>> >>> analysis page in the admin UI. NOTE: for the analysis page be >>> >>> sure to look >>> >>> at _both_ the query and index output. Also, and very important >>> >>> about the >>> >>> analysis page (and this is confusing) is that this _assumes_ >>> >>> that >>> >>> what you >>> >>> put in the text boxes have made it through the query parser >>> >>> intact and is >>> >>> analyzed by the field selected. Consider the search >>> >>> "q=field:word1 word2". >>> >>> Now you type “word1 word2” into the analysis text box and it >>> >>> looks like >>> >>> what you expect. That’s misleading because the query is >>> >>> _parsed_ >>> >>> as >>> >>> "field:word1 default_search_field:word2”. This is where >>> >>> “&debug=query” >>> >>> helps. >>> >>> >>> Best, >>> Erick >>> >>> On Nov 6, 2019, at 2:36 AM, Paras Lehana < >>> >>> paras.leh...@indiamart.com <mailto:paras.leh...@indiamart.com>> >>> >>> wrote: >>> >>> >>> Hi Walter, >>> >>> The solr.StopFilter removes all tokens that are stopwords. >>> >>> Those words >>> >>> will >>> >>> not be in the index, so they can never match a query. >>> >>> >>> >>> I think the OP's concern is different results when adding a >>> >>> stopword. 
I >>> >>> think he's using the filter factory correctly - the query >>> >>> chain >>> >>> includes >>> >>> the filter as well so it should remove "a" while querying. >>> >>> *@Guilherme*, please post results for both the query, the >>> >>> document in >>> >>> result you are concerned about and post full result of >>> >>> analysis screen >>> >>> (for >>> >>> both query and index). >>> >>> On Tue, 5 Nov 2019 at 21:38, Walter Underwood < >>> >>> wun...@wunderwood.org <mailto:wun...@wunderwood.org>> >>> >>> wrote: >>> >>> >>> No. >>> >>> The solr.StopFilter removes all tokens that are stopwords. >>> >>> Those words >>> >>> will not be in the index, so they can never match a query. >>> >>> 1. Remove the lines with solr.StopFilter from every >>> >>> analysis >>> >>> chain in >>> >>> schema.xml. >>> 2. Reload the collection, restart Solr, or whatever to >>> >>> read >>> >>> the new >>> >>> config. >>> >>> 3. Reindex all of the documents. >>> >>> When indexed with the new analysis chain, the stopwords >>> >>> will >>> >>> not be >>> >>> removed and they will be searchable. >>> >>> wunder >>> Walter Underwood >>> wun...@wunderwood.org <mailto:wun...@wunderwood.org> >>> http://observer.wunderwood.org/ < >>> >>> http://observer.wunderwood.org/> (my blog) >>> >>> >>> On Nov 5, 2019, at 8:56 AM, Guilherme Viteri < >>> >>> gvit...@ebi.ac.uk <mailto:gvit...@ebi.ac.uk>> >>> >>> wrote: >>> >>> >>> Ok. I am kind a lost now. >>> If I open up the console > analysis and perform it, >>> >>> that's >>> >>> the final >>> >>> result. >>> >>> <Screenshot 2019-11-05 at 14.54.16.png> >>> >>> Your suggestion is: get rid of the <filter stopword.txt> >>> >>> in >>> >>> the >>> >>> schema.xml and during index phase replaceAll("in >>> >>> stopwords.txt"," ") >>> >>> then >>> >>> add to solr. Is that correct ? 
>>> >>> >>> Thanks David >>> >>> On 5 Nov 2019, at 14:48, David Hastings < >>> >>> hastings.recurs...@gmail.com <mailto: >>> >>> hastings.recurs...@gmail.com >>> >>> >>> <mailto:hastings.recurs...@gmail.com <mailto: >>> >>> hastings.recurs...@gmail.com>>> wrote: >>> >>> >>> Fwd to another server >>> >>> no, >>> <filter class="solr.StopFilterFactory" >>> >>> ignoreCase="true" >>> >>> words="stopwords.txt"/> >>> >>> is still using stopwords and should be removed, in my >>> >>> opinion of >>> >>> course, >>> >>> based on your use case may be different, but i generally >>> >>> axe any >>> >>> reference >>> >>> to them at all >>> >>> On Tue, Nov 5, 2019 at 9:47 AM Guilherme Viteri < >>> >>> gvit...@ebi.ac.uk <mailto:gvit...@ebi.ac.uk> >>> >>> <mailto:gvit...@ebi.ac.uk <mailto:gvit...@ebi.ac.uk>>> >>> >>> wrote: >>> >>> >>> Thanks. >>> Haven't I done this here ? >>> <fieldType name="text_field" class="solr.TextField" >>> positionIncrementGap="100" omitNorms="false" > >>> <analyzer type="index"> >>> <tokenizer class="solr.StandardTokenizerFactory"/> >>> <filter class="solr.ClassicFilterFactory"/> >>> <filter class="solr.LengthFilterFactory" min="2" >>> >>> max="20"/> >>> >>> <filter class="solr.LowerCaseFilterFactory"/> >>> <filter class="solr.StopFilterFactory" >>> >>> ignoreCase="true" >>> >>> words="stopwords.txt"/> >>> </analyzer> >>> >>> >>> On 5 Nov 2019, at 14:15, David Hastings < >>> >>> hastings.recurs...@gmail.com <mailto: >>> >>> hastings.recurs...@gmail.com >>> >>> >>> <mailto:hastings.recurs...@gmail.com <mailto: >>> >>> hastings.recurs...@gmail.com>>> >>> >>> wrote: >>> >>> >>> Fwd to another server >>> >>> The first thing you should do is remove any reference >>> >>> to >>> >>> stop >>> >>> words >>> >>> and >>> >>> never use them, then re-index your data and try it >>> >>> again. 
>>> On Tue, Nov 5, 2019 at 9:14 AM Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
>>>
>>> Hi,
>>>
>>> I am performing a search to match a name (text_field), however this term
>>> contains 'and' and 'a' and it doesn't return any records. If i remove 'a'
>>> then it works.
>>> e.g
>>> Search Term: lymphoid and a non-lymphoid cell
>>> doesn't work:
>>> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
>>>
>>> Search term: lymphoid and non-lymphoid cell
>>> works:
>>> https://dev.reactome.org/content/query?q=lymphoid+and+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
>>> >>> >>> >>> >>> https://dev.reactome.org/content/query?q=lymphoid+and+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true >>> >>> < >>> >>> >>> >>> https://dev.reactome.org/content/query?q=lymphoid+and+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true >>> >>> >>> >>> interested in the first result >>> >>> schema.xml >>> <field name="name" >>> >>> type="text_field" >>> >>> indexed="true" stored="true" omitNorms="false" >>> >>> required="true" >>> >>> multiValued="false"/> >>> >>> <analyzer type="query"> >>> <tokenizer class="solr.PatternTokenizerFactory" >>> pattern="[^a-zA-Z0-9/._:]"/> >>> <filter class="solr.PatternReplaceFilterFactory" >>> pattern="^[/._:]+" replacement=""/> >>> <filter class="solr.PatternReplaceFilterFactory" >>> pattern="[/._:]+$" replacement=""/> >>> <filter class="solr.PatternReplaceFilterFactory" >>> pattern="[_]" replacement=" "/> >>> <filter class="solr.LengthFilterFactory" min="2" >>> >>> max="20"/> >>> >>> <filter class="solr.LowerCaseFilterFactory"/> >>> <filter class="solr.StopFilterFactory" >>> >>> ignoreCase="true" >>> >>> words="stopwords.txt"/> >>> </analyzer> >>> >>> <fieldType name="text_field" class="solr.TextField" >>> positionIncrementGap="100" omitNorms="false" > >>> <analyzer type="index"> >>> <tokenizer >>> >>> class="solr.StandardTokenizerFactory"/> >>> >>> <filter class="solr.ClassicFilterFactory"/> >>> <filter class="solr.LengthFilterFactory" min="2" >>> >>> max="20"/> >>> >>> <filter class="solr.LowerCaseFilterFactory"/> >>> <filter class="solr.StopFilterFactory" >>> >>> ignoreCase="true" >>> >>> words="stopwords.txt"/> >>> </analyzer> >>> <analyzer type="query"> >>> <tokenizer class="solr.PatternTokenizerFactory" >>> pattern="[^a-zA-Z0-9/._:]"/> >>> <filter class="solr.PatternReplaceFilterFactory" >>> pattern="^[/._:]+" replacement=""/> >>> <filter class="solr.PatternReplaceFilterFactory" >>> pattern="[/._:]+$" replacement=""/> >>> <filter 
class="solr.PatternReplaceFilterFactory" >>> pattern="[_]" replacement=" "/> >>> <filter class="solr.LengthFilterFactory" min="2" >>> >>> max="20"/> >>> >>> <filter class="solr.LowerCaseFilterFactory"/> >>> <filter class="solr.StopFilterFactory" >>> >>> ignoreCase="true" >>> >>> words="stopwords.txt"/> >>> </analyzer> >>> </fieldType> >>> >>> stopwords.txt >>> #Standard english stop words taken from Lucene's >>> >>> StopAnalyzer >>> >>> a >>> b >>> c >>> .... >>> an >>> and >>> are >>> >>> Running SolR 6.6.2. >>> >>> Is there anything I could do to prevent this ? >>> >>> Thanks >>> Guilherme >>> >>> >>> >>> >>> >>> >>> >>> -- >>> -- >>> Regards, >>> >>> *Paras Lehana* [65871] >>> Development Engineer, Auto-Suggest, >>> IndiaMART Intermesh Ltd. >>> >>> 8th Floor, Tower A, Advant-Navis Business Park, Sector 142, >>> Noida, UP, IN - 201303 >>> >>> Mob.: +91-9560911996 >>> Work: 01203916600 | Extn: *8173* >>> >>> -- >>> IMPORTANT: >>> NEVER share your IndiaMART OTP/ Password with anyone. >>> >>> >>> >>> >>> >>> >>> -- >>> -- >>> Regards, >>> >>> *Paras Lehana* [65871] >>> Development Engineer, Auto-Suggest, >>> IndiaMART Intermesh Ltd. >>> >>> 8th Floor, Tower A, Advant-Navis Business Park, Sector 142, >>> Noida, UP, IN - 201303 >>> >>> Mob.: +91-9560911996 >>> Work: 01203916600 | Extn: *8173* >>> >>> -- >>> IMPORTANT: >>> NEVER share your IndiaMART OTP/ Password with anyone. >>> >>> >>> >>> >>> >>> >>> >>> -- >>> -- >>> Regards, >>> >>> Paras Lehana [65871] >>> Development Engineer, Auto-Suggest, >>> IndiaMART Intermesh Ltd. >>> >>> 8th Floor, Tower A, Advant-Navis Business Park, Sector 142, >>> Noida, UP, IN - 201303 >>> >>> Mob.: +91-9560911996 <tel:+91-9560911996> >>> Work: 01203916600 | Extn: 8173 >>> >>> IMPORTANT: >>> NEVER share your IndiaMART OTP/ Password with anyone. >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> -- >>> -- >>> Regards, >>> >>> *Paras Lehana* [65871] >>> Development Engineer, Auto-Suggest, >>> IndiaMART Intermesh Ltd. 
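For anyone reading this thread later, here is a rough sketch of why adding an untokenized id field to qf could wipe out the results, and why putting the stopword filter on that field type brought them back. This is a toy model and not Solr's actual code. It assumes that edismax builds one clause per query word across all qf fields, that every clause is required (an mm of 100%, which is a guess about the /search handler), and that a field whose analysis drops the word contributes nothing to that clause. The function names and the dbId value are made up for illustration, and the query is a shortened form of the search term above.

```python
# Toy model of per-word matching across qf fields. NOT Solr's implementation.
# Assumption: every query word must match in at least one qf field (mm=100%).

STOPWORDS = {"a", "an", "and", "are"}  # a few entries from stopwords.txt


def analyze_text(value):
    """Stand-in for a text field chain: lowercase, split, drop stopwords."""
    return [tok for tok in value.lower().split() if tok not in STOPWORDS]


def analyze_string(value):
    """Stand-in for an untokenized string field: the value is one raw token."""
    return [value]


def analyze_string_with_stop(value):
    """Same as analyze_string, but a value that is a stopword is dropped."""
    return [] if value.lower() in STOPWORDS else [value]


def matches(query, doc, qf):
    """Return True if every query word that survives analysis in some field
    is found in at least one of the qf fields of the document."""
    indexed = {field: set(analyzer(doc[field])) for field, analyzer in qf.items()}
    for word in query.split():
        clauses = [
            (field, term)
            for field, analyzer in qf.items()
            for term in analyzer(word)
        ]
        if not clauses:
            # Every field dropped the word, so there is nothing to require.
            continue
        if not any(term in indexed[field] for field, term in clauses):
            # The word survived in some field but matched nowhere.
            return False
    return True


# Hypothetical document; the dbId value is invented for this sketch.
DOC = {
    "name": "Immunoregulatory interactions between a Lymphoid and a non-Lymphoid cell",
    "dbId": "198933",
}

if __name__ == "__main__":
    query = "a non-lymphoid cell"
    print(matches(query, DOC, {"name": analyze_text}))
    print(matches(query, DOC, {"name": analyze_text, "dbId": analyze_string}))
    print(matches(query, DOC, {"name": analyze_text, "dbId": analyze_string_with_stop}))
```

In this model the word "a" disappears from the name field but survives as a clause on dbId, where it can never match a numeric id, so the whole query fails. Filtering stopwords on the id field type removes that clause. Leaving the id fields out of qf for free-text queries would have the same effect in this model while keeping the ids untouched, which may be closer to what Paras meant by re-evaluating the qf list.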