I don’t see the attachments; maybe I deleted the old e-mails or some such. The Apache server is fairly aggressive about stripping attachments, though, so it’s also possible they never made it through.
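Since the JSON didn’t arrive, here is a rough sketch of building the “&debug=query” request by hand, so the parsed query can be read straight from Solr rather than through the application. The host, port, and the core name "reactome" are placeholders I made up, and the helper function is only for illustration; "name" is the field from the posted schema.

```python
from urllib.parse import urlencode


def build_debug_query_url(base_url, core, query):
    """Return a Solr select URL that asks only for parsed-query debug output.

    base_url and core are hypothetical placeholders; adjust to your install.
    """
    params = {
        "q": query,        # the raw query string handed to the query parser
        "debug": "query",  # parsed-query debug only, no relevance explain
        "wt": "json",
    }
    return f"{base_url}/solr/{core}/select?{urlencode(params)}"


# Only the first word is tied to "name"; the second goes to the default field,
# which is exactly the kind of thing the parsed query in the response reveals.
url = build_debug_query_url("http://localhost:8983", "reactome", "name:lymphoid cell")
print(url)
```

Paste the printed URL into a browser or curl and compare the parsed query in the debug section for the search with and without the “a”.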
> On Nov 6, 2019, at 9:28 AM, Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
>
> Thanks Erick.
>
>> First, your index and analysis chains are considerably different, this can
>> easily be a source of problems. In particular, using two different
>> tokenizers is a huge red flag. I _strongly_ recommend against this unless
>> you’re totally sure you understand the consequences. Additionally, your use
>> of the length filter is suspicious, especially since your problem statement
>> is about the addition of a single letter term and the min length allowed on
>> that filter is 2. That said, it’s reasonable to suppose that the ’a’ is
>> filtered out in both cases, but maybe you’ve found something odd about the
>> interactions.
> I will investigate the min length and post the results later.
>
>> Second, I have no idea what this will do. Are the equal signs typos? Used by
>> custom code?
> This the url in my application, not solr params. That's the query string.
>
>> What does “species=“ do? That’s not Solr syntax, so it’s likely that all the
>> params with an equal-sign are totally ignored unless it’s just a typo.
> This is part of the application. Species will be used later on in solr to
> filter out the result. That's not solr. That my app params.
>
>> Third, the easiest way to see what’s happening under the covers is to add
>> “&debug=true” to the query and look at the parsed query. Ignore all the
>> relevance calculations for the nonce, or specify “&debug=query” to skip that
>> part.
> The two json files i've sent, they are debugQuery=on and the explain tag is
> present.
> I will try the searching the way you mentioned.
>
> Thank for your inputs
>
> Guilherme
>
>> On 6 Nov 2019, at 14:14, Erick Erickson <erickerick...@gmail.com> wrote:
>>
>> Fwd to another server
>>
>> First, your index and analysis chains are considerably different, this can
>> easily be a source of problems. In particular, using two different
>> tokenizers is a huge red flag. I _strongly_ recommend against this unless
>> you’re totally sure you understand the consequences. Additionally, your use
>> of the length filter is suspicious, especially since your problem statement
>> is about the addition of a single letter term and the min length allowed on
>> that filter is 2. That said, it’s reasonable to suppose that the ’a’ is
>> filtered out in both cases, but maybe you’ve found something odd about the
>> interactions.
>>
>> Second, I have no idea what this will do. Are the equal signs typos? Used by
>> custom code?
>>
>> >>>> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
>>
>> What does “species=“ do? That’s not Solr syntax, so it’s likely that all the
>> params with an equal-sign are totally ignored unless it’s just a typo.
>>
>> Third, the easiest way to see what’s happening under the covers is to add
>> “&debug=true” to the query and look at the parsed query. Ignore all the
>> relevance calculations for the nonce, or specify “&debug=query” to skip that
>> part.
>>
>> 90% + of the time, the question “why didn’t this query do what I expect” is
>> answered by looking at the “&debug=query” output and the analysis page in
>> the admin UI. NOTE: for the analysis page be sure to look at _both_ the
>> query and index output. Also, and very important about the analysis page
>> (and this is confusing) is that this _assumes_ that what you put in the text
>> boxes have made it through the query parser intact and is analyzed by the
>> field selected. Consider the search "q=field:word1 word2". Now you type
>> “word1 word2” into the analysis text box and it looks like what you expect.
>> That’s misleading because the query is _parsed_ as "field:word1
>> default_search_field:word2”. This is where “&debug=query” helps.
>>
>> Best,
>> Erick
>>
>>> On Nov 6, 2019, at 2:36 AM, Paras Lehana <paras.leh...@indiamart.com> wrote:
>>>
>>> Hi Walter,
>>>
>>>> The solr.StopFilter removes all tokens that are stopwords. Those words will
>>>> not be in the index, so they can never match a query.
>>>
>>>
>>> I think the OP's concern is different results when adding a stopword. I
>>> think he's using the filter factory correctly - the query chain includes
>>> the filter as well so it should remove "a" while querying.
>>>
>>> *@Guilherme*, please post results for both the query, the document in
>>> result you are concerned about and post full result of analysis screen (for
>>> both query and index).
>>>
>>> On Tue, 5 Nov 2019 at 21:38, Walter Underwood <wun...@wunderwood.org> wrote:
>>>
>>>> No.
>>>>
>>>> The solr.StopFilter removes all tokens that are stopwords. Those words
>>>> will not be in the index, so they can never match a query.
>>>>
>>>> 1. Remove the lines with solr.StopFilter from every analysis chain in
>>>> schema.xml.
>>>> 2. Reload the collection, restart Solr, or whatever to read the new config.
>>>> 3. Reindex all of the documents.
>>>>
>>>> When indexed with the new analysis chain, the stopwords will not be
>>>> removed and they will be searchable.
>>>>
>>>> wunder
>>>> Walter Underwood
>>>> wun...@wunderwood.org
>>>> http://observer.wunderwood.org/ (my blog)
>>>>
>>>>> On Nov 5, 2019, at 8:56 AM, Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
>>>>>
>>>>> Ok. I am kind a lost now.
>>>>> If I open up the console > analysis and perform it, that's the final
>>>>> result.
>>>>> <Screenshot 2019-11-05 at 14.54.16.png>
>>>>>
>>>>> Your suggestion is: get rid of the <filter stopword.txt> in the
>>>>> schema.xml and during index phase replaceAll("in stopwords.txt"," ") then
>>>>> add to solr. Is that correct ?
>>>>>
>>>>> Thanks David
>>>>>
>>>>>> On 5 Nov 2019, at 14:48, David Hastings <hastings.recurs...@gmail.com <mailto:hastings.recurs...@gmail.com>> wrote:
>>>>>>
>>>>>> Fwd to another server
>>>>>>
>>>>>> no,
>>>>>> <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>>>> words="stopwords.txt"/>
>>>>>>
>>>>>> is still using stopwords and should be removed, in my opinion of course,
>>>>>> based on your use case may be different, but i generally axe any reference
>>>>>> to them at all
>>>>>>
>>>>>> On Tue, Nov 5, 2019 at 9:47 AM Guilherme Viteri <gvit...@ebi.ac.uk <mailto:gvit...@ebi.ac.uk>> wrote:
>>>>>>
>>>>>>> Thanks.
>>>>>>> Haven't I done this here ?
>>>>>>> <fieldType name="text_field" class="solr.TextField"
>>>>>>>            positionIncrementGap="100" omitNorms="false" >
>>>>>>>   <analyzer type="index">
>>>>>>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>>>>>>     <filter class="solr.ClassicFilterFactory"/>
>>>>>>>     <filter class="solr.LengthFilterFactory" min="2" max="20"/>
>>>>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>>>>>             words="stopwords.txt"/>
>>>>>>>   </analyzer>
>>>>>>>
>>>>>>>
>>>>>>>> On 5 Nov 2019, at 14:15, David Hastings <hastings.recurs...@gmail.com <mailto:hastings.recurs...@gmail.com>> wrote:
>>>>>>>>
>>>>>>>> Fwd to another server
>>>>>>>>
>>>>>>>> The first thing you should do is remove any reference to stop words and
>>>>>>>> never use them, then re-index your data and try it again.
>>>>>>>>
>>>>>>>> On Tue, Nov 5, 2019 at 9:14 AM Guilherme Viteri <gvit...@ebi.ac.uk <mailto:gvit...@ebi.ac.uk>> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I am performing a search to match a name (text_field), however this term
>>>>>>>>> contains 'and' and 'a' and it doesn't return any records. If i remove 'a'
>>>>>>>>> then it works.
>>>>>>>>> e.g
>>>>>>>>> Search Term: lymphoid and a non-lymphoid cell
>>>>>>>>> doesn't work:
>>>>>>>>> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
>>>>>>>>>
>>>>>>>>> Search term: lymphoid and non-lymphoid cell
>>>>>>>>> works:
>>>>>>>>> https://dev.reactome.org/content/query?q=lymphoid+and+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
>>>>>>>>> interested in the first result
>>>>>>>>>
>>>>>>>>> schema.xml
>>>>>>>>> <field name="name" type="text_field"
>>>>>>>>>        indexed="true" stored="true" omitNorms="false" required="true"
>>>>>>>>>        multiValued="false"/>
>>>>>>>>>
>>>>>>>>> <analyzer type="query">
>>>>>>>>>   <tokenizer class="solr.PatternTokenizerFactory"
>>>>>>>>>              pattern="[^a-zA-Z0-9/._:]"/>
>>>>>>>>>   <filter class="solr.PatternReplaceFilterFactory"
>>>>>>>>>           pattern="^[/._:]+" replacement=""/>
>>>>>>>>>   <filter class="solr.PatternReplaceFilterFactory"
>>>>>>>>>           pattern="[/._:]+$" replacement=""/>
>>>>>>>>>   <filter class="solr.PatternReplaceFilterFactory"
>>>>>>>>>           pattern="[_]" replacement=" "/>
>>>>>>>>>   <filter class="solr.LengthFilterFactory" min="2" max="20"/>
>>>>>>>>>   <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>>>>   <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>>>>>>>           words="stopwords.txt"/>
>>>>>>>>> </analyzer>
>>>>>>>>>
>>>>>>>>> <fieldType name="text_field" class="solr.TextField"
>>>>>>>>>            positionIncrementGap="100" omitNorms="false" >
>>>>>>>>>   <analyzer type="index">
>>>>>>>>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>>>>>>>>     <filter class="solr.ClassicFilterFactory"/>
>>>>>>>>>     <filter class="solr.LengthFilterFactory" min="2" max="20"/>
>>>>>>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>>>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>>>>>>>             words="stopwords.txt"/>
>>>>>>>>>   </analyzer>
>>>>>>>>>   <analyzer type="query">
>>>>>>>>>     <tokenizer class="solr.PatternTokenizerFactory"
>>>>>>>>>                pattern="[^a-zA-Z0-9/._:]"/>
>>>>>>>>>     <filter class="solr.PatternReplaceFilterFactory"
>>>>>>>>>             pattern="^[/._:]+" replacement=""/>
>>>>>>>>>     <filter class="solr.PatternReplaceFilterFactory"
>>>>>>>>>             pattern="[/._:]+$" replacement=""/>
>>>>>>>>>     <filter class="solr.PatternReplaceFilterFactory"
>>>>>>>>>             pattern="[_]" replacement=" "/>
>>>>>>>>>     <filter class="solr.LengthFilterFactory" min="2" max="20"/>
>>>>>>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>>>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>>>>>>>             words="stopwords.txt"/>
>>>>>>>>>   </analyzer>
>>>>>>>>> </fieldType>
>>>>>>>>>
>>>>>>>>> stopwords.txt
>>>>>>>>> #Standard english stop words taken from Lucene's StopAnalyzer
>>>>>>>>> a
>>>>>>>>> b
>>>>>>>>> c
>>>>>>>>> ....
>>>>>>>>> an
>>>>>>>>> and
>>>>>>>>> are
>>>>>>>>>
>>>>>>>>> Running SolR 6.6.2.
>>>>>>>>>
>>>>>>>>> Is there anything I could do to prevent this ?
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Guilherme
>>>
>>> --
>>> Regards,
>>>
>>> *Paras Lehana* [65871]
>>> Development Engineer, Auto-Suggest,
>>> IndiaMART Intermesh Ltd.
>>>
>>> 8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
>>> Noida, UP, IN - 201303
>>>
>>> Mob.: +91-9560911996
>>> Work: 01203916600 | Extn: *8173*
>>>
>>> --
>>> IMPORTANT:
>>> NEVER share your IndiaMART OTP/ Password with anyone.
>>
>