Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

Paras Lehana Thu, 14 Nov 2019 21:37:49 -0800

Hey Guilherme,

I was a bit busy for the past few days and couldn't read your mail. So, did
you find anything? Anyway, as I had expected, the culprit is definitely
among the qfs. Do the documents in question contain dbId? I suggest you
cross-check the fields in your document against the qf fields that affect
the result.
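
The cross-check suggested above can be sketched in a few lines of Python. This is only an illustration: the document, its stId value, and the qf string are hypothetical placeholders built from field names mentioned in this thread, and the helper names are made up for the example.

```python
def parse_qf(qf: str) -> dict[str, float]:
    """Split an edismax qf string such as 'name^10 dbId' into {field: boost}."""
    boosts = {}
    for item in qf.split():
        field, _, boost = item.partition("^")
        # A field without an explicit boost is treated as weight 1.0.
        boosts[field] = float(boost) if boost else 1.0
    return boosts


def qf_fields_missing_from(doc: dict, qf: str) -> list[str]:
    """Return the qf fields that have no value in the given document."""
    return [field for field in parse_qf(qf) if field not in doc]


# Hypothetical document (e.g. copied from a /select response with fl=*).
doc = {
    "name": "Immunoregulatory interactions between a Lymphoid and a non-Lymphoid cell",
    "stId": "R-HSA-EXAMPLE",  # placeholder value, not a real identifier
}

# Hypothetical qf; the real one lives in the /search handler in solrconfig.xml.
missing = qf_fields_missing_from(doc, "name^10 name_exact^10 stId dbId")
print(missing)
```

Any field reported as missing is a candidate for closer inspection in the debug output.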


On Tue, 12 Nov 2019 at 16:14, Guilherme Viteri <gvit...@ebi.ac.uk> wrote:

> What I can't understand is:
> if I search for the exact term "Immunoregulatory interactions between a
> Lymphoid *and a* non-Lymphoid cell", it doesn't work, but if I search for
> "Immunoregulatory interactions between a Lymphoid *and* non-Lymphoid
> cell", then it works.
>
> On 11 Nov 2019, at 12:24, Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
>
> Thanks
>
> > Removing stopwords is another story. I'm curious to find the reason
> > assuming that you keep on using stopwords. In some cases, stopwords are
> > really necessary.
>
> Yes. It has always made sense the way we've been using them.
>
> > If q.alt is giving you responses, it's confirmed that your stopwords filter
> > is working as expected. The problem definitely lies in the configuration of
> > edismax.
>
> I see.
>
> > *Let me explain again:* In your solrconfig.xml, look at your /search
>
> OK, using q now, I removed all qf, performed the search, and got 23
> results, with the one I really want on top.
> As soon as I add dbId or stId (regardless of the boost, 1.0 or 100.0), I
> don't get anything (which makes sense). However, if I query name_exact, I get
> the 23 results again, and unfortunately if I query stId^1.0 name_exact^10.0
> I still don't get any results.
>
> In summary
> - without qf - 23 results
> - dbId - 0 results
> - name_exact - 16 results
> - name - 23 results
> - dbId^1.0 name_exact^10.0 - 0 results
> - 0 results if any other field, stId or dbId (key), is added on top of the
> name fields (name, name_exact, etc).
>
> Definitely lost here! :-/
>
>
> On 11 Nov 2019, at 07:59, Paras Lehana <paras.leh...@indiamart.com> wrote:
>
> Hi
>
> > So I don't think removing it completely is the way to go from the scenario
> > we have
>
> Removing stopwords is another story. I'm curious to find the reason
> assuming that you keep on using stopwords. In some cases, stopwords are
> really necessary.
>
> > Quite a considerable increase
>
> If q.alt is giving you responses, it's confirmed that your stopwords filter
> is working as expected. The problem definitely lies in the configuration of
> edismax.
>
> > I am sorry but I didn't understand what do you want me to do exactly with
> > the lst (??) and qf and bf.
>
> What combinations did you try? I was referring to the field-level boosting
> you have applied in edismax config.
>
> *Let me explain again:* In your solrconfig.xml, look at your /search
> request handler. There are many qf and some bq boosts. I want you to remove
> all of these, check response again (with q now) and keep on adding them
> again (one by one) while looking for when the numFound drastically changes.
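
A minimal Python sketch of the "add them back one by one" procedure described above. It only builds the request URLs; it does not contact Solr. The base URL, core name, and field list are assumptions for illustration, and passing qf on the request only has an effect if the handler does not lock it down in its configuration.

```python
from urllib.parse import urlencode


def incremental_qf_urls(base_url, query, fields):
    """Yield (qf, url) pairs, adding one qf field at a time."""
    for i in range(1, len(fields) + 1):
        qf = " ".join(fields[:i])
        params = {
            "q": query,
            "defType": "edismax",
            "qf": qf,
            "rows": 0,          # only numFound matters here
            "debug": "query",   # include the parsed query in the response
        }
        yield qf, f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and boosts; replace with the values from solrconfig.xml.
BASE = "http://localhost:8983/solr/mycore/select"
FIELDS = ["name^10.0", "name_exact^10.0", "stId^1.0", "dbId^1.0"]

steps = list(incremental_qf_urls(BASE, "lymphoid and a non-lymphoid cell", FIELDS))
for qf, url in steps:
    print(qf, "->", url)
```

Requesting each URL in turn and noting numFound shows at which field the result count collapses.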
>
> On Fri, 8 Nov 2019 at 23:47, David Hastings <hastings.recurs...@gmail.com>
> wrote:
>
> I use 3 word shingles with stopwords for my MLT ML trainer that worked
> pretty well for such a solution, but for a full index the size became
> prohibitive
>
> On Fri, Nov 8, 2019 at 12:13 PM Walter Underwood <wun...@wunderwood.org>
> wrote:
>
> If we had IDF for phrases, they would be super effective. The 2X weight is
> a hack that mostly works.
>
> Infoseek had phrase IDF and it was a killer algorithm for relevance.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> On Nov 8, 2019, at 11:08 AM, David Hastings <hastings.recurs...@gmail.com>
> wrote:
>
>
> the pf and qf fields are REALLY nice for this
>
> On Fri, Nov 8, 2019 at 12:02 PM Walter Underwood <wun...@wunderwood.org>
> wrote:
>
> I always enable phrase searching in edismax for exactly this reason.
>
> Something like:
>
>     <str name="qf">title^8 keywords^4 text</str>
>     <str name="pf">title^16 keywords^8 text^2</str>
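
In the qf/pf pair above, each pf weight is twice the corresponding qf weight (a field with no explicit boost counts as 1). A small Python sketch of deriving one from the other, with a helper name made up for this example:

```python
def double_boosts(qf: str) -> str:
    """Build a pf string from a qf string by doubling every field weight."""
    out = []
    for item in qf.split():
        field, _, boost = item.partition("^")
        weight = float(boost) if boost else 1.0  # no boost means weight 1
        out.append(f"{field}^{weight * 2:g}")
    return " ".join(out)


pf = double_boosts("title^8 keywords^4 text")
print(pf)  # title^16 keywords^8 text^2
```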
>
> To deal with concepts in queries, a classifier and/or named entity
> extractor can be helpful. If you have a list of concepts (“controlled
> vocabulary”) that includes “Lamin A”, and that shows up in a query, that
> term can be queried against the field matching that vocabulary.
>
> This is how LinkedIn separates people, companies, and places, for example.
>
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> On Nov 8, 2019, at 10:48 AM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
> Look at the “mm” parameter, try setting it to 100%. Although that’s not
> entirely likely to do what you want either since virtually every doc will
> have “a” in it. But at least you’d get docs that have both terms.
>
> You may also be able to search for things like “Lamin A” _only as a
> phrase_ and have some luck. But this is a gnarly problem in general. Some
> people have been able to substitute synonyms and/or shingles to make this
> work at the expense of a larger index.
>
> This is a generic problem with context. “Lamin A” is really a “concept”,
> not just two words that happen to be near each other. Searching as a phrase
> is an OOB-but-naive way to try to make it more likely that the ranked
> results refer to the _concept_ of “Lamin A”. The assumption here is “if
> these two words appear next to each other, they’re more likely to be what I
> want”. I say “naive” because “Lamins: A new approach to...” would _also_ be
> found for a naive phrase search. (I have no idea whether such a title makes
> sense or not, but you figured that out already)...
>
> To do this well you’d have to dive in to NLP/Machine learning.
>
> I truly wish we could have the DWIM search algorithm (Do What I Mean)….
>
>
> On Nov 8, 2019, at 11:29 AM, Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
>
> Hi Walter and Paras
>
> I indexed it removing all the references to StopWordFilter and I went
> from 121 results to nearly 20K, as the search term q="Lymphoid and a
> non-Lymphoid cell" is matching entities such as "IFT A" or "Lamin A". So I
> don't think removing it completely is the way to go from the scenario we
> have, but I appreciate the suggestion…
>
> Yes the response is using fl=*
> I am trying some combinations at the moment, but no success yet.
>
> defType=edismax
> q.alt=Lymphoid and a non-Lymphoid cell
> Number of results=1599
> Quite a considerable increase, even though reasonably meaningful results.
>
> I am sorry but I didn't understand what you want me to do exactly
> with the lst (??) and qf and bf.
>
> Thanks everyone for their input
>
>
> On 8 Nov 2019, at 06:45, Paras Lehana <paras.leh...@indiamart.com> wrote:
>
> Hi Guilherme
>
> > By accident, I ended up querying using the default handler
> > (/select) and it worked.
>
> You've just found the culprit. Thanks for giving the material I
> requested. Your analysis chain is working as expected. I don't see any
> issue in either StopWordFilter or your boosts. I also use a boost of 50
> when boosting contextual suggestions (boosting "gold iphone" on a page of
> iphone) but I take Walter's suggestion and would try to optimize my
> weights. I agree that this 50 thing was not researched much by us
> either (we never faced performance or relevance issues).
>
> See the major difference in both the handlers - edismax. I'm pretty
> sure that your problem lies in the parsing of queries (you can confirm that
> from the parsedquery key in debug of both JSON responses). I hope you have
> provided the response with fl=*. Replace q with q.alt in your /search
> handler query and I think you should start getting responses. That's
> because q.alt uses the standard parser. If you want to keep using edisMax, I
> suggest you test the responses removing some combination of lst (qf, bf)
> and find what's restricting the documents from coming up. I'm out of office
> today - would have certainly tried analyzing the field values of the
> document in the /select request and comparing them with qf/bq in
> solrconfig.xml /search. Do this for me and you'd certainly find something.
>
>
> On Thu, 7 Nov 2019 at 21:00, Walter Underwood <wun...@wunderwood.org> wrote:
>
> I normally use a weight of 8 for the most important field, like title.
> Other fields might get a 4 or 2.
>
> I add a “pf” field with the weights doubled, so that phrase matches
> have a higher weight.
>
> The weight of 8 comes from experience at Infoseek and Inktomi, two
> early web search engines. With different relevance algorithms and totally
> different evaluation and tuning systems, they settled on weights of 8 and
> 7.5 for HTML titles. With the two radically different systems getting
> the same number, I decided that was a property of the documents, not of the
> search engines.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> On Nov 7, 2019, at 9:03 AM, Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
>
> Hi Wunder,
>
> My indexer takes quite a few hours to run. I am shortening it
> to run faster, but I also need to make sure it gives what we are expecting.
> This implementation's been there for >4y, and massively used.
>
> > In your edismax handlers, weights of 20, 50, and 100 are extremely
> > high. I don’t think I’ve ever used a weight higher than 16 in a dozen years
> > of configuring Solr.
>
> I've inherited that implementation and I am really keen to adjust
> it. What would you recommend?
>
> Cheers
> Guilherme
>
> On 7 Nov 2019, at 14:43, Walter Underwood <wun...@wunderwood.org> wrote:
>
> Thanks for posting the files. Looking at schema.xml, I see that you
> still are using StopFilterFactory. The first advice we gave you was to
> remove that.
>
> Remove StopFilterFactory everywhere and reindex.
>
> You will continue to have problems matching stopwords until you do that.
>
> In your edismax handlers, weights of 20, 50, and 100 are extremely
> high. I don’t think I’ve ever used a weight higher than 16 in a dozen years
> of configuring Solr.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> On Nov 7, 2019, at 6:56 AM, Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
>
> Hi Paras, everyone
>
> Thank you again for your input and suggestions. I am sorry to hear
> you had trouble with the attachments. I will host them somewhere and share
> the links.
>
> I don't tweak my index. I get the data from the graph database,
> create the documents as they are and save them to Solr.
>
> So, I am sending the new analysis screen, querying the way you
> suggested, along with the results with params and the Solr query URL.
>
> During the process of querying what you asked, I found something
> really weird (at least for me). By accident, I ended up querying using
> the default handler (/select) and it worked. Then if I use the one I must
> use, it sadly doesn't work. I am posting both results and I will also
> post the handlers as well.
>
> Here is the link with all the files mentioned before:
>
> https://www.dropbox.com/sh/fymfm1q94zum1lx/AADwU1c9EUf2A4d7FtzSKR54a?dl=0
>
> If the link doesn't work: www dot dropbox dot com slash sh slash
> fymfm1q94zum1lx/AADwU1c9EUf2A4d7FtzSKR54a ? dl equals 0
>
> Thanks
>
> On 7 Nov 2019, at 05:23, Paras Lehana <paras.leh...@indiamart.com> wrote:
>
> Hi Guilherme.
>
> > I am sending the analysis result and the json result as requested.
>
> Thanks for the effort. Luckily, I can see your attachments (low quality
> though).
>
> From the analysis screen, the analysis is working as expected. One of the
> reasons for query="lymphoid and *a* non-lymphoid cell" not matching a
> document containing "Lymphoid and a non-Lymphoid cell" I can initially
> think of is: the stopword "a" is probably present post-analysis in either
> the query or the index. Did you tweak your index-time analysis after
> indexing?
>
> Do two things:
>
> 1. Post the analysis screen for index=*"Immunoregulatory
> interactions between a Lymphoid and a non-Lymphoid cell"* and
> query=*"lymphoid and a non-lymphoid cell"*. Try hosting the image and
> providing the link here.
> 2. Give the same JSON output as you have sent but this time with
> *"echoParams=all"*. Also, post the exact Solr query URL.
>
>
>
> On Wed, 6 Nov 2019 at 21:07, Erick Erickson <erickerick...@gmail.com> wrote:
>
> I don’t see the attachments, maybe I deleted old e-mails or some such. The
> Apache server is fairly aggressive about stripping attachments though, so
> it’s also possible they didn’t make it through.
>
> On Nov 6, 2019, at 9:28 AM, Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
>
> Thanks Erick.
>
> > First, your index and analysis chains are considerably different, this
> > can easily be a source of problems. In particular, using two different
> > tokenizers is a huge red flag. I _strongly_ recommend against this unless
> > you’re totally sure you understand the consequences. Additionally, your use
> > of the length filter is suspicious, especially since your problem statement
> > is about the addition of a single letter term and the min length allowed on
> > that filter is 2. That said, it’s reasonable to suppose that the ’a’ is
> > filtered out in both cases, but maybe you’ve found something odd about the
> > interactions.
>
> I will investigate the min length and post the results later.
>
> > Second, I have no idea what this will do. Are the equal signs typos?
> > Used by custom code?
>
> This is the URL in my application, not Solr params. That's the query string.
>
> > What does “species=“ do? That’s not Solr syntax, so it’s likely that
> > all the params with an equal-sign are totally ignored unless it’s just a
> > typo.
>
> This is part of the application. Species will be used later on in Solr
> to filter the results. That's not Solr. Those are my app's params.
>
> > Third, the easiest way to see what’s happening under the covers is to
> > add “&debug=true” to the query and look at the parsed query. Ignore all the
> > relevance calculations for the nonce, or specify “&debug=query” to skip
> > that part.
>
> The two JSON files I've sent were produced with debugQuery=on, and the
> explain tag is present.
>
> I will try searching the way you mentioned.
>
> Thanks for your input
>
> Guilherme
>
> On 6 Nov 2019, at 14:14, Erick Erickson <erickerick...@gmail.com> wrote:
>
> Fwd to another server
>
> First, your index and analysis chains are considerably different, this
> can easily be a source of problems. In particular, using two different
> tokenizers is a huge red flag. I _strongly_ recommend against this unless
> you’re totally sure you understand the consequences. Additionally, your use
> of the length filter is suspicious, especially since your problem statement
> is about the addition of a single letter term and the min length allowed on
> that filter is 2. That said, it’s reasonable to suppose that the ’a’ is
> filtered out in both cases, but maybe you’ve found something odd about the
> interactions.
>
> Second, I have no idea what this will do. Are the equal signs typos?
> Used by custom code?
>
> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
>
> What does “species=“ do? That’s not Solr syntax, so it’s likely that
> all the params with an equal-sign are totally ignored unless it’s just a
> typo.
>
> Third, the easiest way to see what’s happening under the covers is to
> add “&debug=true” to the query and look at the parsed query. Ignore all the
> relevance calculations for the nonce, or specify “&debug=query” to skip
> that part.
>
> 90% + of the time, the question “why didn’t this query do what I
> expect” is answered by looking at the “&debug=query” output and the
> analysis page in the admin UI. NOTE: for the analysis page be sure to look
> at _both_ the query and index output. Also, and very important about the
> analysis page (and this is confusing) is that this _assumes_ that what you
> put in the text boxes have made it through the query parser intact and is
> analyzed by the field selected. Consider the search "q=field:word1 word2".
> Now you type “word1 word2” into the analysis text box and it looks like
> what you expect. That’s misleading because the query is _parsed_ as
> "field:word1 default_search_field:word2". This is where “&debug=query”
> helps.
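
The field-prefix point above can be illustrated with a toy Python function. This is not Solr's query parser, only a simplified model of the one behaviour being described: a `field:` prefix applies to the term it is attached to, and bare terms fall back to the default field.

```python
def toy_parse(q: str, default_field: str) -> list[str]:
    """Toy model: attach the default field to every term lacking a prefix."""
    clauses = []
    for token in q.split():
        if ":" in token:
            clauses.append(token)                      # explicit field:term
        else:
            clauses.append(f"{default_field}:{token}")  # bare term
    return clauses


parsed = " ".join(toy_parse("field:word1 word2", "default_search_field"))
print(parsed)  # field:word1 default_search_field:word2
```

The real parsed query for any request is what “&debug=query” reports; the toy only shows why the analysis page alone can mislead.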
>
>
> Best,
> Erick
>
> On Nov 6, 2019, at 2:36 AM, Paras Lehana <paras.leh...@indiamart.com> wrote:
>
> Hi Walter,
>
> > The solr.StopFilter removes all tokens that are stopwords. Those words
> > will not be in the index, so they can never match a query.
>
> I think the OP's concern is different results when adding a stopword. I
> think he's using the filter factory correctly - the query chain includes
> the filter as well so it should remove "a" while querying.
>
> *@Guilherme*, please post the results for both queries, the document in
> the results you are concerned about, and the full output of the analysis
> screen (for both query and index).
>
> On Tue, 5 Nov 2019 at 21:38, Walter Underwood <wun...@wunderwood.org> wrote:
>
> No.
>
> The solr.StopFilter removes all tokens that are stopwords. Those words
> will not be in the index, so they can never match a query.
>
> 1. Remove the lines with solr.StopFilter from every analysis chain in
> schema.xml.
> 2. Reload the collection, restart Solr, or whatever to read the new
> config.
> 3. Reindex all of the documents.
>
> When indexed with the new analysis chain, the stopwords will not be
> removed and they will be searchable.
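
A toy Python model of what the stop filter does to the text from this thread. It is not Lucene: the tokenizer is a plain regex, the stopword set is only the subset visible in the stopwords.txt excerpt quoted in this thread, and the filter order (length, lowercase, stop) loosely follows the index analyzer posted by the OP.

```python
import re

# Subset of stopwords.txt as quoted in the thread; the full file is longer.
STOPWORDS = {"a", "an", "and", "are"}


def toy_analyze(text, min_len=2, max_len=20):
    """Tokenize, apply a length filter, lowercase, then drop stopwords."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)                     # crude tokenizer
    tokens = [t for t in tokens if min_len <= len(t) <= max_len]   # length filter
    tokens = [t.lower() for t in tokens]                           # lowercase
    return [t for t in tokens if t not in STOPWORDS]               # stop filter


with_a = toy_analyze("Lymphoid and a non-Lymphoid cell")
without_a = toy_analyze("lymphoid and non-lymphoid cell")
print(with_a)     # ['lymphoid', 'non', 'lymphoid', 'cell']
print(without_a)  # ['lymphoid', 'non', 'lymphoid', 'cell']
```

In this toy model neither "and" nor "a" survives analysis, so neither can be matched as a term, which is the point made above about removed stopwords never being in the index.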
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> On Nov 5, 2019, at 8:56 AM, Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
>
> Ok. I am kind of lost now.
> If I open up the console > analysis and perform it, that's the final
> result.
>
> <Screenshot 2019-11-05 at 14.54.16.png>
>
> Your suggestion is: get rid of the <filter stopword.txt> in the
> schema.xml and during index phase replaceAll("in stopwords.txt"," ") then
> add to solr. Is that correct ?
>
> Thanks David
>
> On 5 Nov 2019, at 14:48, David Hastings <hastings.recurs...@gmail.com> wrote:
>
> Fwd to another server
>
> no,
>   <filter class="solr.StopFilterFactory" ignoreCase="true"
>     words="stopwords.txt"/>
>
> is still using stopwords and should be removed, in my opinion of course,
> based on your use case may be different, but i generally axe any reference
> to them at all
>
> On Tue, Nov 5, 2019 at 9:47 AM Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
>
> Thanks.
> Haven't I done this here ?
> <fieldType name="text_field" class="solr.TextField"
> positionIncrementGap="100" omitNorms="false" >
> <analyzer type="index">
>   <tokenizer class="solr.StandardTokenizerFactory"/>
>   <filter class="solr.ClassicFilterFactory"/>
>   <filter class="solr.LengthFilterFactory" min="2" max="20"/>
>   <filter class="solr.LowerCaseFilterFactory"/>
>   <filter class="solr.StopFilterFactory" ignoreCase="true"
>     words="stopwords.txt"/>
> </analyzer>
>
>
> On 5 Nov 2019, at 14:15, David Hastings <hastings.recurs...@gmail.com> wrote:
>
> Fwd to another server
>
> The first thing you should do is remove any reference to stop words and
> never use them, then re-index your data and try it again.
>
>
> On Tue, Nov 5, 2019 at 9:14 AM Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
>
> Hi,
>
> I am performing a search to match a name (text_field), however this term
> contains 'and' and 'a' and it doesn't return any records. If I remove 'a'
> then it works.
> e.g
> Search Term: lymphoid and a non-lymphoid cell
> doesn't work:
> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
>
> Search term: lymphoid and non-lymphoid cell
> works:
> https://dev.reactome.org/content/query?q=lymphoid+and+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
>
> interested in the first result
>
> schema.xml
> <field name="name" type="text_field" indexed="true" stored="true"
>   omitNorms="false" required="true" multiValued="false"/>
>
> <analyzer type="query">
>   <tokenizer class="solr.PatternTokenizerFactory"
>     pattern="[^a-zA-Z0-9/._:]"/>
>   <filter class="solr.PatternReplaceFilterFactory"
>     pattern="^[/._:]+" replacement=""/>
>   <filter class="solr.PatternReplaceFilterFactory"
>     pattern="[/._:]+$" replacement=""/>
>   <filter class="solr.PatternReplaceFilterFactory"
>     pattern="[_]" replacement=" "/>
>   <filter class="solr.LengthFilterFactory" min="2" max="20"/>
>   <filter class="solr.LowerCaseFilterFactory"/>
>   <filter class="solr.StopFilterFactory" ignoreCase="true"
>     words="stopwords.txt"/>
> </analyzer>
>
> <fieldType name="text_field" class="solr.TextField"
>   positionIncrementGap="100" omitNorms="false" >
> <analyzer type="index">
>   <tokenizer class="solr.StandardTokenizerFactory"/>
>   <filter class="solr.ClassicFilterFactory"/>
>   <filter class="solr.LengthFilterFactory" min="2" max="20"/>
>   <filter class="solr.LowerCaseFilterFactory"/>
>   <filter class="solr.StopFilterFactory" ignoreCase="true"
>     words="stopwords.txt"/>
> </analyzer>
> <analyzer type="query">
>   <tokenizer class="solr.PatternTokenizerFactory"
>     pattern="[^a-zA-Z0-9/._:]"/>
>   <filter class="solr.PatternReplaceFilterFactory"
>     pattern="^[/._:]+" replacement=""/>
>   <filter class="solr.PatternReplaceFilterFactory"
>     pattern="[/._:]+$" replacement=""/>
>   <filter class="solr.PatternReplaceFilterFactory"
>     pattern="[_]" replacement=" "/>
>   <filter class="solr.LengthFilterFactory" min="2" max="20"/>
>   <filter class="solr.LowerCaseFilterFactory"/>
>   <filter class="solr.StopFilterFactory" ignoreCase="true"
>     words="stopwords.txt"/>
> </analyzer>
> </fieldType>
>
> stopwords.txt
> #Standard english stop words taken from Lucene's StopAnalyzer
>
> a
> b
> c
> ....
> an
> and
> are
>
> Running SolR 6.6.2.
>
> Is there anything I could do to prevent this ?
>
> Thanks
> Guilherme
>
>
>
>
>
>
>
> --
> --
> Regards,
>
> *Paras Lehana* [65871]
> Development Engineer, Auto-Suggest,
> IndiaMART Intermesh Ltd.
>
> 8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
> Noida, UP, IN - 201303
>
> Mob.: +91-9560911996
> Work: 01203916600 | Extn:  *8173*
>
> --
> IMPORTANT:
> NEVER share your IndiaMART OTP/ Password with anyone.
>

-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*

-- 
IMPORTANT: 
NEVER share your IndiaMART OTP/ Password with anyone.
