Re: Phrase query no hits when stopwords and FlattenGraphFilterFactory used

2020-11-11 Thread Edward Turner
Many thanks Walter, that's useful information. And yes, if we are able to keep stopwords, then we will. We have been exploring it because we've noticed its use leads to a sizable drop in index size (5%, in some of our tests), which then had the knock on effect of better performa

Re: Phrase query no hits when stopwords and FlattenGraphFilterFactory used

2020-11-10 Thread Walter Underwood
By far the simplest solution is to leave stopwords in the index. That also improves relevance, because it becomes possible to search for “vitamin a” or “to be or not to be”. Stopword removal was a performance and disk space hack from the 1960s. It is no longer needed. We were keeping stopwords

Re: Phrase query no hits when stopwords and FlattenGraphFilterFactory used

2020-11-10 Thread Edward Turner
Hi all, Okay, I've been doing more research about this problem and from what I understand, phrase queries + stopwords are known to have some difficulties working together in some circumstances. E.g., https://stackoverflow.com/questions/56802656/stopwords-and-phrase-queries-solr?rq=1

Phrase query no hits when stopwords and FlattenGraphFilterFactory used

2020-11-06 Thread Edward Turner
Hi all, We are experiencing some unexpected behaviour for phrase queries which we believe might be related to the FlattenGraphFilterFactory and stopwords. Brief description: when performing a phrase query "Molecular cloning and evolution of the" => we get expected hits "Mol

Re: Avoiding single digit and single charcater ONLY query by putting them in stopwords list

2020-10-27 Thread Mark Robinson
Thanks! Mark On Tue, Oct 27, 2020 at 11:56 AM Dave wrote: > Agreed. Just a JavaScript check on the input box would work fine for 99% > of cases, unless something automatic is running them in which case just > server side redirect back to the form. > > > On Oct 27, 2020, at 11:54 AM, Mark Robins

Re: Avoiding single digit and single charcater ONLY query by putting them in stopwords list

2020-10-27 Thread Dave
Agreed. Just a JavaScript check on the input box would work fine for 99% of cases, unless something automatic is running them in which case just server side redirect back to the form. > On Oct 27, 2020, at 11:54 AM, Mark Robinson wrote: > > Hi Konstantinos , > > Thanks for the reply. > I t

Re: Avoiding single digit and single charcater ONLY query by putting them in stopwords list

2020-10-27 Thread Mark Robinson
Hi Konstantinos , Thanks for the reply. I too feel the same. Wanted to find what others also in the Solr world thought about it. Thanks! Mark. On Tue, Oct 27, 2020 at 11:45 AM Konstantinos Koukouvis < konstantinos.koukou...@mecenat.com> wrote: > Oh hi Mark! > > Why would you wanna do such a th

Re: Avoiding single digit and single charcater ONLY query by putting them in stopwords list

2020-10-27 Thread Konstantinos Koukouvis
Oh hi Mark! Why would you wanna do such a thing in the solr end. Imho it would be much more clean and easy to do it on the client side Regards, Konstantinos > On 27 Oct 2020, at 16:42, Mark Robinson wrote: > > Hello, > > I want to block queries having only a digit like "1" or "2" ,... o

Avoiding single digit and single charcater ONLY query by putting them in stopwords list

2020-10-27 Thread Mark Robinson
Hello, I want to block queries having only a digit like "1" or "2" ,... or just a letter like "a" or "b" ... Is it a good idea to block them ... ie just single digits 0 - 9 and a - z by putting them as a stop word? The problem with this I can anticipate is a query like "1 inch screw" can hav
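The replies in this thread suggest doing this check outside Solr rather than through the stopword list. A minimal sketch of such a pre-check, assuming it runs in application code before the request is sent; the function name `is_single_char_query` is hypothetical and not part of Solr:

```python
import re

# Hypothetical pre-check, run in the application before calling Solr.
# Matches a query that consists of exactly one ASCII digit or letter.
_SINGLE_CHAR = re.compile(r"[0-9A-Za-z]")


def is_single_char_query(q: str) -> bool:
    """Return True when the whole query is a single digit or letter."""
    return _SINGLE_CHAR.fullmatch(q.strip()) is not None
```

Because only the complete query is tested, a query such as "1 inch screw" is not rejected and the index keeps single-character tokens.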

RE: advice on whether to use stopwords for use case

2020-10-01 Thread Markus Jelsma
-solr.PatternReplaceCharFilterFactory -Original message- > From:Walter Underwood > Sent: Thursday 1st October 2020 18:20 > To: solr-user@lucene.apache.org > Subject: Re: advice on whether to use stopwords for use case > > I can’t think of an easy way to do this in Solr.

Re: advice on whether to use stopwords for use case

2020-10-01 Thread Walter Underwood
> Thinking further, using stopwords for this, there will still be results > return when the number of words in the search keywords is more than the > stopwords. > > On 1/10/2020 2:57 am, Walter Underwood wrote: >> I’m not clear on the requirements. It sounds like the que

Re: advice on whether to use stopwords for use case

2020-09-30 Thread Derek Poh
Yes, the requirements (for now) is not to return any results. I think they may change the requirements,pending their return from the holidays. If so, then check for those words in the query before sending it to Solr. That is what I think so too. Thinking further, using stopwords for this

Re: advice on whether to use stopwords for use case

2020-09-30 Thread Derek Poh
Hi Alex The business requirement (for now) is not to return any result when the search keywords are cigarette related. The business user team will provide the list of the cigarette related keywords. Will digest, explore and research on your suggestions. Thank you. On 30/9/2020 10:56 am, Alex

Re: advice on whether to use stopwords for use case

2020-09-30 Thread Walter Underwood
I’m not clear on the requirements. It sounds like the query “cigar” or “cuban cigar” should return zero results. Is that right? If so, then check for those words in the query before sending it to Solr. But the stopwords approach seems like the requirement is different. Could you give some
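The suggestion to check the query before sending it to Solr could look roughly like the sketch below. The term list and the function name are assumptions for illustration; in the thread, the actual keywords are to be supplied by the business team.

```python
import re

# Hypothetical block list; the real keywords would come from the business team.
BLOCKED_TERMS = frozenset({"cigar", "cigarette"})


def contains_blocked_term(query: str, blocked=BLOCKED_TERMS) -> bool:
    """Return True if any whitespace/punctuation-separated word is blocked."""
    tokens = re.findall(r"\w+", query.lower())
    return any(token in blocked for token in tokens)
```

If this returns True, the application can return an empty result set without querying Solr at all. The sketch matches exact words only, so a plural such as "cigars" would need to be listed explicitly or handled with stemming.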

Re: advice on whether to use stopwords for use case

2020-09-30 Thread Alexandre Rafalovitch
You may also want to look at something like: https://docs.querqy.org/index.html ApacheCon had (is having..) a presentation on it that seemed quite relevant to your needs. The videos should be live in a week or so. Regards, Alex. On Tue, 29 Sep 2020 at 22:56, Alexandre Rafalovitch wrote: > >

Re: advice on whether to use stopwords for use case

2020-09-29 Thread Alexandre Rafalovitch
I am not sure why you think stop words are your first choice. Maybe I misunderstand the question. I read it as that you need to exclude completely a set of documents that include specific keywords when called from specific module. If I wanted to differentiate the searches from specific module, I w

advice on whether to use stopwords for use case

2020-09-29 Thread Derek Poh
Hi I have read in the mailings list that we should try to avoid using stop words. I have a use case where I would like to know if there is other alternative solutions beside using stop words. There is business requirement to return zero result when the search is cigarette related words and

Re: Constant score and stopwords strange behaviour

2020-06-25 Thread Paras Lehana
Hi, You can also change the multiplication factor in TF IDF snipped in the source code to 1 also. I know there would be a better method to handle stopwords now that you have used constant scoring but I wanted to mention my method by what we got rid of TF. On Thu, 25 Jun 2020 at 03:02, dbourassa

Constant score and stopwords strange behaviour

2020-06-24 Thread dbourassa
Hi, I'm working on a Solr core where we don't want to use TF-IDF (BM25). We rank documents with boost based on popularity, exact match, phrase match, etc. To bypass TF-IDF, we use constant score like this "q=harry^=0.5 potter^=0.5" (score is always 1 before boost) We have just noticed a strange b
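For reference, a query string in the form quoted above can be assembled with a small helper. This is only a sketch: the helper name is hypothetical, and it assumes the terms are plain words that need no escaping for the query parser.

```python
def constant_score_query(terms, score=0.5):
    """Join terms using the `term^=score` constant-score syntax.

    Assumes each term is a plain word; no query-parser escaping is done.
    """
    return " ".join(f"{term}^={score}" for term in terms)
```

Calling `constant_score_query(["harry", "potter"])` produces the same `q` value as in the message.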

Re: Dynamic Stopwords

2020-05-15 Thread Tim Casey
What I have done for this in the past is calculating the expected value of a symbol within a universe. Then calculating the difference between expected value and the actual value at the time you see a symbol. Take the difference and use the most surprising symbols, in rank order from most surpris

Re: Dynamic Stopwords

2020-05-15 Thread A Adel
Yes, significant terms have been calculated but they have the anomaly or relative shift nature rather than the high frequency, as suggested also by the blog post. So, it looks that adding a preprocessing step upstream in an additional field makes more sense in this case. The text is intrinsically n

Re: Dynamic Stopwords

2020-05-15 Thread Tim Casey
You do not need stop words to do what you need to do. For one thing, stop words require a segmentation on a phrase-by-phrase basis in some cases. That is, especially in places like Europe, there is a lot of mixed language. (Your mileage may vary :). In order to do what you want, you really need t

Re: Dynamic Stopwords

2020-05-15 Thread Walter Underwood
Right. I might use NLP to pull out noun phrases and entities. Entities are essentially noun phrases with proper nouns. Put those in a separate field and build the word cloud from that. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On May 15, 2020, at 1

Re: Dynamic Stopwords

2020-05-15 Thread Doug Turnbull
You may want something more like "significant terms" - terms statistically significant in a document. Possibly not just based on doc freq https://saumitra.me/blog/solr-significant-terms/ On Fri, May 15, 2020 at 2:16 PM A Adel wrote: > Hi Walter, > > Thank you for your explanation, I understand

Re: Dynamic Stopwords

2020-05-15 Thread A Adel
Hi Walter, Thank you for your explanation, I understand the point and agree with you. However, the use case at hand is building a word cloud based on faceting the multilingual text field (very simple) which in case of not using stop words returns many generic terms, articles, etc. If stop words fi

Re: Dynamic Stopwords

2020-05-15 Thread Walter Underwood
Just don’t use stop words. That will give much better relevance and works for all languages. Stop words are an obsolete hack from the days of search engines running on 16 bit CPUs. They save space by throwing away important information. The classic example is “to be or not to be”, which is made

Dynamic Stopwords

2020-05-14 Thread A Adel
Hi - Is there a way to configure stop words to be dynamic for each document based on the language detected of a multilingual text field? Combining all languages stop words in one set is a possibility however it introduces false positives for some language combinations, such as German and English. T

Re: Stopwords impact on search

2020-04-26 Thread Steven White
IDF and stopword removal are different approaches to the same thing. > > Removing stopwords is a binary decision on how important common words > are for search. It says some words are completely useless. > > IDF is a proportional measure on how important common words are for search. >

Re: Stopwords impact on search

2020-04-24 Thread Walter Underwood
IDF and stopword removal are different approaches to the same thing. Removing stopwords is a binary decision on how important common words are for search. It says some words are completely useless. IDF is a proportional measure on how important common words are for search. Instead of removing a
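The binary-versus-proportional contrast can be made concrete with a small sketch. It uses the textbook form idf = ln(N / df), which is an assumption chosen for illustration and not the exact formula Lucene's BM25 similarity applies; the function names are hypothetical.

```python
import math


def idf(doc_freq: int, num_docs: int) -> float:
    """Textbook inverse document frequency: ln(N / df)."""
    return math.log(num_docs / doc_freq)


def stopword_weight(term: str, stopwords: set) -> float:
    """Stopword removal as a weight: a term either counts fully or not at all."""
    return 0.0 if term in stopwords else 1.0
```

Under this form, a term that appears in every document gets weight 0, one that appears in 90% of documents gets a small positive weight, and a rare term gets a large one, whereas a stopword list can only switch a term off entirely.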

Re: Stopwords impact on search

2020-04-24 Thread Steven White
Hi everyone, I get why and when not indexing stopwords is a bad idea and can give you 0 or incomplete results. But what about the quality of search results when stopwords are indexed vs. not indexed? 1) Stopwords are removed and I do word search, not phrase for "solr and lucene a

Re: Stopwords impact on search

2020-04-24 Thread Walter Underwood
On Fri, Apr 24, 2020 at 8:33 AM Steven White wrote: >>> >>>> Hi everyone, >>>> >>>> What is, if any, the impact of stopwords in to my search ranking quality? >>>> Will my ranking improve is I do not index stopwords? >>>> >>>> I'm trying to figure out if I should use the stopword filter or not. >>>> >>>> Thanks in advanced. >>>> >>>> Steve >>>> >> >

Re: Stopwords impact on search

2020-04-24 Thread Jan Høydahl
> you should never use the stopword filter unless you have a very specific >>> purpose >>> >>> On Fri, Apr 24, 2020 at 8:33 AM Steven White wrote: >>> >>>> Hi everyone, >>>> >>>> What is, if any, the impact of stopwords in

Re: Stopwords impact on search

2020-04-24 Thread Rohan Kasat
So do we use stopwords filter as part of query analyzer, to avoid highlighting of these stop words ? Regards, Rohan On Fri, Apr 24, 2020 at 7:45 AM Walter Underwood wrote: > Agreed. Here is an article from 13 years ago when I accidentally turned on > stopword removal at Netflix. It caus

Re: Stopwords impact on search

2020-04-24 Thread Walter Underwood
Agreed. Here is an article from 13 years ago when I accidentally turned on stopword removal at Netflix. It caused bad problems. https://observer.wunderwood.org/2007/05/31/do-all-stopword-queries-matter/ Infoseek was not removing stopwords when I joined them in 1996. Since then, I’ve always left

Re: Stopwords impact on search

2020-04-24 Thread Erick Erickson
> >> you should never use the stopword filter unless you have a very specific >> purpose >> >> On Fri, Apr 24, 2020 at 8:33 AM Steven White wrote: >> >>> Hi everyone, >>> >>> What is, if any, the impact of stopwords in to my search ran

Re: Stopwords impact on search

2020-04-24 Thread Jan Høydahl
at 8:33 AM Steven White wrote: > >> Hi everyone, >> >> What is, if any, the impact of stopwords in to my search ranking quality? >> Will my ranking improve is I do not index stopwords? >> >> I'm trying to figure out if I should use the stopword filter or not. >> >> Thanks in advanced. >> >> Steve >>

Re: Stopwords impact on search

2020-04-24 Thread David Hastings
you should never use the stopword filter unless you have a very specific purpose On Fri, Apr 24, 2020 at 8:33 AM Steven White wrote: > Hi everyone, > > What is, if any, the impact of stopwords in to my search ranking quality? > Will my ranking improve is I do not index stopwords? >

Stopwords impact on search

2020-04-24 Thread Steven White
Hi everyone, What is, if any, the impact of stopwords on my search ranking quality? Will my ranking improve if I do not index stopwords? I'm trying to figure out if I should use the stopword filter or not. Thanks in advance. Steve

Re: handling stopwords for special scenarios

2020-04-09 Thread Walter Underwood
Agreed, leave the stopwords alone. I ran into this same problem thirteen years ago at Netflix. Even before that, I wasn’t removing stopwords, but I accidentally left them in the Solr 1.3 config. https://observer.wunderwood.org/2007/05/31/do-all-stopword-queries-matter/ wunder Walter Underwood

Re: handling stopwords for special scenarios

2020-04-09 Thread Erick Erickson
1> why use stopwords at all? They’re largely a holdover from the bad old days when memory was limited. I usually recommend people just start by not using stopwords at all. 2> assuming <1> doesn’t work for you, why doesn’t it look feasible to remove here from the s

handling stopwords for special scenarios

2020-04-09 Thread rashi gandhi
Hi All, We are using stopword filter factory at both index and search time, to omit the stopwords. However, for one particular case, we are getting "here" as a search query and "here" is one of the words in title/name representing our client. We are returning zero results as

Re: Weird issues when using synonyms and stopwords together

2020-03-20 Thread Walter Underwood
Do not remove stopwords. Stopword removal was a hack invented for 16-bit machines and multi-megabyte disks. That hack is not needed now. tf.idf addresses the same problem as stopwords with a much better algorithm. Removing stopwords is an on/off decision for a guess at common words. tf.idf is a

Weird issues when using synonyms and stopwords together

2020-03-20 Thread Vikas Kumar
using multi-word synonyms which contain stopwords. If the stopwords appear in the middle, it works fine. For example, if I have the following in my synonyms file (where i is a stopword): iphone, apple i phone And if I query: /select?q=iphone&qf=title&defType=edismax The pa

Re: Re-creating deleted Managed Stopwords lists results in error

2020-02-17 Thread Walter Underwood
"interesting phrases" for my >>> machine teacher/students, so i wouldnt say theres no reason, however my >> use >>> case is very specific. Otherwise yeah, theyre gone for all practical >>> reasons/search scenarios. >>> >>> On Mon, Feb 17, 2020 a

Re: Re-creating deleted Managed Stopwords lists results in error

2020-02-17 Thread David Hastings
rios. > > > > On Mon, Feb 17, 2020 at 1:41 PM Walter Underwood > > wrote: > > > >> Why are you using stopwords? I would need a really, really good reason > to > >> use those. > >> > >> Stopwords are an obsolete technique from 16-bi

Re: Re-creating deleted Managed Stopwords lists results in error

2020-02-17 Thread Walter Underwood
rios. > > On Mon, Feb 17, 2020 at 1:41 PM Walter Underwood > wrote: > >> Why are you using stopwords? I would need a really, really good reason to >> use those. >> >> Stopwords are an obsolete technique from 16-bit processors. I’ve never >> used th

Re: Re-creating deleted Managed Stopwords lists results in error

2020-02-17 Thread David Hastings
Underwood wrote: > Why are you using stopwords? I would need a really, really good reason to > use those. > > Stopwords are an obsolete technique from 16-bit processors. I’ve never > used them and > I’ve been a search engineer since 1997. > > wunder > Walter Und

Re: Re-creating deleted Managed Stopwords lists results in error

2020-02-17 Thread Walter Underwood
Why are you using stopwords? I would need a really, really good reason to use those. Stopwords are an obsolete technique from 16-bit processors. I’ve never used them and I’ve been a search engineer since 1997. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my

Re-creating deleted Managed Stopwords lists results in error

2020-02-17 Thread Thomas Corthals
Hi I've run into an issue with creating a Managed Stopwords list that has the same name as a previously deleted list. Going through the same flow with Managed Synonyms doesn't result in this unexpected behaviour. Am I missing something or did I discover a bug in Solr? On a newly st

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-20 Thread Guilherme Viteri
nd I was searching on an ID field, which wouldn't > make sense. > (I will come back to this soon.) > > Ok, I've been adding and removing fields in the qf and I could isolate half > of the problem. First, I have one type of field called keyword_field and I > added the StopWo

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-18 Thread Guilherme Viteri
s searching on an ID field, which wouldn't make sense. (I will come back to this soon.) Ok, I've been adding and removing fields in the qf and I could isolate half of the problem. First, I have one type of field called keyword_field and I added the StopWords filter for this field and

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-17 Thread Paras Lehana
"I search for the >> exact term - Immunoregulatory interactions between a Lymphoid *and >> *non-Lymphoid >> cell" then it works >> >> On 11 Nov 2019, at 12:24, Guilherme Viteri wrote: >> >> Thanks >> >> Removing stopwords is another story

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-14 Thread Guilherme Viteri
If i search "I search for the exact >> term - Immunoregulatory interactions between a Lymphoid and non-Lymphoid >> cell" then it works >> >>> On 11 Nov 2019, at 12:24, Guilherme Viteri wrote: >>> >>> Thanks >>>> Removing sto

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-14 Thread Paras Lehana
term - Immunoregulatory interactions between a Lymphoid *and > *non-Lymphoid > cell" then it works > > On 11 Nov 2019, at 12:24, Guilherme Viteri wrote: > > Thanks > > Removing stopwords is another story. I'm curious to find the reason > assuming that you keep on

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-12 Thread Guilherme Viteri
> On 11 Nov 2019, at 12:24, Guilherme Viteri wrote: > > Thanks >> Removing stopwords is another story. I'm curious to find the reason >> assuming that you keep on using stopwords. In some cases, stopwords are >> really necessary. > Yes. It always make sense the

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-11 Thread Guilherme Viteri
Thanks > Removing stopwords is another story. I'm curious to find the reason > assuming that you keep on using stopwords. In some cases, stopwords are > really necessary. Yes. It always make sense the way we've been using. > If q.alt is giving you responses, it's co

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-10 Thread Paras Lehana
Hi So I don't think removing it completely is the way to go from the scenario > we have Removing stopwords is another story. I'm curious to find the reason assuming that you keep on using stopwords. In some cases, stopwords are really necessary. Quite a considerable increase

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-08 Thread David Hastings
I use 3 word shingles with stopwords for my MLT ML trainer that worked pretty well for such a solution, but for a full index the size became prohibitive On Fri, Nov 8, 2019 at 12:13 PM Walter Underwood wrote: > If we had IDF for phrases, they would be super effective. The 2X weight is >

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-08 Thread Walter Underwood
gines. >>>>> >>>>> wunder >>>>> Walter Underwood >>>>> wun...@wunderwood.org <mailto:wun...@wunderwood.org> >>>>> http://observer.wunderwood.org/ <http://observer.wunderwood.org/> >> (my blog) >>>>>

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-08 Thread David Hastings
> >>>> My indexer takes quite a few hours to be executed I am shortening it > to run faster, but I also need to make sure it gives what we are expecting. > This implementation's been there for >4y, and massively used. > >>>> > >>>>> In y

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-08 Thread Walter Underwood
<mailto:gvit...@ebi.ac.uk>> wrote: >>>> >>>> Hi Wunder, >>>> >>>> My indexer takes quite a few hours to be executed I am shortening it to >>>> run faster, but I also need to make sure it gives what we are expecting. >>>>

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-08 Thread Walter Underwood
lso need to make sure it gives what we are expecting. >> This implementation's been there for >4y, and massively used. >>>> >>>>> In your edismax handlers, weights of 20, 50, and 100 are extremely >> high. I don’t think I’ve ever used a weight higher than 16 in a

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-08 Thread Erick Erickson
high. I >>>> don’t think I’ve ever used a weight higher than 16 in a dozen years of >>>> configuring Solr. >>> I've inherited that implementation and I am really keen to adequate it, >>> what would you recommend ? >>> >>> Cheers >

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-08 Thread Guilherme Viteri
expecting. >> This implementation's been there for >4y, and massively used. >>>> >>>>> In your edismax handlers, weights of 20, 50, and 100 are extremely >> high. I don’t think I’ve ever used a weight higher than 16 in a dozen years >> of c

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-08 Thread David Hastings
years > of configuring Solr. > >> I've inherited that implementation and I am really keen to adequate it, > what would you recommend ? > >> > >> Cheers > >> Guilherme > >> > >>> On 7 Nov 2019, at 14:43, Walter Underwood <ma

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-08 Thread Guilherme Viteri
ears of >>> configuring Solr. >> I've inherited that implementation and I am really keen to adequate it, what >> would you recommend ? >> >> Cheers >> Guilherme >> >>> On 7 Nov 2019, at 14:43, Walter Underwood >> <mailto:wun...@

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-07 Thread Paras Lehana
see that you still > are using StopFilterFactory. The first advice we gave you was to remove > that. > > Remove StopFilterFactory everywhere and reindex. > > You will continue to have problems matching stopwords until you do that. > > In your edismax handl

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-07 Thread Walter Underwood
you still >> are using StopFilterFactory. The first advice we gave you was to remove that. >> >> Remove StopFilterFactory everywhere and reindex. >> >> You will continue to have problems matching stopwords until you do that. >> >> In your edismax handlers, weight

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-07 Thread Guilherme Viteri
> Thanks for posting the files. Looking at schema.xml, I see that you still are > using StopFilterFactory. The first advice we gave you was to remove that. > > Remove StopFilterFactory everywhere and reindex. > > You will continue to have problems matching stopwords until you do t

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-07 Thread David Hastings
oking at schema.xml, I see that you still > are using StopFilterFactory. The first advice we gave you was to remove > that. > > Remove StopFilterFactory everywhere and reindex. > > You will continue to have problems matching stopwords until you do that. > > In your edismax handlers

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-07 Thread Walter Underwood
Thanks for posting the files. Looking at schema.xml, I see that you still are using StopFilterFactory. The first advice we gave you was to remove that. Remove StopFilterFactory everywhere and reindex. You will continue to have problems matching stopwords until you do that. In your edismax

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-07 Thread Guilherme Viteri
e time, the question “why didn’t this query do what I >> expect” is answered by looking at the “&debug=query” output and the >> analysis page in the admin UI. NOTE: for the analysis page be sure to look >> at _both_ the query and index output. Also, and very important about

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-06 Thread Paras Lehana
at this _assumes_ that what you > put in the text boxes have made it through the query parser intact and is > analyzed by the field selected. Consider the search "q=field:word1 word2". > Now you type “word1 word2” into the analysis text box and it looks like > what you expect. T

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-06 Thread Erick Erickson
search "q=field:word1 word2". Now you type >> “word1 word2” into the analysis text box and it looks like what you expect. >> That’s misleading because the query is _parsed_ as "field:word1 >> default_search_field:word2”. This is where “&debug=query” helps. >>

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-06 Thread Guilherme Viteri
> selected. Consider the search "q=field:word1 word2". Now you type “word1 > word2” into the analysis text box and it looks like what you expect. That’s > misleading because the query is _parsed_ as "field:word1 > default_search_field:word2”. This is where “&deb

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-06 Thread Erick Erickson
lter, > > The solr.StopFilter removes all tokens that are stopwords. Those words will >> not be in the index, so they can never match a query. > > > I think the OP's concern is different results when adding a stopword. I > think he's using the filter factory corr

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-05 Thread Paras Lehana
Hi Walter, The solr.StopFilter removes all tokens that are stopwords. Those words will > not be in the index, so they can never match a query. I think the OP's concern is different results when adding a stopword. I think he's using the filter factory correctly - the query chain

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-05 Thread Walter Underwood
No. The solr.StopFilter removes all tokens that are stopwords. Those words will not be in the index, so they can never match a query. 1. Remove the lines with solr.StopFilter from every analysis chain in schema.xml. 2. Reload the collection, restart Solr, or whatever to read the new config. 3

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-05 Thread Guilherme Viteri
d > On 5 Nov 2019, at 14:48, David Hastings wrote: > > Fwd to another server > > no, > words="stopwords.txt"/> > > is still using stopwords and should be removed, in my opinion of course, > based on your use case may be different, but i gene

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-05 Thread David Hastings
no, is still using stopwords and should be removed, in my opinion of course, based on your use case may be different, but i generally axe any reference to them at all On Tue, Nov 5, 2019 at 9:47 AM Guilherme Viteri wrote: > Thanks. > Haven't I don

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-05 Thread Guilherme Viteri
Thanks. Haven't I done this here ? > On 5 Nov 2019, at 14:15, David Hastings wrote: > > Fwd to another server > > The first thing you should do is remove any reference to stop words and >

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-05 Thread David Hastings
The first thing you should do is remove any reference to stop words and never use them, then re-index your data and try it again. On Tue, Nov 5, 2019 at 9:14 AM Guilherme Viteri wrote: > Hi, > > I am performing a search to match a name (text_field), however this term > contains 'and' and 'a' and

When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-05 Thread Guilherme Viteri
Hi, I am performing a search to match a name (text_field), however this term contains 'and' and 'a' and it doesn't return any records. If i remove 'a' then it works. e.g Search Term: lymphoid and a non-lymphoid cell doesn't work: https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymph

Re: Identify stopwords using TF-IDF

2019-06-22 Thread Walter Underwood
I haven’t removed stopwords since 1996, when I joined Infoseek. What is your special case where you must remove them? wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jun 22, 2019, at 9:51 PM, akash jayaweera > wrote: > > Hello Walter

Re: Identify stopwords using TF-IDF

2019-06-22 Thread akash jayaweera
Hello Walter, Thank you for the reply. But for some of my use-case I need to identify stopword. So I need a better way to identify domain specific stopwords. I used TF-IDF to identify stopwords. But it has the issue I mentioned above. Regards, *Akash Jayaweera.* E akash.jayawe...@gmail.com M

Re: Identify stopwords using TF-IDF

2019-06-22 Thread Walter Underwood
Don’t remove stopwords. That was a useful hack when we were running search engines on 16-bit machines. These days, it causes more problems than it solves. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jun 22, 2019, at 8:14 PM, akash jayawe

Identify stopwords using TF-IDF

2019-06-22 Thread akash jayaweera
Hello All, I'm trying to identify stopwords for a non-English corpus using TF-IDF score. I calculated the score for each unique term in the corpus. But my question is how can I select stopwords using the score. For example if we have a corpus of football, term "football" get th

Re: StopWords behavior with phrases

2019-05-21 Thread Jan Høydahl
Well perhaps you don't need to remove stopwords at all? :) Or a middle ground is to NOT remove stopwords in your 'index' analyzer, then you have the flexibility of removing them on query side. Thus if you use &stopwords=false on your call perhaps that works? -- Jan Høyda
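
A minimal sketch of what such a call could look like when built from a client, assuming the edismax parser and its `stopwords` request parameter. The host, port, collection name and the helper function are hypothetical and only illustrate how the parameter is passed.

```python
from urllib.parse import urlencode


def build_edismax_url(base_url, user_query, respect_stopwords):
    """Build a /select URL for an edismax query (helper name is hypothetical).

    respect_stopwords=False sends stopwords=false, asking edismax not to
    apply the StopFilterFactory configured in the query analyzer.
    """
    params = {
        "q": user_query,
        "defType": "edismax",
        "q.op": "AND",
        "stopwords": "true" if respect_stopwords else "false",
    }
    return f"{base_url}/select?{urlencode(params)}"


# Assumed local Solr instance and collection name, for illustration only.
url = build_edismax_url(
    "http://localhost:8983/solr/collection1",
    '"market and cloud"',
    respect_stopwords=False,
)
print(url)
```

This only helps if the stopwords are still present in the index, which is why the suggestion pairs it with keeping them in the 'index' analyzer.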

StopWords behavior with phrases

2019-05-21 Thread Ashish Bisht
Hi, We query Solr as below *q="market and cloud" OR (market and cloud)&q.op=AND&deftype=edismax* Our intent is to look for results with both phrase match and AND query together where Solr itself takes care of relevancy. But due to presence of stopword in phrase query a gap is left whic

Re: How to use stopwords, synonyms along with fuzzy match in a SOLR

2019-05-09 Thread Erick Erickson
Ah, I didn’t read thoroughly enough. The problem is stopwords don’t really count for fuzzy searching. By specifying “junk~” you’re not really searching for “junk” or variants. You’re telling Solr “find any term that is a fuzzy match” to “junk”. Under the covers, a search is being made for “jank
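
To make the point about fuzzy matching concrete, here is a small sketch that computes edit distances between "junk" and two nearby terms. It uses plain Levenshtein distance as a simplification (Lucene's fuzzy matching also counts transpositions), and it assumes the default maximum of 2 edits that applies when no number follows the `~`.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


MAX_EDITS = 2  # assumed default for "term~" with no explicit distance

for indexed_term in ("jank", "jack"):
    distance = edit_distance("junk", indexed_term)
    print(indexed_term, distance, distance <= MAX_EDITS)
```

Both terms fall within two edits of "junk", so a document containing "jack" can match "junk~" even though "junk" itself was never indexed.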

Re: How to use stopwords, synonyms along with fuzzy match in a SOLR

2019-05-09 Thread bbarani
Thanks for your reply Erick. I created a simple field type as below for testing and added 'junk' to the stopwords but it doesn't seem to honor it when using fuzzy search. Btw, I am using qf along with edismax and pass the value in q (sample query below). /solr/collection1

Re: How to use stopwords, synonyms along with fuzzy match in a SOLR

2019-05-08 Thread Erick Erickson
by bit to see if/when you have this problem. But if stopwords are working correctly at index time, the “junk” will not be _in_ the index, therefore it’ll be impossible to find, fuzzy search or not. So you’re making some assumptions that aren’t true, and the analysis process combined with looking

How to use stopwords, synonyms along with fuzzy match in a SOLR

2019-05-08 Thread bbarani
Hi, Is there a way to use stopwords and fuzzy match in a SOLR query? The below query matches 'jack' too and I added 'junk' to the stopwords (in query) to avoid returning results but looks like it's not honoring the stopwords when using the fuzzy search. solr/colle

Re: Stopwords param of edismax parser not working

2019-03-29 Thread Branham, Jeremy (Experis)
e: Hi, We are trying to remove stopwords from analysis using edismax parser parameter. The documentation says *stopwords A Boolean parameter indicating if the StopFilterFactory configured in the query analyzer should be respected when parsing the query. If this is set to

Re: Stopwords param of edismax parser not working

2019-03-28 Thread Erick Erickson
and to say anything about your particular situation we need to see the field definitions from the schema for the field you expect stopwords to be removed from and the stopwords file for those fields. But Walter’s comment is germane. Stopwords lead to a number of incongruities and are best just

Re: Stopwords param of edismax parser not working

2019-03-28 Thread Walter Underwood
Why are you removing stopwords? That hack made sense in the 1950s, but I haven’t removed stopwords for the last twenty years. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Mar 28, 2019, at 2:47 AM, Ashish Bisht wrote: > > Hi, > >

Stopwords param of edismax parser not working

2019-03-28 Thread Ashish Bisht
Hi, We are trying to remove stopwords from analysis using edismax parser parameter. The documentation says *stopwords A Boolean parameter indicating if the StopFilterFactory configured in the query analyzer should be respected when parsing the query. If this is set to false, then the

Re: Can I use configsets with custom stopwords per collection?

2018-12-05 Thread O. Klein
Ok. So with these suggestions, I found https://lucene.apache.org/solr/guide/6_6/configuring-solrconfig-xml.html#Configuringsolrconfig.xml-ImplicitCoreProperties So to test this I tried to use it in DIH as this has a similar issue with configsets as every collection needs its own DIH.properties.

Re: Can I use configsets with custom stopwords per collection?

2018-12-04 Thread Erick Erickson
I want all collections to use 1 schema. > > So I wonder, do managed stopwords work with configsets and store stopwords > per collection? > > Also, what would be the substitution variable for collection name? Is there > a list somewhere? > > Thanks! > > > > -- > Sent
