Re: Default stop word list

2016-09-09 Thread Emir Arnautovic
I would partially agree with Walter - having more resources allows us to include stopwords in the index and let the scoring model do its job. However, there are other Solr features that can suffer from that approach: e.g. if you use edismax and mm=80%, then for a query with stopwords, you can end up
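
The message above is cut off, so the exact failure case it describes is not known. As one possible illustration only, the sketch below assumes stopwords are kept in the index and shows how a document can satisfy mm=80% mostly through stopword matches. The query, the document text, and the helper names are hypothetical; the only Solr behavior relied on is that a percentage mm is rounded down to a whole number of clauses.

```python
def required_matches(num_clauses: int, mm_percent: int) -> int:
    # A percentage mm is rounded down to a whole number of optional clauses.
    return (num_clauses * mm_percent) // 100


def matched_terms(query_terms: list[str], doc_text: str) -> list[str]:
    # Naive whitespace tokenization; a real analysis chain does much more.
    doc_tokens = set(doc_text.lower().split())
    return [t for t in query_terms if t in doc_tokens]


# Hypothetical query and document, for illustration only.
query = ["what", "is", "the", "name", "of", "solr"]
doc = "this is what the end of it looks like"

needed = required_matches(len(query), 80)  # 80% of 6 clauses, rounded down
hits = matched_terms(query, doc)

print(f"clauses required: {needed}")
print(f"clauses matched:  {hits}")
print(f"document passes mm: {len(hits) >= needed}")
```

In this sketch the document passes the mm check without containing either of the two content-bearing terms ("name", "solr").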

Re: Default stop word list

2016-09-08 Thread Walter Underwood
I recommend that you remove StopFilterFactory from every analysis chain. In the tf.idf scoring model, rare words are automatically weighted more than common words. I have an index with 11.6 million documents. “the” occurs in 9.9 million of those documents. “cat” occurs in 16,000 of those
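
A rough sketch of the weighting described above, using the document counts quoted in the message. It assumes the plain textbook form idf = ln(N / df); the formula Lucene actually applies differs in detail, so the numbers are only indicative of the relative weights.

```python
import math


def idf(total_docs: int, doc_freq: int) -> float:
    # Textbook inverse document frequency (an assumption, not Lucene's exact formula).
    return math.log(total_docs / doc_freq)


TOTAL_DOCS = 11_600_000  # index size quoted in the message

idf_the = idf(TOTAL_DOCS, 9_900_000)  # "the" occurs in 9.9 million documents
idf_cat = idf(TOTAL_DOCS, 16_000)     # "cat" occurs in 16,000 documents

print(f"idf('the') = {idf_the:.3f}")
print(f"idf('cat') = {idf_cat:.3f}")
print(f"'cat' outweighs 'the' by a factor of about {idf_cat / idf_the:.0f}")
```

Under this assumption the rare term carries a weight tens of times larger than the common one, without any stop word list being involved.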

Re: Default stop word list

2016-09-08 Thread Steven White
Hi Walter and all. Sorry for the late reply, I was out of town. Are you saying the list of stop words in the stop word file should be removed? I understand the issues I will run into because of the stop word list, but all along, my understanding of stop word list being in the stop word file is -- to

Re: Default stop word list

2016-08-29 Thread Walter Underwood
Do not remove stop words. Want to search for “vitamin a”? That won’t work. Stop word removal is a hack left over from when we were running search engines in 64 kbytes of memory. Yes, common words are less important for search, but removing them is a brute force approach with severe side
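
A minimal sketch of the “vitamin a” problem mentioned above. The stop word set and the `remove_stopwords` helper are hypothetical and only mimic what a stop filter does to a token stream; this is not Solr code.

```python
# Hypothetical stop word list, for illustration only.
STOP_WORDS = {"a", "an", "the", "of", "for"}


def remove_stopwords(text: str) -> list[str]:
    # Lowercase, split on whitespace, and drop every token found in the stop list.
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]


vitamin_a = remove_stopwords("vitamin a")
vitamin_c = remove_stopwords("vitamin c")

print(vitamin_a)  # the "a" is gone, so the query is just "vitamin"
print(vitamin_c)  # "c" is not a stop word, so this query survives intact
```

Once the stop filter has run, a search for “vitamin a” can no longer be told apart from a search for “vitamin” alone.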

Re: Default stop word list

2016-08-29 Thread Steven White
Thanks Shawn. This is the best answer I have seen, much appreciated. A follow-up question: I want to remove stop words from the list, but if I do, then search quality will degrade (and index size will grow, though that is less of an issue). For example, if I remove "a", then if someone searches for "For a

Re: Default stop word list

2016-08-27 Thread Shawn Heisey
On 8/27/2016 12:39 PM, Shawn Heisey wrote:
> I personally think that stopword removal is more of a problem than a
> solution.
There actually is one thing that a stopword filter can do that has little to do with the purpose it was designed for. You can make it impossible to search for certain

Re: Default stop word list

2016-08-27 Thread Shawn Heisey
On 8/26/2016 7:13 AM, Steven White wrote:
> But what about the current "default" list that comes with Solr? How was
> that list, for all supported languages, determined?
That list of stopwords was created from years of history with Lucene, taking the expertise of many people and the wisdom of

Re: Default stop word list

2016-08-26 Thread Steven White
But what about the current "default" list that comes with Solr? How was that list, for all supported languages, determined? What I fear is this: when someone puts Solr into production, no one makes a change to that list, so if the list is not "valid" this will impact search, but if the list

RE: Default stop word list

2016-08-26 Thread Srinivasa Meenavalli
Hi Steven, The list of stopwords for a language is not fixed; there is no single universal list of stop words used by all natural language processing tools. Ideally, stop words should be defined by search merchandisers based on their domain instead of relying on the default.