There is a band named "The The". And a producer named "Don Was". For a list of all-stopword movie titles at Netflix, see this post:
http://wunderwood.org/most_casual_observer/2007/05/invisible_titles.html

My favorite is "To Be and To Have (Être et Avoir)", which is all
stopwords in two languages. And a very good movie.

wunder

On Jan 12, 2010, at 6:55 PM, Robert Muir wrote:

> sorry, i forgot to include this 2009 paper comparing what stopwords do
> across 3 languages:
>
> http://doc.rero.ch/lm.php?url=1000,43,4,20091218142456-GY/Dolamic_Ljiljana_-_When_Stopword_Lists_Make_the_Difference_20091218.pdf
>
> in my opinion, if stopwords annoy your users for very special cases
> like 'the the', then instead consider using commongrams +
> defaultsimilarity.discountOverlaps = true so that you still get the
> benefits.
>
> as you can see from the above paper, they can be extremely important
> depending on the language; they just don't matter so much for English.
>
> On Tue, Jan 12, 2010 at 9:20 PM, Lance Norskog <goks...@gmail.com> wrote:
>> There are a lot of projects that don't use stopwords any more. You
>> might consider dropping them altogether.
>>
>> On Mon, Jan 11, 2010 at 2:25 PM, Don Werve <d...@madwombat.com> wrote:
>>> This is the way I've implemented multilingual search as well.
>>>
>>> 2010/1/11 Markus Jelsma <mar...@buyways.nl>
>>>
>>>> Hello,
>>>>
>>>> We have implemented language-specific search in Solr using
>>>> language-specific fields and field types. For instance, an en_text
>>>> field type can use an English stemmer and a list of stopwords and
>>>> synonyms. We, however, did not use language-specific stopwords;
>>>> instead we used one list shared by both languages.
>>>>
>>>> So you would have a field type like:
>>>> <fieldType name="en_text" class="solr.TextField" ...
>>>>   <analyzer type="">
>>>>     <filter class="solr.StopFilterFactory" words="stopwords.en.txt"/>
>>>>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.en.txt"/>
>>>>
>>>> etc etc.
>>>>
>>>> Cheers,
>>>>
>>>> -
>>>> Markus Jelsma          Buyways B.V.
>>>> Technisch Architect    Friesestraatweg 215c
>>>> http://www.buyways.nl  9743 AD Groningen
>>>>
>>>> Alg. 050-853 6600      KvK  01074105
>>>> Tel. 050-853 6620      Fax. 050-3118124
>>>> Mob. 06-5025 8350      In: http://www.linkedin.com/in/markus17
>>>>
>>>> On Mon, 2010-01-11 at 13:45 +0100, Daniel Persson wrote:
>>>>
>>>>> Hi Solr users.
>>>>>
>>>>> I'm trying to set up a site with Solr search integrated, and I use
>>>>> the SolrJ API to feed the index with search documents. At the moment
>>>>> I have only activated search on the English portion of the site. I'm
>>>>> interested in using as many features of Solr as possible. Synonyms,
>>>>> stopwords and stems all sound quite interesting and useful, but how
>>>>> do I set this up in a good way for a multilingual site?
>>>>>
>>>>> The site doesn't have a huge text mass, so performance issues don't
>>>>> really bother me, but still I'd like to hear your suggestions before
>>>>> I try to implement a solution.
>>>>>
>>>>> Best regards
>>>>>
>>>>> Daniel
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>
> --
> Robert Muir
> rcm...@gmail.com
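
For reference, the commongrams approach suggested in the thread might look
roughly like the field type below. This is only a sketch: the field type name
en_text_cg, the tokenizer choice, and reusing stopwords.en.txt as the common
word list are assumptions, not anything the posters specified. The index-side
filter keeps the single terms and adds bigrams such as "the_the"; the
query-side filter uses those bigrams so a query like "the the" can still
match. The discountOverlaps setting lives on the Similarity, not on the field
type, so it is not shown here.

```xml
<!-- Sketch only: names and file paths below are assumed for illustration. -->
<fieldType name="en_text_cg" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Keeps single terms and also emits word pairs containing a common word -->
    <filter class="solr.CommonGramsFilterFactory"
            words="stopwords.en.txt" ignoreCase="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Query-side counterpart: prefers the word pairs when they exist -->
    <filter class="solr.CommonGramsQueryFilterFactory"
            words="stopwords.en.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>
```

Note that no StopFilterFactory appears in this chain, so the common words stay
searchable on their own as well.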