There is a band named "The The". And a producer named "Don Was". For a list of 
all-stopword movie titles at Netflix, see this post:

http://wunderwood.org/most_casual_observer/2007/05/invisible_titles.html

My favorite is "To Be and To Have (Être et Avoir)", which is all stopwords in 
two languages. And a very good movie.
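
For anyone wondering why such titles become invisible, here is a minimal Python sketch. It is a toy analyzer, not Lucene or Solr code: the stopword list and the function names are made up for illustration, and common_grams() only imitates the general idea of a CommonGrams-style filter (keep every token, and add a bigram wherever a stopword is involved). The discountOverlaps part of Robert's suggestion is not modeled at all.

```python
# Toy illustration only; the stopword list and helpers are assumptions,
# not the behavior of any real Solr/Lucene analyzer.
STOPWORDS = {"the", "to", "be", "and", "have", "was", "a", "of"}


def tokenize(text):
    """Lowercase and split on whitespace."""
    return text.lower().split()


def remove_stopwords(tokens):
    """Classic stopword filtering: drop every token that is a stopword."""
    return [t for t in tokens if t not in STOPWORDS]


def common_grams(tokens):
    """Keep all tokens, and add a bigram wherever either neighbor is a stopword."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            if tok in STOPWORDS or nxt in STOPWORDS:
                out.append(tok + "_" + nxt)
    return out


if __name__ == "__main__":
    for title in ["The The", "Don Was", "To Be and To Have"]:
        tokens = tokenize(title)
        print(title, remove_stopwords(tokens), common_grams(tokens))
```

With plain stopword removal, "The The" and "To Be and To Have" analyze to nothing at all, so there is nothing left to match. With the bigram approach, "The The" still yields a "the_the" term that a phrase query can hit.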

wunder

On Jan 12, 2010, at 6:55 PM, Robert Muir wrote:

> sorry, i forgot to include this 2009 paper comparing what stopwords do
> across 3 languages:
> 
> http://doc.rero.ch/lm.php?url=1000,43,4,20091218142456-GY/Dolamic_Ljiljana_-_When_Stopword_Lists_Make_the_Difference_20091218.pdf
> 
> in my opinion, if stopwords annoy your users in very special cases
> like 'the the', then consider using commongrams +
> defaultsimilarity.discountOverlaps = true instead, so that you still
> get the benefits.
> 
> as you can see from the above paper, they can be extremely important
> depending on the language; they just don't matter so much for English.
> 
> On Tue, Jan 12, 2010 at 9:20 PM, Lance Norskog <goks...@gmail.com> wrote:
>> There are a lot of projects that don't use stopwords any more. You
>> might consider dropping them altogether.
>> 
>> On Mon, Jan 11, 2010 at 2:25 PM, Don Werve <d...@madwombat.com> wrote:
>>> This is the way I've implemented multilingual search as well.
>>> 
>>> 2010/1/11 Markus Jelsma <mar...@buyways.nl>
>>> 
>>>> Hello,
>>>> 
>>>> 
>>>> We have implemented language specific search in Solr using language
>>>> specific fields and field types. For instance, an en_text field type can
>>>> use an English stemmer and its own lists of stopwords and synonyms. We did
>>>> not, however, use language-specific stopwords; instead we used one list
>>>> shared by both languages.
>>>> 
>>>> So you would have a field type like:
>>>> <fieldType name="en_text" class="solr.TextField" ...
>>>>  <analyzer type="">
>>>>  <filter class="solr.StopFilterFactory" words="stopwords.en.txt"/>
>>>>  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.en.txt"/>
>>>> 
>>>> etc etc.
>>>> 
>>>> 
>>>> 
>>>> Cheers,
>>>> 
>>>> -
>>>> Markus Jelsma          Buyways B.V.
>>>> Technisch Architect    Friesestraatweg 215c
>>>> http://www.buyways.nl  9743 AD Groningen
>>>> 
>>>> 
>>>> Alg. 050-853 6600      KvK  01074105
>>>> Tel. 050-853 6620      Fax. 050-3118124
>>>> Mob. 06-5025 8350      In: http://www.linkedin.com/in/markus17
>>>> 
>>>> 
>>>> On Mon, 2010-01-11 at 13:45 +0100, Daniel Persson wrote:
>>>> 
>>>>> Hi Solr users.
>>>>> 
>>>>> I'm trying to set up a site with Solr search integrated, and I use the
>>>>> SolrJ API to feed the index with search documents. At the moment I
>>>>> have only activated search on the English portion of the site. I'm
>>>>> interested in using as many features of Solr as possible. Synonyms,
>>>>> stopwords and stemming all sound quite interesting and useful, but how
>>>>> do I set this up in a good way for a multilingual site?
>>>>> 
>>>>> The site doesn't have a huge amount of text, so performance issues
>>>>> don't really bother me, but I'd still like to hear your suggestions
>>>>> before I try to implement a solution.
>>>>> 
>>>>> Best regards
>>>>> 
>>>>> Daniel
>>>> 
>>> 
>> 
>> 
>> 
>> --
>> Lance Norskog
>> goks...@gmail.com
>> 
> 
> 
> 
> -- 
> Robert Muir
> rcm...@gmail.com
> 
