I don't think dropping stopwords is something to do across the board for all
languages. The same grammatical units that are part of a word in one
language (and removed by stemmers) are separate words in others (and
should be stopwords). For example, Swedish attaches the definite article
as a suffix ("bil" / "bilen", "car" / "the car"), where English uses the
separate word "the".

So please take this advice on a case-by-case basis for each language.
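For example, a per-language setup could look roughly like the schema.xml
sketch below. The `sv_text` type, the `stopwords.sv.txt` file and the
`title_en`/`title_sv` field names are hypothetical; the point is only that
each language gets its own analyzer chain, so whether it has a stop filter
(and which list it loads) can be decided separately for each language:

```xml
<!-- Hypothetical per-language field types (go in the <types> section). -->
<fieldType name="en_text" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.en.txt"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>
<fieldType name="sv_text" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Keep or drop this stop filter for Swedish on its own merits. -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.sv.txt"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Swedish"/>
  </analyzer>
</fieldType>

<!-- Hypothetical per-language fields (go in the <fields> section). -->
<field name="title_en" type="en_text" indexed="true" stored="true"/>
<field name="title_sv" type="sv_text" indexed="true" stored="true"/>
```

Removing the stop filter from one type leaves the other language's chain
untouched, which is what makes the case-by-case decision cheap to try.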

On Tue, Jan 12, 2010 at 9:20 PM, Lance Norskog <goks...@gmail.com> wrote:
> There are a lot of projects that don't use stopwords any more. You
> might consider dropping them altogether.
>
> On Mon, Jan 11, 2010 at 2:25 PM, Don Werve <d...@madwombat.com> wrote:
>> This is the way I've implemented multilingual search as well.
>>
>> 2010/1/11 Markus Jelsma <mar...@buyways.nl>
>>
>>> Hello,
>>>
>>>
>>> We have implemented language-specific search in Solr using
>>> language-specific fields and field types. For instance, an en_text
>>> field type can use an English stemmer and a list of stopwords and
>>> synonyms. We did not, however, use language-specific stopwords;
>>> instead, we used one list shared by both languages.
>>>
>>> So you would have a field type like:
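>>> <fieldType name="en_text" class="solr.TextField">
>>>   <analyzer>
>>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.en.txt"/>
>>>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.en.txt" ignoreCase="true"/>
>>>   </analyzer>
>>> </fieldType>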
>>>
>>> etc etc.
>>>
>>>
>>>
>>> Cheers,
>>>
>>> -
>>> Markus Jelsma          Buyways B.V.
>>> Technisch Architect    Friesestraatweg 215c
>>> http://www.buyways.nl  9743 AD Groningen
>>>
>>>
>>>
>>> On Mon, 2010-01-11 at 13:45 +0100, Daniel Persson wrote:
>>>
>>> > Hi Solr users.
>>> >
>>> > I'm trying to set up a site with integrated Solr search, and I use the
>>> > SolrJ API to feed the index with documents. At the moment I have
>>> > only enabled search on the English portion of the site. I'm
>>> > interested in using as many features of Solr as possible. Synonyms,
>>> > stopwords and stemming all sound quite interesting and useful, but how
>>> > do I set this up well for a multilingual site?
>>> >
>>> > The site doesn't have a huge amount of text, so performance isn't
>>> > really a concern, but I'd still like to hear your suggestions before I
>>> > try to implement a solution.
>>> >
>>> > Best regards
>>> >
>>> > Daniel
>>>
>>
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>



-- 
Robert Muir
rcm...@gmail.com
