ageIdentifier.html .
Peter
-Original Message-
From: Maria Mosolova [mailto:[EMAIL PROTECTED]
Sent: Thursday, October 18, 2007 8:48 AM
To: solr-user@lucene.apache.org
Subject: Re: multilingual list of stopwords
Thanks a lot to everyone who responded. Yes, I agree that eventually we
need to use
>
>> Peter
>>
>> -Original Message-
>> From: Maria Mosolova [mailto:[EMAIL PROTECTED]
>> Sent: Thursday, October 18, 2007 8:48 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: multilingual list of stopwords
>>
>> Thanks a lot to e
Thank you very much for the references Gordon! Looks like that is
exactly what I need
Maria
On 10/18/07, Gordon <[EMAIL PROTECTED]> wrote:
> Maria,
>
> It's perfectly reasonable to build a single list, sort it, and scan it for
> especially bad cases. See for example,
> http://members.unine.ch/jacqu
Maria,
It's perfectly reasonable to build a single list, sort it, and scan it for
especially bad cases. See for example,
http://members.unine.ch/jacques.savoy/clef/index.html for stopwords for
several languages or check in some standard programming modules like:
http://search.cpan.org/~fabpot/Ling
Original Message-
> From: Maria Mosolova [mailto:[EMAIL PROTECTED]
> Sent: Thursday, October 18, 2007 8:48 AM
> To: solr-user@lucene.apache.org
> Subject: Re: multilingual list of stopwords
>
> Thanks a lot to everyone who responded. Yes, I agree that eventually we
solr-user@lucene.apache.org
Subject: Re: multilingual list of stopwords
Thanks a lot to everyone who responded. Yes, I agree that eventually we
need to use separate stopword lists for different languages.
Unfortunately the data we are trying to index at the moment does not
contain any direct co
Thanks a lot to everyone who responded. Yes, I agree that eventually
we need to use separate stopword lists for different languages.
Unfortunately the data we are trying to index at the moment does not
contain any direct country/language information and we need to create
the first version of the in
Also "die" in German and English. --wunder
On 10/18/07 4:16 AM, "Andrzej Bialecki" <[EMAIL PROTECTED]> wrote:
> One example that I'm familiar with: words "is" and "by" in English and
> in Swedish. Both words are stopwords in English, but they are content
> words in Swedish (ice and village, respe
Are you sure they don't just mean they want separate stopword lists
for various different indexes in different languages? Otherwise, I
agree, it doesn't make much sense for a single mixed language index
(unless you had an intelligent filter that could select based on
language.)
Maria, pe
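A minimal sketch of the "filter that could select based on language" idea mentioned above, assuming each document already carries a language tag. The function name, the language codes, and the tiny stopword sets are illustrative assumptions, not part of Solr or of any list referenced in this thread.

```python
# Illustrative per-language stopword sets (hypothetical, far from complete).
STOPWORDS_BY_LANG = {
    "en": {"is", "by", "the", "and"},
    "sv": {"och", "att", "det"},
    "de": {"die", "der", "und"},
}


def remove_stopwords(tokens, lang, stopwords_by_lang=STOPWORDS_BY_LANG):
    """Drop only the stopwords of the given language.

    Tokens for an unknown language are returned unchanged, so nothing
    is removed when the language cannot be determined.
    """
    stopwords = stopwords_by_lang.get(lang, frozenset())
    return [t for t in tokens if t.lower() not in stopwords]


english = remove_stopwords(["the", "village", "is", "by", "the", "lake"], "en")
swedish = remove_stopwords(["is", "och", "by"], "sv")
print(english)
print(swedish)
```

With this selection step, "is" and "by" are dropped from English text but kept in Swedish text, where they are content words.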
Lukas Vlcek wrote:
Hi,
I haven't heard of a multilingual stop words list before. What should be the
purpose of it? This seems too odd to me :-)
That's because a multilingual stopword list doesn't make sense ;)
One example that I'm familiar with: words "is" and "by" in English and
in Swedish. Both
Hi,
I haven't heard of a multilingual stop words list before. What should be the
purpose of it? This seems too odd to me :-)
Stop words are used to cut down the size of the index.
One way you can go about this is to create your own list by indexing your
documents (without stop words removed) and then lo
Hi Maria,
this is a "me too". ;)
At the moment I'm merging the stopword files for the languages I need
into one file and using that. The main problem with this approach is
collisions: words that are stopwords in one language but not in
another.
Cheers,
Joe
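A rough sketch of the merge-and-check approach described above: merge the per-language lists, keep track of which language each word came from, and flag words that are known content words in some other language. The function names and sample data are assumptions for illustration only. The content-word sets would have to be supplied by hand or taken from a dictionary, and the entries here just reuse the examples from this thread ("is" and "by" in Swedish, "die" in English).

```python
from collections import defaultdict


def merge_stopwords(lists):
    """Map each stopword to the set of languages whose list contains it."""
    merged = defaultdict(set)
    for lang, words in lists.items():
        for word in words:
            merged[word.lower()].add(lang)
    return dict(merged)


def find_collisions(merged, content_words):
    """List (word, stopword languages, content-word languages) for every
    merged stopword that is a known content word in another language."""
    collisions = []
    for word in sorted(merged):
        content_langs = {
            lang
            for lang, vocab in content_words.items()
            if word in vocab and lang not in merged[word]
        }
        if content_langs:
            collisions.append((word, sorted(merged[word]), sorted(content_langs)))
    return collisions


# Illustrative sample data (hypothetical, far from complete).
STOPWORDS = {
    "en": {"is", "by", "the", "and"},
    "sv": {"och", "att", "det"},
    "de": {"die", "der", "und"},
}
CONTENT_WORDS = {
    "sv": {"is", "by"},  # "ice" and "village"
    "en": {"die"},
}

merged = merge_stopwords(STOPWORDS)
collisions = find_collisions(merged, CONTENT_WORDS)
for word, stop_langs, content_langs in collisions:
    print(word, "stopword in", stop_langs, "content word in", content_langs)
```

The flagged words are the ones to scan by hand before deciding whether they stay in the merged list.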