Russian stemmer

2010-07-27 Thread Oleg Burlaca

Hello,

I'm using SnowballPorterFilterFactory with language=Russian.
Stemming works fine except for people's names and geographical places.
For example:

searching for Ковров should also find Коврова, Коврову, Ковровом, Коврове.

Are there other stemming plugins for Russian that can handle
this?
If not, what are the options? A simple solution may be to use wildcard
queries in Standard mode instead of the DisMaxQueryHandler:
Ковров*

but I'd like to avoid that.

Thanks.

Re: Russian stemmer

2010-07-27 Thread Robert Muir

All of your examples stem to ковров:

    assertAnalyzesTo(a, "Коврова Коврову Ковровом Коврове",
        new String[] { "ковров", "ковров", "ковров", "ковров" });

Are you sure you enabled this at *both* index and query time?

-- 
Robert Muir
rcm...@gmail.com

Re: Russian stemmer

2010-07-27 Thread Robert Muir

On another look, your problem is ковров itself... it's mapped to ковр.

A workaround might be to use the protected words functionality to
keep ковров and any other problematic people/geo names as-is.

Separately, in trunk there is an alternative Russian stemmer
(RussianLightStemFilterFactory), which might give you fewer problems on
average, but I noticed it has this same problem with the example you gave.


Re: Russian stemmer

2010-07-27 Thread Oleg Burlaca

Yes, I'm sure I've enabled SnowballPorterFilterFactory at both index and
query time, because search works fine
except for names and geographic locations.

I've noticed that searching for
Коврова

also returns documents that contain Коврову, Коврове.

Search by Ковров, 7 results:
http://www.sova-center.ru/search/?q=%D0%BA%D0%BE%D0%B2%D1%80%D0%BE%D0%B2

Search by Коврова, 26 results:
http://www.sova-center.ru/search/?lg=1q=%D0%BA%D0%BE%D0%B2%D1%80%D0%BE%D0%B2%D0%B0

Adding such words to a protected-words list will be a tedious task, as there are
7 million Russian names :)

Kind Regards,
Oleg Burlaca




Re: Russian stemmer

2010-07-27 Thread Oleg Burlaca

A similar word is Немцов.
The strange thing is that searching for Немцова will not find documents
containing Немцов

Немцова: 14 articles
http://www.sova-center.ru/search/?lg=1q=%D0%BD%D0%B5%D0%BC%D1%86%D0%BE%D0%B2%D0%B0

Немцов: 74 articles
http://www.sova-center.ru/search/?lg=1q=%D0%BD%D0%B5%D0%BC%D1%86%D0%BE%D0%B2

Re: Russian stemmer

2010-07-27 Thread Oleg Burlaca

Actually, the situation with Немцов is OK;
I've just checked how Yandex handles Немцов and Немцова:
http://nano.yandex.ru/project/inflect/

I think there are two solutions:
a) manually search for both Немцов and then Немцова
b) use wildcard query: Немцов*
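
Both options come down to plain query strings. The sketch below is only an illustration: the class and method names are made up for this example, and it assumes the standard query parser, since the OR operator and the trailing * are its syntax.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Solr or Lucene.
final class InflectedFormsQuery {
    // Option (a): join the surface forms with OR, e.g. "Немцов OR Немцова".
    static String anyOf(List<String> forms) {
        return String.join(" OR ", forms);
    }

    // Option (b): prefix (wildcard) query, e.g. "Немцов*".
    static String prefix(String base) {
        return base + "*";
    }

    public static void main(String[] args) {
        System.out.println(anyOf(Arrays.asList("Немцов", "Немцова")));
        System.out.println(prefix("Немцов"));
    }
}
```

Option (a) needs every inflected form to be listed by hand, while option (b) also matches unrelated words that merely start with the same letters.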

Robert, thanks for the RussianLightStemFilterFactory info.
I've found this page
http://www.mail-archive.com/solr-comm...@lucene.apache.org/msg06857.html
that describes it to some extent. Where can I read more about
RussianLightStemFilterFactory?

Regards,
Oleg


Re: Russian stemmer

2010-07-27 Thread Robert Muir

Well, here is one idea for a more general solution.
The problem with protected words is that you must have a complete list.

One idea would be to add a filter that protects from stemming any word that
matches a regular expression.
In English, someone might want to leave all capitalized words unstemmed to reduce
trouble: [A-Z].*
In your case, a pattern like [A-Я].*ов might prevent problems.
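
As a rough sketch of how such a pattern check could behave, the snippet below uses only java.util.regex. It is not an existing Solr filter: the class and method names are hypothetical, the range is written with Cyrillic А to Я (which leaves out Ё), and it assumes the check runs before any lowercase filter, since the capital letter is the signal.

```java
import java.util.regex.Pattern;

// Hypothetical helper illustrating the idea; not an existing Solr/Lucene class.
final class RegexProtectSketch {
    // One Cyrillic capital (А..Я), any characters, then the ending "ов".
    static final Pattern PROTECTED = Pattern.compile("[А-Я].*ов");

    // True when the whole token matches, i.e. the stemmer should skip it.
    static boolean isProtected(String token) {
        return PROTECTED.matcher(token).matches();
    }

    public static void main(String[] args) {
        for (String t : new String[] {"Ковров", "Немцов", "Коврова", "ковров"}) {
            System.out.println(t + " protected=" + isProtected(t));
        }
    }
}
```

With this pattern the bare forms Ковров and Немцов are left alone, while inflected forms such as Коврова still go through the stemmer.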


> Where can I read more about RussianLightStemFilterFactory ?

Here is the link:
http://doc.rero.ch/lm.php?url=1000,43,4,20091209094227-CA/Dolamic_Ljiljana_-_Indexing_and_Searching_Strategies_for_the_Russian_20091209.pdf



Re: Russian stemmer

2010-07-27 Thread Oleg Burlaca

Thanks Robert for all your help,

The idea of protecting [A-Z].* words is ideal for English,
but in Russian nouns are inflected: Борис, Борису, Бориса, Борисом

I'll try the RussianLightStemFilterFactory (the article in the PDF mentioned
it's more accurate).

Once again thanks,
Oleg Burlaca


Re: Russian stemmer

2010-07-27 Thread Robert Muir

Right, but your problem is that this is the current output:

Ковров -> Ковр
Коврову -> Ковров
Ковровом -> Ковров
Коврове -> Ковров

so, if Ковров was simply left alone, all your forms would match...
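
To make the effect concrete, here is a small sketch. It does not run the Snowball stemmer: it looks up the stems reported above in a table, and the class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; the stems are copied from the output listed above.
final class ProtectedStemSketch {
    static final Map<String, String> REPORTED = new HashMap<>();
    static {
        REPORTED.put("Ковров", "Ковр");
        REPORTED.put("Коврову", "Ковров");
        REPORTED.put("Ковровом", "Ковров");
        REPORTED.put("Коврове", "Ковров");
    }

    // Protected words pass through unchanged; others use the reported stem.
    static String stem(String word, Set<String> protectedWords) {
        if (protectedWords.contains(word)) {
            return word;
        }
        return REPORTED.getOrDefault(word, word);
    }

    // A query term matches an indexed term when both reduce to the same stem.
    static boolean matches(String query, String indexed, Set<String> protectedWords) {
        return stem(query, protectedWords).equals(stem(indexed, protectedWords));
    }

    public static void main(String[] args) {
        Set<String> none = Collections.emptySet();
        Set<String> keep = Collections.singleton("Ковров");
        System.out.println(matches("Ковров", "Коврову", none));
        System.out.println(matches("Ковров", "Коврову", keep));
    }
}
```

Without protection the query Ковров reduces to Ковр and misses the other forms; with Ковров protected, the query and the inflected forms all end up as Ковров.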


Re: Russian stemmer

2010-07-27 Thread Dennis Gearon

I have studied some Russian. I got the impression from the textbooks that all
the exceptions had already been 'found' and were listed in the book.

I do know that languages are living, changing organisms, but I would think Russian
has to be more regular than English, even WITH all six cases and three
genders.

Dennis Gearon


Re: Problem with Russian stemmer in Solr 1.2

2007-07-17 Thread Andrew Stromnov


Hi Daniel

How do I implement a custom Russian factory with various tokenizers and filters?

Can you provide some code examples?

Regards,
Andrew



-- 
View this message in context: 
http://www.nabble.com/Problem-with-Russian-stemmer-in-Solr-1.2-tf4049948.html#a11646823
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Problem with Russian stemmer in Solr 1.2

2007-07-17 Thread Daniel Alheiros

Hi Andrew.

This is an example of one FilterFactory:

    // Classes come from org.apache.solr.analysis and
    // org.apache.lucene.analysis.ru (Solr 1.2 / Lucene 2.x).
    public class RussianStemFilterFactory extends BaseTokenFilterFactory {
        private char[] charset;

        /** @see org.apache.solr.analysis.BaseTokenFilterFactory#init(java.util.Map) */
        @Override
        public void init(Map<String, String> args) {
            super.init(args);
            String charsetName = args.get("charsetName");
            // Resolve the configured name to a Lucene charset table;
            // only two names are mapped in this example.
            this.charset = "KOI8".equals(charsetName)
                    ? RussianCharsets.KOI8 : RussianCharsets.UnicodeRussian;
        }

        /** @see org.apache.solr.analysis.TokenFilterFactory#create(org.apache.lucene.analysis.TokenStream) */
        public TokenStream create(TokenStream tokenStream) {
            return new RussianStemFilter(tokenStream, charset);
        }
    }

Calling args.get(String) returns an attribute defined in
your schema.xml, like this:

    <filter class="myCompany.RussianStemFilterFactory" charsetName="UnicodeRussian"/>

For a tokenizer that prepares the input for your filters:

    public class HTMLStripRussianLetterTokenizerFactory extends BaseTokenizerFactory {
        private char[] charset;

        /** @see org.apache.solr.analysis.BaseTokenizerFactory#init(java.util.Map) */
        @Override
        public void init(Map<String, String> args) {
            super.init(args);
            String charsetName = args.get("charsetName");
            // Same name-to-table mapping as in the filter factory above.
            this.charset = "KOI8".equals(charsetName)
                    ? RussianCharsets.KOI8 : RussianCharsets.UnicodeRussian;
        }

        /** @see org.apache.solr.analysis.TokenizerFactory#create(Reader) */
        public TokenStream create(Reader reader) {
            return new RussianLetterTokenizer(new HTMLStripReader(reader), this.charset);
        }
    }

    <tokenizer class="myCompany.HTMLStripRussianLetterTokenizerFactory" charsetName="UnicodeRussian"/>

I hope it helps.

Regards,
Daniel


http://www.bbc.co.uk/
This e-mail (and any attachments) is confidential and may contain personal 
views which are not the views of the BBC unless specifically stated.
If you have received it in error, please delete it from your system.
Do not use, copy or disclose the information in any way nor act in reliance on 
it and notify the sender immediately.
Please note that the BBC monitors e-mails sent or received.
Further communication will signify your consent to this.

Re: Problem with Russian stemmer in Solr 1.2

2007-07-10 Thread Daniel Alheiros

Hi Andrew

Yes, I saw that. As I'm not knowledgeable in Russian, I had to assume it was
adequate. But since you have much more to add to it, it would be great if
you could contribute that.

The problem is that the Russian analyzer and its filters are all final classes,
which does not allow an elegant extension. But you can create an analyzer that
reuses what is useful to you (in this case, the stemmer) and customizes the
other filters. I would suggest doing that by creating the Solr factories, so
you can point to your own stopword files. Any chance you could
contribute this stopword list?

One of my reasons for not using the RussianAnalyzer directly was that I need to
use a WhitespaceTokenizer that removes HTML code... So I created my own factories.

Regards,
Daniel 


On 9/7/07 19:36, Andrew Stromnov [EMAIL PROTECTED] wrote:

 
 Hi, Daniel
 
 The stemmer in RussianAnalyzer works as expected. But this analyzer doesn't
 allow any Solr customization: all stopwords are hardcoded, and there is no
 support for a custom tokenizer or for synonyms.
 
 RussianAnalyzer is similar to this chain:
   standard tokenizer
   standard filter factory
   word delimiter filter factory
   lowercase filter factory
   stop filter factory (with hardcoded stopwords)
   russian stem filter
  
 
 Regards,
 Andrew
 
 

Re: Problem with Russian stemmer in Solr 1.2

2007-07-09 Thread Andrew Stromnov


Hi Daniel,

Yes, I want to try RussianAnalyzer. How do I enable it in the Solr config?

Thank you.


Daniel Alheiros wrote:
 
 Hi Andrew.
 
 I'm using the RussianAnalyzer (part of the Lucene analyzers) and it
 reduces списки to списк.
 
 Do you want to try this other Analyzer?
 
 Regards,
 Daniel
 
 
 On 9/7/07 16:06, Andrew Stromnov [EMAIL PROTECTED] wrote:
 
 списки arrondissement turvallisuuden
 
 

Re: Problem with Russian stemmer in Solr 1.2

2007-07-09 Thread Daniel Alheiros

Hi Andrew

In fact I did it by creating all the factories for Solr, but I think you can
use it directly by changing your schema like this:

    <fieldtype name="cpstext_russian" class="solr.TextField"
               positionIncrementGap="100">
      <analyzer type="index"
                class="org.apache.lucene.analysis.ru.RussianAnalyzer">
      </analyzer>
      <analyzer type="query"
                class="org.apache.lucene.analysis.ru.RussianAnalyzer">
      </analyzer>
    </fieldtype>

I've not tested that, but I have seen something like this.

Please tell me if it works as expected and if it solves your problem (I'm
indexing Russian content, and as you seem to be knowledgeable in the Russian
language, your comments are very useful).

Regards,
Daniel

