Re: Stemming in Nutch

Howie Wang Wed, 08 Jun 2005 08:07:29 -0700

Thanks for the info! Is the right approach just to wrap the token
stream in a PorterStemFilter in NutchDocumentAnalysis.java? Something
like this:


 /** Analyzer used to index textual content. */
 private static class ContentAnalyzer extends Analyzer {
   /** Constructs a {@link NutchDocumentTokenizer}. */
   public TokenStream tokenStream(String field, Reader reader) {
     TokenStream ts =
       CommonGrams.getFilter(new NutchDocumentTokenizer(reader), field);
     // Wrap the existing stream so each token is Porter-stemmed.
     return new PorterStemFilter(ts);
   }
 }

Am I completely off-base?

Howie
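
To make concrete what the stemming filter is meant to buy us, here is a
minimal sketch of only the first rule group (step 1a, plural suffixes) of
the Porter algorithm in plain Java. It is not Lucene's PorterStemFilter and
not the full algorithm; the class name PluralStripper and the method step1a
are made up for illustration.

```java
import java.util.Locale;

/** Hypothetical helper: implements only step 1a of the Porter algorithm. */
final class PluralStripper {

    /**
     * Applies the four step 1a rules, longest suffix first:
     * SSES -> SS, IES -> I, SS -> SS, S -> (nothing).
     */
    static String step1a(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (w.endsWith("sses")) return w.substring(0, w.length() - 2);
        if (w.endsWith("ies"))  return w.substring(0, w.length() - 2);
        if (w.endsWith("ss"))   return w;  // "caress" stays "caress"
        if (w.endsWith("s"))    return w.substring(0, w.length() - 1);
        return w;
    }

    public static void main(String[] args) {
        // Print the reduced form of an indexed term and a query term.
        for (String term : new String[] {"kittens", "kitten"}) {
            System.out.println(term + " -> " + step1a(term));
        }
    }
}
```

Both "kittens" and "kitten" reduce to the same term here, which is why a
search for one can match a document containing the other. For that to work,
the same reduction presumably has to be applied to the query terms as well
as at indexing time.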

From: Andy Liu <[EMAIL PROTECTED]>

There are a couple of stemmers that have been developed for Lucene.
You'd have to modify the Nutch code to use your new stemming analyzer.

On 6/8/05, Jérôme Charron <[EMAIL PROTECTED]> wrote:
> > It seems that stemming is not working for me in nutch. If a document
> > has the word "kittens" in it, when I search for "kitten" it is not
> > being returned. Is there something I need to do to enable or install
> > support for stemming in English?
>
> As far as I know, it does not seem to me that the Nutch Analyzer performs
> stemming.
> I planned for the next release to write a proposal for integrating
> multi-language analyzers in Nutch (like in Lucene).
> But for now, as far as I know, there is nothing done on this area.
>
> Jerome
>
> --
> http://motrech.free.fr/
> http://frutch.free.fr/
>
>
