> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 

> Actually, I could not find the stopwords file. Could you help me with this?
> I am sure that words such as mission, sea, ocean, building,
> electricity, etc. couldn't be in a stopwords file. (In my previous
> question I meant the carrot stopword file, because I can't find Lucene's
> stopwords files.)

The current language analyzer plugins instantiate the Lucene
analyzers of the same name through their default constructors.
When instantiated this way, the analyzers use their hard-coded
stop word lists.  For German, the stop words are:

  private String[] GERMAN_STOP_WORDS = {
    "einer", "eine", "eines", "einem", "einen",
    "der", "die", "das", "dass", "daß",
    "du", "er", "sie", "es",
    "was", "wer", "wie", "wir",
    "und", "oder", "ohne", "mit",
    "am", "im", "in", "aus", "auf",
    "ist", "sein", "war", "wird",
    "ihr", "ihre", "ihres",
    "als", "für", "von", "mit",
    "dich", "dir", "mich", "mir",
    "mein", "sein", "kein",
    "durch", "wegen", "wird"
  };
// From src/java/org/apache/lucene/analysis/de/GermanAnalyzer.java
// of Lucene 1.4.3 distribution.  This could be slightly out of date.

To use a different list, you'd have to modify the source code in:
src/plugin/analysis-de/src/java/org/apache/nutch/analysis/de/GermanAnalyzer.java
so that it uses the constructor that takes the word list or the file name
of the word list, I think.
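
To make the difference between the two constructors concrete, here is a
minimal self-contained sketch.  It does not use the Lucene or Nutch classes;
StopWordSketch and its methods are hypothetical names made up for this
illustration, and the tokenizing is deliberately naive.  It only shows the
pattern: the default constructor falls back to a built-in list, while the
other constructor lets the caller pass in its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for an analyzer; NOT the Lucene or Nutch class.
class StopWordSketch {
    // A few entries taken from the hard-coded German list quoted above.
    static final String[] DEFAULT_STOP_WORDS = {
        "der", "die", "das", "und", "oder", "mit"
    };

    private final Set<String> stopWords;

    // Default constructor: falls back to the built-in list.
    StopWordSketch() {
        this(DEFAULT_STOP_WORDS);
    }

    // Alternative constructor: the caller supplies the word list.
    StopWordSketch(String[] words) {
        this.stopWords = new HashSet<>(Arrays.asList(words));
    }

    // Lower-cases the text, splits on whitespace, and drops stop words.
    List<String> analyze(String text) {
        List<String> kept = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String text = "der Ozean und die Mission";
        // Built-in list: "mission" is not a stop word, so it is kept.
        System.out.println(new StopWordSketch().analyze(text));
        // Custom list: "mission" is dropped only because we listed it.
        String[] custom = { "der", "die", "und", "mission" };
        System.out.println(new StopWordSketch(custom).analyze(text));
    }
}
```

With the built-in list, a word like "mission" passes through untouched; it
is dropped only when the caller's own list includes it.  The plugin change
described above would follow the same idea, assuming the Lucene
GermanAnalyzer in your version offers such a constructor.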


> When I use the trunk version, should I change some code as shown on the
> MultiLingual support page of the wiki? As I understand it, everything
> needed to integrate the stemming plugins has already been done in trunk,
> without further code changes.

I believe Jérôme has already made these code changes in trunk.

-kuro

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
