Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.
The "AnalyzersTokenizersTokenFilters" page has been changed by RobertMuir.
The comment on this change is: beef up / disambiguate the snowball docs.
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters?action=diff&rev1=73&rev2=74

--------------------------------------------------

  Example: "riding", "rides", "horses" ==> "ride", "ride", "hors".

+ Note: This differs very slightly from the "Porter" algorithm available in `solr.SnowballPorterFilter`, in that it deviates slightly from the published algorithm.
+ For more details, see the section "Points of difference from the published algorithm" described [[http://tartarus.org/~martin/PorterStemmer/|here]].
+
  <<Anchor(EnglishPorterFilter)>>
  ==== solr.EnglishPorterFilterFactory ====

@@ -347, +350 @@

  Creates `org.apache.lucene.analysis.SnowballPorterFilter`.

- Creates an [[http://snowball.tartarus.org/algorithms/english/stemmer.html|Porter2 stemmer]] from the Java classes generated from a [[http://snowball.tartarus.org/|Snowball]] specification. The language attribute is used to specify the language of the stemmer.
+ Creates an [[http://snowball.tartarus.org/texts/stemmersoverview.html|Snowball stemmer]] from the Java classes generated from a [[http://snowball.tartarus.org/|Snowball]] specification. The language attribute is used to specify the language of the stemmer.

  {{{
  <fieldtype name="myfieldtype" class="solr.TextField">
    <analyzer>
@@ -358, +361 @@

  }}}

  Valid values for the language attribute (creates the snowball stemmer class language + "Stemmer"):

-  * Danish
-  * Dutch
-  * English
-  * Finnish
-  * French
-  * German2
-  * German
-  * Italian
-  * Kp
-  * Lovins
-  * Norwegian
-  * Porter
-  * Portuguese
-  * Russian
-  * Spanish
-  * Swedish
+  * [[http://snowball.tartarus.org/algorithms/danish/stemmer.html|Danish]]
+  * [[http://snowball.tartarus.org/algorithms/dutch/stemmer.html|Dutch]]
+  * [[http://snowball.tartarus.org/algorithms/kraaij_pohlmann/stemmer.html|Kp]]: The Kraaij-Pohlmann stemming algorithm for Dutch.
+  * [[http://snowball.tartarus.org/algorithms/porter/stemmer.html|Porter]]: The original Porter stemming algorithm for English.
+  * [[http://snowball.tartarus.org/algorithms/english/stemmer.html|English]]: The Porter2 stemming algorithm for English.
+  * [[http://snowball.tartarus.org/algorithms/lovins/stemmer.html|Lovins]]: The early Lovins stemming algorithm for English.
+  * [[http://snowball.tartarus.org/algorithms/finnish/stemmer.html|Finnish]]
+  * [[http://snowball.tartarus.org/algorithms/french/stemmer.html|French]]
+  * [[http://snowball.tartarus.org/algorithms/german/stemmer.html|German]]
+  * [[http://snowball.tartarus.org/algorithms/german2/stemmer.html|German2]]: A variation of the German algorithm with handling to allow ä, ö and ü to be represented by ae, oe and ue
+  * [[http://snowball.tartarus.org/algorithms/italian/stemmer.html|Italian]]
+  * [[http://snowball.tartarus.org/algorithms/norwegian/stemmer.html|Norwegian]]
+  * [[http://snowball.tartarus.org/algorithms/portuguese/stemmer.html|Portuguese]]
+  * [[http://snowball.tartarus.org/algorithms/russian/stemmer.html|Russian]]
+  * [[http://snowball.tartarus.org/algorithms/spanish/stemmer.html|Spanish]]
+  * [[http://snowball.tartarus.org/algorithms/swedish/stemmer.html|Swedish]]
+ <!> Gotchas:
+  * Although the Lovins stemmer is described as faster than Porter/Porter2, practically it is much slower in Solr, as it is implemented using reflection.
+  * Neither the Lovins nor the Finnish stemmer produce correct output (as of Solr 1.4), due to a [[http://article.gmane.org/gmane.comp.search.snowball/1139|known bug in Snowball]]

  <<Anchor(WordDelimiterFilter)>>
  ==== solr.WordDelimiterFilterFactory ====
