KStem Token Filter
------------------
Key: SOLR-379
URL: https://issues.apache.org/jira/browse/SOLR-379
Project: Solr
Issue Type: New Feature
Components: search
Reporter: Pieter Berkel
Priority: Minor
A Lucene / Solr implementation of the KStem stemmer. Full credit goes to Harry
Wagner for adapting the Lucene version found here:
http://ciir.cs.umass.edu/cgi-bin/downloads/downloads.cgi
Background discussion of this stemmer (including licensing issues) can be found
in this thread:
http://www.nabble.com/Embedded-about-50--faster-for-indexing-tf4325720.html#a12376295
I've made some minor changes to KStemFilterFactory so that it compiles cleanly
against trunk:
1) removed some unnecessary imports
2) updated the init() method parameters to match the signature introduced by SOLR-215
3) moved KStemFilterFactory into package org.apache.solr.analysis
Once compiled and included in your Solr war (or as a jar in your lib directory),
the KStem filter can be used in your schema very easily:
<analyzer type="index">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true"
          words="stopwords.txt"/>
  <filter class="solr.StandardFilterFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.KStemFilterFactory" cacheSize="20000"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
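
For context, here is a rough sketch of where such an analyzer might sit in schema.xml. The field type name "text_kstem" and the field name "title_stemmed" are hypothetical and chosen only for illustration. The query-time analyzer is an assumption too: it simply mirrors the index-time chain so that query terms are stemmed the same way as indexed terms.

```xml
<!-- Sketch only: "text_kstem" and "title_stemmed" are made-up names. -->
<!-- The fieldType belongs in the <types> section of schema.xml. -->
<fieldType name="text_kstem" class="solr.TextField" positionIncrementGap="100">
  <!-- Index-time chain, as in the example above -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KStemFilterFactory" cacheSize="20000"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <!-- Assumed query-time chain: same filters, so both sides stem alike -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KStemFilterFactory" cacheSize="20000"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<!-- The field belongs in the <fields> section and refers to the type by name. -->
<field name="title_stemmed" type="text_kstem" indexed="true" stored="true"/>
```

If the index and query chains stem differently, a query term may not match its indexed form, which is why the sketch keeps them identical.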