[ 
https://issues.apache.org/jira/browse/SOLR-379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605169#action_12605169
 ] 

Pieter Berkel commented on SOLR-379:
------------------------------------

As far as I'm aware, KStemFilterFactory.java was written by Harry Wagner, so if 
he's happy to license it under the ASL, it should be possible to include it in 
the repo.  Everything in "/src/java/org/apache/lucene/analysis" was copied from 
KStem.jar, which was originally downloaded from CIIR. If that code can be 
loaded on demand, it should be fairly straightforward to include support 
for this stemmer in Solr.


> KStem Token Filter
> ------------------
>
>                 Key: SOLR-379
>                 URL: https://issues.apache.org/jira/browse/SOLR-379
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Pieter Berkel
>            Priority: Minor
>         Attachments: KStemSolr.zip
>
>
> A Lucene / Solr implementation of the KStem stemmer.  Full credit goes to 
> Harry Wagner for adapting the Lucene version found here:
> http://ciir.cs.umass.edu/cgi-bin/downloads/downloads.cgi
> Background discussion of this stemmer (including licensing issues) can be 
> found in this thread:
> http://www.nabble.com/Embedded-about-50--faster-for-indexing-tf4325720.html#a12376295
> I've made some minor changes to KStemFilterFactory so that it compiles 
> cleanly against trunk:
> 1) removed some unnecessary imports
> 2) changed the init() method parameters introduced by SOLR-215
> 3) moved KStemFilterFactory into package org.apache.solr.analysis
> Once compiled and included in your Solr war (or as a jar in your lib 
> directory), the KStem filter can be used in your schema very easily:
>       <analyzer type="index">
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>         <filter class="solr.StandardFilterFactory"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.KStemFilterFactory" cacheSize="20000"/>
>         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>       </analyzer>
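
The cacheSize attribute in the example above is not explained in this issue. 
Assuming it bounds an in-memory cache of already-stemmed terms, the idea can be 
sketched roughly as below. This is not the code from KStemSolr.zip: the class 
name CachedStemmerSketch, its methods, and the toy "strip a trailing s" stemmer 
are all hypothetical and exist only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical illustration only; NOT the KStem code from the attachment. */
public class CachedStemmerSketch {
    private final UnaryOperator<String> stemmer;
    private final Map<String, String> cache;
    private int misses = 0;

    public CachedStemmerSketch(UnaryOperator<String> stemmer, final int cacheSize) {
        this.stemmer = stemmer;
        // An access-ordered LinkedHashMap drops the least recently used
        // entry once the map grows beyond cacheSize (assumed semantics).
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > cacheSize;
            }
        };
    }

    /** Returns the stem for a term, consulting the cache first. */
    public String stem(String term) {
        String cached = cache.get(term);
        if (cached != null) {
            return cached;
        }
        misses++;
        String stemmed = stemmer.apply(term);
        cache.put(term, stemmed);
        return stemmed;
    }

    /** Number of times the underlying stemmer had to be invoked. */
    public int misses() {
        return misses;
    }

    public static void main(String[] args) {
        // Toy stand-in for a real stemmer: strips a single trailing "s".
        UnaryOperator<String> toy =
            t -> t.endsWith("s") ? t.substring(0, t.length() - 1) : t;
        CachedStemmerSketch s = new CachedStemmerSketch(toy, 2);
        System.out.println(s.stem("cats"));
        System.out.println(s.stem("cats")); // served from the cache
        System.out.println(s.misses());
    }
}
```

If the real filter does cache in this way, a larger cacheSize would trade memory 
for fewer repeated stemming calls on frequently seen terms.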

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
