On Feb 3, 2005, at 9:26 AM, Owen Densmore wrote:
Is this the right way to make a porter analyzer using the standard
tokenizer? I'm not sure about the order of the filters.
Owen
class MyAnalyzer extends Analyzer {
  public TokenStream tokenStream(String fieldName, Reader reader) {
    return new PorterStemFilter(
      new StopFilter(
        new LowerCaseFilter(
          new StandardFilter(
            new StandardTokenizer(reader))),
        StopAnalyzer.ENGLISH_STOP_WORDS));
  }
}
Yes, that is correct.
Analysis starts with a tokenizer, and each filter consumes the output
of the stage it wraps, so the innermost constructor runs first and the
outermost filter runs last.
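
To make the ordering concrete, here is a minimal sketch in plain Java
that models the chain without using Lucene at all. Everything in it is
an assumption for illustration: the class name FilterOrderSketch, the
four-word stop list, and the toy suffix-stripping stemmer are
hypothetical stand-ins, not the Porter algorithm and not Lucene's API.
It only shows why lowercasing should come before stop-word removal when
the stop list is lowercase, and why stemming goes last.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class FilterOrderSketch {
    // Hypothetical lowercase stop list, much shorter than a real one.
    static final Set<String> STOP_WORDS =
        new HashSet<>(Arrays.asList("the", "a", "and", "of"));

    // Stand-in for a tokenizer: split on runs of non-letter characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^A-Za-z]+")) {
            if (!piece.isEmpty()) tokens.add(piece);
        }
        return tokens;
    }

    // Stand-in for a lowercasing filter.
    static List<String> lowerCase(List<String> in) {
        List<String> out = new ArrayList<>();
        for (String t : in) out.add(t.toLowerCase(Locale.ROOT));
        return out;
    }

    // Stand-in for a stop filter: exact, case-sensitive match on the stop list.
    static List<String> removeStopWords(List<String> in) {
        List<String> out = new ArrayList<>();
        for (String t : in) {
            if (!STOP_WORDS.contains(t)) out.add(t);
        }
        return out;
    }

    // Toy stemmer (NOT Porter): strips a trailing "ing" or "s" from longer words.
    static List<String> stem(List<String> in) {
        List<String> out = new ArrayList<>();
        for (String t : in) {
            if (t.length() > 5 && t.endsWith("ing")) {
                out.add(t.substring(0, t.length() - 3));
            } else if (t.length() > 3 && t.endsWith("s")) {
                out.add(t.substring(0, t.length() - 1));
            } else {
                out.add(t);
            }
        }
        return out;
    }

    // Same order as the analyzer in the question: tokenize, lowercase, stop, stem.
    static List<String> analyze(String text) {
        return stem(removeStopWords(lowerCase(tokenize(text))));
    }

    // Stop filter moved ahead of lowercasing, to show what goes wrong.
    static List<String> analyzeStopFirst(String text) {
        return stem(lowerCase(removeStopWords(tokenize(text))));
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Cats Jumping"));          // [cat, jump]
        System.out.println(analyzeStopFirst("The Cats Jumping")); // [the, cat, jump]
    }
}
```

In the second ordering the capitalized "The" slips past the lowercase
stop list before it is lowercased, which is the kind of surprise that
running an analyzer over sample text will reveal.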
As you start tinkering with custom analysis, I strongly recommend
writing a little bit of code to see how your analyzer works on some
text. The Lucene Intro article I wrote for java.net has some code you
can borrow to do this, as does the source code for Lucene in Action.
Luke, a tool I also highly recommend, has this capability as well.
Erik