> -----Original Message-----
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Ofer Nave
> Sent: Monday, March 26, 2007 4:49 PM
> 
> I checked the PyLucene README, and the note regarding custom 
> tokenfilters said this:
> 
>        "In order to instantiate such a custom token filter, the
>        additional tokenFilter() factory method defined on
>        org.apache.lucene.analysis.TokenStream instances needs to be
>        invoked with the Python extension instance."
> 
> However, I couldn't find reference to any tokenFilter() 
> methods in the TokenStream class family in the Lucene 2.1 docs.

I finally figured out that it might be smart to compare my implementation to
the PyLucene version of the SynonymAnalyzer and SynonymFilter classes from
LIA (yeah, I'm slow).

The SynonymAnalyzer class defines tokenStream like this:

    def tokenStream(self, fieldName, reader):

        tokenStream = LowerCaseFilter(StandardFilter(StandardTokenizer(reader)))
        tokenStream = StopFilter(tokenStream, StandardAnalyzer.STOP_WORDS)
        filter = SynonymFilter(tokenStream, self.engine)

        return tokenStream.tokenFilter(filter)

I found this very strange (especially the part about giving the filter the
stream object AND giving the stream the filter object), but it does match
the note in the README regarding the tokenFilter() factory method that I
previously didn't understand.

I reimplemented my FooAnalyzer using this pattern and it works now.  I still
don't know why, but at least it works. :)
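For anyone else puzzled by the double reference: the idiom appears to be a
bridge.  The Python filter wraps the stream so it can pull tokens from it,
and tokenFilter() wraps the Python instance back into something the caller
can consume as a TokenStream.  Here's a rough pure-Python sketch of that
shape (all class names here are hypothetical stand-ins, not PyLucene's
actual classes):

```python
class TokenStream:
    """Stand-in for the Java-side stream: yields tokens, or None at end."""
    def __init__(self, tokens):
        self._tokens = iter(tokens)

    def next(self):
        return next(self._tokens, None)

    def tokenFilter(self, python_filter):
        # Factory method: wrap the Python filter instance so the
        # consuming side can drive it like any other TokenStream.
        return _FilterBridge(python_filter)


class _FilterBridge(TokenStream):
    """Presents a Python filter object through the TokenStream interface."""
    def __init__(self, python_filter):
        self._filter = python_filter

    def next(self):
        return self._filter.next()


class LowerCaseishFilter:
    """A custom Python 'extension' filter that reads from an input stream."""
    def __init__(self, input_stream):
        self._input = input_stream

    def next(self):
        token = self._input.next()
        return token.lower() if token is not None else None


# The double reference from the email: the filter gets the stream
# (to read from), and the stream's tokenFilter() gets the filter
# (to wrap it for the caller).
stream = TokenStream(["Quick", "Brown", "FOX"])
my_filter = LowerCaseishFilter(stream)
result = stream.tokenFilter(my_filter)

tokens = []
token = result.next()
while token is not None:
    tokens.append(token)
    token = result.next()
# tokens is now ["quick", "brown", "fox"]
```

The real mechanism presumably lives in PyLucene's Java/Python glue, but the
shape of the calls matches the SynonymAnalyzer example above.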

-ofer

_______________________________________________
pylucene-dev mailing list
[email protected]
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
