On Fri, 20 Jul 2007, keekles keekles wrote:

I'm not that familiar with Lucene, but basically what I'm looking to
accomplish is the equivalent of a whitespace tokenizer with my own list of
delimiters. In the Lucene docs it just looks like simple inheritance, but I
don't really see any examples in PyLucene of how to subclass a CharTokenizer,
other than the SimpleKeywordAnalyzer class from the lia samples, which does
not appear to be used or to work as far as I can see. I realize this is
probably a bit out of place to be asking here, but could someone explain or
show me a valid example of a custom analyzer using a custom CharTokenizer in
PyLucene?

Because PyLucene wraps a gcj-compiled Java Lucene, you cannot simply extend a PyLucene Python class and expect your extension to be known to the Java Lucene library.

Instead, there are a number of extension points: pre-defined extensions of Java Lucene classes that can delegate to (wrap in reverse) any corresponding Python class that implements the protocol of the wrapper.

All APIs that accept an instance of a class whose protocol can be implemented in Python will do the right thing and wrap your Python customization in an instance of the corresponding pre-defined extension point class. Some of these customizations need to be created via a constructor, though, because there is no API that accepts a parameter of the given type. This capability was missing for CharTokenizer until I added it just now.
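
For illustration, here is a minimal sketch of the idiom, assuming extension
point classes named PythonAnalyzer and PythonCharTokenizer and the Lucene 2.x
tokenizer protocol (isTokenChar); the actual class names and module spelling
in trunk rev 334 may differ, so treat this as the shape of the solution
rather than exact code:

    from PyLucene import PythonAnalyzer, PythonCharTokenizer

    class DelimiterTokenizer(PythonCharTokenizer):
        # whitespace plus a custom set of extra delimiters ends a token
        DELIMITERS = u' \t\r\n,;|'

        def isTokenChar(self, c):
            # a character belongs to a token iff it is not a delimiter
            return c not in self.DELIMITERS

    class DelimiterAnalyzer(PythonAnalyzer):
        def tokenStream(self, fieldName, reader):
            return DelimiterTokenizer(reader)

The point is that you subclass the Python extension point rather than the
wrapped Java class; the binding takes care of delegating isTokenChar() calls
back into your Python code.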

I fixed the SimpleKeywordAnalyzer class, moved it to the lia/analysis/keyword directory, and added a new unit test to the KeywordAnalyzerTest.py file that illustrates how to use it.
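
For reference, driving such an analyzer by hand might look like the sketch
below. DelimiterAnalyzer is the hypothetical class from the example above,
StringReader is assumed to be exposed by the binding, and the TokenStream
calls follow the Lucene 2.x API, where next() returns None at the end of the
stream:

    from PyLucene import StringReader

    stream = DelimiterAnalyzer().tokenStream("content",
                                             StringReader("foo,bar;baz qux"))
    token = stream.next()
    while token is not None:
        print token.termText()   # foo, bar, baz, qux -- one per line
        token = stream.next()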

These changes are checked into the PyLucene trunk as rev 334.

Andi..

_______________________________________________
pylucene-dev mailing list
[email protected]
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
