Andi Vajda wrote:
is PyLucene able to handle a custom tokenization
without any stemming process ?
PyLucene is able to handle anything Java Lucene is capable of handling
since it wraps it (except for the RemoteSearchable class).
If you have questions on how to use Lucene, that are not specific to
python, you should join the [email protected] mailing list
where there is a wealth of information about how to use Lucene.
Also, most of all the 'Lucene in Action' book's samples are available,
in python, in the PyLucene distribution. For examples on how to write
a custom analyzer in python for PyLucene, look at the files in
samples/LuceneInAction/lia/analysis.
In fact you can even write nice simple pure-Python analyzers:
# This one just returns the whole string as a single token...
class ExactAnalyzer(object):
def tokenStream(self, fieldName, reader):
"""tokenStream method for PyLucene"""
input = reader.read()
return iter([PyLucene.Token(input, 0, len(input)), None])
David
_______________________________________________
pylucene-dev mailing list
[email protected]
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev