Is PyLucene able to handle custom tokenization without any stemming?
PyLucene is able to handle anything Java Lucene is capable of handling since it wraps it (except for the RemoteSearchable class).
If you have questions on how to use Lucene that are not specific to Python, you should join the [email protected] mailing list, where there is a wealth of information about how to use Lucene.
Also, most of the samples from the 'Lucene in Action' book are available, in Python, in the PyLucene distribution. For examples of how to write a custom analyzer in Python for PyLucene, look at the files in samples/LuceneInAction/lia/analysis.
Andi..
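To make the idea of stemming-free custom tokenization concrete, here is a minimal sketch in plain Python. It does not use any PyLucene API; the TOKEN_RE pattern and the tokenize() helper are hypothetical names made up for this illustration. It only shows one possible way to split mixed-language text into words, SGML tags and numbers without altering the words. A custom analyzer such as those in samples/LuceneInAction/lia/analysis would presumably be the place to plug in logic like this.

```python
import re

# Hypothetical pattern, for illustration only. Alternatives are tried in order:
TOKEN_RE = re.compile(
    r"</?[A-Za-z][^<>]*>"   # an SGML-style tag, kept whole as one token
    r"|\d+(?:[.,]\d+)*"     # a number, optionally with '.' or ',' separators
    r"|[^\W\d_]+"           # a run of letters in any script (no digits, no '_')
)

def tokenize(text):
    """Return the tokens found in text, unchanged (no stemming, no lowercasing)."""
    return [match.group(0) for match in TOKEN_RE.finditer(text)]

if __name__ == "__main__":
    print(tokenize("<p>Les chevaux running 3,5 km</p>"))
```

Since the tokens are returned exactly as they appear in the input, words from different languages are never passed through a language-specific stemmer. Whitespace and punctuation outside the three patterns are simply skipped.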
Actually, I would like to feed the index myself with words from different languages (thus inconsistent tokenization), but also SGML tags, and maybe even some numbers. Will that be possible? Where can I find hints on where to look for that?
Best regards,
J.
_______________________________________________
pylucene-dev mailing list
[email protected]
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
