Last year I was following the progress of FTS5 with the idea of making a
multilingual stemmer extension. Being busy with other things, I lost
track of it until a couple of weeks ago, when I decided to update the
code I had written back then.

It uses libstemmer, the library generated by Snowball. It works much like
FTS5's porter tokenizer (in fact, Snowball is Mr. Porter's work). When you
create the table, you use tokenize = 'snowball language_here'.
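
For illustration, here is a minimal Python sketch of creating such a table.
It assumes the extension has already been compiled to a loadable library;
the path passed to open_with_stemmer, the helper names, and the table and
column names are all hypothetical, and only the tokenize = 'snowball ...'
syntax comes from the description above.

```python
import sqlite3


def build_create_sql(table, column, language):
    """Build a CREATE VIRTUAL TABLE statement using the snowball tokenizer.

    Table, column, and language names cannot be bound as SQL parameters,
    so they are validated before being interpolated into the statement.
    """
    if not (table.isidentifier() and column.isidentifier()):
        raise ValueError("table and column must be plain identifiers")
    if not language.isalpha():
        raise ValueError("language must be a plain word, e.g. 'spanish'")
    return (
        f"CREATE VIRTUAL TABLE {table} "
        f"USING fts5({column}, tokenize = 'snowball {language}')"
    )


def open_with_stemmer(extension_path, language):
    """Open an in-memory database, load the extension, create the table.

    extension_path is an assumption: point it at wherever your build of
    the extension ended up. Extension loading must also be enabled in
    your Python's sqlite3 module for this to work.
    """
    conn = sqlite3.connect(":memory:")
    conn.enable_load_extension(True)
    conn.load_extension(extension_path)
    conn.enable_load_extension(False)
    conn.execute(build_create_sql("docs", "body", language))
    return conn
```

With one table per call, each table is tied to a single language, which is
exactly the limitation the question below is about.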

This weekend I decided to publish it as is, in case someone else found it
useful. Less than 48 hours later, I was asked: how do I use multiple
languages? (I had the same question myself.)

I've been thinking about the right way to do it (that is, the way that
gives the closest to perfect results), and I guess the most obvious would
be to get the language from a field in the query each time the stemmer is
called. But I don't know whether that is possible, or how to do it cleanly.

Any suggestions on how to proceed?

PS: if anyone wants to use what I've got so far:
https://github.com/abiliojr/fts5-snowball
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
