Last year I was following the progress of FTS5 with the idea of writing a multilingual stemmer extension. Being busy with other things, I lost track of it until a couple of weeks ago, when I decided to update the code I had written back then.
It uses the Snowball-generated libstemmer and works pretty much like the porter stemmer (actually, Snowball is Mr. Porter's work). When you create the table, you use tokenize = 'snowball language_here'.

This weekend I decided to publish it as is, in case someone else finds it useful. Not even 48 hours later I was asked: how do I use multiple languages? (I had the same question myself.)

I've been thinking about the right way to do it (that is, the way that gets closest to perfect results), and I guess the most obvious approach would be to get the language from a field in the query each time the stemmer is called. But I don't know whether that is possible, or how to do it cleanly.

Any suggestions on how to proceed?

PS: if anyone wants to use what I've got so far: https://github.com/abiliojr/fts5-snowball
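
For anyone who wants to see where the tokenize option mentioned above fits in, here is a minimal sketch using Python's sqlite3 module. It assumes the underlying SQLite library was built with FTS5. Since the snowball tokenizer is only available once the extension has been loaded, the sketch uses the built-in porter tokenizer as a stand-in. The table name, column name and sample rows are made up for illustration. Note that the tokenizer and its arguments are given once, when the table is created, which is what makes a per-row language awkward.

```python
import sqlite3

# In-memory database; assumes the SQLite library was built with FTS5.
conn = sqlite3.connect(":memory:")

# With the extension loaded, this clause would read: tokenize = 'snowball language_here'.
# The built-in 'porter' tokenizer is used here only as a stand-in.
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(body, tokenize = 'porter')")
conn.executemany(
    "INSERT INTO docs(body) VALUES (?)",
    [
        ("She was running late",),
        ("He runs every morning",),
        ("Nothing to see here",),
    ],
)

# The query term goes through the same tokenizer as the indexed text,
# so 'run' matches both 'running' and 'runs'.
matches = [
    row[0]
    for row in conn.execute(
        "SELECT body FROM docs WHERE docs MATCH ? ORDER BY rowid", ("run",)
    )
]
print(matches)
```

The same statements should work with the snowball tokenizer after loading the extension, with only the tokenize clause changed.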