> It's more of the implementation that needs an update than TextCat
> algorithm itself.
>
> Charset/case awareness:
> https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6229
>
> Better database:
> https://issues.apache.org/SpamAssassin/show_bug.cgi?id=4152
>
> Etc.. feel free to chime in..

There is one more thing that I think should be fixed (or at least I don't understand why it is the way it is right now): the charsets in the TextCat language database. Why are the languages in the database stored in different charsets? Wouldn't it be better to have them all in Unicode?
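
To illustrate what I mean, here is a rough Python sketch. It is only my own illustration, not taken from the TextCat code, and the helper names `byte_ngrams` and `char_ngrams` are made up for this example. It assumes the models are built from raw-byte n-grams: the same word encoded in two legacy charsets then shares no trigrams, while after decoding to Unicode the trigrams are identical.

```python
def byte_ngrams(data: bytes, n: int = 3) -> set:
    """Return the set of n-grams over raw bytes (hypothetical helper)."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def char_ngrams(text: str, n: int = 3) -> set:
    """Return the set of n-grams over Unicode characters (hypothetical helper)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


word = "привет"

# The same Russian word in two legacy single-byte charsets.
koi8_bytes = word.encode("koi8_r")
cp1251_bytes = word.encode("cp1251")

# Byte-level trigrams do not overlap at all between the two encodings.
byte_overlap = byte_ngrams(koi8_bytes) & byte_ngrams(cp1251_bytes)

# Once decoded to Unicode, both inputs give the same character trigrams.
same_after_decode = (
    char_ngrams(koi8_bytes.decode("koi8_r"))
    == char_ngrams(cp1251_bytes.decode("cp1251"))
)

print("shared byte trigrams:", len(byte_overlap))
print("same character trigrams after decoding:", same_after_decode)
```

If that assumption holds, a Unicode-only database would need one model per language instead of one per language/charset pair, provided the input is decoded before matching.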