Re: Problems with Cyrillic spam

darxus Thu, 15 Dec 2011 10:15:32 -0800

On 12/15, Martin Gregorie wrote:
> In that case I'm missing some information: how to write a rule that can
> interpret the value(s) returned by TextCat.


I think you're looking for:

ok_languages en fr de

- http://spamassassin.apache.org/full/3.3.x/doc/Mail_SpamAssassin_Plugin_TextCat.html
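
For context, a minimal sketch of where that line might sit in a local.cf.
It assumes the TextCat plugin is not yet enabled on your install (it is
usually a commented-out loadplugin line in one of the .pre files). The
score value is an arbitrary illustration, not a recommendation, and the
add_header line is optional, only there so you can see what TextCat
guessed for each message:

    # Enable the language-guessing plugin (skip if already loaded in a .pre file)
    loadplugin Mail::SpamAssassin::Plugin::TextCat

    # Languages you expect to receive legitimate mail in
    ok_languages en fr de

    # Rule that fires when the guessed language is not in ok_languages.
    # 3.0 is an example value; tune it for your own mail.
    score UNWANTED_LANGUAGE_BODY 3.0

    # Optional: record TextCat's guess in a header for debugging
    add_header all Languages _LANGUAGES_

So you don't write a rule that interprets TextCat's return values yourself.
You list the languages you accept and adjust the score of the rule that
ships with the plugin.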

> Why wouldn't it be sensible to rewrite ok_locales to compare TextCat
> return value(s) against its list of OK codes?

Because that functionality already exists within TextCat?  

> Then why has ok_locales not been fixed already? This is not a criticism,
> just a request for information. Is it something that's difficult to do
> efficiently? I'd imagine that language recognition by looking at codepoint
> values is possible but not necessarily fast nor unambiguous.

Because it's not actually broken.  That bug should probably be closed,
perhaps after noting the limited utility of ok_locales in the documentation.

ok_locales functions by identifying character sets that can only be used
for a specific language.  UTF8, Windows-1255, and koi8 are not such
character sets, because they can also be used to write in English.  

And, most importantly, as Kevin says here, people *do* use those character
sets to write in English:
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=4078#c27

Well, it's obvious that people write English in UTF8.  

> I've no time ATM and in any case I'm a middling to poor Perl coder. Now,
> if SA was written in C or Java....

I bet you know that working on real code is the best way to get better at
a language.

-- 
"If you are not paranoid... you may not be paying attention."
 - j...@creative-net.net, on an IDPA mailing list
http://www.ChaosReigns.com
