On Wed, 25 Feb 2004, Ilan Aisic wrote:
> Hi Everyone,
>
> In /etc/mail/spamassassin/local.cf I have the following line:
>
> ok_languages en he
>
> This supposedly allows English and Hebrew.
> However, I get lots of false positives on Hebrew letters because it wrongly
> identifies Hebrew letters as "raw illegal characters" and gives them high
> scores.
> You can see 7 points are contributed to the score just from these 2 lines
> copied from the reports:
>
> 4.3 FROM_ILLEGAL_CHARS From contains too many raw illegal characters
> 2.7 SUBJ_ILLEGAL_CHARS Subject contains too many raw illegal characters
>
> Can anyone advise on this?
No, that isn't a false positive; it's an appropriate hit against a
bad e-mail client that is violating Internet standards.
Internet standard RFC-2822 (section 2.2) unequivocally states that you
MUST use only 7-bit characters in e-mail HEADERS. If you want to represent
non-7-bit characters in a header (such as 'From' or 'Subject'), you must
encode them (for example as RFC 2047 encoded-words, using 'Q' or Base64
encoding) rather than sending the "raw" data.
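To make the difference concrete, here is a minimal sketch using Python's
standard email.header module. It is only an illustration of raw versus
encoded header text, not what SpamAssassin itself runs. The helper name
has_raw_8bit and the sample Hebrew subject are assumptions made up for
this example.

```python
from email.header import Header, decode_header


def has_raw_8bit(header_value: bytes) -> bool:
    """Hypothetical helper: True if any byte falls outside 7-bit ASCII."""
    return any(byte > 0x7F for byte in header_value)


# Sample Hebrew subject ("shalom"), written with escapes to keep this file ASCII.
subject = "\u05e9\u05dc\u05d5\u05dd"

# What a non-compliant client puts on the wire: raw UTF-8 bytes in the header.
raw_bytes = subject.encode("utf-8")

# What a compliant client sends: an RFC 2047 encoded-word, which is pure ASCII.
encoded = Header(subject, charset="utf-8").encode()

# Decoding the encoded-word recovers the original text.
decoded_bytes, charset = decode_header(encoded)[0]
round_trip = decoded_bytes.decode(charset)

print(has_raw_8bit(raw_bytes))                # raw form contains 8-bit bytes
print(has_raw_8bit(encoded.encode("ascii")))  # encoded form does not
print(encoded)
```

A header built the second way carries the same Hebrew text but contains
only 7-bit characters, so it should not trip rules that look for raw
8-bit data in 'From' or 'Subject'.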
Good e-mail client programs follow RFC standards and would not generate
such messages. Spammers usually are not concerned with following
standards and are more likely to generate such garbage.
The "ok_languages" option sets the languages that you will accept
after the message has been decoded (i.e., which kinds of character sets
may be encoded).
Thus, if somebody were to send you a message with an encoded Korean
subject, your SA would score against that.
--
Dave Funk University of Iowa
<dbfunk (at) engineering.uiowa.edu> College of Engineering
319/335-5751 FAX: 319/384-0549 1256 Seamans Center
Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{