Re: Scoring for rule SUBJ_ILLEGAL_CHARS
Kelson wrote on Fri, 12 May 2006 14:23:55 -0700: I count two: The ü in für and the ´ in MODEL´S, which is different from the ASCII single quote/apostrophe: ' Ah, you are right, I missed the ü, it's too natural for me. Nevertheless too many implies a bit more than *two* for me. I can't exactly say how much, but I'd use a better description. The rule is an eval rule, so I don't know how many characters it needs, maybe it's really just one. Kai -- Kai Schätzl, Berlin, Germany Get your web at Conactive Internet Services: http://www.conactive.com
Re: Scoring for rule SUBJ_ILLEGAL_CHARS
Theo Van Dinter wrote on Thu, 11 May 2006 13:49:11 -0400: fwiw, the 8-bit characters ought to be encoded in base64 or quoted-printable. then the rule wouldn't hit. I just found the same problem here with a whole bunch of messages coming from the same source. It seems the rule hits on *one* occurrence of a non-ASCII character; however, the description says "Subject: has too many raw illegal characters". At least the description is wrong then. And, as Keith explains, I think that score is excessive. It's fairly common for some mail programs, especially webmail or form-generated mail, to leave at least one non-encoded character in the subject. The subject line hitting in the case of our customer was: Bewerbung für INS-2006-05-4, MODEL´S GESUCHT!!! I can identify only one character that is outside the ASCII range. Kai -- Kai Schätzl, Berlin, Germany Get your web at Conactive Internet Services: http://www.conactive.com
Re: Scoring for rule SUBJ_ILLEGAL_CHARS
Kai Schaetzl wrote: The subject line hitting in the case of our customer was: Bewerbung für INS-2006-05-4, MODEL´S GESUCHT!!! I can identify only one character that is outside the ASCII range. I count two: The ü in für and the ´ in MODEL´S, which is different from the ASCII single quote/apostrophe: ' -- Kelson Vibber SpeedGate Communications www.speed.net
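Counting such characters by eye is easy to get wrong, as this exchange shows. A minimal Python sketch that lists every character outside the 7-bit ASCII range in a subject line is given below. It is only an illustration: the helper name `non_ascii_chars` is made up here, and the code does not reproduce the logic of the real SUBJ_ILLEGAL_CHARS eval rule, whose threshold is not stated anywhere in this thread.

```python
def non_ascii_chars(subject):
    """Return (position, character, code point) for each non-ASCII character."""
    return [
        (pos, ch, f"U+{ord(ch):04X}")
        for pos, ch in enumerate(subject)
        if ord(ch) > 127  # anything above 127 is outside 7-bit US-ASCII
    ]

# Subject line quoted in the thread; the two suspects are written as escapes
# so the example itself stays pure ASCII.
subject = "Bewerbung f\u00fcr INS-2006-05-4, MODEL\u00b4S GESUCHT!!!"

hits = non_ascii_chars(subject)
for pos, ch, code_point in hits:
    print(pos, ch, code_point)
print(len(hits), "non-ASCII characters found")
```

For this subject the list contains the u-umlaut (U+00FC) and the acute accent (U+00B4), which matches Kelson's count of two.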
Re: Scoring for rule SUBJ_ILLEGAL_CHARS
From: Kai Schaetzl [EMAIL PROTECTED] Theo Van Dinter wrote on Thu, 11 May 2006 13:49:11 -0400: fwiw, the 8-bit characters ought to be encoded in base64 or quoted-printable. then the rule wouldn't hit. I just found the same problem here with a whole bunch of messages coming from the same source. It seems the rule hits on *one* occurrence of a non-ASCII character; however, the description says "Subject: has too many raw illegal characters". At least the description is wrong then. And, as Keith explains, I think that score is excessive. It's fairly common for some mail programs, especially webmail or form-generated mail, to leave at least one non-encoded character in the subject. The subject line hitting in the case of our customer was: Bewerbung für INS-2006-05-4, MODEL´S GESUCHT!!! I can identify only one character that is outside the ASCII range. Kai 1 is too many, of course. {^_-}
Scoring for rule SUBJ_ILLEGAL_CHARS
I've recently had a couple of false positives caused by this rule, and think it may be scored too highly for a single check. The e-mails in question were in Spanish, and the Spanish word for linguistics has two accented characters, which is enough to trigger this rule. Admittedly, the blacklists account for 2.4 points (it was from Yahoo) but the 4.3 point score for the subject alone strikes me as excessive. I understand that anything that is not English is inherently suspect for most users, but to give 86% of the default spam score on almost *any* single rule would seem to me to be overkill. Alternatively, is there (or should there be) a ruleset for those who wish to receive e-mail in other languages? Ideally, a Spanish-friendly ruleset would reduce the scores of character-based rules, while adding in rules for known spam in Spanish where possible. Does such a thing already exist? Should it? The spam report from the e-mail in question follows, although the above pretty much sums it up.
X-Spam-Report:
* 0.0 DK_POLICY_SIGNSOME Domain Keys: policy says domain signs some mails
* 0.0 DK_POLICY_TESTING Domain Keys: policy says domain is testing DK
* 4.3 SUBJ_ILLEGAL_CHARS Subject: has too many raw illegal characters
* 0.0 DK_SIGNED Domain Keys: message has an unverified signature
* -0.0 DK_VERIFIED Domain Keys: signature passes verification
* 0.5 HTML_40_50 BODY: Message is 40% to 50% HTML
* 0.0 HTML_MESSAGE BODY: HTML included in message
* 0.0 BAYES_50 BODY: Bayesian spam probability is 40 to 60%
* [score: 0.5000]
* 0.2 DNS_FROM_RFC_ABUSE RBL: Envelope sender in abuse.rfc-ignorant.org
* 1.4 DNS_FROM_RFC_WHOIS RBL: Envelope sender in whois.rfc-ignorant.org
* 0.8 RCVD_IN_BLARS RBL: Received via a relay in block.blars.org
* [217.216.40.199 listed in block.blars.org] [66.163.178.160 listed in block.blars.org]
* -0.5 AWL AWL: From: address is in the auto white-list
Regards, Keith
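The figures in this message can be checked with a few lines of Python. This is only a sketch of the arithmetic: the non-zero scores are copied from the report above, and the threshold of 5.0 is assumed to be the default required score, which is what the quoted 86% implies.

```python
# Non-zero rule scores copied from the X-Spam-Report in the message above.
scores = {
    "SUBJ_ILLEGAL_CHARS": 4.3,
    "HTML_40_50": 0.5,
    "DNS_FROM_RFC_ABUSE": 0.2,
    "DNS_FROM_RFC_WHOIS": 1.4,
    "RCVD_IN_BLARS": 0.8,
    "AWL": -0.5,
}

# Assumption: the default threshold of 5.0 is in effect.
REQUIRED_SCORE = 5.0

blacklist_rules = ("DNS_FROM_RFC_ABUSE", "DNS_FROM_RFC_WHOIS", "RCVD_IN_BLARS")
blacklist_total = round(sum(scores[name] for name in blacklist_rules), 1)
total = round(sum(scores.values()), 1)
subject_share = round(scores["SUBJ_ILLEGAL_CHARS"] / REQUIRED_SCORE * 100)

print("blacklists:", blacklist_total)            # the "2.4 points" in the message
print("total:", total)
print("subject rule share of threshold:", subject_share, "%")
print("total without subject rule:", round(total - scores["SUBJ_ILLEGAL_CHARS"], 1))
```

Without the subject rule the message would total 2.4 and stay well under the threshold, so this single rule is what pushes it over.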
Re: Scoring for rule SUBJ_ILLEGAL_CHARS
On Thu, May 11, 2006 at 07:47:15PM +0200, Keith Dunnett wrote: I've recently had a couple of false positives caused by this rule, and think it may be scored too highly for a single check. The e-mails in question were in Spanish, and the Spanish word for linguistics has two accented characters which is enough to trigger this rule. fwiw, the 8-bit characters ought to be encoded in base64 or quoted-printable. then the rule wouldn't hit. Admittedly, the blacklists account for 2.4 points (it was from Yahoo) but the 4.3 point score for the subject alone strikes me as excessive. I understand that anything that is not English is inherently suspect for most users, but to give 86% of the default spam score on almost *any* single rule would seem to me to be overkill. It's actually less about english vs non-english and more about messages violating the rfc (non 7-bit ascii chars need to be encoded in the header). however, english maps to 7-bit ascii very well, so ... -- Randomly Generated Tagline: They who can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety. - Benjamin Franklin
Re: SUBJ_ILLEGAL_CHARS
Matt Kettler wrote: Note that SUBJ_ILLEGAL_CHARS is NOT concerned with what language or character set is used. It is concerned about it not being encoded properly. Per RFC specifications, all characters in email-headers that aren't in the normal ascii ranges must be QP encoded. This rule is essentially detecting that the sender used extended range character sets, but their email client neglected to properly QP encode it. Can you please point out which RFC this is and what exactly 'QP encoding' means? Thanks, Milen
Re: SUBJ_ILLEGAL_CHARS
Can you please point out which RFC this is and what exactly 'QP encoding' means? Someone else can doubtless point to the RFC, but as an example, your name in the From address is encoded as a MIME encoded-word (this particular one uses Base64, marked by the "B", rather than Quoted Printable, which would be marked by a "Q"). I've added some spaces to it below so that your mail client doesn't turn it back into Cyrillic characters: =? UTF-8 ? B ? 0JzQuNC70LXQvSDQn9Cw0L3QutC+0LI = ? = [EMAIL PROTECTED] The same general encoding should be used for a Subject line with non-ascii characters, such as those in your name. Loren
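To see what the encoded-word in this message contains, it can be decoded with Python's standard `email.header` module. This is a small sketch, assuming the spaces Loren added are simply removed again. The second field, "B", marks Base64; a "Q" there would mark Quoted Printable.

```python
from email.header import decode_header, make_header

# Encoded-word from the message above, with the added spaces removed.
encoded = "=?UTF-8?B?0JzQuNC70LXQvSDQn9Cw0L3QutC+0LI=?="

# An encoded-word has the shape =?charset?encoding?payload?=
charset, encoding = encoded.split("?")[1:3]

# decode_header() splits the value into (bytes, charset) pairs and
# make_header() joins them back into a readable Unicode string.
decoded = str(make_header(decode_header(encoded)))

print(charset, encoding)
print(decoded)
```

The decoded value is the sender's name in Cyrillic, as it appears in the attribution lines elsewhere in this thread.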
Re: SUBJ_ILLEGAL_CHARS
Милен Панков wrote: Matt Kettler wrote: Note that SUBJ_ILLEGAL_CHARS is NOT concerned with what language or character set is used. It is concerned about it not being encoded properly. Per RFC specifications, all characters in email-headers that aren't in the normal ascii ranges must be QP encoded. This rule is essentially detecting that the sender used extended range character sets, but their email client neglected to properly QP encode it. Can you please point out which RFC this is and what exactly 'QP encoding' means? RFC 2822. QP = Quoted Printable. Thanks, Milen
RE: SUBJ_ILLEGAL_CHARS
http://www.faqs.org/rfcs/rfc2822.html Refer to Section 3.2.2 for information on quoted-pairs. Cheers, Phil Phil Randal Network Engineer Herefordshire Council Hereford, UK -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: 15 March 2006 15:22 To: users@spamassassin.apache.org Cc: Matt Kettler Subject: Re: SUBJ_ILLEGAL_CHARS Matt Kettler wrote: Note that SUBJ_ILLEGAL_CHARS is NOT concerned with what language or character set is used. It is concerned about it not being encoded properly. Per RFC specifications, all characters in email-headers that aren't in the normal ascii ranges must be QP encoded. This rule is essentially detecting that the sender used extended range character sets, but their email client neglected to properly QP encode it. Can you please point out which RFC this is and what exactly 'QP encoding' means? Thanks, Milen
RE: SUBJ_ILLEGAL_CHARS
And http://www.faqs.org/rfcs/rfc2047.html Cheers, Phil Phil Randal Network Engineer Herefordshire Council Hereford, UK -----Original Message----- From: Craig Morrison [mailto:[EMAIL PROTECTED] Sent: 15 March 2006 15:31 To: users@spamassassin.apache.org Subject: Re: SUBJ_ILLEGAL_CHARS Милен Панков wrote: Matt Kettler wrote: Note that SUBJ_ILLEGAL_CHARS is NOT concerned with what language or character set is used. It is concerned about it not being encoded properly. Per RFC specifications, all characters in email-headers that aren't in the normal ascii ranges must be QP encoded. This rule is essentially detecting that the sender used extended range character sets, but their email client neglected to properly QP encode it. Can you please point out which RFC this is and what exactly 'QP encoding' means? RFC 2822. QP = Quoted Printable. Thanks, Milen
Re: SUBJ_ILLEGAL_CHARS
On Wed, Mar 15, 2006 at 03:29:45PM -, Randal, Phil wrote: Can you please point out which RFC this is and what exactly 'QP encoding' means? http://www.faqs.org/rfcs/rfc2822.html Refer to Section 3.2.2 for information on quoted-pairs. QP in this case does not mean quoted pairs; it means Quoted Printable, which is a MIME encoding (see http://www.faqs.org/rfcs/rfc1522.html and http://www.faqs.org/rfcs/rfc2047.html ). 2822 is the RFC which talks about only US-ASCII (7-bit) in the headers; see section 2.2. Matt Kettler wrote: Per RFC specifications, all characters in email-headers that aren't in the normal ascii ranges must be QP encoded. This isn't exactly correct. 2822 specifies that only US-ASCII may appear in the headers, but it doesn't say what to do with characters outside that range. 1522 and 2047 discuss how to use either QP or Base64 encoding (either is valid) to deal with those headers. -- Randomly Generated Tagline: Bender, we didn't mind your drinking or your kleptomania or your pornography ring. -Leela In fact, that's why we love you. -Zoidberg
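The point that either QP or Base64 is valid can be illustrated with Python's standard `email` package, which can produce both forms of encoded-word for the same text. This is a sketch only: the helper name `encode_header_value` and the sample subject are chosen for this example and are not taken from any mail client discussed in the thread.

```python
from email.charset import BASE64, QP, Charset
from email.header import Header, decode_header, make_header

def encode_header_value(text, header_encoding):
    """Encode text as UTF-8 encoded-word(s) using QP or BASE64."""
    charset = Charset("utf-8")
    charset.header_encoding = header_encoding  # choose "Q" or "B" style
    return Header(text, charset).encode()

def decode_header_value(raw):
    """Turn encoded-word(s) back into a Unicode string."""
    return str(make_header(decode_header(raw)))

subject = "Bewerbung f\u00fcr INS-2006-05-4"

as_qp = encode_header_value(subject, QP)
as_b64 = encode_header_value(subject, BASE64)

print(as_qp)
print(as_b64)
print(decode_header_value(as_qp) == decode_header_value(as_b64) == subject)
```

Both results contain only 7-bit ASCII and decode back to the same subject, so a header encoded either way no longer carries raw 8-bit characters.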
Re: SUBJ_ILLEGAL_CHARS
Philip Prindeville wrote: [snip] I mean it's not X.400, right? ;-) Thank the Gods... C. -- Craig McLean http://fukka.co.uk [EMAIL PROTECTED] Where the fun never starts Powered by FreeBSD, and GIN!
SUBJ_ILLEGAL_CHARS
Hi to all, I've been using SpamAssassin for years without any serious problems. Except for one. My users write messages mostly in Bulgarian, and the 'SUBJ_ILLEGAL_CHARS' rule very often stops good mail. I have put the line 'ok_languages bg en' in my local.cf, but it doesn't fix the problem. For now I have set this rule to score nothing, which temporarily fixes the problem. My question is how I can make it work without disabling it. Maybe I need to tell SpamAssassin not to check specific encodings. For example, there are at least 4 encodings my users use for writing/receiving mail (Windows-1251, KOI8-R, KOI8-U, UTF-8). How can I do that? Milen
Re: SUBJ_ILLEGAL_CHARS
Милен Панков wrote: Hi to all, I've been using SpamAssassin for years without any serious problems. First: In my answers I'm assuming you are running 3.1.0 or higher. If you aren't, please specify your version. Except for one. My users write messages mostly in Bulgarian, and the 'SUBJ_ILLEGAL_CHARS' rule very often stops good mail. I have put the line 'ok_languages bg en' in my local.cf, but it doesn't fix the problem. No, if anything that will make your problem WORSE. The default here is all. By declaring an ok_languages you're limiting the number of acceptable languages. Also note: this won't do anything at all unless you've got the textcat plugin loaded in your v310.pre. For now I have set this rule to score nothing, which temporarily fixes the problem. My question is how I can make it work without disabling it. Maybe I need to tell SpamAssassin not to check specific encodings. For example, there are at least 4 encodings my users use for writing/receiving mail (Windows-1251, KOI8-R, KOI8-U, UTF-8). How can I do that? Note that SUBJ_ILLEGAL_CHARS is NOT concerned with what language or character set is used. It is concerned about it not being encoded properly. Per RFC specifications, all characters in email-headers that aren't in the normal ascii ranges must be QP encoded. This rule is essentially detecting that the sender used extended range character sets, but their email client neglected to properly QP encode it. Realistically, you have two options: 1) tell the sender their client isn't properly QP encoding Bulgarian text in the subject headers. 2) accept that many email clients don't properly handle Bulgarian text, and disable this rule by adding score SUBJ_ILLEGAL_CHARS 0 to your local.cf.
Re: SUBJ_ILLEGAL_CHARS
Matt Kettler wrote: Милен Панков wrote: Hi to all, I've been using SpamAssassin for years without any serious problems. First: In my answers I'm assuming you are running 3.1.0 or higher. If you aren't, please specify your version. Yes, it's 3.1.0, sorry. Except for one. My users write messages mostly in Bulgarian, and the 'SUBJ_ILLEGAL_CHARS' rule very often stops good mail. I have put the line 'ok_languages bg en' in my local.cf, but it doesn't fix the problem. No, if anything that will make your problem WORSE. The default here is all. By declaring an ok_languages you're limiting the number of acceptable languages. Also note: this won't do anything at all unless you've got the textcat plugin loaded in your v310.pre. Ok. I'll keep that in mind. For now I have set this rule to score nothing, which temporarily fixes the problem. My question is how I can make it work without disabling it. Maybe I need to tell SpamAssassin not to check specific encodings. For example, there are at least 4 encodings my users use for writing/receiving mail (Windows-1251, KOI8-R, KOI8-U, UTF-8). How can I do that? Note that SUBJ_ILLEGAL_CHARS is NOT concerned with what language or character set is used. It is concerned about it not being encoded properly. Per RFC specifications, all characters in email-headers that aren't in the normal ascii ranges must be QP encoded. This rule is essentially detecting that the sender used extended range character sets, but their email client neglected to properly QP encode it. Realistically, you have two options: 1) tell the sender their client isn't properly QP encoding Bulgarian text in the subject headers. 2) accept that many email clients don't properly handle Bulgarian text, and disable this rule by adding score SUBJ_ILLEGAL_CHARS 0 to your local.cf. Well, this happens mostly when we receive mail from webmail services such as Yahoo, so I'm stuck with the second option, which I'm already using. Thanks, Milen
Re: SUBJ_ILLEGAL_CHARS
Милен Панков wrote: Matt Kettler wrote: Realistically, you have two options: 1) tell the sender their client isn't properly QP encoding Bulgarian text in the subject headers. 2) accept that many email clients don't properly handle Bulgarian text, and disable this rule by adding score SUBJ_ILLEGAL_CHARS 0 to your local.cf. Well, this happens mostly when we receive mail from webmail services such as Yahoo, so I'm stuck with the second option, which I'm already using. Thanks, Milen It's an issue, to be sure. And people need to be edumacated. I recently pointed out to the IT department at Dice.com that they were sending out malformed Date: lines that were causing their emails to trigger against ILLEGAL_DATE... which most mailers manage to get right, so it's a fairly good indicator of spam and can be safely cranked way up. In fact, I pointed out chapter and verse from RFC-2821 where they were going wrong, and how to fix it (by padding the hour out with a leading zero before 10am). They told me they appreciated my suggestion. I reminded them that it wasn't a suggestion; it was conclusive documentation of where they were failing to conform to a 25-year-old specification that is, in fact, trivial... all things considered. I mean it's not X.400, right? ;-) Have they fixed it? Not the last time I checked. You'd think that given the nature of what they do, they'd have their pick of the crop for good IT and messaging people. Guess not. Kind of makes me think twice about posting my resume with them. :-( -Philip