Re: SUBJ_ILLEGAL_CHARS

2006-03-15 Thread Милен Панков

Matt Kettler wrote:

Note that SUBJ_ILLEGAL_CHARS is NOT concerned with what language or character
set is used. It is concerned about it not being encoded properly.

Per RFC specifications, all characters in email-headers that aren't in the
normal ascii ranges must be QP encoded. This rule is essentially detecting that
the sender used extended range character sets, but their email client neglected
to properly QP encode it.
  

Can you please point out which RFC this is and what exactly 'QP encoding' means?

Thanks,
Milen


Re: SUBJ_ILLEGAL_CHARS

2006-03-15 Thread Loren Wilton
 Can you please point out which RFC this is and what exactly 'QP encoding' means?

Someone else can doubtless point to the RFC, but as an example, your name in
the From address is sent as a MIME encoded-word (this one happens to use the
Base64 "B" form rather than Quoted-Printable).  I've added some spaces to it
below so that your mail client doesn't turn it back into Cyrillic characters:

=? UTF-8 ? B ? 0JzQuNC70LXQvSDQn9Cw0L3QutC+0LI = ? =
[EMAIL PROTECTED]

The same general encoding should be used for a Subject line with non-ascii
characters, such as those in your name.
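
As a rough illustration of the point above (not part of the original message): the string shown is a MIME "encoded-word", and its ?B? field marks the payload as Base64. The sketch below decodes it with Python's standard email.header module, assuming the spaces added above have been removed first. The helper name decode_encoded_words is made up for this example.

```python
from email.header import decode_header, make_header

# The encoded-word from the message above, with the added spaces removed.
encoded = "=?UTF-8?B?0JzQuNC70LXQvSDQn9Cw0L3QutC+0LI=?="

def decode_encoded_words(value: str) -> str:
    """Decode any encoded-words in a header value back to readable text."""
    return str(make_header(decode_header(value)))

decoded = decode_encoded_words(encoded)
print(decoded)
```

The encoded form contains only ASCII characters, which is what lets it travel safely in a header.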

Loren



Re: SUBJ_ILLEGAL_CHARS

2006-03-15 Thread Craig Morrison

Милен Панков wrote:

Matt Kettler wrote:
Note that SUBJ_ILLEGAL_CHARS is NOT concerned with what language or character
set is used. It is concerned about it not being encoded properly.

Per RFC specifications, all characters in email-headers that aren't in the
normal ascii ranges must be QP encoded. This rule is essentially detecting that
the sender used extended range character sets, but their email client neglected
to properly QP encode it.

Can you please point out which RFC this is and what exactly 'QP encoding' means?


RFC 2822

QP = Quoted Printable



Thanks,
Milen





RE: SUBJ_ILLEGAL_CHARS

2006-03-15 Thread Randal, Phil
http://www.faqs.org/rfcs/rfc2822.html

Refer to Section 3.2.2 for information on quoted-pairs.

Cheers,

Phil


Phil Randal
Network Engineer
Herefordshire Council
Hereford, UK  



RE: SUBJ_ILLEGAL_CHARS

2006-03-15 Thread Randal, Phil
And 

  http://www.faqs.org/rfcs/rfc2047.html

Cheers,

Phil


Phil Randal
Network Engineer
Herefordshire Council
Hereford, UK  



Re: SUBJ_ILLEGAL_CHARS

2006-03-15 Thread Theo Van Dinter
On Wed, Mar 15, 2006 at 03:29:45PM -, Randal, Phil wrote:
  Can you please point out which RFC this is and what exactly 'QP encoding' means?
 http://www.faqs.org/rfcs/rfc2822.html
 Refer to Section 3.2.2 for information on quoted-pairs.

QP in this case does not mean quoted-pairs; it means Quoted-Printable,
which is a MIME encoding. See http://www.faqs.org/rfcs/rfc1522.html and
http://www.faqs.org/rfcs/rfc2047.html

2822 is the RFC which talks about only US-ASCII (7-bit) in the headers,
see section 2.2.


Matt Kettler wrote:
 Per RFC specifications, all characters in email-headers that aren't in
 the normal ascii ranges must be QP encoded.

This isn't exactly correct.  2822 specifies that only US-ASCII may
appear in the headers, but it doesn't say what to do with characters
outside that range.  1522 and 2047 discuss how to use either QP or
Base64 encoding (either is valid) to deal with those headers.
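
To make the "either is valid" point concrete, here is a minimal sketch that writes the same text as a "Q" (Quoted-Printable style) and a "B" (Base64) encoded-word and checks that both decode to the same string. The helpers encode_q, encode_b, and decode are hypothetical names for this example; the sketch ignores the encoded-word length limit and header folding, and it escapes more characters in the Q form than strictly necessary.

```python
import base64
from email.header import decode_header, make_header

def encode_b(text: str, charset: str = "utf-8") -> str:
    """Build a Base64 ("B") encoded-word for the given text."""
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return f"=?{charset}?B?{payload}?="

def encode_q(text: str, charset: str = "utf-8") -> str:
    """Build a "Q" encoded-word: ASCII letters/digits stay, space -> "_", rest -> =XX."""
    parts = []
    for byte in text.encode(charset):
        ch = chr(byte)
        if byte < 128 and ch.isalnum():
            parts.append(ch)
        elif ch == " ":
            parts.append("_")
        else:
            parts.append(f"={byte:02X}")
    return f"=?{charset}?Q?{''.join(parts)}?="

def decode(value: str) -> str:
    """Decode encoded-words back to text using the standard library."""
    return str(make_header(decode_header(value)))

text = "Добър ден"  # sample Bulgarian text chosen for this example
b_word = encode_b(text)
q_word = encode_q(text)
print(b_word)
print(q_word)
print(decode(b_word) == decode(q_word) == text)
```

For text that is mostly non-ASCII, the B form is usually the shorter of the two; the Q form stays more readable when most characters are plain ASCII.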

-- 
Randomly Generated Tagline:
 Bender, we didn't mind your drinking or your cleptomania or your
 pornography ring. -Leela 
  In fact, that's why we love you. -Zoidberg 




Re: SUBJ_ILLEGAL_CHARS

2006-03-15 Thread Craig McLean

Philip Prindeville wrote:
[snip]
  I mean it's not X.400, right?  ;-)

Thank the Gods...

C.
--
Craig McLean          http://fukka.co.uk
[EMAIL PROTECTED]   Where the fun never starts
Powered by FreeBSD, and GIN!


Re: SUBJ_ILLEGAL_CHARS

2006-03-14 Thread Matt Kettler
Милен Панков wrote:
 Hi to all,
 
 I'm using spamassassin for years without any serious problems.

First: In my answers I'm assuming you are running 3.1.0 or higher. If you
aren't, please specify your version.

 Except for one. My users write messages mostly in bulgarian and the
 'SUBJ_ILLEGAL_CHARS' rule very often stops good mail.
 I have put in my local.cf the line 'ok_languages bg en', but it doesn't
 fix the problem. 

No, if anything that will make your problem WORSE. The default here is all. By
declaring an ok_languages you're limiting the number of acceptable languages.

Also note: this won't do anything at all unless you've got the textcat plugin
loaded in your v310.pre

 For now I made this rule not giving any scores and this
 temporary fixes the problem. My question is how can I make it work
 without disabling it. I may be need to say to spamassassin not to check
 for specific encodings. For example there are at least 4 encodings my
 users use for writing/receiving mail (Windows-1251, KOI8-R, KOI8-U,
 UTF-8). How can I do that?

Note that SUBJ_ILLEGAL_CHARS is NOT concerned with what language or character
set is used. It is concerned about it not being encoded properly.

Per RFC specifications, all characters in email-headers that aren't in the
normal ascii ranges must be QP encoded. This rule is essentially detecting that
the sender used extended range character sets, but their email client neglected
to properly QP encode it.

Realistically, you have two options:

1) tell the sender their client isn't properly QP encoding Bulgarian text in
the subject headers.
2) accept that many email clients don't properly handle Bulgarian text, and
disable this rule by adding score SUBJ_ILLEGAL_CHARS 0 to your local.cf.
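
For illustration only, the condition described above (raw extended-range bytes in a header instead of an encoded form) can be sketched as a simple byte check. This is an assumption-laden approximation and not SpamAssassin's actual rule, which may use different byte ranges or thresholds; the function name has_unencoded_8bit and the sample subjects are made up for this example.

```python
def has_unencoded_8bit(raw_header: bytes) -> bool:
    """Return True if the raw header contains any byte outside 7-bit US-ASCII."""
    return any(byte > 0x7F for byte in raw_header)

# Hypothetical raw headers as they might arrive on the wire.
raw_cp1251 = "Subject: Здравей".encode("cp1251")   # Windows-1251 bytes sent as-is
raw_koi8r = "Subject: Здравей".encode("koi8_r")    # KOI8-R bytes sent as-is
properly_encoded = b"Subject: =?UTF-8?B?0JzQuNC70LXQvSDQn9Cw0L3QutC+0LI=?="

for header in (raw_cp1251, raw_koi8r, properly_encoded):
    print(has_unencoded_8bit(header))
```

The character set does not matter to this check: a Windows-1251 or KOI8-R subject passes as long as the client wraps it in an encoded-word before sending.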







Re: SUBJ_ILLEGAL_CHARS

2006-03-14 Thread Милен Панков

Matt Kettler wrote:


Милен Панков wrote:

Hi to all,

I'm using spamassassin for years without any serious problems.


First: In my answers I'm assuming you are running 3.1.0 or higher. If you
aren't, please specify your version.


Yes, it's 3.1.0, sorry




Except for one. My users write messages mostly in bulgarian and the
'SUBJ_ILLEGAL_CHARS' rule very often stops good mail.
I have put in my local.cf the line 'ok_languages bg en', but it doesn't
fix the problem. 


No, if anything that will make your problem WORSE. The default here is all. By
declaring an ok_languages you're limiting the number of acceptable languages.

Also note: this won't do anything at all unless you've got the textcat plugin
loaded in your v310.pre



Ok. I'll have that in mind.


For now I made this rule not giving any scores and this

temporary fixes the problem. My question is how can I make it work
without disabling it. I may be need to say to spamassassin not to check
for specific encodings. For example there are at least 4 encodings my
users use for writing/receiving mail (Windows-1251, KOI8-R, KOI8-U,
UTF-8). How can I do that?


Note that SUBJ_ILLEGAL_CHARS is NOT concerned with what language or character
set is used. It is concerned about it not being encoded properly.

Per RFC specifications, all characters in email-headers that aren't in the
normal ascii ranges must be QP encoded. This rule is essentially detecting that
the sender used extended range character sets, but their email client neglected
to properly QP encode it.

Realistically, you have two options:

1) tell the sender their client isn't properly QP encoding Bulgarian text in
the subject headers.
2) accept that many email clients don't properly handle Bulgarian text, and
disable this rule by adding score SUBJ_ILLEGAL_CHARS 0 to your local.cf.



Well, this happens mostly when we receive mail from some webmail services, for
example Yahoo, so I'm stuck with the second option, which I'm already using.


Thanks,
Milen


Re: SUBJ_ILLEGAL_CHARS

2006-03-14 Thread Philip Prindeville
Милен Панков wrote:
 Matt Kettler wrote:
  Realistically, you have two options:

  1) tell the sender their client isn't properly QP encoding Bulgarian text in
  the subject headers.
  2) accept that many email clients don't properly handle Bulgarian text, and
  disable this rule by adding score SUBJ_ILLEGAL_CHARS 0 to your local.cf.

 
 
 Well this happens mostly when we receive mail from some webmails for 
 example Yahoo, so I'm stuck with the second option, which I'm already using.
 
 Thanks,
 Milen


It's an issue, to be sure.  And people need to be edumacated.

I recently pointed out to the IT department at Dice.com that they were sending
out malformed Date: lines that were causing their emails to trigger against
ILLEGAL_DATE...  which most mailers manage to get right, so it's a fairly good
indicator of spam and can be safely cranked way up.

In fact, I pointed out chapter and verse from RFC-2821 where they were going
wrong, and how to fix it (by padding the hour out with a leading zero before
10am).
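
As a small sketch of the fix described above, Python's standard email.utils.format_datetime produces a Date: value with a two-digit, zero-padded hour. The timestamp is an arbitrary morning time chosen for the example, and nothing here is known about how Dice.com actually generates its headers.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# An arbitrary time before 10am, where a missing leading zero would show up.
morning = datetime(2006, 3, 15, 9, 5, 0, tzinfo=timezone.utc)

date_header = "Date: " + format_datetime(morning)
print(date_header)
```

Using a library formatter rather than hand-built string concatenation avoids this class of padding mistake.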

They told me they appreciated my suggestion.

I reminded them that it wasn't a suggestion, it was a conclusive documentation
of where they were failing to conform to a 25-year-old specification that is,
in fact, trivial... all things considered.   I mean it's not X.400, right?  ;-)

Have they fixed it?

Not the last time I checked.

You'd think that given the nature of what they do, they'd have their pick of
the crop for good IT and messaging people.

Guess not.

Kind of makes me think twice about posting my resume with them.  :-(

-Philip