euphobot wrote:

Thanks Jason,

RFC 2047 was exactly what I was missing. It is especially nice that it is so
short and sweet, which also makes it easy to miss. Your summary is even more succinct.

It appears that this is a useful RFC for using non-Latin-1 languages in
subjects and the 'personal' part of From/To addresses. But
"=?iso-8859-1?B?...?=" or "=?iso-8859-1?Q?...?=" are both useless and
obvious attempts to confuse. So I think they both signal spam.


Sounds like a good guess to me :)
ISO-8859-1 is the default character set, so not many programs would set it explicitly.
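
One way to make this guess testable is to flag only those iso-8859-1 encoded-words whose decoded payload turns out to be plain printable ASCII, since such text did not need encoding at all. The Python sketch below is only an illustration under that assumption: the function name needlessly_encoded and the byte range it checks are made up for this example and are not part of JAMES or of any spam filter.

```python
from email.header import decode_header

def needlessly_encoded(header_value):
    """Hypothetical check: True if the header has iso-8859-1 encoded-words
    and all of them decode to printable ASCII (0x20..0x7E) only."""
    found = False
    for payload, charset in decode_header(header_value):
        if charset != "iso-8859-1":
            continue  # unencoded text or another charset: not our concern
        found = True
        if any(byte < 0x20 or byte > 0x7E for byte in payload):
            return False  # real Latin-1 content, the encoding was needed
    return found

spam_subject = ("=?iso-8859-1?B?UGF5IFBlbm5pZXMgb24gdGhlIERvbGxhciBmb3Ig"
                "eW91ciBQcmVzY3JpcCh0aW9uIQ==?=")
print(needlessly_encoded(spam_subject))               # True
print(needlessly_encoded("=?iso-8859-1?Q?Caf=E9?="))  # False
print(needlessly_encoded("Interpreting Subject line"))  # False
```

A subject with genuine accented characters is left alone by this check, which is one possible answer to the question of exceptions: mail in Western European languages can legitimately carry iso-8859-1 encoded-words.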

Living in NZ you may be closer to the action in character coding and where
more people are aware of this facility. Are there any useful exceptions to
ruling these two as spam?


I'm probably not the best person to answer this question, as I don't spend much time dealing with spam. I use JAMES with authentication and let Mozilla Thunderbird filter any other spam I get (which isn't much). You were just lucky that I have been diving into enabling my application to send multi-lingual email messages over the last week. I think I encounter this character set stuff more because I have a Japanese wife, so I have taken the small step of making my application multi-lingual too.

Don

----- Original Message -----
From: "Jason Lea" <[EMAIL PROTECTED]>
To: "James Users List" <[EMAIL PROTECTED]>
Sent: Wednesday, December 31, 2003 2:50 PM
Subject: Re: Interpreting Subject line





I have been playing around with email encoding in the last few days,
trying to send UTF-8 encoded email. That stuff is defined in RFC 2047:

encoded-word = "=?" charset "?" encoding "?" encoded-text "?="

charset=iso-8859-1 (normal Latin-1 encoding)
encoding=B (can be either B or Q encoding)



encoded-text=UGF5IFBlbm5pZXMgb24gdGhlIERvbGxhciBmb3IgeW91ciBQcmVzY3JpcCh0aW9uIQ==
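
As a rough illustration of the grammar above, this Python sketch splits the encoded-word from the subject line into its three fields and decodes the B (BASE64) payload. The regular expression and the helper name split_encoded_word are assumptions made for this example; in real code the standard library's email.header.decode_header does the same job more robustly.

```python
import base64
import re

# One RFC 2047 encoded-word: =?charset?encoding?encoded-text?=
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([BbQq])\?([^?]*)\?=")

def split_encoded_word(word):
    """Hypothetical helper: return (charset, encoding, encoded_text)."""
    match = ENCODED_WORD.fullmatch(word)
    if match is None:
        raise ValueError("not an RFC 2047 encoded-word")
    charset, encoding, text = match.groups()
    return charset.lower(), encoding.upper(), text

subject = ("=?iso-8859-1?B?UGF5IFBlbm5pZXMgb24gdGhlIERvbGxhciBmb3Ig"
           "eW91ciBQcmVzY3JpcCh0aW9uIQ==?=")

charset, encoding, text = split_encoded_word(subject)
# This sketch handles only the B encoding, which is what the subject uses.
decoded = base64.b64decode(text).decode(charset)
print(charset, encoding)
print(decoded)  # Pay Pennies on the Dollar for your Prescrip(tion!
```

The decoded subject is plain ASCII, so the BASE64 wrapping was not needed to carry it.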


By choosing the 'B' encoding, the sender causes every character to be encoded
using BASE64. Normally, if you send mostly ASCII characters, you
would choose type Q, where 8-bit and reserved characters are converted into =XX
format (e.g. '=' becomes '=3D', SPACE becomes '=20').
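
To make the difference between the two encodings concrete, here is a minimal Python sketch that encodes the same mostly-ASCII text both ways. The helpers encode_b and encode_q and the sample string are invented for this example, and the Q encoder is deliberately simplified: it writes every byte outside a small safe set as =XX and ignores the line-length limits that a real mailer must respect.

```python
import base64
from email.header import decode_header

# Bytes that may appear literally in a Q-encoded word in this sketch.
SAFE = set(b"abcdefghijklmnopqrstuvwxyz"
           b"ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!*+-/")

def encode_b(text, charset="iso-8859-1"):
    """Hypothetical helper: B encoding, the whole payload becomes BASE64."""
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return "=?%s?B?%s?=" % (charset, payload)

def encode_q(text, charset="iso-8859-1"):
    """Hypothetical helper: Q encoding, only unsafe bytes become =XX."""
    raw = text.encode(charset)
    payload = "".join(chr(b) if b in SAFE else "=%02X" % b for b in raw)
    return "=?%s?Q?%s?=" % (charset, payload)

sample = "Caf\u00e9 = coffee"
b_word = encode_b(sample)
q_word = encode_q(sample)
print(b_word)
print(q_word)  # =?iso-8859-1?Q?Caf=E9=20=3D=20coffee?=

# Both forms decode back to the same bytes.
print(decode_header(b_word) == decode_header(q_word))  # True
```

With Q the ASCII letters stay readable in the raw header, which is why it is the usual choice for mostly-ASCII text.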

See here for more info: http://www.faqs.org/rfcs/rfc2047

Vinny wrote:



Sorry if this sounds stupid, but how do you interpret that crazy string of
characters? Is that some kind of screwed up MD5 sum or something?





=?iso-8859-1?B?UGF5IFBlbm5pZXMgb24gdGhlIERvbGxhciBmb3IgeW91ciBQcmVzY3JpcCh0aW9uIQ==?=




-Vinny


--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]






--
Jason Lea











