Re: UTF-16 Encoding doesn't work

NoOp Tue, 18 Mar 2014 08:40:35 -0700

On 03/17/2014 10:28 PM, Philip Chee wrote:
> On 18/03/2014 10:53, NoOp wrote:
>> On 03/14/2014 07:32 PM, NoOp wrote:
>>> On 03/14/2014 07:25 PM, NoOp wrote:
>>>> On 03/14/2014 07:19 PM, Paul B. Gallagher wrote:
>>>>> NoOp wrote:
>>>>> 
>>>>>> User agent: Mozilla/5.0 (X11; Linux x86_64; rv:27.0) Gecko/20100101
>>>>>> Firefox/27.0 SeaMonkey/2.24
>>>>>> Build identifier: 20140203230449
>>>>>>
>>>>>> Received an email today from a US government agency that is encoded with:
>>>>>>
>>>>>> ...
>>>>>>
>>>>>> Reset Password
>>>>>> Dear <user>,
>>>>>> Click on the link below to reset your password.
>>>>>> Reset Password
>>>>>> This link will expire in 24 hours.
>>>>>> Kind Regards,
>>>>>> Human Resources
>>>>>> THIS IS AN AUTOMATIC SYSTEM GENERATED EMAIL. PLEASE DO NOT RESPOND TO
>>>>>> THIS MESSAGE.
>>>>>>
>>>>>> (No link actually appears in the Opera)
>>>>> 
>>>>> I would take one look at that and flag it as junk; it looks like the 
>>>>> last five phishing attempts that crossed my desk. Do you have any reason 
>>>>> to believe it's legit and worth your time?
>>>>> 
>>>> 
>>>> Actually it's not junk/phishing at all - it's a valid email from
>>>> usps.gov and is a response to my request to reset a password.
>>>> 
>>>> 
>>>> 
>>> 
>>> Just tested with Evolution 3.2.3 (linux) and it correctly decodes the
>>> email body (including the link):
>>> ====
>>> Reset Password
>>> 
>>> 
>>> Dear <user>,
>>> 
>>> Click on the link below to reset your password.
>>> HTTPS://WP1-EXT.USPS.GOV/...<rest of link details redacted>
>>> 
>>> Reset Password
>>> 
>>> This link will expire in 24 hours.
>>> ====
>>> 
>> 
>> Well here is an interesting twist; it appears that the issue is with the
>> SeaMonkey/Thunderbird email display decoding. When I compose a reply,
>> the message text appears properly:
>> 
>> On 3/14/2014 3:32 PM, ERP Workflow System wrote:
>>> Reset Password
>>> Dear <user>,
>>> Click on the link below to reset your password.
>>> Reset Password
>>> <HTTPS://WP1-EXT.USPS.GOV/sap/...rest of url redacted
>>> This link will expire in 24 hours.
>>> Kind Regards,
>>> Human Resources
>>> THIS IS AN AUTOMATIC SYSTEM GENERATED EMAIL.  PLEASE DO NOT RESPOND TO
>>> THIS MESSAGE.
>>>
>> 
>> Tested in both linux and Windows versions of SeaMonkey & Thunderbird.
>> 
>> I guess it's time to add mozilla.dev.apps.seamonkey to the thread...
>> 
>> Maybe related:
>> <https://bugzilla.mozilla.org/show_bug.cgi?id=779087>
>> Bug 779087 - The sourcecode view is broken opening mail written in
>> UTF-16 (If mail of charset=UTF-16, View/Message Source shows entire
>> message source data as UTF-16)
> 
> For the true horror of email character sets read:
> <http://quetzalcoatal.blogspot.com/2014/03/understanding-email-charsets.html>
> 
> Phil
>


Yes, I've seen that. Perhaps you'll like part 2 :-)

<http://quetzalcoatal.blogspot.com/2013/10/why-email-is-hard-part-2.html>

<quote>
...
Many cross-platform C and C++ programs implicitly require UTF-16 due to
its pervasive inclusion into the Windows operating system and common
internationalization libraries [3]. Unsurprisingly, non-BMP characters
tend to quickly run into all sorts of hangups by unaware code. For
example, right now, it is possible to coax Thunderbird to render these
characters unusable in, say, your subject string if the subject is just
right, and I suspect similar bugs exist in a majority of email
applications [4].
...
[3] C and C++ have a built-in internationalization and localization API,
derived from POSIX. However, this API is generally unsuited to the full
needs of people who actually care about these topics, so it's not really
worth mentioning.
[4] The basic algorithm to encode RFC 2047 strings for any charset is
to try to shift characters into the output string until you hit the
maximum word length. If the internal character set for Unicode
conversion is UTF-16 instead of UTF-32 and the code is ignorant of
surrogate concerns, then this algorithm could break surrogates apart.
This is exactly how the bug is triggered in Thunderbird.
...
</quote>
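
To make footnote [4] concrete, here is a minimal sketch of how chunking by a fixed number of UTF-16 code units can split a surrogate pair, and one way to avoid it. This is not Thunderbird's code and it does not do real RFC 2047 encoding; the function names and the tiny chunk size are made up purely for illustration.

```python
import struct

def utf16_units(text):
    """Return the UTF-16 code units of text as a list of ints."""
    data = text.encode("utf-16-be")
    return list(struct.unpack(">%dH" % (len(data) // 2), data))

def is_high_surrogate(unit):
    """True for the first half of a surrogate pair (U+D800..U+DBFF)."""
    return 0xD800 <= unit <= 0xDBFF

def naive_chunks(units, max_units):
    """Split at a fixed unit count, ignoring surrogate pairs (the buggy way)."""
    return [units[i:i + max_units] for i in range(0, len(units), max_units)]

def safe_chunks(units, max_units):
    """Split into chunks of at most max_units, never between the two halves
    of a surrogate pair. Assumes max_units >= 2 so a whole pair always fits."""
    chunks, start = [], 0
    while start < len(units):
        end = min(start + max_units, len(units))
        # Back off one unit if this chunk would end on a high surrogate.
        if end < len(units) and is_high_surrogate(units[end - 1]) and end - start > 1:
            end -= 1
        chunks.append(units[start:end])
        start = end
    return chunks

def decode_chunk(units):
    """Decode one chunk on its own, as a receiver would decode one encoded word."""
    data = struct.pack(">%dH" % len(units), *units)
    return data.decode("utf-16-be", errors="replace")

# "ab" + U+1F600 (a non-BMP character, two code units) + "cd"
subject = "ab\U0001F600cd"
units = utf16_units(subject)
broken = "".join(decode_chunk(c) for c in naive_chunks(units, 3))
intact = "".join(decode_chunk(c) for c in safe_chunks(units, 3))
print(broken == subject, intact == subject)
```

With a chunk size of 3 the naive split lands between the two halves of U+1F600, so neither chunk can be decoded back to the original character, while the surrogate-aware split keeps the pair together.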

<https://www.google.com/#q=mozilla+%2B+%22UTF-16LE%22>
More related:
<https://bugzilla.mozilla.org/show_bug.cgi?id=343133>
Bug 343133 - simtaot.co.nr - source code rendered, problems detecting
UTF-16LE encoding?
<https://bugzilla.mozilla.org/show_bug.cgi?id=634541>
Bug 634541 - Since UTF-16BE and UTF-16LE decoders swallow an initial
BOM, the HTML5 parser should feed them a BOM to swallow
<https://bugzilla.mozilla.org/show_bug.cgi?id=961983>
Bug 961983 - Replying a message with an UTF-16LE content-encoding sends
corrupt messages
<https://bugzilla.mozilla.org/show_bug.cgi?id=936440>
Bug 936440 - Report UTF-16LE or UTF-16BE instead of UTF-16 as the
BOM-sniffed encoding
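
For anyone following the BOM-related bugs above, a rough sketch of what "BOM sniffing" means for UTF-16: look at the first two bytes and report UTF-16LE or UTF-16BE rather than plain "UTF-16". This is only an illustration in Python, not Gecko's implementation; the helper name is hypothetical, and it deliberately ignores UTF-32 (whose little-endian BOM also begins with FF FE).

```python
import codecs

def sniff_utf16_bom(data):
    """Return (encoding, payload) based on a leading UTF-16 BOM.

    encoding is "utf-16-le", "utf-16-be", or None if no UTF-16 BOM is present.
    payload is the input with the BOM removed (or unchanged if none was found).
    """
    if data.startswith(codecs.BOM_UTF16_LE):   # bytes FF FE
        return "utf-16-le", data[2:]
    if data.startswith(codecs.BOM_UTF16_BE):   # bytes FE FF
        return "utf-16-be", data[2:]
    return None, data

# A little-endian UTF-16 body with a BOM, like a mail part labelled charset=UTF-16.
body = codecs.BOM_UTF16_LE + "Reset Password".encode("utf-16-le")
encoding, payload = sniff_utf16_bom(body)
print(encoding, payload.decode(encoding))
```

Note that Python's endian-specific codecs ("utf-16-le", "utf-16-be") do not strip a BOM themselves, which is why the helper removes it before decoding; Python's generic "utf-16" codec does consume a leading BOM on its own.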


_______________________________________________
support-seamonkey mailing list
[email protected]
https://lists.mozilla.org/listinfo/support-seamonkey
