RE: Encoding/Use of potential unpaired UTF-16 surrogate pair specifiers

Shawn Steele Sun, 31 Jan 2016 11:55:15 -0800

It should be understood that any algorithm that changes Unicode character
data into non-character data produces binary data, not Unicode text. It's
inappropriate to shove binary data into Unicode streams, because stuff will
break: decoders reject or replace invalid sequences, and normalizers and
transcoders are free to rewrite the text.
https://blogs.msdn.microsoft.com/shawnste/2005/09/26/avoid-treating-binary-data-as-a-string/
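A minimal Python sketch of the failure. It assumes a text layer that decodes incoming bytes as UTF-8 and repairs invalid sequences with U+FFFD, which is what Python's `errors="replace"` does. Other layers may reject or drop the bytes instead. `roundtrip_as_text` is a hypothetical helper. Base64 is shown as one common way to carry binary data through a text channel without loss.

```python
import base64


def roundtrip_as_text(blob: bytes) -> bytes:
    """Push raw bytes through a text layer that decodes and re-encodes UTF-8."""
    # Invalid UTF-8 sequences become U+FFFD, so the original bytes are lost.
    text = blob.decode("utf-8", errors="replace")
    return text.encode("utf-8")


blob = bytes([0x41, 0xFF, 0xFE, 0x42])
print(roundtrip_as_text(blob))  # b'A\xef\xbf\xbd\xef\xbf\xbdB'

# A binary-to-text encoding such as Base64 keeps the payload as plain ASCII.
armored = base64.b64encode(blob).decode("ascii")
print(armored)  # Qf/+Qg==
print(base64.b64decode(armored) == blob)  # True
```

The bytes 0xFF and 0xFE are never valid in UTF-8, so each one silently turns into the three-byte encoding of U+FFFD. Nothing downstream can tell what was there originally.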



-----Original Message-----
From: Unicode [mailto:[email protected]] On Behalf Of Chris Jacobs
Sent: Sunday, January 31, 2016 10:08 AM
To: J Decker <[email protected]>
Cc: [email protected]
Subject: Re: Encoding/Use of potential unpaired UTF-16 surrogate pair specifiers



J Decker wrote on 2016-01-31 18:56:
> On Sun, Jan 31, 2016 at 8:31 AM, Chris Jacobs <[email protected]>
> wrote:
>> 
>> 
>> J Decker wrote on 2016-01-31 03:28:
>>> 
>>> I've reconsidered, and for ease of implementation I think I'll just mask
>>> every UTF-16 character (i.e., code unit, not code point) with a 10-bit
>>> value. This will result in no character changing from BMP space to a
>>> surrogate pair or vice versa.
>>> 
>>> Thanks for the feedback.
>> 
>> 
>> So you are still trying to handle the unarmored output as plaintext.
>> Do you realize that if a string in the output is replaced by a
>> canonically equivalent one, this may mess things up, because the
>> corresponding originals are not canonically equivalent?
>> 
> I see... things like those mentioned here:
> http://websec.github.io/unicode-security-guide/character-transformations/

Yes, especially the part about normalization.
Normalization would not only spoil the normalized string itself. Because the
normalized string can have a different length, your ever-changing XOR values
may also go out of sync for everything after it.
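A minimal Python sketch of both points in this exchange. It assumes that "mask" means XOR with a 10-bit key and that the key changes with each code-unit position. The key values and the helpers `to_units`, `from_units`, `mask`, and `unit_class` are hypothetical. The first part shows why a 10-bit XOR cannot move a code unit between BMP and surrogate space: surrogates are identified by their top six bits, and the mask never touches those bits. The second part builds a plaintext whose masked form is "e" followed by U+0301. NFC then composes that pair into U+00E9, and the unmasked result comes back one unit short and misaligned with the key stream.

```python
import unicodedata

# Hypothetical position-dependent key stream; each key is a 10-bit value.
KEYS = [0x2A5, 0x13C, 0x3F0]


def to_units(s: str) -> list[int]:
    """Split a str into UTF-16 code units (lone surrogates allowed)."""
    data = s.encode("utf-16-le", "surrogatepass")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def from_units(units: list[int]) -> str:
    """Reassemble UTF-16 code units into a str."""
    data = b"".join(u.to_bytes(2, "little") for u in units)
    return data.decode("utf-16-le", "surrogatepass")


def mask(units: list[int], keys: list[int] = KEYS) -> list[int]:
    """XOR each code unit with a 10-bit key chosen by position; XOR undoes itself."""
    return [u ^ (keys[i % len(keys)] & 0x3FF) for i, u in enumerate(units)]


def unit_class(u: int) -> str:
    """Classify a code unit by its top six bits, which a 10-bit XOR never changes."""
    if 0xD800 <= u <= 0xDBFF:
        return "high surrogate"
    if 0xDC00 <= u <= 0xDFFF:
        return "low surrogate"
    return "BMP"


def preserves_classes() -> bool:
    """Check every 16-bit code unit against a few sample 10-bit keys."""
    return all(unit_class(u ^ k) == unit_class(u)
               for u in range(0x10000) for k in (0x001, 0x200, 0x3FF))


print(preserves_classes())  # True

# Choose plaintext whose masked form is "e" + U+0301 COMBINING ACUTE ACCENT + "x".
masked = to_units("e\u0301x")
original = mask(masked)  # XOR is its own inverse, so mask(original) == masked

# Something downstream treats the masked output as text and applies NFC.
normalized = to_units(unicodedata.normalize("NFC", from_units(masked)))
recovered = mask(normalized)  # keys are now applied at the wrong positions

print(len(original), len(recovered))  # 3 2
print(recovered == original[:len(recovered)])  # False
```

The mask keeps the output well-formed UTF-16 as long as the input was. It can still land on noncharacters (for example, U+FFFD XOR 0x003 is U+FFFE) or on unassigned code points, so the masked output is not meaningful text.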
