RE: Corner cases (was: Re: UTF-16 Encoding Scheme and U+FFFE)

Shawn Steele Wed, 04 Jun 2014 11:12:02 -0700

The BOM I've seen (though not FFFE); its prevalence depends on the system and
other factors.


The others I see only when there's corruption, bugs, or tests. The most common
cause I see is a developer who calls a binary blob a Unicode string and tries
to shove it through a text transport or something.
Usually that bites them sooner or later.
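To make the "binary blob as text" failure concrete, here is a minimal sketch, assuming the blob is misread as UTF-16LE. The helper name `survives_text_round_trip` is hypothetical. A strict decoder rejects unpaired surrogates and odd byte counts. Noncharacters such as U+FFFF decode without complaint, which is one way they end up inside data that looks like text.

```python
def survives_text_round_trip(blob: bytes) -> bool:
    """Hypothetical check: can these bytes pass through a strict UTF-16LE
    text layer and come back unchanged?"""
    try:
        # Strict decoding raises on unpaired surrogates and on a trailing odd byte.
        text = blob.decode("utf-16-le")
    except UnicodeDecodeError:
        return False
    # Valid UTF-16 re-encodes to the same bytes.
    return text.encode("utf-16-le") == blob


print(survives_text_round_trip("hello".encode("utf-16-le")))  # real text
print(survives_text_round_trip(b"\x00\xd8A\x00"))             # lone high surrogate
print(survives_text_round_trip(b"\x01\x02\x03"))              # odd length
print(survives_text_round_trip(b"\xff\xff"))                  # U+FFFF noncharacter
```

Only the last two lines show the asymmetry. Strictness catches surrogate damage and truncation, but a noncharacter such as U+FFFF passes through unchanged.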

-Shawn

-----Original Message-----
From: Unicode [mailto:[email protected]] On Behalf Of Doug Ewell
Sent: Wednesday, June 4, 2014 11:01 AM
To: [email protected]
Subject: Corner cases (was: Re: UTF-16 Encoding Scheme and U+FFFE)

How common is it to see any of the following in real-world Unicode text, as 
opposed to code charts and test suites and the like?

1. Unpaired surrogates
2. Noncharacters (besides CLDR data)
3. U+FEFF at the beginning of a stream (note: not a "packet" or an arbitrary
cutoff point)

I'm not asking whether any of these are recommended or "prohibited" or whether 
they are a good idea. I'm asking about actual usage.
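Gathering the usage data asked about here requires a scanner for the three cases. The following is a minimal sketch, not a reference implementation. The function names are hypothetical. The input is assumed to be a list of 16-bit code units already in native order, so a byte-swapped BOM shows up as the noncharacter U+FFFE. Noncharacters are taken to be U+FDD0..U+FDEF plus the last two code points of every plane.

```python
from typing import Dict, List


def is_noncharacter(cp: int) -> bool:
    # U+FDD0..U+FDEF, or any code point whose low 16 bits are FFFE or FFFF.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def scan_utf16(units: List[int]) -> Dict[str, object]:
    """Hypothetical scanner: report a leading U+FEFF, the indices of unpaired
    surrogates, and the indices where noncharacters start."""
    findings: Dict[str, object] = {
        "leading_bom": bool(units) and units[0] == 0xFEFF,
        "unpaired_surrogates": [],
        "noncharacters": [],
    }
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # Well-formed surrogate pair: combine into a supplementary code point.
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            width = 2
        elif 0xD800 <= u <= 0xDFFF:
            # High surrogate without a following low one, or a lone low surrogate.
            findings["unpaired_surrogates"].append(i)
            i += 1
            continue
        else:
            cp = u
            width = 1
        if is_noncharacter(cp):
            findings["noncharacters"].append(i)
        i += width
    return findings


sample = [0xFEFF, 0x0041, 0xD800, 0x0042, 0xD83D, 0xDE00, 0xFFFF, 0xDC00, 0xD83F, 0xDFFE]
print(scan_utf16(sample))
```

For the sample, the scanner should report a leading BOM, unpaired surrogates at indices 2 and 7, and noncharacters at indices 6 (U+FFFF) and 8 (the pair encoding U+1FFFE). Working on code units rather than a decoded string keeps lone surrogates visible instead of letting a decoder reject or replace them.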

--
Doug Ewell | Thornton, CO, USA
http://ewellic.org | @DougEwell


_______________________________________________
Unicode mailing list
[email protected]
http://unicode.org/mailman/listinfo/unicode
