The BOM I've seen (not FFFE though); its prevalence depends on the system and other factors.
The others I only see if there's corruption, bugs, or tests. The most common error I see that causes those is when some developer calls a binary blob a unicode string and tries to shove it through a text transport or something. Usually that bites them sooner or later.

-Shawn

-----Original Message-----
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Doug Ewell
Sent: Wednesday, June 4, 2014 11:01 AM
To: unicode@unicode.org
Subject: Corner cases (was: Re: UTF-16 Encoding Scheme and U+FFFE)

How common is it to see any of the following in real-world Unicode text, as opposed to code charts and test suites and the like?

1. Unpaired surrogates
2. Noncharacters (besides CLDR data)
3. U+FEFF at the beginning of a stream (note: not "packet" or arbitrary cutoff point)

I'm not asking whether any of these are recommended or "prohibited" or whether they are a good idea. I'm asking about actual usage.

--
Doug Ewell | Thornton, CO, USA
http://ewellic.org | @DougEwell
_______________________________________________
Unicode mailing list
Unicode@unicode.org
http://unicode.org/mailman/listinfo/unicode
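
For anyone who wants to measure how often these three corner cases turn up in their own data, here is a minimal Python sketch. It is an illustration only and not something posted in the thread: the names `is_noncharacter` and `scan_corner_cases` are hypothetical, and it assumes the text is already decoded into a Python `str` (where any surrogate code point is unpaired by definition).

```python
def is_noncharacter(cp: int) -> bool:
    # The 66 noncharacters: U+FDD0..U+FDEF, plus the last two
    # code points of every plane (U+nFFFE and U+nFFFF).
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def scan_corner_cases(text: str) -> dict:
    # Hypothetical helper: counts the three corner cases from the question.
    return {
        # A surrogate code point inside a str has no partner.
        "unpaired_surrogates": sum(
            1 for ch in text if 0xD800 <= ord(ch) <= 0xDFFF
        ),
        "noncharacters": sum(1 for ch in text if is_noncharacter(ord(ch))),
        # Only a U+FEFF at the very start of the stream counts here.
        "leading_bom": text.startswith("\ufeff"),
    }


if __name__ == "__main__":
    # A binary blob mislabeled as UTF-16 text: the code unit 0xD800
    # is followed by 0x0041, so the high surrogate is unpaired.
    blob = bytes([0x00, 0xD8, 0x41, 0x00])
    text = blob.decode("utf-16-le", errors="surrogatepass")
    print(scan_corner_cases(text))
```

The `surrogatepass` error handler is used only so the mislabeled blob can be decoded and inspected; with the default strict handler the same decode raises `UnicodeDecodeError`, which is usually where the "bites them sooner or later" part happens.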