Re: Unicode 3.1: UTF-8

David Starner Wed, 31 Jan 2001 21:40:07 -0800
On Wed, Jan 31, 2001 at 11:18:37AM -0800, John Cowan wrote:
> I propose that the distinction between illegal and irregular UTF-8
> code sequences (D36bc) be eliminated.  Since there are no code points
> between U+D7FF and U+E000 (the apparently intervening code points
> are UTF-16 code units, but not Unicode code points)
> the corresponding UTF-8 code sequences should be illegal.
> 
> This can be achieved by replacing the U+1000..U+FFFF row in
> Table 3.1B as follows:
> 
> U+1000..U+CFFF   E1..EC   80..BF   80..BF
> U+D000..U+D7FF   ED       80..9F   80..BF   [9F underscored]
> U+E000..U+FFFF   EE       80..BF   80..BF

Do other people use irregular sequences? I'm forced to. I have to put them
in some UTF-8 source code to sneak them past a compiler that translates that
UTF-8 into UCS-2, so that my program can then convert them into proper UTF-8
or whatever else it needs. I suspect that the occasional utility of using
them to work around older systems balances out the (very rare) problems they
cause.
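
For illustration, the conversion step could look something like the sketch
below. It is Python 3, not the program described above, and the function name
fix_irregular_utf8 is made up for this example. It assumes the irregular
input encodes each half of a surrogate pair as its own three-byte sequence,
and it relies on the standard "surrogatepass" error handler.

```python
def fix_irregular_utf8(data: bytes) -> bytes:
    """Re-encode UTF-8 that contains encoded surrogate pairs as proper UTF-8.

    Hypothetical helper for illustration only.
    """
    # Decode while letting encoded surrogates (ED A0..BF xx) through
    # as individual surrogate code points instead of raising an error.
    text = data.decode("utf-8", errors="surrogatepass")

    # Round-trip through UTF-16: the surrogate code points become plain
    # 16-bit code units, and the strict decode joins each high/low pair
    # into one supplementary code point. An unpaired surrogate raises
    # UnicodeDecodeError here.
    joined = text.encode("utf-16-le", errors="surrogatepass").decode("utf-16-le")

    # The result contains no surrogates, so a strict encode succeeds.
    return joined.encode("utf-8")


if __name__ == "__main__":
    # U+10000 written as the surrogate pair D800 DC00, each half in UTF-8.
    irregular = b"\xed\xa0\x80\xed\xb0\x80"
    print(fix_irregular_utf8(irregular).hex())
```

Input that is already proper UTF-8 passes through unchanged, so the same
call can be applied to mixed data.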

-- 
David Starner - [EMAIL PROTECTED]
Pointless website: http://dvdeug.dhis.org