Unicode 3.1: UTF-8

John Cowan Wed, 31 Jan 2001 11:44:13 -0800

I propose that the distinction between illegal and irregular UTF-8
code sequences (D36bc) be eliminated.  Since there are no code points
between U+D7FF and U+E000 (the apparently intervening values
D800..DFFF are UTF-16 code units, not Unicode code points),
the corresponding UTF-8 code sequences should be illegal.
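To make "the corresponding UTF-8 code sequences" concrete, here is a minimal Python sketch. The helper `encode_3byte` is hypothetical and not part of any standard API. It applies the three-byte bit pattern 1110xxxx 10xxxxxx 10xxxxxx to the surrogate values D800..DFFF without rejecting them. The results fall in ED A0..BF 80..BF, which is the range this proposal would make illegal. Python 3's strict UTF-8 codec already refuses to decode such sequences.

```python
def encode_3byte(cp: int) -> bytes:
    """Apply the three-byte UTF-8 bit pattern 1110xxxx 10xxxxxx 10xxxxxx
    to a 16-bit value WITHOUT checking whether it is a surrogate
    (hypothetical helper, for illustration only)."""
    if not 0x0800 <= cp <= 0xFFFF:
        raise ValueError("the three-byte form covers 0800..FFFF only")
    return bytes([
        0xE0 | (cp >> 12),          # top 4 bits
        0x80 | ((cp >> 6) & 0x3F),  # middle 6 bits
        0x80 | (cp & 0x3F),         # low 6 bits
    ])

# The surrogate values D800..DFFF land on ED A0..BF 80..BF.
lo = encode_3byte(0xD800)
hi = encode_3byte(0xDFFF)
print(lo.hex(" "), hi.hex(" "))  # ed a0 80 ed bf bf

# Python 3's strict UTF-8 decoder already treats these as ill-formed.
try:
    lo.decode("utf-8")
except UnicodeDecodeError:
    print("rejected")
```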

This can be achieved by replacing the U+1000..U+FFFF row in
Table 3.1B as follows:

Code points      1st byte 2nd byte 3rd byte
U+1000..U+CFFF   E1..EC   80..BF   80..BF
U+D000..U+D7FF   ED       80..9F   80..BF   [9F underscored]
U+E000..U+FFFF   EE..EF   80..BF   80..BF
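The proposed rows lend themselves to a table-driven check. The following is a minimal Python sketch, in which `THREE_BYTE_ROWS` and `is_legal_3byte` are hypothetical names. It accepts a three-byte sequence only if every byte falls inside the ranges of one row. The U+0800..U+0FFF row (E0 A0..BF 80..BF) is not part of this proposal. It is included on the assumption that the existing Table 3.1B entry for that range stays as it is, so that the check covers every three-byte lead byte.

```python
# Each row: (1st-byte range, 2nd-byte range, 3rd-byte range), inclusive.
# The last three rows are the proposed replacement for U+1000..U+FFFF;
# the U+0800..U+0FFF row is assumed unchanged from the existing table.
THREE_BYTE_ROWS = [
    ((0xE0, 0xE0), (0xA0, 0xBF), (0x80, 0xBF)),  # U+0800..U+0FFF
    ((0xE1, 0xEC), (0x80, 0xBF), (0x80, 0xBF)),  # U+1000..U+CFFF
    ((0xED, 0xED), (0x80, 0x9F), (0x80, 0xBF)),  # U+D000..U+D7FF
    ((0xEE, 0xEF), (0x80, 0xBF), (0x80, 0xBF)),  # U+E000..U+FFFF
]

def is_legal_3byte(seq: bytes) -> bool:
    """Return True if seq matches one of the rows above, i.e. it is a
    shortest-form three-byte sequence that does not encode a surrogate."""
    if len(seq) != 3:
        return False
    for row in THREE_BYTE_ROWS:
        # Iterating over bytes yields ints, compared against each range.
        if all(lo <= b <= hi for b, (lo, hi) in zip(seq, row)):
            return True
    return False

print(is_legal_3byte(b"\xed\x9f\xbf"))  # U+D7FF: True
print(is_legal_3byte(b"\xed\xa0\x80"))  # would-be D800: False
print(is_legal_3byte(b"\xee\x80\x80"))  # U+E000: True
```

Keeping the ranges in data rather than in branching code mirrors the table one row at a time, so the 9F bound in the ED row is a single number in the sketch.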

-- 
There is / one art             || John Cowan <[EMAIL PROTECTED]>
no more / no less              || http://www.reutershealth.com
to do / all things             || http://www.ccil.org/~cowan
with art- / lessness           \\ -- Piet Hein