From: Frank Yung-Fong Tang wrote:
> It should be:
> Legal UTF-8 sequences are:
>
> 1st----  2nd----  3rd----  4th----  Codepoints---
> 00-7F                                 0000-  007F
> C2-DF    80-BF                        0080-  07FF
> E0       A0-BF    80-BF               0800-  0FFF
> E1-EC    80-BF    80-BF               1000-  CFFF
> ED       80-9F    80-BF               D000-  D7FF
> EE-EF    80-BF    80-BF               E000-  FFFF
> F0       90-BF    80-BF    80-BF     10000- 3FFFF
> F1-F3    80-BF    80-BF    80-BF     40000- FFFFF
> F4       80-8F    80-BF    80-BF    100000-10FFFF

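
For reference, the quoted table can be transcribed almost literally into a checker. What follows is only a minimal sketch in Python, not the conversion code under review; the names LEGAL_ROWS and is_legal_utf8 are made up for illustration. Each row holds the lead-byte range followed by the allowed range for each trail byte.

```python
# Rows of the quoted table: (lead-byte range, [allowed range per trail byte]).
# LEGAL_ROWS and is_legal_utf8 are hypothetical names used for illustration.
LEGAL_ROWS = [
    ((0x00, 0x7F), []),
    ((0xC2, 0xDF), [(0x80, 0xBF)]),
    ((0xE0, 0xE0), [(0xA0, 0xBF), (0x80, 0xBF)]),
    ((0xE1, 0xEC), [(0x80, 0xBF), (0x80, 0xBF)]),
    ((0xED, 0xED), [(0x80, 0x9F), (0x80, 0xBF)]),
    ((0xEE, 0xEF), [(0x80, 0xBF), (0x80, 0xBF)]),
    ((0xF0, 0xF0), [(0x90, 0xBF), (0x80, 0xBF), (0x80, 0xBF)]),
    ((0xF1, 0xF3), [(0x80, 0xBF), (0x80, 0xBF), (0x80, 0xBF)]),
    ((0xF4, 0xF4), [(0x80, 0x8F), (0x80, 0xBF), (0x80, 0xBF)]),
]


def is_legal_utf8(data: bytes) -> bool:
    """Return True if every sequence in data matches one row of the table."""
    i = 0
    n = len(data)
    while i < n:
        lead = data[i]
        # Find the table row whose lead-byte range contains this byte.
        for (lo, hi), trails in LEGAL_ROWS:
            if lo <= lead <= hi:
                break
        else:
            return False  # lead byte appears in no row (80-C1, F5-FF)
        if i + 1 + len(trails) > n:
            return False  # sequence is truncated
        # Check each trail byte against the range given for its column.
        for k, (tlo, thi) in enumerate(trails, start=1):
            if not tlo <= data[i + k] <= thi:
                return False
        i += 1 + len(trails)
    return True
```

With this transcription, C0 80 (overlong), ED A0 80 (a surrogate) and F4 90 80 80 (above 10FFFF) are all rejected, while EF BF BE, the encoding of U+FFFE, is accepted.
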
However, I feel it's not legal (or at least not really
recommended) to encode the non-character codepoints xFFFE-xFFFF, where x is
any plane number. So the rules need to be a bit more detailed to exclude
them.
Are these permanently assigned non-characters
encodable in any UTF, or in CESU-8?
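
To make the extra exclusion concrete, here is a minimal sketch in Python of a check on the decoded codepoint. The function names are hypothetical, and it covers only the xFFFE-xFFFF codepoints discussed above, assuming planes 0 through 16.

```python
def is_plane_end_noncharacter(cp: int) -> bool:
    """True for xFFFE and xFFFF in any plane x from 0 to 16 (hex 0 to 10)."""
    # The low 16 bits must be FFFE or FFFF; masking with FFFE tests both at once.
    return 0 <= cp <= 0x10FFFF and (cp & 0xFFFE) == 0xFFFE


def plane_end_noncharacters():
    """List all such codepoints: two per plane, 17 planes."""
    return [(plane << 16) | low
            for plane in range(17)
            for low in (0xFFFE, 0xFFFF)]
```

The byte-range table alone does not exclude these codepoints: U+FFFE, for example, is EF BF BE, which fits the row "EE-EF 80-BF 80-BF". A stricter rule would therefore have to be applied after decoding, along the lines of the check above.
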
- problems in Public Review 33 UTF Conversion Code U... Frank Yung-Fong Tang
- Philippe Verdy

