From: Frank Yung-Fong Tang wrote:

> It should be:
> Legal UTF-8 sequences are:
> 1st---- 2nd---- 3rd---- 4th---- Codepoints---
> 00-7F                             0000-  007F
> C2-DF   80-BF                     0080-  07FF
> E0      A0-BF   80-BF             0800-  0FFF
> E1-EC   80-BF   80-BF             1000-  CFFF
> ED      80-9F   80-BF             D000-  D7FF
> EE-EF   80-BF   80-BF             E000-  FFFF
> F0      90-BF   80-BF   80-BF    10000- 3FFFF
> F1-F3   80-BF   80-BF   80-BF    40000- FFFFF
> F4      80-8F   80-BF   80-BF   100000-10FFFF
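
As an illustration only, the table can be turned almost row for row into a checker. This is a minimal Python sketch, not a reference implementation; the names `_ROWS` and `is_legal_utf8` are made up for this example. Each row gives the lead-byte range, the permitted range for the second byte, and the total sequence length; any third or fourth byte must be 80-BF.

```python
# Hypothetical helper names; the rows mirror the quoted table one for one.
# (lead low, lead high, 2nd-byte low, 2nd-byte high, total length in bytes)
_ROWS = [
    (0x00, 0x7F, None, None, 1),
    (0xC2, 0xDF, 0x80, 0xBF, 2),
    (0xE0, 0xE0, 0xA0, 0xBF, 3),
    (0xE1, 0xEC, 0x80, 0xBF, 3),
    (0xED, 0xED, 0x80, 0x9F, 3),
    (0xEE, 0xEF, 0x80, 0xBF, 3),
    (0xF0, 0xF0, 0x90, 0xBF, 4),
    (0xF1, 0xF3, 0x80, 0xBF, 4),
    (0xF4, 0xF4, 0x80, 0x8F, 4),
]


def is_legal_utf8(data: bytes) -> bool:
    """Return True if every sequence in data matches a row of the table."""
    i = 0
    n = len(data)
    while i < n:
        lead = data[i]
        for lo, hi, lo2, hi2, length in _ROWS:
            if lo <= lead <= hi:
                break
        else:
            # 80-C1 and F5-FF never start a legal sequence.
            return False
        if i + length > n:
            # Sequence is cut off before its last byte.
            return False
        if length > 1:
            # The second byte has a row-specific range.
            if not (lo2 <= data[i + 1] <= hi2):
                return False
            # Any remaining trail bytes must be 80-BF.
            for b in data[i + 2:i + length]:
                if not (0x80 <= b <= 0xBF):
                    return False
        i += length
    return True
```

Note that a checker built only from this table accepts EF BF BE (the encoding of FFFE), since the table excludes overlong forms, surrogates, and values above 10FFFF but nothing else.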
 
However, I feel it is not legal (or at least not recommended) to encode the non-character code points xFFFE-xFFFF, where x is any plane number. So the rules need to be a bit more detailed to exclude them.
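
To make the extra rule concrete, here is a small Python sketch of the test such a stricter rule would need. The function names are hypothetical, and the check covers only the xFFFE-xFFFF range described above, for planes 0 through 16 (hex 10).

```python
def is_plane_final_noncharacter(cp: int) -> bool:
    """True for xFFFE and xFFFF in any plane x from 0x00 to 0x10."""
    # Masking with 0xFFFE ignores the lowest bit, so FFFE and FFFF both match.
    return 0 <= cp <= 0x10FFFF and (cp & 0xFFFE) == 0xFFFE


def plane_final_noncharacters() -> list[int]:
    """List the two code points at the end of each of the 17 planes."""
    return [(plane << 16) | low
            for plane in range(0x11)
            for low in (0xFFFE, 0xFFFF)]


if __name__ == "__main__":
    # Show each code point next to the bytes Python's UTF-8 encoder produces.
    for cp in plane_final_noncharacters():
        print(f"{cp:06X}", chr(cp).encode("utf-8").hex(" ").upper())
```

The loop at the bottom prints 34 code points with their byte sequences, which shows that at least one common encoder emits them without complaint; whether it should is the question.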
 
Are these permanently assigned non-characters encodable in any UTF or in CESU-8?
 
