I read RFC 2279 again ( http://www.cis.ohio-state.edu/cs/Services/rfc/rfc-text/rfc2279.txt ):
1. I cannot find any text in it saying that a non-shortest form is invalid UTF-8.
2. It describes UTF-8 sequences as 1-6 octets long.
3. It describes how to encode a surrogate pair in UTF-8, but it does not say that UTF-8 sequences mapping directly to a high surrogate or a low surrogate are illegal.
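
To make the three cases concrete, the byte sequences in question can be written out by hand. This is only an illustrative sketch in Python: the helper name strict_decodes and the constant names are made up for this example, and the results show how Python 3's built-in strict decoder treats these bytes, not what RFC 2279 itself requires.

```python
# Illustrative only: hand-built byte sequences checked against Python 3's
# built-in strict UTF-8 decoder. All names here are hypothetical.

def strict_decodes(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts `data`."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# U+002F '/' in its one-octet (shortest) form.
SHORTEST_SLASH = b"\x2f"
# The same U+002F padded into the two-octet pattern (a non-shortest form).
OVERLONG_SLASH = b"\xc0\xaf"
# The high surrogate U+D800 encoded directly with the three-octet pattern.
DIRECT_HIGH_SURROGATE = b"\xed\xa0\x80"
# U+10000 (UTF-16 surrogate pair D800 DC00) encoded as one four-octet sequence.
PAIR_AS_ONE_CHAR = b"\xf0\x90\x80\x80"
# U+200000 in the five-octet pattern that RFC 2279 lists.
FIVE_OCTET_FORM = b"\xf8\x88\x80\x80\x80"

results = {
    "shortest form": strict_decodes(SHORTEST_SLASH),
    "non-shortest form": strict_decodes(OVERLONG_SLASH),
    "direct surrogate": strict_decodes(DIRECT_HIGH_SURROGATE),
    "pair as one character": strict_decodes(PAIR_AS_ONE_CHAR),
    "five-octet form": strict_decodes(FIVE_OCTET_FORM),
}

for name, accepted in results.items():
    print(f"{name}: {'accepted' if accepted else 'rejected'}")
```

A decoder that simply follows the bit patterns without extra checks would map both SHORTEST_SLASH and OVERLONG_SLASH to U+002F, which is where the security concern comes from.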

As I remember, over the last couple of years the definition of UTF-8 has been changing from 1-6 octets to 1-4 octets because of decisions about the future roadmap of Unicode/ISO 10646.

Here are my questions:
1. Is there an updated RFC that obsoletes RFC 2279? (I cannot find one. If there is one, what are its number and URL?)
2. Is there a formal specification stating that a non-shortest form is illegal in UTF-8, and that directly encoding a surrogate is illegal? (RFC 2279 mentions non-shortest forms only lightly and does not formally specify that they are illegal; it only says they are a security concern.) Or is the language in RFC 2279 good enough?
3. Is there a formal specification stating that UTF-8 is only 1-4 octets, and thereby updating the part of RFC 2279 that mentions 1-6 octets?

Thanks


