I want to make sure that people are not misled by that paper. There is a note below
that section that reads:
"Note: The italicized names are not yet registered, but are useful for reference."
and "UTF-8N" is italicized. It is not a registered name, and should not be used
outside of a closed system.
The reason I make that notational distinction in the text is that there is a danger
with UTF-8 currently: a BOM can be used with it, and some people do use one. Since, unlike the
case of UTF-16 / UTF-16BE / UTF-16LE, there is no way to distinguish between
implementations that allow a BOM and those that don't, the situation is slightly
unstable: if you find EF BB BF at the start of a UTF-8 file, you don't know whether to
delete it or not.
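
To illustrate the ambiguity, here is a minimal Python sketch. The helper name
split_utf8_bom is hypothetical, introduced only for this example; the "utf-8" and
"utf-8-sig" codecs are Python's own and simply show the two possible policies
(keep the leading EF BB BF as U+FEFF, or strip it). Neither is necessarily the
right choice for a given file, which is exactly the problem.

```python
import codecs
from typing import Tuple


def split_utf8_bom(data: bytes) -> Tuple[bool, bytes]:
    """Hypothetical helper: report whether data starts with EF BB BF.

    Returns (had_bom, payload). The caller still has to decide whether
    the BOM was a signature or a meaningful U+FEFF character.
    """
    if data.startswith(codecs.BOM_UTF8):  # codecs.BOM_UTF8 == b"\xef\xbb\xbf"
        return True, data[len(codecs.BOM_UTF8):]
    return False, data


sample = codecs.BOM_UTF8 + "hello".encode("utf-8")

# Policy 1: treat the bytes as plain UTF-8; U+FEFF stays in the text.
kept = sample.decode("utf-8")

# Policy 2: treat the leading BOM as a signature and drop it.
stripped = sample.decode("utf-8-sig")

had_bom, payload = split_utf8_bom(sample)
```

With the same input bytes, the two policies yield different strings, so two
implementations that disagree on the policy will disagree on the content.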
In XML, this situation does not arise, since XML specifies the exact usage of the BOM, but
it can arise in other circumstances.
Mark
Masahiko Maedera wrote:
> I found UTF-8N in the following URL.
>
> www-4.ibm.com/software/developer/library/utfencodingforms/index.html
>
> I have understood the meaning and the format of UTF-8N.
> But I don't make sure how it will be treated in future.
>
> Does anyone have plan to regist new charset UTF-8N,
> or any other information about it?
>
> Thank you in advance.
>
> --
> Masahiko Maedera.