[EMAIL PROTECTED] wrote:
We are talking about charset values for Internet protocols here. That is a special, narrow subset of charset names. The values used by Internet protocols are defined by a well-defined process: http://www.faqs.org/rfcs/rfc2278.html (RFC 2278 - IANA Charset Registration Procedures)
"well defined process" is a stretch. By the way, RFC 2978 replaced 2278 a few years ago.
I agree with Markus on all his points. It's regrettable but true that all sorts of garbage (in addition to well-defined, useful character encodings) were thrown into the registry and accepted almost blindly. In other words, there's a serious quality control issue.
The problem with the IANA charset _list_ is that it lists not only useful charset names but also
- names that are illegal for charsets by its own rules
- names for things that are not (verifiably) charsets at all
Markus may have something different in mind, but I'd add to this category coded character set names like ks_c_5601-1987 that are not suitable for use as a MIME charset.
- names for charsets that cannot be implemented reliably because there is no online, machine-readable specification for them; even where one exists, it is usually not a mapping to/from Unicode
Related to the last point is that some charsets are not properly annotated.
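To make the "illegal by its own rules" point concrete, here is a rough sketch of a syntax check. It assumes the mime-charset grammar as I read it in RFC 2978, section 2.3 (letters, digits, and a small set of punctuation, at most 40 characters); please verify against the RFC before relying on it. The function name is_legal_mime_charset is made up for illustration, and the sample names are ones I believe appear in the registry.

```python
import re

# Assumption: character set allowed by the mime-charset ABNF in RFC 2978, section 2.3.
# ALPHA / DIGIT / ! # $ % & ' + - ^ _ ` { } ~
MIME_CHARSET_RE = re.compile(r"[A-Za-z0-9!#$%&'+\-^_`{}~]+")

# Assumption: RFC 2978 limits registered names to 40 characters.
MAX_LEN = 40


def is_legal_mime_charset(name: str) -> bool:
    """Return True if name passes the (assumed) RFC 2978 syntax and length rules."""
    return len(name) <= MAX_LEN and MIME_CHARSET_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for name in (
        "UTF-8",
        "ks_c_5601-1987",
        "ISO_8859-1:1987",  # contains ':'
        "Extended_UNIX_Code_Packed_Format_for_Japanese",  # longer than 40 characters
    ):
        print(name, is_legal_mime_charset(name))
```

Note that ks_c_5601-1987 passes this check: it is syntactically fine, and the objection to it is that it names a coded character set rather than an encoding, which no syntax check can catch.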
Jungshik

