http://nagoya.apache.org/bugzilla/show_bug.cgi?id=4456

Not all correct encoding is supported

------- Additional Comments From [EMAIL PROTECTED] 2001-10-28 05:14 -------

Actually, I can find no occurrences of the string "ISO8859-1" in either the Xerces 1 or Xerces 2 source bases. However, I understand the confusion. The preferred IANA character set name for ISO Latin 1 is "ISO-8859-1". This is also the preferred MIME name. The Java JDK has encoding names of "ISO8859_1", or simply "8859_1", for this character set.

The MIME2Java class in Xerces 1, and the EncodingMap class in Xerces 2, provide mapping services between IANA names and the associated Java encoding names. This is because using a Java encoding name in an XML document is likely to limit the set of XML parsers that are able to process that document to the ones that are written in Java, and XML certainly does not rely upon the runtime services defined by a particular implementation language like Java, but rather on authorities like IANA.

I also feel obligated to point out that there is an ill-considered feature in Xerces named "http://apache.org/xml/features/allow-java-encodings" that will use the encoding declaration in the document directly as a Java encoding name. While this will allow the parser to read documents with incorrect encoding declarations, documents that depend on it are no longer portable. Requiring recipients of non-portable documents to use non-portable features to read them is not really in the spirit of what XML is all about, now is it?

I would strongly urge you to explain this distinction, and the inherent non-portability of the documents you are receiving, to the producer of these documents. It is likely that they are simply unaware that they are using an encoding name that is not registered with IANA as a character set name and is therefore not reliably recognized by XML processors.
In fact, since UTF-8 and UTF-16 are the only encodings REQUIRED to be understood by all conformant XML processors, even ISO-8859-1 would technically be on shaky ground if not for the fact that it is in such widespread use that every reasonable XML processor supports it.
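
To make the IANA-name versus Java-name distinction concrete, here is a minimal sketch of the kind of lookup described above. It is not the Xerces MIME2Java or EncodingMap code: the class name EncodingNameSketch, its methods, and the three-entry table are all hypothetical and chosen only for illustration. It also assumes a JDK that provides java.nio.charset.Charset, which is newer than this comment.

```java
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; this is NOT the Xerces MIME2Java or EncodingMap class.
public class EncodingNameSketch {
    // Tiny hand-written table: IANA name (upper case) -> historical Java encoding name.
    private static final Map<String, String> IANA_TO_JAVA = new HashMap<>();
    static {
        IANA_TO_JAVA.put("ISO-8859-1", "ISO8859_1");
        IANA_TO_JAVA.put("UTF-8", "UTF8");
        IANA_TO_JAVA.put("US-ASCII", "ASCII");
    }

    /** Returns the Java encoding name for an IANA name, or null if the name is not in the table. */
    public static String toJavaName(String ianaName) {
        // Encoding names are compared case-insensitively.
        return IANA_TO_JAVA.get(ianaName.toUpperCase(Locale.ROOT));
    }

    /** Asks the JDK which canonical charset name it resolves the given name to. */
    public static String canonicalName(String name) {
        return Charset.forName(name).name();
    }

    public static void main(String[] args) {
        System.out.println(toJavaName("ISO-8859-1")); // IANA name: found in the table
        System.out.println(toJavaName("ISO8859_1"));  // Java name, not an IANA name: null
        System.out.println(toJavaName("ISO8859-1"));  // neither IANA nor Java spelling: null
        System.out.println(canonicalName("ISO8859_1"));
    }
}
```

A lookup like this accepts only names it treats as IANA names, so a document declaring a Java-specific spelling is rejected unless something like the allow-java-encodings feature bypasses the table and hands the declared name straight to the Java runtime.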
