DO NOT REPLY [Bug 4456] - Not all correct encoding is supported

bugzilla Sun, 28 Oct 2001 04:43:08 -0800

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=4456>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.


http://nagoya.apache.org/bugzilla/show_bug.cgi?id=4456

Not all correct encoding is supported





------- Additional Comments From [EMAIL PROTECTED]  2001-10-28 05:14 -------
Actually, I can find no occurrences of the string "ISO8859-1" in either the 
Xerces 1 or Xerces 2 source bases.  However, I understand the confusion.  The 
preferred IANA character set name for ISO Latin 1 is "ISO-8859-1".  This is also 
the preferred MIME name.  The JDK uses the encoding names "ISO8859_1", or 
simply "8859_1", for this character set.  The MIME2Java class in Xerces 1, and 
the EncodingMap class in Xerces 2, provide mapping services between IANA names 
and the associated Java encoding names.  The mapping exists because using a Java 
encoding name in an XML document is likely to limit the set of XML parsers able 
to process that document to those written in Java.  XML certainly does not rely 
upon the runtime services defined by a particular implementation language like 
Java, but rather on authorities like IANA.
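
To make the idea of such a mapping concrete, here is a minimal sketch of an 
IANA-to-Java lookup table.  It is not the actual MIME2Java or EncodingMap code; 
the class name, method name and the three table entries are assumptions chosen 
for illustration only.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of an IANA-name to Java-name table.
// This is NOT the Xerces MIME2Java or EncodingMap class.
class IanaToJavaSketch {
    private static final Map<String, String> IANA_TO_JAVA = new HashMap<>();
    static {
        // Keys are IANA names in upper case; values are legacy Java names.
        IANA_TO_JAVA.put("ISO-8859-1", "ISO8859_1");
        IANA_TO_JAVA.put("UTF-8", "UTF8");
        IANA_TO_JAVA.put("US-ASCII", "ASCII");
    }

    // Returns the Java encoding name for an IANA name, or null when the
    // name is not a registered IANA name known to this (tiny) table.
    static String toJavaName(String ianaName) {
        if (ianaName == null) {
            return null;
        }
        // IANA names are compared case-insensitively.
        return IANA_TO_JAVA.get(ianaName.toUpperCase(Locale.ENGLISH));
    }

    public static void main(String[] args) {
        System.out.println(toJavaName("iso-8859-1")); // an IANA name
        System.out.println(toJavaName("ISO8859_1"));  // a Java name, not in the table
    }
}
```

With a table like this, a document declaring the Java name "ISO8859_1" simply 
fails the lookup, which is the behavior the bug report describes.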

I also feel obligated to point out that there is an ill-considered feature in 
Xerces named "http://apache.org/xml/features/allow-java-encodings" that will use 
the encoding declaration in the document directly as a Java encoding name.  
While this will allow the parser to read documents with incorrect encoding 
declarations, the documents themselves remain non-portable.  Requiring 
recipients of non-portable documents to use non-portable features to read them 
is not really in the spirit of what XML is all about, now is it?
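
For reference, a sketch of how that feature would be switched on through the 
standard SAX XMLReader.setFeature call.  The class and method names here are 
hypothetical, and whether the feature is recognized depends on which parser 
JAXP hands back, so the sketch ignores a refusal rather than failing.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper; only the feature URI comes from the comment above.
class AllowJavaEncodingsSketch {
    static final String ALLOW_JAVA_ENCODINGS =
        "http://apache.org/xml/features/allow-java-encodings";

    // Parses the document and returns the name of its root element.
    static String rootName(byte[] xml, boolean allowJavaNames) throws Exception {
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        if (allowJavaNames) {
            try {
                // Non-portable: only Xerces-derived parsers know this feature.
                reader.setFeature(ALLOW_JAVA_ENCODINGS, true);
            } catch (SAXException e) {
                // Feature not recognized or not supported; carry on without it.
            }
        }
        final String[] root = new String[1];
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String local,
                                     String qName, Attributes atts) {
                if (root[0] == null) {
                    root[0] = qName; // remember the first element seen
                }
            }
        });
        reader.parse(new InputSource(new ByteArrayInputStream(xml)));
        return root[0];
    }

    public static void main(String[] args) throws Exception {
        // A portable document: declared with the IANA name, feature left off.
        String portable = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><doc/>";
        System.out.println(
            rootName(portable.getBytes(StandardCharsets.ISO_8859_1), false));
    }
}
```

A document declared with the IANA name parses with the feature left off, which 
is the portable case; the feature only matters for documents that declare a 
Java name.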

I would strongly urge you to explain this distinction and the inherent 
non-portability of the documents you are receiving to the producer of these 
documents.  It is likely that they are simply unaware that they are using an 
encoding that is not registered with IANA as a character set name and is 
therefore not reliably recognized by XML processors.  In fact, since UTF-8 and 
UTF-16 are the only encodings REQUIRED to be understood by all conformant XML 
processors, even ISO-8859-1 would technically be on shaky ground if not for the 
fact that it is in such widespread use that every reasonable XML processor 
supports it.
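
On the producer side, the fix usually amounts to keeping the two names apart: 
the Java name goes to the Java I/O API, and the IANA name goes into the XML 
declaration.  A minimal sketch, assuming a Java producer; the class and method 
names are hypothetical, and the text is assumed to contain no markup characters 
that would need escaping.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

// Hypothetical producer; illustrates the naming distinction only.
class PortableProducerSketch {
    static byte[] write(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Java encoding name: used only when talking to the Java runtime.
        Writer w = new OutputStreamWriter(out, "ISO8859_1");
        // IANA name: the only name that belongs in the document itself.
        w.write("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>");
        // Assumes text has no '<' or '&'; a real producer must escape them.
        w.write("<doc>" + text + "</doc>");
        w.close();
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = write("caf\u00e9");
        System.out.println(bytes.length + " bytes written");
    }
}
```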

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
