Hi All,
        
As of right now, Xerces2 doesn't recognize Java encoding names by default.
Xerces2 assumes that encoding names *must* be IANA encoding names and throws a
fatal error in all other cases.
                
As per the XML 1.0 specification,
http://www.w3.org/TR/REC-xml#sec-entity-decl
<snip>
"It is recommended that character encodings registered (as charsets) with the
Internet Assigned Numbers Authority [IANA-CHARSETS], other than those just
listed, be referred to using their registered names."
</snip>

The spec recommends that IANA names be used, but does not *require* it.
IMO, having Xerces2 throw a fatal error here is too strict, or I would even say
wrong, since requiring encoding names to be IANA names is not among the required
behaviors of a parser. If need be, the parser may report a warning to the
application. Throwing a fatal error stops further processing, so the document
cannot be processed at all, even though the XML 1.0 specification gives
processors the flexibility (and encourages them) to determine the encoding
externally. It is easy for the processor to find out whether a particular
encoding is supported by the underlying JVM.
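
To illustrate that last point, here is a minimal sketch of such a check. It is
not Xerces2 code: the class and method names are made up for illustration, and
it assumes the java.nio.charset API (JDK 1.4 or later) is available.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

public class JvmEncodingCheck {

    /**
     * Returns true if the underlying JVM can decode the named encoding.
     * Both canonical names and JVM aliases are accepted by Charset.
     */
    public static boolean isSupportedByJvm(String encodingName) {
        if (encodingName == null) {
            return false;
        }
        try {
            return Charset.isSupported(encodingName);
        } catch (IllegalCharsetNameException e) {
            // The name contains characters that are not legal in a charset name.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] names = {"UTF-8", "UTF8", "ISO-8859-1", "ISO8859_1", "x-no-such-encoding-xyz"};
        for (String name : names) {
            System.out.println(name + " -> " + isSupportedByJvm(name));
        }
    }
}
```

A parser could run a check like this before deciding whether an unrecognized
name deserves a warning or a fatal error.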

Also, the XML 1.0 specification states that "It is recognized that other
encodings are used around the world, and it may be desired for XML processors to
read entities that use them." See also section E23 of the errata:
http://www.w3.org/XML/xml-V10-2e-errata

"It was always the intent of the XML 1.0 spec to allow the character encoding to
be determined externally." Considering this, it would be a fair assumption on
the user's part that Xerces2 can process XML documents that use a Java encoding
name when that encoding is supported by the underlying JVM. This should not be
restricted to Java encodings: it could be any encoding 'X' for which custom
readers are written (or otherwise made available) to Xerces2, so that other
international encodings are supported. The processor should throw a fatal error
only if it is still unable to process an entity in a particular encoding, as
required by the XML 1.0 specification.
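
The lookup order described above could be sketched roughly as follows. This is
only an illustration of the idea, not how Xerces2 is implemented: the
EncodingFallback class, its registerReader/createReader methods and the
"X-DEMO" encoding name are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

public class EncodingFallback {

    // Hypothetical registry of custom readers, keyed by upper-cased encoding name.
    private final Map<String, Function<InputStream, Reader>> customReaders = new HashMap<>();

    public void registerReader(String name, Function<InputStream, Reader> factory) {
        customReaders.put(name.toUpperCase(Locale.ROOT), factory);
    }

    /**
     * Tries a registered custom reader first, then any encoding the JVM
     * supports, and fails only when neither can handle the declared name.
     */
    public Reader createReader(InputStream in, String declared)
            throws UnsupportedEncodingException {
        Function<InputStream, Reader> custom =
                customReaders.get(declared.toUpperCase(Locale.ROOT));
        if (custom != null) {
            return custom.apply(in);
        }
        try {
            if (Charset.isSupported(declared)) {
                return new InputStreamReader(in, Charset.forName(declared));
            }
        } catch (IllegalCharsetNameException e) {
            // Not a legal charset name; fall through to the failure below.
        }
        // Only at this point would the parser report a fatal error.
        throw new UnsupportedEncodingException(declared);
    }

    public static void main(String[] args) throws Exception {
        EncodingFallback fallback = new EncodingFallback();
        // "X-DEMO" is a made-up name that is simply decoded as ISO-8859-1 here.
        fallback.registerReader("X-DEMO",
                in -> new InputStreamReader(in, StandardCharsets.ISO_8859_1));

        byte[] doc = "<a/>".getBytes(StandardCharsets.UTF_8);
        for (String name : new String[] {"UTF8", "x-demo", "x-no-such-encoding-xyz"}) {
            try (Reader r = fallback.createReader(new ByteArrayInputStream(doc), name)) {
                System.out.println(name + ": first char = " + (char) r.read());
            } catch (UnsupportedEncodingException e) {
                System.out.println(name + ": fatal error, unsupported encoding");
            }
        }
    }
}
```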

IMO, Xerces2's behavior should be changed to accept Java encodings that are
supported by the underlying JVM.

What do other developers and members of the community think?


Thanks

Neeraj

