> Glenn wrote:
> Actually, the XML spec doesn't require that a processor recognize
> anything other than UTF-8 and UTF-16,
And that's it.
> so we are being quite nice
> in supporting the IANA registered names.
Then why are we being less nice? :-) Why only IANA-registered names, when the
XML 1.0 spec itself doesn't mandate them, and when restricting ourselves to them
doesn't fully solve any problem anyway? Please read my comments below...
> I think that promoting *by default* the use of encoding names that
> are a property of which JDK you are using is a bad situation when
> it comes to document interoperability.
I think that by processing IANA-registered encodings other than UTF-8 and UTF-16
*by default*, we are already diving deep into the whole issue of document
interoperability.
An XML processor is only required to support UTF-8 and UTF-16, so it is quite
possible that a document using an IANA-registered encoding parses successfully
on Xerces2 but fails on another processor that does not support that encoding,
or vice versa. So the problem we are trying to solve remains unsolved. Ideally,
we should first ask the user's permission before attempting to parse such
documents (anything other than UTF-8 or UTF-16), so that the application/user
knows it is tying itself to a processor-specific feature and may not get the
desired result when switching to another processor.
I think the very reasons for doing this are:
a) there are encodings in the real world other than those two, and there are
real applications that rely on them because their requirements go far beyond
UTF-8 and UTF-16. At that point those documents are no longer interoperable
anyway. The application is relying on the individual parser's capability to
process its "X" encoding and is well aware that the same may not hold when
switching to another parser.
b) the XML specification gives processors leeway to read documents in encodings
other than UTF-8 and UTF-16, and says it may be _desired_ for XML processors to
read such entities.
It leaves it up to the processor to decide whether it can process a document in
a given encoding. So when it comes to an individual parser's ability to process
such a document, why does Xerces2 limit itself to IANA names? We are not fully
addressing the problem, or solving anything, by doing so. I think that once an
application has decided to rely on a parser-specific feature, thereby giving up
interoperability, a parser that only processes IANA encodings is of little
interest to it. As the XML spec says: "It is a fatal error when an XML processor
encounters an entity with an encoding that it is unable to process." So let
Xerces2 check, before throwing a fatal error, whether it really is capable of
parsing a document in that encoding. Let the parser exploit its capabilities to
the fullest extent.
I am not sure about the exact behavior here; perhaps Xerces2 could issue a
warning to the application whenever it processes a document in an encoding
other than UTF-8 or UTF-16.
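To make the suggestion concrete, here is a minimal sketch of the policy: accept
UTF-8/UTF-16 silently, accept any other encoding the running JDK can actually
decode but flag it for a warning, and report a fatal error otherwise. This is
not Xerces2 code. The class EncodingPolicy, the Decision enum and check() are
hypothetical names, and the sketch assumes java.nio.charset.Charset (J2SE 1.4 or
later) as the way to ask the JDK which encodings it can decode.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

public class EncodingPolicy {

    /** Hypothetical outcome of checking a declared encoding name. */
    enum Decision { ACCEPT, ACCEPT_WITH_WARNING, FATAL_ERROR }

    static Decision check(String declaredEncoding) {
        String name = declaredEncoding.trim();
        // The only two encodings every XML 1.0 processor must support.
        if (name.equalsIgnoreCase("UTF-8") || name.equalsIgnoreCase("UTF-16")) {
            return Decision.ACCEPT;
        }
        try {
            // Ask the running JDK whether it can decode this encoding,
            // regardless of whether the name is IANA-registered.
            if (Charset.isSupported(name)) {
                return Decision.ACCEPT_WITH_WARNING;
            }
        } catch (IllegalCharsetNameException e) {
            // Syntactically invalid name: fall through to the fatal error.
        }
        // XML 1.0: fatal error for an encoding the processor cannot handle.
        return Decision.FATAL_ERROR;
    }

    public static void main(String[] args) {
        for (String enc : new String[] {"UTF-8", "ISO-8859-1", "x-no-such-encoding"}) {
            System.out.println(enc + " -> " + check(enc));
        }
        // prints:
        // UTF-8 -> ACCEPT
        // ISO-8859-1 -> ACCEPT_WITH_WARNING
        // x-no-such-encoding -> FATAL_ERROR
    }
}
```

How the warning would actually reach the application (for example through the
SAX ErrorHandler.warning callback) is left open, just as it is in the proposal.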
Neeraj