Re: Java encoding names

Neeraj Bajaj Mon, 27 May 2002 07:08:19 -0700

> Glenn wrote:
> Actually, the XML spec doesn't require that a processor recognize
> anything other than UTF-8 and UTF-16,


And that's it. 

>so we are being quite nice
> in supporting the IANA registered names.  

Why are we being less nice? :-)  Why only IANA-registered names, when the XML 1.0
spec itself doesn't make them mandatory and, moreover, we are not completely
solving any problem? Please read my comments below...

> I think that promoting *by default* the use of encoding names that
> are a property of which JDK you are using is a bad situation when
> it comes to document interoperability. 

I think that by processing *by default* IANA-registered encodings other than
UTF-8 and UTF-16, we are already diving deep into the whole issue of document
interoperability. An XML processor is only required to support UTF-8 and UTF-16,
so it is quite possible that an XML document using an IANA-registered encoding
is parsed successfully by Xerces2 and is not parsed by another processor because
that processor does not support the encoding, or vice versa. So the problem we
are trying to solve is still unresolved. Ideally, we should first ask the user's
permission before we make any attempt to parse such documents (those in
encodings other than UTF-8 or UTF-16), to make sure that the application/user is
aware that by doing so it is tying itself to a processor-specific feature and
may not obtain the desired result when switching to another processor.

I think the very reasons for doing this are:

a) There are encodings other than those two in the real world, and there are
real applications that rely on them because their requirements go far beyond
UTF-8 and UTF-16. At that point those documents are no longer interoperable.
The application is relying on the individual parser's capability to process its
"X" encoding and is well aware that the same may not be the case when switching
to another parser.

b) The XML specification gives processors the latitude to read documents in
encodings other than UTF-8 and UTF-16, and says it is _desired_ for XML
processors to read those entities.

It is left up to the processor to check whether it is able to process a document
in a specific encoding. When it comes to the ability of an individual parser to
process such a document, why does Xerces2 limit itself to IANA names? We are
not completely addressing the problem, or solving anything, by doing so. I think
that once an application has decided to rely on a parser-specific feature, thus
losing interoperability, a parser that only processes IANA encodings is of no
interest to it. As per the XML spec: "It is a fatal error when an XML processor
encounters an entity with an encoding that it is unable to process." Let Xerces2
check, before throwing a fatal error, whether it is really capable of parsing
the document in that encoding. Let the parser exploit its capability to its full
extent.
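
A minimal sketch of such a capability check, assuming a JDK that ships
java.nio.charset (1.4 or later). EncodingSupport and canDecode are hypothetical
names used only for illustration; this is not how Xerces2 itself resolves
encodings. Note that Charset.isSupported accepts aliases as well as canonical
names, and which aliases exist can depend on the JDK in use.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

// Hypothetical helper for illustration only; not part of Xerces2.
final class EncodingSupport {
    private EncodingSupport() {}

    /** Returns true if the running JVM can decode the named encoding. */
    static boolean canDecode(String name) {
        if (name == null) {
            return false; // Charset.isSupported(null) would throw
        }
        try {
            // True for canonical names and for aliases known to this JVM.
            return Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            // The name contains characters that are not legal in a charset name.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] names = {"UTF-8", "UTF-16", "ISO-8859-1", "no-such-encoding"};
        for (String n : names) {
            System.out.println(n + " -> " + canDecode(n));
        }
    }
}
```

Under this approach the parser would raise the fatal error only when canDecode
returns false, rather than whenever the declared name is missing from a fixed
table of IANA names.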

I am not sure about the exact behavior here; perhaps Xerces2 could issue a
warning to the application when it attempts to process any document in an
encoding other than UTF-8 or UTF-16.
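
One way the warning idea could look, as a rough sketch only. EncodingWarning,
isRequiredByXml and warningFor are hypothetical names, and the message text is
invented for illustration; how a real parser would deliver the warning (for
example through its error handler) is left out.

```java
import java.util.Optional;

// Hypothetical helper for illustration only; not part of Xerces2.
final class EncodingWarning {
    private EncodingWarning() {}

    /** True for the two encodings every XML processor is required to read. */
    static boolean isRequiredByXml(String declared) {
        // Encoding names are compared case-insensitively.
        return "UTF-8".equalsIgnoreCase(declared)
            || "UTF-16".equalsIgnoreCase(declared);
    }

    /** Returns a warning message for any other declared encoding. */
    static Optional<String> warningFor(String declared) {
        if (isRequiredByXml(declared)) {
            return Optional.empty();
        }
        return Optional.of("Encoding \"" + declared
            + "\" is not required by XML 1.0; other processors may reject this document.");
    }

    public static void main(String[] args) {
        System.out.println(warningFor("UTF-8").orElse("(no warning)"));
        System.out.println(warningFor("ISO-8859-1").orElse("(no warning)"));
    }
}
```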


Neeraj

