DO NOT REPLY [Bug 5085] - Reporting of externally specified encodings

bugzilla Tue, 27 Nov 2001 16:28:39 -0800

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=5085>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.


http://nagoya.apache.org/bugzilla/show_bug.cgi?id=5085

Reporting of externally specified encodings

------- Additional Comments From [EMAIL PROTECTED]  2001-11-27 16:28 -------
The problem is that the startDocument/startEntity does NOT currently pass the 
actualEncoding: it passes the auto-detected encoding (i.e. the encoding it 
detects from looking at the first 4 bytes).  For example, if I have a document 
with an internally specified encoding

  <?xml version="1.0" encoding="iso-8859-1"?>

then at startDocument/startEntity the auto-detected encoding will be UTF-8, 
whereas the actualEncoding is ISO-8859-1.  At the point where Xerces calls 
startEntity, it has not yet read the XML or text decl, so it cannot know the 
actual encoding in the case of an internally specified encoding.
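
To make the mismatch concrete, here is a minimal, self-contained sketch.  It is 
not Xerces code: the class and method names (EncodingSketch, autoDetect, 
declaredEncoding) are made up for illustration, and the detection covers only a 
few of the byte patterns listed in Appendix F of the XML 1.0 Recommendation.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; none of these names exist in Xerces.
public class EncodingSketch {

    // Guess an encoding from the first four bytes alone (simplified:
    // UTF-32 and EBCDIC patterns are deliberately left out).
    static String autoDetect(byte[] b) {
        if (b.length >= 2) {
            int b0 = b[0] & 0xFF, b1 = b[1] & 0xFF;
            if (b0 == 0xFE && b1 == 0xFF) return "UTF-16BE"; // byte order mark
            if (b0 == 0xFF && b1 == 0xFE) return "UTF-16LE"; // byte order mark
        }
        if (b.length >= 4) {
            int b0 = b[0] & 0xFF, b1 = b[1] & 0xFF, b2 = b[2] & 0xFF, b3 = b[3] & 0xFF;
            // "<?" encoded as UTF-16 without a byte order mark
            if (b0 == 0x00 && b1 == 0x3C && b2 == 0x00 && b3 == 0x3F) return "UTF-16BE";
            if (b0 == 0x3C && b1 == 0x00 && b2 == 0x3F && b3 == 0x00) return "UTF-16LE";
        }
        // "<?xm" in any ASCII-compatible encoding lands here, so the guess
        // is UTF-8 even when the declaration names something else.
        return "UTF-8";
    }

    private static final Pattern ENCODING_DECL = Pattern.compile(
        "^<\\?xml[^>]*?\\sencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    // Read the encoding pseudo-attribute of the XML or text declaration,
    // assuming an ASCII-compatible byte stream.  Returns null if absent.
    static String declaredEncoding(byte[] b) {
        String head = new String(b, 0, Math.min(b.length, 100),
                                 StandardCharsets.ISO_8859_1);
        Matcher m = ENCODING_DECL.matcher(head);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        byte[] doc = "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?><a/>"
            .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("auto-detected: " + autoDetect(doc));       // UTF-8
        System.out.println("declared:      " + declaredEncoding(doc)); // iso-8859-1
    }
}
```

The first value is all that is available when only four bytes have been 
examined; the second only becomes available once the declaration itself has 
been read.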

One fix would be to change the semantics of the encoding parameter of 
startDocument/startEntity to be the actualEncoding, as you suggest.  This would 
be fine.  In fact, I think it would be the best behaviour.  But it would involve 
a shift in responsibilities between the entity manager and the scanner.  At the 
moment, the entity manager does the auto-detection and calls startEntity, while 
the scanner parses the XML or text decl and tells the entity manager to change 
the encoding accordingly.  On the other hand, having startDocument/startEntity 
report whether the encoding passed is the auto-detected encoding or the 
actual encoding could be done without any such shift.
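
A rough sketch of that second option follows.  Everything in it is 
hypothetical: the EntityHandler callback, its extra boolean argument and the 
EntityManager class are stand-ins, not the real Xerces interfaces.  It also 
assumes that an externally supplied encoding, when there is one, can be treated 
as the actual encoding at the time startEntity is called.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; these are not the Xerces interfaces.
public class ReportingSketch {

    // Callback with an extra flag saying which kind of encoding is reported.
    interface EntityHandler {
        void startEntity(String name, String encoding, boolean autoDetected);
    }

    // Stand-in for the entity manager: it still does the auto-detection and
    // still calls startEntity, so responsibilities do not move.
    static class EntityManager {
        private final EntityHandler handler;
        private String encoding;

        EntityManager(EntityHandler handler) {
            this.handler = handler;
        }

        // guessedEncoding: result of looking at the first bytes.
        // externalEncoding: encoding supplied from outside, or null if none.
        void startEntity(String name, String guessedEncoding, String externalEncoding) {
            boolean autoDetected = (externalEncoding == null);
            encoding = autoDetected ? guessedEncoding : externalEncoding;
            handler.startEntity(name, encoding, autoDetected);
        }

        // Called later by the scanner, once it has parsed the XML or text decl.
        void setEncoding(String declared) {
            encoding = declared;
        }

        String getEncoding() {
            return encoding;
        }
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        EntityManager manager = new EntityManager(
            (name, enc, auto) -> events.add(name + " " + enc + " autoDetected=" + auto));

        manager.startEntity("[xml]", "UTF-8", null); // only a guess is known here
        manager.setEncoding("ISO-8859-1");           // scanner corrects it afterwards
        events.forEach(System.out::println);
        System.out.println("final encoding: " + manager.getEncoding());
    }
}
```

A handler that sees autoDetected=true knows the reported value may still be 
superseded by the declaration, which is the information missing today.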

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
