http://nagoya.apache.org/bugzilla/show_bug.cgi?id=5085

Reporting of externally specified encodings

------- Additional Comments From [EMAIL PROTECTED] 2001-11-27 16:28 -------

The problem is that startDocument/startEntity does NOT currently pass the actualEncoding: it passes the auto-detected encoding (i.e. the encoding it detects from looking at the first 4 bytes). For example, if I have a document with an internally specified encoding

<?xml version="1.0" encoding="iso-8859-1"?>

then the encoding passed to startDocument/startEntity is the auto-detected encoding, UTF-8, whereas the actualEncoding is ISO-8859-1. At the point where Xerces calls startEntity it hasn't read the XML or text decl, so it doesn't know the actual encoding in the case of an internally specified encoding.

One fix would be to change the semantics of the encoding parameter of startDocument/startEntity to be the actualEncoding, as you suggest. This would be fine. In fact, I think it would be the best behaviour. But it would involve a shift in responsibilities between the entity manager and the scanner. At the moment, the entity manager does the auto-detection and calls startEntity, while the scanner parses the XML or text decl and tells the entity manager to change the encoding accordingly.

On the other hand, having startDocument/startEntity report whether the encoding passed was the auto-detected encoding or the actual encoding could be done without any such shift.
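To make the difference between the two encodings concrete, here is a minimal, self-contained sketch in Java. It is not Xerces code: the class EncodingSketch and the methods autoDetect and actualEncoding are hypothetical names made up for illustration, and the byte checks cover only a small subset of the auto-detection cases. It only shows why a value computed from the first bytes can differ from the value declared in the XML decl.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; none of these names exist in Xerces.
public class EncodingSketch {

    // Matches an encoding pseudo-attribute inside a leading XML or text decl.
    private static final Pattern DECL = Pattern.compile(
        "^<\\?xml[^>]*?encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    /** Step 1 (entity manager's role): guess from the leading bytes only. */
    static String autoDetect(byte[] b) {
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF
                && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF) {
            return "UTF-8";      // UTF-8 byte order mark
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";   // big-endian byte order mark
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";   // little-endian byte order mark
        }
        // No byte order mark (e.g. the bytes of "<?xm"): assume UTF-8.
        return "UTF-8";
    }

    /** Step 2 (scanner's role): read the decl using the detected encoding. */
    static String actualEncoding(byte[] b, String detected) {
        int n = Math.min(b.length, 128);
        String head = new String(b, 0, n, Charset.forName(detected));
        if (head.startsWith("\uFEFF")) {
            head = head.substring(1);   // drop a decoded byte order mark
        }
        Matcher m = DECL.matcher(head);
        // Without a declared encoding, the detected one stands.
        return m.find() ? m.group(1).toUpperCase(Locale.ROOT) : detected;
    }

    public static void main(String[] args) {
        byte[] doc = "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?><doc/>"
            .getBytes(StandardCharsets.ISO_8859_1);
        String detected = autoDetect(doc);
        System.out.println("auto-detected: " + detected);                  // UTF-8
        System.out.println("actual: " + actualEncoding(doc, detected));    // ISO-8859-1
    }
}
```

In this sketch a startEntity-style callback fired right after autoDetect could only ever report UTF-8 for the example document; the ISO-8859-1 value becomes available only after the second step, which is the ordering problem described above.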
