Dean Roddey wrote:
>Be aware that, as the code stands right now, if you force the encoding on
>an entity, all internal smarts about the encoding are skipped. So, if you
>force the encoding to UTF-16LE, and there is a BOM, the parser won't try to
>skip it, and the parse will fail.
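The failure Dean describes can be reproduced outside any parser. Decoding UTF-16LE bytes that carry a BOM with a plain UTF-16LE decoder leaves U+FEFF at the start of the text, so a parser expecting `<` or `<?xml` sees a stray character and gives up. Below is a minimal Python sketch of what "trying to skip it" could look like. It is not the parser's actual code or API: `decode_forced` is a hypothetical helper, and the sketch assumes the forced encoding is a name Python's `codecs` module recognizes.

```python
import codecs

# Hypothetical helper, for illustration only: BOMs to strip, keyed by
# Python's canonical codec names.
BOMS = {
    "utf-16-le": codecs.BOM_UTF16_LE,
    "utf-16-be": codecs.BOM_UTF16_BE,
    "utf-8": codecs.BOM_UTF8,
}

def decode_forced(data: bytes, forced_encoding: str) -> str:
    """Decode an entity whose encoding the caller has forced, dropping a
    leading BOM that matches that encoding instead of passing it through
    as U+FEFF."""
    # Canonicalize the name, e.g. "UTF-16LE" -> "utf-16-le".
    enc = codecs.lookup(forced_encoding).name
    bom = BOMS.get(enc)
    if bom is not None and data.startswith(bom):
        data = data[len(bom):]
    return data.decode(enc)

# A UTF-16LE entity with a BOM: the naive decode keeps U+FEFF up front,
# the BOM-aware decode does not.
entity = codecs.BOM_UTF16_LE + "<a/>".encode("utf-16-le")
naive = entity.decode("utf-16-le")
fixed = decode_forced(entity, "UTF-16LE")
```

Only a BOM matching the forced encoding is removed. A mismatched BOM is left in place, so the resulting error still surfaces rather than being silently papered over.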
Yech. Barf.

One practical consequence is that in most cases it will be a bad idea to try to override. The case of UTF-16LE (or BE) is particularly troubled. There was passionate debate because some people over in the IETF wanted to make the BOM *forbidden* for UTF-16LE and BE. Other people felt that it could never possibly be a bad idea to put a BOM on any UTF-16, which would mean that, practically speaking, the LE and BE variants couldn't be used as media types for XML. Don't know how that one eventually settled out.

Having said that, you're probably doing the right thing. There is a use case, though I'm not sure how strong: some web server out there transcodes, say, from EUC to Shift-JIS (I'm told this actually happens) without, of course, fixing up the XML declaration. You then get an XML declaration that is actually wrong and will probably cause your parse to crash & burn unless it is ignored. Of course, you can get around this by using application/xml as the media type (no transcoding allowed) or, even better, by not transcoding in the server.

In the general case, the best thing to do is to leave the parser alone. With luck, the need for this escape hatch will be relatively short-lived.

-T.
