Dean Roddey wrote:
>Be aware that, as the code stands right now, if you force the encoding on
>an entity, all internal smarts about the encoding are skipped. So, if you
>force the encoding to UTF-16LE, and there is a BOM, the parser won't try to
>skip it, and the parse will fail. 

Yech.  Barf.  One practical consequence is that in most cases, it will
be a bad idea to try to override the encoding.  And the case of UTF-16LE
(or BE) is particularly troubled.  There was passionate debate because some people
over in IETF wanted to make the BOM *forbidden* for UTF-16LE and BE; other 
people felt that it could not possibly ever be a bad idea to put a BOM on 
any UTF-16, and thus this would mean that the LE and BE variants practically 
speaking couldn't be used as media-types for XML.  Don't know how that one
eventually settled out.
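
To make the BOM problem concrete, here is a minimal sketch in Python (not the parser's actual code; the helper name decode_forced_utf16le is made up for illustration). It assumes the caller has forced UTF-16LE and strips a leading BOM by hand before decoding, since an endian-specific decoder passes the BOM through as an ordinary U+FEFF character ahead of the XML declaration.

```python
import codecs


def decode_forced_utf16le(data: bytes) -> str:
    """Decode bytes as UTF-16LE, dropping a leading BOM if one is present.

    Hypothetical helper: it stands in for what a caller has to do by hand
    when forcing the encoding turns off the parser's own BOM handling.
    """
    bom = codecs.BOM_UTF16_LE  # b'\xff\xfe'
    if data.startswith(bom):
        data = data[len(bom):]
    # The "utf-16-le" codec does not consume a BOM itself, so without the
    # strip above the result would start with U+FEFF.
    return data.decode("utf-16-le")
```

Without the strip, the decoded text begins with U+FEFF rather than "<?xml", which is the situation Dean describes.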

Having said that, you're probably doing the right thing.  There is
a use-case, not sure how strong: some webserver out there transcodes,
say from EUC to Shift-JIS (I'm told this actually happens), without of
course fixing up the XML declaration.  So you get an XML declaration that
is actually wrong and will probably cause your parse to crash & burn
unless it is ignored.  Of course you can get around this by using
application/xml as the media type (no transcoding allowed) or, even better,
by not transcoding in the server.
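
The transcoding case can be sketched the same way, again in Python and again purely as an illustration (decode_entity and its parameters are invented names, not any parser's API). The assumption is that the caller knows the real encoding from somewhere outside the document, for example the HTTP charset, and uses it in place of the stale declaration.

```python
from typing import Optional


def decode_entity(data: bytes, declared: str, override: Optional[str] = None) -> str:
    """Decode an entity, preferring an externally supplied encoding.

    Hypothetical helper: `declared` is the encoding named in the XML
    declaration, `override` is what the caller knows the bytes really are.
    """
    # When an override is given, the declaration is not trusted at all.
    return data.decode(override or declared)
```

For a document that still says encoding="EUC-JP" but was transcoded to Shift-JIS on the way out, the call would be decode_entity(raw, "euc_jp", override="shift_jis").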

In the general case, the best thing to do is to leave the parser
alone.  With luck, the need for this escape hatch will be relatively
short-lived. -T.
