On Sunday, 28 June 2015 at 13:31, Costello, Roger L. wrote:
> Can you think of Unicode errors in inbound XML documents that a web service
> might be willing to accept?
It depends a bit on your use case and setting (on the web, for example, security
may need to be taken into account). One option is to avoid hard failures on
character stream decoding errors: notify the user of the problem, replace the
offending bytes with the Unicode replacement character U+FFFD until the decoder
resynchronizes on the UTF-8 or UTF-16 byte stream, and then see whether the
document can still be parsed.
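As a rough illustration of that recovery strategy, here is a minimal Python
sketch. It assumes the input is meant to be UTF-8 and uses the standard
library's xml.etree.ElementTree as the parser; the function names
decode_best_effort and parse_best_effort are made up for this example.

```python
import sys
import xml.etree.ElementTree as ET

REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def decode_best_effort(data):
    """Decode UTF-8 bytes, substituting U+FFFD for each malformed sequence.

    Returns the decoded text and a list of (start, end) byte offsets, one
    per replaced sequence, so that the caller can report them.
    """
    pieces = []
    problems = []
    pos = 0
    while pos < len(data):
        try:
            pieces.append(data[pos:].decode("utf-8"))
            break
        except UnicodeDecodeError as exc:
            # exc.start and exc.end are relative to the slice data[pos:].
            pieces.append(data[pos:pos + exc.start].decode("utf-8"))
            pieces.append(REPLACEMENT)
            problems.append((pos + exc.start, pos + exc.end))
            pos += exc.end  # resynchronize just past the offending bytes
    return "".join(pieces), problems


def parse_best_effort(data):
    """Parse XML bytes, warning on stderr instead of failing on bad UTF-8."""
    text, problems = decode_best_effort(data)
    for start, end in problems:
        print(
            f"warning: invalid UTF-8 at bytes {start}-{end - 1}, "
            "replaced with U+FFFD",
            file=sys.stderr,
        )
    # May still raise ET.ParseError if the damage hit the markup itself.
    return ET.fromstring(text)


if __name__ == "__main__":
    # 0xE9 is "é" in ISO-8859-1 but an incomplete sequence in UTF-8.
    root = parse_best_effort(b"<doc>caf\xe9 ok</doc>")
    print(repr(root.text))
```

Python's built-in data.decode("utf-8", errors="replace") performs the same
substitution; the explicit loop is only there to collect the offsets so the
problem is reported rather than hidden.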
In practice, such semi-broken XML documents can be produced by the export
procedures of legacy software that fail to correctly re-encode some of the more
unusual characters held in another legacy encoding. These documents should
eventually be corrected, so the recovery should not happen *silently*, but it is
nicer to the user if your import procedures are "best-effort" and can recover
from these kinds of error conditions.
Best,
Daniel