Re: Applying Postel's Law to XML, from a Unicode perspective?

Daniel Bünzli Sun, 28 Jun 2015 06:30:08 -0700

On Sunday, 28 June 2015 at 13:31, Costello, Roger L. wrote:
> Can you think of Unicode errors in inbound XML documents that a web service 
> might be willing to accept?


It depends a bit on your use case and setting (on the web, for example, 
security may need to be taken into account), but one option is not to fail 
hard on character stream decoding errors. Instead, notify the user of the 
problem, replace the offending bytes with the Unicode replacement character 
U+FFFD, resume decoding once the UTF-8 or UTF-16 byte stream resynchronizes, 
and see whether you can still complete the parse.
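A rough sketch of that idea in Python, assuming UTF-8 input only (UTF-16 would 
need alignment and BOM handling that I've left out). The helper names 
`lenient_utf8_decode` and `lenient_parse` are made up for illustration. Each 
malformed sequence becomes one U+FFFD and is recorded with its byte offset, so 
the problem gets reported rather than hidden. Note that this deliberately 
departs from XML 1.0, which treats byte sequences that are illegal in the 
declared encoding as a fatal error.

```python
import sys
import xml.etree.ElementTree as ET


def lenient_utf8_decode(data: bytes) -> tuple[str, list[tuple[int, bytes]]]:
    """Decode UTF-8, replacing each malformed sequence with U+FFFD.

    Returns the decoded text plus a list of (byte_offset, offending_bytes)
    so the caller can report the problems instead of hiding them.
    """
    pieces: list[str] = []
    issues: list[tuple[int, bytes]] = []
    pos = 0
    while pos < len(data):
        try:
            pieces.append(data[pos:].decode("utf-8"))
            break
        except UnicodeDecodeError as exc:
            # exc.start/exc.end are relative to the slice data[pos:].
            bad_start, bad_end = pos + exc.start, pos + exc.end
            # Everything before the error decoded cleanly.
            pieces.append(data[pos:bad_start].decode("utf-8"))
            pieces.append("\ufffd")
            issues.append((bad_start, data[bad_start:bad_end]))
            # UTF-8 is self-synchronizing: resume right after the bad bytes
            # and the next valid lead byte starts a fresh character.
            pos = bad_end
    return "".join(pieces), issues


def lenient_parse(data: bytes) -> tuple[ET.Element, list[tuple[int, bytes]]]:
    """Best-effort parse: decode leniently, warn about each repair, then parse."""
    text, issues = lenient_utf8_decode(data)
    for offset, raw in issues:
        # Not silent: tell the user what was replaced and where.
        print(f"warning: undecodable bytes {raw!r} at byte offset {offset}, "
              "replaced with U+FFFD", file=sys.stderr)
    # U+FFFD is a legal XML character, so the repaired text can still parse.
    return ET.fromstring(text), issues
```

For example, a Latin-1 "é" (byte 0xE9) left in a document that is supposed to 
be UTF-8 ends up as "caf\ufffd" in the element text, with a warning pointing 
at byte offset 6, instead of the whole import being rejected.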

In practice, such semi-broken XML documents can be produced by the export 
procedures of legacy software that fail to correctly encode some of the less 
common characters they store in another legacy encoding. These documents 
should eventually be corrected, so the recovery should not happen *silently*, 
but it's nicer to the user if your import procedures are "best-effort" and can 
recover from these kinds of error conditions.

Best,

Daniel
