Posted to StackOverflow
(https://stackoverflow.com/questions/38645553/xmlparser-in-pharo-claims-u00a0-is-invalid-utf-8):



Given the input:

<?xml version='1.0' encoding='UTF-8' standalone='yes' ?>
<sms body=". what" />

where the character after the "." in the body attribute of the sms tag is
U+00A0 (NO-BREAK SPACE),

I get the error:

    XMLEncodingException: Invalid UTF-8 character encoding (line 2) (column 13)

IIUC, the UTF-8 representation of that character is 0xC2 0xA0 per Wikipedia.
Sure enough, bytes 72 and 73 of the input are 194 and 160 respectively.
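
As a cross-check outside Pharo, here is a minimal Python sketch that rebuilds
the sample input, confirms that U+00A0 encodes to the bytes 194 160, and feeds
the bytes to Python's standard-library XML parser. The document string is
reconstructed from the snippet above, so it is an assumption: the original
file may differ in line endings or a leading BOM, and byte offsets may differ
with it. It shows only that the byte sequence is well-formed UTF-8 and that
another parser accepts it. It says nothing about how XMLParser reads its input.

```python
import xml.etree.ElementTree as ET

NBSP = "\u00a0"  # NO-BREAK SPACE

# Reconstruction of the sample input (assumed: LF line endings, no BOM).
xml_text = (
    "<?xml version='1.0' encoding='UTF-8' standalone='yes' ?>\n"
    '<sms body=".' + NBSP + 'what" />\n'
)

data = xml_text.encode("utf-8")
nbsp_bytes = NBSP.encode("utf-8")

# The two bytes that represent U+00A0 in UTF-8.
print(list(nbsp_bytes))

# Strict decoding raises UnicodeDecodeError if the bytes are not valid UTF-8.
decoded = data.decode("utf-8")

# A parser that honours the declared encoding should accept the bytes as-is.
root = ET.fromstring(data)
body = root.get("body")
print(repr(body))
```

If both steps succeed, the input itself is fine, which points at how the
bytes reach the parser rather than at the file.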

This seems like a bug in XMLParser, or am I missing something?




-----
Cheers,
Sean
--
View this message in context: 
http://forum.world.st/XMLParser-Claims-U-00A0-is-Invalid-UTF-8-tp4908525.html
Sent from the Pharo Smalltalk Users mailing list archive at Nabble.com.
