Sean,

Your XML file is not UTF-8 encoded; it is plain Unicode, at least the way it is
served from the URL you gave.

(('http://forum.world.st/file/n4908531/illegal-UTF-sms.xml' asUrl 
retrieveContents) at: 72 ) = 160 asCharacter. 

  "true"

Like you said,

160 asCharacter asString utf8Encoded. 

  "#[194 160]"

But

#[ 160 ] utf8Decoded.

  Boom!

You specify UTF-8 encoding inside your XML, so I assume the parser then switches
to that encoding. But your pure Unicode contents are not UTF-8 encoded, which
results in an exception. You see?
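
To make the mismatch concrete, here is a minimal sketch of the three cases. It
assumes the Zinc character encoding classes (ZnCharacterEncoder) are present in
your image and that 'latin1' is accepted as an encoding name; the variable name
nbsp is only for illustration, and this is not necessarily what the parser does
internally.

```smalltalk
"Sketch only: assumes the Zinc character encoding classes are loaded."
| nbsp |
nbsp := 160 asCharacter asString.

"A lone byte 160 is a UTF-8 continuation byte with no lead byte,
 so decoding it as UTF-8 signals an error. Here the error is caught
 and its message text is answered instead."
[ #[ 160 ] utf8Decoded ]
	on: Error
	do: [ :e | e messageText ].

"The two-byte UTF-8 form decodes back to U+00A0."
#[ 194 160 ] utf8Decoded = nbsp.

  "true"

"Read as Latin-1, where each byte is its own code point,
 the single byte 160 should also come out as U+00A0."
((ZnCharacterEncoder newForEncoding: 'latin1') decodeBytes: #[ 160 ]) = nbsp.
```

So either the bytes need to be UTF-8 (194 160), or the encoding declared in the
XML needs to match what the bytes really are.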

Sven

> On 28 Jul 2016, at 22:05, Sean P. DeNigris <s...@clipperadams.com> wrote:
> 
> monty-3 wrote
>> Just to be sure, I manually recreated your file (with the great Bless hex
>> editor) and parsed it with no issue.
> 
> Thanks!
> 
> 
> monty-3 wrote
>> Please post your code and attach the actual source as a file separately.
> 
> The code is merely:
>  messageLog := FileLocator home / 'illegal-UTF-sms.xml'. 
>  doc := XMLDOMParser parse: messageLog.
> 
> File:  illegal-UTF-sms.xml
> <http://forum.world.st/file/n4908531/illegal-UTF-sms.xml>  
> 
> 
> 
> -----
> Cheers,
> Sean
> --
> View this message in context: 
> http://forum.world.st/XMLParser-Claims-U-00A0-is-Invalid-UTF-8-tp4908525p4908531.html
> Sent from the Pharo Smalltalk Users mailing list archive at Nabble.com.
> 

