You're double decoding. Use #onFileNamed:/#parseFileNamed: instead (and the DOM #printToFileNamed: family of messages when writing) and let XMLParser take care of this for you, or disable XMLParser's decoding before parsing with #decodesCharacters:.
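This is a minimal sketch of both options, using the file from Sean's message. It assumes that #decodesCharacters: and #parseDocument are sent to the parser instance, and that #readStreamDo: answers the block's value. Check the senders and implementors in your image before relying on this. The output file name 'sms-copy.xml' is hypothetical.

```smalltalk
| messageLog doc |
messageLog := FileLocator home / 'illegal-UTF-sms.xml'.

"Option 1: pass a file name, so XMLParser opens the file itself and is the only thing decoding it."
doc := XMLDOMParser parseFileNamed: messageLog fullName.

"Option 2 (assumed API): keep the already-decoding file stream, but switch off
 XMLParser's own decoding so the characters are decoded only once."
doc := messageLog readStreamDo: [:stream |
	(XMLDOMParser on: stream)
		decodesCharacters: false;
		parseDocument].

"When writing, let the DOM handle the encoding as well ('sms-copy.xml' is a hypothetical name)."
doc printToFileNamed: (FileLocator home / 'sms-copy.xml') fullName.
```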
Longer explanation: The class-side #on: and #parse: messages take either a string or a stream (read their definitions). You gave #parse: a FileReference. It didn't blow up right away because the argument is only tested with #isString and is sent #readStream otherwise. A FileReference sent #readStream returns a file stream that does automatic decoding. But XMLParser also attempts its own decoding if:

- The input starts with a BOM, or the encoding can be inferred from null bytes before or after the first non-null byte.
- There is an encoding declaration with a non-UTF-8 encoding.
- There is a UTF-8 encoding declaration but the stream is not a normal ReadStream (your case).

So the input gets decoded twice. The file stream already turned the UTF-8 bytes C2 A0 into the single character U+00A0. XMLParser then treats that character's value (16rA0) as a UTF-8 byte. On its own, A0 is a continuation byte with no lead byte, so it is invalid UTF-8, and that is the error you saw.

I'll consider changing the heuristic to make it less eager to decode.

> Sent: Thursday, July 28, 2016 at 4:05 PM
> From: "Sean P. DeNigris" <[email protected]>
> To: [email protected]
> Subject: Re: [Pharo-users] XMLParser Claims U+00A0 is “Invalid UTF-8”
>
> monty-3 wrote
> > Just to be sure, I manually recreated your file (with the great Bless hex
> > editor) and parsed it with no issue.
>
> Thanks!
>
> monty-3 wrote
> > Please post your code and attach the actual source as a file separately.
>
> The code is merely:
> messageLog := FileLocator home / 'illegal-UTF-sms.xml'.
> doc := XMLDOMParser parse: messageLog.
>
> File: illegal-UTF-sms.xml
> <http://forum.world.st/file/n4908531/illegal-UTF-sms.xml>
>
> -----
> Cheers,
> Sean
> --
> View this message in context:
> http://forum.world.st/XMLParser-Claims-U-00A0-is-Invalid-UTF-8-tp4908525p4908531.html
> Sent from the Pharo Smalltalk Users mailing list archive at Nabble.com.
