Re: [Pharo-users] Problem with input to XML Parser - 'Invalid UTF8 encoding'

2017-10-13 Thread Stephane Ducasse
> > Correction - I am misrepresenting Sven. What he said was that Zinc would not look inside the HTML node to find out

Re: [Pharo-users] Problem with input to XML Parser - 'Invalid UTF8 encoding'

2017-10-10 Thread monty
I know what the problem is and will have it fixed shortly. Thanks for the report. (Replying to Peter Kenny's message of Monday, October 09, 2017, 9:03 AM.)

Re: [Pharo-users] Problem with input to XML Parser - 'Invalid UTF8 encoding'

2017-10-09 Thread Peter Kenny
Correction - I am misrepresenting Sven. What he said was that Zinc would not look inside the HTML node to find out about the encoding. It would, of course, use information in the HTTP headers, if any. Peter Kenny wrote: > Henry, > Thanks for the explanations. It's a bit clearer now. I'm still not sure
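Peter's point is that Zinc takes the character set from the HTTP `Content-Type` header rather than from markup inside the page. As a rough analogue only (Python's standard library, not Zinc's API), reading the `charset` parameter of a header and decoding with it might look like this; the function names and the UTF-8 fallback are assumptions for illustration.

```python
from email.message import Message


def charset_from_content_type(header_value: str, default: str = "utf-8") -> str:
    """Return the charset parameter of a Content-Type header value, lowercased.

    Falls back to `default` (an assumption of this sketch) when the header
    carries no charset parameter.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset(default)


def decode_body(body: bytes, content_type: str) -> str:
    # Decode using only the HTTP-level declaration; markup inside the
    # body (an XML prolog or an HTML <meta> tag) is deliberately ignored.
    return body.decode(charset_from_content_type(content_type))
```

For example, `charset_from_content_type("text/html; charset=ISO-8859-1")` yields `"iso-8859-1"`, so the body is decoded as Latin-1 even though nothing in the page itself was consulted.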

Re: [Pharo-users] Problem with input to XML Parser - 'Invalid UTF8 encoding'

2017-10-09 Thread Peter Kenny
Henry, thanks for the explanations. It's a bit clearer now. I'm still not sure how ZnUrl>>retrieveContents manages to decode correctly in this case; I'm sure I recall Sven saying it didn't (and in his view shouldn't) look at the HTTP declarations in the header. There is also the mystery of

Re: [Pharo-users] Problem with input to XML Parser - 'Invalid UTF8 encoding'

2017-10-09 Thread Henrik Sperre Johansen
In a class named XMLHTMLParser, though, you might expect that logic to extend a bit beyond the basic XML spec. But since there are multiple, potentially correct definitions, there will always be failure cases. Not to mention that, in addition to XML and HTTP, HTML4 and HTML5 also define (different) meta tags for
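Henrik's point about HTML is that charset information can also live in markup: HTML 4 uses `<meta http-equiv="Content-Type" content="text/html; charset=...">`, while HTML5 adds the shorter `<meta charset="...">`. The Python sketch below is a deliberately simplified scan that accepts either form in the first 1024 bytes; it is not the full HTML5 prescan algorithm, and the function name is hypothetical.

```python
import re
from typing import Optional

# Matches both <meta charset="..."> and the charset=... parameter inside
# <meta http-equiv="Content-Type" content="...">. Simplified: no handling
# of comments, unusual attribute layouts, or malformed tags.
_META_CHARSET = re.compile(
    rb"<meta[^>]+?charset\s*=\s*[\"']?\s*([A-Za-z0-9._:-]+)",
    re.IGNORECASE,
)


def sniff_meta_charset(data: bytes, limit: int = 1024) -> Optional[str]:
    """Return the lowercased charset named in an HTML <meta> tag, or None."""
    match = _META_CHARSET.search(data[:limit])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()
```

A parser that consulted this, an XML prolog, and the HTTP header would still need a precedence rule when they disagree, which is exactly where Henrik expects unavoidable failure cases.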

Re: [Pharo-users] Problem with input to XML Parser - 'Invalid UTF8 encoding'

2017-10-09 Thread Henrik Sperre Johansen
XML expects a prolog in the document itself that declares the encoding; if it is absent, the standard specifies UTF-8. So when you use an XML parser to parse an HTML page, it will disregard any encoding declared in the HTTP headers, interpret the contents as an XML document with a missing prolog, and try to parse it as UTF-8. When you
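The rule Henrik describes can be sketched in Python as follows. This is a simplification of the XML 1.0 encoding-detection rules (it checks only a UTF-8 or UTF-16 byte order mark and an `encoding="..."` pseudo-attribute in the XML declaration), and the function names are hypothetical. It shows why a page whose bytes are ISO-8859-1 but which has no XML declaration fails: `è` is the single byte `0xE8` in ISO-8859-1, which is not a complete UTF-8 sequence.

```python
import re

# EncName per XML 1.0: [A-Za-z] ([A-Za-z0-9._] | '-')*
_XML_DECL = re.compile(
    rb"<\?xml[^>]*?encoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']"
)


def xml_encoding(data: bytes) -> str:
    """Pick an encoding roughly the way an XML parser would."""
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"  # UTF-8 byte order mark
    if data.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16"  # UTF-16 byte order mark
    match = _XML_DECL.match(data)  # the declaration must come first
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"  # no declaration: the XML default


def decode_as_xml(data: bytes) -> str:
    # HTTP headers and HTML <meta> tags play no part in this decision.
    return data.decode(xml_encoding(data))


if __name__ == "__main__":
    page = "<p>C'\u00e8</p>".encode("iso-8859-1")  # no XML declaration
    print(xml_encoding(page))  # utf-8
    try:
        decode_as_xml(page)
    except UnicodeDecodeError as exc:
        print("invalid UTF-8:", exc.reason)
```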

Re: [Pharo-users] Problem with input to XML Parser - 'Invalid UTF8 encoding'

2017-10-09 Thread Peter Kenny
Note: This was sent on Sunday at 19.45 but seems to have disappeared on its way to pharo-users. Re-sent just to complete the story. Paul, good to have found the charset discrepancy; that may have something to do with it. But I don't think it

Re: [Pharo-users] Problem with input to XML Parser - 'Invalid UTF8 encoding'

2017-10-08 Thread PBKResearch
> in the HEAD tag of that page with the article, they declare it is ISO-8859-1 and not UTF-8. In the page they have a C’è. The little apostrophe next to the C is Unicode code point 8217, U+2019 RIGHT SINGLE QUOTATION MARK (http://www.codetable.net/decimal/8217). So

Re: [Pharo-users] Problem with input to XML Parser - 'Invalid UTF8 encoding'

2017-10-08 Thread Paul DeBruicker
in the HEAD tag of that page with the article, they declare it is ISO-8859-1 and not UTF-8. In the page they have a C’è. The little apostrophe next to the C is Unicode code point 8217, U+2019 RIGHT SINGLE QUOTATION MARK (http://www.codetable.net/decimal/8217), which ISO-8859-1 cannot represent. So their encoding is messed up, and maybe the XMLHTMLParser should throw a warning
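To make Paul's observation concrete: the mark in "C’è" is U+2019 (decimal 8217), which lies outside ISO-8859-1's range of U+0000 to U+00FF, so a page that is genuinely ISO-8859-1 cannot contain it as a raw byte. The small Python helper below (a hypothetical name, for illustration only) reports a character's code point, its UTF-8 bytes, and its ISO-8859-1 byte if one exists.

```python
from typing import Optional


def describe(ch: str) -> dict:
    """Summarise how a single character is encoded in UTF-8 and ISO-8859-1."""
    latin1: Optional[bytes]
    try:
        latin1 = ch.encode("iso-8859-1")
    except UnicodeEncodeError:
        latin1 = None  # not representable in ISO-8859-1
    return {
        "codepoint": ord(ch),
        "label": f"U+{ord(ch):04X}",
        "utf8": ch.encode("utf-8"),
        "latin1": latin1,
    }
```

`describe("\u2019")` reports code point 8217, UTF-8 bytes `E2 80 99`, and no ISO-8859-1 byte, while `describe("è")` gives the single Latin-1 byte `E8` and the UTF-8 pair `C3 A8`. Conversely, if those three UTF-8 bytes are read as ISO-8859-1, they come out as `â` followed by two C1 control characters, the familiar mojibake pattern.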

[Pharo-users] Problem with input to XML Parser - 'Invalid UTF8 encoding'

2017-10-08 Thread PBKResearch
In another thread (on SVG Icons), Sven referred to ways of getting input from a URL for XMLDOMParser. I have recently had some problems doing this. I have found a workaround, so it is not urgent, but I thought I should put it on record in case anyone else is bitten by it, and so that maybe Monty can