> Sent: Wednesday, March 15, 2017 at 2:16 PM
> From: "Ben Coman" <[email protected]>
> To: "Pharo Development List" <[email protected]>
> Subject: Re: [Pharo-dev] ZnInvalidUTF8 on response from squeaksource
>
> On Thu, Mar 16, 2017 at 1:25 AM, Sven Van Caekenberghe <[email protected]> wrote:
> >
> > Hi,
> >
> > This is a recurring issue.
>
> It would be cool if some magic(TM) could raise a dialog with an
> explanation and pull-down list to select an encoding - but maybe that
> is too much hand holding.
That's an interesting idea.

> >
> > The problem is that the server serves a resource, in this case text/html,
> > without specifying its encoding.
>
> I just bumped into [1] while browsing around to learn more, but I
> don't know fully how to interpret it.
> What do you make of it saying "An XHTML5 document is served as XML and
> has XML syntax. XML parsers do not recognise the encoding declarations
> in meta elements. They only recognise the XML declaration. Here is an
> example:
> <?xml version="1.0" encoding="utf-8"?>
> <!DOCTYPE html ....
>
> compared to the page having...
> <?xml version="1.0" encoding="iso-8859-1"?>
>
> cheers -ben

That isn't Zinc's responsibility; it just handles HTTP. The HTML or XML parser using it should disable Zinc's automatic decoding based on Content-Type and do its own decoding of the raw response (which can still be done using Zinc's decoders) informed by the content of the response and not just its Content-Type. XMLParser and XMLParserHTML both use Zinc this way.

> [1]
> https://www.w3.org/International/questions/qa-html-encoding-declarations
>
> >
> > Today, when no encoding is specified, we default to UTF-8. In this case the
> > server silently serves a resource which is ISO-8859-1 encoded.
> >
> > The error is triggered by accessing the following URL:
> >
> > ZnClient new get: 'http://squeaksource.com/ical/?C=M;O%3DD'; yourself.
> >
> > If you inspect the response object inside the http client, you will see
> > that the content-type is text/html. So Zn parses the incoming text using
> > UTF-8 which fails (Zn encoders are strict by default).
> >
> > Here is how to change the default during a call:
> >
> > ZnDefaultCharacterEncoder
> >   value: ZnCharacterEncoder iso88591
> >   during: [ ZnClient new get: 'http://squeaksource.com/ical/?C=M;O%3DD'; yourself ].
> >
> > The solution would be that the server adds the proper charset specification.
> >
> > Consider the default in Pharo:
> >
> > ZnMimeType textHtml => text/html;charset=utf-8
> >
> > The server should serve this resource using the following Content-Type:
> >
> > text/html;charset=iso-8859-1
> >
> > This is the server's responsibility. The page in question is the MC index
> > page, which would normally be dynamically generated. Somewhere the server
> > decides on the encoding. That encoding does not have to change, but it
> > should be properly indicated in the HTTP response headers.
> >
> > HTH,
> >
> > Sven
> >
> > > On 15 Mar 2017, at 17:42, David T. Lewis <[email protected]> wrote:
> > >
> > > squeaksource.com is still running on a quite old image, and I know that it
> > > has problems with multibyte characters. If you are seeing problems related
> > > to this, it's not the fault of Zinc.
> > >
> > > If you can confirm that this is what is happening, then I guess it is time
> > > to update that trusty old squeaksource.com image :-)
> > >
> > > Dave
> > >
> > >> On Wed, Mar 15, 2017 at 8:19 PM, Patrick R. <[email protected]> wrote:
> > >>>
> > >>> Hi everyone,
> > >>>
> > >>> I have been working on bringing http://squeaksource.com/ical/ up to speed
> > >>> for Squeak and wanted to make sure that it also works for Pharo. Therefore,
> > >>> I have created a travis build job for Squeak and Pharo
> > >>> (https://travis-ci.org/codeZeilen/ical-smalltalk/jobs/211298950) which pulls
> > >>> the source from squeaksource.com.
> > >>>
> > >>> Now the issue is that loading the package in Pharo fails with a
> > >>> GoferException wrapping a ZnInvalidUTF8 Exception. We figured that this
> > >>> might be the result of the squeaksource page delivering the page as
> > >>> iso-8859-1 as it contains special characters. Any ideas on how to get this
> > >>> to work? I do not have access to the ical repository description and I would
> > >>> like to avoid mirroring the whole repository on GitHub.
> > >>
> > >>
> > >> In a fresh 60437 image, in Playground evaluating...
> > >>
> > >> Metacello new
> > >>     configuration: 'ICal';
> > >>     repository: 'github://codeZeilen/ical-smalltalk:master/repository';
> > >>     onConflict: [:ex | ex allow];
> > >>     load.
> > >> ==> Could not resolve: ICal-Core [ICal-Core-PaulDeBruicker.5] in
> > >> /home/ben/.local/share/Pharo/images/60437-01/pharo-local/package-cache
> > >> http://squeaksource.com/ical ERROR: 'GoferRepositoryError: Could not access
> > >> http://squeaksource.com/ical: ZnInvalidUTF8: Illegal continuation byte for
> > >> utf-8 encoding'
> > >>
> > >>
> > >> In a new fresh 60437 Image (i.e. empty package-cache)
> > >> World menu > Monticello > +Repository > squeaksource.com...
> > >> MCSqueaksourceRepository
> > >>     location: 'http://squeaksource.com/ical'
> > >>     user: ''
> > >>     password: ''
> > >> ==> open repository then errors "MCRepositoryError: Could not access
> > >> http://squeaksource.com/ical: ZnInvalidUTF8: Illegal continuation byte for
> > >> utf-8 encoding"
> > >>
> > >>
> > >> In Chrome, opening http://www.squeaksource.com/ical
> > >> then clicking <Versions>
> > >> and the browser's View Page Source,
> > >> I see...
> > >> <?xml version="1.0" encoding="iso-8859-1"?>
> > >>
> > >> Googling: zinc iso-8859-1
> > >> finds...
> > >> http://forum.world.st/Problem-using-Zinc-in-Pharo-4-Moose-5-1-td4825329.html
> > >> but "ZnByteEncoder iso88591"
> > >> errors with "KeyNotFound: key 'iso88591' not found in Dictionary"
> > >> and inspecting "ZnByteEncoder byteTextConverters keys sorted"
> > >> confirms this key is missing (@Sven, I'm curious why was this removed? )
> > >>
> > >>
> > >> Now https://en.wikipedia.org/wiki/ISO/IEC_8859-1
> > >> indicates IBM819 is an alias
> > >> and " ZnByteEncoder newForEncoding: 'ibm819' "
> > >> works okay
> > >>
> > >> So in MCHttpRepository>>#loadAllFileNames
> > >> changing...
> > >>     queryAt: 'C' put: 'M;O=D' ;
> > >>     get.
> > >> to...
> > >>     queryAt: 'C' put: 'M;O=D' .
> > >>     ZnDefaultCharacterEncoder
> > >>         value: (ZnByteEncoder newForEncoding: 'ibm819')
> > >>         during: [client get].
> > >>
> > >> Then from Monticello opening the previously defined
> > >> http://squeaksource.com/ical
> > >> works!!
> > >>
> > >>
> > >> Now I was hoping that reverting #loadAllFileNames
> > >> and in Playground doing...
> > >>     converters := ZnByteEncoder byteTextConverters.
> > >>     converters at: 'iso-8859-1' put: (converters at: 'ibm819').
> > >> might alleviate the problem, but no luck.
> > >>
> > >>
> > >> Anyone know a better way to deal with this than hardcoding the encoding
> > >> into #loadAllFileNames?
> > >>
> > >> cheers -ben
> > >>
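
For anyone landing on this thread later: a minimal Playground sketch of why the strict UTF-8 decoder rejects ISO-8859-1 bytes, and of decoding the same bytes explicitly instead. The byte array is a made-up sample (not the actual squeaksource response), and the sketch assumes an image where ZnUTF8Encoder, ZnInvalidUTF8 and the 'ibm819' identifier reported as working above are all available. It is an illustration, not a proposed patch to Monticello or Zinc.

```smalltalk
| bytes decoded |
"Hypothetical sample: 'café!' encoded as ISO-8859-1. Byte 233 (16rE9) is 'é'."
bytes := #[99 97 102 233 33].

"A strict UTF-8 decoder reads 16rE9 as the lead byte of a multi-byte sequence,
 but the next byte (33) is not a continuation byte, so decoding is expected to
 signal ZnInvalidUTF8, as seen with the squeaksource page."
decoded := [ ZnUTF8Encoder new decodeBytes: bytes ]
	on: ZnInvalidUTF8
	do: [ :error |
		"Fall back to the single-byte encoder that was reported to work in this thread."
		(ZnByteEncoder newForEncoding: 'ibm819') decodeBytes: bytes ].
decoded
```

Guessing the fallback like this is only reasonable when the alternative encoding is known from somewhere else, for example the XML declaration in the page itself. The cleaner fix remains a correct charset in the server's Content-Type header.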
