> On 15 Mar 2017, at 20:52, Rein, Patrick wrote:
>
> Unfortunately, as I am trying to fix a Travis build, I cannot change the
> call to Zinc.
>
> To be clear about this: I also think that squeaksource should serve UTF-8.
> However, at the same time a missing charset in an HTTP response means that
> the content should be decoded as ISO-8859-1 [1]. So in general this does seem
> to me like an issue in Zinc.
>
> I see that this might be a problem to change, though, so I will consider
> moving the project at some point (or removing that damn umlaut :) ).
>
> Bests
> Patrick
>
> [1] https://tools.ietf.org/html/rfc2616#section-3.7.1
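Patrick's point about RFC 2616 section 3.7.1 can be made concrete with a minimal sketch. It is written in Python, not Zinc, and the function name `effective_charset` is hypothetical.

The sketch shows how a client following that RFC would pick a charset:
- an explicit `charset` parameter in Content-Type wins;
- a `text/*` type without one falls back to ISO-8859-1.

RFC 7231, part of the set of RFCs that obsoleted RFC 2616, removed that ISO-8859-1 default, so newer clients may legitimately behave differently.

```python
from typing import Optional


def effective_charset(content_type: str) -> Optional[str]:
    """Pick the charset for a response body from its Content-Type header,
    applying the RFC 2616 section 3.7.1 default for text/* types."""
    media_type, _, params = content_type.partition(";")
    media_type = media_type.strip().lower()
    # An explicit charset parameter always wins, e.g. "text/html;charset=utf-8".
    for param in params.split(";"):
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset" and value.strip():
            return value.strip().strip('"').lower()
    # No charset given: RFC 2616 says text/* defaults to ISO-8859-1.
    if media_type.startswith("text/"):
        return "iso-8859-1"
    # Other media types get no implied default in this sketch.
    return None


print(effective_charset("text/html"))                # iso-8859-1
print(effective_charset("text/html;charset=utf-8"))  # utf-8
```

Under this rule, the squeaksource response discussed in this thread (`text/html` with no charset) would be decoded as ISO-8859-1 rather than UTF-8.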
Hmm, OK, I never saw that paragraph; interesting. Thanks for the pointer, I
will put it on my to-do list to think about.

> ________________________________________
> From: Pharo-dev on behalf of Ben Coman
> Sent: Wednesday, March 15, 2017 19:16
> To: Pharo Development List
> Subject: Re: [Pharo-dev] ZnInvalidUTF8 on response from squeaksource
>
> On Thu, Mar 16, 2017 at 1:25 AM, Sven Van Caekenberghe wrote:
>>
>> Hi,
>>
>> This is a recurring issue.
>
> It would be cool if some magic(TM) could raise a dialog with an
> explanation and a pull-down list to select an encoding, but maybe that
> is too much hand-holding.
>
>>
>> The problem is that the server serves a resource, in this case text/html,
>> without specifying its encoding.
>
> I just bumped into [1] while browsing around to learn more, but I
> don't fully know how to interpret it.
> What do you make of it saying "An XHTML5 document is served as XML and
> has XML syntax. XML parsers do not recognise the encoding declarations
> in meta elements. They only recognise the XML declaration. Here is an
> example:
>   <?xml version="1.0" encoding="utf-8"?>
>   <!DOCTYPE html ...."
>
> compared to the page having...
>   <?xml version="1.0" encoding="iso-8859-1"?>
>
> cheers -ben
>
> [1] https://www.w3.org/International/questions/qa-html-encoding-declarations
>
>>
>> Today, when no encoding is specified, we default to UTF-8. In this case the
>> server silently serves a resource which is ISO-8859-1 encoded.
>>
>> The error is triggered by accessing the following URL:
>>
>>   ZnClient new get: 'http://squeaksource.com/ical/?C=M;O%3DD'; yourself.
>>
>> If you inspect the response object inside the HTTP client, you will see that
>> the content-type is text/html. So Zn parses the incoming text using UTF-8,
>> which fails (Zn encoders are strict by default).
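To see why a strict UTF-8 decode of this page fails, here is a small Python sketch. It is an illustration, not Zinc code: the word `Bäcker` is a hypothetical stand-in for whatever umlaut the repository data contains, and `decode_strict` is a hypothetical helper.

In ISO-8859-1, `ä` is the single byte 0xE4. A UTF-8 decoder reads 0xE4 as the start of a three-byte sequence and rejects the plain ASCII byte that follows. That is the same class of error Zn reports as "Illegal continuation byte".

```python
def decode_strict(body: bytes, charset: str = "utf-8") -> str:
    """Decode a response body strictly: any malformed byte raises
    UnicodeDecodeError. The UTF-8 default mirrors what Zn does when
    no charset is specified."""
    return body.decode(charset, errors="strict")


# 'Bäcker' as an ISO-8859-1 server sends it: the umlaut is the single byte 0xE4.
latin1_body = "Bäcker".encode("iso-8859-1")

try:
    decode_strict(latin1_body)
    utf8_failed = False
except UnicodeDecodeError:
    # 0xE4 opens a three-byte UTF-8 sequence, but the next byte ('c', 0x63)
    # is not a continuation byte, so strict decoding gives up.
    utf8_failed = True

# Decoding with the charset the server actually used works fine.
recovered = decode_strict(latin1_body, "iso-8859-1")
print(utf8_failed, recovered)  # True Bäcker
```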
>>
>> Here is how to change the default during a call:
>>
>>   ZnDefaultCharacterEncoder
>>     value: ZnCharacterEncoder iso88591
>>     during: [ ZnClient new get: 'http://squeaksource.com/ical/?C=M;O%3DD'; yourself ].
>>
>> The solution would be for the server to add the proper charset specification.
>>
>> Consider the default in Pharo:
>>
>>   ZnMimeType textHtml => text/html;charset=utf-8
>>
>> The server should serve this resource using the following Content-Type:
>>
>>   text/html;charset=iso-8859-1
>>
>> This is the server's responsibility. The page in question is the MC index
>> page, which would normally be dynamically generated. Somewhere the server
>> decides on the encoding. That encoding does not have to change, but it
>> should be properly indicated in the HTTP response headers.
>>
>> HTH,
>>
>> Sven
>>
>>> On 15 Mar 2017, at 17:42, David T. Lewis wrote:
>>>
>>> squeaksource.com is still running on quite an old image, and I know that it
>>> has problems with multibyte characters. If you are seeing problems related
>>> to this, it's not the fault of Zinc.
>>>
>>> If you can confirm that this is what is happening, then I guess it is time
>>> to update that trusty old squeaksource.com image :-)
>>>
>>> Dave
>>>
>>>> On Wed, Mar 15, 2017 at 8:19 PM, Patrick R. wrote:
>>>>>
>>>>> Hi everyone,
>>>>>
>>>>> I have been working on bringing http://squeaksource.com/ical/ up to speed
>>>>> for Squeak and wanted to make sure that it also works for Pharo. Therefore,
>>>>> I have created a Travis build job for Squeak and Pharo
>>>>> (https://travis-ci.org/codeZeilen/ical-smalltalk/jobs/211298950) which
>>>>> pulls the source from squeaksource.com.
>>>>>
>>>>> Now the issue is that loading the package in Pharo fails with a
>>>>> GoferException wrapping a ZnInvalidUTF8 exception. We figured that this
>>>>> might be the result of the squeaksource page being delivered as
>>>>> iso-8859-1, as it contains special characters.
>>>>> Any ideas on how to get this to work? I do not have access to the ical
>>>>> repository description and I would like to avoid mirroring the whole
>>>>> repository on GitHub.
>>>>
>>>> In a fresh 60437 image, in Playground evaluating...
>>>>
>>>>   Metacello new
>>>>     configuration: 'ICal';
>>>>     repository: 'github://codeZeilen/ical-smalltalk:master/repository';
>>>>     onConflict: [:ex | ex allow];
>>>>     load.
>>>> ==> Could not resolve: ICal-Core [ICal-Core-PaulDeBruicker.5] in
>>>> /home/ben/.local/share/Pharo/images/60437-01/pharo-local/package-cache
>>>> http://squeaksource.com/ical ERROR: 'GoferRepositoryError: Could not access
>>>> http://squeaksource.com/ical: ZnInvalidUTF8: Illegal continuation byte for
>>>> utf-8 encoding'
>>>>
>>>> In another fresh 60437 image (i.e. with an empty package-cache):
>>>> World menu > Monticello > +Repository > squeaksource.com...
>>>>   MCSqueaksourceRepository
>>>>     location: 'http://squeaksource.com/ical'
>>>>     user: ''
>>>>     password: ''
>>>> ==> opening the repository then fails with "MCRepositoryError: Could not
>>>> access http://squeaksource.com/ical: ZnInvalidUTF8: Illegal continuation
>>>> byte for utf-8 encoding"
>>>>
>>>> In Chrome, opening http://www.squeaksource.com/ical,
>>>> then clicking <Versions>
>>>> and the browser's View Page Source,
>>>> I see...
>>>>   <?xml version="1.0" encoding="iso-8859-1"?>
>>>>
>>>> Googling "zinc iso-8859-1"
>>>> finds...
>>>> http://forum.world.st/Problem-using-Zinc-in-Pharo-4-Moose-5-1-td4825329.html
>>>> but "ZnByteEncoder iso88591"
>>>> errors with "KeyNotFound: key 'iso88591' not found in Dictionary",
>>>> and inspecting "ZnByteEncoder byteTextConverters keys sorted"
>>>> confirms this key is missing (@Sven, I'm curious: why was this removed?).
>>>>
>>>> Now https://en.wikipedia.org/wiki/ISO/IEC_8859-1
>>>> indicates IBM819 is an alias,
>>>> and "ZnByteEncoder newForEncoding: 'ibm819'"
>>>> works okay.
>>>>
>>>> So in MCHttpRepository>>#loadAllFileNames
>>>> changing...
>>>>   queryAt: 'C' put: 'M;O=D' ;
>>>>   get.
>>>> to...
>>>>   queryAt: 'C' put: 'M;O=D' .
>>>>   ZnDefaultCharacterEncoder
>>>>     value: (ZnByteEncoder newForEncoding: 'ibm819')
>>>>     during: [client get].
>>>>
>>>> Then, from Monticello, opening the previously defined
>>>> http://squeaksource.com/ical
>>>> works!!
>>>>
>>>> Now I was hoping that reverting #loadAllFileNames
>>>> and in Playground doing...
>>>>   converters := ZnByteEncoder byteTextConverters.
>>>>   converters at: 'iso-8859-1' put: (converters at: 'ibm819').
>>>> might alleviate the problem, but no luck.
>>>>
>>>> Does anyone know a better way to deal with this than hardcoding the
>>>> encoding into #loadAllFileNames?
>>>>
>>>> cheers -ben
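Ben's closing question could be approached with a decoding fallback. Here it is sketched in Python as a hypothetical client-side policy, not anything Zinc or Monticello is known to implement; `sniff_xml_encoding` and `decode_with_fallback` are illustrative names. The policy tries, in order:
1. the charset from the HTTP header, if present;
2. the encoding named in a leading XML declaration (the squeaksource page carries `<?xml version="1.0" encoding="iso-8859-1"?>`);
3. strict UTF-8;
4. ISO-8859-1 as a last resort.

```python
import re
from typing import Optional

# Matches an XML declaration at the very start of the body, e.g.
# <?xml version="1.0" encoding="iso-8859-1"?>
_XML_DECL = re.compile(rb"""<\?xml[^>]*\sencoding\s*=\s*["']([A-Za-z0-9._-]+)["']""")


def sniff_xml_encoding(body: bytes) -> Optional[str]:
    """Return the encoding named in a leading XML declaration, if any."""
    match = _XML_DECL.match(body)
    return match.group(1).decode("ascii").lower() if match else None


def decode_with_fallback(body: bytes, header_charset: Optional[str] = None) -> str:
    """Hypothetical policy: header charset, then XML declaration,
    then strict UTF-8, then ISO-8859-1."""
    charset = header_charset or sniff_xml_encoding(body)
    if charset:
        # An unknown charset name raises LookupError here.
        return body.decode(charset)
    try:
        return body.decode("utf-8")
    except UnicodeDecodeError:
        # Every byte is a valid ISO-8859-1 character, so this never raises.
        return body.decode("iso-8859-1")


print(decode_with_fallback(b"B\xe4cker"))  # Bäcker
```

The last fallback trades strictness for robustness. It never fails, but for content in some other encoding it can silently produce mojibake instead of an error, which is exactly what Zn's strict default is designed to avoid.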
