> On 15 Mar 2017, at 20:52, Rein, Patrick <[email protected]> wrote:
> 
> Unfortunately, as I am trying to fix a Travis build, I cannot change the 
> call to Zinc.
> 
> To be clear about this: I also think that squeaksource should serve UTF-8. 
> However, at the same time, a missing charset in an HTTP response means that
> the content should be decoded as ISO-8859-1 [1]. So in general this does
> seem to me like an issue in Zinc.
> 
> I see that this might be a problem to change though, so I will consider 
> moving the project at some point (or removing that damn umlaut :) ).
> 
> Bests
> Patrick
> 
> [1] https://tools.ietf.org/html/rfc2616#section-3.7.1

Hmm, OK, I never saw that paragraph, interesting.
Thanks for the pointer, I will put it on my todo list to think about.
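
A rough sketch of what honouring that default could look like on the client side. This is not how Zinc behaves today; it assumes ZnMimeType>>charSet answers nil when the header carries no charset parameter, and it borrows ZnCharacterEncoder iso88591 from the snippet quoted further down:

```smalltalk
| mimeType encoder |
"Parse a Content-Type value as a server might send it, without a charset."
mimeType := ZnMimeType fromString: 'text/html'.
"Use the declared charset when there is one; otherwise fall back to
 ISO-8859-1, the default RFC 2616 section 3.7.1 gives for text types.
 This sketch does not check that the main type really is 'text'."
encoder := mimeType charSet
    ifNil: [ ZnCharacterEncoder iso88591 ]
    ifNotNil: [ :name | ZnCharacterEncoder newForEncoding: name ].
```

Whether a fallback like this should replace the current UTF-8 default is exactly the open question.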

> ________________________________________
> From: Pharo-dev <[email protected]> on behalf of Ben Coman 
> <[email protected]>
> Sent: Wednesday, March 15, 2017 19:16
> To: Pharo Development List
> Subject: Re: [Pharo-dev] ZnInvalidUTF8 on response from squeaksource
> 
> On Thu, Mar 16, 2017 at 1:25 AM, Sven Van Caekenberghe <[email protected]> wrote:
>> 
>> Hi,
>> 
>> This is a recurring issue.
> 
> 
> It would be cool if some magic(TM) could raise a dialog with an
> explanation and pull-down list to select an encoding - but maybe that
> is too much hand holding.
> 
> 
>> 
>> The problem is that the server serves a resource, in this case text/html, 
>> without specifying its encoding.
> 
> I just bumped into [1] while browsing around to learn more, but I
> don't know fully how to interpret it.
> What do you make of it saying "An XHTML5 document is served as XML and
> has XML syntax. XML parsers do not recognise the encoding declarations
> in meta elements. They only recognise the XML declaration. Here is an
> example:"
>    <?xml version="1.0" encoding="utf-8"?>
>    <!DOCTYPE html ....
> 
> compared to the page having...
>    <?xml version="1.0" encoding="iso-8859-1"?>
> 
> cheers -ben
> 
> [1] https://www.w3.org/International/questions/qa-html-encoding-declarations
> 
> 
>> 
>> Today, when no encoding is specified, we default to UTF-8. In this case the 
>> server silently serves a resource which is ISO-8859-1 encoded.
>> 
>> The error is triggered by accessing the following URL:
>> 
>> ZnClient new get: 'http://squeaksource.com/ical/?C=M;O%3DD'; yourself.
>> 
>> If you inspect the response object inside the http client, you will see that 
>> the content-type is text/html, with no charset. So Zn decodes the incoming 
>> text as UTF-8, which fails (Zn encoders are strict by default).
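>> 
>> A minimal sketch of that failure. The bytes are a made-up sample, not 
>> taken from the squeaksource page, and ZnCharacterEncoder iso88591 is 
>> assumed to be available as in the snippet below:
>> 
>> ```smalltalk
>> | bytes |
>> "'Jäger' encoded as ISO-8859-1: the ä is the single byte 228 (16rE4)."
>> bytes := #[74 228 103 101 114].
>> "A strict UTF-8 decoder reads 16rE4 as the start of a 3-byte sequence,
>>  finds that the next byte (103) is not a continuation byte, and signals
>>  ZnInvalidUTF8. Here the error is caught and its message is answered."
>> [ ZnUTF8Encoder new decodeBytes: bytes ]
>>     on: ZnInvalidUTF8
>>     do: [ :error | error messageText ].
>> "The same bytes decode cleanly once ISO-8859-1 is used instead."
>> ZnCharacterEncoder iso88591 decodeBytes: bytes.
>> ```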
>> 
>> Here is how to change the default during a call:
>> 
>> ZnDefaultCharacterEncoder
>>  value: ZnCharacterEncoder iso88591
>>  during: [ ZnClient new get: 'http://squeaksource.com/ical/?C=M;O%3DD'; yourself ].
>> 
>> The solution would be for the server to add the proper charset specification.
>> 
>> Consider the default in Pharo:
>> 
>> ZnMimeType textHtml => text/html;charset=utf-8
>> 
>> The server should serve this resource using the following Content-Type:
>> 
>> text/html;charset=iso-8859-1
>> 
>> This is the server's responsibility. The page in question is the MC index 
>> page, which would normally be dynamically generated. Somewhere the server 
>> decides on the encoding. That encoding does not have to change, but it 
>> should be properly indicated in the HTTP response headers.
>> 
>> HTH,
>> 
>> Sven
>> 
>>> On 15 Mar 2017, at 17:42, David T. Lewis <[email protected]> wrote:
>>> 
>>> squeaksource.com is still running on a quite old image, and I know that it
>>> has problems with multibyte characters. If you are seeing problems related
>>> to this, it's not the fault of Zinc.
>>> 
>>> If you can confirm that this is what is happening, then I guess it is time
>>> to update that trusty old squeaksource.com image :-)
>>> 
>>> Dave
>>> 
>>>> On Wed, Mar 15, 2017 at 8:19 PM, Patrick R. <[email protected]> wrote:
>>>>> 
>>>>> Hi everyone,
>>>>> 
>>>>> I have been working on bringing http://squeaksource.com/ical/ up to
>>>>> speed for Squeak and wanted to make sure that it also works for Pharo.
>>>>> Therefore, I have created a Travis build job for Squeak and Pharo
>>>>> (https://travis-ci.org/codeZeilen/ical-smalltalk/jobs/211298950) which
>>>>> pulls the source from squeaksource.com.
>>>>> 
>>>>> Now the issue is that loading the package in Pharo fails with a
>>>>> GoferException wrapping a ZnInvalidUTF8 exception. We figured that this
>>>>> might be the result of squeaksource delivering the page as
>>>>> iso-8859-1, as it contains special characters. Any ideas on how to get
>>>>> this to work? I do not have access to the ical repository description
>>>>> and I would like to avoid mirroring the whole repository on GitHub.
>>>> 
>>>> 
>>>> In a fresh 60437 image, in Playground evaluating...
>>>> 
>>>> Metacello new
>>>>      configuration: 'ICal';
>>>>      repository: 'github://codeZeilen/ical-smalltalk:master/repository';
>>>>      onConflict: [:ex | ex allow];
>>>>      load.
>>>> ==> Could not resolve: ICal-Core [ICal-Core-PaulDeBruicker.5] in
>>>> /home/ben/.local/share/Pharo/images/60437-01/pharo-local/package-cache
>>>> http://squeaksource.com/ical ERROR: 'GoferRepositoryError: Could not
>>>> access http://squeaksource.com/ical: ZnInvalidUTF8: Illegal continuation
>>>> byte for utf-8 encoding'
>>>> 
>>>> 
>>>> In a new fresh 60437 Image (i.e. empty package-cache)
>>>> World menu > Monticello > +Repository > squeaksource.com...
>>>>    MCSqueaksourceRepository
>>>>       location: 'http://squeaksource.com/ical'
>>>>       user: ''
>>>>       password: ''
>>>>  ==> open repository then errors "MCRepositoryError: Could not access
>>>> http://squeaksource.com/ical: ZnInvalidUTF8: Illegal continuation byte for
>>>> utf-8 encoding"
>>>> 
>>>> 
>>>> In Chrome, opening http://www.squeaksource.com/ical
>>>> then clicking <Versions>
>>>> and the browser's View Page Source,
>>>> I see...
>>>>  <?xml version="1.0" encoding="iso-8859-1"?>
>>>> 
>>>> Googling: zinc iso-8859-1
>>>> finds...
>>>> http://forum.world.st/Problem-using-Zinc-in-Pharo-4-Moose-5-1-td4825329.html
>>>> but "ZnByteEncoder iso88591"
>>>> errors with "KeyNotFound: key 'iso88591' not found in Dictionary"
>>>> and inspecting "ZnByteEncoder byteTextConverters keys sorted"
>>>> confirms this key is missing (@Sven, I'm curious: why was this removed?)
>>>> 
>>>> 
>>>> Now https://en.wikipedia.org/wiki/ISO/IEC_8859-1
>>>> indicates IBM819 is an alias
>>>> and " ZnByteEncoder newForEncoding: 'ibm819' "
>>>> works okay
>>>> 
>>>> So in MCHttpRepository>>#loadAllFileNames
>>>> changing...
>>>>        queryAt: 'C' put: 'M;O=D' ;
>>>>        get.
>>>> to...
>>>>        queryAt: 'C' put: 'M;O=D' .
>>>>        ZnDefaultCharacterEncoder
>>>>             value: (ZnByteEncoder newForEncoding: 'ibm819')
>>>>             during: [client get].
>>>> 
>>>> Then from Monticello opening the previously defined
>>>> http://squeaksource.com/ical
>>>> works!!
>>>> 
>>>> 
>>>> Now I was hoping that reverting #loadAllFileNames
>>>> and in Playground doing...
>>>>   converters := ZnByteEncoder byteTextConverters.
>>>>   converters at: 'iso-8859-1' put: (converters at: 'ibm819').
>>>> might alleviate the problem, but no luck.
>>>> 
>>>> 
>>>> Does anyone know a better way to deal with this than hardcoding the
>>>> encoding into #loadAllFileNames?
>>>> 
>>>> cheers -ben
>>>> 
>>> 
>>> 
>>> 
>> 
>> 
> 
> 

