> Sent: Wednesday, March 15, 2017 at 2:16 PM
> From: "Ben Coman" <[email protected]>
> To: "Pharo Development List" <[email protected]>
> Subject: Re: [Pharo-dev] ZnInvalidUTF8 on response from squeaksource
>
> On Thu, Mar 16, 2017 at 1:25 AM, Sven Van Caekenberghe <[email protected]> wrote:
> >
> > Hi,
> >
> > This is a recurring issue.
> 
> 
> It would be cool if some magic(TM) could raise a dialog with an
> explanation and pull-down list to select an encoding - but maybe that
> is too much hand holding.

That's an interesting idea.

> 
> >
> > The problem is that the server serves a resource, in this case text/html, 
> > without specifying its encoding.
> 
> I just bumped into [1] while browsing around to learn more, but I
> don't know fully how to interpret it.
> What do you make of it saying "An XHTML5 document is served as XML and
> has XML syntax. XML parsers do not recognise the encoding declarations
> in meta elements. They only recognise the XML declaration. Here is an
> example:
>     <?xml version="1.0" encoding="utf-8"?>
>     <!DOCTYPE html ....
> 
> compared to the page having...
>     <?xml version="1.0" encoding="iso-8859-1"?>
> 
> cheers -ben

That isn't Zinc's responsibility; it just handles HTTP. The HTML or XML parser 
using it should disable Zinc's automatic decoding based on Content-Type and do 
its own decoding of the raw response (which can still be done using Zinc's 
decoders) informed by the content of the response and not just its 
Content-Type. XMLParser and XMLParserHTML both use Zinc this way.
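
A minimal sketch of that approach, not a definitive implementation. It assumes ZnClient responds to #beBinary (to skip the automatic text decoding) and reuses the ZnCharacterEncoder iso88591 accessor from Sven's message below. The encoding is hardcoded here only for illustration; a real parser would pick it from the XML declaration or meta element in the raw bytes.

```smalltalk
| bytes html |
"Fetch the raw bytes. Assumption: #beBinary tells Zinc not to decode
 textual entities based on the Content-Type header."
bytes := ZnClient new
    beBinary;
    get: 'http://squeaksource.com/ical/?C=M;O%3DD'.

"Decode explicitly. ISO-8859-1 is what the page's XML declaration claims;
 it is hardcoded here purely for illustration."
html := ZnCharacterEncoder iso88591 decodeBytes: bytes.
```

This keeps Zinc responsible only for HTTP and leaves the choice of encoding to the code that understands the document format.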

> [1] https://www.w3.org/International/questions/qa-html-encoding-declarations
> 
> 
> >
> > Today, when no encoding is specified, we default to UTF-8. In this case the 
> > server silently serves a resource which is ISO-8859-1 encoded.
> >
> > The error is triggered by accessing the following URL:
> >
> > ZnClient new get: 'http://squeaksource.com/ical/?C=M;O%3DD'; yourself.
> >
> > If you inspect the response object inside the http client, you will see 
> > that the content-type is text/html. So Zn parses the incoming text using 
> > UTF-8 which fails (Zn encoders are strict by default).
> >
> > Here is how to change the default during a call:
> >
> > ZnDefaultCharacterEncoder
> >   value: ZnCharacterEncoder iso88591
> >   during: [ ZnClient new get: 'http://squeaksource.com/ical/?C=M;O%3DD'; yourself ].
> >
> > The solution would be that the server adds the proper charset specification.
> >
> > Consider the default in Pharo:
> >
> > ZnMimeType textHtml => text/html;charset=utf-8
> >
> > The server should serve this resource using the following Content-Type:
> >
> > text/html;charset=iso-8859-1
> >
> > This is the server's responsibility. The page in question is the MC index 
> > page, which would normally be dynamically generated. Somewhere the server 
> > decides on the encoding. That encoding does not have to change, but it 
> > should be properly indicated in the HTTP response headers.
> >
> > HTH,
> >
> > Sven
> >
> > > On 15 Mar 2017, at 17:42, David T. Lewis <[email protected]> wrote:
> > >
> > > squeaksource.com is still running on a quite old image, and I know that it
> > > has problems with multibyte characters. If you are seeing problems related
> > > to this, it's not the fault of Zinc.
> > >
> > > If you can confirm that this is what is happening, then I guess it is time
> > > to update that trusty old squeaksource.com image :-)
> > >
> > > Dave
> > >
> > >> On Wed, Mar 15, 2017 at 8:19 PM, Patrick R. <[email protected]> wrote:
> > >>>
> > >>> Hi everyone,
> > >>>
> > >>> I have been working on bringing http://squeaksource.com/ical/ up to speed
> > >>> for Squeak and wanted to make sure that it also works for Pharo. Therefore,
> > >>> I have created a travis build job for Squeak and Pharo
> > >>> (https://travis-ci.org/codeZeilen/ical-smalltalk/jobs/211298950) which pulls
> > >>> the source from squeaksource.com.
> > >>>
> > >>> Now the issue is that loading the package in Pharo fails with a
> > >>> GoferException wrapping a ZnInvalidUTF8 Exception. We figured that this
> > >>> might be the result of the squeaksource page delivering the page as
> > >>> iso-8859-1 as it contains special characters. Any ideas on how to get this
> > >>> to work? I do not have access to the ical repository description and I would
> > >>> like to avoid mirroring the whole repository on GitHub.
> > >>
> > >>
> > >> In a fresh 60437 image, in Playground evaluating...
> > >>
> > >>  Metacello new
> > >>       configuration: 'ICal';
> > >>       repository: 'github://codeZeilen/ical-smalltalk:master/repository';
> > >>       onConflict: [:ex | ex allow];
> > >>       load.
> > >>  ==> Could not resolve: ICal-Core [ICal-Core-PaulDeBruicker.5] in
> > >> /home/ben/.local/share/Pharo/images/60437-01/pharo-local/package-cache
> > >> http://squeaksource.com/ical ERROR: 'GoferRepositoryError: Could not access
> > >> http://squeaksource.com/ical: ZnInvalidUTF8: Illegal continuation byte for
> > >> utf-8 encoding'
> > >>
> > >>
> > >> In a new fresh 60437 Image (i.e. empty package-cache)
> > >>  World menu > Monticello > +Repository > squeaksource.com...
> > >>     MCSqueaksourceRepository
> > >>        location: 'http://squeaksource.com/ical'
> > >>        user: ''
> > >>        password: ''
> > >>   ==> open repository then errors "MCRepositoryError: Could not access
> > >> http://squeaksource.com/ical: ZnInvalidUTF8: Illegal continuation byte for
> > >> utf-8 encoding"
> > >>
> > >>
> > >> In Chrome, opening http://www.squeaksource.com/ical
> > >> then clicking <Versions>
> > >> and the browser's View Page Source,
> > >> I see...
> > >>   <?xml version="1.0" encoding="iso-8859-1"?>
> > >>
> > >> Googling: zinc iso-8859-1
> > >> finds...
> > >> http://forum.world.st/Problem-using-Zinc-in-Pharo-4-Moose-5-1-td4825329.html
> > >> but "ZnByteEncoder iso88591"
> > >> errors with "KeyNotFound: key 'iso88591' not found in Dictionary"
> > >> and inspecting "ZnByteEncoder byteTextConverters keys sorted"
> > >> confirms this key is missing (@Sven, I'm curious why was this removed? )
> > >>
> > >>
> > >> Now https://en.wikipedia.org/wiki/ISO/IEC_8859-1
> > >> indicates IBM819 is an alias
> > >> and " ZnByteEncoder newForEncoding: 'ibm819' "
> > >> works okay
> > >>
> > >> So in MCHttpRepository>>#loadAllFileNames
> > >> changing...
> > >>         queryAt: 'C' put: 'M;O=D' ;
> > >>         get.
> > >> to...
> > >>         queryAt: 'C' put: 'M;O=D' .
> > >>         ZnDefaultCharacterEncoder
> > >>              value: (ZnByteEncoder newForEncoding: 'ibm819')
> > >>              during: [client get].
> > >>
> > >> Then from Monticello opening the previously defined
> > >> http://squeaksource.com/ical
> > >> works!!
> > >>
> > >>
> > >> Now I was hoping that reverting #loadAllFileNames
> > >> and in Playground doing...
> > >>    converters := ZnByteEncoder byteTextConverters.
> > >>    converters at: 'iso-8859-1' put: (converters at: 'ibm819').
> > >> might alleviate the problem, but no luck.
> > >>
> > >>
> > >> Anyone know a better way to deal with this than hardcoding the encoding
> > >> into #loadAllFileNames?
> > >>
> > >> cheers -ben
> > >>
> > >
> > >
> > >
> >
> >
> 
> 
