+1

On Tue, Oct 23, 2018 at 10:50 AM Marcus Denker wrote:
>
>
>
> > On 18 Oct 2018, at 20:59, Sven Van Caekenberghe wrote:
> >
> > Hi,
> >
> > While working on
> > https://pharo.fogbugz.com/f/cases/21858/Cleanup-remaining-DeprecatedFileSystem-users
> >  [where we need more help !!] I found that String>>#zipped is one of the 
> > remaining users of the deprecated RWBinaryOrTextStream. Although this usage is easy 
> > enough to fix, I think the current #zipped / #unzipped on String is broken.
> >
> > (note also that there are no real users of these methods)
> >
> > Right now it seems cool that the following is an identity.
> >
> >  'foobar' zipped unzipped.
> >
> > However, the result of zipping something is actually binary data (a 
> > collection of opaque bytes). Come to think of it, the input is also 
> > bytes, not unencoded characters.
> >
> > Of course, the current methods are broken, as can be seen from a more 
> > complex (wide) string.
> >
> >  'élèves Françaises à 10 €' zipped unzipped. >>> <something very weird>
> >
> > The error results from some implicit/wrong character encoding being used.
> >
> > The right way to do this is to explicitly encode/decode the string.
> >
> >  (GZipReadStream on: (ByteArray streamContents: [ :out |
> >     (GZipWriteStream on: out)
> >        nextPutAll: 'élèves Françaises à 10 €' utf8Encoded;
> >        close ])) upToEnd utf8Decoded.
> >
> > From this it would follow that #zipped / #unzipped would make more sense on 
> > ByteArray, so that the above identity would become:
> >
> >  'élèves Françaises à 10 €' utf8Encoded zipped unzipped utf8Decoded.
> >
> > This change of signature would be comparable to what we recently did with 
> > #base64Encoded / #base64Decoded.
> >
> > What do you think ?
> >
>
> Yes, to me this sounds interesting… I think compression is indeed better done 
> at the level of bytes than on Strings.
>
>
>         Marcus
>
>
