+1
On Tue, Oct 23, 2018 at 10:50 AM Marcus Denker <[email protected]> wrote:
>
>
>
> > On 18 Oct 2018, at 20:59, Sven Van Caekenberghe <[email protected]> wrote:
> >
> > Hi,
> >
> > Given
> > https://pharo.fogbugz.com/f/cases/21858/Cleanup-remaining-DeprecatedFileSystem-users
> > [where we need more help !!] I found String>>#zipped to be one of the
> > users of the deprecated RWBinaryOrTextStream. Although this usage is easy
> > enough to fix, I think the current #zipped / #unzipped on String is broken.
> >
> > (note also that there are no real users of these methods)
> >
> > Right now it seems cool that the following is an identity.
> >
> > 'foobar' zipped unzipped.
> >
> > However, the result of zipping something is actually something binary (a
> > collection of opaque bytes). Thinking of it, the input is actually also
> > bytes, not unencoded characters.
> >
> > Of course, the current methods are broken, as can be seen from a more
> > complex (wide) string.
> >
> > 'élèves Françaises à 10 €' zipped unzipped. >>> <something very weird>
> >
> > The error results from some implicit/wrong character encoding being used.
> >
> > The right way to do this is to explicitly encode/decode the string.
> >
> > (GZipReadStream on: (ByteArray streamContents: [ :out |
> > (GZipWriteStream on: out)
> > nextPutAll: 'élèves Françaises à 10 €' utf8Encoded;
> > close ])) upToEnd utf8Decoded.
> >
> > From this it would follow that #zipped / #unzipped would make more sense on
> > ByteArray, so that the above identity would become:
> >
> > 'élèves Françaises à 10 €' utf8Encoded zipped unzipped utf8Decoded.
> >
> > This change of signature would be comparable to what we recently did with
> > #base64Encoded / #base64Decoded
> >
> > What do you think ?
> >
>
> Yes, to me this sounds interesting… I think compression is indeed better done
> on the level of bytes than on Strings.
>
>
> Marcus
>
>
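
For reference, the byte-level round trip proposed above can be tried in a playground without touching String or ByteArray. This is only a sketch: the blocks `zip` and `unzip` are hypothetical stand-ins for the proposed ByteArray>>#zipped / #unzipped, built from the same stream calls as Sven's snippet, and it assumes `upToEnd` answers a ByteArray as it does there.

```smalltalk
| zip unzip original roundTrip |
"Hypothetical stand-in for the proposed ByteArray>>#zipped: bytes in, bytes out."
zip := [ :bytes |
	ByteArray streamContents: [ :out |
		(GZipWriteStream on: out)
			nextPutAll: bytes;
			close ] ].
"Hypothetical stand-in for the proposed ByteArray>>#unzipped."
unzip := [ :bytes | (GZipReadStream on: bytes) upToEnd ].
original := 'élèves Françaises à 10 €'.
"Encoding and decoding are explicit; compression only ever sees bytes."
roundTrip := (unzip value: (zip value: original utf8Encoded)) utf8Decoded.
roundTrip = original
```

The choice of encoding stays with the caller, so nothing implicit can go wrong for wide strings.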