> On 18 Oct 2018, at 20:59, Sven Van Caekenberghe wrote:
>
> Hi,
>
> Given
> https://pharo.fogbugz.com/f/cases/21858/Cleanup-remaining-DeprecatedFileSystem-users
> [where we need more help !!] I found String>>#zipped to be one of the users
> of the deprecated RWBinaryOrTextStream. Although this usage is easy enough to
> fix, I think the current #zipped / #unzipped on String is broken.
>
> (note also that there are no real users of these methods)
>
> Right now it seems cool that the following is an identity.
>
> 'foobar' zipped unzipped.
>
> However, the result of zipping something is actually something binary (a
> collection of opaque bytes). Thinking of it, the input is actually also
> bytes, not unencoded characters.
>
> Of course, the current methods are broken, as can be seen from a more complex
> (wide) string.
>
> 'élèves Françaises à 10 €' zipped unzipped. >>> <something very weird>
>
> The error results from some implicit/wrong character encoding being used.
>
> The right way to do this is to explicitly encode/decode the string.
>
> (GZipReadStream on: (ByteArray streamContents: [ :out |
>     (GZipWriteStream on: out)
>         nextPutAll: 'élèves Françaises à 10 €' utf8Encoded;
>         close ])) upToEnd utf8Decoded.
>
> From this it would follow that #zipped / #unzipped would make more sense on
> ByteArray, so that the above identity would become:
>
> 'élèves Françaises à 10 €' utf8Encoded zipped unzipped utf8Decoded.
>
> This change of signature would be comparable to what we recently did with
> #base64Encoded / #base64Decoded
>
> What do you think ?
>
Yes, to me this sounds interesting… I think compression is indeed better done
at the level of bytes than on Strings.
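
A minimal sketch of how that could look at the byte level, meant to be tried in
a Playground. It only reuses the stream calls from the snippet quoted above; the
blocks zip and unzip are hypothetical stand-ins for the proposed
ByteArray>>#zipped / #unzipped, not existing methods:

```smalltalk
| zip unzip original roundTrip |

"Hypothetical stand-in for the proposed ByteArray>>#zipped: bytes in, bytes out."
zip := [ :bytes |
	ByteArray streamContents: [ :out |
		(GZipWriteStream on: out)
			nextPutAll: bytes;
			close ] ].

"Hypothetical stand-in for the proposed ByteArray>>#unzipped."
unzip := [ :bytes | (GZipReadStream on: bytes) upToEnd ].

original := 'élèves Françaises à 10 €'.

"Encode explicitly before compressing, decode explicitly after decompressing."
roundTrip := (unzip value: (zip value: original utf8Encoded)) utf8Decoded.

"Compare the round-tripped string with the original."
roundTrip = original
```

The point of the sketch is that the character encoding is chosen by the caller,
so the compression step itself never has to guess one.
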
Marcus