https://github.com/pharo-project/pharo/pull/1947
> On 26 Oct 2018, at 22:28, Stephane Ducasse <[email protected]> wrote:
>
> +1
>
> On Tue, Oct 23, 2018 at 10:50 AM Marcus Denker <[email protected]> wrote:
>>
>>
>>> On 18 Oct 2018, at 20:59, Sven Van Caekenberghe <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> Given
>>> https://pharo.fogbugz.com/f/cases/21858/Cleanup-remaining-DeprecatedFileSystem-users
>>> [where we need more help !!] I found String>>#zipped to be one of the
>>> users of the deprecated RWBinaryOrTextStream. Although this usage is easy
>>> enough to fix, I think the current #zipped / #unzipped on String is broken.
>>>
>>> (note also that there are no real users of these methods)
>>>
>>> Right now it seems cool that the following is an identity:
>>>
>>> 'foobar' zipped unzipped.
>>>
>>> However, the result of zipping something is actually something binary (a
>>> collection of opaque bytes). Thinking of it, the input is actually also
>>> bytes, not unencoded characters.
>>>
>>> Of course, the current methods are broken, as can be seen from a more
>>> complex (wide) string:
>>>
>>> 'élèves Françaises @ 10 €' zipped unzipped.
>>> <something very weird>
>>>
>>> The error results from some implicit/wrong character encoding being used.
>>>
>>> The right way to do this is to explicitly encode/decode the string:
>>>
>>> (GZipReadStream on: (ByteArray streamContents: [ :out |
>>>   (GZipWriteStream on: out)
>>>     nextPutAll: 'élèves Françaises à 10 €' utf8Encoded;
>>>     close ])) upToEnd utf8Decoded.
>>>
>>> From this it would follow that #zipped / #unzipped would make more sense on
>>> ByteArray, so that the above identity would become:
>>>
>>> 'élèves Françaises à 10 €' utf8Encoded zipped unzipped utf8Decoded.
>>>
>>> This change of signature would be comparable to what we recently did with
>>> #base64Encoded / #base64Decoded
>>>
>>> What do you think ?
>>>
>>
>> Yes, to me this sounds interesting… I think compression is indeed better
>> done on the level of bytes than on Strings.
>>
>>
>> Marcus
>>
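
For reference, here is a minimal Playground sketch of the byte-level round trip proposed in the quoted message. It uses only the stream calls from Sven's example. The blocks `zip` and `unzip` are hypothetical stand-ins for what #zipped / #unzipped might do on ByteArray; they are an assumption for illustration, not necessarily what the PR implements.

```smalltalk
| zip unzip original roundTrip |

"Hypothetical stand-in for ByteArray>>#zipped: bytes in, gzipped bytes out."
zip := [ :bytes |
	ByteArray streamContents: [ :out |
		(GZipWriteStream on: out)
			nextPutAll: bytes;
			close ] ].

"Hypothetical stand-in for ByteArray>>#unzipped: gzipped bytes in, bytes out."
unzip := [ :bytes | (GZipReadStream on: bytes) upToEnd ].

original := 'élèves Françaises à 10 €'.

"The character encoding is chosen explicitly on both sides of the compression step."
roundTrip := (unzip value: (zip value: original utf8Encoded)) utf8Decoded.

"Compare the decoded result with the input string."
roundTrip = original
```

Keeping the compression step bytes-only means the caller always names the encoding, in the same way as with #base64Encoded / #base64Decoded.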
