> On 18 Oct 2018, at 20:59, Sven Van Caekenberghe wrote:
> 
> Hi,
> 
> Given 
> https://pharo.fogbugz.com/f/cases/21858/Cleanup-remaining-DeprecatedFileSystem-users
>  [where we need more help !!] I found String>>#zipped to be one of the users 
> of the deprecated RWBinaryOrTextStream. Although this usage is easy enough to 
> fix, I think the current #zipped / #unzipped on String is broken.
> 
> (note also that there are no real users of these methods)
> 
> Right now it seems cool that the following is an identity.
> 
>  'foobar' zipped unzipped.
> 
> However, the result of zipping something is actually something binary (a 
> collection of opaque bytes). Thinking of it, the input is actually also 
> bytes, not unencoded characters.
> 
> Of course, the current methods are broken, as can be seen from a more complex 
> (wide) string.
> 
>  'élèves Françaises à 10 €' zipped unzipped. >>> <something very weird>
> 
> The error results from some implicit/wrong character encoding being used.
> 
> The right way to do this is to explicitly encode/decode the string.
> 
>  (GZipReadStream on: (ByteArray streamContents: [ :out | 
>     (GZipWriteStream on: out)
>        nextPutAll: 'élèves Françaises à 10 €' utf8Encoded; 
>        close ])) upToEnd utf8Decoded.
> 
> From this it would follow that #zipped / #unzipped would make more sense on 
> ByteArray, so that the above identity would become:
> 
>  'élèves Françaises à 10 €' utf8Encoded zipped unzipped utf8Decoded.
> 
> This change of signature would be comparable to what we recently did with 
> #base64Encoded / #base64Decoded
> 
> What do you think ?
> 

Yes, to me this sounds interesting… I think compression is indeed better done 
on the level of bytes than on Strings.
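
For illustration, a minimal sketch of what the proposed byte-level methods 
could look like. It reuses only the calls from Sven's example above 
(GZipWriteStream, GZipReadStream, ByteArray class>>#streamContents:). 
Defining #zipped / #unzipped on ByteArray like this is an assumption about 
the eventual change, not the actual implementation.

```smalltalk
"Hypothetical extension methods on ByteArray (an assumption, not existing code)."

ByteArray >> zipped
    "Answer a new ByteArray holding the gzip-compressed receiver."
    ^ ByteArray streamContents: [ :out |
        (GZipWriteStream on: out)
            nextPutAll: self;
            close ]

ByteArray >> unzipped
    "Answer a new ByteArray holding the decompressed receiver."
    ^ (GZipReadStream on: self) upToEnd
```

With these in place the character encoding stays explicit at the call site, 
as in 'élèves Françaises à 10 €' utf8Encoded zipped unzipped utf8Decoded.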


        Marcus

