Re: [Pharo-dev] About #zipped / #unzipped

Sven Van Caekenberghe Wed, 31 Oct 2018 08:41:51 -0700

https://github.com/pharo-project/pharo/pull/1947


> On 26 Oct 2018, at 22:28, Stephane Ducasse <[email protected]> wrote:
> 
> +1
> 
> On Tue, Oct 23, 2018 at 10:50 AM Marcus Denker <[email protected]> wrote:
>> 
>> 
>> 
>>> On 18 Oct 2018, at 20:59, Sven Van Caekenberghe <[email protected]> wrote:
>>> 
>>> Hi,
>>> 
>>> Given 
>>> https://pharo.fogbugz.com/f/cases/21858/Cleanup-remaining-DeprecatedFileSystem-users
>>>  [where we need more help !!] I found String>>#zipped to be one of the 
>>> users of the deprecated RWBinaryOrTextStream. Although this usage is easy 
>>> enough to fix, I think the current #zipped / #unzipped on String is broken.
>>> 
>>> (note also that there are no real users of these methods)
>>> 
>>> Right now it seems cool that the following is an identity.
>>> 
>>> 'foobar' zipped unzipped.
>>> 
>>> However, the result of zipping something is actually something binary (a 
>>> collection of opaque bytes). Thinking of it, the input is actually also 
>>> bytes, not unencoded characters.
>>> 
>>> Of course, the current methods are broken, as can be seen from a more 
>>> complex (wide) string.
>>> 
>>> 'élèves Françaises à 10 €' zipped unzipped. >>> <something very weird>
>>> 
>>> The error results from some implicit/wrong character encoding being used.
>>> 
>>> The right way to do this is to explicitly encode/decode the string.
>>> 
>>> (GZipReadStream on: (ByteArray streamContents: [ :out |
>>>    (GZipWriteStream on: out)
>>>       nextPutAll: 'élèves Françaises à 10 €' utf8Encoded;
>>>       close ])) upToEnd utf8Decoded.
>>> 
>>> From this it would follow that #zipped / #unzipped would make more sense on 
>>> ByteArray, so that the above identity would become:
>>> 
>>> 'élèves Françaises à 10 €' utf8Encoded zipped unzipped utf8Decoded.
>>> 
>>> This change of signature would be comparable to what we recently did with 
>>> #base64Encoded / #base64Decoded
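>>> 
>>> As a rough sketch only (not necessarily what would end up in the image), 
>>> the ByteArray versions could simply wrap the GZip streams used above. The 
>>> method bodies below are an assumption for illustration:
>>> 
>>> ByteArray >> zipped
>>>    "Sketch: answer a ByteArray holding the gzip-compressed receiver"
>>>    ^ ByteArray streamContents: [ :out |
>>>       (GZipWriteStream on: out)
>>>          nextPutAll: self;
>>>          close ]
>>> 
>>> ByteArray >> unzipped
>>>    "Sketch: answer a ByteArray holding the decompressed receiver"
>>>    ^ (GZipReadStream on: self) upToEnd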
>>> 
>>> What do you think ?
>>> 
>> 
>> Yes, to me this sounds interesting… I think compression is indeed better 
>> done on the level of bytes than on Strings.
>> 
>> 
>>        Marcus
>> 
>> 
>
