Re: [Pharo-users] How to zip a WideString

Sven Van Caekenberghe Thu, 03 Oct 2019 02:57:43 -0700

Hi Peter,

About #zipped / #unzipped and the inflate / deflate classes: your observation 
is correct. These work from string to string, whereas the compressed 
representation should clearly be binary.


The contents (the input, i.e. what is inside the compressed data) can be 
anything; they are not necessarily a string (they could be an image, which is 
also binary). Only the creator of the compressed data knows what it holds; in 
general you cannot assume.
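
When the creator does know that the content is text, as in the use case quoted 
below, the round trip can be sketched in a Playground as follows. This is only 
an illustration: it uses just the messages from the original question, and the 
sample string and variable names are made up.

```smalltalk
"Sample text is made up; it mixes one-byte and multi-byte characters."
| original compressed restored |
original := 'Grüße, Ελληνικά, 日本語'.

"UTF-8 encode to a ByteArray, view the bytes as a String, then compress."
compressed := original utf8Encoded asString zipped.

"Undo each step in the opposite order."
restored := compressed unzipped asByteArray utf8Decoded.

"Expected to answer true if the round trip is lossless."
restored = original
```

The intermediate asString / asByteArray conversions should only relabel each 
byte as a character in the range 0 to 255 and back, so they are not expected to 
lose data.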

It would be possible (and it would be very nice) to change this, but that 
would have a serious impact on existing users, since the contract changes.

About your use case: why would your DB not be capable of storing large 
strings? A good DB should be capable of storing any kind of string (full 
Unicode) efficiently.

Which DB and what sizes are we talking about?

Sven

> On 3 Oct 2019, at 11:06, PBKResearch <pe...@pbkresearch.co.uk> wrote:
> 
> Hello
>  
> I have a problem with text storage, to which I seem to have found a solution, 
> but it’s a bit clumsy-looking. I would be grateful for confirmation that (a) 
> there is no neater solution, (b) I can rely on this to work – I only know 
> that it works in a few test cases.
>  
> I need to store a large number of text strings in a database. To avoid the 
> database files becoming too large, I am thinking of zipping the strings, or 
> at least the less frequently accessed ones. Depending on the source, some of 
> the strings will be instances of ByteString, some of WideString (because they 
> contain characters not representable in one byte). Storing a WideString 
> uncompressed seems to occupy 4 bytes per character, so I decided, before 
> thinking of compression, to store the strings utf8Encoded, which yields a 
> ByteArray. But zipped can only be applied to a String, not a ByteArray.
>  
> So my proposed solution is:
>  
> For compression:    myZipString := myWideString utf8Encoded asString zipped.
> For decompression:  myOutputString := myZipString unzipped asByteArray utf8Decoded.
>  
> As I said, it works in all the cases I tried, whether WideString or not, but 
> the chains of transformations look clunky somehow. Can anyone see a neater 
> way of doing it? And can I rely on it working, especially when I am handling 
> foreign texts with many multi-byte characters?
>  
> Thanks in advance for any help.
>  
> Peter Kenny
