Re: [Pharo-dev] String deduplication

Chris Muller Mon, 02 Jun 2014 07:36:06 -0700

This is a normal pattern in application development.  For example,
Todd Blanchard's HTML validating parser grabs a huge chunk of HTML,
and all the parts are simply referenced by position within that big
String.
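
To make the pattern concrete, here is a minimal sketch in Python. It is
neither Todd's parser nor the JVM code described below: the `StringView`
name, its `sub` method, and the ASCII-only assumption (one byte per
character) are all made up for illustration. The field names and the
sample text follow the example quoted further down.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StringView:
    """Hypothetical string object that borrows a range of a shared buffer."""

    value: bytes  # shared backing buffer (ASCII assumed: 1 byte = 1 character)
    start: int    # offset of the first character within the buffer
    count: int    # number of characters covered by this view

    def __post_init__(self):
        # Reject views that would reach outside the backing buffer.
        if self.start < 0 or self.count < 0:
            raise ValueError("start and count must be non-negative")
        if self.start + self.count > len(self.value):
            raise ValueError("view extends past the end of the buffer")

    def __str__(self) -> str:
        # Materialising the text copies bytes; this is the CPU cost
        # paid in exchange for sharing the buffer.
        end = self.start + self.count
        return self.value[self.start:end].decode("ascii")

    def sub(self, start: int, count: int) -> "StringView":
        # A sub-view reuses the same buffer; only the offsets change.
        return StringView(self.value, self.start + start, count)


b1 = b"Hello World! It's a sunny day"
whole = StringView(b1, 0, len(b1))
hello = whole.sub(0, 5)
world = whole.sub(6, 5)
```

Here `hello` and `world` are separate objects, so they keep their own
identity, yet both point at the same `b1` buffer and no character data is
copied until one of them is converted with `str()`.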


On Mon, Jun 2, 2014 at 8:26 AM, Christophe Demarey
<[email protected]> wrote:
>
> On 30 May 2014 at 09:39, Philippe Marschall wrote:
>
>> Hi
>>
>> This is an idea I stole from somebody else. The assumption is that you have 
>> a lot of Strings in the image that are equal. We could therefore remove the 
>> duplicates and make all the objects refer to the same instance.
>
>
> I worked on a String optimisation for a Java virtual machine dedicated to
> small hardware devices.
> A string was represented by an array of bytes (UTF-8 or ASCII encoding), a
> start position, and the number of characters in the string.
> This allows the internal byte array to be reused by different strings, while
> each String still has its own String object.
> With this approach, you can save a lot of memory (at the cost of some CPU
> overhead), and you don't run into problems, because each String is still a
> separate object.
>
> e.g. b1: 'Hello World! It's a sunny day'
>
> 'Hello World! It's a sunny day': start = 0, count = 29, value = b1
> 'Hello': start = 0, count = 5, value = b1
> 'Hello World!': start = 0, count = 12, value = b1
> 'World': start = 6, count = 5, value = b1
>
>
> I don't know how this could be applied to Smalltalk, or whether it would
> make sense there.
>
> Christophe.
>
