Hello,
I like the idea but this is not as simple.
In some framework you may use different string with a same name as markers
that are not equals.
Typically:
Object>>#string1
^ 'string'
Object>>#string2
^ 'string'
Object>>#test
self assert: self string1 == self string1. "Answers true"
self assert: self string2 == self string2. "Answers true"
self assert: self string1 == self string2 "Answers false"
Frameworks relying on that will not work any more.
And this kind of bugs is not easy to spot, it typically crashes identity
collections in a non deterministic fashion.
Regards
2014-05-30 9:39 GMT+02:00 Philippe Marschall <
[email protected]>:
> Hi
>
> This is an idea I stole from somebody else. The assumption is that you
> have a lot of Strings in the image that are equal. We could therefore
> remove the duplicates and make all the objects refer to the same instance.
>
> However it's not a simple as that. The main issue is that String has two
> responsibilities. The first is as an immutable value object. The second is
> as a mutable character buffer for building immutable value objects. We must
> not deduplicate the second kind. Unfortunately it's not straight forward to
> figure out which kind a string is. The approach I took is looking at
> whether it contains any 0 characters. An other option would be to check
> whether any WirteStreams are referring to it.
> Also, since there are behavioral differences between String and Symbol
> besides #= we must exclude Symbols (eg. there is #'hello' and 'hello' in
> the heap and they compare #= true but we must not make anybody who refers
> to 'hello' suddenly refer to #'hello').
>
> Anyway here's the code, this saves about 2 MB in a fairly stock Pharo 3
> image. Sorry for the bad variable names.
>
> | b d m |
> b := Bag new.
> d := OrderedCollection new.
> m := Dictionary new.
> "count all string instances"
> String allSubInstancesDo: [ :s |
> s isSymbol ifFalse: [
> b add: s ] ].
> "find the ones that have no duplicates or are likely buffers"
> b doWithOccurrences: [ :s :i |
> (i = 1 or: [ s anySatisfy: [ :c | c codePoint = 0 ] ]) ifTrue: [
> d add: s -> i ] ].
> "remove the ones that have no duplicates or are likely buffers"
> d do: [ :a |
> a value timesRepeat: [
> b remove: a key ] ].
> "map all duplicate strings to their duplicates"
> String allSubInstancesDo: [ :s |
> s isSymbol ifFalse: [
> (b includes: s) ifTrue: [
> | l |
> l := m at: s ifAbsentPut: [ OrderedCollection new ].
> l add: s ] ].
> "remove the duplicates"
> m keysAndValues do [ :k :v |
> | f |
> f := v at: 1.
> 2 to: v size do: [ :i |
> (v at: i) becomeForward: f ] ]
>
> Cheers
> Philippe
>
>
>