Hi

This is an idea I stole from somebody else. The assumption is that an image
contains many Strings that are equal (#=) but not identical. We could therefore
remove the duplicates and make every object that refers to one of them refer
to a single shared instance instead.
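Making everybody refer to the same instance is done with #becomeForward:, as in the last step of the script. Below is a minimal workspace sketch of what that message does. The variable names a, b and holder are illustrative only and do not come from the script.

```smalltalk
"Two equal but distinct strings; copies avoid touching method literals."
| a b holder |
a := 'hello' copy.
b := 'hello' copy.
holder := Array with: b.
"Every reference to b in the image, including holder's slot and the
 temporary b itself, now points to a."
b becomeForward: a.
"Both names now denote one shared object, so a write through one
 reference is visible through all of them."
a at: 1 put: $j.
Transcript show: holder first; cr.
```

The last two lines show the catch: once copies are merged, mutating any one of them affects every former holder.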
However, it's not as simple as that. The main issue is that String has two
responsibilities. The first is as an immutable value object. The second is as
a mutable character buffer for building immutable value objects. We must not
deduplicate the second kind, because any later write into a shared buffer
would show up in every object that now refers to it. Unfortunately, it's not
straightforward to figure out which kind a string is. The approach I took is
to check whether it contains any characters with code point 0: a newly
allocated String is filled with them, so a buffer with unused capacity usually
still has some. Another option would be to check whether any WriteStreams are
referring to it.
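The WriteStream-based check could look roughly like the sketch below. It is not part of the original script. It assumes that a stream keeps its buffer in the instance variable named 'collection' (as PositionableStream does in Pharo). It only catches buffers that are still attached to a live stream.

```smalltalk
"Collect the backing buffers of all live WriteStreams, including
 subclasses such as ReadWriteStream. Assumption: the buffer is held in
 the inherited instance variable named 'collection'."
| buffers |
buffers := IdentitySet new.
WriteStream allSubInstancesDo: [ :ws |
	(ws instVarNamed: 'collection') ifNotNil: [ :c | buffers add: c ] ].
"A string is then a candidate only if it is not a Symbol and no stream
 is currently writing into it, e.g. in the first pass of the script:
 (s isSymbol or: [ buffers includes: s ]) ifFalse: [ b add: s ]"
```

Because the set is identity-based, it excludes exactly the buffer objects themselves while leaving equal strings elsewhere in the image eligible.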

Also, since String and Symbol behave differently in ways other than #=, we
must exclude Symbols. For example, the heap may contain both #'hello' and
'hello'; they compare equal with #=, but we must not make anybody who refers
to 'hello' suddenly refer to #'hello'.
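A quick workspace check of a few of those differences follows. It is written against Pharo. The exact error raised when modifying a Symbol may vary between versions, so the sketch catches any Error.

```smalltalk
| str sym |
str := 'hello' copy.    "a private, mutable copy of the literal"
sym := #hello.          "the unique, interned Symbol"
"Strings can be modified in place; Symbols refuse it."
str at: 1 put: $j.
[ sym at: 1 put: $j ] on: Error do: [ :e |
	Transcript show: 'Symbol could not be modified'; cr ].
"Code that dispatches on the kind of string would also behave differently."
Transcript show: str isSymbol printString; cr.     "false"
Transcript show: sym isSymbol printString; cr.     "true"
"Printing differs too: a Symbol prints with a leading #."
Transcript show: sym printString; cr.              "#hello"
```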

Anyway, here's the code. It saves about 2 MB in a fairly stock Pharo 3
image. Sorry for the bad variable names.

| b d m |
"b: Bag of candidate strings, d: values to skip (with their counts),
 m: Dictionary from each duplicated value to all of its instances"
b := Bag new.
d := OrderedCollection new.
m := Dictionary new.
"count all non-Symbol string instances"
String allSubInstancesDo: [ :s |
	s isSymbol ifFalse: [
		b add: s ] ].
"find the ones that have no duplicates or are likely buffers"
b doWithOccurrences: [ :s :i |
	(i = 1 or: [ s anySatisfy: [ :c | c codePoint = 0 ] ]) ifTrue: [
		d add: s -> i ] ].
"remove the ones that have no duplicates or are likely buffers"
d do: [ :a |
	a value timesRepeat: [
		b remove: a key ] ].
"map each duplicated value to all of its instances"
String allSubInstancesDo: [ :s |
	s isSymbol ifFalse: [
		(b includes: s) ifTrue: [
			| l |
			l := m at: s ifAbsentPut: [ OrderedCollection new ].
			l add: s ] ] ].
"remove the duplicates: forward every other instance to the first one"
m keysAndValuesDo: [ :k :v |
	| f |
	f := v at: 1.
	2 to: v size do: [ :i |
		(v at: i) becomeForward: f ] ]

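You can get a rough idea of how much a particular image stands to gain before running the script. The estimate below applies the same filters, excluding Symbols and strings that contain NUL characters. It is an illustrative sketch, not how the 2 MB figure above was measured. It sums the characters held by each redundant copy. For ByteStrings that is a lower bound on the bytes freed, because object headers and padding are ignored.

```smalltalk
| seen dupCount dupChars |
seen := Dictionary new.   "first instance seen for each distinct string value"
dupCount := 0.            "number of redundant copies"
dupChars := 0.            "characters held by the redundant copies"
String allSubInstancesDo: [ :s |
	(s isSymbol or: [ s anySatisfy: [ :c | c codePoint = 0 ] ]) ifFalse: [
		(seen at: s ifAbsentPut: [ s ]) == s ifFalse: [
			dupCount := dupCount + 1.
			dupChars := dupChars + s size ] ] ].
Transcript
	show: dupCount printString , ' redundant strings, at least ' , dupChars printString , ' bytes';
	cr.
```

Running it again after the script should report far fewer redundant copies. The only ones left should be those created in the meantime.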
Cheers
Philippe