I'm curious to see which methods really use mutable strings. Of course they do while a string is being constructed and written to through a stream, but then it is just temporary storage and we don't have to care. We're speaking of a place where a String would be modified to retain some state... Of course, since we can't rule out this possibility in theory, the proposed change is unsafe. But in practice...
2014-05-30 13:46 GMT+02:00 Marcus Denker <[email protected]>:

>
> On 30 May 2014, at 10:59, Clément Bera <[email protected]> wrote:
>
> Hello,
>
> I like the idea, but it is not that simple.
>
> In some frameworks you may use different strings with the same contents as
> markers that are equal but not identical.
>
> Typically:
>
> Object>>#string1
>     ^ 'string'
>
> Object>>#string2
>     ^ 'string'
>
> Object>>#test
>     self assert: self string1 == self string1. "Answers true"
>     self assert: self string2 == self string2. "Answers true"
>     self assert: self string1 == self string2 "Answers false"
>
> Frameworks relying on that will not work any more.
>
> And this kind of bug is not easy to spot: it typically crashes identity
> collections in a non-deterministic fashion.
>
>
> With an indirection (a kind of reference) that
>
> -> points to the string
> -> forwards everything, but does a copy on write on state change
> -> implements == to return false
>
> it would work. Of course you then have the same number of objects (+1), but
> they would all be very small, leading to savings for large objects and
> especially when applied to subgraphs.
>
> Marcus
>
>
> Regards
>
>
> 2014-05-30 9:39 GMT+02:00 Philippe Marschall <
> [email protected]>:
>
>> Hi
>>
>> This is an idea I stole from somebody else. The assumption is that you
>> have a lot of Strings in the image that are equal. We could therefore
>> remove the duplicates and make all the objects refer to the same instance.
>>
>> However, it's not as simple as that. The main issue is that String has two
>> responsibilities. The first is as an immutable value object. The second is
>> as a mutable character buffer for building immutable value objects. We must
>> not deduplicate the second kind. Unfortunately it's not straightforward to
>> figure out which kind a string is. The approach I took is looking at
>> whether it contains any 0 characters. Another option would be to check
>> whether any WriteStreams are referring to it.
>>
>> Also, since there are behavioral differences between String and Symbol
>> besides #=, we must exclude Symbols (e.g. there is #'hello' and 'hello' in
>> the heap and they compare #= true, but we must not make anybody who refers
>> to 'hello' suddenly refer to #'hello').
>>
>> Anyway, here's the code. It saves about 2 MB in a fairly stock Pharo 3
>> image. Sorry for the bad variable names.
>>
>> | b d m |
>> b := Bag new.
>> d := OrderedCollection new.
>> m := Dictionary new.
>> "count all string instances"
>> String allSubInstancesDo: [ :s |
>>     s isSymbol ifFalse: [
>>         b add: s ] ].
>> "find the ones that have no duplicates or are likely buffers"
>> b doWithOccurrences: [ :s :i |
>>     (i = 1 or: [ s anySatisfy: [ :c | c codePoint = 0 ] ]) ifTrue: [
>>         d add: s -> i ] ].
>> "remove the ones that have no duplicates or are likely buffers"
>> d do: [ :a |
>>     a value timesRepeat: [
>>         b remove: a key ] ].
>> "group the remaining strings by equal contents"
>> String allSubInstancesDo: [ :s |
>>     s isSymbol ifFalse: [
>>         (b includes: s) ifTrue: [
>>             | l |
>>             l := m at: s ifAbsentPut: [ OrderedCollection new ].
>>             l add: s ] ] ].
>> "remove the duplicates: forward every copy to the first one found"
>> m keysAndValuesDo: [ :k :v |
>>     | f |
>>     f := v at: 1.
>>     2 to: v size do: [ :i |
>>         (v at: i) becomeForward: f ] ]
>>
>> Cheers
>> Philippe
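Philippe's "contains a 0 character" heuristic works because a String used as a stream buffer is usually preallocated and only partly filled. The Pharo workspace snippet below is a minimal sketch of that effect. It assumes that `String new: 16` answers a string filled with `Character value: 0` and that `originalContents` answers the stream's underlying collection rather than a copy. The buffer size of 16 and the variable names are arbitrary choices for illustration.

```smalltalk
| stream buffer |
"Assumption: String new: 16 is filled with Character value: 0,
 and originalContents answers the stream's own collection (no copy)."
stream := WriteStream on: (String new: 16).
stream nextPutAll: 'hello'.
buffer := stream originalContents.
buffer size. "16: the buffer keeps its preallocated size"
buffer anySatisfy: [ :c | c asInteger = 0 ]. "true: the unused tail looks like a buffer"
stream contents anySatisfy: [ :c | c asInteger = 0 ]. "false: the finished value has no 0 characters"
```

The same snippet also shows a blind spot of the heuristic. A buffer written to its full capacity contains no 0 characters, so it would be treated as a value and deduplicated. That is one reason to also consider the other option of checking for referring WriteStreams.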
