On Tue, 28 Aug 2018 at 12:43, Alistair Grant <[email protected]> wrote:
>
> Hi Sven and Everyone,
>
> Instantiation of character encoders is relatively expensive, as can be
> seen by the following code snippet:
>
> | count str ba enc new old |
>
> count := 10000.
> str := 'test-äöü'.
> new := [
>     count timesRepeat: [
>         (ZnCharacterEncoder newForEncoding: 'utf8') encodeString: str
>     ] ] timeToRunWithoutGC.
> enc := ZnCharacterEncoder newForEncoding: 'utf8'.
> old := [
>     count timesRepeat: [
>         enc encodeString: str ] ] timeToRunWithoutGC.
> { old. new. }
>
> " #(14 122)"
>
> What do you think of the idea of caching default instances of encoders?
>
> My idea was to have a dictionary in ZnCharacterEncoder and add a class
> method #forEncoding:. This would lazily create an instance, store it
> in the dictionary, and then re-use the instance whenever that encoder
> is requested. If people really want their own instance (not sure why,
> there aren't any instance variables, and so there's no state), they
> can use the existing #newForEncoding:.
>
> For example, sending #exists to a file reference currently
> instantiates two encoders.
>
> This was actually previously fixed by #20273[1], but seems to have
> been lost with the refactoring of FilePluginPrims to File. From
> memory, Cyril's application saved 12 seconds by caching the encoder.
> But in that case the instance was private to the file plugin
> primitives. This would be a more general solution.
>
> [1]
> https://pharo.fogbugz.com/f/cases/20273/FilePluginPrims-could-save-an-encoder-to-speedup-some-methods

Just so people don't focus on the wrong thing: I realise that the
ratio (almost 10:1 above) drops quickly as the string being converted
grows in length. But if we take typical file names, the ratio, while
smaller, is still significant, as suggested by Cyril's anecdote.

Thanks,
Alistair
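
To make the proposal above concrete, here is a minimal sketch of the lazily populated cache. It is only an illustration, not existing Zinc code: the class-side variable name defaultEncoders is hypothetical, and the "Class class >> selector" header is just the usual way of showing a method in an email, not something to evaluate in a playground.

```smalltalk
"Assumption: ZnCharacterEncoder has a class-side instance variable
 named defaultEncoders (hypothetical name, not part of Zinc today)."

ZnCharacterEncoder class >> forEncoding: encodingName
	"Answer a shared encoder for encodingName. The instance is created
	 on first request via the existing #newForEncoding: and re-used
	 on every later request."

	^ (defaultEncoders ifNil: [ defaultEncoders := Dictionary new ])
		at: encodingName
		ifAbsentPut: [ self newForEncoding: encodingName ]
```

With this sketch, two sends of ZnCharacterEncoder forEncoding: 'utf8' would answer the identical object, while #newForEncoding: keeps answering a fresh instance. Two things are left open here: the cache is keyed on the string exactly as passed in, so different spellings of the same encoding would get separate entries, and nothing protects the dictionary against concurrent access from several processes.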
