This is awesome! Thanks Sven :)

Nico
Sven Van Caekenberghe writes:

> On 10 Dec 2013, at 23:09, Stéphane Ducasse <[email protected]> wrote:
>
>> Sven thanks a lot
>> How can we resist pushing this into Pharo 3.0 :)
>
> That is the idea, and I will do it, but I would first like some
> feedback/validation.
>
>> Stef
>>
>> On Dec 10, 2013, at 4:54 PM, Sven Van Caekenberghe <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> Caching is an important technique to trade time (speed of execution) for
>>> space (memory consumption) in computation. Used correctly, caching can
>>> significantly improve an application's performance.
>>>
>>> Given the key/value nature of a cache, a Dictionary is often the first data
>>> structure used for implementing it. This is, however, not a good idea: a
>>> simple Dictionary is unbounded by definition and will lead to explosive
>>> cache growth.
>>>
>>> Quick and dirty hacks to turn a Dictionary into a least-recently-used (LRU)
>>> and/or time-to-live (TTL) based limited cache are often incorrect and
>>> inefficient.
>>>
>>> The LRUCache implementation currently in Pharo is too simple and not
>>> efficient, being O(n) on its main operation.
>>>
>>> Pharo deserves and needs a good LRU/TTL cache implementation that
>>> can/should be used uniformly in various places of the system and in code
>>> built on top of it. That is why I implemented a proposal.
>>>
>>> Neo-Caching is an implementation of both a good LRU cache and a TTL
>>> extension of it. The caches are easy to use, yet have a number of
>>> interesting features. The implementation is properly O(1) on its main
>>> operations and is 2 to 3 times faster than LRUCache. The package also
>>> contains a doubly linked list implementation. The code comes with a full
>>> set of unit tests. There are class and method comments and the code is easy
>>> to read.
>>>
>>> The code (which has no dependencies) can be found here:
>>>
>>> http://mc.stfx.eu/Neo
>>> http://www.smalltalkhub.com/mc/SvenVanCaekenberghe/Neo/main
>>>
>>> There is only one package to load. The code has been written in Pharo 3.0
>>> but it should run on older versions as well.
>>>
>>>
>>> Code
>>>
>>>
>>> Let's create a 16K ordered collection of keys that have some repetitions in
>>> them with a reasonable distribution:
>>>
>>> data := Array streamContents: [ :out |
>>>   1 to: 4096 do: [ :each |
>>>     each primeFactorsOn: out.
>>>     out nextPut: each ] ].
>>> data := data collect: [ :each | each asWords ].
>>>
>>> By using #asWords the keys have better hash properties. Now, we put all
>>> this in a cache:
>>>
>>> cache := NeoLRUCache new.
>>> cache maximumWeight: 512.
>>> cache factory: [ :key | key ].
>>> data do: [ :each | cache at: each ].
>>> cache.
>>>
>>> ==> a NeoLRUCache(#512 512/512 [ 1 ] [ :key | key ] 72%)
>>>
>>> We had a 72% hit ratio, which is good. The cache is of course full, 512/512,
>>> with 512 entries (weight and entry count are not necessarily the same
>>> thing, see further).
>>>
>>> We can now benchmark this:
>>>
>>> [ data do: [ :each | cache at: each ] ] bench.
>>>
>>> ==> '35.8 per second.'
>>>
>>> And compare it to LRUCache:
>>>
>>> cache := LRUCache size: 512 factory: [ :key | key ].
>>>
>>> [ data do: [ :each | cache at: each ] ] bench.
>>>
>>> ==> '12.6 per second.'
>>>
>>>
>>> Features
>>>
>>>
>>> Instead of just counting the number of entries, which is the default
>>> behaviour, the concept of weight is used to determine when a cache is full.
>>> For each cached value, a block or selector is used to compute its weight.
>>> When adding a new entry causes the total weight to exceed a maximum,
>>> eviction of the least recently used item(s) takes place until the weight is
>>> again below the maximum.
>>>
>>> cache
>>>   computeWeight: #sizeInMemory;
>>>   maximumWeight: 16*1024.
>>>
>>> will keep the cache below 16 KB in real memory usage.
>>> This can be very useful when caching variously sized images or MC
>>> packages, for example.
>>>
>>> By default, no concurrent access protection takes place, but optionally a
>>> semaphore for mutual exclusion can be used. This slows down access.
>>>
>>> cache useSemaphore.
>>>
>>> NeoTTLCache extends NeoLRUCache by maintaining a timestamp for each cached
>>> value. Upon a cache hit, there is a check to see if this timestamp is
>>> older than the allowed time to live. If so, the value is stale and will be
>>> recomputed.
>>>
>>> (cache := NeoTTLCache new)
>>>   timeToLive: 10 minutes;
>>>   factory: [ :key | ZnEasy get: key ];
>>>   maximumWeight: 32*1024;
>>>   computeWeight: #contentLength.
>>>
>>> cache at: 'http://zn.stfx.eu/zn/numbers.txt'.
>>>
>>> ==> a ZnResponse(200 OK text/plain;charset=utf-8 71B)
>>>
>>> cache
>>>
>>> ==> a NeoTTLCache(#1 71/32768 #contentLength [ :key | ZnEasy get: key ] 0%
>>> 0:00:10:00)
>>>
>>> This would be a simple HTTP cache, keeping resolved URLs in memory for 10
>>> minutes maximum before refreshing them. It uses #contentLength on
>>> ZnResponse to compute the weight. You can see from the print string that
>>> there is now 1 entry, while the weight is 71 bytes out of 32768.
>>>
>>>
>>> These are the main points, please have a look at the code. Feedback is
>>> welcome.
>>>
>>> Regards,
>>>
>>> Sven

--
Nicolas Petton
http://nicolas-petton.fr
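[Editorial note: the announcement says the implementation is O(1) on its main
operations and ships a doubly linked list. The classic way to get O(1) LRU
behaviour combines a Dictionary with such a list; the Smalltalk sketch below
illustrates only that general technique, with hypothetical class and message
names, and is not the actual Neo-Caching code.]

"Generic O(1) LRU lookup sketch. 'map' is a Dictionary mapping keys to
nodes of a doubly linked list kept in most-recently-used-first order.
All names (list, factory, maximumSize, moveToFront:, addFirst:key:)
are hypothetical, for illustration only."
at: aKey
	| node |
	node := map at: aKey ifAbsent: [ nil ].
	node isNil
		ifTrue: [
			"Miss: compute the value and add a node at the head of the list."
			node := list addFirst: (factory value: aKey) key: aKey.
			map at: aKey put: node.
			"Evict the least recently used entry when over the limit."
			map size > maximumSize
				ifTrue: [ map removeKey: list removeLast key ] ]
		ifFalse: [
			"Hit: unlinking and relinking the node is O(1)."
			list moveToFront: node ].
	^ node value

Both the hit path (relink one node) and the eviction path (drop the tail node)
touch a fixed number of nodes, which is what avoids the O(n) scan of the older
LRUCache mentioned above.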
