On 10 Dec 2013, at 23:09, Stéphane Ducasse <[email protected]> wrote:
> Sven thanks a lot
> How can we resist pushing this into Pharo 3.0 :)

That is the idea, and I will do it, but I would first like some feedback/validation.

> Stef
>
> On Dec 10, 2013, at 4:54 PM, Sven Van Caekenberghe <[email protected]> wrote:
>
>> Hi,
>>
>> Caching is an important technique for trading space (memory consumption)
>> for time (speed of execution) in computation. Used correctly, caching can
>> significantly improve an application's performance.
>>
>> Given the key/value nature of a cache, a Dictionary is often the first data
>> structure used to implement one. This is, however, not a good idea: a plain
>> Dictionary is unbounded by definition and will lead to explosive cache
>> growth.
>>
>> Quick and dirty hacks to turn a Dictionary into a least-recently-used (LRU)
>> and/or time-to-live (TTL) bounded cache are often incorrect and
>> inefficient.
>>
>> The LRUCache implementation currently in Pharo is too simple and
>> inefficient, being O(n) on its main operation.
>>
>> Pharo deserves and needs a good LRU/TTL cache implementation that can and
>> should be used uniformly in various places in the system and in code built
>> on top of it. That is why I implemented a proposal.
>>
>> Neo-Caching is an implementation of both a good LRU cache and a TTL
>> extension of it. The caches are easy to use, yet have a number of
>> interesting features. The implementation is properly O(1) on its main
>> operations and is 2 to 3 times faster than LRUCache. The package also
>> contains a doubly linked list implementation. The code comes with a full
>> set of unit tests. There are class and method comments, and the code is
>> easy to read.
>>
>> The code (which has no dependencies) can be found here:
>>
>> http://mc.stfx.eu/Neo
>> http://www.smalltalkhub.com/mc/SvenVanCaekenberghe/Neo/main
>>
>> There is only one package to load. The code was written in Pharo 3.0, but
>> it should run on older versions as well.
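The O(1) behaviour of an LRU cache typically comes from pairing a dictionary with a doubly linked list: the dictionary maps keys to list nodes, a hit unlinks the node and moves it to the head, and eviction pops the tail. A minimal sketch of that access path, using illustrative method and variable names (not the actual Neo-Caching API):

```smalltalk
"Hypothetical sketch. keyIndex is a Dictionary mapping keys to doubly
 linked list nodes; list keeps nodes in most-recently-used-first order.
 Each operation below is O(1)."

at: key
	| node |
	node := keyIndex at: key ifAbsent: [ ^ self addNewEntryFor: key ].
	list moveToFront: node.	"unlink + relink at the head"
	^ node value

addNewEntryFor: key
	| node |
	node := list addFirst: (factory value: key).
	keyIndex at: key put: node.
	[ self totalWeight > maximumWeight ] whileTrue: [
		keyIndex removeKey: list removeLast key ].	"evict the LRU tail"
	^ node value
```

Because both the dictionary lookup and the list relinking are constant time, cache hits never require scanning entries, which is where a naive O(n) implementation loses.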
>>
>>
>> Code
>>
>>
>> Let's create a 16K ordered collection of keys that have some repetitions
>> in them with a reasonable distribution:
>>
>> data := Array streamContents: [ :out |
>>   1 to: 4096 do: [ :each |
>>     each primeFactorsOn: out.
>>     out nextPut: each ] ].
>> data := data collect: [ :each | each asWords ].
>>
>> By using #asWords the keys have better hash properties. Now, we put all
>> this in a cache:
>>
>> cache := NeoLRUCache new.
>> cache maximumWeight: 512.
>> cache factory: [ :key | key ].
>> data do: [ :each | cache at: each ].
>> cache.
>>
>> ==> a NeoLRUCache(#512 512/512 [ 1 ] [ :key | key ] 72%)
>>
>> We had a 72% hit ratio, which is good. The cache is of course full,
>> 512/512, with 512 entries (not necessarily the same thing, see further).
>>
>> We can now benchmark this:
>>
>> [ data do: [ :each | cache at: each ] ] bench.
>>
>> ==> '35.8 per second.'
>>
>> And compare it to LRUCache:
>>
>> cache := LRUCache size: 512 factory: [ :key | key ].
>>
>> [ data do: [ :each | cache at: each ] ] bench.
>>
>> ==> '12.6 per second.'
>>
>>
>> Features
>>
>>
>> Instead of just counting the number of entries, which is the default
>> behaviour, the concept of weight is used to determine when a cache is
>> full. For each cached value, a block or selector is used to compute its
>> weight. When adding a new entry causes the total weight to exceed a
>> maximum, the least recently used item(s) are evicted until the weight is
>> again below the maximum.
>>
>> cache
>>   computeWeight: #sizeInMemory;
>>   maximumWeight: 16*1024.
>>
>> This will keep the cache below 16 KB in real memory usage, which can be
>> very useful when caching variously sized images or MC packages, for
>> example.
>>
>> By default, no concurrent access protection takes place, but optionally a
>> semaphore can be used for mutual exclusion. This slows down access.
>>
>> cache useSemaphore.
>>
>> NeoTTLCache extends NeoLRUCache by maintaining a timestamp for each cached
>> value.
>> Upon a cache hit, there is a check whether this timestamp is older than
>> the allowed time to live. If it is, the value is stale and will be
>> recomputed.
>>
>> (cache := NeoTTLCache new)
>>   timeToLive: 10 minutes;
>>   factory: [ :key | ZnEasy get: key ];
>>   maximumWeight: 32*1024;
>>   computeWeight: #contentLength.
>>
>> cache at: 'http://zn.stfx.eu/zn/numbers.txt'.
>>
>> ==> a ZnResponse(200 OK text/plain;charset=utf-8 71B)
>>
>> cache
>>
>> ==> a NeoTTLCache(#1 71/32768 #contentLength [ :key | ZnEasy get: key ] 0%
>> 0:00:10:00)
>>
>> This would be a simple HTTP cache, keeping resolved URLs in memory for at
>> most 10 minutes before refreshing them. It uses #contentLength on
>> ZnResponse to compute the weight. You can see from the print string that
>> there is now 1 entry, with a weight of 71 bytes out of 32768.
>>
>>
>> These are the main points; please have a look at the code. Feedback is
>> welcome.
>>
>> Regards,
>>
>> Sven
