Great! I took a look at the code. It's simple and clean. Thanks a lot for this addition. It is highly needed.
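For anyone who wants to try this right away, something along these lines should load and exercise the package (a rough sketch only: the package name 'Neo-Caching' is my guess, while the repository URL and the cache protocol are taken from Sven's mail quoted below):

Gofer it
    url: 'http://mc.stfx.eu/Neo';
    package: 'Neo-Caching';
    load.

"Smoke test, mirroring the API from the announcement"
cache := NeoLRUCache new.
cache maximumWeight: 16.
cache factory: [ :key | key * 2 ].
cache at: 21. "value is computed by the factory block on first access, then cached"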
Cheers,
Doru

On Tue, Dec 10, 2013 at 11:58 PM, Sven Van Caekenberghe <[email protected]> wrote:

> On 10 Dec 2013, at 23:09, Stéphane Ducasse <[email protected]> wrote:
>
> > Sven thanks a lot
> > How can we resist pushing this into Pharo 3.0 :)
>
> That is the idea, and I will do it, but I would first like some
> feedback/validation.
>
> > Stef
> >
> > On Dec 10, 2013, at 4:54 PM, Sven Van Caekenberghe <[email protected]> wrote:
> >
> >> Hi,
> >>
> >> Caching is an important technique to trade time (speed of execution)
> >> for space (memory consumption) in computation. Used correctly, caching
> >> can significantly improve an application's performance.
> >>
> >> Given the key/value nature of a cache, a Dictionary is often the first
> >> data structure used to implement one. This is, however, not a good idea,
> >> since a simple Dictionary is unbounded by definition and can/will lead
> >> to explosive cache growth.
> >>
> >> Quick and dirty hacks to turn a Dictionary into a least-recently-used
> >> (LRU) and/or time-to-live (TTL) based limited cache are often incorrect
> >> and inefficient.
> >>
> >> The LRUCache implementation currently in Pharo is too simple and not
> >> efficient, being O(n) on its main operation.
> >>
> >> Pharo deserves and needs a good LRU/TTL cache implementation that
> >> can/should be used uniformly in various places of the system and in
> >> code built on top of it. That is why I implemented a proposal.
> >>
> >> Neo-Caching is an implementation of both a good LRU cache and a TTL
> >> extension of it. The caches are easy to use, yet have a number of
> >> interesting features. The implementation is properly O(1) on its main
> >> operations and is 2 to 3 times faster than LRUCache. The package also
> >> contains a doubly linked list implementation. The code comes with a
> >> full set of unit tests. There are class and method comments and the
> >> code is easy to read.
> >>
> >> The code (which has no dependencies) can be found here:
> >>
> >> http://mc.stfx.eu/Neo
> >> http://www.smalltalkhub.com/mc/SvenVanCaekenberghe/Neo/main
> >>
> >> There is only one package to load. The code has been written in Pharo
> >> 3.0, but it should run on older versions as well.
> >>
> >>
> >> Code
> >>
> >>
> >> Let's create a collection of about 16K keys that has some repetitions
> >> in it, with a reasonable distribution:
> >>
> >> data := Array streamContents: [ :out |
> >>   1 to: 4096 do: [ :each |
> >>     each primeFactorsOn: out.
> >>     out nextPut: each ] ].
> >> data := data collect: [ :each | each asWords ].
> >>
> >> By using #asWords the keys have better hash properties. Now, we put
> >> all of this in a cache:
> >>
> >> cache := NeoLRUCache new.
> >> cache maximumWeight: 512.
> >> cache factory: [ :key | key ].
> >> data do: [ :each | cache at: each ].
> >> cache.
> >>
> >> ==> a NeoLRUCache(#512 512/512 [ 1 ] [ :key | key ] 72%)
> >>
> >> We had a 72% hit ratio, which is good. The cache is of course full, at
> >> 512/512, with 512 entries (weight and entry count are not necessarily
> >> the same thing, see further).
> >>
> >> We can now benchmark this:
> >>
> >> [ data do: [ :each | cache at: each ] ] bench.
> >>
> >> ==> '35.8 per second.'
> >>
> >> And compare it to LRUCache:
> >>
> >> cache := LRUCache size: 512 factory: [ :key | key ].
> >>
> >> [ data do: [ :each | cache at: each ] ] bench.
> >>
> >> ==> '12.6 per second.'
> >>
> >>
> >> Features
> >>
> >>
> >> Instead of just counting the number of entries, which is the default
> >> behaviour, the concept of weight is used to determine when a cache is
> >> full. For each cached value, a block or selector is used to compute
> >> its weight. When adding a new entry causes the weight to exceed a
> >> maximum, eviction of the least recently used item(s) takes place,
> >> until the weight is again below the maximum.
> >>
> >> cache
> >>   computeWeight: #sizeInMemory;
> >>   maximumWeight: 16*1024.
> >>
> >> This will keep the cache below 16 KB of real memory usage. It can be
> >> very useful when caching variously sized images or MC packages, for
> >> example.
> >>
> >> By default, no concurrent access protection takes place, but
> >> optionally a semaphore for mutual exclusion can be used. This slows
> >> down access.
> >>
> >> cache useSemaphore.
> >>
> >> NeoTTLCache extends NeoLRUCache by maintaining a timestamp for each
> >> cached value. Upon a cache hit, there is a check to see whether this
> >> timestamp is older than the allowed time to live. If it is, the value
> >> is stale and will be recomputed.
> >>
> >> (cache := NeoTTLCache new)
> >>   timeToLive: 10 minutes;
> >>   factory: [ :key | ZnEasy get: key ];
> >>   maximumWeight: 32*1024;
> >>   computeWeight: #contentLength.
> >>
> >> cache at: 'http://zn.stfx.eu/zn/numbers.txt'.
> >>
> >> ==> a ZnResponse(200 OK text/plain;charset=utf-8 71B)
> >>
> >> cache
> >>
> >> ==> a NeoTTLCache(#1 71/32768 #contentLength [ :key | ZnEasy get: key ] 0% 0:00:10:00)
> >>
> >> This would be a simple HTTP cache, keeping resolved URLs in memory for
> >> at most 10 minutes before refreshing them. It uses #contentLength on
> >> ZnResponse to compute the weight. You can see from the print string
> >> that there is now 1 entry, while the weight is 71 bytes out of 32768.
> >>
> >>
> >> These are the main points, please have a look at the code. Feedback is
> >> welcome.
> >>
> >> Regards,
> >>
> >> Sven

--
www.tudorgirba.com

"Every thing has its own flow"
