This is awesome! Thanks Sven :)

Nico
Sven Van Caekenberghe writes:

> On 10 Dec 2013, at 23:09, Stéphane Ducasse <[email protected]> wrote:
>
>> Sven thanks a lot
>> How can we resist pushing this into Pharo 3.0 :)
>
> That is the idea, and I will do it, but I would first like some
> feedback/validation.
>
>> Stef
>>
>> On Dec 10, 2013, at 4:54 PM, Sven Van Caekenberghe <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> Caching is an important technique to trade time (speed of execution) for
>>> space (memory consumption) in computation. Used correctly, caching can
>>> significantly improve an application's performance.
>>>
>>> Given the key/value nature of a cache, a Dictionary is often the first data
>>> structure used for implementing it. This is, however, not a good idea: a
>>> simple Dictionary is unbounded by definition and will lead to explosive
>>> cache growth.
>>>
>>> Quick and dirty hacks to turn a Dictionary into a least-recently-used (LRU)
>>> and/or time-to-live (TTL) based limited cache are often incorrect and
>>> inefficient.
>>>
>>> The LRUCache implementation currently in Pharo is too simple and not
>>> efficient, being O(n) on its main operation.
>>>
>>> Pharo deserves and needs a good LRU/TTL cache implementation that
>>> can/should be used uniformly in various places of the system and in code
>>> built on top of it. That is why I implemented a proposal.
>>>
>>> Neo-Caching is an implementation of both a good LRU cache and a TTL
>>> extension of it. The caches are easy to use, yet have a number of
>>> interesting features. The implementation is properly O(1) on its main
>>> operations and is 2 to 3 times faster than LRUCache. The package also
>>> contains a doubly linked list implementation. The code comes with a full
>>> set of unit tests. There are class and method comments and the code is easy
>>> to read.
>>>
>>> The code (which has no dependencies) can be found here:
>>>
>>> http://mc.stfx.eu/Neo
>>> http://www.smalltalkhub.com/mc/SvenVanCaekenberghe/Neo/main
>>>
>>> There is only one package to load. The code has been written in Pharo 3.0
>>> but it should run on older versions as well.
>>>
>>>
>>> Code
>>>
>>>
>>> Let's create a 16K ordered collection of keys that have some repetitions in
>>> them with a reasonable distribution:
>>>
>>> data := Array streamContents: [ :out |
>>>   1 to: 4096 do: [ :each |
>>>     each primeFactorsOn: out.
>>>     out nextPut: each ] ].
>>> data := data collect: [ :each | each asWords ].
>>>
>>> By using #asWords the keys have better hash properties. Now, we put all
>>> this in a cache:
>>>
>>> cache := NeoLRUCache new.
>>> cache maximumWeight: 512.
>>> cache factory: [ :key | key ].
>>> data do: [ :each | cache at: each ].
>>> cache.
>>>
>>> ==> a NeoLRUCache(#512 512/512 [ 1 ] [ :key | key ] 72%)
>>>
>>> We had a 72% hit ratio, which is good. The cache is of course full, 512/512,
>>> with 512 entries (weight and entry count are not necessarily the same
>>> thing, see further).
>>>
>>> We can now benchmark this:
>>>
>>> [ data do: [ :each | cache at: each ] ] bench.
>>>
>>> ==> '35.8 per second.'
>>>
>>> And compare it to LRUCache:
>>>
>>> cache := LRUCache size: 512 factory: [ :key | key ].
>>>
>>> [ data do: [ :each | cache at: each ] ] bench.
>>>
>>> ==> '12.6 per second.'
>>>
>>>
>>> Features
>>>
>>>
>>> Instead of just counting the number of entries, which is the default
>>> behaviour, the concept of weight is used to determine when a cache is full.
>>> For each cached value, a block or selector is used to compute its weight.
>>> When adding a new entry causes the total weight to exceed a maximum,
>>> eviction of the least recently used item(s) takes place until the weight is
>>> again below the maximum.
>>>
>>> cache
>>>   computeWeight: #sizeInMemory;
>>>   maximumWeight: 16*1024.
>>>
>>> will keep the cache below 16 KB in real memory usage.
>>> This can be very useful when caching variously sized images or MC
>>> packages, for example.
>>>
>>> By default, no concurrent access protection takes place, but optionally a
>>> semaphore for mutual exclusion can be used. This slows down access.
>>>
>>> cache useSemaphore.
>>>
>>> NeoTTLCache extends NeoLRUCache by maintaining a timestamp for each cached
>>> value. Upon a cache hit, there is a check to see if this timestamp is
>>> older than the allowed time to live. If so, the value is stale and will be
>>> recomputed.
>>>
>>> (cache := NeoTTLCache new)
>>>   timeToLive: 10 minutes;
>>>   factory: [ :key | ZnEasy get: key ];
>>>   maximumWeight: 32*1024;
>>>   computeWeight: #contentLength.
>>>
>>> cache at: 'http://zn.stfx.eu/zn/numbers.txt'.
>>>
>>> ==> a ZnResponse(200 OK text/plain;charset=utf-8 71B)
>>>
>>> cache
>>>
>>> ==> a NeoTTLCache(#1 71/32768 #contentLength [ :key | ZnEasy get: key ] 0%
>>> 0:00:10:00)
>>>
>>> This would be a simple HTTP cache, keeping resolved URLs in memory for 10
>>> minutes maximum before refreshing them. It uses #contentLength on
>>> ZnResponse to compute the weight. You can see from the print string that
>>> there is now 1 entry, while the weight is 71 bytes out of 32768.
>>>
>>>
>>> These are the main points, please have a look at the code. Feedback is
>>> welcome.
>>>
>>> Regards,
>>>
>>> Sven

--
Nicolas Petton
http://nicolas-petton.fr
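[Editorial note: the announcement says the implementation is O(1) on its main
operations and ships a doubly linked list. The classic way to get O(1) LRU
behaviour combines a Dictionary with such a list; the Smalltalk sketch below
illustrates only that general technique, with hypothetical class and message
names, and is not the actual Neo-Caching code.]

"Generic O(1) LRU lookup sketch. 'map' is a Dictionary mapping keys to
nodes of a doubly linked list kept in most-recently-used-first order.
All names (list, factory, maximumSize, moveToFront:, addFirst:key:)
are hypothetical, for illustration only."
at: aKey
	| node |
	node := map at: aKey ifAbsent: [ nil ].
	node isNil
		ifTrue: [
			"Miss: compute the value and add a node at the head of the list."
			node := list addFirst: (factory value: aKey) key: aKey.
			map at: aKey put: node.
			"Evict the least recently used entry when over the limit."
			map size > maximumSize
				ifTrue: [ map removeKey: list removeLast key ] ]
		ifFalse: [
			"Hit: unlinking and relinking the node is O(1)."
			list moveToFront: node ].
	^ node value

Both the hit path (relink one node) and the eviction path (drop the tail node)
touch a fixed number of nodes, which is what avoids the O(n) scan of the older
LRUCache mentioned above.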
