On 10 Dec 2013, at 17:48, [email protected] wrote:

> Looks useful indeed. Thx.
Thanks.

> I like the useSemaphore thing.

If you need it, it is necessary. Otherwise it slows things down.

> Question: what does Neo stand for as prefix? I wonder. Makes me think of The
> Matrix and Red Pills.

Ah, it’s just the name of a namespace, I usually think of it as ‘New’, but yes
there is a Matrix link also ;-)

> Phil
>
>
> On Tue, Dec 10, 2013 at 4:54 PM, Sven Van Caekenberghe <[email protected]> wrote:
>
> Hi,
>
> Caching is an important technique to trade time (speed of execution) for
> space (memory consumption) in computation. Used correctly, caching can
> significantly improve an application’s performance.
>
> Given the key/value nature of a cache, a Dictionary is often the first data
> structure used for implementing it. This is however not a good idea, since a
> simple Dictionary is unbounded by definition and can/will lead to explosive
> cache growth.
>
> Quick and dirty hacks to turn a Dictionary into a least-recently-used (LRU)
> and/or time-to-live (TTL) based limited cache are often incorrect and
> inefficient.
>
> The LRUCache implementation currently in Pharo is too simple and not
> efficient, being O(n) on its main operation.
>
> Pharo deserves and needs a good LRU/TTL cache implementation that can/should
> be used uniformly in various places of the system and in code built on top of
> it. That is why I implemented a proposal.
>
> Neo-Caching is an implementation of both a good LRU cache as well as a TTL
> extension of it. The caches are easy to use, yet have a number of interesting
> features. The implementation is properly O(1) on its main operations and is 2
> to 3 times faster than LRUCache. The package also contains a double linked
> list implementation. The code comes with a full set of unit tests. There are
> class and method comments and the code is easy to read.
>
> The code (which has no dependencies) can be found here:
>
> http://mc.stfx.eu/Neo
> http://www.smalltalkhub.com/mc/SvenVanCaekenberghe/Neo/main
>
> There is only one package to load. The code has been written in Pharo 3.0 but
> it should run on older versions as well.
>
>
> Code
>
>
> Let’s create a 16K element collection of keys that have some repetitions in
> them with a reasonable distribution:
>
> data := Array streamContents: [ :out |
>     1 to: 4096 do: [ :each |
>         each primeFactorsOn: out.
>         out nextPut: each ] ].
> data := data collect: [ :each | each asWords ].
>
> By using #asWords the keys have better hash properties. Now, we put all this
> in a cache:
>
> cache := NeoLRUCache new.
> cache maximumWeight: 512.
> cache factory: [ :key | key ].
> data do: [ :each | cache at: each ].
> cache.
>
> ==> a NeoLRUCache(#512 512/512 [ 1 ] [ :key | key ] 72%)
>
> We had a 72% hit ratio, which is good. The cache is of course full, 512/512,
> with 512 entries (not necessarily the same thing, see further).
>
> We can now benchmark this:
>
> [ data do: [ :each | cache at: each ] ] bench.
>
> ==> '35.8 per second.'
>
> And compare it to LRUCache:
>
> cache := LRUCache size: 512 factory: [ :key | key ].
>
> [ data do: [ :each | cache at: each ] ] bench.
>
> ==> '12.6 per second.'
>
>
> Features
>
>
> Instead of just counting the number of entries, which is the default
> behaviour, the concept of weight is used to determine when a cache is full.
> For each cached value, a block or selector is used to compute its weight.
> When adding a new entry causes the total weight to exceed a maximum, eviction
> of the least recently used item(s) takes place until the weight is again
> below the maximum.
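>
> To make the mechanics concrete, here is a minimal workspace sketch of the
> idea, not the Neo-Caching code itself: a plain Dictionary handles lookups
> and an OrderedCollection tracks recency (standing in for a double linked
> list, which is what keeps promotion and eviction O(1) in a real
> implementation); the variable and block names below are purely illustrative.
>
> "Toy sketch only, not the Neo-Caching implementation."
> | map order totalWeight maximumWeight factory weigh lookup |
> map := Dictionary new.                  "key -> cached value"
> order := OrderedCollection new.         "least recently used key first"
> totalWeight := 0.
> maximumWeight := 3.
> factory := [ :key | key * key ].        "computes a missing value"
> weigh := [ :v | 1 ].                    "default: each entry weighs 1"
>
> lookup := [ :key |
>     (map includesKey: key)
>         ifTrue: [
>             "hit: promote the key to most recently used"
>             order remove: key; addLast: key.
>             map at: key ]
>         ifFalse: [ | value |
>             "miss: compute, store, then evict until light enough"
>             value := factory value: key.
>             map at: key put: value.
>             order addLast: key.
>             totalWeight := totalWeight + (weigh value: value).
>             [ totalWeight > maximumWeight ] whileTrue: [
>                 totalWeight := totalWeight
>                     - (weigh value: (map removeKey: order removeFirst)) ].
>             value ] ].
>
> #(1 2 3 2 4 5 2) collect: [ :each | lookup value: each ].
>
> Every hit promotes the key to the most recently used end; every miss adds
> weight and, if the total exceeds the maximum, entries are dropped from the
> least recently used end until it fits again.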
>
> cache
>     computeWeight: #sizeInMemory;
>     maximumWeight: 16*1024.
>
> Will keep the cache below 16KB in real memory usage. This can be very useful
> when caching variously sized images or MC packages, for example.
>
> By default, no concurrent access protection takes place, but optionally a
> semaphore for mutual exclusion can be used. This slows down access.
>
> cache useSemaphore.
>
> NeoTTLCache extends NeoLRUCache by maintaining a timestamp for each cached
> value. Upon a cache hit, there is a check to see whether this timestamp is
> older than the allowed time to live. If it is, the value is stale and will be
> recomputed.
>
> (cache := NeoTTLCache new)
>     timeToLive: 10 minutes;
>     factory: [ :key | ZnEasy get: key ];
>     maximumWeight: 32*1024;
>     computeWeight: #contentLength.
>
> cache at: 'http://zn.stfx.eu/zn/numbers.txt'.
>
> ==> a ZnResponse(200 OK text/plain;charset=utf-8 71B)
>
> cache
>
> ==> a NeoTTLCache(#1 71/32768 #contentLength [ :key | ZnEasy get: key ] 0%
>     0:00:10:00)
>
> Would be a simple HTTP cache, keeping resolved URLs in memory for 10 minutes
> maximum before refreshing them. It uses #contentLength on ZnResponse to
> compute the weight. You can see from the print string that there is now 1
> entry, while the weight is 71 bytes out of 32768.
>
>
> These are the main points; please have a look at the code. Feedback is
> welcome.
>
> Regards,
>
> Sven
