[Pharo-dev] [ANN] Neo-Caching

Sven Van Caekenberghe Tue, 10 Dec 2013 07:55:10 -0800

Hi,

Caching is an important technique to trade time (speed of execution) for space 
(memory consumption) in computation. Used correctly, caching can significantly 
improve an application’s performance.


Given the key/value nature of a cache, a Dictionary is often the first data 
structure used for implementing it. This is however not a good idea since a 
simple Dictionary is unbounded by definition and can/will lead to explosive 
cache growth.

Quick and dirty hacks to turn a Dictionary into a least-recently-used (LRU) 
and/or time-to-live (TTL) based limited cache are often incorrect and 
inefficient.

The LRUCache implementation currently in Pharo is too simple and not efficient, 
being O(n) on its main operation.

Pharo deserves and needs a good LRU/TTL cache implementation that can/should be 
used uniformly in various places of the system and in code built on top of it. 
That is why I implemented a proposal.

Neo-Caching is an implementation of both a good LRU cache as well as a TTL 
extension of it. The caches are easy to use, yet have a number of interesting 
features. The implementation is properly O(1) on its main operations and is 2 
to 3 times faster than LRUCache. The package also contains a double linked list 
implementation. The code comes with a full set of unit tests. There are class 
and method comments and the code is easy to read.

The code (which has no dependencies) can be found here

  http://mc.stfx.eu/Neo
  http://www.smalltalkhub.com/mc/SvenVanCaekenberghe/Neo/main

There is only one package to load. The code has been written in Pharo 3.0 but 
it should run on older versions as well.


Code


Let’s create a 16K ordered collection of keys that have some repetitions in 
them with a reasonable distribution:

data := Array streamContents: [ :out |
  1 to: 4096 do: [ :each |
    each primeFactorsOn: out.
    out nextPut: each ] ].
data := data collect: [ :each | each asWords ].

By using #asWords the keys have better hash properties. Now, we put all this in 
a cache:

cache := NeoLRUCache new.
cache maximumWeight: 512.
cache factory: [ :key | key ].
data do: [ :each | cache at: each ].
cache.

==> a NeoLRUCache(#512 512/512 [ 1 ] [ :key | key ] 72%)

We had a 72% hit ratio, which is good. The cache is of course full, 512/512 
with 512 entries (not necessarily the same thing, see further).

We can now benchmark this:

[ data do: [ :each | cache at: each ] ] bench.

==> '35.8 per second.’

And compare it to LRUCache:

cache := LRUCache size: 512 factory: [ :key | key ].

[ data do: [ :each | cache at: each ] ] bench.

==> '12.6 per second.’


Features


Instead of just counting the number of entries, which is the default behaviour, 
the concept of weight is used to determine when a cache is full. For each 
cached value, a block or selector is used to compute its weight. When adding a 
new entry causes the weight to exceed a maximum, eviction of the least recently 
used item(s) takes place, until the weight is again below the maximum.

cache 
  computeWeight: #sizeInMemory;
  maximumWeight: 16*1024.

Will keep the cache below 16Kb in real memory usage. This can be very useful 
when caching various sized images or MC packages, for example. 

By default, no concurrent access protection takes place, but optionally a 
semaphore for mutual exclusion can be used. This slows down access.

cache useSemaphore.

NeoTTLCache extends NeoLRUCache by maintaining a timestamp for each cached 
value. Upon cache hit, there is a check to see if this timestamp is not older 
than the allowed time to live. If so, the value is stale and will be recomputed.

(cache := NeoTTLCache new)
  timeToLive: 10 minutes;
  factory: [ :key | ZnEasy get: key ];
  maximumWeight: 32*1024;
  computeWeight: #contentLength.

cache at: ‘http://zn.stfx.eu/zn/numbers.txt'.

==> a ZnResponse(200 OK text/plain;charset=utf-8 71B)

cache

==> a NeoTTLCache(#1 71/32768 #contentLength [ :key | ZnEasy get: key ] 0% 
0:00:10:00)

Would be a simple HTTP cache, keeping resolved URLs in memory for 10 minutes 
maximum before refreshing them. It uses #contentLength on ZnResponse to compute 
the weight. You can see from the print string that there is now 1 entry, while 
the weight is 17 bytes out of 32768.


These are the main points, please have a look at the code. Feedback is welcome.

Regards,

Sven

[Pharo-dev] [ANN] Neo-Caching

Reply via email to