Great! I took a look at the code. It's simple and clean. Thanks a lot for this addition. It is highly needed.
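For anyone who wants to try this right away, something along these lines should load and exercise the package (a rough sketch only: the package name 'Neo-Caching' is my guess, while the repository URL and the cache protocol are taken from Sven's mail quoted below):

Gofer it
    url: 'http://mc.stfx.eu/Neo';
    package: 'Neo-Caching';
    load.

"Smoke test, mirroring the API from the announcement"
cache := NeoLRUCache new.
cache maximumWeight: 16.
cache factory: [ :key | key * 2 ].
cache at: 21. "value is computed by the factory block on first access, then cached"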
Cheers,
Doru

On Tue, Dec 10, 2013 at 11:58 PM, Sven Van Caekenberghe <[email protected]> wrote:

> On 10 Dec 2013, at 23:09, Stéphane Ducasse <[email protected]> wrote:
>
> > Sven thanks a lot
> > How can we resist pushing this into Pharo 3.0 :)
>
> That is the idea, and I will do it, but I would first like some
> feedback/validation.
>
> > Stef
> >
> > On Dec 10, 2013, at 4:54 PM, Sven Van Caekenberghe <[email protected]> wrote:
> >
> >> Hi,
> >>
> >> Caching is an important technique to trade time (speed of execution)
> >> for space (memory consumption) in computation. Used correctly, caching
> >> can significantly improve an application's performance.
> >>
> >> Given the key/value nature of a cache, a Dictionary is often the first
> >> data structure used to implement one. This is, however, not a good idea,
> >> since a simple Dictionary is unbounded by definition and can/will lead
> >> to explosive cache growth.
> >>
> >> Quick and dirty hacks to turn a Dictionary into a least-recently-used
> >> (LRU) and/or time-to-live (TTL) based limited cache are often incorrect
> >> and inefficient.
> >>
> >> The LRUCache implementation currently in Pharo is too simple and not
> >> efficient, being O(n) on its main operation.
> >>
> >> Pharo deserves and needs a good LRU/TTL cache implementation that
> >> can/should be used uniformly in various places of the system and in
> >> code built on top of it. That is why I implemented a proposal.
> >>
> >> Neo-Caching is an implementation of both a good LRU cache and a TTL
> >> extension of it. The caches are easy to use, yet have a number of
> >> interesting features. The implementation is properly O(1) on its main
> >> operations and is 2 to 3 times faster than LRUCache. The package also
> >> contains a doubly linked list implementation. The code comes with a
> >> full set of unit tests. There are class and method comments and the
> >> code is easy to read.
> >>
> >> The code (which has no dependencies) can be found here:
> >>
> >> http://mc.stfx.eu/Neo
> >> http://www.smalltalkhub.com/mc/SvenVanCaekenberghe/Neo/main
> >>
> >> There is only one package to load. The code has been written in Pharo
> >> 3.0, but it should run on older versions as well.
> >>
> >>
> >> Code
> >>
> >>
> >> Let's create a collection of about 16K keys that has some repetitions
> >> in it, with a reasonable distribution:
> >>
> >> data := Array streamContents: [ :out |
> >>   1 to: 4096 do: [ :each |
> >>     each primeFactorsOn: out.
> >>     out nextPut: each ] ].
> >> data := data collect: [ :each | each asWords ].
> >>
> >> By using #asWords the keys have better hash properties. Now, we put
> >> all of this in a cache:
> >>
> >> cache := NeoLRUCache new.
> >> cache maximumWeight: 512.
> >> cache factory: [ :key | key ].
> >> data do: [ :each | cache at: each ].
> >> cache.
> >>
> >> ==> a NeoLRUCache(#512 512/512 [ 1 ] [ :key | key ] 72%)
> >>
> >> We had a 72% hit ratio, which is good. The cache is of course full, at
> >> 512/512, with 512 entries (weight and entry count are not necessarily
> >> the same thing, see further).
> >>
> >> We can now benchmark this:
> >>
> >> [ data do: [ :each | cache at: each ] ] bench.
> >>
> >> ==> '35.8 per second.'
> >>
> >> And compare it to LRUCache:
> >>
> >> cache := LRUCache size: 512 factory: [ :key | key ].
> >>
> >> [ data do: [ :each | cache at: each ] ] bench.
> >>
> >> ==> '12.6 per second.'
> >>
> >>
> >> Features
> >>
> >>
> >> Instead of just counting the number of entries, which is the default
> >> behaviour, the concept of weight is used to determine when a cache is
> >> full. For each cached value, a block or selector is used to compute
> >> its weight. When adding a new entry causes the weight to exceed a
> >> maximum, eviction of the least recently used item(s) takes place,
> >> until the weight is again below the maximum.
> >>
> >> cache
> >>   computeWeight: #sizeInMemory;
> >>   maximumWeight: 16*1024.
> >>
> >> This will keep the cache below 16 KB of real memory usage. It can be
> >> very useful when caching variously sized images or MC packages, for
> >> example.
> >>
> >> By default, no concurrent access protection takes place, but
> >> optionally a semaphore for mutual exclusion can be used. This slows
> >> down access.
> >>
> >> cache useSemaphore.
> >>
> >> NeoTTLCache extends NeoLRUCache by maintaining a timestamp for each
> >> cached value. Upon a cache hit, there is a check to see whether this
> >> timestamp is older than the allowed time to live. If it is, the value
> >> is stale and will be recomputed.
> >>
> >> (cache := NeoTTLCache new)
> >>   timeToLive: 10 minutes;
> >>   factory: [ :key | ZnEasy get: key ];
> >>   maximumWeight: 32*1024;
> >>   computeWeight: #contentLength.
> >>
> >> cache at: 'http://zn.stfx.eu/zn/numbers.txt'.
> >>
> >> ==> a ZnResponse(200 OK text/plain;charset=utf-8 71B)
> >>
> >> cache
> >>
> >> ==> a NeoTTLCache(#1 71/32768 #contentLength [ :key | ZnEasy get: key ] 0% 0:00:10:00)
> >>
> >> This would be a simple HTTP cache, keeping resolved URLs in memory for
> >> at most 10 minutes before refreshing them. It uses #contentLength on
> >> ZnResponse to compute the weight. You can see from the print string
> >> that there is now 1 entry, while the weight is 71 bytes out of 32768.
> >>
> >>
> >> These are the main points, please have a look at the code. Feedback is
> >> welcome.
> >>
> >> Regards,
> >>
> >> Sven

--
www.tudorgirba.com

"Every thing has its own flow"
