[RFC] cache architecture

Amos Jeffries Mon, 23 Jan 2012 18:24:19 -0800

This is just a discussion at present for a checkup and possibly long-term re-design of the overall Architecture for store logics. So the list of SHOULD DO etc will contain things Squid already does.

This post is prompted by http://bugs.squid-cache.org/show_bug.cgi?id=3441 and other ongoing hints about user frustrations on the help lists and elsewhere.


Cutting to the chase;

Squid's existing methods of startup cache loading and error recovery are slow, with side-effects impacting bandwidth and end-user experience in various annoying ways. The swap.state mechanism speeds loading up enormously as compared to the DIRTY scan, but in some cases is still too slow.



Ideal Architecture;

Squid starts with the assumption of not caching. Cache spaces are loaded as soon as possible, with priority to the faster types, but asynchronously to the startup in a plug-n-play design.

1) Requests are able to be processed at all times, but storage ability will vary independent of Squid operational status.

+ minimal downtime to first request accepted and responded
- lost all or some caching benefits at times

2) cache_mem shall be enabled by default and first amongst all caches

+ reduces the bandwidth impact from (1) if it happens before first request

+ could also be set up async while Squid is already operating (pro from (1) while minimising the con)

3) possibly multiple cache_mem. A traditional non-shared cache_mem, a shared memory space, and an in-transit unstructured space.

+ non-shared cache_mem allows larger objects than possible with the shared memory.

+ separate in-transit area allows collapsed forwarding to occur for incomplete but cacheable objects
  note that private and otherwise non-shareable in-transit objects are a separate thing not mentioned here.

- maybe complex to implement and long-term plans to allow paging mem_node pieces of large files should obsolete the shared/non-shared split.


4) config load/reload at some point enables a cache_dir

+ being async means we are not delaying first response waiting for potentially long slow disk processes to complete

- creates a high MISS ratio during the wait for these to be available

- adds CPU and async event queue load on top of active traffic loads, possibly slowing both traffic and cache availability
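
As a rough illustration of points (1) to (4), the sketch below lets lookups proceed while cache spaces attach in the background. It is not Squid code: the `Store` class, `lookup` function and the load delays are hypothetical names and values chosen for the example, and asyncio merely stands in for Squid's own async event queue.

```python
import asyncio


class Store:
    """Hypothetical stand-in for one cache space (cache_mem or a cache_dir)."""

    def __init__(self, name, load_seconds, objects=None):
        self.name = name
        self.load_seconds = load_seconds
        self.objects = dict(objects or {})
        self.ready = False  # not usable until its loader finishes

    async def load(self):
        # Stands in for an index rebuild or a swap.state read.
        await asyncio.sleep(self.load_seconds)
        self.ready = True


def lookup(stores, key):
    """Return the name of the first ready store holding key, else None (MISS)."""
    for store in stores:
        if store.ready and key in store.objects:
            return store.name
    return None


async def demo():
    url = "http://example.com/"
    # Faster cache types are listed first so they get priority.
    stores = [
        Store("cache_mem", 0.0),
        Store("cache_dir", 0.05, {url: b"body"}),
    ]
    loaders = [asyncio.create_task(s.load()) for s in stores]
    before = lookup(stores, url)  # loaders have not run yet, so this is a MISS
    await asyncio.gather(*loaders)
    after = lookup(stores, url)   # cache_dir is attached now
    return before, after
```

Calling `asyncio.run(demo())` shows the trade-off listed above: the first lookup is answered immediately but as a MISS, and the same lookup becomes a hit once the slower store has attached.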


5) cache_dir maintains distinct (read,add,delete) states for itself
+ this allows read-only (1,0,0) caches, read-and-retain (1,1,0) caches

+ also allows old storage areas to be gracefully deprecated using (1,0,1) with object count decrease visibly reporting the progress of migration.
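
A minimal sketch of the (read,add,delete) idea in point (5), assuming each operation simply checks its own flag. The class and method names are hypothetical and do not correspond to anything in Squid.

```python
from dataclasses import dataclass, field


@dataclass
class CacheDirState:
    """The (read, add, delete) triple from point (5)."""
    read: bool
    add: bool
    delete: bool


@dataclass
class CacheDir:
    """Hypothetical cache_dir that gates each operation on its state flags."""
    name: str
    state: CacheDirState
    objects: dict = field(default_factory=dict)

    def fetch(self, key):
        if not self.state.read:
            return None
        return self.objects.get(key)

    def store(self, key, body):
        if not self.state.add:
            return False  # e.g. read-only or a directory being deprecated
        self.objects[key] = body
        return True

    def purge(self, key):
        if not self.state.delete:
            return False  # e.g. read-only or read-and-retain
        return self.objects.pop(key, None) is not None


# (1,0,1): an old storage area being drained during migration.
old_dir = CacheDir("old", CacheDirState(True, False, True),
                   {"a": b"1", "b": b"2"})
```

With `old_dir` in the (1,0,1) state, `store()` is refused while `fetch()` and `purge()` still work, so `len(old_dir.objects)` can only fall, which is the visible progress report the point describes.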

6) cache_dir structure maintains a "current" and a "max" available fileno setting.
   current always starting at 0 and being up to max. max being at whatever swap.state, a hard-coded value or appropriate source tells Squid it should be.

+ allows scans to start with caches set to full access, but limit the area of access to a range of already scanned fileno between 0 and current.

+ allows any number of scan algorithms beyond CLEAN/DIRTY while minimising user visible impact.

+ allows algorithms to be switched while processing
+ allows growing or shrinking cache spaces in real-time
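
One way to picture the "current"/"max" window from point (6) is sketched below. It assumes that `current` counts the slots already scanned, so filenos 0 to current-1 are accessible. The text leaves open whether the upper bound is inclusive, so that choice, like the `ScanWindow` name and its methods, is an assumption of the example.

```python
class ScanWindow:
    """Hypothetical tracker for the scanned range of a cache_dir."""

    def __init__(self, max_fileno):
        self.current = 0              # always starts at 0
        self.max_fileno = max_fileno  # from swap.state, a hard-coded value, etc.

    def advance(self, count=1):
        """Record that count more slots were scanned; never pass max."""
        self.current = min(self.current + count, self.max_fileno)

    def is_accessible(self, fileno):
        """Only already-scanned filenos may be served (assumed half-open range)."""
        return 0 <= fileno < self.current

    def resize(self, new_max):
        """Grow or shrink the cache space while running."""
        self.max_fileno = new_max
        self.current = min(self.current, new_max)

    def done(self):
        return self.current >= self.max_fileno
```

Because the window only records how far the scan has reached, any scan algorithm can drive `advance()`, and switching algorithms mid-scan does not disturb what is already accessible.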

7) cache_dir scan must account for corruption of individual files, the index entries, and any meta data construct like swap.state

8) cache_dir scan should account for externally added files, regardless of CLEAN/DIRTY algorithm being used.
   By this I mean check for and handle (accept or erase) cache_dir entries not accounted for by the swap.state or equivalent meta data.

+ allows reporting what action was taken about the extra files. Be it erase or import and any related errors.
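
The reconciliation described in points (7) and (8) could look roughly like the sketch below. It works on plain sets of filenos rather than a real directory, and the `reconcile` function, its `policy` argument and the `is_valid` callback are all hypothetical names introduced for illustration only.

```python
def reconcile(indexed, on_disk, is_valid, policy="import"):
    """Classify filenos that the index and the directory disagree on.

    indexed  -- set of filenos listed in swap.state or equivalent meta data
    on_disk  -- set of filenos actually found in the cache_dir
    is_valid -- callable returning True if a file parses as a cache object;
                it may raise OSError for unreadable files
    policy   -- "import" to accept valid extra files, "erase" to drop them all
    """
    if policy not in ("import", "erase"):
        raise ValueError("policy must be 'import' or 'erase'")
    report = {"imported": [], "erased": [], "missing": [], "errors": []}
    # Index entries with no backing file: the index is stale or corrupt.
    report["missing"] = sorted(indexed - on_disk)
    # Files the index does not know about: externally added or orphaned.
    for fileno in sorted(on_disk - indexed):
        try:
            usable = is_valid(fileno)
        except OSError as exc:
            report["errors"].append((fileno, str(exc)))
            continue
        if policy == "import" and usable:
            report["imported"].append(fileno)
        else:
            report["erased"].append(fileno)
    return report
```

The returned report is what would feed the "what action was taken" logging: each extra file ends up in exactly one of imported, erased or errors.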



Anything else?

Amos
