This is just a discussion for now: a checkup, and possibly a long-term redesign, of the overall architecture for the store logic. So the list of SHOULD DO items etc. will contain things Squid already does.

This post is prompted by http://bugs.squid-cache.org/show_bug.cgi?id=3441 and other ongoing hints about user frustrations on the help lists and elsewhere.

Cutting to the chase;

Squid's existing methods of startup cache loading and error recovery are slow, with side effects that impact bandwidth and end-user experience in various annoying ways. The swap.state mechanism speeds up loading enormously compared to the DIRTY scan, but in some cases it is still too slow.


Ideal Architecture;

Squid starts with the assumption of not caching. Cache spaces are loaded as soon as possible, with priority given to the faster types, but they are loaded asynchronously to startup in a plug-n-play design.

1) Requests are able to be processed at all times, but storage ability will vary independently of Squid operational status.
+ minimal downtime to first request accepted and responded
- loses all or some caching benefits at times

2) cache_mem shall be enabled by default and first amongst all caches
+ reduces the bandwidth impact from (1) if it happens before first request
+ could also be set up async while Squid is already operating (pro from (1) while minimising the con)

3) possibly multiple cache_mem. A traditional non-shared cache_mem, a shared memory space, and an in-transit unstructured space.
+ non-shared cache_mem allows larger objects than possible with the shared memory.
+ separate in-transit area allows collapsed forwarding to occur for incomplete but cacheable objects. Note that private and otherwise non-shareable in-transit objects are a separate thing not mentioned here.
- maybe complex to implement, and long-term plans to allow paging mem_node pieces of large files should obsolete the shared/non-shared split.

4) config load/reload at some point enables a cache_dir
+ being async means we are not delaying first response waiting for potentially long slow disk processes to complete
- creates a high MISS ratio during the wait for these to be available
- adds CPU and async event queue load on top of active traffic loads, possibly slowing both traffic and cache availability
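
To make the trade-off in (4) concrete, here is a minimal sketch in Python (not Squid code). The CacheDir class, the lookup() helper and the "ready" flag are all hypothetical names invented for illustration; the point is only that lookups are answered straight away as a MISS while the index load runs as a background task, and turn into HITs once it completes.

```python
import asyncio


class CacheDir:
    """Hypothetical stand-in for a cache_dir; not Squid's real class."""

    def __init__(self, name, entries):
        self.name = name
        self._pending = dict(entries)  # what an index rebuild would discover
        self.index = {}
        self.ready = False             # becomes True once the load finishes

    async def load(self):
        # Simulate a slow rebuild, yielding to the event loop after each entry
        # so request handling is never blocked for long.
        for key, body in self._pending.items():
            await asyncio.sleep(0)
            self.index[key] = body
        self.ready = True


def lookup(stores, key):
    """Return ('HIT', body) from any ready store, else ('MISS', None)."""
    for store in stores:
        if store.ready and key in store.index:
            return "HIT", store.index[key]
    return "MISS", None


async def demo():
    disk = CacheDir("disk0", {"/a": b"A", "/b": b"B"})
    loader = asyncio.create_task(disk.load())  # load starts in the background
    before = lookup([disk], "/a")              # answered at once, as a MISS
    await loader                               # cache_dir becomes available
    after = lookup([disk], "/a")               # same request is now a HIT
    return before, after
```

The high MISS ratio from the "-" above is exactly the window between create_task() and the loader finishing.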

5) cache_dir maintains distinct (read,add,delete) states for itself
+ this allows read-only (1,0,0) caches, read-and-retain (1,1,0) caches
+ also allows old storage areas to be gracefully deprecated using (1,0,1) with object count decrease visibly reporting the progress of migration.
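
The (read,add,delete) triple in (5) could be modelled as a small set of flags. The following is a sketch in Python under my own assumptions: the Access enum, the from_tuple() helper and the CacheDir methods are hypothetical and only illustrate how the three combinations named above would behave.

```python
from enum import Flag


class Access(Flag):
    """Hypothetical per-cache_dir permissions; the names are illustrative."""
    READ = 1
    ADD = 2
    DELETE = 4


def from_tuple(read, add, delete):
    """Build an Access value from a (read, add, delete) triple of 0/1."""
    access = Access(0)
    if read:
        access |= Access.READ
    if add:
        access |= Access.ADD
    if delete:
        access |= Access.DELETE
    return access


READ_ONLY = from_tuple(1, 0, 0)
READ_AND_RETAIN = from_tuple(1, 1, 0)
DEPRECATING = from_tuple(1, 0, 1)


class CacheDir:
    def __init__(self, access, objects=None):
        self.access = access
        self.objects = dict(objects or {})

    def get(self, key):
        # Reads are refused (None) unless READ is set.
        if Access.READ not in self.access:
            return None
        return self.objects.get(key)

    def put(self, key, body):
        # New objects are only accepted when ADD is set.
        if Access.ADD not in self.access:
            return False
        self.objects[key] = body
        return True

    def evict(self, key):
        # Objects are only removed when DELETE is set.
        if Access.DELETE not in self.access:
            return False
        return self.objects.pop(key, None) is not None
```

With DEPRECATING, put() is always refused while evict() succeeds, so len(objects) can only go down. That falling count is the visible migration progress mentioned above.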

6) cache_dir structure maintains a "current" and a "max" available fileno setting. current always starts at 0 and goes up to max. max is whatever swap.state, a hard-coded value or other appropriate source tells Squid it should be.
+ allows scans to start with caches set to full access, but limits the area of access to the range of already scanned fileno between 0 and current.
+ allows any number of scan algorithms beyond CLEAN/DIRTY while minimising user-visible impact.
+ allows algorithms to be switched while processing
+ allows growing or shrinking cache spaces in real-time
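
A rough sketch of the current/max idea from (6), in Python rather than Squid's own code. ScanWindow and its methods are hypothetical names. I am also assuming here that "current" counts the slots scanned so far, so the accessible fileno range is 0 <= fileno < current; the inclusive variant would work just as well.

```python
class ScanWindow:
    """Hypothetical current/max fileno window for one cache_dir."""

    def __init__(self, max_fileno):
        self.current = 0        # always starts at 0
        self.max = max_fileno   # e.g. taken from swap.state or a fixed value

    def is_accessible(self, fileno):
        # Only the already scanned range may be used (assumed half-open).
        return 0 <= fileno < self.current

    def advance(self, step=1):
        # Any scan algorithm just moves current forward, never past max.
        self.current = min(self.current + step, self.max)
        return self.current

    def resize(self, new_max):
        # Growing or shrinking the cache space at runtime.
        if new_max < 0:
            raise ValueError("max fileno cannot be negative")
        self.max = new_max
        self.current = min(self.current, new_max)

    @property
    def scan_complete(self):
        return self.current >= self.max
```

Because the window only exposes advance() and resize(), the scanner behind it can be swapped mid-run without the lookup path noticing.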

7) cache_dir scan must account for corruption of individual files, the index entries, and any meta data construct like swap.state

8) cache_dir scan should account for externally added files, regardless of the CLEAN/DIRTY algorithm being used. By this I mean check for and handle (accept or erase) cache_dir entries not accounted for by the swap.state or equivalent meta data.
+ allows reporting what action was taken about the extra files, be it erase or import, and any related errors.
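
Points (7) and (8) together amount to reconciling the index against what is actually on disk. A minimal Python sketch of that follows; reconcile(), the report keys and the is_valid callback are all hypothetical, and a real scan would of course read files and validate swap metadata rather than take a dict.

```python
def reconcile(index, on_disk, is_valid, import_extras=False):
    """Compare indexed filenos against files found on disk.

    index         -- iterable of filenos recorded in swap.state or equivalent
    on_disk       -- dict mapping fileno -> raw file content found by the scan
    is_valid      -- callable deciding whether a file's content is usable
    import_extras -- accept valid unindexed files instead of erasing them
    """
    indexed = set(index)
    report = {"kept": [], "missing": [], "corrupt": [],
              "imported": [], "erased": []}

    # Entries the meta data knows about: check they exist and are intact.
    for fileno in sorted(indexed):
        if fileno not in on_disk:
            report["missing"].append(fileno)
        elif not is_valid(on_disk[fileno]):
            report["corrupt"].append(fileno)
        else:
            report["kept"].append(fileno)

    # Files nobody accounted for: accept or erase, and record which.
    for fileno in sorted(set(on_disk) - indexed):
        if import_extras and is_valid(on_disk[fileno]):
            report["imported"].append(fileno)
        else:
            report["erased"].append(fileno)

    return report
```

The returned report is what would feed the "what action was taken" logging: every fileno ends up in exactly one bucket.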


Anything else?

Amos
