On 25.01.2012 08:21, Alex Rousskov wrote:
On 01/24/2012 01:51 AM, Amos Jeffries wrote:
On 24/01/2012 6:16 p.m., Alex Rousskov wrote:
On 01/23/2012 07:24 PM, Amos Jeffries wrote:
Admin can already configure several cache_dir and "cache_mem 0". The problem is just that today's Squid does some horrible things when configured that way, as side effects of our current design assumption.
The assumption is no longer there, but there is still code that uses it.
This is one of the reasons why I consider finishing deleting store_table very important.
Sure.
3) possibly multiple cache_mem. A traditional non-shared cache_mem, a shared memory space, and an in-transit unstructured space.
In-transit space is not a cache, so we should not mix it and cache_mem in an "ideal design" blueprint. Collapsed forwarding requires caching and has to go through cache_mem, not in-transit space.
So proposals for collapsables which are too large for cache_mem?
If I understand what collapsable requests are,
Sorry. Any request which might be shared/collapsed into another. Along the same line of terminology broadness as "cacheable". Not necessarily done, but possible.
cache admission, allocation, or eviction policy can treat collapsables specially. This is a lower-level issue, IMO, than the blueprint you are discussing, though.
or when "cache_mem 0"?
Collapsables can go through disk. If disk caching is also disabled, we cannot collapse requests because the admin told us not to cache at all.
If the issue is important for some, it would be possible to implement a caching policy that reserves space for collapsables and caches nothing else, of course.
It's becoming clear that collapsing needs to run primarily out of the in-transit space. If request B happens while an earlier request A is collapsable, in-transit, and the response intro bytes are still available, the two might be collapsed. Otherwise we have to skip collapsing and find those initial bytes by current means as a separate request. The cache HIT/MISS/IMS source of the response data is irrelevant to all this comparison.
So. Dropping this side discussion for now. We can pick it up again when
planning to add collapsed forwarding.
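For the record, the eligibility test described above could be sketched roughly like this. This is an illustrative sketch only, not Squid code; the names (InTransitEntry, intro_bytes_available, may_collapse) are hypothetical.

```python
# Hypothetical sketch of the collapse-eligibility test described above.
# Names here are illustrative, not actual Squid APIs.

from dataclasses import dataclass

@dataclass
class InTransitEntry:
    """Minimal stand-in for an in-transit response being assembled."""
    key: str                      # request/store key (e.g. method + URI)
    collapsable: bool             # request may be shared with latecomers
    in_transit: bool              # response is still being received
    intro_bytes_available: bool   # status line + headers still buffered

def may_collapse(entry_a: InTransitEntry, key_b: str) -> bool:
    """Request B (key_b) may join entry A only if A matches B's key,
    is collapsable, still in transit, and its intro bytes are still
    available. The HIT/MISS/IMS source of A's data is irrelevant."""
    return (entry_a.key == key_b
            and entry_a.collapsable
            and entry_a.in_transit
            and entry_a.intro_bytes_available)

a = InTransitEntry("GET http://example.com/big.iso", True, True, True)
print(may_collapse(a, "GET http://example.com/big.iso"))  # True
print(may_collapse(a, "GET http://example.com/other"))    # False
```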
Please keep in mind that any non-shared cache would violate HTTP in an SMP case.
You have yet to convince me that the behaviour *is* a violation. Yes, the objects coming back are not identical to the pattern of a traditional Squid. But the new pattern is still within HTTP semantics IMO, in the same way that two proxies on anycast don't violate HTTP. The cases presented so far have been about side effects of already bad behaviour getting worse, or bad testing assumptions.
If "MUST purge" is just a MAY because things will usually work even if the cache does not purge, then this is a question for the HTTP WG, because we would not be able to reach an agreement on this issue here. IMHO, the burden of proof that ignoring a MUST is OK should be on those who want to ignore it. I do not see a point in those MUSTs if we can violate them at will by deploying a proxy with two out-of-sync caches.
Agreed on the burden of proof.
It comes down to viewpoint and reality of distributed pathways. Like
Henrik said this is a micro-scale replica of the Internet-wide
distributed cache.
I see the "MUST purge" as applying to the *pathway*/chain which is
handling the request or reply triggering that MUST. Pathways which never
receive that request/reply CANNOT obey it implicitly.
Place two separate caches behind one domain name on round-robin IP
selection and the same problem you point at occurs.
The difference is the viewpoint of syncing vertically (to the origin) versus horizontally (to sibling caches). All my reading of the spec leads me to conclude that the mandatory syncing is vertical rather than horizontal by design.
Yes, we should take reasonable measures to propagate the purge info around the known caches and make them all behave the same on future requests. But we cannot really call it a true violation when it is impossible to do so.
We also need to enumerate how many of these cases are specifically "MUST purge" versus "MUST update". The update case is a lot more tolerant of sync issues than the purge case is.
For example, any response which has must-revalidate or proxy-revalidate, or any request which has max-age=0, will simply skip past, and IMS auto-syncs the pathway *with origin*. For the other cases we often have those huge heuristic estimation holes in the spec to walk through.
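The revalidation cases above could be sketched like so. This is a simplified illustration, not Squid code: it mirrors the shorthand above and ignores that must-revalidate/proxy-revalidate only bite once the stored entry goes stale.

```python
# Illustrative sketch (not Squid code) of the cases where a pathway
# auto-syncs with the origin via an IMS revalidation, regardless of
# which cache along the path answers. Simplified: must-revalidate and
# proxy-revalidate really only force revalidation once the entry is
# stale; this follows the email's shorthand instead.

def forces_origin_revalidation(request_cc: set, response_cc: set) -> bool:
    """True when Cache-Control on either side forbids serving the
    stored response without revalidating toward the origin."""
    if "max-age=0" in request_cc or "no-cache" in request_cc:
        return True   # client demands end-to-end (re)validation
    if "must-revalidate" in response_cc or "proxy-revalidate" in response_cc:
        return True   # origin requires revalidation (once stale)
    return False

print(forces_origin_revalidation({"max-age=0"}, set()))        # True
print(forces_origin_revalidation(set(), {"must-revalidate"}))  # True
print(forces_origin_revalidation(set(), {"max-age=3600"}))     # False
```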
NOTE that we do have sideband means of purging when necessary to actually purge: IPC packets doing an HTCP CLR equivalent between the workers, or plain old HTCP CLR to a unique HTCP port for each worker.
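The per-worker HTCP port idea might look roughly like this in squid.conf. A sketch only; it assumes the ${process_number} SMP macro is accepted by htcp_port, which should be verified against the directive documentation first.

```
# Sketch only: give each SMP worker its own HTCP port so a sibling
# cache or a purge script can send HTCP CLR to a specific worker.
# Assumes ${process_number} expansion works for htcp_port (verify).
workers 4
htcp_port 482${process_number}   # worker 1 -> 4821, worker 2 -> 4822, ...
```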
8) cache_dir scan should account for externally added files. Regardless of CLEAN/DIRTY algorithm being used.
By this I mean check for and handle (accept or erase) cache_dir entries not accounted for by the swap.state or equivalent meta data.
+ allows reporting what action was taken about the extra files, be it erase or import, and any related errors.
I think this should be left to individual Stores. Each may have their own way of adding entries. For example, with Rock Store, you can add entries even at runtime, but you need to update the shared maps appropriately.
How does rock recover from a third-party insertion of a record at the correct place in the backing DB followed by a shutdown? Erase the slot? Overwrite with something later? Load the object details during restart and use it?
Since there is no swap.state, it is the latter: load the object details during restart and use it.
For now it is perfectly possible to inject entries into UFS and COSS (at least), provided one knows the storage structure and is willing to cope with a DIRTY restart.
For UFS, you need a DIRTY DIRTY restart (no swap.state at all) if you only add a file. You do not need a DIRTY restart if you also update the swap.state* logs, of course.
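Knowing the storage structure here means knowing the UFS L1/L2 naming scheme. From memory of the Squid source, the mapping from a swap file number to its on-disk path is roughly the following; verify against the ufs store code before relying on it for injection.

```python
# Sketch of the UFS on-disk naming scheme, from memory of the Squid
# source (verify against the ufs store code before relying on it).

def ufs_full_path(cache_dir: str, filn: int, l1: int = 16, l2: int = 256) -> str:
    """Map a swap file number to its cache_dir/L1/L2/fileno path."""
    d1 = (filn // l2 // l2) % l1   # first-level subdirectory
    d2 = (filn // l2) % l2         # second-level subdirectory
    return "%s/%02X/%02X/%08X" % (cache_dir, d1, d2, filn)

print(ufs_full_path("/var/spool/squid", 0))      # /var/spool/squid/00/00/00000000
print(ufs_full_path("/var/spool/squid", 0x123))  # /var/spool/squid/00/01/00000123
```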
For COSS, you may not need a DIRTY restart because, *IIRC*, Squid COSS does not use swap.state although it does create it.
For Rock, I think you may safely import files without a restart.
Thank you.
Amos