On 25.01.2012 08:21, Alex Rousskov wrote:
On 01/24/2012 01:51 AM, Amos Jeffries wrote:
On 24/01/2012 6:16 p.m., Alex Rousskov wrote:
On 01/23/2012 07:24 PM, Amos Jeffries wrote:

Admin can already configure
several cache_dir and "cache_mem 0". The problem is just that today's
Squid does some horrible things when configured that way, as side effects
of our current design assumption.
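
For concreteness, such a configuration would be a sketch along these
lines (paths and sizes are placeholders only):

    cache_mem 0
    cache_dir ufs /var/cache/squid0 10000 16 256
    cache_dir ufs /var/cache/squid1 10000 16 256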

The assumption is no longer there, but there is still code that uses it.
This is one of the reasons why I consider finishing the removal of
store_table very important.

Sure.

3) possibly multiple cache_mem. A traditional non-shared cache_mem, a
shared memory space, and an in-transit unstructured space.

In-transit space is not a cache, so we should not mix it and cache_mem in
an "ideal design" blueprint. Collapsed forwarding requires caching and
has to go through cache_mem, not in-transit space.

So proposals for collapsables which are too large for cache_mem?

If I understand what collapsable requests are,

Sorry. Any request which might be shared/collapsed into another. Along
the same line of terminology broadness as "cacheable": not necessarily
done, but possible.

cache admission,
allocation, or eviction policy can treat collapsables specially. This is a lower-level issue, IMO, than the blueprint you are discussing though.

or when "cache_mem 0"?

Collapsables can go through disk. If disk caching is also disabled, we cannot collapse requests because the admin told us not to cache at all. If the issue is important for some, it would be possible to implement a caching policy that reserves space for collapsables and caches nothing
else, of course.

It's becoming clear that collapsing needs to run primarily out of the
in-transit space. If request B arrives while an earlier request A is
collapsable, still in transit, and the initial response bytes are still
available, the two might be collapsed. Otherwise we have to skip
collapsing and fetch those initial bytes by the current means as a
separate request. The cache HIT/MISS/IMS source of the response data is
irrelevant to this comparison.
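
To restate that test as a minimal sketch (the names here are made up for
illustration, not existing Squid classes):

    // Hypothetical illustration of the collapse-eligibility test above.
    struct InTransitEntry {
        bool collapsable;        // request A was marked shareable
        bool stillInTransit;     // response A has not finished or been dropped
        bool introBytesBuffered; // response status line and headers still held
    };

    // Request B may be collapsed onto in-transit entry A only if all of
    // the flags above hold and both requests map to the same cache key.
    bool canCollapse(const InTransitEntry &a, bool sameCacheKey)
    {
        return sameCacheKey && a.collapsable && a.stillInTransit
               && a.introBytesBuffered;
    }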

So. Dropping this side discussion for now. We can pick it up again when planning to add collapsed forwarding.


Please keep in mind that any non-shared cache would
violate HTTP in an SMP case.

You have yet to convince me that the behaviour *is* a violation. Yes,
the objects coming back are not identical to the pattern of a traditional
Squid. But the new pattern is still within HTTP semantics IMO, in the
same way that two proxies on anycast don't violate HTTP. The cases
presented so far have been about side effects of already-bad behaviour
getting worse, or about bad testing assumptions.

If "MUST purge" is just a MAY because things will usually work even if the cache does not purge, then this is a question for HTTP WG because we
would not be able to reach an agreement on this issue here; IMHO, a
burden of proof that ignoring MUST is OK should be on those who want to ignore it. I do not see a point in those MUSTs if we can violate them at
will by deploying a proxy with two out-of-sync caches.

Agreed on the burden of proof.

It comes down to viewpoint and the reality of distributed pathways. Like
Henrik said, this is a micro-scale replica of the Internet-wide
distributed cache. I see the "MUST purge" as applying to the
*pathway*/chain which is handling the request or reply triggering that
MUST. Pathways which never receive that request/reply CANNOT implicitly
obey it. Place two separate caches behind one domain name with
round-robin IP selection and the same problem you point at occurs.

The difference is the viewpoint of syncing vertically (to the origin)
versus horizontally (to sibling caches). All my reading of the spec leads
me to conclude that the mandatory syncing is vertical rather than
horizontal, by design.


Yes, we should take reasonable measures to propagate the purge info
around the known caches and make them all behave the same on future
requests. But we cannot really call it a true violation when doing so is
impossible.

We also need to enumerate how many of these cases are specifically "MUST
purge" versus "MUST update". The update case is a lot more lenient about
sync issues than purges are. For example, any response which has
must-revalidate or proxy-revalidate, or any request which has max-age=0,
will simply skip past, and the IMS auto-syncs the pathway *with the
origin*. For the other cases we often have those huge
heuristic-estimation holes in the spec to walk through.
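
As an illustration only (host and dates invented), a client request
carrying

    GET /page HTTP/1.1
    Host: www.example.com
    Cache-Control: max-age=0

cannot be answered from a stored copy without validation, so a cache on
the pathway that holds a copy forwards a conditional request such as

    GET /page HTTP/1.1
    Host: www.example.com
    If-Modified-Since: Mon, 23 Jan 2012 03:00:00 GMT

and the origin's 304 or fresh 200 brings that whole chain back in sync,
whether or not any sibling ever saw a purge.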

NOTE that we do have sideband means of purging when necessary to actually
purge: IPC packets doing the HTCP CLR equivalent between the workers, or
plain old HTCP CLR to a unique HTCP port for each worker.
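
Something along these lines for the latter (a sketch only; the port
numbering is arbitrary, and IIRC the ${process_number} macro expands per
worker in SMP configurations):

    # each worker listens for HTCP on its own port, so HTCP CLR can be
    # aimed at one specific worker
    htcp_port 482${process_number}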


8) cache_dir scan should account for externally added files, regardless
of the CLEAN/DIRTY algorithm being used.

By this I mean: check for and handle (accept or erase) cache_dir entries
not accounted for by swap.state or equivalent metadata, and report what
action was taken about the extra files, be it erase or import, plus any
related errors.
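
A rough sketch of the kind of pass I mean (purely illustrative; the
importIfValid callback stands in for whatever per-store metadata check
applies):

    #include <filesystem>
    #include <functional>
    #include <iostream>
    #include <set>
    #include <string>

    // Scan a cache_dir, importing or erasing files that swap.state (or
    // the equivalent metadata) does not account for, reporting each action.
    void reconcileExtras(const std::string &cacheDir,
                         const std::set<std::string> &knownFiles,
                         const std::function<bool(const std::string &)> &importIfValid)
    {
        namespace fs = std::filesystem;
        for (const auto &entry : fs::recursive_directory_iterator(cacheDir)) {
            if (!entry.is_regular_file())
                continue;
            const std::string name = entry.path().string();
            if (knownFiles.count(name))
                continue; // already accounted for by the store metadata
            if (importIfValid(name))
                std::cout << "imported extra cache file " << name << "\n";
            else {
                fs::remove(entry.path());
                std::cout << "erased unusable extra cache file " << name << "\n";
            }
        }
    }
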
I think this should be left to individual Stores. Each may have their own way of adding entries. For example, with Rock Store, you can add
entries even at runtime, but you need to update the shared maps
appropriately.

How does Rock recover from a third-party insertion of a record at the
correct place in the backing DB, followed by a shutdown? Erase the slot?
Overwrite it with something later? Load the object details during restart
and use it?

Since there is no swap.state, it is the latter: load the object details
during restart and use it.
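
Roughly (a made-up sketch of the idea, not Rock's actual on-disk format
or code): scan the db as fixed-size slots and import whatever validates.

    #include <cstddef>
    #include <fstream>
    #include <functional>
    #include <vector>

    void rebuildFromDb(const char *dbPath, std::size_t slotSize,
                       const std::function<void(const std::vector<char> &)> &importIfValid)
    {
        std::ifstream db(dbPath, std::ios::binary);
        std::vector<char> slot(slotSize);
        while (db.read(slot.data(), slot.size()))
            importIfValid(slot); // parse entry metadata; keep sane entries
    }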

For now it is perfectly possible to inject entries into UFS and COSS (at least) provided one knows the storage structure and is willing to cope
with a DIRTY restart.
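
For the record, the UFS "storage structure" in question is (from memory,
so treat as approximate) a fileno-to-path mapping along the lines of:

    #include <cstdio>

    // L1 and L2 are the cache_dir's directory counts (e.g. 16 and 256);
    // the file name is the swap file number in hex.
    void ufsPath(char *buf, std::size_t len, const char *dir,
                 int L1, int L2, int fileno)
    {
        std::snprintf(buf, len, "%s/%02X/%02X/%08X", dir,
                      ((fileno / L2) / L2) % L1, // first-level subdirectory
                      (fileno / L2) % L2,        // second-level subdirectory
                      fileno);
    }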

For UFS, you need a DIRTY DIRTY restart (no swap.state at all) if you
only add a file. You do not need a DIRTY restart if you also update
swap.state* logs, of course.

For COSS, you may not need a DIRTY restart because, *IIRC*, Squid COSS
does not use swap.state although it does create it.

For Rock, I think you may safely import files without a restart.


Thank you.


Amos
