On 25.01.2012 08:21, Alex Rousskov wrote:
On 01/24/2012 01:51 AM, Amos Jeffries wrote:
On 24/01/2012 6:16 p.m., Alex Rousskov wrote:
On 01/23/2012 07:24 PM, Amos Jeffries wrote:
Admin can already configure several cache_dir and "cache_mem 0". The problem is just that today's Squid does some horrible things when configured that way, as side effects of our current design assumption.
The assumption is no longer there, but there is still code that uses it.
This is one of the reasons why I consider finishing deleting store_table very important.
Sure.
3) possibly multiple cache_mem. A traditional non-shared cache_mem, a shared memory space, and an in-transit unstructured space.
In-transit space is not a cache, so we should not mix it and cache_mem in an "ideal design" blueprint. Collapsed forwarding requires caching and has to go through cache_mem, not in-transit space.
So proposals for collapsables which are too large for cache_mem?
If I understand what collapsable requests are,
Sorry. Any request which might be shared/collapsed into another. Along the same line of terminology broadness as "cacheable". Not necessarily done, but possible.
cache admission, allocation, or eviction policy can treat collapsables specially. This is a lower-level issue, IMO, than the blueprint you are discussing, though.
or when "cache_mem 0"?
Collapsables can go through disk. If disk caching is also disabled, we cannot collapse requests because the admin told us not to cache at all.
If the issue is important for some, it would be possible to implement a caching policy that reserves space for collapsables and caches nothing else, of course.
It's becoming clear that collapsing needs to run primarily out of the in-transit space. If request B happens while an earlier request A is collapsable, in-transit, and the response intro bytes are still available, the two might be collapsed. Otherwise we have to skip collapsing and find those initial bytes by current means as a separate request. The cache HIT/MISS/IMS source of the response data is irrelevant to all this comparison.
So. Dropping this side discussion for now. We can pick it up again when
planning to add collapsed forwarding.
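For the record, the eligibility test described above could be sketched roughly like this. This is an illustrative sketch only, not Squid code; the names (InTransitEntry, intro_bytes_available, may_collapse) are hypothetical.

```python
# Hypothetical sketch of the collapse-eligibility test described above.
# Names here are illustrative, not actual Squid APIs.

from dataclasses import dataclass

@dataclass
class InTransitEntry:
    """Minimal stand-in for an in-transit response being assembled."""
    key: str                      # request/store key (e.g. method + URI)
    collapsable: bool             # request may be shared with latecomers
    in_transit: bool              # response is still being received
    intro_bytes_available: bool   # status line + headers still buffered

def may_collapse(entry_a: InTransitEntry, key_b: str) -> bool:
    """Request B (key_b) may join entry A only if A matches B's key,
    is collapsable, still in transit, and its intro bytes are still
    available. The HIT/MISS/IMS source of A's data is irrelevant."""
    return (entry_a.key == key_b
            and entry_a.collapsable
            and entry_a.in_transit
            and entry_a.intro_bytes_available)

a = InTransitEntry("GET http://example.com/big.iso", True, True, True)
print(may_collapse(a, "GET http://example.com/big.iso"))  # True
print(may_collapse(a, "GET http://example.com/other"))    # False
```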
Please keep in mind that any non-shared cache would violate HTTP in an SMP case.
You have yet to convince me that the behaviour *is* a violation. Yes, the objects coming back are not identical to the pattern of a traditional Squid. But the new pattern is still within HTTP semantics IMO, in the same way that two proxies on anycast don't violate HTTP. The cases presented so far have been about side effects of already bad behaviour getting worse, or bad testing assumptions.
If "MUST purge" is just a MAY because things will usually work even if the cache does not purge, then this is a question for the HTTP WG, because we would not be able to reach an agreement on this issue here. IMHO, the burden of proof that ignoring a MUST is OK should be on those who want to ignore it. I do not see a point in those MUSTs if we can violate them at will by deploying a proxy with two out-of-sync caches.
Agreed on the burden of proof.
It comes down to viewpoint and reality of distributed pathways. Like
Henrik said this is a micro-scale replica of the Internet-wide
distributed cache.
I see the "MUST purge" as applying to the *pathway*/chain which is
handling the request or reply triggering that MUST. Pathways which never
receive that request/reply CANNOT obey it implicitly.
Place two separate caches behind one domain name on round-robin IP
selection and the same problem you point at occurs.
The difference is the viewpoint of syncing vertically (to the origin) versus horizontally (to sibling caches). All my reading of the spec leads me to conclude that the mandatory syncing is vertical rather than horizontal by design.
Yes, we should take reasonable measures to propagate the purge info around the known caches and make them all behave the same on future requests. But we cannot really call it a true violation when it is impossible to do so.
We also need to enumerate how many of these cases are specifically "MUST purge" versus "MUST update". The update case is a lot more tolerant of sync issues than the purge case is.
For example, any response which has must-revalidate or proxy-revalidate, or any request which has max-age=0, will simply skip past, and IMS auto-syncs the pathway *with origin*. For the other cases we often have those huge heuristic estimation holes in the spec to walk through.
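The revalidation cases above could be sketched like so. This is a simplified illustration, not Squid code: it mirrors the shorthand above and ignores that must-revalidate/proxy-revalidate only bite once the stored entry goes stale.

```python
# Illustrative sketch (not Squid code) of the cases where a pathway
# auto-syncs with the origin via an IMS revalidation, regardless of
# which cache along the path answers. Simplified: must-revalidate and
# proxy-revalidate really only force revalidation once the entry is
# stale; this follows the email's shorthand instead.

def forces_origin_revalidation(request_cc: set, response_cc: set) -> bool:
    """True when Cache-Control on either side forbids serving the
    stored response without revalidating toward the origin."""
    if "max-age=0" in request_cc or "no-cache" in request_cc:
        return True   # client demands end-to-end (re)validation
    if "must-revalidate" in response_cc or "proxy-revalidate" in response_cc:
        return True   # origin requires revalidation (once stale)
    return False

print(forces_origin_revalidation({"max-age=0"}, set()))        # True
print(forces_origin_revalidation(set(), {"must-revalidate"}))  # True
print(forces_origin_revalidation(set(), {"max-age=3600"}))     # False
```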
NOTE that we do have sideband means of purging when necessary to actually purge: IPC packets doing an HTCP CLR equivalent between the workers, or plain old HTCP CLR to a unique HTCP port for each worker.
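The per-worker HTCP port idea might look roughly like this in squid.conf. A sketch only; it assumes the ${process_number} SMP macro is accepted by htcp_port, which should be verified against the directive documentation first.

```
# Sketch only: give each SMP worker its own HTCP port so a sibling
# cache or a purge script can send HTCP CLR to a specific worker.
# Assumes ${process_number} expansion works for htcp_port (verify).
workers 4
htcp_port 482${process_number}   # worker 1 -> 4821, worker 2 -> 4822, ...
```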
8) cache_dir scan should account for externally added files. Regardless of CLEAN/DIRTY algorithm being used.
By this I mean check for and handle (accept or erase) cache_dir entries not accounted for by the swap.state or equivalent meta data.
+ allows reporting what action was taken about the extra files, be it erase or import, and any related errors.
I think this should be left to individual Stores. Each may have their own way of adding entries. For example, with Rock Store, you can add entries even at runtime, but you need to update the shared maps appropriately.
How does rock recover from a third-party insertion of a record at the correct place in the backing DB followed by a shutdown? Erase the slot? Overwrite with something later? Load the object details during restart and use it?
Since there is no swap.state, it is the latter: load the object details during restart and use it.
For now it is perfectly possible to inject entries into UFS and COSS (at least), provided one knows the storage structure and is willing to cope with a DIRTY restart.
For UFS, you need a DIRTY DIRTY restart (no swap.state at all) if you only add a file. You do not need a DIRTY restart if you also update the swap.state* logs, of course.
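Knowing the storage structure here means knowing the UFS L1/L2 naming scheme. From memory of the Squid source, the mapping from a swap file number to its on-disk path is roughly the following; verify against the ufs store code before relying on it for injection.

```python
# Sketch of the UFS on-disk naming scheme, from memory of the Squid
# source (verify against the ufs store code before relying on it).

def ufs_full_path(cache_dir: str, filn: int, l1: int = 16, l2: int = 256) -> str:
    """Map a swap file number to its cache_dir/L1/L2/fileno path."""
    d1 = (filn // l2 // l2) % l1   # first-level subdirectory
    d2 = (filn // l2) % l2         # second-level subdirectory
    return "%s/%02X/%02X/%08X" % (cache_dir, d1, d2, filn)

print(ufs_full_path("/var/spool/squid", 0))      # /var/spool/squid/00/00/00000000
print(ufs_full_path("/var/spool/squid", 0x123))  # /var/spool/squid/00/01/00000123
```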
For COSS, you may not need a DIRTY restart because, *IIRC*, Squid COSS does not use swap.state although it does create it.
For Rock, I think you may safely import files without a restart.
Thank you.
Amos