On 01/25/2012 06:53 PM, Amos Jeffries wrote:
>> We created workers as an internal performance optimization that has
>> nothing to do with HTTP. It is our responsibility to make sure that
>> optimization stays internal. If caches are not synchronized, the
>> optimization may negatively affect external HTTP agents.
>
> I see you arguing that IPC messages about purges are a requirement we
> imposed on ourselves. I agree, and focus on IPC so that admins who
> disable ICP/HTCP/PURGE do not cause problems.
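For concreteness only, the kind of purge broadcast between workers we
are both referring to might look roughly like the sketch below. All
names are invented and the datagram transport is just for illustration;
this is not Squid's actual IPC API.

    // Toy sketch: tell every sibling worker to evict its copy of an
    // entry. Invented names; fire-and-forget UNIX datagrams stand in
    // for whatever transport a real implementation would use.
    #include <cstring>
    #include <string>
    #include <vector>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <unistd.h>

    void broadcastPurge(const std::string &url,
                        const std::vector<std::string> &siblingSockets)
    {
        const std::string msg = "PURGE " + url;
        const int fd = socket(AF_UNIX, SOCK_DGRAM, 0);
        if (fd < 0)
            return; // real code would log and retry

        for (const auto &path : siblingSockets) {
            sockaddr_un addr = {};
            addr.sun_family = AF_UNIX;
            strncpy(addr.sun_path, path.c_str(), sizeof(addr.sun_path) - 1);
            // Fire-and-forget is not enough in production: without
            // delivery confirmation, the worker caches drift apart again.
            sendto(fd, msg.data(), msg.size(), 0,
                   reinterpret_cast<const sockaddr *>(&addr), sizeof(addr));
        }
        close(fd);
    }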
That said, I am not talking about any specific technology to enforce
cache synchronization, just the assertion that either the internal
caches are synchronized or they are violating HTTP.

> I see no evidence that sharing an IP is any more (or less) of a
> violation than each worker having a unique IP and the same FQDN. We
> haven't gone around claiming that sibling relationships or popular CDN
> hierarchies are all violating HTTP, though they hit sync problems too.

If they have sync problems, they may violate HTTP. I am just doing my
best to stay focused on the [local] cache architecture topic; I do not
want to get into a discussion about distributed hierarchies.

>> Again, if HTTP has no text defining when two cooperating caches must
>> be in sync, then it would be difficult to decide which interpretation
>> of the HTTP spirit is "correct".
>
> The new wording for HTTPbis part 6 draft -18 section 2.5 about
> PUT/POST/DELETE/unknown explicitly clarifies the spirit with:
> "
>    Note that this does not guarantee that all appropriate responses
>    are invalidated. For example, the request that caused the change at
>    the origin server might not have gone through the cache where a
>    response is stored.
> "

IMO, this just warns the implementer that network complications (policy
routing, load balancing, cache hierarchies, etc.) may cause HTTP
violations. It does not permit those violations any more than a warning
about a possible DDoS attack makes that attack benign. It just says
"this MUST/SHOULD cannot guarantee anything because it applies to the
caches that received the request and not to the other caches that did
not; your next request may go through those other caches".

> section 2.2 on what responses can be served uses the wording
> "
>    Also, note that unsafe requests might invalidate already stored
>    responses; see Section 2.5.
> "
> *might* invalidate.

I think this just means that while an unsafe request MUST invalidate
the corresponding stored responses, there may be no such responses
stored (see the toy sketch in the P.S. below).

> Two giant loopholes to walk through. Invalidation MUST is a
> best-effort benefit for a hierarchy, not a guarantee of removal.

Clearly, we interpret the same specs differently. You see loopholes
negating explicit MUSTs. I see a non-normative explanation of why
somebody cannot rely on the request path staying the same, and a
non-normative reference to a MUST.

> With the Squid SMP mode design being an entire hierarchy inside one
> box, we have to adjust our viewpoint to that of hierarchy compliance.
> The workers are as compliant as ever individually. We have raised
> awareness of the hierarchy-level interaction problems and need to fix
> them above and beyond the specs. The specs, in word and spirit, focus
> on the requirements of individual cache instances, not on distributed
> hierarchies.
>
> What we do by fixing the problem is improve the friendliness,
> predictability, and usefulness of Squid responses, not the compliance
> level.

I understand your point of view, but I have not seen anything in HTTP
that supports a definition of compliance for a hierarchy of caches
(with a single entry point) that differs from the compliance of a
single cache. I have not read the entire HTTPbis yet, so it is possible
that I will find it.

However, since we seem to agree that worker caches must be in sync, we
can build the implementation on that consensus!

It may be a good idea to polish HTTPbis if it indeed allows both your
and my points of view to coexist even though they contradict each other.

Thank you,

Alex.
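P.S. To make my reading of the *might* concrete, here is a toy model of
the invalidation MUST. The Cache type and helper names are invented for
illustration; a real cache also has to invalidate Location and
Content-Location targets per section 2.5, which this sketch omits.

    // Toy model: an unsafe request always attempts the invalidation;
    // it simply may find nothing stored under that URI. That no-op is
    // all the "might invalidate" wording amounts to, in my reading.
    #include <string>
    #include <unordered_map>

    using Cache = std::unordered_map<std::string, std::string>; // URI -> stored response

    static bool isUnsafe(const std::string &method)
    {
        return method != "GET" && method != "HEAD" &&
               method != "OPTIONS" && method != "TRACE";
    }

    void onRequest(Cache &cache, const std::string &method,
                   const std::string &uri)
    {
        if (isUnsafe(method))
            cache.erase(uri); // no-op when no corresponding response is stored
    }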
