On Mon, 15 Aug 2011 23:17:55 +0200, Henrik Nordström wrote:
> Mon 2011-08-15 at 09:50 -0600, Alex Rousskov wrote:
>> I do not like aborted retrievals as the default method of handling a
>> digest-based hit. Aborted transactions have negative side-effects, and
>> some of those effects are not controlled by Squid (e.g., monitoring
>> software may trigger an alert if too many requests are aborted).
>>
>> I agree that we can switch from entities to instances, provided we are
>> OK with excluding 206, 302, and similar non-200 responses from the
>> optimization. By the instance definition, Squid would not be able to
>> compute or use an instance digest if the response is not 200 OK. We
>> can hope that the vast majority of non-200 responses are either not
>> cacheable or are very small and not worth optimizing.
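[Editor's note: a minimal sketch, not Squid code, of the instance digest being discussed, in the RFC 3230 `Digest` header format. It assumes SHA-256 and illustrates why only a complete 200 OK body qualifies: a 206 partial body would hash to something different from the instance it was cut from.]

```python
import base64
import hashlib

def instance_digest(body: bytes, algorithm: str = "sha-256") -> str:
    """Compute an RFC 3230-style Digest header value over a full instance.

    Only meaningful for a complete 200 OK entity-body: hashing a 206
    partial body (or a 302 body) would not identify the instance.
    """
    h = hashlib.new(algorithm.replace("-", ""))  # "sha-256" -> "sha256"
    h.update(body)
    b64 = base64.b64encode(h.digest()).decode("ascii")
    return "%s=%s" % (algorithm.upper(), b64)

print(instance_digest(b"hello world"))
```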
> The bulk bandwidth where you would find duplicates is in positive GET
> responses.
>
> Not being able to support 206 duplicate detection without caching the
> full 200 in the "topmost" cache is a little annoying, however.
>>> In requests you can optionally add a digest-based condition similar
>>> to If-None-Match, but here If-None-Match already serves the purpose
>>> quite well, so use of the digest condition should probably be
>>> limited to cases where there is no ETag.
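[Editor's note: a sketch of the condition logic proposed here. "If-None-Digest" is a made-up header name for illustration only, not a real HTTP header; per the discussion, the digest condition only comes into play when the response has no ETag.]

```python
from typing import Dict, Optional

def needs_body(req: Dict[str, str], etag: Optional[str], digest: str) -> bool:
    """Decide whether the response body must be transmitted.

    "If-None-Digest" is hypothetical; If-None-Match is the real header
    and, where an ETag exists, already serves the purpose quite well.
    """
    if etag is not None:
        # Standard conditional: skip the body only on an ETag match.
        tags = [t.strip() for t in req.get("If-None-Match", "").split(",")]
        return etag not in tags
    if "If-None-Digest" in req:
        # Proposed digest-based condition for ETag-less responses.
        return req["If-None-Digest"] != digest
    return True  # no condition present: send the body
```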
>> Or to cases where the ETag lies about response content changes.
> True, but I kind of doubt there is much bandwidth to be found in those
> cases.
>>> To reduce bandwidth lost to unneeded transmission, a slow-start
>>> mechanism can be used where the sending side waits a couple of RTTs
>>> before starting to transmit the body of a large response for which
>>> an instance digest is presented. This allows the receiving end to
>>> check the received instance digest and abort the request if it is
>>> not interested in receiving the body.
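[Editor's note: a toy sender-side sketch of the slow-start idea, under stated assumptions (`send` transmits bytes, `client_aborted` polls for an abort); it is not Squid's implementation. Headers carrying the instance digest go out immediately; the large body is held back for a couple of RTTs so the receiver can inspect the digest and abort first.]

```python
import time

class AbortedByClient(Exception):
    """Raised when the receiver gives up after inspecting the digest."""

def send_with_slow_start(send, client_aborted, headers, body, rtt=0.05):
    send(headers)            # headers include the instance digest
    time.sleep(2 * rtt)      # "a couple of RTTs" of grace time
    if client_aborted():
        raise AbortedByClient  # large body never transmitted: bytes saved
    send(body)
```

The bandwidth saving only materializes for responses much larger than the headers, which is why the proposal restricts the delay to large responses.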
>> Besides my general dislike for aborted transactions becoming the norm
>> (see above), a "couple of RTTs" delay is a high price to pay, because
>> each RTT can already be a few seconds.
> Seconds? What kind of network is this?
Satellite (long distance), submarine radio (long wave, low bitrate), or
ad-hoc ground relay (multiple long-distance IP hops).

The RTT details on the latter two are mostly classified, but GEO-synchronous
satellites are publicly documented. A single ground-satellite-ground loop
can have close to 1 sec RTT at the IP level. With complications such as
triangular routing over a ground-to-ground uplink, that only gets worse.
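[Editor's note: a back-of-the-envelope check of the ~1 sec figure. Light-speed propagation alone accounts for roughly half of it; queuing, coding, and processing make up the rest. This assumes the satellite is directly overhead, so it is a lower bound (slant paths to non-equatorial stations are longer).]

```python
C_KM_PER_S = 299_792.458   # speed of light in vacuum, km/s
GEO_ALT_KM = 35_786        # geostationary altitude above the equator, km

# A request/response exchange crosses the ground-satellite gap four
# times: up and down, once in each direction.
propagation_rtt = 4 * GEO_ALT_KM / C_KM_PER_S
print(f"{propagation_rtt:.2f} s")  # ~0.48 s before any queuing/processing
```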
Also, satellites with routers aboard were due to go up sometime over the
last year, so there may now be ground-satellite-satellite-ground loops as
well. I'm not sure what the real numbers are there, but in the early days
figures like 2-3 seconds RTT were being discussed, mostly due to low-power
requirements, send/receive context switching (!!), or buffer bloat in
queues sized to cope with the bitrates. So it is reasonable to expect at
least some links with really crap performance.
Amos