On 25/07/2013 8:58 p.m., Henrik Nordström wrote:
Thu 2013-07-25 at 18:53 +1200, Amos Jeffries wrote:

Which problem specifically? that churn exists? that it can grow big +
churn? races between clients? or that letting it out to disk can cause
churn to be slooow?
In the design used by Squid-2 there is quite a bit of churn in the
x-vary object, and it's seen growing quite big in some extreme cases
("Vary: cookie" iirc).

Races between clients have been seen.

Also conflicts between x-vary updates and clients aborting, causing the
new x-vary object to also be discarded making Squid forget the map, but
that's a bug.

Proper handling of cache validations is the main concern.

I have been playing with the idea of locking these into memory cache, or
using a dedicated memory area just for them to avoid the speed issues. A
specialized store for them will also allow us to isolate the
secondary-lookup logic in that store's lookup process - it can identify
the variant and recurse down to other stores for the final selection
using the extra key bits.
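
To illustrate the two-level lookup described above, here is a minimal sketch, not Squid code. It assumes a small in-memory store that holds only the x-vary maps and plain dicts standing in for the other stores. All names (VariantMapStore, base_key, variant_key) are hypothetical, the SHA-256 hashing is an arbitrary choice for the example, and request header names are assumed to be lowercased already.

```python
import hashlib


def base_key(method, url):
    """Primary key: identifies the resource, not one particular variant."""
    return hashlib.sha256(f"{method} {url}".encode()).hexdigest()


def variant_key(primary, extra_bits):
    """Secondary key: the primary key plus the extra variant-selecting bits."""
    return hashlib.sha256(f"{primary}|{extra_bits}".encode()).hexdigest()


class VariantMapStore:
    """Hypothetical dedicated store that holds only the x-vary maps."""

    def __init__(self, backing_stores):
        self.maps = {}  # primary key -> header names the resource varies on
        self.backing_stores = backing_stores  # dicts standing in for real stores

    def _extra_bits(self, primary, request_headers):
        names = self.maps[primary]
        return ";".join(f"{n}={request_headers.get(n, '')}" for n in names)

    def add(self, method, url, vary_headers, request_headers, obj, store_index=0):
        primary = base_key(method, url)
        self.maps[primary] = [h.lower() for h in vary_headers]
        extra = self._extra_bits(primary, request_headers)
        self.backing_stores[store_index][variant_key(primary, extra)] = obj

    def lookup(self, method, url, request_headers):
        primary = base_key(method, url)
        if primary not in self.maps:
            return None  # no variant map known for this resource
        extra = self._extra_bits(primary, request_headers)
        key = variant_key(primary, extra)
        # Recurse down to the other stores using the extended key.
        for store in self.backing_stores:
            if key in store:
                return store[key]
        return None
```

The point of the split is that only VariantMapStore needs to know how variants are selected. The backing stores see an ordinary key.
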
What to use as permanent store?

The options were a disk-backed mmap, or something like rock store, or nothing at all (regenerate from existing cache scan on every startup).

And you want to store each 304 mapping response separately so a scan can
rebuild the map?

That should only be necessary between swap.state cleanups.


And what about stores not having an index? IIRC we have as a goal to
optionally not have an in-memory cache index at all.

Yes. This is not a completely rounded out idea yet. They would need something else. If they operate like rock and gradually build a new index on load, they would still work, adjusting the x-vary during that operation.


I believe that they can be generated from a disk scan and if necessary
we can add swap.state TLV entries for the missing x-vary meta details to
be reloaded quickly.
The x-vary meta is not very small. For each request header combination
it's
- request header contents

I don't see why those are necessary. At the x-vary level all that is necessary is the response details to be searched for in the request headers, i.e. if x-vary says a variant has "Content-Encoding:gzip" then search for "Accept-Encoding:gzip" in the request headers for a possible match.
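
A minimal sketch of that matching direction, assuming each variant records only its Content-Encoding and a store key. The function names and the dict layout are hypothetical. It is deliberately simplified: "*" and "identity;q=0" are not handled, and an unencoded variant is treated as always acceptable.

```python
def accepted_codings(request_headers):
    """Return the set of codings the client accepts, dropping any with q=0."""
    raw = request_headers.get("accept-encoding", "")
    codings = set()
    for item in raw.split(","):
        parts = [p.strip() for p in item.split(";")]
        name = parts[0].lower()
        if not name:
            continue
        q = 1.0
        for param in parts[1:]:
            k, _, v = param.partition("=")
            if k.strip().lower() == "q":
                try:
                    q = float(v.strip())
                except ValueError:
                    q = 0.0  # treat an unparsable q-value as "not acceptable"
        if q > 0:
            codings.add(name)
    return codings


def select_variant(variants, request_headers):
    """Pick the first variant whose recorded Content-Encoding the request accepts.

    variants: list of dicts with 'content_encoding' (None if unencoded)
    and 'key' (whatever identifies the stored object).
    """
    ok = accepted_codings(request_headers)
    for variant in variants:
        enc = variant["content_encoding"]
        if enc is None or enc.lower() in ok:
            return variant["key"]
    return None
```

Nothing from the original request is stored here, only the response detail that made the variant distinct.
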

- timing details for validation

Only if you are doing validation using the x-vary alone. If we select the variant and then go deeper before revalidation, we have the full variant response headers to work with, including those.

- which object variant to map to

That would be either the lookup key pattern adjustment/addition or the explicit store+fileno details. The latter is slightly risky, but doable, and much faster with less risk when the x-vary is built from a store-dir load scan instead of simply loaded from a long-term cache.

And there is also a map of known object variants and their ETag values
and also Content-Encoding, the latter to work around dynamic gzip
brainfart in many major web servers including Apache.

If you have read the Key header specification this should be clearer. I am thinking the x-vary acts like a list of ETag values (for ETag match), Digest (for Digest match), and Key patterns (for Key/Vary matches) with Vary being a special vague case of Key. It stores a set of variants using *only* the exact details causing that variant to exist in the set, none of the request-header garbage like entire Accept: header or ignored field-values.
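
As a rough sketch of what such an x-vary entry could look like, assuming one small record per variant. The class and field names are hypothetical, weak ETags (W/ prefix) are not handled, and the plain substring test in by_key is only a stand-in for real Key pattern matching, which this does not implement.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Variant:
    store_key: str                 # how to find the stored object
    etag: Optional[str] = None     # for ETag matches
    digest: Optional[str] = None   # for Digest matches
    # Only the details that caused this variant to exist:
    # request header name -> token that must be present.
    key_fields: dict = field(default_factory=dict)


@dataclass
class XVary:
    variants: list = field(default_factory=list)

    def by_etag(self, if_none_match):
        """Variants whose ETag is listed in an If-None-Match value."""
        wanted = {tag.strip() for tag in if_none_match.split(",")}
        return [v for v in self.variants if v.etag in wanted]

    def by_key(self, request_headers):
        """Variants whose recorded tokens all appear in the request headers."""
        def matches(variant):
            return all(
                token in request_headers.get(name, "").lower()
                for name, token in variant.key_fields.items()
            )
        return [v for v in self.variants if matches(v)]
```

A variant with no key_fields matches any request, which is the case of a resource with a single default representation.
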

Even the Cookie case should be greatly reduced with the use of Key header. Although that will take some time to occur. We can avoid that a bit by hacking an omit for Vary:Cookie sites just like Varnish does, unless they use Key to specify *which* cookie detail to inspect.


That would make them churn particularly badly on
startup, but avoid the necessity to store anywhere long-term, and help
detect obsolete variants undeleted from disk.
The total system churn at startup is already majorly bad with both ufs
and rock stores. Caches are growing quite large today with current disk
& memory prices.

Sigh. Yeah. The auto-generated x-vary is a nice dream. It is just an optimization on top of the restructured layout though. The existing 2.7 design of static cached x-vary mostly works, even if it does have a few issues.

Amos
