For persistent storage, just ignore the TTL and throw away the segment with the oldest object, refreshed or not.

I am of the opinion that if a method exists to verify the object (Last-Modified or ETag), we shouldn't ever expire it. The TTL is just a setting for when we should refresh it. Of course, standard LRU should still apply.

I am also less worried about the reader/writer scenario for the headers, since by spec you shouldn't update any headers that aren't Expires/Cache-Control (and, weirdly enough, Vary).

Artur

On Sep 27, 2010, at 6:50 AM, Nils Goroll wrote:

Hi,

I'd like to add a brief update to the following section summarizing my
understanding after talking to phk today, who seems to be really busy and
probably will not find time to respond before the weekend:

To allow multiple cache objects to share body data, we want to add
reference counters to struct storage following the example of the
existing implementation for objects (HSH_Ref(), HSH_Unref() etc).
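
As a rough illustration of the reference counting described above, here is a sketch in C. struct body_storage, body_new(), body_ref() and body_unref() are hypothetical stand-ins, not the actual struct storage or HSH_Ref()/HSH_Unref() code, and the sketch ignores locking, which a real multi-threaded implementation would need.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a body storage chunk. */
struct body_storage {
	unsigned	refcnt;		/* number of objects sharing this body */
	size_t		len;
	unsigned char	*ptr;
};

/* Allocate a body holding a copy of data; caller owns one reference. */
static struct body_storage *
body_new(const void *data, size_t len)
{
	struct body_storage *st = calloc(1, sizeof *st);

	if (st == NULL)
		return (NULL);
	st->ptr = malloc(len ? len : 1);
	if (st->ptr == NULL) {
		free(st);
		return (NULL);
	}
	if (len > 0)
		memcpy(st->ptr, data, len);
	st->len = len;
	st->refcnt = 1;
	return (st);
}

/* Take an additional reference, e.g. for a refreshed object. */
static struct body_storage *
body_ref(struct body_storage *st)
{
	assert(st != NULL && st->refcnt > 0);
	st->refcnt++;
	return (st);
}

/* Drop one reference and free the body on the last one.
 * Returns the number of references remaining. */
static unsigned
body_unref(struct body_storage *st)
{
	unsigned r;

	assert(st != NULL && st->refcnt > 0);
	r = --st->refcnt;
	if (r == 0) {
		free(st->ptr);
		free(st);
	}
	return (r);
}
```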

Though I still believe this should be pretty straightforward for all other
storages, it won't be for -spersistent. After studying the code for an hour or
so, my understanding is the following:

Persistent storage segments the cache (see
http://www.varnish-cache.org/trac/wiki/ArchitecturePersistentStorage) and won't
re-use segments for new objects unless they are completely empty (no live
objects). Right now, this relies on the LRU and TTL-based expiry to eventually
clean out segments before running out of space. Having multiple references to
the same object in persistent storage (and updating it again and again) would
effectively lead to more and more segments being kept from becoming empty.
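
To make the problem concrete, here is a simplified sketch in C. struct segment and both functions are hypothetical and far simpler than the real persistent stevedore. The point is only that a single live reference keeps an entire segment from being re-used.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of a persistent-storage segment. */
struct segment {
	unsigned	live_objects;	/* objects still referencing data here */
};

/* A segment can take new objects only once nothing in it is live. */
static int
segment_reusable(const struct segment *sg)
{
	assert(sg != NULL);
	return (sg->live_objects == 0);
}

/* Count the segments currently available for new objects. */
static size_t
segments_reusable(const struct segment *segs, size_t n)
{
	size_t i, free_segs = 0;

	assert(segs != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (segment_reusable(&segs[i]))
			free_segs++;
	return (free_segs);
}
```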

I believe what is really needed is additional space management for the
persistent storage. In a first step, when running short of storage, objects
could get nuked from the smallest segment. In a second step, the mechanics to
copy live objects from one segment to another could be implemented. Ideally,
this could be VCL-controlled ("should we rather nuke the object or bother
copying it?"). But I see some complications for both, mainly that storage would
need to know which objects are referencing it in order to update those (which
sounds wrong).

As long as we don't have any of this, I suggest two alternative temporary solutions:

a) If an object getting refreshed lives in persistent storage, we'll simply copy it. Actually, the existing Rackspace implementation does this. This is far from optimal, but won't make much of a difference for small objects and is still much more efficient than re-fetching the object from the backend like today, so we
shouldn't see any performance regression.

For other stevedores, we'll use the reference counter.

b) Add reference counters to persistent storage, too, and simply live with the cache fragmentation issue. Those using persistent storage would be advised not
to use cache refresh.

At this point, I'd favor a).
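
Option a) could look roughly like the following C sketch. All names here (enum stv_kind, struct body, body_for_refresh()) are hypothetical and chosen for illustration. The sketch uses malloc() in place of a real stevedore allocation and leaves out locking.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stevedore kinds; not Varnish's real identifiers. */
enum stv_kind { STV_KIND_MALLOC, STV_KIND_FILE, STV_KIND_PERSISTENT };

struct body {
	enum stv_kind	kind;
	unsigned	refcnt;
	size_t		len;
	unsigned char	*ptr;
};

/*
 * Return the body a refreshed object should point at.
 * Persistent bodies are copied so the old segment can still drain;
 * all other bodies are shared through the reference counter.
 * Returns NULL if the copy cannot be allocated.
 */
static struct body *
body_for_refresh(struct body *old)
{
	struct body *cp;

	assert(old != NULL && old->refcnt > 0);
	if (old->kind != STV_KIND_PERSISTENT) {
		old->refcnt++;		/* share the existing body */
		return (old);
	}
	cp = malloc(sizeof *cp);
	if (cp == NULL)
		return (NULL);
	cp->ptr = malloc(old->len ? old->len : 1);
	if (cp->ptr == NULL) {
		free(cp);
		return (NULL);
	}
	if (old->len > 0)
		memcpy(cp->ptr, old->ptr, old->len);
	cp->kind = old->kind;
	cp->len = old->len;
	cp->refcnt = 1;			/* the copy has a single owner */
	return (cp);
}
```

The old persistent body keeps its own reference count untouched, so it expires with the object it originally belonged to and does not pin its segment any longer than it does today.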


Please note that all of this is my personal understanding. I am posting these thoughts in the hope that my understanding is correct and I'd really appreciate
corrections if it's not.

Thank you, Nils

_______________________________________________
varnish-dev mailing list
[email protected]
http://lists.varnish-cache.org/mailman/listinfo/varnish-dev

