For persistent storage, just ignore the TTL and throw away the segment
with the oldest object, refreshed or not.
I am of the opinion that if a method exists to verify the object
(Last-Modified or ETag), we shouldn't ever expire it. The TTL is just a
setting for when we should refresh it. Of course, standard LRU should
still apply.
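
To make the "TTL is a refresh timer" idea concrete, here is a minimal
sketch. The struct and function names are hypothetical and are not taken
from the Varnish source; it only assumes that each object records when it
was last validated and whether it carries a validator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a cached object (not Varnish's struct object). */
struct cached_obj {
	double entered;        /* time of last fetch or validation, in seconds */
	double ttl;            /* freshness lifetime, in seconds */
	bool   has_validator;  /* true if Last-Modified or ETag is available */
};

enum ttl_action { SERVE_FRESH, REVALIDATE, EXPIRE };

/*
 * Decide what to do with an object at time `now`.
 * An object past its TTL is only expired when it cannot be revalidated;
 * otherwise the TTL merely triggers a conditional refresh.
 */
static enum ttl_action
ttl_decide(const struct cached_obj *o, double now)
{
	assert(o != NULL);
	if (now < o->entered + o->ttl)
		return (SERVE_FRESH);
	return (o->has_validator ? REVALIDATE : EXPIRE);
}
```

LRU eviction is deliberately left out of this sketch; it would still remove
objects independently of the decision above.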
I am also less worried about the reader/writer scenario for the
headers, since by spec you shouldn't update any headers that aren't
Expires/Cache-Control (and weirdly enough, Vary).
Artur
On Sep 27, 2010, at 6:50 AM, Nils Goroll wrote:
Hi,
I'd like to add a brief update to the following section summarizing my
understanding after talking to phk today, who seems to be really busy
and probably will not find time to respond before the weekend:
To allow multiple cache objects to share body data, we want to add
reference counters to struct storage following the example of the
existing implementation for objects (HSH_Ref(), HSH_Unref() etc).
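
A minimal sketch of what such a counter could look like. The struct and
function names below are hypothetical stand-ins, not the real struct
storage or the HSH_* API, and locking is omitted even though a real
implementation would need it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct storage with an added reference counter. */
struct body_storage {
	unsigned       refcnt;  /* number of cache objects sharing this body */
	size_t         len;     /* body length in bytes */
	unsigned char *ptr;     /* body data */
};

/* Allocate a body with one initial reference; returns NULL on failure. */
static struct body_storage *
body_alloc(size_t len)
{
	struct body_storage *st = calloc(1, sizeof *st);

	if (st == NULL)
		return (NULL);
	st->ptr = malloc(len > 0 ? len : 1);
	if (st->ptr == NULL) {
		free(st);
		return (NULL);
	}
	st->len = len;
	st->refcnt = 1;
	return (st);
}

/* Take an additional reference, e.g. for a refreshed object. */
static void
body_ref(struct body_storage *st)
{
	assert(st != NULL && st->refcnt > 0);
	st->refcnt++;
}

/*
 * Drop one reference and clear the caller's pointer.
 * Frees the body when the last reference goes away.
 * Returns the number of references that remain.
 */
static unsigned
body_unref(struct body_storage **stp)
{
	struct body_storage *st;

	assert(stp != NULL && *stp != NULL);
	st = *stp;
	*stp = NULL;
	assert(st->refcnt > 0);
	if (--st->refcnt > 0)
		return (st->refcnt);
	free(st->ptr);
	free(st);
	return (0);
}
```

With this, a refreshed object would call body_ref() on the old object's
body instead of fetching or copying it, and each object would call
body_unref() when it is destroyed.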
Though I still believe this should be pretty straightforward for all
other storages, it won't be for -spersistent. After studying the code
for an hour or so, my understanding is the following:
Persistent storage segments the cache (see
http://www.varnish-cache.org/trac/wiki/ArchitecturePersistentStorage)
and won't re-use segments for new objects unless they are completely
empty (no live objects). Right now, this relies on the LRU and TTL based
expiry to eventually clean out segments before running out of space.
Having multiple refs to the same obj in persistent storage (and updating
it again and again) would effectively lead to more and more segments
being kept from becoming empty.
I believe what is really needed is additional space management for the
persistent storage. In a first step, when running short of storage,
objects could get nuked from the smallest segment. In a second step, the
mechanics to copy live objects from one segment to another could be
implemented. Ideally, this could be VCL controlled ("should we rather
nuke the object or bother copying it?"). But I see some complications
for both, mainly that storage would need to know which objects are
referencing it in order to update those (sounds wrong).
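
The first step could be sketched roughly as follows. This assumes that
"smallest segment" means the non-empty segment holding the fewest live
bytes, and that per-segment counters exist; both the struct and the
function are hypothetical and not part of the current persistent
stevedore.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-segment accounting. */
struct seg_stat {
	size_t live_objs;   /* number of live objects in the segment */
	size_t live_bytes;  /* bytes occupied by those live objects */
};

/*
 * Pick the segment whose objects should be nuked first: the non-empty
 * segment with the fewest live bytes, since emptying it costs the least
 * cached data. Empty segments are skipped because they are already
 * reusable. Returns the segment index, or -1 if every segment is empty.
 */
static int
seg_pick_victim(const struct seg_stat *segs, size_t nsegs)
{
	int best = -1;

	assert(segs != NULL || nsegs == 0);
	for (size_t i = 0; i < nsegs; i++) {
		if (segs[i].live_objs == 0)
			continue;
		if (best < 0 || segs[i].live_bytes < segs[(size_t)best].live_bytes)
			best = (int)i;
	}
	return (best);
}
```

The second step (copying live objects out instead of nuking them) would
use the same selection, but it runs into the back-reference problem
described above.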
As long as we don't have any of this, I suggest two alternative
temporary solutions:
a) If an object getting refreshed lives in persistent storage, we'll
simply copy it. Actually, the existing Rackspace implementation does
this. This is far from optimal, but won't make much of a difference for
small objects and is still much more efficient than re-fetching the
object from the backend like today, so we shouldn't see any performance
regression.
For other stevedores, we'll use the reference counter.
b) Add reference counters to persistent storage, too, and simply live
with the cache fragmentation issue. Those using persistent storage would
be advised not to use cache refresh.
At this point, I'd favor a).
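
To illustrate option a), here is a rough sketch of the refresh path:
copy the body if it lives in persistent storage, share it by reference
otherwise. All names are hypothetical (this is not the Rackspace patch or
the stevedore API), and malloc() stands in for allocating from the
persistent stevedore.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical storage kinds; only the persistent one is treated specially. */
enum store_kind { SK_MALLOC, SK_FILE, SK_PERSISTENT };

/* Hypothetical body descriptor with a reference counter. */
struct obj_body {
	enum store_kind kind;
	unsigned        refcnt;
	size_t          len;
	unsigned char  *ptr;
};

/*
 * Return the body a refreshed object should point at.
 * Persistent storage: make a private copy, so the old segment can still
 * drain and become empty. Everything else: share the existing body and
 * bump its reference counter. Returns NULL if the copy cannot be allocated.
 */
static struct obj_body *
refresh_body(struct obj_body *old)
{
	struct obj_body *copy;

	assert(old != NULL && old->refcnt > 0);
	if (old->kind != SK_PERSISTENT) {
		old->refcnt++;
		return (old);
	}
	copy = malloc(sizeof *copy);
	if (copy == NULL)
		return (NULL);
	copy->ptr = malloc(old->len > 0 ? old->len : 1);
	if (copy->ptr == NULL) {
		free(copy);
		return (NULL);
	}
	memcpy(copy->ptr, old->ptr, old->len);
	copy->kind = SK_PERSISTENT;
	copy->refcnt = 1;
	copy->len = old->len;
	return (copy);
}
```

The cost of the copy grows with the body size, which is why this is only
meant as a temporary solution.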
Please note that all of this is my personal understanding. I am posting
these thoughts in the hope that my understanding is correct and I'd
really appreciate corrections if it's not.
Thank you, Nils
_______________________________________________
varnish-dev mailing list
[email protected]
http://lists.varnish-cache.org/mailman/listinfo/varnish-dev