Hi,

I'm ashamed to reply so late; sorry, I got lost in other stuff on Monday. I'll combine my replies:

On Mon, Nov 8, 2010 at 08:17, Zachary Zolton <[email protected]> wrote:
Of course, you'd be stuck with manually tracking the types of URLs to
be purged, so I haven't been too eager to try it out yet...

Yes, that's precisely what I'd like to avoid. It's not _that_ hard, of course, and Couch provides an awesome entry point for invalidation in _changes or update_notifier, but still...
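
Just to show what I mean by "not that hard": a rough sketch (certainly not production code) of the _changes-as-invalidation-entry-point idea -- follow the continuous _changes feed and send an HTTP PURGE to the cache for every changed document. The hosts, ports and the "mydb" database are made-up examples, and the proxy is assumed to be configured to accept PURGE requests.

import json
from urllib.request import Request, urlopen
from urllib.error import HTTPError

COUCH = "http://localhost:5984/mydb"   # hypothetical CouchDB database
CACHE = "http://localhost:6081"        # hypothetical caching proxy

feed = urlopen(COUCH + "/_changes?feed=continuous&heartbeat=30000")
for raw in feed:
    line = raw.strip()
    if not line:
        continue  # heartbeat keep-alive, nothing to do
    change = json.loads(line)
    if "id" not in change:
        continue  # e.g. the trailing "last_seq" line
    # Invalidate the cached copy of this document's URL.
    purge = Request("%s/mydb/%s" % (CACHE, change["id"]), method="PURGE")
    try:
        urlopen(purge)
    except HTTPError:
        pass  # 404/405 etc. -- URL was never cached or PURGE refused

The annoying part is exactly what Zachary says above: the script has to know which URLs (documents, views, show/list functions, ...) correspond to which changes, and that bookkeeping is manual.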

On 9.Nov, 2010, at 24:42 , Robert Newson wrote:
I think it's clear that caching via ETag for documents is close to
pointless (the work to find the doc in the b+tree is over 90% of the
work and has to be done for GET or HEAD).

Yes. I wonder if there's any room for improvement on Couch's part. In any case, when we're in the "every 1 request to the cache means 1 request to the database" situation, "caching" is truly pointless.

On Mon, Nov 8, 2010 at 11:11 PM, Zachary Zolton <[email protected]> wrote:
That makes sense: if every request to the caching proxy checks the
etag against CouchDB via a HEAD request—and CouchDB currently does
just as much work for a HEAD as it would for a GET—you're not going to
see an improvement.

Yes. But that's not the only scenario imaginable. I'd repeat what I wrote to the Varnish mailing list [http://lists.varnish-cache.org/pipermail/varnish-misc/2010-November/004993.html]:

1. The cache can "accumulate" requests to a certain resource for a certain (configurable?) period of time (1 second, 1 minute, ...) and ask the backend less often -- accelerating throughput.

2. The cache can return "possibly stale" content immediately and check with the backend afterwards (in the background, when the n-th next request comes, ...) -- accelerating response time.

It was my impression that at least the first option is doable with Varnish (via some playing with the grace period), but I may be severely mistaken.
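
To make the first scenario concrete, here is a toy sketch (nothing more -- the URL and interval are made up, and a real proxy like Varnish or Squid would do this with proper locking and per-URL state): no matter how many requests arrive, the backend is asked at most once per interval, and everyone else gets the last known copy.

import time
from urllib.request import urlopen

BACKEND_URL = "http://localhost:5984/mydb/some_doc"  # hypothetical
REFRESH_INTERVAL = 1.0  # seconds between backend hits

_cached_body = None
_fetched_at = 0.0

def get(url=BACKEND_URL):
    global _cached_body, _fetched_at
    now = time.monotonic()
    if _cached_body is None or now - _fetched_at >= REFRESH_INTERVAL:
        _cached_body = urlopen(url).read()  # one backend round trip
        _fetched_at = now
    return _cached_body                     # everyone else: cached copy

With a thousand clients per second calling get(), Couch would see roughly one GET per second instead of a thousand -- exactly the kind of accumulation I'm after.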

On Mon, Nov 8, 2010 at 5:04 PM, Randall Leeds <[email protected]> wrote:
If you have a custom caching policy whereby
the proxy will only check the ETag against the authority (Couch) once per (hour, day, whatever) then you'll get a speedup. But if your proxy
performs a HEAD request for every incoming request you will not see
much performance gain.

P-r-e-c-i-s-e-l-y. If we could tune Varnish or Squid to not be so "dumb" and to check with the backend based on some config like this, we could use it for proper self-invalidating caching. (As opposed to TTL-based caching, which brings back the manual expiration issues discussed above.) Unfortunately, at least based on the answers I got, this just doesn't seem to be possible.
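
Expressed as a sketch, the policy Randall describes would look something like this (a single-entry toy cache; the period and names are illustrative only): keep the body together with its ETag, and only revalidate with a conditional GET when the entry is older than the configured period -- a 304 still costs Couch a lookup, but only once per period instead of once per request.

import time
from urllib.request import Request, urlopen
from urllib.error import HTTPError

REVALIDATE_AFTER = 3600.0  # seconds; "once per hour"

_entry = {"body": None, "etag": None, "checked_at": 0.0}

def get(url):
    now = time.monotonic()
    if _entry["body"] is not None and now - _entry["checked_at"] < REVALIDATE_AFTER:
        return _entry["body"]                      # fresh enough, no backend hit
    headers = {}
    if _entry["etag"]:
        headers["If-None-Match"] = _entry["etag"]  # conditional GET
    try:
        resp = urlopen(Request(url, headers=headers))
        _entry["body"] = resp.read()
        _entry["etag"] = resp.headers.get("ETag")
    except HTTPError as err:
        if err.code != 304:
            raise
        # 304 Not Modified: keep the cached body, just reset the clock.
    _entry["checked_at"] = now
    return _entry["body"]

That's the behaviour I'd love to get out of an off-the-shelf proxy via configuration alone, rather than writing it myself.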

On Mon, Nov 8, 2010 at 12:06, Randall Leeds <[email protected]> wrote:
It'd be nice if the "Couch is HTTP and can leverage existing caches and tools"
talking point truly included significant gains from etag caching.

P-R-E-C-I-S-E-L-Y. This is, for me, the most important and most embarrassing issue in this discussion. The O'Reilly book has it all over the place: http://www.google.com/search?q=varnish+OR+squid+site:http://guide.couchdb.org . Whenever you tell someone who really knows HTTP caches "Dude, Couch is HTTP and can leverage existing caches and tools", you can and will be laughed at -- you can get away with mentioning expiration-based caching and "simple" invalidation via _changes and such, but... Still embarrassing.

I'll try to do more research in this area when time permits. I can't believe there's _not_ some arcane Varnish config option to squeeze out more performance, e.g. in the "highly concurrent requests" scenario.

Thanks for all the replies!

Karel
