On Thu, Mar 19, 2020 at 11:12 AM Dridi Boukelmoune <dridi at varni.sh> wrote:
>
> Not quite!
>
> ttl+grace+keep defines how long an object may stay in the cache
> (barring any form of invalidation).
>
> The grace I'm referring to is beresp.grace,

Well, when I wrote "if ttl + grace + keep is a low value set in 
vcl_backend_response", I was talking about beresp.grace, as in beresp.ttl + 
beresp.grace + beresp.keep.


> it defines how long we might serve a stale object while a background fetch is 
> in progress.

I'm not really seeing how that is different from what I said. If beresp.ttl + 
beresp.grace + beresp.keep add up to only 10s, then a req.grace of, say, 24h 
wouldn't do much good, right? Or maybe I just misunderstood what you were 
saying here.
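
To make it concrete, here is roughly what I mean (an untested sketch, with 
the numbers purely illustrative):

    sub vcl_backend_response {
        # The object lives at most ttl + grace + keep = 10s after the
        # fetch, no matter what the client side asks for.
        set beresp.ttl = 5s;
        set beresp.grace = 5s;
        set beresp.keep = 0s;
    }

    sub vcl_recv {
        # req.grace can only narrow the grace stored on the object;
        # it cannot extend the 10s lifetime set above.
        set req.grace = 24h;
    }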


> As always in such cases it's not black or white. Depending on the
> nature of your web traffic you may want to lean toward always
> serving something, or never serving anything stale. For example, live
> "real time" traffic may favor failing some requests over serving stale
> data.

Well, I was thinking of the typical "regular" small/medium website: blogs, 
corporate profiles, small-town news sites, etc.


> I agree that on paper it sounds simple, but in practice it might be
> harder to get right.

OK. But what if I implemented it this way in my VCL (see the sketch after the list)?

* In vcl_backend_response, set beresp.grace to 72h if status < 400
* In vcl_backend_error and vcl_backend_response (when status >= 500),
  return (abandon)
* In vcl_synth, restart the request, with a special req header set
* In vcl_recv, if this req header is present, set req.grace to 72h
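
In VCL, roughly (an untested sketch; the header name X-Stale-If-Error and the 
0s first-pass grace are my own placeholders, not settled choices):

    sub vcl_recv {
        if (req.restarts == 0) {
            # Never trust this header when it arrives from the client.
            unset req.http.X-Stale-If-Error;
        }
        if (req.http.X-Stale-If-Error) {
            # Second pass after a failed fetch: accept stale objects.
            set req.grace = 72h;
        } else {
            # First pass: insist on fresh content, so a stale hit
            # triggers a synchronous fetch that can fail and restart.
            set req.grace = 0s;
        }
    }

    sub vcl_backend_response {
        if (beresp.status < 400) {
            set beresp.grace = 72h;
        }
        if (beresp.status >= 500) {
            return (abandon);
        }
    }

    sub vcl_backend_error {
        return (abandon);
    }

    sub vcl_synth {
        if (resp.status == 503 && req.restarts == 0) {
            set req.http.X-Stale-If-Error = "1";
            return (restart);
        }
    }

(In reality I would also guard against restart loops, and maybe keep a short 
first-pass grace instead of 0s, but this is the gist of it.)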

Wouldn't this work? If not, why not? If it would, is there something else 
problematic about it? Of course I would have to handle some special cases, and 
maybe check req.restarts and such, but I'm talking about the thought process as 
a whole here. I might be missing something, but I think I need someone to 
point it out to me, because I just don't get why this would be wrong.


> Is it hurting you that less frequently requested contents don't stay
> in the cache?

If it results in people seeing error pages when stale content would be 
perfectly fine for them, then yes.

And these less frequently requested pages might still be part of a group of 
pages that all result in an error from the backend (while the health probe still 
returns 200 OK). So while one individual page might be visited infrequently, the 
total number of visits to these kinds of pages might be high.

Let's say that there are 3,000 unique (and cacheable) pages that are visited 
during an average weekend, all of them in the Varnish cache, but 2,000 of them 
with stale content. Now let's say that 50% of all pages start returning 
500 errors from the backend on a Friday evening. That would mean that roughly 
1,000 of these stale pages would show the error to end users 
during that weekend. I would much prefer that Varnish keep serving them 
stale content, so I could look into the problem on Monday morning.


> Another option is to give Varnish a high TTL (and give clients a lower
> TTL) and trigger a form of invalidation directly from the backend when
> you know a resource changed.

Well, that is perfectly fine for pages that have a one-to-one mapping between 
the page (i.e. the URL) and the content being updated. But most pages in our 
setup contain a mix of multiple pieces of content, and it is not possible to 
know beforehand whether a specific piece of content will contribute to the 
result of a specific page. That is especially true for new content that might 
be included in multiple pages already in the cache.

The only way to handle that in a foolproof way, as far as I can tell, is to 
invalidate all pages (since any page can contain this kind of content) the 
moment any object is updated, along the lines of the sketch below. But that 
would pretty much clear the cache constantly. And we would still have to handle 
the case where the cache is invalidated for a page that gives a 500 error when 
Varnish tries to fetch it.
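
Something like this, I imagine (again an untested sketch; the BAN method is my 
assumption, and I left out the ACL check you would want in practice):

    sub vcl_recv {
        if (req.method == "BAN") {
            # Invalidate every cached page, since any of them may
            # embed the piece of content that just changed.
            ban("req.url ~ /");
            return (synth(200, "Banned everything"));
        }
    }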
