Varnish returning synthetic 500 error even though it has stale content it should serve. But only seems to happen during/after a burst of traffic

Batanun B Thu, 16 Dec 2021 12:13:19 -0800

Hi,

One of our websites usually has quite a low but steady stream of visitors, but 
occationally we get a sudden surge of requests over a very short time period 
(about 1-2 minutes). Varnish seems to handle the load fine, but the backends 
struggle with this. But I have noticed that Varnish doesn't serve the stale 
cached data, but instead shows our synthetic 500 page. This is true even for 
the start page, that definitely existed in the cache. And we have a grace 
period of 10 days, so I it's quite annoying that we can't simply serve the 
stale cached data during this short period.


I have tried picturing the entire flow, following the logic in the vcl, but I 
can't see what we do wrong. And annoyingly I can't reproduce the problem 
locally by simply shutting down the backends (or setting them to unhealthy), 
because whenever I do that I get the stale content served just as intended. 
Could the sheer volume itself cause this, making it impossible to reproduce by 
simply fetching the page a few times in the browser before and after disabling 
the backends? Or is there some edge case that I haven't thought of that is 
causing this?

A simpliedfied version of our vcl is included below, with only the relevant 
parts. But unless I have some blatent problem with the vcl, I think it would be 
good if I learned how to troubleshoot this using the Varnish tools, like 
varnishlog. So that next time this happens, I can use varnishlog etc to see 
what's happening.

Is it possible using varnishlog to find the very first request for a specific 
path ("/" in our case) where it returned the synthetic 500 and put it in the 
cache? And is it also possible to find the request just before that one, for 
the same path. If I could extract those two requests (including all the 
metadata in those transactions) from the djungle of thousands of requests, then 
maybe I can find some explanation why the second request doesn't return the 
stale data.

-----------------------------

sub vcl_hit {
    if (obj.ttl > 0s) {
        // Regular cache hit
        return (deliver);
    } else if (req.restarts == 0 && std.healthy(req.backend_hint)) {
        // Graced cache hit, first attempt.
        // Force cache miss to trigger fetch in foreground (ie synchronous 
fetch).
        set req.http.X-forced-miss = "true";
        return (miss);
    } else {
        // Graced cache hit, previous attempts failed (or backend unhealthy). 
Let the fetch happen in the background (ie asynchronous fetch), and return the 
cached value.
        return (deliver);
    }
}

sub vcl_recv {
    if (req.http.X-cache-pass == "true") {
        return(pass);
    }

    set req.grace = 240h; // 10 day grace
}

sub vcl_backend_response {
    if (bereq.http.X-cache-pass != "true") {
  if (beresp.status < 400) {
     set beresp.grace = 240h;
     set beresp.ttl = 30m; // Cache invalidation in the form of xkey softpurge 
can put objects into grace before the TTL is past.
        } else if (beresp.status < 500) {
            set beresp.ttl = 1m;
            return (deliver);
        } else if (beresp.status >= 500) {
            // In some cases we want to abandon the backend request on 
500-errors, since it otherwise would overwrite the cached object that still is 
usefull for grace.
            // This will make it jump to vcl_synth with a 503 status. There it 
will restart the request.
            if (bereq.is_bgfetch) {
                // In grace period. Abandoning 5xx request, since it otherwise 
would overwrite the cached object that still is usefull for grace
                return (abandon);
            } else if (bereq.http.X-forced-miss == "true") {
                return (abandon);
            }
            // Non background fetch, ie no grace period (and no stale content 
available). Cache the error page for a few seconds.
            set beresp.ttl = 15s;
            return (deliver);
        }
    }
}

sub vcl_synth {
    if (req.http.X-forced-miss == "true" && resp.status >= 500) {
        return (restart);
    }
}

sub vcl_backend_error {
    if (bereq.is_bgfetch || bereq.http.X-forced-miss == "true") {
        // By calling abandon, it will jump to vcl_synth with a 503 status. 
There it will restart the request.
        return (abandon);
    }
    set beresp.http.Content-Type = "text/html; charset=utf-8";
    set beresp.ttl = 15s;
    synthetic(std.fileread("/etc/varnish/500.html"));
    return (deliver);
}

-----------------------------

_______________________________________________
varnish-misc mailing list
[email protected]
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc

Varnish returning synthetic 500 error even though it has stale content it should serve. But only seems to happen during/after a burst of traffic

Reply via email to