Re: backend timeouts/503s vs grace cache

Guillaume Quintard Wed, 15 Nov 2017 07:00:06 -0800

Once you set the backend to sick, you will probably still have requests
in-flight to this backend, so you don't want to restart your SSH tunnel
just yet. Instead, monitor the VBE.*.BACKENDNAME.conn lines, and wait for
them to drop to 0, then you can reset the SSH tunnel.
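
As a rough illustration of the drain step, here is a minimal sketch, not a tested procedure. It assumes the backend is named "tunnel" in the VCL, that varnishadm and varnishstat are on the PATH with access to the running instance, and that summing the counters across all loaded VCLs is what you want. The function names open_conns and drain_backend are made up for this example.

```shell
#!/bin/sh
# Sketch: drain a Varnish backend before restarting its SSH tunnel.
# The backend name passed in is a placeholder for your own backend.

# Sum the second column of `varnishstat -1` output read on stdin.
# Each input line is expected to look like:
#   VBE.boot.tunnel.conn    3    .    Concurrent connections to backend
open_conns() {
    awk '{ total += $2 } END { print total + 0 }'
}

# Mark the backend sick, then poll until no connections remain.
# Note: if varnishstat prints nothing (for example because it failed),
# the count reads as 0, so check the counter by hand the first time.
drain_backend() {
    backend="$1"
    varnishadm backend.set_health "$backend" sick || return 1
    while [ "$(varnishstat -1 -f "VBE.*.${backend}.conn" | open_conns)" -gt 0 ]; do
        sleep 1
    done
}
```

A possible call would be "drain_backend tunnel", followed by your own tunnel restart command and then "varnishadm backend.set_health tunnel auto" (or "healthy") once the tunnel is confirmed up.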


-- 
Guillaume Quintard

On Wed, Nov 15, 2017 at 3:48 PM, Andrei <[email protected]> wrote:

> What do you mean exactly when you say "drain the connections"? :D
>
> On Wed, Nov 15, 2017 at 8:46 AM, Guillaume Quintard <
> [email protected]> wrote:
>
>> Oh, then your life should be easier! Don't forget to drain the
>> connections; varnishstat will give you the number of connections open
>> to any backend.
>>
>> --
>> Guillaume Quintard
>>
>> On Wed, Nov 15, 2017 at 3:42 PM, Andrei <[email protected]> wrote:
>>
>>> Thanks for the pointers! The tunnel setup is pretty flexible so I'll go
>>> ahead and mark the backend sick before restarting the tunnel, then healthy
>>> once confirmed up.
>>>
>>>
>>> On Wed, Nov 15, 2017 at 8:34 AM, Guillaume Quintard <
>>> [email protected]> wrote:
>>>
>>>> You can wait until vcl_deliver and do a restart, possibly adding a
>>>> marker saying "don't bother with the backend, serve from cache".
>>>>
>>>> The actual solution would be to mark the backend as sick before
>>>> restarting the ssh tunnel, and draining the connections, but I guess that's
>>>> not an option here, is it?
>>>>
>>>> --
>>>> Guillaume Quintard
>>>>
>>>> On Wed, Nov 15, 2017 at 3:29 PM, Andrei <[email protected]> wrote:
>>>>
>>>>> Hi Guillaume,
>>>>>
>>>>> Thanks for getting back to me
>>>>>
>>>>> On Wed, Nov 15, 2017 at 8:11 AM, Guillaume Quintard <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Why bother with the complex vcl_hit? Since you are saying that the
>>>>>> cache is regularly primed, I don't really see the added value.
>>>>>>
>>>>> I was mainly going by an example a while back, and not all sites/urls
>>>>> are primed in the same manner. It just stuck in the conf ever since
>>>>>
>>>>>
>>>>>>
>>>>>> (note, after a quick glance at it, I think it could just be a race
>>>>>> condition where the backend appears up in vcl_hit and is down by the time
>>>>>> you ask it for the content)
>>>>>>
>>>>> How would you suggest "restarting" the request to try and force a
>>>>> grace cache object to be returned if present in that case?
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> --
>>>>>> Guillaume Quintard
>>>>>>
>>>>>> On Wed, Nov 15, 2017 at 6:02 AM, Andrei <[email protected]> wrote:
>>>>>>
>>>>>>> bump
>>>>>>>
>>>>>>> On Sun, Nov 5, 2017 at 2:12 AM, Andrei <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hello everyone,
>>>>>>>>
>>>>>>>> One of the backends we have configured runs through an SSH tunnel
>>>>>>>> which occasionally gets restarted. When the tunnel is restarted,
>>>>>>>> Varnish is returning a 503 since it can't reach the backend for pages
>>>>>>>> which would normally be cached (we force cache on the front page of
>>>>>>>> the related site). I believe our grace implementation might be
>>>>>>>> incorrect, as we would expect a grace period cache return instead of
>>>>>>>> a 503.
>>>>>>>>
>>>>>>>> Our grace ttl is set to 21600 seconds based on a global variable:
>>>>>>>>
>>>>>>>> sub vcl_backend_response {
>>>>>>>>   set beresp.grace = std.duration(variable.global_get("ttl_grace") + "s", 6h);
>>>>>>>> }
>>>>>>>>
>>>>>>>> Our grace implementation in sub vcl_hit is:
>>>>>>>>
>>>>>>>>   sub vcl_hit {
>>>>>>>>     # We have no fresh fish. Lets look at the stale ones.
>>>>>>>>     if (std.healthy(req.backend_hint)) {
>>>>>>>>       # Backend is healthy. Limit age to 10s.
>>>>>>>>       if (obj.ttl + 10s > 0s) {
>>>>>>>>         #set req.http.grace = "normal(limited)";
>>>>>>>>         std.log("OKHITDELIVER: obj.ttl:" + obj.ttl + " obj.keep: " + obj.keep + " obj.grace: " + obj.grace);
>>>>>>>>         return (deliver);
>>>>>>>>       } else {
>>>>>>>>         # No candidate for grace. Fetch a fresh object.
>>>>>>>>         std.log("No candidate for grace. Fetch a fresh object. obj.ttl:" + obj.ttl + " obj.keep: " + obj.keep + " obj.grace: " + obj.grace);
>>>>>>>>         return(miss);
>>>>>>>>       }
>>>>>>>>     } else {
>>>>>>>>       # backend is sick - use full grace
>>>>>>>>       if (obj.ttl + obj.grace > 0s) {
>>>>>>>>         #set req.http.grace = "full";
>>>>>>>>         std.log("SICK DELIVERY: obj.hits: " + obj.hits + " obj.ttl:" + obj.ttl + " obj.keep: " + obj.keep + " obj.grace: " + obj.grace);
>>>>>>>>         return (deliver);
>>>>>>>>       } else {
>>>>>>>>         # no graced object.
>>>>>>>>         std.log("No graced object. obj.ttl:" + obj.ttl + " obj.keep: " + obj.keep + " obj.grace: " + obj.grace);
>>>>>>>>         return (miss);
>>>>>>>>       }
>>>>>>>>     }
>>>>>>>>
>>>>>>>>     # fetch & deliver once we get the result
>>>>>>>>     return (miss); # Dead code, keep as a safeguard
>>>>>>>>   }
>>>>>>>>
>>>>>>>>
>>>>>>>> Occasionally we see:
>>>>>>>> -   VCL_Log        No candidate for grace. Fetch a fresh object. obj.ttl:-1369.659 obj.keep: 0.000 obj.grace: 21600.000
>>>>>>>>
>>>>>>>> For the most part, it's:
>>>>>>>> -   VCL_Log        OKHITDELIVER: obj.ttl:26.872 obj.keep: 0.000 obj.grace: 21600.000
>>>>>>>>
>>>>>>>> Are we setting the grace ttl too low perhaps?
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> varnish-misc mailing list
>>>>>>> [email protected]
>>>>>>> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

