Re: Varnish returning synthetic 500 error even though it has stale content it should serve. But only seems to happen during/after a burst of traffic

Batanun B Fri, 17 Dec 2021 05:17:04 -0800

> Easy one: you build the key from the request, while the vary header is a 
> response property. Therefore, Varnish can't know to put the varied headers 
> into the hash, because it doesn't have that information yet.


Of course! Makes perfect sense. But the documentation should make this crystal 
clear, if you ask me. Like a page titled "How is the cache key calculated?" 
which explains everything related to this. Like how the hash works, how the 
default implementation works, how to modify it, and how the backend response 
still can cause multiple different versions (like the Vary header, and maybe 
other so called "secondary keys"?), and how to modify that behavior. I also 
think it would make sense if this documentation also mentioned how all this 
information can be debugged using varnishlog or other tools.

Is there even an official word for this final "cache key"? "Hash" clearly isn't 
specific enough. I'm talking about a word that refers to the unique key that 
always corresponds to only a _single_ version of a cached object.


>Don't touch that guy! Varnish will ignore "accept-encoding" in "vary" because 
>it handles compression internally, and always forces "accept-encoding: gzip" 
>before entering vcl_backend_fetch. If your VCL mentions accept-encoding, it's 
>almost always wrong.

Sorry, I'm confused now... Don't touch _which_ guy? Our VCL doesn't contain 
anything regarding "Accept-Encoding". All I said was that the Vary header in 
the response from the backend is "Accept-Encoding". And the way I see it, this 
shouldn't be the cause of the strange problems we are seeing, since even when 
factoring in this, there should exist a matching cached object for me, and it 
should be served regardless of TTL or backend health as long as the grace 
hasn't expired (which it hasn't). Or is my reasoning flawed here, based on the 
VCL snippet in my original post? Can you think of a scenario where our VCL 
would return the synthetic 500 page even when there exists a cached objekt 
matching the hash and vary logic?


> The default VSL space is 80MB, which is "only" worth a few (tens of) 
> thousands requests, so yeah, it can be a short backlog.

Yeah, I think 80MB is a bit to small for us. Ideally we should be able to sit 
down on a Monday and troubleshoot problems that occured Friday evening, but 
that might require a way too big VSL space. But a few hundred MB should be fine.


> The problem is that you have to be recording when the initial request goes 
> through. But, if you have then, cache hits will show the VXID of that first 
> request in their "x-varnish" header, and you can find it this way 
> ("varnishlog -r log.bin -q 'vxid == THE_VXID'")

Well, would it really be a cache hit? The main transaction I'm looking for is 
the first transaction for a specific path (in this case, "/") where Varnish 
served the synthetic 500 page. And then I would also like to see the closest 
preceding transaction for that same page, where the hash (and the Vary logic) 
matches the main transaction mentioned above.

Picture for example this scenario:

1. Request for "/", with with a certain hash and "secondary key" (from the 
"Accept-Encoding"). Data is fetched from backend and stored in cache.
2. Lots of other, irrelevant requests, including requests for "/" but with a 
different hash and/or "Accept-Encoding"
3. Request for "/" that uses the cached data from point 1 above
4. Lots of other, irrelevant requests, same as 2 above, but now in such big 
numbers that the backend servers start having problems
5. Request for "/" with same hash and/or "Accept-Encoding" as 1 above, but for 
some reason Varnish now returns the synthetic 500 page and puts it in the cache

The transaction log for request in step 5 is the main one I want to find, but 
also the transaction for step 3 and possibly step 1 as well. But the 
transaction for step 5 is the most interesting one, and I don't understand how 
that would be a cache hit while returning a fresh synthetic 500 error page.

Or are you saying that we can find "number 5" above by first finding some 
transaction later in the log, with a cache hit for the 500 page, and that 
transaction in turn points to "number 5" since that was the transaction that 
created the 500 response?

This makes me realize that when looking at past log data, ie using the "-d" 
parameter, or reading from a file, it looks for matching transactions/requests 
starting from the beginning. But in this kind troubleshooting I think it could 
be useful to also be able to search from the end and go backwards. Is that 
possible? So that if I use the "-" parameter to limit the output to a single 
transaction, it outputs the _last_ one instead of the first one.
_______________________________________________
varnish-misc mailing list
[email protected]
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc

Re: Varnish returning synthetic 500 error even though it has stale content it should serve. But only seems to happen during/after a burst of traffic

Reply via email to