Varnish suddenly started using much more memory

2024-05-16 Thread Batanun B
Hi,

About two weeks ago we deployed some minor changes to our Varnish servers in 
production, and after that we have noticed a big change in the memory that 
Varnish consumes.

Before the deploy, the amount of available memory on the servers was very 
stable, around 25 GB, for months on end. After the deploy, the available 
memory dropped below 25 GB within 6 hours, and has kept dropping by about 1 GB 
per day, with no indication that it will level out before hitting rock 
bottom.

There was no change in traffic patterns around the time of the deploy, and we 
didn't change any OS or Varnish configuration. The deploy consisted only of 
trivial VCL changes, like changing the backend probe URL to a dedicated 
healthcheck endpoint and tweaking the TTL for a minor resource. None of this 
could explain such a massive change in memory usage.

We have configured Varnish with "-s default=malloc,12G -s large=malloc,8G", 
where the combined 20 GB is about 60% of the total server RAM of 32 GB. This is 
below the recommended 75% maximum I've seen in many places.

Currently Varnish uses about 73% of the server memory, or 23 GB (the RES column 
in htop). The default storage uses about 10 GB (SMA.default.g_bytes), the 
large storage uses 8 GB, and the transient storage is currently about 2 MB 
(SMA.Transient.g_bytes). In total that is about 18 GB. So what is the 
additional 5 GB used for? How can I troubleshoot that?

And, more importantly, what could possibly explain this sudden change?

The Ubuntu version stayed the same (20.04.5 LTS), as did the Varnish version 
(6.0.11-1~focal) and varnish-modules (0.15.1). I notice some differences in 
the installed packages between the servers, but nothing that stands out to me 
(though I'm no Linux expert).

Regards


Re: Is there any "try catch" functionality in VCL? If not, how to handle runtime errors in vcl_init?

2023-04-19 Thread Batanun B
Just to explain my concern a bit. The worst case scenario in production, that I 
very much would like to avoid, could look something like this:

1. Something happens with our public key, so that Varnish won't be able to 
start after getting the new faulty key. Already running servers will continue 
to run, but no new servers can be started.
2. Before we have been able to fix the problem with the public key, some other 
problem happens in Varnish, and all running Varnish servers die.
3. We end up with no working Varnish servers, and no way to start new ones.
4. All our websites are down.


Re: Varnish won't start because backend host resolves to too many addresses, but they are all identical IPs

2023-04-19 Thread Batanun B
> https://github.com/nigoroll/libvmod-dynamic/blob/master/src/vmod_dynamic.vcc#L538-L583
> maybe?
>

> I'm sure Nils will pipe up here if you need help, and if you want more
> synchronous assistance, there's always the discord channel.

Thanks! :)



Re: Is there any "try catch" functionality in VCL? If not, how to handle runtime errors in vcl_init?

2023-04-19 Thread Batanun B
> It's the VMOD author you should ask to have an option to ignore public
> key errors.

Well, I'm usually of the mindset that if a problem can be handled in a generic 
way by the language/platform/framework, then one should avoid requiring each 
and every custom vmod/plugin/library to handle it individually. And I'm also of 
the mindset that pretty much any non-trivial code can fail, and the calling 
code should be able to catch that if needed. :)

> This is a constructor, and even if we had a try-catch kind of
> construct in the language, I don't think we would make this one
> recoverable.

In my mind, with a try-catch I could handle it like this:

try {
  new cryptoVerifier = crypto.verifier(sha256, std.fileread("/path/to/public.key"));
} catch (error) {
  // log error...
  // then retry with a hard coded known safe key; signature checks will
  // still fail, but Varnish can start
  new cryptoVerifier = crypto.verifier(sha256, {"
-----BEGIN PUBLIC KEY-----
...
-----END PUBLIC KEY-----
"});
}

With this approach, Varnish will start as normal, and the only requests 
failing will be the ones using the cryptoVerifier.


Re: Possible to disable/inactivate a backend using VCL?

2023-04-19 Thread Batanun B
> It was back-ported to 6.0, which is not an LTS branch limited to bug fixes ;)
>
> https://varnish-cache.org/docs/6.0/users-guide/vcl-backends.html

Thanks! Wow, I can't believe I missed that. I thought I had read that 
specific page, and searched on Google as well, but I guess I was too focused on 
looking for terms like "disabled" or "inactivated".
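
For reference, the back-ported declaration is a one-liner; a minimal sketch 
(the backend name is just an example):

backend theBackend none;

sub vcl_recv {
    # A "none" backend is permanently sick, so anything routed to it fails
    # immediately; no host or IP needs to be configured.
    set req.backend_hint = theBackend;
}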



Is there any "try catch" functionality in VCL? If not, how to handle runtime errors in vcl_init?

2023-04-19 Thread Batanun B
Hi,

We use the vmod crypto to verify cryptographic signatures for some of our 
traffic. When testing, the public key was hard coded in the VCL, but before we 
start using this feature in production we will switch to reading the public key 
from a file on disk. This file is generated on server startup, by fetching it 
from an Azure keyvault.

Now, the problem I'm picturing here is that this fetching of the public key can 
fail, or the key can be corrupt or empty, maybe by user error. Or the key could 
be valid, but the format of the key happens to be unsupported by the vmod 
crypto. So, even if we do our best to validate the key, in theory it could pass 
all our tests but still fail when we give it to the vmod crypto. And if that 
happens, Varnish won't start, because the vmod crypto is initialized with the 
public key in vcl_init, like this:

sub vcl_init {
  new cryptoVerifier = crypto.verifier(sha256, 
std.fileread("/path/to/public.key"));
}

What I would prefer, if the key is rejected, is that vcl_init goes through 
without failure, and then the requests that use the cryptoVerifier will fail, 
but all other traffic (like 99%) still works. Can we achieve this somehow? 
Like some try-catch functionality? If not, is there some other way to handle 
this that doesn't cause Varnish to die on startup?


Re: Varnish won't start because backend host resolves to too many addresses, but they are all identical IPs

2023-04-19 Thread Batanun B
> Shouldn't your DNS entries be clean? ;-)

Preferably, but I blame Microsoft here.

The problem went away by itself when I tried starting again about half an hour 
later, so I guess it was a temporary glitch in the matrix.

As far as I understand it, the IPs of these machines only change if they are 
deleted and created again. We do it occasionally in test/staging, and there we 
can live with Varnish needing to be restarted. In production we don't really 
delete them once they are properly setup, unless there is some major problem 
and then a restart of the load balanced varnish servers should not be a concern.

Thanks for your vmod suggestions! I will check them out. The dynamic one seems 
like the only one that supports community edition LTS 6.0. The documentation 
seems a bit lacking (no full VCL example), but I guess I could use their test 
cases as examples.
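
For anyone else landing here, a minimal libvmod-dynamic setup might look 
roughly like this (a sketch based on the vcc linked above; the hostname and 
port are placeholders):

import dynamic;

backend default none;

sub vcl_init {
    # Resolve the hostname at request time instead of at VCL load time.
    new d = dynamic.director(port = "80");
}

sub vcl_recv {
    set req.backend_hint = d.backend("redacted-hostname");
}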


From: Guillaume Quintard 
Sent: Wednesday, April 19, 2023 4:42 PM
To: Batanun B 
Cc: varnish-misc@varnish-cache.org 
Subject: Re: Varnish won't start because backend host resolves to too many 
addresses, but they are all identical IPs

The fact that the IPs are identical is weird, but I wouldn't be surprised if 
the DNS entry actually contained 3 identical IPs.

> Shouldn't Varnish be able to figure out that in that case it can just choose 
> any one and it will work as expected?

Shouldn't your DNS entries be clean? ;-)

Honestly, if the IP(s) behind the service name is liable to change, you 
shouldn't use a static backend, because Varnish resolves the IP when the VCL is 
loaded; if the IP changes behind your back, Varnish won't follow it, and 
you'll be screwed.
Instead, you should use dynamic backends, of which there are a handful:
- dynamic<https://github.com/nigoroll/libvmod-dynamic>, by UPLEX: it's been 
around for ages, it's battle-tested, and it's included in the official Varnish 
Docker image<https://hub.docker.com/_/varnish>
- udo+activedns<https://docs.varnish-software.com/varnish-enterprise/vmods/udo/#subscribe>, 
by Varnish Software: the design is slightly different and allows you to 
specify pretty much any load-balancing policy you might need. You'll need a 
subscription but you'll get excellent support (disclaimer, I'm an ex-employee)
- reqwest<https://github.com/gquintard/vmod_reqwest#backend-https-following-up-to-5-redirect-hops-and-brotli-auto-decompression>, 
by yours truly: the interface focuses on providing a simple experience and a 
few bells and whistles (HTTPS, HTTP/2, brotli, following redirects)

As you can see, the static backend's reluctance to fully handle DNS has been a 
fertile ground for vmods :-)

--
Guillaume Quintard


On Wed, Apr 19, 2023 at 1:49 AM Batanun B <bata...@hotmail.com> wrote:
All of the sudden Varnish fails to start in my development environment, and 
gives me the following error message:

Message from VCC-compiler:
Backend host "redacted-hostname": resolves to too many addresses.
Only one IPv4 and one IPv6 are allowed.
Please specify which exact address you want to use, we found all of these:
 555.123.123.3:80
 555.123.123.3:80
 555.123.123.3:80

I have changed the hostname and the IP above to not expose our server, but all 
three IPs are 100% identical. Shouldn't Varnish be able to figure out that in 
that case it can just choose any one of them and it will work as expected? It 
really should remove duplicates, and fail only if more than one distinct IP 
remains.

The problem is that the backend host is a so-called "app service" in Microsoft 
Azure, which is basically a platform as a service (PaaS), where Microsoft 
handles the networking, including the domain name (we have no direct access to 
it). I have no idea why it suddenly resolves to multiple duplicate IPs.


Re: Possible to disable/inactivate a backend using VCL?

2023-04-19 Thread Batanun B
> backend theBackend none;
> Here's the relevant documentation: 
> https://varnish-cache.org/docs/trunk/users-guide/vcl-backends.html#the-none-backend
> It was added in 6.4.

Looks like exactly what we need! Sadly we are "stuck" on 6.0 until the next LTS 
version comes. So I think that until then I will use our poor man's version of 
the "none" backend, ie pointing to localhost with a port number that won't 
give a response.


Re: Strange Broken Pipe error from Varnish health checks

2023-04-19 Thread Batanun B
Couldn't a HEAD request solve this? Then nginx wouldn't bother with the body at 
all, right?

This is what we do with our health checks. For example:

backend someBackend {
.host = "[redacted]";
.port = "80";
.probe = {
.interval = 9s;
.request =
"HEAD /healthcheck HTTP/1.1"
"Host: [redacted]"
"User-Agent: varnish-health-probe"
"Connection: Close"
"Accept: */*";
}
}

From: varnish-misc  
on behalf of George 
Sent: Monday, April 17, 2023 10:21 AM
To: varnish-misc@varnish-cache.org 
Subject: Strange Broken Pipe error from Varnish health checks

Hi,

I have a Varnish/nginx cluster running with varnish-7.1.2-1.el7.x86_64 on 
CentOS 7.

The issue I am having comes from the varnish health checks. I am getting a 
"broken pipe" error in the nginx error log at random times like below:
Apr 10 17:32:46 VARNISH-MASTER nginx_varnish_error: 2023/04/10 17:32:46 [info] 
17808#17808: *67626636 writev() failed (32: Broken pipe), client: unix:, 
server: _, request: "GET /varnish_check HTTP/1.1", host: "0.0.0.0"

The strange thing is that this error appears only when Varnish performs the 
health checks. I have other scripts doing it (nagios, curl, wget, AWS ELB) but 
those do not show any errors. In addition, Varnish and nginx, where the health 
checks occur, are on the same server, and it makes no difference whether I use 
a TCP connection or a socket-based one.

Below are the varnish vcl and nginx locations for the health checks:
backend nginx_varnish {
   .path = "/run/nginx/nginx.sock";
   .first_byte_timeout = 600s;
   .probe = health;
}

location = /varnish_check {
keepalive_timeout 305;
return 200 'Varnish Check';
access_log /var/log/nginx/varnish_check.log main;
error_log /var/log/nginx/varnish_check_errors.log debug;
error_log 
syslog:server=unix:/run/nginx_log.in.sock,facility=local1,tag=nginx_varnish_error,nohostname
 info;
}

Are there any docs I can read about how exactly Varnish performs the health 
checks and what internal processes are involved?
Did anyone happen to have similar issues? This is not causing any operational 
problems for the cluster, but it is something I want to get to the bottom of, 
because it just should not be happening.

Please help.
Thanks in advance.



Varnish won't start because backend host resolves to too many addresses, but they are all identical IPs

2023-04-19 Thread Batanun B
All of the sudden Varnish fails to start in my development environment, and 
gives me the following error message:

Message from VCC-compiler:
Backend host "redacted-hostname": resolves to too many addresses.
Only one IPv4 and one IPv6 are allowed.
Please specify which exact address you want to use, we found all of these:
 555.123.123.3:80
 555.123.123.3:80
 555.123.123.3:80

I have changed the hostname and the IP above to not expose our server, but all 
three IPs are 100% identical. Shouldn't Varnish be able to figure out that in 
that case it can just choose any one of them and it will work as expected? It 
really should remove duplicates, and fail only if more than one distinct IP 
remains.

The problem is that the backend host is a so-called "app service" in Microsoft 
Azure, which is basically a platform as a service (PaaS), where Microsoft 
handles the networking, including the domain name (we have no direct access to 
it). I have no idea why it suddenly resolves to multiple duplicate IPs.


Re: Possible to disable/inactivate a backend using VCL?

2023-04-19 Thread Batanun B
Hi Guillaume,

> I'm curious, if it's completely deactivated what's the benefit of having it 
> in the vcl?

It is only intended to be deactivated in production (until we go live). Our 
test and staging environments have the backend active.

> if (false) {
>   set req.backend_hint = you_deactivated_backend;
> }

Thanks, I will test this.
My current prod-specific setup for this backend looks like this:

backend theBackend {
.host = "localhost";
.port = "";
.probe = {
.interval = 1h;
}
}

This seems to be working when testing it locally. It also solves the problem of 
having to assign some arbitrary IP or hostname (the actual backend host for 
this service hasn't been created in production yet, since we are several months 
away from go-live), which actually was our main problem. What do you think 
about this approach instead? Preferably this would be a built-in feature in 
Varnish, with a setting "disabled = true" or similar in the backend definition, 
and then it would not require any host or IP to be configured.



Possible to disable/inactivate a backend using VCL?

2023-04-14 Thread Batanun B
Hi,

We are currently working on a new feature that won't go live for several more 
months. This new feature has its own backend in Varnish. Most of our VCL code 
is identical for all environments, and this code refers to the new backend, so 
it needs to be defined, otherwise Varnish won't start. But in production we 
don't want to show anything of this feature, and we would like to have this 
backend completely disabled or inactivated in Varnish. Can we do that using 
VCL? Like forcing the health to be sick, or something similar. We would prefer 
to keep this inside the backend declaration if possible, and we would also 
prefer something not too "hackish" (like pointing it to a dead IP).

Does Varnish have any recommended approach for this? I know there is a CLI 
command to set the health, but as I said we would really prefer doing this 
using VCL only.


Re: Confusion about LTS versions. Any comprehensive documentation regarding everything LTS?

2022-03-02 Thread Batanun B
Hi Guillaume,

Well, not really. It is a snapshot of the state almost two years ago, and I 
expected some page where the information is kept up-to-date. And LTS is only 
mentioned twice in that email, without answering any of my LTS questions.

Is there an up-to-date list of _all_ LTS versions (down to the minor version 
number)? Now I have no idea how to check if 6.2.1-2 is LTS or not.

Is an LTS release just a statement that "version x.y.z is LTS"? Or are the LTS 
versions completely separate from the non-LTS versions? As in, could there be 
an LTS version 1.2.3 and a non-LTS 1.2.3 that do not contain the same thing?

Any info on upcoming LTS versions? I'm specifically interested in new non-minor 
versions, like 6.6 or 7.0.



Confusion about LTS versions. Any comprehensive documentation regarding everything LTS?

2022-03-02 Thread Batanun B
Hi,

Is there any documentation focused on the LTS versions of Varnish Cache? And 
with that I mean things like "What does the LTS version of Varnish mean?", "Why 
should or shouldn't I choose an LTS version?", "What is the latest LTS 
version?" and "How do I install an LTS version?".

Currently I can't find any such documentation anywhere.

We use Varnish Cache 6.0 LTS (6.0.6) now, on Ubuntu 18.04 LTS. I'm testing 
setting up a new Varnish server, on Ubuntu 20.04 LTS, and it automatically 
installs Varnish 6.2.1-2.

Is that an LTS version? How can I verify that? And if not, how can I make sure 
that the version being selected is an LTS version?

I followed the instructions at 
https://packagecloud.io/varnishcache/varnish60lts/install#manual-deb and in the 
file varnishcache_varnish60lts.list I made sure to change "trusty" to "focal" 
to match the Ubuntu version.

Also, the "Releases & Downloads" page is quite confusing. First, it doesn't say 
_anything_ about LTS versions. Secondly, it specifically mentions version 
7.0.2, 6.6.2 and 6.0.10 as supported, and says "All releases not mentioned 
above are End-Of-Life and unsupported". What does that mean?

https://varnish-cache.org/releases/

Also, is there a place where we can see the roadmap for future planned LTS 
versions? Now I have no idea if there will be a new LTS coming next week, next 
year, or 2030.


Re: Varnish returning synthetic 500 error even though it has stale content it should serve. But only seems to happen during/after a burst of traffic

2021-12-17 Thread Batanun B
> I'd suggest to determine the correct delivery of the front page via an 
> external monitoring
> (e.g. icinga2 or a simple script). As far as I understand, you don't need to
> know the exact request, but more of a rough point in time of when the requests
> start failing. So a monitoring script which curls every minute should be
> sufficient and causes a lot less trouble.

Well, in our use case we would need the exact requests in order to understand 
why it happens. It might very well be a bug in our VCL, but I would like to 
have varnishlog show me something that could narrow it down.


Re: Varnish returning synthetic 500 error even though it has stale content it should serve. But only seems to happen during/after a burst of traffic

2021-12-17 Thread Batanun B
> I recommend permanent logging, if you want to be able to debug older 
> incidents.
> We do it like this:
> 
> ```
> /usr/bin/varnishlog -w /var/log/varnish/varnish-500.log -a -D \
> -P /var/run/varnishlog-500.pid \
> -q 'RespStatus >= 500 or BerespStatus >=500'
> ```
> 
> I attach our systemd unit file, for you may be interested.

Thanks. I have thought about that too. But I think we might want to include 
non-error transactions as well. I mean, with the problems this post is about, 
we want to see when the cached version of the start page was generated and when 
it was last served from cache successfully. But maybe we could have permanent 
logging just for the start page, regardless of HTTP status. That should 
hopefully reduce the logging intensity enough that logging to disk isn't 
affecting Varnish performance.

One thing though... If you log all "status: 500+" transactions to disk, isn't 
there a risk that your logging might exacerbate a situation where your site is 
overwhelmed with traffic? A large load causes your backends to start failing, 
which triggers intense logging of those erroneous transactions, which might 
reduce the performance of Varnish, causing more timeouts, which cause more 
logging, and so on...


Re: Varnish returning synthetic 500 error even though it has stale content it should serve. But only seems to happen during/after a burst of traffic

2021-12-17 Thread Batanun B
> Easy one: you build the key from the request, while the vary header is a 
> response property. Therefore, Varnish can't know to put the varied headers 
> into the hash, because it doesn't have that information yet.

Of course! Makes perfect sense. But the documentation should make this crystal 
clear, if you ask me. Like a page titled "How is the cache key calculated?" 
which explains everything related to this. Like how the hash works, how the 
default implementation works, how to modify it, and how the backend response 
still can cause multiple different versions (like the Vary header, and maybe 
other so called "secondary keys"?), and how to modify that behavior. I also 
think it would make sense if this documentation also mentioned how all this 
information can be debugged using varnishlog or other tools.

Is there even an official word for this final "cache key"? "Hash" clearly isn't 
specific enough. I'm talking about a word that refers to the unique key that 
always corresponds to only a _single_ version of a cached object.
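
For reference, the builtin vcl_hash (the "primary key" being discussed here) is 
very small; the Vary header then acts as a secondary key per hash entry:

sub vcl_hash {
    hash_data(req.url);
    if (req.http.host) {
        hash_data(req.http.host);
    } else {
        hash_data(server.ip);
    }
    return (lookup);
}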


>Don't touch that guy! Varnish will ignore "accept-encoding" in "vary" because 
>it handles compression internally, and always forces "accept-encoding: gzip" 
>before entering vcl_backend_fetch. If your VCL mentions accept-encoding, it's 
>almost always wrong.

Sorry, I'm confused now... Don't touch _which_ guy? Our VCL doesn't contain 
anything regarding "Accept-Encoding". All I said was that the Vary header in 
the response from the backend is "Accept-Encoding". And the way I see it, this 
shouldn't be the cause of the strange problems we are seeing, since even 
factoring this in, there should exist a matching cached object for me, and it 
should be served regardless of TTL or backend health as long as the grace 
hasn't expired (which it hasn't). Or is my reasoning flawed here, based on the 
VCL snippet in my original post? Can you think of a scenario where our VCL 
would return the synthetic 500 page even when there exists a cached object 
matching the hash and Vary logic?


> The default VSL space is 80MB, which is "only" worth a few (tens of) 
> thousands requests, so yeah, it can be a short backlog.

Yeah, I think 80MB is a bit too small for us. Ideally we should be able to sit 
down on a Monday and troubleshoot problems that occurred Friday evening, but 
that might require a way too big VSL space. A few hundred MB should be fine, 
though.


> The problem is that you have to be recording when the initial request goes 
> through. But, if you do, cache hits will show the VXID of that first 
> request in their "x-varnish" header, and you can find it this way 
> ("varnishlog -r log.bin -q 'vxid == THE_VXID'")

Well, would it really be a cache hit? The main transaction I'm looking for is 
the first transaction for a specific path (in this case, "/") where Varnish 
served the synthetic 500 page. And then I would also like to see the closest 
preceding transaction for that same page, where the hash (and the Vary logic) 
matches the main transaction mentioned above.

Picture for example this scenario:

1. Request for "/", with with a certain hash and "secondary key" (from the 
"Accept-Encoding"). Data is fetched from backend and stored in cache.
2. Lots of other, irrelevant requests, including requests for "/" but with a 
different hash and/or "Accept-Encoding"
3. Request for "/" that uses the cached data from point 1 above
4. Lots of other, irrelevant requests, same as 2 above, but now in such big 
numbers that the backend servers start having problems
5. Request for "/" with same hash and/or "Accept-Encoding" as 1 above, but for 
some reason Varnish now returns the synthetic 500 page and puts it in the cache

The transaction log for the request in step 5 is the main one I want to find, 
but also the transactions for step 3 and possibly step 1 as well. But the 
transaction for step 5 is the most interesting one, and I don't understand how 
that would be a cache hit while returning a fresh synthetic 500 error page.

Or are you saying that we can find "number 5" above by first finding some 
transaction later in the log, with a cache hit for the 500 page, and that 
transaction in turn points to "number 5" since that was the transaction that 
created the 500 response?

This makes me realize that when looking at past log data, ie using the "-d" 
parameter or reading from a file, it looks for matching transactions/requests 
starting from the beginning. But in this kind of troubleshooting I think it 
could be useful to also be able to search from the end and go backwards. Is 
that possible? So that if I use the "-" parameter to limit the output to a 
single transaction, it outputs the _last_ one instead of the first one.


Re: Varnish returning synthetic 500 error even though it has stale content it should serve. But only seems to happen during/after a burst of traffic

2021-12-16 Thread Batanun B
> Could be a vary issue

Ah, I remember hearing about that earlier, and made a mental note to read up on 
that. But I forgot all about it. Now I did just that, and boy was that a cold 
shower for me! I definitely need to unset that header. But why, for the love of 
all that is holy, does Varnish not include the vary-data in the hash? Why isn't 
the hash the _single_ key used when storing and looking up data in the cache? 
Why does Varnish hide this stuff from us?

However, when checking the Vary header from the backend, it is set to 
"Accept-Encoding". And since I haven't changed anything in my browser, it 
should send the same "Accept-Encoding" request header whenever I surf the 
website. And since I have visited the start page multiple times in the last 10 
days, it should have a cached version of it matching my "Accept-Encoding".

> can you post the output of `varnishlog -d -g request -q 'RespStatus == 500'`?

Well, that gives me nothing that is relevant here, sadly. The last time this 
happened was a few days ago, and the buffer doesn't seem to be big enough to 
keep data that far back.

But maybe you could describe what you would look for? I would love to learn how 
to troubleshoot this.

> In the meantime, here's a cheat sheet to get started with varnishlog:
> https://docs.varnish-software.com/tutorials/vsl-query/

Thanks, although most of that stuff I already knew. And it doesn't really give 
any more advanced examples, like the problem I mentioned earlier. I really 
would like to know if it is possible to find the first request where it served 
the 500 page for the "/" URL, as well as the request just before that, for the 
same URL. Do you know how to construct a query that gives me that?


Varnish returning synthetic 500 error even though it has stale content it should serve. But only seems to happen during/after a burst of traffic

2021-12-16 Thread Batanun B
Hi,

One of our websites usually has quite a low but steady stream of visitors, but 
occasionally we get a sudden surge of requests over a very short time period 
(about 1-2 minutes). Varnish seems to handle the load fine, but the backends 
struggle with it. And I have noticed that Varnish doesn't serve the stale 
cached data, but instead shows our synthetic 500 page. This is true even for 
the start page, which definitely existed in the cache. And we have a grace 
period of 10 days, so it's quite annoying that we can't simply serve the 
stale cached data during this short period.

I have tried picturing the entire flow, following the logic in the vcl, but I 
can't see what we do wrong. And annoyingly I can't reproduce the problem 
locally by simply shutting down the backends (or setting them to unhealthy), 
because whenever I do that I get the stale content served just as intended. 
Could the sheer volume itself cause this, making it impossible to reproduce by 
simply fetching the page a few times in the browser before and after disabling 
the backends? Or is there some edge case that I haven't thought of that is 
causing this?

A simplified version of our VCL is included below, with only the relevant 
parts. But unless I have some blatant problem with the VCL, I think it would be 
good if I learned how to troubleshoot this using the Varnish tools, like 
varnishlog. So that next time this happens, I can use varnishlog etc to see 
what's happening.

Is it possible using varnishlog to find the very first request for a specific 
path ("/" in our case) where it returned the synthetic 500 and put it in the 
cache? And is it also possible to find the request just before that one, for 
the same path? If I could extract those two requests (including all the 
metadata in those transactions) from the jungle of thousands of requests, then 
maybe I can find some explanation why the second request doesn't return the 
stale data.

-

sub vcl_hit {
    if (obj.ttl > 0s) {
        // Regular cache hit
        return (deliver);
    } else if (req.restarts == 0 && std.healthy(req.backend_hint)) {
        // Graced cache hit, first attempt.
        // Force a cache miss to trigger the fetch in the foreground (ie a
        // synchronous fetch).
        set req.http.X-forced-miss = "true";
        return (miss);
    } else {
        // Graced cache hit, previous attempts failed (or backend unhealthy).
        // Let the fetch happen in the background (ie an asynchronous fetch),
        // and return the cached value.
        return (deliver);
    }
}

sub vcl_recv {
    if (req.http.X-cache-pass == "true") {
        return (pass);
    }

    set req.grace = 240h; // 10 day grace
}

sub vcl_backend_response {
    if (bereq.http.X-cache-pass != "true") {
        if (beresp.status < 400) {
            set beresp.grace = 240h;
            // Cache invalidation in the form of xkey softpurge can put
            // objects into grace before the TTL has passed.
            set beresp.ttl = 30m;
        } else if (beresp.status < 500) {
            set beresp.ttl = 1m;
            return (deliver);
        } else if (beresp.status >= 500) {
            // In some cases we want to abandon the backend request on
            // 500-errors, since it otherwise would overwrite the cached
            // object that is still useful for grace. This will make it jump
            // to vcl_synth with a 503 status. There it will restart the
            // request.
            if (bereq.is_bgfetch) {
                // In grace period. Abandon the 5xx request, since it
                // otherwise would overwrite the cached object that is still
                // useful for grace.
                return (abandon);
            } else if (bereq.http.X-forced-miss == "true") {
                return (abandon);
            }
            // Not a background fetch, ie no grace period (and no stale
            // content available). Cache the error page for a few seconds.
            set beresp.ttl = 15s;
            return (deliver);
        }
    }
}

sub vcl_synth {
    if (req.http.X-forced-miss == "true" && resp.status >= 500) {
        return (restart);
    }
}

sub vcl_backend_error {
    if (bereq.is_bgfetch || bereq.http.X-forced-miss == "true") {
        // By calling abandon, it will jump to vcl_synth with a 503 status.
        // There it will restart the request.
        return (abandon);
    }
    set beresp.http.Content-Type = "text/html; charset=utf-8";
    set beresp.ttl = 15s;
    synthetic(std.fileread("/etc/varnish/500.html"));
    return (deliver);
}

-


Keeping multiple cookies based on regex or prefix, with Varnish 6.0

2021-11-15 Thread Batanun B
Hi,

We use Varnish Cache 6.0 LTS, with varnish modules 0.15.0. As far as I can see, 
this includes a version of the cookie vmod that doesn't support regex. Yet, our 
use case leans towards using a regex, or at least a substring.

Case in point: our frontend uses a bunch of cookies, and a handful of them are 
needed by the backend too. I have done a simple test case with just a single 
cookie, and was able to use the built-in cookie vmod to allow that cookie 
through to the backend. But in order for this to be useful for us, we need to 
allow multiple cookies. Cookies that we don't know the exact name of when 
writing the VCL, but that all start with a defined prefix.

Example:
The incoming request contains the following cookie header:

Cookie: _fbp=fb.123; _gid=GA1.2.3; custom-x=qwerty; _ga_ABCD=GS1.123; 
custom-y=qwerty; _ga=GA1123

And we want to keep all cookies that start with "custom-", hence we want the 
rewritten cookie header to be:

Cookie: custom-x=qwerty; custom-y=qwerty

How can we achieve this with our version of Varnish and varnish-modules?
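
One common approach, adapted from the cookie-filtering recipe in the Varnish 
documentation, is to do the whitelisting with plain regsuball, no vmod 
required. A sketch, assuming the literal prefix "custom-":

sub vcl_recv {
    if (req.http.Cookie) {
        # Prefix every cookie with ";", mark the ones to keep with "; ",
        # remove everything not marked, then clean up the separators.
        set req.http.Cookie = ";" + req.http.Cookie;
        set req.http.Cookie = regsuball(req.http.Cookie, "; +", ";");
        set req.http.Cookie = regsuball(req.http.Cookie, ";(custom-[^=]*)=", "; \1=");
        set req.http.Cookie = regsuball(req.http.Cookie, ";[^ ][^;]*", "");
        set req.http.Cookie = regsuball(req.http.Cookie, "^[; ]+|[; ]+$", "");

        if (req.http.Cookie == "") {
            unset req.http.Cookie;
        }
    }
}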


Re: Can't get "streaming" or pipe to work, Varnish still waits for the full response

2020-09-14 Thread Batanun B
> It's not obvious why a response would be delayed, but it could
> happen to be related to the fetch_chunksize parameter.

Hmm... fetch_chunksize is 16k by default, and 4k minimum. Does this mean that 
even if streaming is working fine, it will still buffer at least 4kb of data 
before sending it to the client? That would be way too much for this use case. 
The output is basically plain text, and it is flushed after a few lines of text 
have been written. We are talking maybe 50-100 bytes per flush.

> However I have never come across a setup where we needed to tune that knob...

Well, I actually bit the bullet and decided to refactor this particular admin 
page that I needed this for, so that the job is done in the background and the 
current status can be fetched whenever one feels like. That way the response is 
always quick. Hopefully we won't need this for too many of these old legacy 
admin pages.



From: Dridi Boukelmoune 
Sent: Monday, September 14, 2020 8:35 AM
To: Batanun B 
Cc: varnish-misc@varnish-cache.org 
Subject: Re: Can't get "streaming" or pipe to work, Varnish still waits for the 
full response

On Sat, Sep 12, 2020 at 10:08 PM Batanun B  wrote:
>
> Hi,
>
> We have some old (legacy) internal admin pages that do some classic old 
> school processing while the page is loading, and outputting the current 
> status as it is working. When requesting these pages directly (through 
> Tomcat), I can see the results in the browser at the same time as the results 
> are written on the other end. But when I go through Varnish, no matter what I 
> try, I only see a blank page that is loading/waiting, and then when the 
> backend is done writing, then I get the entire result in one go.
>
> How can I configure Varnish to bring any content to the client the moment it 
> gets it from the backend, and not wait until the entire response is done?
>
> In vcl_backend_response I do this:
>   set beresp.do_stream = true;
>   set beresp.uncacheable = true;
>   return (deliver);

Streaming is on by default, you don't need to do anything.

> I have also tried returning (pipe) in vcl_recv (with and without do_stream
> and uncacheable). And gzip is turned off. But nothing helps. What more can I
> do? And how can I debug this? Varnishlog shows nothing telling me that it is
> buffering, or waiting for the response, or anything like that.

It is indeed hard to get that information just from Varnish, you could
try to capture TCP packets to check how long it takes for backend
traffic to be forwarded to clients. It's not obvious why a response
would be delayed, but it could happen to be related to the
fetch_chunksize parameter.

However I have never come across a setup where we needed to
tune that knob...

Dridi


Can't get "streaming" or pipe to work, Varnish still waits for the full response

2020-09-12 Thread Batanun B
Hi,

We have some old (legacy) internal admin pages that do some classic old school 
processing while the page is loading, outputting the current status as it 
goes. When requesting these pages directly (through Tomcat), I can see the 
results in the browser at the same time as they are written on the other end. 
But when I go through Varnish, no matter what I try, I only see a blank page 
that is loading/waiting, and then when the backend is done writing, I get the 
entire result in one go.

How can I configure Varnish to bring any content to the client the moment it 
gets it from the backend, and not wait until the entire response is done?

In vcl_backend_response I do this:
  set beresp.do_stream = true;
  set beresp.uncacheable = true;
  return (deliver);

I have also tried returning (pipe) in vcl_recv (with and without do_stream and 
uncacheable). And gzip is turned off. But nothing helps. What more can I do? 
And how can I debug this? Varnishlog shows nothing telling me that it is 
buffering, or waiting for the response, or anything like that.
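
For completeness, selectively piping such pages would look like this (a 
sketch; the /admin/ prefix is a made-up example):

sub vcl_recv {
    # Piped requests turn Varnish into a dumb TCP proxy for the rest of the
    # request, so bytes are forwarded as they arrive from the backend.
    if (req.url ~ "^/admin/") {
        return (pipe);
    }
}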


Re: Possible to detect a previous xkey softpurge?

2020-09-12 Thread Batanun B
> Arguably, if you use Varnish to cache responses, you might as well
> always tell your backend not to serve from cache. Because if a soft
> purge moves you inside the grace period, there's no guarantee that the
> next hit will happen before the object leaves the grace period. At
> this point this will no longer trigger a background fetch...

Well, the caching in the backend is not on the same level as the Varnish cache. 
In Varnish, a single request results in a single object to cache. In the 
backend, a single request can result in hundreds or thousands of separate 
lookups (some involving separate HTTP calls to other services), each cacheable 
with its own unique key. And most of these objects are reused from the cache 
for other requests as well. Skipping that internal cache completely, letting 
the backend do all these lookups and sub-requests every single time a request 
comes in, would be terrible for performance. So we really only want to skip 
this internal cache in special circumstances.


Re: Possible to detect a previous xkey softpurge?

2020-09-12 Thread Batanun B
> From what I understand now, you want to make sure that after a softpurge,
> you never get the stale object. So, if I may: why use a softpurge at all?
> Just remove the objects completely and be done with it.

I think I was not explaining my use case properly. It is not the Varnish cache 
I was talking about. The backend has its own cache, and it can sometimes serve 
stale data (if requested seconds after an update). So completely removing the 
object from the _Varnish_ cache does not help here. In fact it would just do 
harm, since if the backend then goes down we don't have any stale data in the 
Varnish cache to serve. With softpurge and a long grace, we are always ready 
for a backend failure (assuming the data is in the cache).


Re: Possible to detect a previous xkey softpurge?

2020-09-03 Thread Batanun B
Hi,

I'm not sure how that "no-grace" header would be set. The softpurge could 
theoretically impact hundreds of URLs, and what we would like is that any 
request for these URLs after the softpurge should include a special header 
when talking to the backend.

Skipping grace in general, and sending that special header to all requests to 
the backend, is not what we want. 

But now I am thinking of an alternative that might give us roughly what we 
want while being much simpler, without needing to know whether a softpurge has 
happened. Since we really only need to do this in a short time period after a 
softpurge, and the softpurge sets the ttl to zero, we can drop the "after a 
softpurge" requirement and simply check if the ttl recently expired. As far as 
I understand it, when the ttl has expired, obj.ttl is a negative value 
indicating how many seconds ago the ttl expired. So 15 seconds after the ttl, 
it would be -15s. Then we can have something like this in vcl_hit:

```
if (obj.ttl > -15s) {
set req.http.X-my-backend-skip-cache = "true";
return (miss);
}
```

I can't check this right now, from the computer I am at. But it should work, 
right? Then the only "false positives" we will end up with are the requests 
that happen to come in within 15 seconds of the regular ttl expiring. And if we 
get the cache invalidation to work fully (including in the backend), then we 
should be able to raise the regular ttl above the current 5s, and then these 
false positives should be much rarer.


Guillaume Quintard  wrote:
>
> Hi, 
> 
> You can't detect a softpurge, but you can tell Varnish to ignore grace:
> 
> ```
> sub vcl_recv {
>     if (req.http.no-grace) {
>         set req.grace = 0s;
>     }
> }
> ```
> 
> the softpurge kills the ttl at the object level, and this kills the grace at 
> the request level, so Varnish will reach out to the backend.
> 
> But note that it will also do the same even without a prior softpurge, it 
> just needs an expired ttl.
> 


Possible to detect a previous xkey softpurge?

2020-09-03 Thread Batanun B
Hi,

We sometimes have a problem with the backend using its internal cache for a few 
seconds too long after something has been updated. We trigger a softpurge (xkey 
vmod) in varnish, but if someone requests the page again very soon after that, 
the data that Varnish gets from the backend might be old. In this case, we 
would like to be able to tell the backend, maybe using an extra header, that it 
should skip its internal caches and give us the updated content.

But, I'm not sure how to archive this in Varnish. Is it possible to detect that 
the page requested has been softpurged earlier? If yes, is it also possible to 
see when that softpurge took place? Because we would only ever need to do this 
if the softpurge happened less than let's say 30 seconds ago.

And the reason that the backend data might be old after an update is that what 
we call "the backend" (from a Varnish perspective) is actually a complex setup 
of services. The update happens in one place, and our "backend" is actually a 
frontend server that sometimes doesn't get the information about the update 
quickly enough. I think the implementation of this system is a bit faulty, but 
it is what it is, and I would like to use the power of Varnish to handle this, 
if possible.


Re: Using user defined variable in backend definition?

2020-08-26 Thread Batanun B
Thanks for the suggestions! I'm not sure if we are at the right point in this 
project to introduce more complexity in the form of switching to directors, 
and a dynamic one on top of that. At least not just for this simple thing. But 
we might need/want to use directors later on, for the dynamic address 
resolution. And then I might revisit this thread and look at your example. 
Thanks!


Using user defined variable in backend definition?

2020-08-24 Thread Batanun B
Hi,

I'm experimenting with user defined variables, and when using them in regular 
string concatenation, and in synth output, they work fine. But I would like to 
use them in the backend definitions, and I simply can't get it to work.

backend myBackend {
.host = var.get("myBackendHost");
}

But that fails with a VCC-compiler error on that line, saying "Expected CSTR 
got 'var.get'".
I also tried with the "variable" vmod, but it resulted in the same type of 
error.

Is there any way to get this to work? I would really like to have all 
environment specific configuration (including backend hostnames) in an 
environment specific VCL file, and then include it in the main VCL where the 
backend definitions are (and use the variables). And I would like to achieve 
this using just VCL (and vmods), so no custom script that does search and 
replace or anything like that.

So, I would like something like this to work:

default.vcl:
...
import var;

include "environment.vcl";

backend myBackend {
.host = var.get("myBackendHost");
}
...

environment.vcl:
sub vcl_init {
var.set("myBackendHost", "myHostName");
}
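
Since .host must be a compile-time constant string, one workaround is to move 
the whole backend declaration, rather than just the hostname, into the 
per-environment file. A sketch:

default.vcl:
...
include "environment.vcl";
...

environment.vcl:
backend myBackend {
    .host = "myHostName";
}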







Detect request coalescing?

2020-07-30 Thread Batanun B
Just a quick question that I wasn't able to find any information on using 
regular Google searches. Is it possible to detect request coalescing somehow? 
Meaning: when looking at the response headers, can I somehow see that the 
request had to wait for another request for the same resource to finish before 
it could return? Can I somehow detect that in VCL, and add a custom header 
with that information?

Regards


Re: Varnish intermittently returns incomplete images

2020-05-08 Thread Batanun B
Also... could you explain this part for me? "so it had to truncate it and throw 
it away"
Why does it have to truncate it? Why not avoid caching it, and return it as 
is, from the backend, untouched?

From: Guillaume Quintard 
Sent: Friday, May 8, 2020 7:34 PM
To: Batanun B 
Cc: varnish-misc@varnish-cache.org 
Subject: Re: Varnish intermittently returns incomplete images

Hi,

Do you have objects in your cache that are noticeably smaller than your images?

What you are describing sounds like LRU failure (check nuke_limit in 
"varnishadm param.show"); basically, on a miss, Varnish couldn't evict enough 
objects to make room for the new object, so it had to truncate it and throw it 
away.

If that's the issue, you can increase nuke_limit, or get a bigger cache, or 
segregate small and large objects into different storages.
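
The segregation option might look roughly like this (a sketch; it assumes 
varnishd was started with an extra storage such as "-s large=malloc,8G", and 
1 MB is an arbitrary threshold):

import std;

sub vcl_backend_response {
    # Send large bodies to a dedicated storage so a single big miss doesn't
    # have to nuke lots of small objects from the default storage.
    if (std.integer(beresp.http.Content-Length, 0) > 1048576) {
        set beresp.storage = storage.large;
    }
}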

--
Guillaume Quintard


On Fri, May 8, 2020 at 10:14 AM Batanun B <bata...@hotmail.com> wrote:
Our Varnish (test environment) intermittently returns incomplete images. So the 
binary content is not complete. When requesting the image from the backend 
directly (using curl), the complete image is returned every time (I tested 1000 
times using a script).

This happens intermittently. Sometimes Varnish returns the complete image, 
sometimes half of it, sometimes 20% etc... The incomplete image is returned 
quickly, so I don't think there is a timeout involved (we have not configured 
any specific timeout in varnish).

I see nothing special in varnishlog when this happens. But I don't know how to 
troubleshoot this in a good way. Any suggestions?


Varnish intermittently returns incomplete images

2020-05-08 Thread Batanun B
Our Varnish (test environment) intermittently returns incomplete images. So the 
binary content is not complete. When requesting the image from the backend 
directly (using curl), the complete image is returned every time (I tested 1000 
times using a script).

This happens intermittently. Sometimes Varnish returns the complete image, 
sometimes half of it, sometimes 20% etc... The incomplete image is returned 
quickly, so I don't think there is a timeout involved (we have not configured 
any specific timeout in varnish).

I see nothing special in varnishlog when this happens. But I don't know how to 
troubleshoot this in a good way. Any suggestions?


Re: Grace and misbehaving servers

2020-03-25 Thread Batanun B
On Mon, Mar 23, 2020 at 10:00 AM Dridi Boukelmoune  wrote:
>
> For starters, there currently is no way to know for sure that you
> entered vcl_synth because of a return(abandon) transition. There are
> plans to make it possible, but currently you can do that with
> confidence lower than 100%.

I see. I actually had a feeling about that, since I didn't see an obvious way 
to pass that kind of information into vcl_synth when triggered by an abandon.

Although, just having a general rule to restart 500-requests there, regardless 
of what caused it, is not really that bad anyway.


> A problem with the restart logic is the race it opens since you now
> have two lookups, but overall, that's the kind of convoluted VCL that
> should work. The devil might be in the details.

Could you describe this race condition? What would the worst case scenario be? 
If it is just a guru meditation for this single request, and it happens very 
rarely, then that is something I can live with. If it is something that can 
cause Varnish to crash or hang, then it is not something I can live with :)


> In this case you might want to combine your VCL restart logic with
> vmod_saintmode.

Yes, I have already heard some things about this vmod. I will definitely look 
into it. Thanks.


> And you might solve this problem with vmod_xkey!

We actually already use this vmod. But like I said, it doesn't solve the 
problem with new content that affects existing pages. Several pages might for 
example include information about the latest objects created in the system. If 
one of these pages was loaded and cached at time T1, and then at T2 a new 
object O2 was created, an "xkey purge" with the key "O2" will have no effect, 
since that page was not associated with the "O2" key at time T1, because O2 
didn't even exist then.

And since there is no way to know beforehand which pages these are, the only 
bulletproof way I can see of handling this is to purge all pages* any time any 
content is updated.

* or at least a large subset of all pages, since the vast majority might 
include something related to newly created objects
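
Mechanically, that kind of catch-all invalidation could be done by tagging 
every response with an extra xkey. A sketch (the "all" key and the PURGEALL 
method are made-up names, and a real setup should guard this with an ACL):

import xkey;

sub vcl_backend_response {
    # Tag every object with a catch-all key, alongside whatever keys the
    # backend already set (the xkey header takes space-separated keys).
    if (beresp.http.xkey) {
        set beresp.http.xkey = beresp.http.xkey + " all";
    } else {
        set beresp.http.xkey = "all";
    }
}

sub vcl_recv {
    if (req.method == "PURGEALL") {
        # softpurge() returns the number of objects moved into grace.
        set req.http.n-purged = xkey.softpurge("all");
        return (synth(200, "Softpurged " + req.http.n-purged + " objects"));
    }
}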



Re: Fix incorrect Last-Modified from backend?

2020-03-20 Thread Batanun B
On Thu, Mar 19, 2020 at 09:57 AM Dridi Boukelmoune  wrote:
>
> By the way, when it comes to revalidation based on the body, you
> should use ETag instead of Last-Modified.

Sadly, there is no ETag available. And I can't see any way of adding it without 
"hacking" the software (patching their code in an unsupported way, causing 
problems every time we want to upgrade that software).

I'm waiting for a reply on a support case I created, asking about this. But if 
I had easy access to the body in Varnish, this would be "trivial" to implement 
there.


Re: Grace and misbehaving servers

2020-03-20 Thread Batanun B
On Thu, Mar 19, 2020 at 11:12 AM Dridi Boukelmoune  wrote:
>
> Not quite!
>
> ttl+grace+keep defines how long an object may stay in the cache
> (barring any form of invalidation).
>
> The grace I'm referring to is beresp.grace,

Well, when I wrote "if ttl + grace + keep is a low value set in 
vcl_backend_response", I was talking about beresp.grace, as in beresp.ttl + 
beresp.grace + beresp.keep.


> it defines how long we might serve a stale object while a background fetch is 
> in progress.

I'm not really seeing how that is different from what I said. If beresp.ttl + 
beresp.grace + beresp.keep is 10s in total, then a req.grace of say 24h 
wouldn't do much good, right? Or maybe I just misunderstood what you were 
saying here.


> As always in such cases it's not black or white. Depending on the
> nature of your web traffic you may want to put the cursor on always
> serving something, or never serving something stale. For example, live
> "real time" traffic may favor failing some requests over serving stale
> data.

Well, I was thinking of the typical "regular" small/medium website, like 
blogs, corporate profiles, small town news etc.


> I agree that on paper it sounds simple, but in practice it might be
> harder to get right.

OK. But what if I implemented it in this way, in my VCL?

* In vcl_backend_response, set beresp.grace to 72h if status < 400
* In vcl_backend_error and vcl_backend_response (when status >= 500), return 
(abandon)
* In vcl_synth, restart the request, with a special req header set
* In vcl_recv, if this req header is present, set req.grace to 72h

Wouldn't this work? If no, why? If yes, would you say there is something else 
problematic with it? Of course I would have to handle some special cases, and 
maybe check req.restarts and such, but I'm talking about the thought process as 
a whole here. I might be missing something, but I think I would need someone to 
point it out to me because I just don't get why this would be wrong.
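
Spelled out, that four-step plan might look roughly like this (a sketch only; 
X-stale-if-error is a made-up marker header, and req.restarts guards against 
restart loops):

sub vcl_recv {
    if (req.http.X-stale-if-error == "true") {
        # Second pass after a failed fetch: allow serving stale content.
        set req.grace = 72h;
    }
}

sub vcl_backend_response {
    if (beresp.status < 400) {
        set beresp.grace = 72h;
    } else if (beresp.status >= 500) {
        return (abandon);
    }
}

sub vcl_backend_error {
    return (abandon);
}

sub vcl_synth {
    # An abandoned fetch surfaces here as a 503.
    if (resp.status == 503 && req.restarts == 0) {
        set req.http.X-stale-if-error = "true";
        return (restart);
    }
}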


> Is it hurting you that less frequently requested contents don't stay
> in the cache?

If it results in people seeing error pages when stale content would be 
perfectly fine for them, then yes.

And these less frequently requested pages might still be part of a group of 
pages that all result in an error in the backend (while the health probe still 
return 200 OK). So while one individual page might be visited infrequently, the 
total number of visits on these kind of pages might be high.

Let's say that there are 3,000 unique (and cacheable) pages that are visited 
during an average weekend, and all of these are in the Varnish cache, but 
2,000 of them have stale content. Now let's say that 50% of all pages start 
returning 500 errors from the backend, on a Friday evening. That would mean 
that about ~1,000 of these stale pages would result in an error being 
displayed to the end users during that weekend. I would much prefer that 
Varnish still serve them stale content, so I could look into the problem on 
Monday morning.


> Another option is to give Varnish a high TTL (and give clients a lower
> TTL) and trigger a form of invalidation directly from the backend when
> you know a resource changed.

Well, that is perfectly fine for pages that have a one-to-one mapping between 
the page (ie the URL) and the content updated. But most pages in our setup 
contain a mix of multiple contents, and it is not possible to know beforehand 
if a specific content will contribute to the result of a specific page. That is 
especially true for new content that might be included in multiple pages 
already in the cache.

The only way to handle that in a foolproof way, as far as I can tell, is to 
invalidate all pages (since any page can contain this kind of content) the 
moment any object is updated. But that would pretty much clear the cache 
constantly. And we would still have to handle the case where the cache is 
invalidated for a page that gives a 500 error when Varnish tries to fetch it.



Fix incorrect Last-Modified from backend?

2020-03-18 Thread Batanun B
Hi,

Long story short, one of our backend systems serves an incorrect Last-Modified 
response header, and I don't see a way to fix it at the source (third party 
system, not based on Nginx/Tomcat/IIS or anything like that).

So, I would like to "fix" it in Varnish, since I don't expect the maker of that 
software to fix this within a reasonable time. Is there a built-in way in 
Varnish to make it generate its own Last-Modified response header? Something 
like:

* If no stale object exists in cache, set Last-Modified to the value of the 
Date response header
* If a stale object exists in cache, and its body content is identical to the 
newly fetched content, keep the Last-Modified from the step above
* If a stale object exists in cache, but its body content differs from the 
newly fetched content, set Last-Modified to the value of the Date response 
header
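
Of those three cases, only the first is directly expressible in VCL, since VCL 
cannot read or compare the response body. That part would be a sketch like:

sub vcl_backend_response {
    # Replace the backend's broken Last-Modified with the Date header. The
    # two body-comparison cases above cannot be implemented here, because
    # VCL has no access to the body.
    if (beresp.http.Date) {
        set beresp.http.Last-Modified = beresp.http.Date;
    }
}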

Any suggestions on how to handle this situation? Any general Varnish guidelines 
when working with a backend that acts like this?

Regards


Re: Grace and misbehaving servers

2020-03-17 Thread Batanun B
Hi Dridi,

On Monday, March 16, 2020 9:58 AM Dridi Boukelmoune  wrote:

> Not really, it's actually the other way around. The beresp.grace
> variable defines how long you may serve an object past its TTL once it
> enters the cache.
> 
> Subsequent requests can then limit grace mode, so think of req.grace
> as a req.max_grace variable (which maybe hints that it should have
> been called that in the first place).

OK. So beresp.grace mainly affects how long the object can stay in the cache? 
And if ttl + grace + keep is a low value set in vcl_backend_response, then 
vcl_recv is limited in how high the grace can be?

And req.grace doesn't affect the time that the object is in the cache? Even if 
req.grace is set to a low value on the very first request (ie the same request 
that triggers the call to the backend)?


> What you are describing is stale-if-error, something we don't support
> but could be approximated with somewhat convoluted VCL. It used to be
> easier when Varnish had saint mode built-in because it generally
> resulted in less convoluted VCL.
> 
> It's not something I would recommend attempting today.

That's strange. This stale-if-error sounds like something pretty much everyone 
would want, right? I mean, if there is stale content available, why show an 
error page to the end user?

But maybe it was my wanting to "cache/remember" previous failed fetches that 
made it complicated? So if I loosen the requirements/wish-list a bit, into this:

Assuming that:
* A request comes in to Varnish
* The content is stale, but still in the cache
* The backend is considered healthy
* The short (10s) grace has expired
* Varnish triggers a synchronous fetch to the backend
* This fetch fails (timeout or 5xx error)

I would then like Varnish to:
* Return the stale content

Would this be possible using basic Varnish community edition, without a 
"convoluted VCL", as you put it? Is it possible without triggering a restart of 
the request? Either way, I am interested in hearing about how it can be 
achieved. Is there any documentation or blog post that mentions this? Or can 
you give me some example code perhaps? Even a convoluted example would be OK by 
me.

Increasing the req.grace value for every request is not an option, since we 
only want to serve old content if Varnish can't get hold of new content. And 
some of our pages are visited very rarely, so we can't rely on a constant 
stream of visitors keeping the content fresh in the cache.

Regards