SMA.Transient.g_alloc              3518          .   Allocations outstanding
SMA.Transient.g_bytes            546390          .   Bytes outstanding
SMA.Transient.g_space                 0          .   Bytes available

g_space is always 0. Could it mean anything?
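[A minimal sketch, not from the thread, of how the Transient store can be capped explicitly by defining a storage backend named "Transient"; this is one way to address the "-s malloc" question raised further down. The file path and the 256m cap are made-up values, and the 450G size simply mirrors the storage discussed in the thread:

    varnishd \
        -f /etc/varnish/default.vcl \
        -s file,/var/lib/varnish/varnish_storage.bin,450G \
        -s Transient=malloc,256m

Without such a cap, Transient is an unbounded malloc store, which may be why its g_space reads 0.]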
On Wed, Jun 28, 2017 at 10:43 AM, Guillaume Quintard <[email protected]> wrote:

> Yeah, I was wondering about Transient, but it seems under control.
>
> Apart from moving away from file storage, I have nothing at the moment :-/
>
> --
> Guillaume Quintard
>
> On Wed, Jun 28, 2017 at 3:39 PM, Stefano Baldo <[email protected]> wrote:
>
>> Hi.
>>
>> root@2c6c325b279f:/# varnishstat -1 | grep g_bytes
>> SMA.Transient.g_bytes             519022          .   Bytes outstanding
>> SMF.s0.g_bytes               23662845952          .   Bytes outstanding
>>
>> You mean g_bytes from SMA.Transient? I have set no malloc storage.
>>
>> On Wed, Jun 28, 2017 at 10:26 AM, Guillaume Quintard <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> can you look at "varnishstat -1 | grep g_bytes" and see if it matches the memory you are seeing?
>>>
>>> --
>>> Guillaume Quintard
>>>
>>> On Wed, Jun 28, 2017 at 3:20 PM, Stefano Baldo <[email protected]> wrote:
>>>
>>>> Hi Guillaume.
>>>>
>>>> I increased the cli_timeout yesterday to 900sec (15min) and it restarted anyway, which seems to indicate that the thread is really stalled.
>>>>
>>>> This was 1 minute after the last restart:
>>>>
>>>> MAIN.n_object        3908216          .   object structs made
>>>> SMF.s0.g_alloc       7794510          .   Allocations outstanding
>>>>
>>>> I've just changed the I/O scheduler to noop to see what happens.
>>>>
>>>> One interesting thing I've found is about the memory usage.
>>>>
>>>> In the 1st minute of use:
>>>> MemTotal:       3865572 kB
>>>> MemFree:         120768 kB
>>>> MemAvailable:   2300268 kB
>>>>
>>>> 1 minute before a restart:
>>>> MemTotal:       3865572 kB
>>>> MemFree:          82480 kB
>>>> MemAvailable:     68316 kB
>>>>
>>>> It seems like the system is possibly running out of memory.
>>>>
>>>> When calling varnishd, I'm specifying only "-s file,..." as storage. I see in some examples that it is common to use "-s file" AND "-s malloc" together. Should I be passing "-s malloc" as well to somehow try to limit the memory usage by varnishd?
>>>>
>>>> Best,
>>>> Stefano
>>>>
>>>> On Wed, Jun 28, 2017 at 4:12 AM, Guillaume Quintard <[email protected]> wrote:
>>>>
>>>>> Sadly, nothing suspicious here. You can still try:
>>>>> - bumping the cli_timeout
>>>>> - changing your disk scheduler
>>>>> - changing the advice option of the file storage
>>>>>
>>>>> I'm still convinced this is due to Varnish getting stuck waiting for the disk because of the file storage fragmentation.
>>>>>
>>>>> Maybe you could look at SMF.*.g_alloc and compare it to the number of objects. Ideally, we would have a 1:1 relation between objects and allocations. If that number drops prior to a restart, that would be a good clue.
>>>>>
>>>>> --
>>>>> Guillaume Quintard
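[A small sketch, not part of the thread, of how the comparison suggested above could be watched over time; the field names are the ones that appear in this thread's output, and the 60-second interval is an arbitrary choice:

    watch -n 60 "varnishstat -1 -f MAIN.n_object -f SMF.s0.g_alloc"
]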
>>>>> On Tue, Jun 27, 2017 at 11:07 PM, Stefano Baldo <[email protected]> wrote:
>>>>>
>>>>>> Hi Guillaume.
>>>>>>
>>>>>> It keeps restarting.
>>>>>> Would you mind taking a quick look at the following VCL file to check if you find anything suspicious?
>>>>>>
>>>>>> Thank you very much.
>>>>>>
>>>>>> Best,
>>>>>> Stefano
>>>>>>
>>>>>> vcl 4.0;
>>>>>>
>>>>>> import std;
>>>>>>
>>>>>> backend default {
>>>>>>   .host = "sites-web-server-lb";
>>>>>>   .port = "80";
>>>>>> }
>>>>>>
>>>>>> include "/etc/varnish/bad_bot_detection.vcl";
>>>>>>
>>>>>> sub vcl_recv {
>>>>>>   call bad_bot_detection;
>>>>>>
>>>>>>   if (req.url == "/nocache" || req.url == "/version") {
>>>>>>     return(pass);
>>>>>>   }
>>>>>>
>>>>>>   unset req.http.Cookie;
>>>>>>   if (req.method == "PURGE") {
>>>>>>     ban("obj.http.x-host == " + req.http.host + " && obj.http.x-user-agent !~ Googlebot");
>>>>>>     return(synth(750));
>>>>>>   }
>>>>>>
>>>>>>   set req.url = regsuball(req.url, "(?<!(http:|https))\/+", "/");
>>>>>> }
>>>>>>
>>>>>> sub vcl_synth {
>>>>>>   if (resp.status == 750) {
>>>>>>     set resp.status = 200;
>>>>>>     synthetic("PURGED => " + req.url);
>>>>>>     return(deliver);
>>>>>>   } elsif (resp.status == 501) {
>>>>>>     set resp.status = 200;
>>>>>>     set resp.http.Content-Type = "text/html; charset=utf-8";
>>>>>>     synthetic(std.fileread("/etc/varnish/pages/invalid_domain.html"));
>>>>>>     return(deliver);
>>>>>>   }
>>>>>> }
>>>>>>
>>>>>> sub vcl_backend_response {
>>>>>>   unset beresp.http.Set-Cookie;
>>>>>>   set beresp.http.x-host = bereq.http.host;
>>>>>>   set beresp.http.x-user-agent = bereq.http.user-agent;
>>>>>>
>>>>>>   if (bereq.url == "/themes/basic/assets/theme.min.css"
>>>>>>       || bereq.url == "/api/events/PAGEVIEW"
>>>>>>       || bereq.url ~ "^\/assets\/img\/") {
>>>>>>     set beresp.http.Cache-Control = "max-age=0";
>>>>>>   } else {
>>>>>>     unset beresp.http.Cache-Control;
>>>>>>   }
>>>>>>
>>>>>>   if (beresp.status == 200 ||
>>>>>>       beresp.status == 301 ||
>>>>>>       beresp.status == 302 ||
>>>>>>       beresp.status == 404) {
>>>>>>     if (bereq.url ~ "\&ordenar=aleatorio$") {
>>>>>>       set beresp.http.X-TTL = "1d";
>>>>>>       set beresp.ttl = 1d;
>>>>>>     } else {
>>>>>>       set beresp.http.X-TTL = "1w";
>>>>>>       set beresp.ttl = 1w;
>>>>>>     }
>>>>>>   }
>>>>>>
>>>>>>   if (bereq.url !~ "\.(jpeg|jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg|swf|flv)$") {
>>>>>>     set beresp.do_gzip = true;
>>>>>>   }
>>>>>> }
>>>>>>
>>>>>> sub vcl_pipe {
>>>>>>   set bereq.http.connection = "close";
>>>>>>   return (pipe);
>>>>>> }
>>>>>>
>>>>>> sub vcl_deliver {
>>>>>>   unset resp.http.x-host;
>>>>>>   unset resp.http.x-user-agent;
>>>>>> }
>>>>>>
>>>>>> sub vcl_backend_error {
>>>>>>   if (beresp.status == 502 || beresp.status == 503 || beresp.status == 504) {
>>>>>>     set beresp.status = 200;
>>>>>>     set beresp.http.Content-Type = "text/html; charset=utf-8";
>>>>>>     synthetic(std.fileread("/etc/varnish/pages/maintenance.html"));
>>>>>>     return (deliver);
>>>>>>   }
>>>>>> }
>>>>>>
>>>>>> sub vcl_hash {
>>>>>>   if (req.http.User-Agent ~ "Google Page Speed") {
>>>>>>     hash_data("Google Page Speed");
>>>>>>   } elsif (req.http.User-Agent ~ "Googlebot") {
>>>>>>     hash_data("Googlebot");
>>>>>>   }
>>>>>> }
>>>>>>
>>>>>> sub vcl_deliver {
>>>>>>   if (resp.status == 501) {
>>>>>>     return (synth(resp.status));
>>>>>>   }
>>>>>>   if (obj.hits > 0) {
>>>>>>     set resp.http.X-Cache = "hit";
>>>>>>   } else {
>>>>>>     set resp.http.X-Cache = "miss";
>>>>>>   }
>>>>>> }
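[As an aside, a hypothetical request that would exercise the PURGE/ban path in the VCL above; the host and URL are made-up values:

    curl -X PURGE -H "Host: www.example.com" http://127.0.0.1/some/page

The ban it creates matches on obj.http.x-host and obj.http.x-user-agent, which this VCL sets in vcl_backend_response and strips again in vcl_deliver.]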
>>>>>> On Mon, Jun 26, 2017 at 3:47 PM, Guillaume Quintard <[email protected]> wrote:
>>>>>>
>>>>>>> Nice! It may have been the cause, time will tell. Can you report back in a few days to let us know?
>>>>>>>
>>>>>>> --
>>>>>>> Guillaume Quintard
>>>>>>>
>>>>>>> On Jun 26, 2017 20:21, "Stefano Baldo" <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Guillaume.
>>>>>>>>
>>>>>>>> I think things will start going better now after changing the bans.
>>>>>>>> This is how my last varnishstat looked moments before a crash, regarding the bans:
>>>>>>>>
>>>>>>>> MAIN.bans                 41336          .   Count of bans
>>>>>>>> MAIN.bans_completed       37967          .   Number of bans marked 'completed'
>>>>>>>> MAIN.bans_obj                 0          .   Number of bans using obj.*
>>>>>>>> MAIN.bans_req             41335          .   Number of bans using req.*
>>>>>>>> MAIN.bans_added           41336       0.68   Bans added
>>>>>>>> MAIN.bans_deleted             0       0.00   Bans deleted
>>>>>>>>
>>>>>>>> And this is how it looks now:
>>>>>>>>
>>>>>>>> MAIN.bans                     2          .   Count of bans
>>>>>>>> MAIN.bans_completed           1          .   Number of bans marked 'completed'
>>>>>>>> MAIN.bans_obj                 2          .   Number of bans using obj.*
>>>>>>>> MAIN.bans_req                 0          .   Number of bans using req.*
>>>>>>>> MAIN.bans_added            2016       0.69   Bans added
>>>>>>>> MAIN.bans_deleted          2014       0.69   Bans deleted
>>>>>>>>
>>>>>>>> Before the changes, bans were never deleted!
>>>>>>>> Now the bans are added and quickly deleted, after a minute or even a couple of seconds.
>>>>>>>>
>>>>>>>> Maybe this was the cause of the problem? It seems like Varnish was having to manage and test against a large number of bans.
>>>>>>>> I will let it ride now. Let's see if the problem persists or it's gone! :-)
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Stefano
>>>>>>>>
>>>>>>>> On Mon, Jun 26, 2017 at 3:10 PM, Guillaume Quintard <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Looking good!
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Guillaume Quintard
>>>>>>>>>
>>>>>>>>> On Mon, Jun 26, 2017 at 7:06 PM, Stefano Baldo <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Guillaume,
>>>>>>>>>>
>>>>>>>>>> Can the following be considered "ban lurker friendly"?
>>>>>>>>>>
>>>>>>>>>> sub vcl_backend_response {
>>>>>>>>>>   set beresp.http.x-url = bereq.http.host + bereq.url;
>>>>>>>>>>   set beresp.http.x-user-agent = bereq.http.user-agent;
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> sub vcl_recv {
>>>>>>>>>>   if (req.method == "PURGE") {
>>>>>>>>>>     ban("obj.http.x-url == " + req.http.host + req.url + " && obj.http.x-user-agent !~ Googlebot");
>>>>>>>>>>     return(synth(750));
>>>>>>>>>>   }
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> sub vcl_deliver {
>>>>>>>>>>   unset resp.http.x-url;
>>>>>>>>>>   unset resp.http.x-user-agent;
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Stefano
>>>>>>>>>>
>>>>>>>>>> On Mon, Jun 26, 2017 at 12:43 PM, Guillaume Quintard <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Not lurker friendly at all indeed. You'll need to avoid req.* expressions. The easiest way is to stash the host, user-agent and url in beresp.http.* and ban against those (unset them in vcl_deliver).
>>>>>>>>>>>
>>>>>>>>>>> I don't think you need to expand the VSL at all.
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Guillaume Quintard
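[A small sketch, not from the thread, of how the ban list and the counters quoted above can be inspected while verifying this kind of change:

    varnishadm ban.list
    varnishstat -1 -f MAIN.bans -f MAIN.bans_completed -f MAIN.bans_added -f MAIN.bans_deleted
]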
>>>>>>>>>>> On Jun 26, 2017 16:51, "Stefano Baldo" <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Guillaume.
>>>>>>>>>>>
>>>>>>>>>>> Thanks for answering.
>>>>>>>>>>>
>>>>>>>>>>> I'm using an SSD disk. I've changed from ext4 to ext2 to increase performance, but it still keeps restarting.
>>>>>>>>>>> Also, I checked the I/O performance of the disk and there is no sign of overload.
>>>>>>>>>>>
>>>>>>>>>>> I've changed /var/lib/varnish to a tmpfs and increased its 80m default size, passing "-l 200m,20m" to varnishd and using "nodev,nosuid,noatime,size=256M 0 0" for the tmpfs mount. There was a problem here: after a couple of hours varnish died and I received a "no space left on device" message - deleting /var/lib/varnish solved the problem and varnish was up again. It's weird, because there was free memory on the host to be used by the tmpfs directory, so I don't know what could have happened. I will try again without increasing the /var/lib/varnish size.
>>>>>>>>>>>
>>>>>>>>>>> Anyway, I am worried about the bans. You asked me if the bans are lurker friendly. Well, I don't think so. My bans are created this way:
>>>>>>>>>>>
>>>>>>>>>>> ban("req.http.host == " + req.http.host + " && req.url ~ " + req.url + " && req.http.User-Agent !~ Googlebot");
>>>>>>>>>>>
>>>>>>>>>>> Are they lurker friendly? I took a quick look at the documentation and it looks like they're not.
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Stefano
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jun 23, 2017 at 11:30 AM, Guillaume Quintard <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Stefano,
>>>>>>>>>>>>
>>>>>>>>>>>> Let's cover the usual suspects: I/Os. I think Varnish gets stuck here trying to push/pull data and can't make time to reply to the CLI. I'd recommend monitoring the disk activity (bandwidth and iops) to confirm.
>>>>>>>>>>>>
>>>>>>>>>>>> After some time, the file storage is terrible on a hard drive (SSDs take a bit more time to degrade) because of fragmentation. One solution to help the disks cope is to overprovision them if they're SSDs, and you can try the different advice options in the file storage definition on the command line (the last parameter, after granularity).
>>>>>>>>>>>>
>>>>>>>>>>>> Is your /var/lib/varnish mounted on tmpfs? That could help too.
>>>>>>>>>>>>
>>>>>>>>>>>> 40K bans is a lot, are they ban-lurker friendly?
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Guillaume Quintard
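[For reference, a sketch of an /etc/fstab entry mounting /var/lib/varnish on tmpfs, reusing the mount options quoted earlier in the thread; the 256M size is the value Stefano used, not a recommendation:

    tmpfs  /var/lib/varnish  tmpfs  nodev,nosuid,noatime,size=256M  0  0
]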
>>>>>>>>>>>> On Fri, Jun 23, 2017 at 4:01 PM, Stefano Baldo <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hello.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have been having a critical problem with Varnish Cache in production for over a month, and any help will be appreciated.
>>>>>>>>>>>>> The problem is that the Varnish child process is recurrently being restarted after 10~20h of use, with the following message:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) not responding to CLI, killed it.
>>>>>>>>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Unexpected reply from ping: 400 CLI communication error
>>>>>>>>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) died signal=9
>>>>>>>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child cleanup complete
>>>>>>>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) Started
>>>>>>>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) said Child starts
>>>>>>>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) said SMF.s0 mmap'ed 483183820800 bytes of 483183820800
>>>>>>>>>>>>>
>>>>>>>>>>>>> The following link is the varnishstat output just 1 minute before a restart:
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://pastebin.com/g0g5RVTs
>>>>>>>>>>>>>
>>>>>>>>>>>>> Environment:
>>>>>>>>>>>>>
>>>>>>>>>>>>> varnish-5.1.2 revision 6ece695
>>>>>>>>>>>>> Debian 8.7 - Debian GNU/Linux 8 (3.16.0)
>>>>>>>>>>>>> Installed using the pre-built package from the official repo at packagecloud.io
>>>>>>>>>>>>> CPU 2x2.9 GHz
>>>>>>>>>>>>> Mem 3.69 GiB
>>>>>>>>>>>>> Running inside a Docker container
>>>>>>>>>>>>> NFILES=131072
>>>>>>>>>>>>> MEMLOCK=82000
>>>>>>>>>>>>>
>>>>>>>>>>>>> Additional info:
>>>>>>>>>>>>>
>>>>>>>>>>>>> - I need to cache a large number of objects and the cache should last for almost a week, so I have set up a 450G storage space; I don't know if this is a problem;
>>>>>>>>>>>>> - I use ban a lot. There were about 40k bans in the system just before the last crash. I really don't know if this is too much or may have anything to do with it;
>>>>>>>>>>>>> - No registered CPU spikes (almost always around 30%);
>>>>>>>>>>>>> - No panic is reported; the only info I can retrieve is from syslog;
>>>>>>>>>>>>> - All of the time, even moments before the crashes, everything is okay and requests are being answered very fast.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Stefano Baldo
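[For reference, the "not responding to CLI, killed it" message above is triggered when the child misses the CLI ping for longer than the cli_timeout parameter. A sketch of raising it at runtime, using the 900-second value tried elsewhere in the thread; as the thread shows, this only buys time and does not fix a genuinely stalled child:

    varnishadm param.set cli_timeout 900
]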
_______________________________________________
varnish-misc mailing list
[email protected]
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
