Nice! It may have been the cause; time will tell. Can you report back in a
few days to let us know?

--
Guillaume Quintard
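In the meantime, a quick way to keep an eye on the ban counters from the
shell (a rough sketch; it assumes the stock varnishstat/varnishadm shipped
with 5.1 and the default instance name):

    # Show only the ban-related counters (glob match on the field name).
    varnishstat -1 -f "MAIN.bans*"

    # Rough count of entries currently on the ban list (the first line or
    # two of the output are headers, so subtract those).
    varnishadm ban.list | wc -l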
On Jun 26, 2017 20:21, "Stefano Baldo" <[email protected]> wrote:

> Hi Guillaume.
>
> I think things will start going better now after changing the bans.
> This is how my last varnishstat looked, moments before a crash,
> regarding the bans:
>
> MAIN.bans                41336         .    Count of bans
> MAIN.bans_completed      37967         .    Number of bans marked 'completed'
> MAIN.bans_obj                0         .    Number of bans using obj.*
> MAIN.bans_req            41335         .    Number of bans using req.*
> MAIN.bans_added          41336      0.68    Bans added
> MAIN.bans_deleted            0      0.00    Bans deleted
>
> And this is how it looks now:
>
> MAIN.bans                    2         .    Count of bans
> MAIN.bans_completed          1         .    Number of bans marked 'completed'
> MAIN.bans_obj                2         .    Number of bans using obj.*
> MAIN.bans_req                0         .    Number of bans using req.*
> MAIN.bans_added           2016      0.69    Bans added
> MAIN.bans_deleted         2014      0.69    Bans deleted
>
> Before the changes, bans were never deleted!
> Now the bans are added and quickly deleted after a minute or even a
> couple of seconds.
>
> Maybe this was the cause of the problem? It seems like Varnish was
> having a large number of bans to manage and test against.
> I will let it ride now. Let's see if the problem persists or it's gone! :-)
>
> Best,
> Stefano
>
>
> On Mon, Jun 26, 2017 at 3:10 PM, Guillaume Quintard
> <[email protected]> wrote:
>
>> Looking good!
>>
>> --
>> Guillaume Quintard
>>
>> On Mon, Jun 26, 2017 at 7:06 PM, Stefano Baldo <[email protected]>
>> wrote:
>>
>>> Hi Guillaume,
>>>
>>> Can the following be considered "ban lurker friendly"?
>>>
>>> sub vcl_backend_response {
>>>     set beresp.http.x-url = bereq.http.host + bereq.url;
>>>     set beresp.http.x-user-agent = bereq.http.user-agent;
>>> }
>>>
>>> sub vcl_recv {
>>>     if (req.method == "PURGE") {
>>>         ban("obj.http.x-url == " + req.http.host + req.url +
>>>             " && obj.http.x-user-agent !~ Googlebot");
>>>         return (synth(750));
>>>     }
>>> }
>>>
>>> sub vcl_deliver {
>>>     unset resp.http.x-url;
>>>     unset resp.http.x-user-agent;
>>> }
>>>
>>> Best,
>>> Stefano
>>>
>>>
>>> On Mon, Jun 26, 2017 at 12:43 PM, Guillaume Quintard
>>> <[email protected]> wrote:
>>>
>>>> Not lurker friendly at all, indeed. You'll need to avoid req.*
>>>> expressions. The easiest way is to stash the host, user-agent and url
>>>> in beresp.http.* and ban against those (unset them in vcl_deliver).
>>>>
>>>> I don't think you need to expand the VSL at all.
>>>>
>>>> --
>>>> Guillaume Quintard
>>>>
>>>> On Jun 26, 2017 16:51, "Stefano Baldo" <[email protected]> wrote:
>>>>
>>>> Hi Guillaume.
>>>>
>>>> Thanks for answering.
>>>>
>>>> I'm using an SSD disk. I've changed from ext4 to ext2 to increase
>>>> performance, but it still keeps restarting.
>>>> Also, I checked the I/O performance of the disk and there is no sign
>>>> of overload.
>>>>
>>>> I've changed /var/lib/varnish to a tmpfs and increased its 80m default
>>>> size by passing "-l 200m,20m" to varnishd and using
>>>> "nodev,nosuid,noatime,size=256M 0 0" for the tmpfs mount. There was a
>>>> problem here: after a couple of hours Varnish died and I received a
>>>> "no space left on device" message - deleting /var/lib/varnish solved
>>>> the problem and Varnish was up again, but it's weird because there was
>>>> free memory on the host to be used by the tmpfs directory, so I don't
>>>> know what could have happened. For now I will stop increasing the
>>>> /var/lib/varnish size.
>>>>
>>>> Anyway, I am worried about the bans. You asked me if the bans are
>>>> lurker friendly. Well, I don't think so.
>>>> My bans are created this way:
>>>>
>>>> ban("req.http.host == " + req.http.host + " && req.url ~ " + req.url +
>>>>     " && req.http.User-Agent !~ Googlebot");
>>>>
>>>> Are they lurker friendly? I took a quick look at the documentation and
>>>> it looks like they're not.
>>>>
>>>> Best,
>>>> Stefano
>>>>
>>>>
>>>> On Fri, Jun 23, 2017 at 11:30 AM, Guillaume Quintard
>>>> <[email protected]> wrote:
>>>>
>>>>> Hi Stefano,
>>>>>
>>>>> Let's cover the usual suspects: I/Os. I think Varnish gets stuck here
>>>>> trying to push/pull data and can't make time to reply to the CLI. I'd
>>>>> recommend monitoring the disk activity (bandwidth and iops) to
>>>>> confirm.
>>>>>
>>>>> After some time, the file storage is terrible on a hard drive (SSDs
>>>>> take a bit more time to degrade) because of fragmentation. One
>>>>> solution to help the disks cope is to overprovision them if they're
>>>>> SSDs, and you can try different advice values in the file storage
>>>>> definition on the command line (last parameter, after granularity).
>>>>>
>>>>> Is your /var/lib/varnish mounted on tmpfs? That could help too.
>>>>>
>>>>> 40K bans is a lot, are they ban-lurker friendly?
>>>>>
>>>>> --
>>>>> Guillaume Quintard
>>>>>
>>>>> On Fri, Jun 23, 2017 at 4:01 PM, Stefano Baldo
>>>>> <[email protected]> wrote:
>>>>>
>>>>>> Hello.
>>>>>>
>>>>>> I am having a critical problem with Varnish Cache in production for
>>>>>> over a month, and any help will be appreciated.
>>>>>> The problem is that the Varnish child process is recurrently being
>>>>>> restarted after 10~20h of use, with the following messages:
>>>>>>
>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) not
>>>>>>   responding to CLI, killed it.
>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Unexpected reply from
>>>>>>   ping: 400 CLI communication error
>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) died
>>>>>>   signal=9
>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child cleanup complete
>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) Started
>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) said
>>>>>>   Child starts
>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) said
>>>>>>   SMF.s0 mmap'ed 483183820800 bytes of 483183820800
>>>>>>
>>>>>> The following link is the varnishstat output just 1 minute before a
>>>>>> restart:
>>>>>>
>>>>>> https://pastebin.com/g0g5RVTs
>>>>>>
>>>>>> Environment:
>>>>>>
>>>>>> varnish-5.1.2 revision 6ece695
>>>>>> Debian 8.7 - Debian GNU/Linux 8 (3.16.0)
>>>>>> Installed using the pre-built package from the official repo at
>>>>>> packagecloud.io
>>>>>> CPU 2x2.9 GHz
>>>>>> Mem 3.69 GiB
>>>>>> Running inside a Docker container
>>>>>> NFILES=131072
>>>>>> MEMLOCK=82000
>>>>>>
>>>>>> Additional info:
>>>>>>
>>>>>> - I need to cache a large number of objects and the cache should
>>>>>>   last for almost a week, so I have set up a 450G storage space;
>>>>>>   I don't know if this is a problem;
>>>>>> - I use ban a lot. There were about 40k bans in the system just
>>>>>>   before the last crash. I really don't know if this is too much or
>>>>>>   may have anything to do with it;
>>>>>> - No registered CPU spikes (almost always around 30%);
>>>>>> - No panic is reported; the only info I can retrieve is from syslog;
>>>>>> - All the time, even moments before the crashes, everything is okay
>>>>>>   and requests are being answered very fast.
>>>>>> Best,
>>>>>> Stefano Baldo
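One gap in the VCL quoted above: the PURGE branch ends with
return (synth(750)), but no matching vcl_synth handler appears anywhere in
the thread. A minimal sketch of what that branch could look like is below;
the 200 status and the body text are assumptions for illustration, not
something taken from the original mails.

    sub vcl_synth {
        if (resp.status == 750) {
            # Turn the internal 750 marker into a normal reply for the
            # client that issued the PURGE request.
            set resp.status = 200;
            set resp.http.Content-Type = "text/plain; charset=utf-8";
            synthetic("Ban added");
            return (deliver);
        }
    }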
_______________________________________________
varnish-misc mailing list
[email protected]
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
