Sure, will do!

On Mon, Jun 26, 2017 at 3:47 PM, Guillaume Quintard <[email protected]> wrote:

Nice! It may have been the cause; time will tell. Can you report back in a few days to let us know?

--
Guillaume Quintard

On Jun 26, 2017 20:21, "Stefano Baldo" <[email protected]> wrote:

Hi Guillaume.

I think things will start going better now after changing the bans.
This is what my last varnishstat output looked like moments before a crash, regarding the bans:

    MAIN.bans               41336          .    Count of bans
    MAIN.bans_completed     37967          .    Number of bans marked 'completed'
    MAIN.bans_obj               0          .    Number of bans using obj.*
    MAIN.bans_req           41335          .    Number of bans using req.*
    MAIN.bans_added         41336         0.68  Bans added
    MAIN.bans_deleted           0         0.00  Bans deleted

And this is how it looks now:

    MAIN.bans                   2          .    Count of bans
    MAIN.bans_completed         1          .    Number of bans marked 'completed'
    MAIN.bans_obj               2          .    Number of bans using obj.*
    MAIN.bans_req               0          .    Number of bans using req.*
    MAIN.bans_added          2016         0.69  Bans added
    MAIN.bans_deleted        2014         0.69  Bans deleted

Before the changes, bans were never deleted!
Now the bans are added and quickly deleted after a minute, or even a couple of seconds.

Maybe this was the cause of the problem? It seems like Varnish was having a large number of bans to manage and test against.
I will let it ride now. Let's see if the problem persists or it's gone! :-)

Best,
Stefano
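The roughly one-minute delay before deletion lines up with the ban lurker's defaults: the lurker ignores bans younger than ban_lurker_age, which defaults to 60 seconds. A minimal sketch for inspecting this on a running instance, assuming a stock Varnish 5.x install:

    varnishadm param.show ban_lurker_age     # lurker skips bans younger than this (default 60.000s)
    varnishadm param.show ban_lurker_sleep   # pause between lurker passes (default 0.010s)
    varnishadm ban.list                      # current bans; completed ones carry the C flag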
On Mon, Jun 26, 2017 at 3:10 PM, Guillaume Quintard <[email protected]> wrote:

Looking good!

--
Guillaume Quintard

On Mon, Jun 26, 2017 at 7:06 PM, Stefano Baldo <[email protected]> wrote:

Hi Guillaume,

Can the following be considered "ban lurker friendly"?

    sub vcl_backend_response {
        set beresp.http.x-url = bereq.http.host + bereq.url;
        set beresp.http.x-user-agent = bereq.http.user-agent;
    }

    sub vcl_recv {
        if (req.method == "PURGE") {
            ban("obj.http.x-url == " + req.http.host + req.url +
                " && obj.http.x-user-agent !~ Googlebot");
            return (synth(750));
        }
    }

    sub vcl_deliver {
        unset resp.http.x-url;
        unset resp.http.x-user-agent;
    }

Best,
Stefano
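A rule like this can be exercised by hand with an HTTP PURGE request. A minimal sketch, assuming Varnish listens on 127.0.0.1:80 and a hypothetical example.com vhost; note that return(synth(750)) will surface as HTTP status 750 unless a custom vcl_synth maps it to something friendlier:

    curl -v -X PURGE -H "Host: example.com" http://127.0.0.1/some/path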
On Mon, Jun 26, 2017 at 12:43 PM, Guillaume Quintard <[email protected]> wrote:

Not lurker friendly at all, indeed. You'll need to avoid req.* expressions. The easiest way is to stash the host, user-agent and url in beresp.http.* and ban against those (unset them in vcl_deliver), as in the VCL above.

I don't think you need to expand the VSL at all.

--
Guillaume Quintard

On Jun 26, 2017 16:51, "Stefano Baldo" <[email protected]> wrote:

Hi Guillaume.

Thanks for answering.

I'm using an SSD disk. I've changed from ext4 to ext2 to increase performance, but it still keeps restarting. Also, I checked the I/O performance of the disk and there is no sign of overload.

I've changed /var/lib/varnish to a tmpfs and increased its 80m default size, passing "-l 200m,20m" to varnishd and using "nodev,nosuid,noatime,size=256M 0 0" for the tmpfs mount. There was a problem here: after a couple of hours varnish died and I received a "no space left on device" message. Deleting /var/lib/varnish solved the problem and varnish was up again, but it's weird, because there was free memory on the host available to the tmpfs directory, so I don't know what could have happened. I will stop increasing the /var/lib/varnish size.

Anyway, I am worried about the bans. You asked me if the bans are lurker friendly. Well, I don't think so. My bans are created this way:

    ban("req.http.host == " + req.http.host +
        " && req.url ~ " + req.url +
        " && req.http.User-Agent !~ Googlebot");

Are they lurker friendly? I took a quick look at the documentation and it looks like they're not.

Best,
Stefano
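For reference, the tmpfs setup described above amounts to something like the following. A sketch only: the mount options, the 256M size and the "-l 200m,20m" flag come from the message, everything else is assumption:

    # put the Varnish shared-memory files on tmpfs (an fstab entry would carry the same options)
    mount -t tmpfs -o nodev,nosuid,noatime,size=256M tmpfs /var/lib/varnish

    # enlarge the VSL: 200 MB of log space plus 20 MB of counter space
    varnishd -l 200m,20m ...

Note that 200m + 20m of shared memory plus working files leaves little headroom in a 256M tmpfs, which could plausibly explain the "no space left on device" error.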
On Fri, Jun 23, 2017 at 11:30 AM, Guillaume Quintard <[email protected]> wrote:

Hi Stefano,

Let's cover the usual suspects: I/Os. I think Varnish here gets stuck trying to push/pull data and can't make time to reply to the CLI. I'd recommend monitoring the disk activity (bandwidth and iops) to confirm.

After some time, the file storage performs terribly on a hard drive (SSDs take a bit more time to degrade) because of fragmentation. One solution to help the disks cope is to overprovision them if they're SSDs, and you can try different advice values in the file storage definition on the command line (last parameter, after granularity).

Is your /var/lib/varnish mounted on tmpfs? That could help too.

40K bans is a lot. Are they ban-lurker friendly?

--
Guillaume Quintard

On Fri, Jun 23, 2017 at 4:01 PM, Stefano Baldo <[email protected]> wrote:

Hello.

I have been having a critical problem with Varnish Cache in production for over a month, and any help will be appreciated. The problem is that the Varnish child process is recurrently being restarted after 10-20h of use, with the following message:

    Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) not responding to CLI, killed it.
    Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Unexpected reply from ping: 400 CLI communication error
    Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) died signal=9
    Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child cleanup complete
    Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) Started
    Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) said Child starts
    Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) said SMF.s0 mmap'ed 483183820800 bytes of 483183820800

The following link is the varnishstat output just 1 minute before a restart:

https://pastebin.com/g0g5RVTs

Environment:

varnish-5.1.2 revision 6ece695
Debian 8.7 - Debian GNU/Linux 8 (3.16.0)
Installed using the pre-built package from the official repo at packagecloud.io
CPU 2x2.9 GHz
Mem 3.69 GiB
Running inside a Docker container
NFILES=131072
MEMLOCK=82000

Additional info:

- I need to cache a large number of objects and the cache should last for almost a week, so I have set up a 450G storage space; I don't know if this is a problem;
- I use ban a lot. There were about 40k bans in the system just before the last crash. I really don't know if this is too much or may have anything to do with it;
- No registered CPU spikes (almost always around 30%);
- No panic is reported; the only info I can retrieve is from syslog;
- All the time, even moments before the crashes, everything is okay and requests are being served very fast.

Best,
Stefano Baldo
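The SMF.s0 line above corresponds to the 450G file storage (483183820800 bytes = 450 GiB). The advice field Guillaume refers to is the last parameter of the file storage definition; a minimal sketch, where the path is hypothetical, the empty granularity field keeps the default, and the accepted advice values (normal, random) should be checked against the docs for your build:

    # -s file,<path>,<size>[,<granularity>[,<advice>]]
    # the advice value hints the kernel about the expected access pattern
    varnishd -s file,/var/lib/varnish/storage.bin,450G,,random ...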
_______________________________________________
varnish-misc mailing list
[email protected]
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
