Hey guys,

I've seen something I'd like to share with you; perhaps it could be seen as a bug in varnishstat.
Yesterday I opened ssh sessions to my 4 balancers (FreeBSD 7.2 x64) to run some scripts, and then I opened varnishstat to monitor them. A while later I had to leave in a rush and closed my laptop's lid, which killed my vpn tunnel and ssh sessions. However, the varnishstat processes (apparently) kept running.

Just a few hours ago (so around 16 hours later), I had one balancer die on me (it became completely unresponsive and refused connections to port 80). I immediately restarted varnishd, and I also saw a varnishstat instance eating 100% cpu, which I killed.

When I looked at the other balancers just now, I saw the varnishstat instance using a lot of CPU (only one out of 4 cores though):

last pid: 77863;  load averages:  1.40,  1.48,  1.47    up 105+00:24:26  14:56:40
166 processes: 2 running, 164 sleeping
CPU: 27.1% user,  0.0% nice,  4.2% system,  1.9% interrupt, 66.8% idle
Mem: 6430M Active, 550M Inact, 709M Wired, 189M Cache, 399M Buf, 32M Free
Swap: 4096M Total, 228M Used, 3868M Free, 5% Inuse

  PID USERNAME THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
69587 root       1 112    0 95640K  1044K CPU3   3  19.1H 77.20% varnishstat
76211 haproxy    1   4    0 48928K 18944K kqread 1  16:34  3.17% haproxy
68762 www      116  44    0  8756M  6412M select 0   0:01  0.39% varnishd
31203 root       1  44    0   176M  5476K select 2 439:16  0.00% snmpd
69527 root       1   8    0 94312K 83384K nanslp 0  11:59  0.00% varnishncsa
37934 root       1   4    0 66244K  3164K kqread 0   8:46  0.00% squid
 1912 root       1  44    0 10484K   724K select 0   7:50  0.00% ntpd
 2036 root       1  44    0 85732K  3528K select 1   4:12  0.00% httpd
56664 root       1  44    0  5692K   616K select 2   0:51  0.00% syslogd
 2056 root       1   8    0  6748K   392K nanslp 2   0:33  0.00% cron
 2023 root       1   4    0  5808K   428K kqread 0   0:23  0.00% master
 2031 postfix    1   4    0  5808K   408K kqread 0   0:22  0.00% qmgr
76181 www        1   4    0 85732K  3732K kqread 3   0:01  0.00% httpd
76182 www        1  20    0 85732K  3716K lockf  3   0:01  0.00% httpd
76185 www        1  20    0 85732K  3696K lockf  2   0:01  0.00% httpd
76298 www        1  20    0 85732K  3868K lockf  3   0:01  0.00% httpd

So it seems that when varnishstat runs for a long time, it uses more and more resources, and in my case it even caused varnishd to fail somehow (it could be a coincidence, but I don't think so). After I killed varnishstat, the load dropped from 1.5 back to 0.2, which is about the usual level.

-- 
With kind regards,

Angelo Höngens
Systems Administrator
------------------------------------------
NetMatch
tourism internet software solutions
Ringbaan Oost 2b
5013 CA Tilburg
T: +31 (0)13 5811088
F: +31 (0)13 5821239
mailto:[email protected]
http://www.netmatch.nl
------------------------------------------

_______________________________________________
varnish-misc mailing list
[email protected]
http://lists.varnish-cache.org/mailman/listinfo/varnish-misc
