Hi,

We run 3 Varnish instances in EC2 behind a load balancer. This setup has performed solidly since its installation.
This morning I came in to find 2 of the 3 instances thrashing the CPU and failing to serve requests. Our monitoring shows a notable (20%) CPU steal in availability zones A and B starting at 6:30, but this has also occurred in the past without causing us any issues. We've restarted one of the problem instances and dropped the other out of the load balancer to perform root cause analysis. The dropped host is not serving any requests but is still maxing out the CPU.

There are 50 Varnish threads running, and a thread listing from ps shows a single thread (LWP 21612) spinning at >90%:

    root 1235 1 1235 0 1 May03 ? 00:00:00 /bin/bash /etc/rc2.d/S20varnishlog-backend start
    root 1236 1235 1236 0 1 May03 ? 00:59:11 varnishlog -u -i Backend_health
    root 1240 1235 1240 0 1 May03 ? 00:00:26 logger -t varnishlog
    root 12688 1 12688 0 1 Jul17 ? 00:00:09 /usr/sbin/varnishd -P /var/run/varnishd.pid -a :80 -T :8082 -f /etc/varnish/varnish.vcl -s malloc,6800M
    <snip>
    nobody 13015 12688 21589 0 48 06:30 ? 00:00:00 /usr/sbin/varnishd -P /var/run/varnishd.pid -a :80 -T :8082 -f /etc/varnish/varnish.vcl -s malloc,6800M
    nobody 13015 12688 21611 0 48 06:30 ? 00:00:00 /usr/sbin/varnishd -P /var/run/varnishd.pid -a :80 -T :8082 -f /etc/varnish/varnish.vcl -s malloc,6800M
    nobody 13015 12688 21612 93 48 06:30 ? 03:39:06 /usr/sbin/varnishd -P /var/run/varnishd.pid -a :80 -T :8082 -f /etc/varnish/varnish.vcl -s malloc,6800M
    nobody 13015 12688 21614 0 48 06:30 ? 00:00:00 /usr/sbin/varnishd -P /var/run/varnishd.pid -a :80 -T :8082 -f /etc/varnish/varnish.vcl -s malloc,6800M
    <snip>

Can anyone recommend next steps for diagnosing what's going on here? I'm at a loss!

Thanks in advance,
Neil Saunders

_______________________________________________
varnish-misc mailing list
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
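The listing above appears to be in `ps -eLf` layout (UID, PID, PPID, LWP, C, NLWP, STIME, TTY, TIME, CMD), where the fourth column is the thread ID and the fifth is lifetime CPU utilisation. As a minimal sketch, assuming a Linux host with procfs, the same "which thread is spinning" check can be scripted by reading `/proc/<pid>/task/<tid>/stat` for each thread of the varnishd child (PID 13015 here) and ranking threads by accumulated user plus system CPU time. The script and its function names (`parse_stat`, `thread_cpu_times`) are hypothetical illustrations, not part of Varnish.

```python
#!/usr/bin/env python3
"""Rank the threads of a process by accumulated CPU time (Linux only).

Hypothetical helper: reads /proc/<pid>/task/<tid>/stat for every thread.
"""
import os
import sys


def parse_stat(line):
    """Return (comm, utime_ticks, stime_ticks) from a /proc/.../stat line.

    The comm field is wrapped in parentheses and may itself contain spaces
    or ')', so split on the *last* ')' rather than on whitespace alone.
    """
    start = line.index("(")
    end = line.rindex(")")
    comm = line[start + 1:end]
    rest = line[end + 1:].split()
    # rest[0] is field 3 (state); utime is field 14, stime is field 15,
    # so field n lives at rest[n - 3].
    utime = int(rest[11])
    stime = int(rest[12])
    return comm, utime, stime


def thread_cpu_times(pid):
    """Return [(tid, comm, cpu_seconds), ...] for every thread of pid, busiest first."""
    ticks_per_sec = os.sysconf("SC_CLK_TCK")
    task_dir = f"/proc/{pid}/task"
    results = []
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                comm, utime, stime = parse_stat(f.read())
        except FileNotFoundError:
            # The thread exited between listdir() and open().
            continue
        results.append((int(tid), comm, (utime + stime) / ticks_per_sec))
    # Sort by total CPU seconds, highest first.
    results.sort(key=lambda r: r[2], reverse=True)
    return results


if __name__ == "__main__":
    pid = int(sys.argv[1])  # e.g. the varnishd child PID
    for tid, comm, secs in thread_cpu_times(pid)[:5]:
        print(f"{tid:>8} {comm:<16} {secs:10.1f}s")
```

Run as `python3 busiest_threads.py 13015` (hypothetical file name). The thread ID it reports is the same value ps shows in the LWP column, so the top entry should correspond to thread 21612 if that thread is still the one spinning.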
