On Wed, Mar 04, 2009 at 06:16:10PM +1300, Ross Brown wrote:
> When this problem happens, the backend is still reachable and happily
> serving images. It is not a particularly busy period for us (600
> requests/sec/Varnish server - approx 350Mbps outbound each - we got up to
> nearly 3 times that level without incident previously) but for some
> reason unknown to us, the servers just suddenly stop processing requests
> and worker processes increase dramatically.
>
> After the lockup happened last time, I tried firing up varnishlog and
> hitting the server directly - my requests were not showing up at all. The
> *only* entries in the varnish log were related to worker processes being
> killed over time - no PINGs, PONGs, load balancer healthchecks or
> anything related to 'normal' varnish activity. It's as if varnishd has
> completely locked up, but we can't understand what causes both our
> varnish servers to exhibit this behaviour at exactly the same time, nor
> why varnish does not detect it and attempt a restart. After a restart,
> varnish is fine and behaves itself.
>
> There is nothing to indicate an error with the backend, nor anything in
> syslog to indicate a Varnish problem. Pointers of any kind would be
> appreciated :)
Have you checked dmesg? Do you have any estimate of how simultaneous these
freezes are (seconds, minutes or tens of minutes apart, for instance)?

Your hit rate is quite low (78%ish) and it doesn't seem like you have grace
enabled, which I strongly recommend. If dmesg doesn't reveal any trouble,
I'd start by setting up grace (req.grace = 30s; and obj.grace = 30s; will
get you far) and focusing on getting that hit rate up. If all you're
serving is images, chances are you should be able to top 99%, which would
make Varnish considerably more resilient to hiccoughs from backends. You
also have a few backend failures, which could easily trigger Bad Things
with no grace and a low hit rate.

You should also consider starting varnishd with -p cli_timeout=20 or
similar, as the default can be far too aggressive on a busy site.

Any syslog or varnishlog entries related to this would be helpful for
further debugging.

-- 
Kristian Lyngstøl
Redpill Linpro AS
Tlf: +47 21544179
Mob: +47 99014497
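For reference, here is a minimal sketch of the grace settings suggested above. It assumes Varnish 2.0-era VCL, where the fetched object is modified through obj.* in vcl_fetch (later releases use beresp.* there instead). Merge these lines into your existing vcl_recv and vcl_fetch rather than pasting them in verbatim, since any other logic you already have in those subroutines still needs to run.

```vcl
sub vcl_recv {
    # Allow this request to be served an object up to 30s past its TTL
    # (e.g. while another client is already fetching a fresh copy).
    set req.grace = 30s;
}

sub vcl_fetch {
    # Keep each fetched object in cache for 30s after it expires so that
    # it remains available to be served in grace mode.
    set obj.grace = 30s;
}
```

Grace only helps if the stale object is still in cache, so it pays off most once the hit rate is up as well.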
_______________________________________________
varnish-misc mailing list
[email protected]
http://projects.linpro.no/mailman/listinfo/varnish-misc
