Same here. I have encountered this problem after migrating from linux+2.0.1 to solaris+2.0.2.
With 2.0.2 it happens randomly. Sometimes Varnish runs flawlessly for days, and sometimes it locks up a couple of times in a short period. Maybe you should try 2.0.1 on one of your test servers and compare how they behave?

Best regards,
Bartek

2009/3/4 Ross Brown <[email protected]>

> Hi all
>
> We are hoping to use Varnish for serving image content on our reasonably
> busy auction site here in New Zealand, but are having an interesting problem
> during testing.
>
> We are using the latest Varnish (2.0.3) on Ubuntu 8.10 server (64-bit) and have
> built two servers for testing - both are located in the same datacentre and
> situated behind an F5 hardware load balancer. We want to keep all images
> cached in RAM and are using Varnish with jemalloc to achieve this. For the
> most part, Varnish is working well for us and performance is great.
>
> However, we have seen both our Varnish servers lock up at precisely the
> same time and stop processing incoming HTTP requests until varnishd is
> manually restarted. This has happened twice and seems to occur at random -
> the last time was after 5 days of uptime and a significant amount of
> processed traffic (<1TB).
>
> When this problem happens, the backend is still reachable and happily
> serving images. It is not a particularly busy period for us (600
> requests/sec per Varnish server - approx 350Mbps outbound each - we got up to
> nearly 3 times that level without incident previously), but for some reason
> unknown to us, the servers just suddenly stop processing requests and worker
> processes increase dramatically.
>
> After the lockup happened last time, I tried firing up varnishlog and
> hitting the server directly - my requests were not showing up at all. The
> *only* entries in the varnish log were related to worker processes being
> killed over time - no PINGs, PONGs, load balancer health checks or anything
> related to 'normal' Varnish activity. It's as if varnishd has completely
> locked up, but we can't understand what causes both our Varnish servers to
> exhibit this behaviour at exactly the same time, nor why Varnish does not
> detect it and attempt a restart. After a restart, Varnish is fine and
> behaves itself.
>
> There is nothing to indicate an error with the backend, nor anything in
> syslog to indicate a Varnish problem. Pointers of any kind would be
> appreciated :)
>
> Best regards
>
> Ross Brown
> Trade Me
> www.trademe.co.nz
>
> *** Startup Options (as per hints in the wiki for caching millions of objects):
> -a 0.0.0.0:80 -f /usr/local/etc/default.net.vcl -T 0.0.0.0:8021 -t 86400
> -h classic,1200007 -p thread_pool_max=4000 -p thread_pools=4 -p
> listen_depth=4096 -p lru_interval=3600 -p obj_workspace=4096 -s malloc,10G
>
> *** Running VCL:
> backend default {
>     .host = "10.10.10.10";
>     .port = "80";
> }
>
> sub vcl_recv {
>     # Don't cache objects requested with a query string in the URI.
>     # Needed for newsletter headers (open rate) and health checks.
>     if (req.url ~ "\?.*") {
>         pass;
>     }
>
>     # Force lookup if the request is a no-cache request from the client.
>     if (req.http.Cache-Control ~ "no-cache") {
>         unset req.http.Cache-Control;
>         lookup;
>     }
>
>     # By default, Varnish will not serve requests that come with a
>     # cookie from its cache.
>     unset req.http.cookie;
>     unset req.http.authenticate;
>
>     # No action here, continue into default vcl_recv{}
> }
>
> *** Stats
> 458887 Client connections accepted
> 170714631 Client requests received
> 133012763 Cache hits
> 3715 Cache hits for pass
> 27646213 Cache misses
> 37700868 Backend connections success
> 0 Backend connections not attempted
> 0 Backend connections too many
> 40 Backend connections failures
> 37512808 Backend connections reuses
> 37514682 Backend connections recycles
> 0 Backend connections unused
> 1339 N struct srcaddr
> 16 N active struct srcaddr
> 756 N struct sess_mem
> 12 N struct sess
> 761152 N struct object
> 761243 N struct objecthead
> 0 N struct smf
> 0 N small free smf
> 0 N large free smf
> 322 N struct vbe_conn
> 345 N struct bereq
> 20 N worker threads
> 2331 N worker threads created
> 0 N worker threads not created
> 0 N worker threads limited
> 0 N queued work requests
> 35249 N overflowed work requests
> 0 N dropped work requests
> 1 N backends
> 44 N expired objects
> 26886639 N LRU nuked objects
> 0 N LRU saved objects
> 15847787 N LRU moved objects
> 0 N objects on deathrow
> 3 HTTP header overflows
> 0 Objects sent with sendfile
> 164595318 Objects sent with write
> 0 Objects overflowing workspace
> 458886 Total Sessions
> 170715215 Total Requests
> 306 Total pipe
> 10054413 Total pass
> 37700586 Total fetch
> 49458782160 Total header bytes
> 1151144727614 Total body bytes
> 89464 Session Closed
> 0 Session Pipeline
> 0 Session Read Ahead
> 0 Session Linger
> 170622902 Session herd
> 7875546129 SHM records
> 380705819 SHM writes
> 138 SHM flushes due to overflow
> 763205 SHM MTX contention
> 2889 SHM cycles through buffer
> 0 allocator requests
> 0 outstanding allocations
> 0 bytes allocated
> 0 bytes free
> 101839895 SMA allocator requests
> 1519005 SMA outstanding allocations
> 10736616112 SMA outstanding bytes
> 562900737623 SMA bytes allocated
> 552164121511 SMA bytes free
> 56 SMS allocator requests
> 0 SMS outstanding allocations
> 0 SMS outstanding bytes
> 25712 SMS bytes allocated
> 25712 SMS bytes freed
> 37700490 Backend requests made
> 3 N vcl total
> 3 N vcl available
> 0 N vcl discarded
> 1 N total active purges
> 1 N new purges added
> 0 N old purges deleted
> 0 N objects tested
> 0 N regexps tested against
> 0 N duplicate purges removed
> 0 HCB Lookups without lock
> 0 HCB Lookups with lock
> 0 HCB Inserts
> 0 Objects ESI parsed (unlock)
> 0 ESI parse errors (unlock)
>
> _______________________________________________
> varnish-misc mailing list
> [email protected]
> http://projects.linpro.no/mailman/listinfo/varnish-misc
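One thing that might be worth watching on your side: the stats above show 35249 overflowed work requests next to only 20 live worker threads, so the lockups could coincide with the worker pools saturating. Below is a rough, untested sketch of what I would keep an eye on; it assumes Varnish 2.0.x parameter names and that the management port from your -T 0.0.0.0:8021 option is reachable as localhost:8021 on the box itself.

    # Dump the worker-thread and overflow counters once
    varnishstat -1 | egrep -i 'worker|overflow'

    # Inspect (and, if it does turn out to be thread starvation, raise)
    # the thread pool parameters via the management port
    varnishadm -T localhost:8021 param.show thread_pools
    varnishadm -T localhost:8021 param.show thread_pool_min
    varnishadm -T localhost:8021 param.set thread_pool_min 100

If those counters climb right before a lockup, that would at least narrow things down.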
_______________________________________________
varnish-misc mailing list
[email protected]
http://projects.linpro.no/mailman/listinfo/varnish-misc
