Hi Matt,

Oh, sorry, I didn't notice the worker thread count at 3000. I would suggest
switching to -s malloc,38G (total memory * 0.8). If you have a lot of small
objects, or objects of the same size, you could be encountering some
excessive nuking.
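To make that concrete, here is roughly what I have in mind, adapted from the
varnishd command line you posted below. This is a sketch only (I haven't
tested it against your setup), so adjust paths and numbers as needed:

    # your current flags, with file storage swapped for malloc
    # (38G is roughly 48GB of RAM * 0.8) and the -w thread settings
    # from my earlier mail
    /usr/local/sbin/varnishd -s malloc,38G -w 400,1000,120 \
        -T 127.0.0.1:2000 -a 0.0.0.0:80 -t 604800 \
        -f /usr/local/etc/varnish/default.vcl \
        -p http_headers 384 -p connect_timeout 4.0

    # the thread parameters (though not the storage backend) can also
    # be changed on the running instance over the management port:
    varnishadm -T 127.0.0.1:2000 param.set thread_pool_min 400
    varnishadm -T 127.0.0.1:2000 param.set thread_pool_max 1000
    varnishadm -T 127.0.0.1:2000 param.set thread_pool_timeout 120

Switching the storage backend itself does require a restart, which will of
course empty the cache.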
Also, what are your backends doing when this happens? Is the nuking a
coincidence, or is there an issue further down the stack?

Damon

On Thu, Sep 29, 2011 at 5:42 PM, Matt Schurenko <[email protected]> wrote:

> Sorry, I forgot to mention that I already adjusted the thread behaviour via
> varnishadm while varnish was running. I had it set to use min 50, max 3000
> and a thread timeout of 120s. I think the reason n_wrk_overflow and
> n_wrk_drop are so high is this problem: before the LRU nuke happens, the
> number of worker threads is ~100. As soon as it starts to nuke, the number
> of threads jumps to the max. I am monitoring some stats with MRTG. I seem
> to remember that on the other varnish server it would begin to LRU nuke
> long before the cache got full. On this one there is no LRU nuke activity
> until it reaches a certain point, and then boom: all 3000 threads are used
> up and no new clients can connect.
>
> Matt Schurenko
> Systems Administrator
>
> airG® Share Your World
> Suite 710, 1133 Melville Street
> Vancouver, BC V6E 4E5
> P: +1.604.408.2228
> F: +1.866.874.8136
> E: [email protected]
> W: www.airg.com
>
> From: Damon Snyder [mailto:[email protected]]
> Sent: September-29-11 5:30 PM
> To: Matt Schurenko
> Cc: [email protected]
> Subject: Re: lru nuke causes varnish to stop responding to client requests
>
> Hi Matt,
>
> It looks like you really need to bump up the number of worker threads.
> From your stats:
>
> n_wrk_queue                2861         0.02 N queued work requests
> n_wrk_overflow            83534         0.52 N overflowed work requests
> n_wrk_drop                10980         0.07 N dropped work requests
>
> You have a lot of requests on the queue waiting for a worker, and a lot of
> requests that varnish has given up trying to fulfill with a worker. You
> can bump the number of workers up using the -w command line option to
> varnishd. I would suggest something like -w 400,1000,120 to start with
> (the default is -w 2,500,300). This says: use 400 threads at a minimum,
> 1000 at the maximum, and set the thread timeout to 120s. According to the
> stats explanation doc
> <https://www.varnish-cache.org/trac/wiki/StatsExplained>, your n_wrk_queue
> and n_wrk_drop should be 0. If you see these numbers going up again, use
> -w 500,2000,120 or something like that.
>
> Hope this helps,
> Damon
>
> On Thu, Sep 29, 2011 at 4:34 PM, Matt Schurenko <[email protected]> wrote:
>
> I've been having this problem for a couple of weeks now on one of our
> varnish servers; I have posted a couple of times already. What happens is
> that the server in question runs fine until the cache gets full. When it
> starts to LRU nuke, the number of worker threads jumps up to
> thread_pool_max and varnish stops responding to any client requests. I
> have tried this with CentOS 5.4, 5.7 and now Slackware (all 64-bit), and
> the behaviour is the same.
>
> I am using varnish version 2.1.5 on a Dell C6105 with 48GB of RAM.
>
> Here is my varnishd command line:
>
> /usr/local/sbin/varnishd -s file,/tmp/varnish-cache,48g -T 127.0.0.1:2000
> -a 0.0.0.0:80 -t 604800 -f /usr/local/etc/varnish/default.vcl
> -p http_headers 384 -p connect_timeout 4.0
>
> Here is the output from varnishstat -1:
>
> client_conn            38582763       240.38 Client connections accepted
> client_drop               10950         0.07 Connection dropped, no sess/wrk
> client_req             38298994       238.61 Client requests received
> cache_hit              32513762       202.57 Cache hits
> cache_hitpass                 0         0.00 Cache hits for pass
> cache_miss              5784476        36.04 Cache misses
> backend_conn            5725540        35.67 Backend conn. success
> backend_unhealthy             0         0.00 Backend conn. not attempted
> backend_busy                  0         0.00 Backend conn. too many
> backend_fail               1383         0.01 Backend conn. failures
> backend_reuse             60837         0.38 Backend conn. reuses
> backend_toolate              33         0.00 Backend conn. was closed
> backend_recycle           60870         0.38 Backend conn. recycles
> backend_unused                0         0.00 Backend conn. unused
> fetch_head                    6         0.00 Fetch head
> fetch_length              93631         0.58 Fetch with Length
> fetch_chunked           5689433        35.45 Fetch chunked
> fetch_eof                     0         0.00 Fetch EOF
> fetch_bad                     0         0.00 Fetch had bad headers
> fetch_close                 107         0.00 Fetch wanted close
> fetch_oldhttp                 0         0.00 Fetch pre HTTP/1.1 closed
> fetch_zero                    0         0.00 Fetch zero len
> fetch_failed                  1         0.00 Fetch failed
> n_sess_mem                 7138          .   N struct sess_mem
> n_sess                     6970          .   N struct sess
> n_object                5047123          .   N struct object
> n_vampireobject               0          .   N unresurrected objects
> n_objectcore            5048435          .   N struct objectcore
> n_objecthead            4955641          .   N struct objecthead
> n_smf                  10139770          .   N struct smf
> n_smf_frag               295671          .   N small free smf
> n_smf_large                   0          .   N large free smf
> n_vbe_conn                 2997          .   N struct vbe_conn
> n_wrk                      3000          .   N worker threads
> n_wrk_create               5739         0.04 N worker threads created
> n_wrk_failed                  0         0.00 N worker threads not created
> n_wrk_max                  4063         0.03 N worker threads limited
> n_wrk_queue                2861         0.02 N queued work requests
> n_wrk_overflow            83534         0.52 N overflowed work requests
> n_wrk_drop                10980         0.07 N dropped work requests
> n_backend                     2          .   N backends
> n_expired                  2179          .   N expired objects
> n_lru_nuked              862615          .   N LRU nuked objects
> n_lru_saved                   0          .   N LRU saved objects
> n_lru_moved            27156180          .   N LRU moved objects
> n_deathrow                    0          .   N objects on deathrow
> losthdr                       0         0.00 HTTP header overflows
> n_objsendfile                 0         0.00 Objects sent with sendfile
> n_objwrite             37294888       232.35 Objects sent with write
> n_objoverflow                 0         0.00 Objects overflowing workspace
> s_sess                 38566049       240.27 Total Sessions
> s_req                  38298994       238.61 Total Requests
> s_pipe                        0         0.00 Total pipe
> s_pass                      266         0.00 Total pass
> s_fetch                 5783176        36.03 Total fetch
> s_hdrbytes          12570989864     78319.53 Total header bytes
> s_bodybytes        151327304604    942796.38 Total body bytes
> sess_closed            34673984       216.03 Session Closed
> sess_pipeline               187         0.00 Session Pipeline
> sess_readahead              321         0.00 Session Read Ahead
> sess_linger             3929378        24.48 Session Linger
> sess_herd               3929559        24.48 Session herd
> shm_records          2025645664     12620.14 SHM records
> shm_writes            169640580      1056.89 SHM writes
> shm_flushes                  41         0.00 SHM flushes due to overflow
> shm_cont                 580515         3.62 SHM MTX contention
> shm_cycles                  933         0.01 SHM cycles through buffer
> sm_nreq                12431620        77.45 allocator requests
> sm_nobj                 9844099          .   outstanding allocations
> sm_balloc           43855261696          .   bytes allocated
> sm_bfree             7684345856          .   bytes free
> sma_nreq                      0         0.00 SMA allocator requests
> sma_nobj                      0          .   SMA outstanding allocations
> sma_nbytes                    0          .   SMA outstanding bytes
> sma_balloc                    0          .   SMA bytes allocated
> sma_bfree                     0          .   SMA bytes free
> sms_nreq                   1566         0.01 SMS allocator requests
> sms_nobj                      0          .   SMS outstanding allocations
> sms_nbytes                    0          .   SMS outstanding bytes
> sms_balloc               656154          .   SMS bytes allocated
> sms_bfree                656154          .   SMS bytes freed
> backend_req             5786381        36.05 Backend requests made
> n_vcl                         1         0.00 N vcl total
> n_vcl_avail                   1         0.00 N vcl available
> n_vcl_discard                 0         0.00 N vcl discarded
> n_purge                     218          .   N total active purges
> n_purge_add                 218         0.00 N new purges added
> n_purge_retire                0         0.00 N old purges deleted
> n_purge_obj_test         588742         3.67 N objects tested
> n_purge_re_test       120444323       750.39 N regexps tested against
> n_purge_dups                  0         0.00 N duplicate purges removed
> hcb_nolock             38301670       238.63 HCB Lookups without lock
> hcb_lock                5786309        36.05 HCB Lookups with lock
> hcb_insert              5786305        36.05 HCB Inserts
> esi_parse                     0         0.00 Objects ESI parsed (unlock)
> esi_errors                    0         0.00 ESI parse errors (unlock)
> accept_fail                   0         0.00 Accept failures
> client_drop_late             30         0.00 Connection dropped late
> uptime                   160509         1.00 Client uptime
> backend_retry                25         0.00 Backend conn. retry
> dir_dns_lookups               0         0.00 DNS director lookups
> dir_dns_failed                0         0.00 DNS director failed lookups
> dir_dns_hit                   0         0.00 DNS director cached lookups hit
> dir_dns_cache_full            0         0.00 DNS director full dnscache
> fetch_1xx                     0         0.00 Fetch no body (1xx)
> fetch_204                     0         0.00 Fetch no body (204)
> fetch_304                     0         0.00 Fetch no body (304)
>
> Even though I have removed the server from our load balancer there are
> still a lot of requests going to the backend. Maybe these are all queued
> up requests that varnish is trying to fulfill?
>
> Here is some output from varnishlog -c when I try to connect with curl:
>
> root@mvp14:~# varnishlog -c
> 26 SessionOpen  c 192.168.8.41 41942 0.0.0.0:80
> 26 ReqStart     c 192.168.8.41 41942 2108342803
> 26 RxRequest    c GET
> 26 RxURL        c /
> 26 RxProtocol   c HTTP/1.1
> 26 RxHeader     c User-Agent: curl/7.21.4 (x86_64-unknown-linux-gnu) libcurl/7.21.4 OpenSSL/0.9.8n zlib/1.2.5 libidn/1.19
> 26 RxHeader     c Host: mvp14.airg.com
> 26 RxHeader     c Accept: */*
> 26 VCL_call     c recv
> 26 VCL_return   c lookup
> 26 VCL_call     c hash
> 26 VCL_return   c hash
>
> The connection just hangs here until it times out.
>
> Any help would be appreciated. We are trying to replace our squid caching
> layer with varnish; however, if I can't resolve this issue we will have to
> go back to squid.
>
> Thanks,
>
> Matt Schurenko
> Systems Administrator
_______________________________________________
varnish-misc mailing list
[email protected]
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
