I have seen this happen. I have a similar hardware setup, though I split the multi-SSD RAID into three separate -s file cache arguments.
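For reference, the invocation ends up looking roughly like this. This is a sketch, not our exact command: the /ssd1-3 mount points and the 80G sizes are illustrative (one cache file per SSD, ~240GB total), and the -a/-f flags are just copied from Barry's command line below:

    /usr/local/sbin/varnishd -a 0.0.0.0:8888 -f /etc/varnish/varnish.vcl \
        -s file,/ssd1/varnish/cache,80G \
        -s file,/ssd2/varnish/cache,80G \
        -s file,/ssd3/varnish/cache,80G

varnishd accepts multiple -s arguments and, as far as I can tell, spreads allocations across them, so the main difference from the RAID 0 setup is that md is out of the write path.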
We had roughly 240GB of cache storage in total. After about 2-3 weeks, sm_bfree had fallen to ~20GB and n_lru_nuked started incrementing; sm_bfree then climbed back to ~60GB, but the nuking never stopped.

On Wed, Feb 24, 2010 at 8:15 PM, Barry Abrahamson <ba...@automattic.com> wrote:
> Howdy,
>
> We are finally getting around to upgrading to the latest version of varnish
> and are running into quite a weird problem. Everything works fine for a bit
> (1+ day), then all of a sudden Varnish starts nuking all of the objects from
> the cache:
>
> About 4 hours ago there were 1 million objects in the cache, now there are
> just about 172k. This looks a bit weird to me:
>
> sms_nbytes 18446744073709548694 . SMS outstanding bytes
>
> Here are the options I am passing to varnishd:
>
> /usr/local/sbin/varnishd -a 0.0.0.0:8888 -f /etc/varnish/varnish.vcl -P
> /var/run/varnishd.pid -T 0.0.0.0:47200 -t 600 -w 1,200,300 -p thread_pools 4
> -p thread_pool_add_delay 2 -p lru_interval 60 -h classic,500009 -p
> obj_workspace 4096 -s file,/varnish/cache,150G
>
> /varnish is 2 x 80GB Intel X-25M SSDs in a software RAID 0 array. OS is
> Debian Lenny 64-bit. There is plenty of space:
>
> /dev/md0 149G 52G 98G 35% /varnish
>
> Here is the output of varnishstat -1:
>
> uptime 134971 . Child uptime
> client_conn 12051037 89.29 Client connections accepted
> client_drop 0 0.00 Connection dropped, no sess
> client_req 12048672 89.27 Client requests received
> cache_hit 10161272 75.28 Cache hits
> cache_hitpass 133244 0.99 Cache hits for pass
> cache_miss 1750857 12.97 Cache misses
> backend_conn 1824594 13.52 Backend conn. success
> backend_unhealthy 0 0.00 Backend conn. not attempted
> backend_busy 0 0.00 Backend conn. too many
> backend_fail 3644 0.03 Backend conn. failures
> backend_reuse 0 0.00 Backend conn. reuses
> backend_toolate 0 0.00 Backend conn. was closed
> backend_recycle 0 0.00 Backend conn. recycles
> backend_unused 0 0.00 Backend conn. unused
> fetch_head 5309 0.04 Fetch head
> fetch_length 1816422 13.46 Fetch with Length
> fetch_chunked 0 0.00 Fetch chunked
> fetch_eof 0 0.00 Fetch EOF
> fetch_bad 0 0.00 Fetch had bad headers
> fetch_close 0 0.00 Fetch wanted close
> fetch_oldhttp 0 0.00 Fetch pre HTTP/1.1 closed
> fetch_zero 0 0.00 Fetch zero len
> fetch_failed 16 0.00 Fetch failed
> n_srcaddr 0 . N struct srcaddr
> n_srcaddr_act 0 . N active struct srcaddr
> n_sess_mem 578 . N struct sess_mem
> n_sess 414 . N struct sess
> n_object 172697 . N struct object
> n_objecthead 173170 . N struct objecthead
> n_smf 471310 . N struct smf
> n_smf_frag 62172 . N small free smf
> n_smf_large 67978 . N large free smf
> n_vbe_conn 18446744073709551611 . N struct vbe_conn
> n_bereq 315 . N struct bereq
> n_wrk 76 . N worker threads
> n_wrk_create 3039 0.02 N worker threads created
> n_wrk_failed 0 0.00 N worker threads not created
> n_wrk_max 0 0.00 N worker threads limited
> n_wrk_queue 0 0.00 N queued work requests
> n_wrk_overflow 25136 0.19 N overflowed work requests
> n_wrk_drop 0 0.00 N dropped work requests
> n_backend 4 . N backends
> n_expired 771687 . N expired objects
> n_lru_nuked 744693 . N LRU nuked objects
> n_lru_saved 0 . N LRU saved objects
> n_lru_moved 8675178 . N LRU moved objects
> n_deathrow 0 . N objects on deathrow
> losthdr 25 0.00 HTTP header overflows
> n_objsendfile 0 0.00 Objects sent with sendfile
> n_objwrite 11749415 87.05 Objects sent with write
> n_objoverflow 0 0.00 Objects overflowing workspace
> s_sess 12051007 89.29 Total Sessions
> s_req 12050184 89.28 Total Requests
> s_pipe 2661 0.02 Total pipe
> s_pass 134858 1.00 Total pass
> s_fetch 1821721 13.50 Total fetch
> s_hdrbytes 3932274894 29134.22 Total header bytes
> s_bodybytes 894452020319 6626994.10 Total body bytes
> sess_closed 12050925 89.29 Session Closed
> sess_pipeline 0 0.00 Session Pipeline
> sess_readahead 0 0.00 Session Read Ahead
> sess_linger 0 0.00 Session Linger
> sess_herd 160 0.00 Session herd
> shm_records 610011852 4519.58 SHM records
> shm_writes 50380066 373.27 SHM writes
> shm_flushes 9387 0.07 SHM flushes due to overflow
> shm_cont 47763 0.35 SHM MTX contention
> shm_cycles 226 0.00 SHM cycles through buffer
> sm_nreq 4449213 32.96 allocator requests
> sm_nobj 341160 . outstanding allocations
> sm_balloc 11373072384 . bytes allocated
> sm_bfree 116602589184 . bytes free
> sma_nreq 0 0.00 SMA allocator requests
> sma_nobj 0 . SMA outstanding allocations
> sma_nbytes 0 . SMA outstanding bytes
> sma_balloc 0 . SMA bytes allocated
> sma_bfree 0 . SMA bytes free
> sms_nreq 63997 0.47 SMS allocator requests
> sms_nobj 0 . SMS outstanding allocations
> sms_nbytes 18446744073709548694 . SMS outstanding bytes
> sms_balloc 31161028 . SMS bytes allocated
> sms_bfree 31162489 . SMS bytes freed
> backend_req 1821961 13.50 Backend requests made
> n_vcl 1 0.00 N vcl total
> n_vcl_avail 1 0.00 N vcl available
> n_vcl_discard 0 0.00 N vcl discarded
> n_purge 1 . N total active purges
> n_purge_add 1 0.00 N new purges added
> n_purge_retire 0 0.00 N old purges deleted
> n_purge_obj_test 0 0.00 N objects tested
> n_purge_re_test 0 0.00 N regexps tested against
> n_purge_dups 0 0.00 N duplicate purges removed
> hcb_nolock 0 0.00 HCB Lookups without lock
> hcb_lock 0 0.00 HCB Lookups with lock
> hcb_insert 0 0.00 HCB Inserts
> esi_parse 0 0.00 Objects ESI parsed (unlock)
> esi_errors 0 0.00 ESI parse errors (unlock)
>
> Obviously, this isn't good for my cache hit rate :) It is also using a lot
> of CPU. Has anyone seen this happen before?
>
> --
> Barry Abrahamson | Systems Wrangler | Automattic
> Blog: http://barry.wordpress.com
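By the way, the weird sms_nbytes value has a simple explanation: 18446744073709548694 is 2^64 - 2922, i.e. an unsigned 64-bit counter that was decremented below zero and wrapped around (note sms_bfree is slightly ahead of sms_balloc in your paste). n_vbe_conn is the same story: 18446744073709551611 is 2^64 - 5. You can reproduce the arithmetic on any 64-bit box, since bash's printf shows the wrapped value for a negative %u argument:

    $ printf '%u\n' -2922
    18446744073709548694
    $ printf '%u\n' -5
    18446744073709551611

So those two counters look like cosmetic accounting underflows rather than real resource usage, and are probably not themselves the cause of the nuking.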
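For anyone who wants to catch this in the act: on our box the counters that moved were sm_bfree and n_lru_nuked. Plain shell is enough to keep an eye on them (field names as in the varnishstat -1 output above; watch and egrep are standard tools):

    $ watch -n 60 "varnishstat -1 | egrep 'n_lru_nuked|sm_bfree|sms_nbytes'"

In our case n_lru_nuked kept climbing even after sm_bfree had recovered to ~60GB, which is what makes this look like more than ordinary LRU pressure.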