Hey guys, I'm running Varnish 2.1 on two m2.xlarge EC2 instances (17 GB of RAM, Linux kernel 2.6.21.7-2.fc8xen-ec2-v1.0). The two servers have been running for two months now almost without trouble, but once in a while I've noticed crazy spikes in CPU usage, mostly in kernel land. A few days ago I saw kswapd0 consuming 100% of one CPU core and varnishd consuming 100% of the other. I was able to strace varnish for a few seconds and everything looked normal, but then it crashed and left a zombie process eating most of the CPU, so I had to restart the server. Today exactly the same thing happened on the other server, and this is starting to scare me.
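One thing I've been trying to rule out is plain memory exhaustion, since we cap malloc storage at 12G (full command line below) and run without swap. Here's my back-of-envelope estimate in Python, using counters from the varnishstat output further down; the per-object bookkeeping overhead and the per-thread stack size are assumptions on my part, not measured values:

```python
# Rough estimate of varnishd's footprint on a 17 GB box. The -s malloc,12G
# limit only covers object storage; per-object bookkeeping (struct
# object/objectcore/objecthead) and worker thread stacks come on top of it.

GIB = 1024 ** 3

sma_nbytes = 12_884_892_751   # "SMA outstanding bytes" from varnishstat
n_object = 659_083            # "N struct object"
overhead_per_object = 1024    # ASSUMED ~1 KB of bookkeeping per object

cache_bytes = sma_nbytes + n_object * overhead_per_object

n_wrk = 400                       # "N worker threads"
stack_bytes = n_wrk * 4 * 1024 * 1024  # ASSUMED 4 MB default pthread stacks

total = cache_bytes + stack_bytes
print(f"estimated footprint: {total / GIB:.1f} GiB of 17 GiB RAM")
# prints: estimated footprint: 14.2 GiB of 17 GiB RAM
```

Even if my overhead guesses are off, that puts us close enough to the 17 GB ceiling that heavy LRU nuking (n_lru_nuked is at ~12 million below) plus page cache pressure might explain kswapd0 spinning. Does that reasoning sound right?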
We're running varnish with the following params:

varnishd -P /var/run/varnishd.pid -a 0.0.0.0:2000 -T 127.0.0.1:6082 -w 200,2000 -s malloc,12G -p lru_interval=20 -f /etc/varnish/varnish.vcl

We don't have swap enabled on these servers. Here's the output of varnishstat -1 from when varnish was freaking out:

client_conn 9592723 7.82 Client connections accepted
client_drop 0 0.00 Connection dropped, no sess/wrk
client_req 67302765 54.84 Client requests received
cache_hit 50571130 41.20 Cache hits
cache_hitpass 0 0.00 Cache hits for pass
cache_miss 16050808 13.08 Cache misses
backend_conn 16029200 13.06 Backend conn. success
backend_unhealthy 0 0.00 Backend conn. not attempted
backend_busy 0 0.00 Backend conn. too many
backend_fail 20649 0.02 Backend conn. failures
backend_reuse 12352 0.01 Backend conn. reuses
backend_toolate 0 0.00 Backend conn. was closed
backend_recycle 12352 0.01 Backend conn. recycles
backend_unused 0 0.00 Backend conn. unused
fetch_head 0 0.00 Fetch head
fetch_length 12764170 10.40 Fetch with Length
fetch_chunked 3272791 2.67 Fetch chunked
fetch_eof 0 0.00 Fetch EOF
fetch_bad 0 0.00 Fetch had bad headers
fetch_close 49 0.00 Fetch wanted close
fetch_oldhttp 0 0.00 Fetch pre HTTP/1.1 closed
fetch_zero 0 0.00 Fetch zero len
fetch_failed 3272895 2.67 Fetch failed
n_sess_mem 587 . N struct sess_mem
n_sess 465 . N struct sess
n_object 659083 . N struct object
n_vampireobject 0 . N unresurrected objects
n_objectcore 659439 . N struct objectcore
n_objecthead 907405 . N struct objecthead
n_smf 0 . N struct smf
n_smf_frag 0 . N small free smf
n_smf_large 0 . N large free smf
n_vbe_conn 277 . N struct vbe_conn
n_wrk 400 . N worker threads
n_wrk_create 458 0.00 N worker threads created
n_wrk_failed 0 0.00 N worker threads not created
n_wrk_max 0 0.00 N worker threads limited
n_wrk_queue 0 0.00 N queued work requests
n_wrk_overflow 874 0.00 N overflowed work requests
n_wrk_drop 0 0.00 N dropped work requests
n_backend 2 . N backends
n_expired 112662 . N expired objects
n_lru_nuked 11954429 . N LRU nuked objects
n_lru_saved 0 . N LRU saved objects
n_lru_moved 46618517 . N LRU moved objects
n_deathrow 0 . N objects on deathrow
losthdr 2 0.00 HTTP header overflows
n_objsendfile 0 0.00 Objects sent with sendfile
n_objwrite 60192420 49.04 Objects sent with write
n_objoverflow 0 0.00 Objects overflowing workspace
s_sess 9592577 7.82 Total Sessions
s_req 67302765 54.84 Total Requests
s_pipe 110 0.00 Total pipe
s_pass 1691 0.00 Total pass
s_fetch 12764115 10.40 Total fetch
s_hdrbytes 21558035591 17564.97 Total header bytes
s_bodybytes 1162454990977 947140.58 Total body bytes
sess_closed 7687689 6.26 Session Closed
sess_pipeline 0 0.00 Session Pipeline
sess_readahead 0 0.00 Session Read Ahead
sess_linger 61236267 49.89 Session Linger
sess_herd 16659649 13.57 Session herd
shm_records 3395953253 2766.94 SHM records
shm_writes 131371160 107.04 SHM writes
shm_flushes 661 0.00 SHM flushes due to overflow
shm_cont 114836 0.09 SHM MTX contention
shm_cycles 1378 0.00 SHM cycles through buffer
sm_nreq 0 0.00 allocator requests
sm_nobj 0 . outstanding allocations
sm_balloc 0 . bytes allocated
sm_bfree 0 . bytes free
sma_nreq 37442974 30.51 SMA allocator requests
sma_nobj 1318091 . SMA outstanding allocations
sma_nbytes 12884892751 . SMA outstanding bytes
sma_balloc 250925494011 . SMA bytes allocated
sma_bfree 238040601260 . SMA bytes free
sms_nreq 3967048 3.23 SMS allocator requests
sms_nobj 0 . SMS outstanding allocations
sms_nbytes 18446744073709527064 . SMS outstanding bytes
sms_balloc 1895595320 . SMS bytes allocated
sms_bfree 1895619352 . SMS bytes freed
backend_req 16043889 13.07 Backend requests made
n_vcl 1 0.00 N vcl total
n_vcl_avail 1 0.00 N vcl available
n_vcl_discard 0 0.00 N vcl discarded
n_purge 26155 . N total active purges
n_purge_add 678663 0.55 N new purges added
n_purge_retire 652508 0.53 N old purges deleted
n_purge_obj_test 47484518 38.69 N objects tested
n_purge_re_test 41413683761 33742.88 N regexps tested against
n_purge_dups 485656 0.40 N duplicate purges removed
hcb_nolock 50605455 41.23 HCB Lookups without lock
hcb_lock 566 0.00 HCB Lookups with lock
hcb_insert 16016509 13.05 HCB Inserts
esi_parse 0 0.00 Objects ESI parsed (unlock)
esi_errors 0 0.00 ESI parse errors (unlock)
accept_fail 0 0.00 Accept failures
client_drop_late 0 0.00 Connection dropped late
uptime 1227331 1.00 Client uptime

Has anyone experienced something similar?

Thanks,
Augusto
_______________________________________________
varnish-misc mailing list
[email protected]
http://lists.varnish-cache.org/mailman/listinfo/varnish-misc
