Hi, I have been using Varnish for at least a year and in the past few days have started to see really strange behavior (currently on 2.1.3). Sometimes I get "Connection reset by peer" errors when requesting pages. When this happens, I look at the stats and see something like this, for example:
client_conn 1565694 187.17 Client connections accepted
client_drop 271 0.03 Connection dropped, no sess/wrk
client_req 2790588 333.60 Client requests received
cache_hit 1813984 216.85 Cache hits
cache_hitpass 0 0.00 Cache hits for pass
cache_miss 546088 65.28 Cache misses
backend_conn 32042 3.83 Backend conn. success
backend_unhealthy 1351 0.16 Backend conn. not attempted
backend_busy 0 0.00 Backend conn. too many
backend_fail 1 0.00 Backend conn. failures
backend_reuse 889261 106.31 Backend conn. reuses
backend_toolate 157 0.02 Backend conn. was closed
backend_recycle 889425 106.33 Backend conn. recycles
backend_unused 0 0.00 Backend conn. unused
fetch_head 759 0.09 Fetch head
fetch_length 920288 110.02 Fetch with Length
fetch_chunked 0 0.00 Fetch chunked
fetch_eof 0 0.00 Fetch EOF
fetch_bad 0 0.00 Fetch had bad headers
fetch_close 0 0.00 Fetch wanted close
fetch_oldhttp 0 0.00 Fetch pre HTTP/1.1 closed
fetch_zero 0 0.00 Fetch zero len
fetch_failed 0 0.00 Fetch failed
n_sess_mem 65783 . N struct sess_mem
n_sess 54012 . N struct sess
n_object 217320 . N struct object
n_vampireobject 0 . N unresurrected objects
n_objectcore 218999 . N struct objectcore
n_objecthead 57686 . N struct objecthead
n_smf 0 . N struct smf
n_smf_frag 0 . N small free smf
n_smf_large 0 . N large free smf
n_vbe_conn 28 . N struct vbe_conn
n_wrk 2283 . N worker threads
n_wrk_create 2326 0.28 N worker threads created
n_wrk_failed 0 0.00 N worker threads not created
n_wrk_max 0 0.00 N worker threads limited
n_wrk_queue 0 0.00 N queued work requests
n_wrk_overflow 25533 3.05 N overflowed work requests
n_wrk_drop 286 0.03 N dropped work requests
n_backend 14 . N backends
n_expired 0 . N expired objects
n_lru_nuked 157040 . N LRU nuked objects
n_lru_saved 0 . N LRU saved objects
n_lru_moved 74969 . N LRU moved objects
n_deathrow 0 . N objects on deathrow
losthdr 0 0.00 HTTP header overflows
n_objsendfile 0 0.00 Objects sent with sendfile
n_objwrite 2494603 298.22 Objects sent with write
n_objoverflow 0 0.00 Objects overflowing workspace
s_sess 1565320 187.13 Total Sessions
s_req 2790588 333.60 Total Requests
s_pipe 0 0.00 Total pipe
s_pass 376511 45.01 Total pass
s_fetch 921039 110.11 Total fetch
s_hdrbytes 844190338 100919.35 Total header bytes
s_bodybytes 40983737008 4899430.60 Total body bytes
sess_closed 125532 15.01 Session Closed
sess_pipeline 788 0.09 Session Pipeline
sess_readahead 810 0.10 Session Read Ahead
sess_linger 2656690 317.60 Session Linger
sess_herd 2874808 343.67 Session herd
shm_records 146433038 17505.44 SHM records
shm_writes 12159673 1453.64 SHM writes
shm_flushes 0 0.00 SHM flushes due to overflow
shm_cont 51273 6.13 SHM MTX contention
shm_cycles 69 0.01 SHM cycles through buffer
sm_nreq 0 0.00 allocator requests
sm_nobj 0 . outstanding allocations
sm_balloc 0 . bytes allocated
sm_bfree 0 . bytes free
sma_nreq 1440966 172.26 SMA allocator requests
sma_nobj 433212 . SMA outstanding allocations
sma_nbytes 5368687547 . SMA outstanding bytes
sma_balloc 10635681699 . SMA bytes allocated
sma_bfree 5266994152 . SMA bytes free
sms_nreq 1206 0.14 SMS allocator requests
sms_nobj 0 . SMS outstanding allocations
sms_nbytes 0 . SMS outstanding bytes
sms_balloc 472992 . SMS bytes allocated
sms_bfree 472992 . SMS bytes freed
backend_req 921115 110.12 Backend requests made
n_vcl 1 0.00 N vcl total
n_vcl_avail 1 0.00 N vcl available
n_vcl_discard 0 0.00 N vcl discarded
n_purge 1 . N total active purges
n_purge_add 1 0.00 N new purges added
n_purge_retire 0 0.00 N old purges deleted
n_purge_obj_test 0 0.00 N objects tested
n_purge_re_test 0 0.00 N regexps tested against
n_purge_dups 0 0.00 N duplicate purges removed
hcb_nolock 0 0.00 HCB Lookups without lock
hcb_lock 0 0.00 HCB Lookups with lock
hcb_insert 0 0.00 HCB Inserts
esi_parse 0 0.00 Objects ESI parsed (unlock)
esi_errors 0 0.00 ESI parse errors (unlock)
accept_fail 0 0.00 Accept failures
client_drop_late 15 0.00 Connection dropped late
uptime 8365 1.00 Client uptime

There are several worrisome stats here, like n_wrk (2283), n_wrk_overflow (25533), n_wrk_drop (286) and client_drop (271). The number of active threads (2283) is very high for no apparent reason, since normal usage is around 300 threads. The server has 7.5GB of RAM and uses malloc storage with a 5GB size, running nothing but a Varnish instance on Ubuntu Server 10.10. I have tried going down to a 3GB malloc size, increasing thread_pool_min to 500 with 2 pools (1000 idle threads total) while raising thread_pool_max to 5000 (from 3000), and setting session_linger to 50, 100 and 150 ms, connect_timeout to 5 s and between_bytes_timeout to 5 s, but sometimes the worker count still goes way up and the entire RAM plus 3GB of swap end up in use. Any ideas? Maybe something is hanging up the sessions (network latency between Varnish and the backend? or someone doing this on purpose?).

Here are my settings:

acceptor_sleep_decay 0.900000 []
acceptor_sleep_incr 0.001000 [s]
acceptor_sleep_max 0.050000 [s]
auto_restart on [bool]
ban_lurker_sleep 30.000000 [s]
between_bytes_timeout 50.000000 [s]
cache_vbe_conns off [bool]
cc_command "exec cc -fpic -shared -Wl,-x -o %o %s"
cli_buffer 8192 [bytes]
cli_timeout 25 [seconds]
clock_skew 10 [s]
connect_timeout 50.000000 [s]
critbit_cooloff 180.000000 [s]
default_grace 10 [seconds]
default_ttl 120 [seconds]
diag_bitmap 0x0 [bitmap]
err_ttl 0 [seconds]
esi_syntax 0 [bitmap]
fetch_chunksize 128 [kilobytes]
first_byte_timeout 25.000000 [s]
group varnish (112)
http_headers 64 [header lines]
http_range_support off [bool]
listen_address :80
listen_depth 8192 [connections]
log_hashstring off [bool]
log_local_address off [bool]
lru_interval 60 [seconds]
max_esi_includes 5 [includes]
max_restarts 4 [restarts]
overflow_max 100 [%]
ping_interval 3 [seconds]
pipe_timeout 60 [seconds]
prefer_ipv6 off [bool]
purge_dups on [bool]
rush_exponent 3 [requests per request]
saintmode_threshold 10 [objects]
send_timeout 6000 [seconds]
sess_timeout 10 [seconds]
sess_workspace 524288 [bytes]
session_linger 50 [ms]
session_max 100000 [sessions]
shm_reclen 255 [bytes]
shm_workspace 262144 [bytes]
syslog_cli_traffic on [bool]
thread_pool_add_delay 2 [milliseconds]
thread_pool_add_threshold 2 [requests]
thread_pool_fail_delay 200 [milliseconds]
thread_pool_max 5000 [threads]
thread_pool_min 500 [threads]
thread_pool_purge_delay 1000 [milliseconds]
thread_pool_stack unlimited [bytes]
thread_pool_timeout 300 [seconds]
thread_pools 2 [pools]
thread_stats_rate 10 [requests]
user varnish (106)
vcl_trace off [bool]
waiter default (epoll, poll)

--
[]'s
Hugo
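P.S. In case it helps to see it in one place, the current setup boils down to roughly the following startup line. This is only a sketch, not the exact command I run: the VCL path is a placeholder, and every parameter not shown stays at the value in the dump above.

    varnishd -a :80 \
             -f /etc/varnish/default.vcl \
             -s malloc,5G \
             -p thread_pools=2 \
             -p thread_pool_min=500 \
             -p thread_pool_max=5000 \
             -p session_linger=50

The individual experiments (e.g. session_linger at 100 or 150 ms) were applied at runtime with something like the following, where localhost:6082 is just an assumed admin address:

    varnishadm -T localhost:6082 param.set session_linger 100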
