Hi When testing varnish throughput and scalability I have found strange varnish behavior.
using 2.0.4 with cache_acceptor_epoll.c patch : http://varnish.projects.linpro.no/ticket/492 When testing number of clients scalability I am able to get varnish into a state when it stops responding but in varnishlog it looks like doing some requests from the past ... detailed : I start few instances of ab with keepalive on and with -r option (Don't exit on socket receive errors) with concurency aprox. 2000 "ab -r -k -n 100000 -c 2000 http://127.0.0.1/&" doesn't mater if from localhost or from other hosts(even distributed each instance on diferent host); when i run so many instances that connections on varnish raise aprox above 10K, varnish stucks, and stops responding.In varnishstat "Client requests received" drops to "0", In that time varnish starts to spawn new threads, but not dramatically ( for example from 2,4K to 3K ). Then I stop all the ab instances and try to fetch something ... /opt/httpd2.2/bin/ab -k -n 1 -c 1 http://127.0.0.1/ but get timeout, when I try after few minutes when varnish certainly does nothing i get timeout too. In such state there is in varnishlog every 30s request like this but nothing else: 10342 SessionOpen c 127.0.0.1 28988 0.0.0.0:80 10342 ReqStart c 127.0.0.1 28988 1170523327 10342 RxRequest c GET 10342 RxURL c / 10342 RxProtocol c HTTP/1.0 10342 RxHeader c Connection: Keep-Alive 10342 RxHeader c Host: 127.0.0.1 10342 RxHeader c User-Agent: ApacheBench/2.3 10342 RxHeader c Accept: */* 10342 VCL_call c recv lookup 10342 VCL_call c hash hash 10342 VCL_call c miss fetch 10342 Backend c 20041 default default 10342 ObjProtocol c HTTP/1.1 10342 ObjStatus c 200 10342 ObjResponse c OK 10342 ObjHeader c Date: Tue, 03 Nov 2009 10:28:33 GMT 10342 ObjHeader c Server: Apache 10342 ObjHeader c Last-Modified: Sat, 20 Nov 2004 20:16:24 GMT 10342 ObjHeader c ETag: "fc76-2c-3e9564c23b600" 10342 ObjHeader c Content-Type: text/html 10342 TTL c 1170523327 RFC 5 1257244113 0 0 0 0 10342 VCL_call c fetch 10342 VCL_info c XID 1170523327: obj.prefetch (-30) less than ttl (5.00406), ignored. 10342 VCL_return c deliver 10342 Length c 44 10342 VCL_call c deliver deliver 10342 TxProtocol c HTTP/1.1 10342 TxStatus c 200 10342 TxResponse c OK 10342 TxHeader c Server: Apache 10342 TxHeader c Last-Modified: Sat, 20 Nov 2004 20:16:24 GMT 10342 TxHeader c ETag: "fc76-2c-3e9564c23b600" 10342 TxHeader c Content-Type: text/html 10342 TxHeader c Content-Length: 44 10342 TxHeader c Date: Tue, 03 Nov 2009 10:28:33 GMT 10342 TxHeader c X-Varnish: 1170523327 10342 TxHeader c Age: 0 10342 TxHeader c Via: 1.1 varnish 10342 TxHeader c Connection: keep-alive 10342 ReqEnd c 1170523327 1257244113.149692297 1257244113.153866768 526.840385675 0.004138470 0.000036001 There is nothing strange in syslog or varnishlog, the only thing that helps recover varnish is restarting it. Bad is that it has already happened in production traffic. Is there anything in my settings what should i check? my setings: accept_fd_holdoff 50 [ms] acceptor default (epoll, poll) auto_restart on [bool] backend_http11 on [bool] between_bytes_timeout 60.000000 [s] cache_vbe_conns off [bool] cc_command "exec cc -fpic -shared -Wl,-x -o %o %s" cli_buffer 8192 [bytes] cli_timeout 15 [seconds] client_http11 off [bool] clock_skew 10 [s] connect_timeout 0.400000 [s] default_grace 10 default_ttl 60 [seconds] diag_bitmap 0x0 [bitmap] err_ttl 0 [seconds] esi_syntax 0 [bitmap] fetch_chunksize 128 [kilobytes] first_byte_timeout 60.000000 [s] group nogroup (65534) listen_address 0.0.0.0:80 listen_depth 40960 [connections] log_hashstring off [bool] log_local_address off [bool] lru_interval 2 [seconds] max_esi_includes 5 [includes] max_restarts 4 [restarts] obj_workspace 8192 [bytes] overflow_max 300 [%] ping_interval 3 [seconds] pipe_timeout 60 [seconds] prefer_ipv6 off [bool] purge_dups off [bool] purge_hash on [bool] rush_exponent 3 [requests per request] send_timeout 5 [seconds] sess_timeout 1 [seconds] sess_workspace 16384 [bytes] session_linger 0 [ms] shm_reclen 255 [bytes] shm_workspace 8192 [bytes] srcaddr_hash 1049 [buckets] srcaddr_ttl 0 [seconds] thread_pool_add_delay 2 [milliseconds] thread_pool_add_threshold 2 [requests] thread_pool_fail_delay 200 [milliseconds] thread_pool_max 10000 [threads] thread_pool_min 200 [threads] thread_pool_purge_delay 1000 [milliseconds] thread_pool_timeout 10 [seconds] thread_pools 12 [pools] user nobody (65534) vcl_trace off [bool] vcl.conf: just backend declaration nothing else varnishstat -1 in stuck state: uptime 3450 . Child uptime client_conn 620403 179.83 Client connections accepted client_req 1212084 351.33 Client requests received cache_hit 1211297 351.10 Cache hits cache_hitpass 0 0.00 Cache hits for pass cache_miss 140 0.04 Cache misses backend_conn 140 0.04 Backend connections success backend_unhealthy 0 0.00 Backend connections not attempted backend_busy 0 0.00 Backend connections too many backend_fail 0 0.00 Backend connections failures backend_reuse 0 0.00 Backend connections reuses backend_recycle 0 0.00 Backend connections recycles backend_unused 0 0.00 Backend connections unused n_srcaddr 0 . N struct srcaddr n_srcaddr_act 0 . N active struct srcaddr n_sess_mem 20051 . N struct sess_mem n_sess 20277 . N struct sess n_object 2431 . N struct object n_objecthead 2440 . N struct objecthead n_smf 2442 . N struct smf n_smf_frag 4 . N small free smf n_smf_large 16 . N large free smf n_vbe_conn 0 . N struct vbe_conn n_bereq 1 . N struct bereq n_wrk 2467 . N worker threads n_wrk_create 2477 0.72 N worker threads created n_wrk_failed 0 0.00 N worker threads not created n_wrk_max 0 0.00 N worker threads limited n_wrk_queue 9891 2.87 N queued work requests n_wrk_overflow 18197 5.27 N overflowed work requests n_wrk_drop 0 0.00 N dropped work requests n_backend 1 . N backends n_expired 139 . N expired objects n_lru_nuked 0 . N LRU nuked objects n_lru_saved 0 . N LRU saved objects n_lru_moved 20 . N LRU moved objects n_deathrow 0 . N objects on deathrow losthdr 0 0.00 HTTP header overflows n_objsendfile 0 0.00 Objects sent with sendfile n_objwrite 1212216 351.37 Objects sent with write n_objoverflow 0 0.00 Objects overflowing workspace s_sess 610512 176.96 Total Sessions s_req 1212261 351.38 Total Requests s_pipe 0 0.00 Total pipe s_pass 0 0.00 Total pass s_fetch 140 0.04 Total fetch s_hdrbytes 335018181 97106.72 Total header bytes s_bodybytes 53341094 15461.19 Total body bytes sess_closed 600121 173.95 Session Closed sess_pipeline 0 0.00 Session Pipeline sess_readahead 0 0.00 Session Read Ahead sess_linger 0 0.00 Session Linger sess_herd 612115 177.42 Session herd shm_records 40013596 11598.14 SHM records shm_writes 3028734 877.89 SHM writes shm_flushes 0 0.00 SHM flushes due to overflow shm_cont 228833 66.33 SHM MTX contention shm_cycles 10 0.00 SHM cycles through buffer sm_nreq 2710 0.79 allocator requests sm_nobj 2422 . outstanding allocations sm_balloc 19836928 . bytes allocated sm_bfree 9991118848 . bytes free sma_nreq 0 0.00 SMA allocator requests sma_nobj 0 . SMA outstanding allocations sma_nbytes 0 . SMA outstanding bytes sma_balloc 0 . SMA bytes allocated sma_bfree 0 . SMA bytes free sms_nreq 0 0.00 SMS allocator requests sms_nobj 0 . SMS outstanding allocations sms_nbytes 0 . SMS outstanding bytes sms_balloc 0 . SMS bytes allocated sms_bfree 0 . SMS bytes freed backend_req 140 0.04 Backend requests made n_vcl 1 0.00 N vcl total n_vcl_avail 1 0.00 N vcl available n_vcl_discard 0 0.00 N vcl discarded n_purge 1 . N total active purges n_purge_add 1 0.00 N new purges added n_purge_retire 0 0.00 N old purges deleted n_purge_obj_test 0 0.00 N objects tested n_purge_re_test 0 0.00 N regexps tested against n_purge_dups 0 0.00 N duplicate purges removed hcb_nolock 0 0.00 HCB Lookups without lock hcb_lock 0 0.00 HCB Lookups with lock hcb_insert 0 0.00 HCB Inserts esi_parse 0 0.00 Objects ESI parsed (unlock) esi_errors 0 0.00 ESI parse errors (unlock) _______________________________________________ varnish-misc mailing list [email protected] http://projects.linpro.no/mailman/listinfo/varnish-misc
