Hi Kristian,

Thank you for your suggestions. We've upgraded to Varnish 2.1.5, which decreases the default thread_pool_add_delay from 20ms to 2ms. I've included a varnishstat listing below. The numbers reflect live traffic; our experience with synthetic tests is that it is very hard to imitate real-life behavior.
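For completeness, this is roughly how we now start varnishd with the thread settings you recommended. (The -a/-f/-s values below are illustrative placeholders, not our real production arguments.)

```shell
# Sketch of a varnishd invocation with the suggested thread parameters.
# Listen address, VCL path and storage size are placeholders.
varnishd \
  -a :80 \
  -f /etc/varnish/default.vcl \
  -s malloc,2G \
  -p thread_pools=2 \
  -p thread_pool_min=500 \
  -p thread_pool_max=5000 \
  -p thread_pool_add_delay=2
```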
> I would typically recommend something closer to minimum 500, pools 2 and
> max 5000.

Currently we use 8 pools because the server has 2x4 CPU cores. Is there an advantage to using fewer pools than the number of CPU cores? Now that we have increased the number of threads, the "N worker threads limited" problem is solved! :-)

> How many connections (not requests) are you doing during these tests?

ls -1 /proc/<varnish pid>/fd | wc -l gives us ~1300 (single load) and ~2600 (double load) file descriptors (= connections?).

> Do you use keep-alive and long-lasting connections? You may want to see
> if reducing session_linger helps.

Requests mostly arrive from web browsers. netstat -tna | wc -l shows ~12000 TCP connections (single load).

Unfortunately, after facing double load, Varnish now becomes very unresponsive after a while: client requests are not answered, resulting in long waiting times (10+ seconds) or timeouts. We do not have bandwidth issues. Is it possible that in our use case we've reached the limit of what Varnish can handle?

Greetings and thanks for the help so far!
Dennis

varnishstat -1
client_conn              696307       177.40 Client connections accepted
client_drop                   0         0.00 Connection dropped, no sess/wrk
client_req               965174       245.90 Client requests received
cache_hit                925943       235.91 Cache hits
cache_hitpass                 5         0.00 Cache hits for pass
cache_miss                39125         9.97 Cache misses
backend_conn               4568         1.16 Backend conn. success
backend_unhealthy             0         0.00 Backend conn. not attempted
backend_busy                  0         0.00 Backend conn. too many
backend_fail                  3         0.00 Backend conn. failures
backend_reuse             34683         8.84 Backend conn. reuses
backend_toolate              79         0.02 Backend conn. was closed
backend_recycle           34768         8.86 Backend conn. recycles
backend_unused                0         0.00 Backend conn. unused
fetch_head                    0         0.00 Fetch head
fetch_length              24818         6.32 Fetch with Length
fetch_chunked             14426         3.68 Fetch chunked
fetch_eof                     0         0.00 Fetch EOF
fetch_bad                     0         0.00 Fetch had bad headers
fetch_close                   1         0.00 Fetch wanted close
fetch_oldhttp                 0         0.00 Fetch pre HTTP/1.1 closed
fetch_zero                    0         0.00 Fetch zero len
fetch_failed                  0         0.00 Fetch failed
n_sess_mem                 2235          .   N struct sess_mem
n_sess                     1787          .   N struct sess
n_object                  34379          .   N struct object
n_vampireobject               0          .   N unresurrected objects
n_objectcore              34516          .   N struct objectcore
n_objecthead              22424          .   N struct objecthead
n_smf                         0          .   N struct smf
n_smf_frag                    0          .   N small free smf
n_smf_large                   0          .   N large free smf
n_vbe_conn                    6          .   N struct vbe_conn
n_wrk                       280          .   N worker threads
n_wrk_create                280         0.07 N worker threads created
n_wrk_failed                  0         0.00 N worker threads not created
n_wrk_max                  9693         2.47 N worker threads limited
n_wrk_queue                   0         0.00 N queued work requests
n_wrk_overflow                0         0.00 N overflowed work requests
n_wrk_drop                    0         0.00 N dropped work requests
n_backend                     4          .   N backends
n_expired                   385          .   N expired objects
n_lru_nuked                   0          .   N LRU nuked objects
n_lru_saved                   0          .   N LRU saved objects
n_lru_moved              370058          .   N LRU moved objects
n_deathrow                    0          .   N objects on deathrow
losthdr                       0         0.00 HTTP header overflows
n_objsendfile                 0         0.00 Objects sent with sendfile
n_objwrite               815230       207.70 Objects sent with write
n_objoverflow                 0         0.00 Objects overflowing workspace
s_sess                   696245       177.39 Total Sessions
s_req                    965174       245.90 Total Requests
s_pipe                        4         0.00 Total pipe
s_pass                      120         0.03 Total pass
s_fetch                   39245        10.00 Total fetch
s_hdrbytes            285675067     72783.46 Total header bytes
s_bodybytes         10667879292   2717931.03 Total body bytes
sess_closed               30597         7.80 Session Closed
sess_pipeline              1238         0.32 Session Pipeline
sess_readahead              537         0.14 Session Read Ahead
sess_linger              955973       243.56 Session Linger
sess_herd                891554       227.15 Session herd
shm_records            39223429      9993.23 SHM records
shm_writes              4022999      1024.97 SHM writes
shm_flushes                   0         0.00 SHM flushes due to overflow
shm_cont                   1578         0.40 SHM MTX contention
shm_cycles                   15         0.00 SHM cycles through buffer
sm_nreq                       0         0.00 allocator requests
sm_nobj                       0          .   outstanding allocations
sm_balloc                     0          .   bytes allocated
sm_bfree                      0          .   bytes free
sma_nreq                  71633        18.25 SMA allocator requests
sma_nobj                  66455          .   SMA outstanding allocations
sma_nbytes            608883602          .   SMA outstanding bytes
sma_balloc           2206748168          .   SMA bytes allocated
sma_bfree            1597864566          .   SMA bytes free
sms_nreq                      0         0.00 SMS allocator requests
sms_nobj                      0          .   SMS outstanding allocations
sms_nbytes                    0          .   SMS outstanding bytes
sms_balloc                    0          .   SMS bytes allocated
sms_bfree                     0          .   SMS bytes freed
backend_req               39247        10.00 Backend requests made
n_vcl                         2         0.00 N vcl total
n_vcl_avail                   1         0.00 N vcl available
n_vcl_discard                 1         0.00 N vcl discarded
n_purge                       1          .   N total active purges
n_purge_add                   1         0.00 N new purges added
n_purge_retire                0         0.00 N old purges deleted
n_purge_obj_test              0         0.00 N objects tested
n_purge_re_test               0         0.00 N regexps tested against
n_purge_dups                  0         0.00 N duplicate purges removed
hcb_nolock                    0         0.00 HCB Lookups without lock
hcb_lock                      0         0.00 HCB Lookups with lock
hcb_insert                    0         0.00 HCB Inserts
esi_parse                     0         0.00 Objects ESI parsed (unlock)
esi_errors                    0         0.00 ESI parse errors (unlock)
accept_fail                   0         0.00 Accept failures
client_drop_late              0         0.00 Connection dropped late
uptime                     3925         1.00 Client uptime
backend_retry                 2         0.00 Backend conn. retry
dir_dns_lookups               0         0.00 DNS director lookups
dir_dns_failed                0         0.00 DNS director failed lookups
dir_dns_hit                   0         0.00 DNS director cached lookups hit
dir_dns_cache_full            0         0.00 DNS director full dnscache
fetch_1xx                     0         0.00 Fetch no body (1xx)
fetch_204                     0         0.00 Fetch no body (204)
fetch_304                     0         0.00 Fetch no body (304)

On Fri, 2011-06-10 at 16:29 +0200, Kristian Lyngstol wrote:
> Greetings,
>
> On Fri, Jun 10, 2011 at 08:32:11AM +0200, Dennis Hendriksen wrote:
> > We're running Varnish 2.0.6 on a dual quad core server which is doing
> > about 500 req/s with a 97% hit ratio, serving mostly images. When we
> > increase the load to about 800 req/s we encounter two problems that
> > seem to be related to the thread pool settings.
>
> You really should see if you can't move to at least Varnish 2.1.5.
>
> > When we double the varnish load then the "N worker threads limited"
> > increases rapidly (100k+) while the "N worker threads created" does not
> > increase (8 pools, min pool size 25, max pool size 1000). Varnish is
> > unresponsive and client connections hang.
>
> That'll give you 200 threads at startup.
>
> I would typically recommend something closer to minimum 500, pools 2 and
> max 5000.
>
> You also want to reduce the thread_pool_add_delay from the (2.0.6)
> default 20ms to 2ms for instance. That will limit the rate that threads
> are started at, and 20ms is often way too slow.
> How many connections (not requests) are you doing during these tests?
>
> > At other times we see the number of worker threads increasing but again
> > connections 'hang' while Varnish doesn't show any dropped connections
> > (only overflows).
>
> Do you use keep-alive and long-lasting connections? You may want to see
> if reducing session_linger helps.
>
> Are you testing with real traffic or synthetic tests?
>
> If possible, varnishstat -1 output would be useful.
>
> - Kristian

_______________________________________________
varnish-misc mailing list
[email protected]
http://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
