Thank you to everyone who gave my inquiry some thought. Here is what I was able to discover:
1) The ASA firewall kills idle sessions after 1 hour.
2) Varnish doesn't set the TCP keepalive flag on backend connections.
3) Once the TCP session is dropped by the firewall, Varnish doesn't detect this, and keeps trying to use that backend connection until it times out.
4) With multiple established (but dead) TCP sessions, a fetch from the backend lags severely, waiting on timeouts until it either hits a usable session or creates a new one.
5) Health probes can't detect this, since they create and close a new TCP session for each probe.

I ended up having to put Apache between Varnish and my backend, since Apache's ProxyPass does have a keepalive option.

So my question is: should Varnish have an option to set the keepalive flag on backend connections? Should I make a feature request? This seems relatively simple.

Nate

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Nathan Warren
Sent: Thursday, June 13, 2013 12:56 PM
To: [email protected]
Subject: varnish using dropped backend connections?

Good day,

I have a Varnish cache serving requests to two Play Framework instances, with a Cisco ASA firewall in between. Everything works great at first: the cache is working, the hit ratio is acceptable, responses are fast, etc. But after about an hour without hits to the service, when the cache misses and pulls from the backend, there is no reply; the pound server times out while Varnish hangs. Eventually Varnish returns a 503 (Guru Meditation), but pound doesn't wait that long before timing out.

Packet traces on the Varnish box show packets being sent from Varnish, but traces never show them reaching the backend server. The health checks keep polling successfully throughout. My suspicion was that Varnish was holding a persistent keepalive session to the Play server. This seems to be the case: the TCP sessions are visible (via netstat) on both sides even when no traffic is passing through.
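For anyone wondering what "setting the keepalive flag" amounts to: it is a per-socket option (SO_KEEPALIVE), and on Linux the system-wide /proc/sys/net/ipv4/tcp_keepalive_* defaults can also be overridden per socket. This is only an illustrative sketch in Python, not Varnish code; the host, port, and timer values are made up:

```python
import socket

def connect_with_keepalive(host, port, idle=300, interval=60, count=5):
    """Open a TCP connection with keepalive probes enabled, so that a
    firewall silently dropping the idle session gets detected by the
    kernel instead of the connection hanging until a long timeout."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Ask the kernel to send keepalive probes when the connection is idle.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Linux-specific knobs: override the /proc/sys/net/ipv4 defaults
    # for this socket only (seconds idle before probing, seconds
    # between probes, and number of failed probes before giving up).
    if hasattr(socket, "TCP_KEEPIDLE"):
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    s.connect((host, port))
    return s
```

Note that lowering tcp_keepalive_time system-wide (as tried below) has no effect unless the application actually sets SO_KEEPALIVE on its sockets, which is why point 2 above matters.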
It seems as though the firewall was dropping the idle keepalive session, but Varnish kept using it as if it were still there, re-using it over and over. After refreshing the page a few times (getting errors), the page starts working again. I found that Cisco ASA firewalls drop idle sessions after one hour, and that my version of Linux waits 7200 seconds before interrogating a possibly failed TCP session. I dropped this to 1800 via /proc/sys/net/ipv4/tcp_keepalive_time and restarted Varnish. After waiting an hour, the problem continues as before.

I would appreciate any insight you can provide, even if it's a problem with my config that doesn't seem to be related to this issue.

Thanks,
Nate

#### default.vcl ####

backend backend11 {
    .host = "10.1.1.20";
    .port = "9010";
    .probe = {
        .url = "/sys-status";
        .interval = 10s;
        .timeout = 1s;
        .window = 5;
        .threshold = 3;
    }
}

backend backend12 {
    .host = "10.1.1.21";
    .port = "9010";
    .probe = {
        .url = "/sys-status";
        .interval = 10s;
        .timeout = 1s;
        .window = 5;
        .threshold = 3;
    }
}

director backend1 random {
    .retries = 5;
    { .backend = backend11; .weight = 3; }
    { .backend = backend12; .weight = 3; }
}

backend backend21 {
    .host = "10.1.1.20";
    .port = "9003";
    .probe = {
        .url = "/ping";
        .interval = 10s;
        .timeout = 2s;
        .window = 5;
        .threshold = 3;
    }
}

backend backend22 {
    .host = "10.1.1.21";
    .port = "9003";
    .probe = {
        .url = "/ping";
        .interval = 10s;
        .timeout = 1s;
        .window = 5;
        .threshold = 3;
    }
}

director backend2 random {
    { .backend = backend21; .weight = 3; }
    { .backend = backend22; .weight = 3; }
}

sub vcl_recv {
    # serving stale data is ok
    set req.grace = 55m;

    # we don't care about any cookies
    unset req.http.Cookie;

    # secure redirects
    if (req.http.host ~ "^server1.nazarene.org$" &&
        req.http.X-Forwarded-Proto !~ "(?i)https") {
        set req.http.x-Redir-Url = "https://" + req.http.host + req.url;
        error 750 req.http.x-Redir-Url;
    }

    if (req.url ~ "(noaccess|notallowed)") {
        error 404 "Not Found";
    }

    # send requests for server1 to backend1
    if (req.http.host ~ "^server1.domain.local$") {
        set req.backend = backend1;
        # requests to /foo/p/... can't be cached, so don't try
        if (req.url ~ "^/foo/p/.*") {
            return (pipe);
        }
        return (lookup);
    }

    # send requests for server2.domain.local
    if (req.http.host ~ "^server2.domain.local$") {
        # but only for certain urls
        if (req.url ~ "^/foo/(bar|foobar)") {
            set req.backend = backend2;
            return (lookup);
        }
    }

    # if you're not served at this point, I don't have anything for you
    error 404 "Not Found";
}

sub vcl_fetch {
    set beresp.grace = 60m;

    # strip cookies from the backend so varnish will cache
    unset beresp.http.Set-Cookie;

    # properly handle different encoding types
    if (req.http.Accept-Encoding) {
        if (req.url ~ "\.(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg)$") {
            # no point in compressing these
            remove req.http.Accept-Encoding;
        } elsif (req.http.Accept-Encoding ~ "gzip") {
            set req.http.Accept-Encoding = "gzip";
        } elsif (req.http.Accept-Encoding ~ "deflate") {
            set req.http.Accept-Encoding = "deflate";
        } else {
            # unknown algorithm
            remove req.http.Accept-Encoding;
        }
    }

    if (beresp.http.content-type ~ "text") {
        set beresp.do_gzip = true;
    }
}

sub vcl_error {
    # redirect the client to the specified location
    if (obj.status == 750) {
        set obj.http.Location = obj.response;
        set obj.status = 302;
        return (deliver);
    }
}

##### END default.vcl ######

user@host:~$ sudo varnishstat -1 -n varnish
client_conn                 21    0.00  Client connections accepted
client_drop                  0    0.00  Connection dropped, no sess/wrk
client_req                 146    0.02  Client requests received
cache_hit                  111    0.01  Cache hits
cache_hitpass                0    0.00  Cache hits for pass
cache_miss                  35    0.00  Cache misses
backend_conn                10    0.00  Backend conn. success
backend_unhealthy            0    0.00  Backend conn. not attempted
backend_busy                 0    0.00  Backend conn. too many
backend_fail                 0    0.00  Backend conn. failures
backend_reuse               27    0.00  Backend conn. reuses
backend_toolate              0    0.00  Backend conn. was closed
backend_recycle             30    0.00  Backend conn. recycles
backend_retry                5    0.00  Backend conn. retry
fetch_head                   0    0.00  Fetch head
fetch_length                30    0.00  Fetch with Length
fetch_chunked                0    0.00  Fetch chunked
fetch_eof                    0    0.00  Fetch EOF
fetch_bad                    0    0.00  Fetch had bad headers
fetch_close                  0    0.00  Fetch wanted close
fetch_oldhttp                0    0.00  Fetch pre HTTP/1.1 closed
fetch_zero                   0    0.00  Fetch zero len
fetch_failed                 0    0.00  Fetch failed
fetch_1xx                    0    0.00  Fetch no body (1xx)
fetch_204                    0    0.00  Fetch no body (204)
fetch_304                    0    0.00  Fetch no body (304)
n_sess_mem                  14     .    N struct sess_mem
n_sess                       0     .    N struct sess
n_object                    16     .    N struct object
n_vampireobject              0     .    N unresurrected objects
n_objectcore                18     .    N struct objectcore
n_objecthead                21     .    N struct objecthead
n_waitinglist                7     .    N struct waitinglist
n_vbc                        3     .    N struct vbc
n_wrk                      800     .    N worker threads
n_wrk_create               800    0.09  N worker threads created
n_wrk_failed                 0    0.00  N worker threads not created
n_wrk_max                    0    0.00  N worker threads limited
n_wrk_lqueue                 0    0.00  work request queue length
n_wrk_queued                 0    0.00  N queued work requests
n_wrk_drop                   0    0.00  N dropped work requests
n_backend                    4     .    N backends
n_expired                   14     .    N expired objects
n_lru_nuked                  0     .    N LRU nuked objects
n_lru_moved                 39     .    N LRU moved objects
losthdr                      0    0.00  HTTP header overflows
n_objsendfile                0    0.00  Objects sent with sendfile
n_objwrite                  25    0.00  Objects sent with write
n_objoverflow                0    0.00  Objects overflowing workspace
s_sess                      21    0.00  Total Sessions
s_req                      146    0.02  Total Requests
s_pipe                       0    0.00  Total pipe
s_pass                       0    0.00  Total pass
s_fetch                     30    0.00  Total fetch
s_hdrbytes               45932    5.26  Total header bytes
s_bodybytes              66053    7.57  Total body bytes
sess_closed                  6    0.00  Session Closed
sess_pipeline                0    0.00  Session Pipeline
sess_readahead               0    0.00  Session Read Ahead
sess_linger                141    0.02  Session Linger
sess_herd                   90    0.01  Session herd
shm_records              17801    2.04  SHM records
shm_writes               10492    1.20  SHM writes
shm_flushes                  0    0.00  SHM flushes due to overflow
shm_cont                     0    0.00  SHM MTX contention
shm_cycles                   0    0.00  SHM cycles through buffer
sms_nreq                     5    0.00  SMS allocator requests
sms_nobj                     0     .    SMS outstanding allocations
sms_nbytes                   0     .    SMS outstanding bytes
sms_balloc                2090     .    SMS bytes allocated
sms_bfree                 2090     .    SMS bytes freed
backend_req                 37    0.00  Backend requests made
n_vcl                        1    0.00  N vcl total
n_vcl_avail                  1    0.00  N vcl available
n_vcl_discard                0    0.00  N vcl discarded
n_ban                        1     .    N total active bans
n_ban_add                    1    0.00  N new bans added
n_ban_retire                 0    0.00  N old bans deleted
n_ban_obj_test               0    0.00  N objects tested
n_ban_re_test                0    0.00  N regexps tested against
n_ban_dups                   0    0.00  N duplicate bans removed
hcb_nolock                 146    0.02  HCB Lookups without lock
hcb_lock                    18    0.00  HCB Lookups with lock
hcb_insert                  18    0.00  HCB Inserts
esi_errors                   0    0.00  ESI parse errors (unlock)
esi_warnings                 0    0.00  ESI parse warnings (unlock)
accept_fail                  0    0.00  Accept failures
client_drop_late             0    0.00  Connection dropped late
uptime                    8729    1.00  Client uptime
dir_dns_lookups              0    0.00  DNS director lookups
dir_dns_failed               0    0.00  DNS director failed lookups
dir_dns_hit                  0    0.00  DNS director cached lookups hit
dir_dns_cache_full           0    0.00  DNS director full dnscache
vmods                        0     .    Loaded VMODs
n_gzip                       9    0.00  Gzip operations
n_gunzip                     0    0.00  Gunzip operations
LCK.sms.creat                1    0.00  Created locks
LCK.sms.destroy              0    0.00  Destroyed locks
LCK.sms.locks               15    0.00  Lock Operations
LCK.sms.colls                0    0.00  Collisions
LCK.smp.creat                0    0.00  Created locks
LCK.smp.destroy              0    0.00  Destroyed locks
LCK.smp.locks                0    0.00  Lock Operations
LCK.smp.colls                0    0.00  Collisions
LCK.sma.creat                2    0.00  Created locks
LCK.sma.destroy              0    0.00  Destroyed locks
LCK.sma.locks              103    0.01  Lock Operations
LCK.sma.colls                0    0.00  Collisions
LCK.smf.creat                0    0.00  Created locks
LCK.smf.destroy              0    0.00  Destroyed locks
LCK.smf.locks                0    0.00  Lock Operations
LCK.smf.colls                0    0.00  Collisions
LCK.hsl.creat                0    0.00  Created locks
LCK.hsl.destroy              0    0.00  Destroyed locks
LCK.hsl.locks                0    0.00  Lock Operations
LCK.hsl.colls                0    0.00  Collisions
LCK.hcb.creat                1    0.00  Created locks
LCK.hcb.destroy              0    0.00  Destroyed locks
LCK.hcb.locks               71    0.01  Lock Operations
LCK.hcb.colls                0    0.00  Collisions
LCK.hcl.creat                0    0.00  Created locks
LCK.hcl.destroy              0    0.00  Destroyed locks
LCK.hcl.locks                0    0.00  Lock Operations
LCK.hcl.colls                0    0.00  Collisions
LCK.vcl.creat                1    0.00  Created locks
LCK.vcl.destroy              0    0.00  Destroyed locks
LCK.vcl.locks                9    0.00  Lock Operations
LCK.vcl.colls                0    0.00  Collisions
LCK.stat.creat               1    0.00  Created locks
LCK.stat.destroy             0    0.00  Destroyed locks
LCK.stat.locks              14    0.00  Lock Operations
LCK.stat.colls               0    0.00  Collisions
LCK.sessmem.creat            1    0.00  Created locks
LCK.sessmem.destroy          0    0.00  Destroyed locks
LCK.sessmem.locks           37    0.00  Lock Operations
LCK.sessmem.colls            0    0.00  Collisions
LCK.wstat.creat              1    0.00  Created locks
LCK.wstat.destroy            0    0.00  Destroyed locks
LCK.wstat.locks          26237    3.01  Lock Operations
LCK.wstat.colls              0    0.00  Collisions
LCK.herder.creat             1    0.00  Created locks
LCK.herder.destroy           0    0.00  Destroyed locks
LCK.herder.locks             1    0.00  Lock Operations
LCK.herder.colls             0    0.00  Collisions
LCK.wq.creat                 2    0.00  Created locks
LCK.wq.destroy               0    0.00  Destroyed locks
LCK.wq.locks             18452    2.11  Lock Operations
LCK.wq.colls                 0    0.00  Collisions
LCK.objhdr.creat            25    0.00  Created locks
LCK.objhdr.destroy           4    0.00  Destroyed locks
LCK.objhdr.locks           624    0.07  Lock Operations
LCK.objhdr.colls             0    0.00  Collisions
LCK.exp.creat                1    0.00  Created locks
LCK.exp.destroy              0    0.00  Destroyed locks
LCK.exp.locks             8773    1.01  Lock Operations
LCK.exp.colls                0    0.00  Collisions
LCK.lru.creat                2    0.00  Created locks
LCK.lru.destroy              0    0.00  Destroyed locks
LCK.lru.locks               30    0.00  Lock Operations
LCK.lru.colls                0    0.00  Collisions
LCK.cli.creat                1    0.00  Created locks
LCK.cli.destroy              0    0.00  Destroyed locks
LCK.cli.locks             2922    0.33  Lock Operations
LCK.cli.colls                0    0.00  Collisions
LCK.ban.creat                1    0.00  Created locks
LCK.ban.destroy              0    0.00  Destroyed locks
LCK.ban.locks             8774    1.01  Lock Operations
LCK.ban.colls                0    0.00  Collisions
LCK.vbp.creat                1    0.00  Created locks
LCK.vbp.destroy              0    0.00  Destroyed locks
LCK.vbp.locks             3488    0.40  Lock Operations
LCK.vbp.colls                0    0.00  Collisions
LCK.vbe.creat                1    0.00  Created locks
LCK.vbe.destroy              0    0.00  Destroyed locks
LCK.vbe.locks               17    0.00  Lock Operations
LCK.vbe.colls                0    0.00  Collisions
LCK.backend.creat            4    0.00  Created locks
LCK.backend.destroy          0    0.00  Destroyed locks
LCK.backend.locks          185    0.02  Lock Operations
LCK.backend.colls            0    0.00  Collisions
SMA.s0.c_req                62    0.01  Allocator requests
SMA.s0.c_fail                0    0.00  Allocator failures
SMA.s0.c_bytes         1798968  206.09  Bytes allocated
SMA.s0.c_freed         1548284  177.37  Bytes freed
SMA.s0.g_alloc              30     .    Allocations outstanding
SMA.s0.g_bytes          250684     .    Bytes outstanding
SMA.s0.g_space       268184772     .    Bytes available
SMA.Transient.c_req          0    0.00  Allocator requests
SMA.Transient.c_fail         0    0.00  Allocator failures
SMA.Transient.c_bytes        0    0.00  Bytes allocated
SMA.Transient.c_freed        0    0.00  Bytes freed
SMA.Transient.g_alloc        0     .    Allocations outstanding
SMA.Transient.g_bytes        0     .    Bytes outstanding
SMA.Transient.g_space        0     .    Bytes available
VBE.backend11(10.1.1.20,,9010).vcls 1 . VCL references
VBE.backend11(10.1.1.20,,9010).happy 4611687117937967103 . Happy health probes
VBE.backend12(10.1.1.21,,9010).vcls 1 . VCL references
VBE.backend12(10.1.1.21,,9010).happy 4611687117937967103 . Happy health probes
VBE.backend21(10.1.1.20,,9003).vcls 1 . VCL references
VBE.backend21(10.1.1.20,,9003).happy 18446744073709551615 . Happy health probes
VBE.backend22(10.1.1.21,,9003).vcls 1 . VCL references
VBE.backend22(10.1.1.21,,9003).happy 18446744073709551615 . Happy health probes

_______________________________________________
varnish-misc mailing list
[email protected]
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
