Re: varnish 2.0.4 backend errors
2009/7/16 Lazy : > 2009/7/16 Ken Brownfield : >> Yeah, this one might be one for the Varnish devs. Only last thing I can >> think of is that Apache is returning a response that's larger than >> Content-Length: and Varnish is just closing the connection... > apache didn't set Content-Length on thiese requests > > Yesterday i switched the machine to squid, and errors came back as > httpAppendBody: Request not yet fully sent "POST in squid cache log > > Finally I realized that there are many tcp checksum errors in the dump, > I turned off tx and rx checksum offloading and now the checksums are ok, > there was so far no error in 25 minutes (before it was happening in about > 15 minute intervals). Maybe the fact that local backend is bound to eth0's ip > address made a difference. I will investigate later. Maybe it will be > worth mentioning > in the wiki if it's proven. > > After all it wasn't a varnish error at all. > > Thank You all for your time and expertise. Arghh said too soon, traffic went up and there are still thiese errors, but no so many ___ varnish-misc mailing list varnish-misc@projects.linpro.no http://projects.linpro.no/mailman/listinfo/varnish-misc
Re: varnish 2.0.4 backend errors
2009/7/16 Ken Brownfield : > Yeah, this one might be one for the Varnish devs. Only last thing I can > think of is that Apache is returning a response that's larger than > Content-Length: and Varnish is just closing the connection... apache didn't set Content-Length on thiese requests Yesterday i switched the machine to squid, and errors came back as httpAppendBody: Request not yet fully sent "POST in squid cache log Finally I realized that there are many tcp checksum errors in the dump, I turned off tx and rx checksum offloading and now the checksums are ok, there was so far no error in 25 minutes (before it was happening in about 15 minute intervals). Maybe the fact that local backend is bound to eth0's ip address made a difference. I will investigate later. Maybe it will be worth mentioning in the wiki if it's proven. After all it wasn't a varnish error at all. Thank You all for your time and expertise. -- Michal Grzedzicki ___ varnish-misc mailing list varnish-misc@projects.linpro.no http://projects.linpro.no/mailman/listinfo/varnish-misc
Re: Fwd: varnish 2.0.4 backend errors
On Wed, Jul 15, 2009 at 12:14:21PM +0200, Lazy wrote: > 2009/7/15 Kristian Lyngstol : > > You could try to turn on vcl-trace. (-p vcl_trace=on). > > right now i have something 100-200 req/s won't it kill the server and > I can't restart varnish now so I will have try to enable it at runtime Ok, I'm not entirely sure if that'll work, but it's worth a try (and it should cause any damage anyway). > > > > > Also, could you post your VCL in it's entirety? (If you already did, I > > must've missed it.) > > sub vcl_recv { (...) > if (req.http.Cache-Control ~ "no-cache") { > pass; You probably do not want this; It will mean that every time a _client_ sends no-cache, it's a pass. That'll be quite often (on refresh, for instance), and the refreshed item wont get cached, so it wont refresh the cache. (I'll have a look at the cap when I find some time. Could take a few hours). -- Kristian Lyngstøl Redpill Linpro AS Tlf: +47 21544179 Mob: +47 99014497 pgpH7KLi3pVfs.pgp Description: PGP signature ___ varnish-misc mailing list varnish-misc@projects.linpro.no http://projects.linpro.no/mailman/listinfo/varnish-misc
Re: Fwd: varnish 2.0.4 backend errors
2009/7/15 Kristian Lyngstol : > On Wed, Jul 15, 2009 at 10:01:08AM +0200, Lazy wrote: > (...) >> setting 500 as min threads didn't make any diference, funny thing is >> that failed requests are in logged in apache as succesful > > Sorry if I didn't make myself clear: The output from varnishstat didn't > indicate that threads was the issue causing 503s. It was more of a > tuning-comment than a shot at solving the real problem. Sorry about the > confusion. > >> 09:23:50.563934 IP x.x.x.x.51235 > x.x.x.x.88: S >> 1066103134:1066103134(0) win 32792 > 552835403 0,nop,wscale 7> > > () > > Is it possible for you to mail me the raw cap-file? (Feel free to obfuscate > the IPs, but I'm really more of an ethereal-man than tcpdump-output...). > Feel free to drop it directly to me, if that makes it easier. ok, it will take a minute i have a 50MB file to chop > >> successful POST in apache logs, >> >> in varnishlog >> >> 14 Backend c 15 default default > > (...) > > The request must've started earlier, as this is just the backend request. > Do you have the entire transaction available? yes 14 VCL_call c error 14 VCL_return c restart 14 VCL_call c recv 14 VCL_return c pass 14 VCL_call c pass 14 VCL_return c pass 15 BackendOpen b default xxx 51219 xxx 88 14 Backend c 15 default default 15 TxRequestb POST 15 TxURLb /quiz/question 15 TxProtocol b HTTP/1.1 15 TxHeader b x-requested-with: XMLHttpRequest 15 TxHeader b Accept-Language: pl 15 TxHeader b Accept: text/html, */* 15 TxHeader b Content-Type: application/x-www-form-urlencoded 15 TxHeader b UA-CPU: x86 15 TxHeader b Accept-Encoding: gzip, deflate 15 TxHeader b User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30) 15 TxHeader b X-Varnish: 782633863 15 TxHeader b X-Forwarded-For: 82.177. 15 BackendClose b default 14 VCL_call c error 14 VCL_return c restart 14 VCL_call c recv 14 VCL_return c pass 14 VCL_call c pass 14 VCL_return c pass 15 BackendOpen b default xxx 51235 xxx 88 14 Backend c 15 default default > > (...) > >> restart shouldn't make another backend connection if current one got closed ? > > It should, but it should be noted that: 1. We don't know _why_ it fails. 2. > Restart in vcl_error is brand new, so there might still be some issues. > > You could try to turn on vcl-trace. (-p vcl_trace=on). right now i have something 100-200 req/s won't it kill the server and I can't restart varnish now so I will have try to enable it at runtime > > Also, could you post your VCL in it's entirety? (If you already did, I > must've missed it.) backend default { .host = ""; .port = "88"; } sub vcl_recv { if (req.url ~ "\.(png|gif|jpg|swf|css|js|ico)$") { lookup; } if (req.url ~ "landing/") { lookup; } if (req.http.Cache-Control ~ "no-cache") { pass; } } sub vcl_fetch { if (req.url ~ "\.(png|gif|jpg|swf)$") { unset obj.http.set-cookie; } if (obj.http.Pragma ~ "no-cache" || obj.http.Cache-Control ~ "no-cache") { pass; } } sub vcl_error { if (req.restarts < 30) { restart; } } ___ varnish-misc mailing list varnish-misc@projects.linpro.no http://projects.linpro.no/mailman/listinfo/varnish-misc
Re: Fwd: varnish 2.0.4 backend errors
On Wed, Jul 15, 2009 at 10:01:08AM +0200, Lazy wrote: (...) > setting 500 as min threads didn't make any diference, funny thing is > that failed requests are in logged in apache as succesful Sorry if I didn't make myself clear: The output from varnishstat didn't indicate that threads was the issue causing 503s. It was more of a tuning-comment than a shot at solving the real problem. Sorry about the confusion. > 09:23:50.563934 IP x.x.x.x.51235 > x.x.x.x.88: S > 1066103134:1066103134(0) win 32792 552835403 0,nop,wscale 7> () Is it possible for you to mail me the raw cap-file? (Feel free to obfuscate the IPs, but I'm really more of an ethereal-man than tcpdump-output...). Feel free to drop it directly to me, if that makes it easier. > successful POST in apache logs, > > in varnishlog > > 14 Backend c 15 default default (...) The request must've started earlier, as this is just the backend request. Do you have the entire transaction available? (...) > restart shouldn't make another backend connection if current one got closed ? It should, but it should be noted that: 1. We don't know _why_ it fails. 2. Restart in vcl_error is brand new, so there might still be some issues. You could try to turn on vcl-trace. (-p vcl_trace=on). Also, could you post your VCL in it's entirety? (If you already did, I must've missed it.) -- Kristian Lyngstøl Redpill Linpro AS Tlf: +47 21544179 Mob: +47 99014497 pgp6OC1UYDneP.pgp Description: PGP signature ___ varnish-misc mailing list varnish-misc@projects.linpro.no http://projects.linpro.no/mailman/listinfo/varnish-misc
Fwd: varnish 2.0.4 backend errors
2009/7/15 Lazy : > 2009/7/14 Ken Brownfield : >> On Jul 14, 2009, at 3:05 AM, Kristian Lyngstol wrote: >>> On Tue, Jul 14, 2009 at 11:46:58AM +0200, Lazy wrote: the site is usually not so busy, but it has sometimes spikes of static traffic (about 50Mbps) that's why i upped the thread limit, 3000 was to low >>> >>> I seriously doubt 3k was too low. More likely, the min threads was >>> hurting >>> you. Three thousand threads is quite a bit. Remember that these are >>> actual >>> requests being handled, not keep-alive connections and the like. >> >> I just wanted to humbly second this good advice; if you're familiar >> with Apache, this is akin to making sure your MinSpareServers is set >> to a high enough level to handle any transient spikes by avoiding the >> cost of spawning new processes. Varnish will handle 10x the traffic >> you're seeing in <64 threads. Anything you're seeing is more likely a > > but if backend is slow, (there will be many POSTS form ajax app) > threads wil get used fast if php starts lagging behind, > now there are 200 threads running, without raises in overflowed work > requests, and still there are 503 errors, > i will uppp min threads later and see if it helps > >> concurrency spike causing a temporary slowdown while threads are >> spawned, or Varnish is simply passing on fail or slowness from your >> back-end. > i hope this is it, i will find out tommorrow setting 500 as min threads didn't make any diference, funny thing is that failed requests are in logged in apache as succesful 09:23:50.563934 IP x.x.x.x.51235 > x.x.x.x.88: S 1066103134:1066103134(0) win 32792 09:23:50.563940 IP x.x.x.x.88 > x.x.x.x.51235: S 1060616380:1060616380(0) ack 1066103135 win 32768 09:23:50.563946 IP x.x.x.x.51235 > x.x.x.x.88: . ack 1 win 257 09:23:54.163755 IP x.x.x.x.88 > x.x.x.x.51235: S 1060616380:1060616380(0) ack 1066103135 win 32768 09:23:54.163765 IP x.x.x.x.51235 > x.x.x.x.88: . ack 1 win 257 09:23:55.563738 IP x.x.x.x.51235 > x.x.x.x.88: P 1:886(885) ack 1 win 257 09:23:55.563756 IP x.x.x.x.88 > x.x.x.x.51235: . ack 886 win 270 09:23:55.563838 IP x.x.x.x.51235 > x.x.x.x.88: F 886:886(0) ack 1 win 257 09:23:55.567177 IP x.x.x.x.88 > x.x.x.x.51235: P 1:2882(2881) ack 887 win 270 09:23:55.567196 IP x.x.x.x.51235 > x.x.x.x.88: R 1066104021:1066104021(0) win 0 successful POST in apache logs, in varnishlog 14 Backend c 15 default default 15 TxRequest b POST 15 TxURL b /quiz/question 15 TxProtocol b HTTP/1.1 15 TxHeader b x-requested-with: XMLHttpRequest 15 TxHeader b Accept-Language: pl 15 TxHeader b Accept: text/html, */* 15 TxHeader b Content-Type: application/x-www-form-urlencoded 15 TxHeader b UA-CPU: x86 15 TxHeader b Accept-Encoding: gzip, deflate 15 TxHeader b User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30) 15 TxHeader b Host: 15 TxHeader b Content-Length: 8 15 TxHeader b Cookie: xxx 15 TxHeader b X-Varnish: 782633863 15 TxHeader b X-Forwarded-For: xxx 15 BackendClose b default 14 VCL_call c error 14 VCL_return c restart 14 VCL_call c recv 14 VCL_return c pass 14 VCL_call c error 14 VCL_return c restart 14 VCL_call c recv ... 14 Length c 465 14 VCL_call c deliver 14 VCL_return c deliver 14 TxProtocol c HTTP/1.1 14 TxStatus c 503 14 TxResponse c Service Unavailable 14 TxHeader c Server: Varnish 14 TxHeader c Retry-After: 0 14 TxHeader c Content-Type: text/html; charset=utf-8 14 TxHeader c Content-Length: 465 14 TxHeader c Date: Wed, 15 Jul 2009 07:23:55 GMT 14 TxHeader c X-Varnish: 782633863 14 TxHeader c Age: 20 14 TxHeader c Via: 1.1 varnish 14 TxHeader c Connection: close 14 ReqEnd c 782633863 1247642615.566063166 1247642635.564052582 0.006083488 19.997969866 0.19550 14 SessionClose c error restart shouldn't make another backend connection if current one got closed ? a can't find any subsequent retries in tcpdump and sometime's restart works -- Michal Grzedzicki ___ varnish-misc mailing list varnish-misc@projects.linpro.no http://projects.linpro.no/mailman/listinfo/varnish-misc
Re: varnish 2.0.4 backend errors
On Jul 14, 2009, at 3:05 AM, Kristian Lyngstol wrote: > On Tue, Jul 14, 2009 at 11:46:58AM +0200, Lazy wrote: >> the site is usually not so busy, but it has sometimes spikes of >> static >> traffic (about 50Mbps) that's why i upped the thread limit, 3000 was >> to low > > I seriously doubt 3k was too low. More likely, the min threads was > hurting > you. Three thousand threads is quite a bit. Remember that these are > actual > requests being handled, not keep-alive connections and the like. I just wanted to humbly second this good advice; if you're familiar with Apache, this is akin to making sure your MinSpareServers is set to a high enough level to handle any transient spikes by avoiding the cost of spawning new processes. Varnish will handle 10x the traffic you're seeing in <64 threads. Anything you're seeing is more likely a concurrency spike causing a temporary slowdown while threads are spawned, or Varnish is simply passing on fail or slowness from your back-end. I'm wondering if this could also be a large object that's taking a while to cache and blocking other children for a while? (rush_exponent) -- Ken. ___ varnish-misc mailing list varnish-misc@projects.linpro.no http://projects.linpro.no/mailman/listinfo/varnish-misc
Re: varnish 2.0.4 backend errors
On Tue, Jul 14, 2009 at 11:46:58AM +0200, Lazy wrote: > 2009/7/14 Kristian Lyngstol : > > 6000 threads is too much. Since it's per pool, it'll cause up to 12 000 > > threads to start. That's not likely to go over all that well. If you have > > that sort of traffic, you need to scale out. Also, 10 thread minimum is > > pretty low. > > > > I typically recommend setting the minimum thread count to what you expect > > your normal traffic to be at peak hours. It's probably a dedicated > > machines, and idle threads have barely any overhead, while creating new > > threads can take some time. > > at first i had 3000 threads set and varnish ocassionly droped > connections, so I doubled it > > so what whould be a recomended values ? > will -w 1024,1024 -p thread_pools=6 whould be ok ? 6 thread pools is overkill. And the number of threads is multiplied with the thread pools, so in this case you're essentially writing -w 6k,6k... I'd advice something like -w 200,1200 -p thread_pools=2 to begin with. Just watch the overflows in varnishstat (it'll increase a good bit during startup since it takes a little time to create the 400 threads). It should stay fairly static after startup. > the site is usually not so busy, but it has sometimes spikes of static > traffic (about 50Mbps) that's why i upped the thread limit, 3000 was > to low I seriously doubt 3k was too low. More likely, the min threads was hurting you. Three thousand threads is quite a bit. Remember that these are actual requests being handled, not keep-alive connections and the like. > is it safe to change thread_pools on runtime ? Safe; I'd assume so. But I don't know if it actually takes effect. I've yet to see any good reason to change it from the default. -- Kristian Lyngstøl Redpill Linpro AS Tlf: +47 21544179 Mob: +47 99014497 pgpGG7y6WlKUh.pgp Description: PGP signature ___ varnish-misc mailing list varnish-misc@projects.linpro.no http://projects.linpro.no/mailman/listinfo/varnish-misc
Re: varnish 2.0.4 backend errors
2009/7/14 Kristian Lyngstol : > On Sat, Jul 11, 2009 at 12:21:38AM +0200, Lazy wrote: >> We are having hard time figuring out what's cosing varnish 503 error, >> our backend is apache is debian 5 default, os is linux x86_64 2.6.26, >> everything is running on a single machine >> >> /usr/local/sbin/varnishd -a 0.0.0.0:80 -f >> /usr/local/etc/varnish/default.vcl -s malloc -T localhost: -w >> 10,6000,300 -u nobody > > 6000 threads is too much. Since it's per pool, it'll cause up to 12 000 > threads to start. That's not likely to go over all that well. If you have > that sort of traffic, you need to scale out. Also, 10 thread minimum is > pretty low. > > I typically recommend setting the minimum thread count to what you expect > your normal traffic to be at peak hours. It's probably a dedicated > machines, and idle threads have barely any overhead, while creating new > threads can take some time. at first i had 3000 threads set and varnish ocassionly droped connections, so I doubled it so what whould be a recomended values ? will -w 1024,1024 -p thread_pools=6 whould be ok ? the site is usually not so busy, but it has sometimes spikes of static traffic (about 50Mbps) that's why i upped the thread limit, 3000 was to low is it safe to change thread_pools on runtime ? > >> running with a single backend >> .connect_timeout = 1s; added to the backend definition > > Any particular reason for adding that? originally it wasn't there i added it trying to go around the issue > >> I added >> >> sub vcl_error { >> if (req.restarts < 10) { >> restart; >> } >> } >> >> (is it possible to add a pause before doing restart ?) > > No. This is also a dirty workaround for a fundamental problem. > >> In about 0.1% of request we get >> >> 10 TxRequest b POST >> 10 TxURL b /php >> 10 TxProtocol b HTTP/1.1 >> 10 TxHeader b x-requested-with: XMLHttpRequest >> 10 TxHeader b Accept-Language: pl >> 10 TxHeader b Referer: http://www.x/php >> 10 TxHeader b Accept: text/html, */* >> 10 TxHeader b Content-Type: application/x-www-form-urlencoded >> 10 TxHeader b UA-CPU: x86 >> 10 TxHeader b Accept-Encoding: gzip, deflate >> 10 TxHeader b User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; >> Windows NT 5.1) >> 10 TxHeader b Content-Length: 8 >> 10 TxHeader b Cookie: _.1 >> 10 TxHeader b X-NovINet: v1.2 >> 10 TxHeader b X-Varnish: 603437812 >> 10 TxHeader b X-Forwarded-For: 79.162.xxx >> 10 BackendClose b default >> 31 VCL_call c error >> 31 VCL_return c deliver >> 31 Length c 465 >> 31 VCL_call c deliver >> 31 VCL_return c deliver >> 31 TxProtocol c HTTP/1.1 >> 31 TxStatus c 503 >> >> machine is not overloaded, there are 150 apache running 80% of them is idle >> >> what does >> 31 VCL_call c error mean , a connection error, apache returned >> invalid response ? > > No, it just means that vcl_error is called. BackendClose notes that the > connection to the backend was closed. > >> can I get some more information about this error using some syslog in >> vcl_error or mayby in some other way ? > > Possibly, but using syslog in vcl is the last thing I'd recommend. > > Does your syslog say anything meaningful? Like assert-errors... no, only info about admin commands > (...) >> 60064 Backend connections failures >> this is old and it's not changing now > > Did the error-rate go down once you solved this? What was causing these > problems? it was related to load testing, in production it went away when i upped maxclients on apache > >> 20 N worker threads >> 4152 N worker threads created >> 0 N worker threads not created >> 0 N worker threads limited >> 0 N queued work requests >> 226847 N overflowed work requests > > This is what I mean with -w 10,6000 being wrong. After the initial startup, > overflowed work requests shouldn't grow much, and you're currently running > at only 20 threads (the minimum), which will cause overflows very fast > (consider how many connections a single client will use to fetch a front > page... You can easily imagine overflowing with just 3-4 concurrent > clients.) > > But that's not really causing any 503s. Just delays while threads are > created (and removed). tcpdump of another 503 (apache is running on port 88), 11:09:50.187842 IP x.x.x.x.50780 > x.x.x.x.88: S 88526893:88526893(0) win 32792 11:09:50.187851 IP x.x.x.x.88 > x.x.x.x.50780: S 81484078:81484078(0) ack 88526894 win 32768 11:09:50.187867 IP x.x.x.x.50780 > x.x.x.x.88: . ack 1 win 257 11:09:53.187730 IP x.x.x.x.88 > x.x.x.x.50780: S 81484078:81484078(0) ack 88526894 win 32768 11:09:53.187740 IP x.x.x.x.50780 > x.x.x.x.88: . ack 1 win 257 11:09:59.191730 IP x.x.x.x.88 > x.x.x.x.50780: S 81484078:81484078(0) ack 88526894 win 32768 11:09:59.191744 IP x.x.x.x.50780 > x.x.x.x.88: . ack 1 win 257 1
Re: varnish 2.0.4 backend errors
On Sat, Jul 11, 2009 at 12:21:38AM +0200, Lazy wrote: > We are having hard time figuring out what's cosing varnish 503 error, > our backend is apache is debian 5 default, os is linux x86_64 2.6.26, > everything is running on a single machine > > /usr/local/sbin/varnishd -a 0.0.0.0:80 -f > /usr/local/etc/varnish/default.vcl -s malloc -T localhost: -w > 10,6000,300 -u nobody 6000 threads is too much. Since it's per pool, it'll cause up to 12 000 threads to start. That's not likely to go over all that well. If you have that sort of traffic, you need to scale out. Also, 10 thread minimum is pretty low. I typically recommend setting the minimum thread count to what you expect your normal traffic to be at peak hours. It's probably a dedicated machines, and idle threads have barely any overhead, while creating new threads can take some time. > running with a single backend > .connect_timeout = 1s; added to the backend definition Any particular reason for adding that? > I added > > sub vcl_error { > if (req.restarts < 10) { > restart; > } > } > > (is it possible to add a pause before doing restart ?) No. This is also a dirty workaround for a fundamental problem. > In about 0.1% of request we get > >10 TxRequestb POST >10 TxURLb /php >10 TxProtocol b HTTP/1.1 >10 TxHeader b x-requested-with: XMLHttpRequest >10 TxHeader b Accept-Language: pl >10 TxHeader b Referer: http://www.x/php >10 TxHeader b Accept: text/html, */* >10 TxHeader b Content-Type: application/x-www-form-urlencoded >10 TxHeader b UA-CPU: x86 >10 TxHeader b Accept-Encoding: gzip, deflate >10 TxHeader b User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; > Windows NT 5.1) >10 TxHeader b Content-Length: 8 >10 TxHeader b Cookie: _.1 >10 TxHeader b X-NovINet: v1.2 >10 TxHeader b Host: www.kinograj.cinema-city.pl >10 TxHeader b X-Varnish: 603437812 >10 TxHeader b X-Forwarded-For: 79.162.xxx >10 BackendClose b default >31 VCL_call c error >31 VCL_return c deliver >31 Length c 465 >31 VCL_call c deliver >31 VCL_return c deliver >31 TxProtocol c HTTP/1.1 >31 TxStatus c 503 > > machine is not overloaded, there are 150 apache running 80% of them is idle > > what does > 31 VCL_call c error mean , a connection error, apache returned > invalid response ? No, it just means that vcl_error is called. BackendClose notes that the connection to the backend was closed. > can I get some more information about this error using some syslog in > vcl_error or mayby in some other way ? Possibly, but using syslog in vcl is the last thing I'd recommend. Does your syslog say anything meaningful? Like assert-errors... (...) >60064 Backend connections failures > this is old and it's not changing now Did the error-rate go down once you solved this? What was causing these problems? > 20 N worker threads > 4152 N worker threads created >0 N worker threads not created >0 N worker threads limited >0 N queued work requests > 226847 N overflowed work requests This is what I mean with -w 10,6000 being wrong. After the initial startup, overflowed work requests shouldn't grow much, and you're currently running at only 20 threads (the minimum), which will cause overflows very fast (consider how many connections a single client will use to fetch a front page... You can easily imagine overflowing with just 3-4 concurrent clients.) But that's not really causing any 503s. Just delays while threads are created (and removed). -- Kristian Lyngstøl Redpill Linpro AS Tlf: +47 21544179 Mob: +47 99014497 pgpGupl4jmDwn.pgp Description: PGP signature ___ varnish-misc mailing list varnish-misc@projects.linpro.no http://projects.linpro.no/mailman/listinfo/varnish-misc
Re: varnish 2.0.4 backend errors
2009/7/13 Lazy : > One more thing > I set the processor affinity for all varnish threads so it will run on > a single cpu, to check if it's not some race condition, so far I > failed to trigger any errors by using ab, I will see tommorow when > some real traffic will show up. didn't make any diference ___ varnish-misc mailing list varnish-misc@projects.linpro.no http://projects.linpro.no/mailman/listinfo/varnish-misc
Re: varnish 2.0.4 backend errors
2009/7/14 Ken Brownfield : > The progression from your dump is: > > Varnish Apache > SYN > SYN+ACK > ACK > ...4.4 seconds later... > SYN+ACK > ACK > PSH+ACK > ACK > FIN+ACK (???) > PSH+ACK > RST right i missed the 4second gap, so varnish may be hitting a timeout > > It looks like the ACK from the Varnish side is getting lost on its way to > Apache, and Apache retransmits. This itself would imply that you have some > packet loss between Varnish and Apache. That 4.4 second delay could easily > be running you against the sess_timeout (or cli_timeout, maybe). Bumping > those up to 10 might clear the issue for you. > > Obviously, packet loss will cause issues in general, so I'd investigate that > anyway. it's a local interface, not lo but the traffic is going threw loopback interface > > More bizarrely, that FIN+ACK is missing a FIN from Apache. Are you sure > this stream is correct? I have no idea what would cause a spurious FIN+ACK > (besides a spurious FIN) and the issue would be highly unlikely a Varnish > issue; more likely a kernel TCP stack issue. > > Is there anything between Varnish and Apache in your config, besides two > machines on a shared switch? Proxies? Firewall rules? Switch ACLs? What > OSes/versions are you running on each side? it's only 1 machine ___ varnish-misc mailing list varnish-misc@projects.linpro.no http://projects.linpro.no/mailman/listinfo/varnish-misc
Re: varnish 2.0.4 backend errors
One more thing I set the processor affinity for all varnish threads so it will run on a single cpu, to check if it's not some race condition, so far I failed to trigger any errors by using ab, I will see tommorow when some real traffic will show up. 2009/7/13 Ken Brownfield : > I would try correlating these 503's with actual Apache log lines. That way, > you'll see what Apache told Varnish (if it made it to Apache at all). > > Also, try tcpdumping the backend port to a file, and then use wireshark to > go back to the traffic at the time you see another 503. > > Varnish should be logging failed health checks, which could explain periodic > 503s. If the health checks are succeeding, then Apache is likely > periodically closing a connection or sending bad data. > > You could also try turning off HTTP 1.1 to Apache from Varnish > (backend_http11=off) to see if that does anything. With Varnish fronting > Apache, doing 1.1 to the backend isn't necessarily a huge win. > > In the 503 case specifically, I'm not sure if restarts are respected. Does > this happen if you relax the connect_timeout, or if you remove the restart? > Does this happen only with POST? Do you have any custom settings in > vcl_pipe(), or do you bail out of vcl_recv() without falling through to the > default VCL? > > Finally, did you edit the Cookie header? It's malformed, which in theory > could cause either Apache or Varnish to dump it. > -- > Ken. > > On Jul 10, 2009, at 3:21 PM, Lazy wrote: > >> Hi, >> >> We are having hard time figuring out what's cosing varnish 503 error, >> our backend is apache is debian 5 default, os is linux x86_64 2.6.26, >> everything is running on a single machine >> >> /usr/local/sbin/varnishd -a 0.0.0.0:80 -f >> /usr/local/etc/varnish/default.vcl -s malloc -T localhost: -w >> 10,6000,300 -u nobody >> >> running with a single backend >> .connect_timeout = 1s; added to the backend definition >> >> I added >> >> sub vcl_error { >> if (req.restarts < 10) { >> restart; >> } >> } >> >> (is it possible to add a pause before doing restart ?) >> >> which helps in some cases but not always >> >> >> In about 0.1% of request we get >> >> 10 TxRequest b POST >> 10 TxURL b /php >> 10 TxProtocol b HTTP/1.1 >> 10 TxHeader b x-requested-with: XMLHttpRequest >> 10 TxHeader b Accept-Language: pl >> 10 TxHeader b Referer: http://www.x/php >> 10 TxHeader b Accept: text/html, */* >> 10 TxHeader b Content-Type: application/x-www-form-urlencoded >> 10 TxHeader b UA-CPU: x86 >> 10 TxHeader b Accept-Encoding: gzip, deflate >> 10 TxHeader b User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; >> Windows NT 5.1) >> 10 TxHeader b Content-Length: 8 >> 10 TxHeader b Cookie: _.1 >> 10 TxHeader b X-NovINet: v1.2 >> 10 TxHeader b Host: www.kinograj.cinema-city.pl >> 10 TxHeader b X-Varnish: 603437812 >> 10 TxHeader b X-Forwarded-For: 79.162.xxx >> 10 BackendClose b default >> 31 VCL_call c error >> 31 VCL_return c deliver >> 31 Length c 465 >> 31 VCL_call c deliver >> 31 VCL_return c deliver >> 31 TxProtocol c HTTP/1.1 >> 31 TxStatus c 503 >> >> machine is not overloaded, there are 150 apache running 80% of them is >> idle >> >> what does >> 31 VCL_call c error mean , a connection error, apache returned >> invalid response ? >> >> can I get some more information about this error using some syslog in >> vcl_error or mayby in some other way ? >> >> my param.show >> accept_fd_holdoff 50 [ms] >> acceptor default (epoll, poll) >> auto_restart on [bool] >> backend_http11 on [bool] >> between_bytes_timeout 60.00 [s] >> cache_vbe_conns off [bool] >> cc_command "exec cc -fpic -shared -Wl,-x -o %o %s" >> cli_buffer 8192 [bytes] >> cli_timeout 5 [seconds] >> client_http11 off [bool] >> clock_skew 10 [s] >> connect_timeout 1.00 [s] >> default_grace 10 >> default_ttl 120 [seconds] >> diag_bitmap 0x0 [bitmap] >> err_ttl 0 [seconds] >> esi_syntax 0 [bitmap] >> fetch_chunksize 128 [kilobytes] >> first_byte_timeout 60.00 [s] >> group nogroup (65534) >> listen_address 0.0.0.0:80 >> listen_depth 1024 [connections] >> log_hashstring off [bool] >> log_local_address off [bool] >> lru_interval 2 [seconds] >> max_esi_includes 5 [includes] >> max_restarts 4 [restarts] >> obj_workspace 8192 [bytes] >> overflow_max 100 [%] >> ping_interval 3 [seconds] >> pipe_timeout 60 [seconds] >> prefer_ipv6 off [bool] >> purge_dups off [bool] >> purge_hash on [bool] >> rush_exponent
Fwd: varnish 2.0.4 backend errors
2009/7/13 Ken Brownfield : > I would try correlating these 503's with actual Apache log lines. That way, > you'll see what Apache told Varnish (if it made it to Apache at all). there are no sign of thiese requests in apache logs > > Also, try tcpdumping the backend port to a file, and then use wireshark to > go back to the traffic at the time you see another 503. it happens with or without keep-alive (it was set before and I disabled it recently) it's likelly the same issue as in "tcp reset problem with varnish 2.0.4 o n Solaris 10 (SPARC)" thread it happens on 0,09% of request's, mostly POST'S but not only now I rechecked the logs and I found that varnish closed the connection just after sending the request tcpdump follows, there are no backend connection errors 01:03:50.181412 IP x.x.x.x.48563 > x.x.x.x.88: S 3999574413:3999574413(0) win 32792 0x: 4500 003c 6586 4000 4006 3b56 566f f680 E.. x.x.x.x.48563: S 4006066350:4006066350(0) ack 3999574414 win 32768 0x: 4500 003c 4000 4006 a0dc 566f f680 E..<@.@...Vo.. 0x0010: 566f f680 0058 bdb3 eec7 b8ae ee64 a98e Vo...X...d.. 0x0020: a012 8000 2776 0204 400c 0402 080a 'v@. 0x0030: 1b5a cc0b 1b5a cc0b 0103 0307 .Z...Z.. 01:03:50.181433 IP x.x.x.x.48563 > x.x.x.x.88: . ack 1 win 257 0x: 4500 0034 6587 4000 4006 3b5d 566f f680 e.@.@.;]Vo.. 0x0010: 566f f680 bdb3 0058 ee64 a98e eec7 b8af Vo.X.d.. 0x0020: 8010 0101 0f9a 0101 080a 1b5a cc0b .Z.. 0x0030: 1b5a cc0b .Z.. 01:03:54.581224 IP x.x.x.x.88 > x.x.x.x.48563: S 4006066350:4006066350(0) ack 3999574414 win 32768 0x: 4500 003c 4000 4006 a0dc 566f f680 E..<@.@...Vo.. 0x0010: 566f f680 0058 bdb3 eec7 b8ae ee64 a98e Vo...X...d.. 0x0020: a012 8000 232a 0204 400c 0402 080a #*@. 0x0030: 1b5a d057 1b5a cc0b 0103 0307 .Z.W.Z.. 01:03:54.581245 IP x.x.x.x.48563 > x.x.x.x.88: . ack 1 win 257 0x: 4500 0040 6588 4000 4006 3b50 566f f680 ...@e.@@.;PVo.. 0x0010: 566f f680 bdb3 0058 ee64 a98e eec7 b8af Vo.X.d.. 0x0020: b010 0101 81fc 0101 080a 1b5a d057 .Z.W 0x0030: 1b5a d057 0101 050a eec7 b8ae eec7 b8af .Z.W 01:03:55.181239 IP x.x.x.x.48563 > x.x.x.x.88: P 1:896(895) ack 1 win 257 0x: 4500 03b3 6589 4000 4006 37dc 566f f680 e.@.@.7.Vo.. 0x0010: 566f f680 bdb3 0058 ee64 a98e eec7 b8af Vo.X.d.. 0x0020: 8018 0101 9d85 0101 080a 1b5a d0ed .Z.. 0x0030: 1b5a d057 504f 5354 202f 7175 697a 2f71 .Z.WPOST./quiz/q 0x0040: 7565 7374 696f 6e20 4854 5450 2f31 2e31 uestion.HTTP/1.1 0x0050: 0d0a 782d 7265 7175 6573 7465 642d 7769 ..x-requested-wi 0x0060: 7468 3a20 584d 4c48 7474 7052 6571 7565 th:.XMLHttpReque .. request data 01:03:55.181256 IP x.x.x.x.88 > x.x.x.x.48563: . ack 896 win 270 0x: 4500 0034 6842 4000 4006 38a2 566f f680 e.@.@.8.Vo.. 0x0010: 566f f680 0058 bdb3 eec7 b8af ee64 ad0d Vo...X...d.. 0x0020: 8010 010e 024a 0101 080a 1b5a d0ed .J...Z.. 0x0030: 1b5a d0ed .Z.. ... acked 01:03:55.181302 IP x.x.x.x.48563 > x.x.x.x.88: F 896:896(0) ack 1 win 257 0x: 4500 0034 658a 4000 4006 3b5a 566f f680 e.@.@.;ZVo.. 0x0010: 566f f680 bdb3 0058 ee64 ad0d eec7 b8af Vo.X.d.. 0x0020: 8011 0101 0256 0101 080a 1b5a d0ed .V...Z.. 0x0030: 1b5a d0ed .Z.. ... connection closed ??? 01:03:55.185068 IP x.x.x.x.88 > x.x.x.x.48563: P 1:2485(2484) ack 897 win 270 0x: 4500 09e8 6843 4000 4006 2eed 566f f680 e.@.@...Vo.. 0x0010: 566f f680 0058 bdb3 eec7 b8af ee64 ad0e Vo...X...d.. 0x0020: 8018 010e a3ba 0101 080a 1b5a d0ed .Z.. 0x0030: 1b5a d0ed 4854 5450 2f31 2e31 2032 3030 .Z..HTTP/1.1.200 0x0040: 204f 4b0d 0a44 6174 653a 2046 7269 2c20 .OK..Date:.Fri,. 0x0050: 3130 204a 756c 2032 3030 3920 3233 3a30 10.Jul.2009.23:0 0x0060: 333a 3535 2047 4d54 0d0a 5365 7276 6572 3:55.GMT..Server 0x0070: 3a20 4170 6163 6865 0d0a 4578 7069 7265 :.Apache..Expire 0x0080: 733a 2054 6875 2c20 3139 204e 6f76 2031 s:.Thu,.19.Nov.1 0x0090: 3938 3120 3038 3a35 323a 3030 2047 4d54 981.08:52:00.GMT 0x00a0: 0d0a 4361 6368 652d 436f 6e74 726f 6c3a ..Cache-Control:
varnish 2.0.4 backend errors
Hi, We are having hard time figuring out what's cosing varnish 503 error, our backend is apache is debian 5 default, os is linux x86_64 2.6.26, everything is running on a single machine /usr/local/sbin/varnishd -a 0.0.0.0:80 -f /usr/local/etc/varnish/default.vcl -s malloc -T localhost: -w 10,6000,300 -u nobody running with a single backend .connect_timeout = 1s; added to the backend definition I added sub vcl_error { if (req.restarts < 10) { restart; } } (is it possible to add a pause before doing restart ?) which helps in some cases but not always In about 0.1% of request we get 10 TxRequestb POST 10 TxURLb /php 10 TxProtocol b HTTP/1.1 10 TxHeader b x-requested-with: XMLHttpRequest 10 TxHeader b Accept-Language: pl 10 TxHeader b Referer: http://www.x/php 10 TxHeader b Accept: text/html, */* 10 TxHeader b Content-Type: application/x-www-form-urlencoded 10 TxHeader b UA-CPU: x86 10 TxHeader b Accept-Encoding: gzip, deflate 10 TxHeader b User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1) 10 TxHeader b Content-Length: 8 10 TxHeader b Cookie: _.1 10 TxHeader b X-NovINet: v1.2 10 TxHeader b Host: www.kinograj.cinema-city.pl 10 TxHeader b X-Varnish: 603437812 10 TxHeader b X-Forwarded-For: 79.162.xxx 10 BackendClose b default 31 VCL_call c error 31 VCL_return c deliver 31 Length c 465 31 VCL_call c deliver 31 VCL_return c deliver 31 TxProtocol c HTTP/1.1 31 TxStatus c 503 machine is not overloaded, there are 150 apache running 80% of them is idle what does 31 VCL_call c error mean , a connection error, apache returned invalid response ? can I get some more information about this error using some syslog in vcl_error or mayby in some other way ? my param.show accept_fd_holdoff 50 [ms] acceptor default (epoll, poll) auto_restart on [bool] backend_http11 on [bool] between_bytes_timeout 60.00 [s] cache_vbe_connsoff [bool] cc_command "exec cc -fpic -shared -Wl,-x -o %o %s" cli_buffer 8192 [bytes] cli_timeout5 [seconds] client_http11 off [bool] clock_skew 10 [s] connect_timeout1.00 [s] default_grace 10 default_ttl120 [seconds] diag_bitmap0x0 [bitmap] err_ttl0 [seconds] esi_syntax 0 [bitmap] fetch_chunksize128 [kilobytes] first_byte_timeout 60.00 [s] group nogroup (65534) listen_address 0.0.0.0:80 listen_depth 1024 [connections] log_hashstring off [bool] log_local_address off [bool] lru_interval 2 [seconds] max_esi_includes 5 [includes] max_restarts 4 [restarts] obj_workspace 8192 [bytes] overflow_max 100 [%] ping_interval 3 [seconds] pipe_timeout 60 [seconds] prefer_ipv6off [bool] purge_dups off [bool] purge_hash on [bool] rush_exponent 3 [requests per request] send_timeout 600 [seconds] sess_timeout 5 [seconds] sess_workspace 16384 [bytes] session_linger 0 [ms] shm_reclen 255 [bytes] shm_workspace 8192 [bytes] srcaddr_hash 1049 [buckets] srcaddr_ttl30 [seconds] thread_pool_add_delay 20 [milliseconds] thread_pool_add_threshold 2 [requests] thread_pool_fail_delay 200 [milliseconds] thread_pool_max6000 [threads] thread_pool_min10 [threads] thread_pool_purge_delay1000 [milliseconds] thread_pool_timeout300 [seconds] thread_pools 2 [pools] user nobody (65534) vcl_trace off [bool] stats 4571680 Client connections accepted 13074671 Client requests received 9246516 Cache hits 8084 Cache hits for pass 159743 Cache misses 3768909 Backend connections success 0 Backend connections not attempted 0 Backend connections too many 60064 Backend connections failures this is old and it's not changing now 2610440 Backend connections reuses 3471493 Backend connections recycles 0 Backend connections unused 1053 N struct srcaddr 6 N active struct srcaddr 1017 N struct sess_mem 60 N struct sess 852 N struct object 4183 N struct objecthead 0 N struct smf 0 N small free smf 0 N large free smf 22 N struct vbe_conn 494 N struct bereq 20 N worker threads 4152 N worker threads created 0 N worker thr