One more thing I set the processor affinity for all varnish threads so it will run on a single cpu, to check if it's not some race condition, so far I failed to trigger any errors by using ab, I will see tommorow when some real traffic will show up.
2009/7/13 Ken Brownfield <[email protected]>: > I would try correlating these 503's with actual Apache log lines. That way, > you'll see what Apache told Varnish (if it made it to Apache at all). > > Also, try tcpdumping the backend port to a file, and then use wireshark to > go back to the traffic at the time you see another 503. > > Varnish should be logging failed health checks, which could explain periodic > 503s. If the health checks are succeeding, then Apache is likely > periodically closing a connection or sending bad data. > > You could also try turning off HTTP 1.1 to Apache from Varnish > (backend_http11=off) to see if that does anything. With Varnish fronting > Apache, doing 1.1 to the backend isn't necessarily a huge win. > > In the 503 case specifically, I'm not sure if restarts are respected. Does > this happen if you relax the connect_timeout, or if you remove the restart? > Does this happen only with POST? Do you have any custom settings in > vcl_pipe(), or do you bail out of vcl_recv() without falling through to the > default VCL? > > Finally, did you edit the Cookie header? It's malformed, which in theory > could cause either Apache or Varnish to dump it. > -- > Ken. > > On Jul 10, 2009, at 3:21 PM, Lazy wrote: > >> Hi, >> >> We are having hard time figuring out what's cosing varnish 503 error, >> our backend is apache is debian 5 default, os is linux x86_64 2.6.26, >> everything is running on a single machine >> >> /usr/local/sbin/varnishd -a 0.0.0.0:80 -f >> /usr/local/etc/varnish/default.vcl -s malloc -T localhost:9999 -w >> 10,6000,300 -u nobody >> >> running with a single backend >> .connect_timeout = 1s; added to the backend definition >> >> I added >> >> sub vcl_error { >> if (req.restarts < 10) { >> restart; >> } >> } >> >> (is it possible to add a pause before doing restart ?) >> >> which helps in some cases but not always >> >> >> In about 0.1% of request we get >> >> 10 TxRequest b POST >> 10 TxURL b /php >> 10 TxProtocol b HTTP/1.1 >> 10 TxHeader b x-requested-with: XMLHttpRequest >> 10 TxHeader b Accept-Language: pl >> 10 TxHeader b Referer: http://www.xxxxx/php >> 10 TxHeader b Accept: text/html, */* >> 10 TxHeader b Content-Type: application/x-www-form-urlencoded >> 10 TxHeader b UA-CPU: x86 >> 10 TxHeader b Accept-Encoding: gzip, deflate >> 10 TxHeader b User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; >> Windows NT 5.1) >> 10 TxHeader b Content-Length: 8 >> 10 TxHeader b Cookie: _.1 >> 10 TxHeader b X-NovINet: v1.2 >> 10 TxHeader b Host: www.kinograj.cinema-city.pl >> 10 TxHeader b X-Varnish: 603437812 >> 10 TxHeader b X-Forwarded-For: 79.162.xxx >> 10 BackendClose b default >> 31 VCL_call c error >> 31 VCL_return c deliver >> 31 Length c 465 >> 31 VCL_call c deliver >> 31 VCL_return c deliver >> 31 TxProtocol c HTTP/1.1 >> 31 TxStatus c 503 >> >> machine is not overloaded, there are 150 apache running 80% of them is >> idle >> >> what does >> 31 VCL_call c error mean , a connection error, apache returned >> invalid response ? >> >> can I get some more information about this error using some syslog in >> vcl_error or mayby in some other way ? >> >> my param.show >> accept_fd_holdoff 50 [ms] >> acceptor default (epoll, poll) >> auto_restart on [bool] >> backend_http11 on [bool] >> between_bytes_timeout 60.000000 [s] >> cache_vbe_conns off [bool] >> cc_command "exec cc -fpic -shared -Wl,-x -o %o %s" >> cli_buffer 8192 [bytes] >> cli_timeout 5 [seconds] >> client_http11 off [bool] >> clock_skew 10 [s] >> connect_timeout 1.000000 [s] >> default_grace 10 >> default_ttl 120 [seconds] >> diag_bitmap 0x0 [bitmap] >> err_ttl 0 [seconds] >> esi_syntax 0 [bitmap] >> fetch_chunksize 128 [kilobytes] >> first_byte_timeout 60.000000 [s] >> group nogroup (65534) >> listen_address 0.0.0.0:80 >> listen_depth 1024 [connections] >> log_hashstring off [bool] >> log_local_address off [bool] >> lru_interval 2 [seconds] >> max_esi_includes 5 [includes] >> max_restarts 4 [restarts] >> obj_workspace 8192 [bytes] >> overflow_max 100 [%] >> ping_interval 3 [seconds] >> pipe_timeout 60 [seconds] >> prefer_ipv6 off [bool] >> purge_dups off [bool] >> purge_hash on [bool] >> rush_exponent 3 [requests per request] >> send_timeout 600 [seconds] >> sess_timeout 5 [seconds] >> sess_workspace 16384 [bytes] >> session_linger 0 [ms] >> shm_reclen 255 [bytes] >> shm_workspace 8192 [bytes] >> srcaddr_hash 1049 [buckets] >> srcaddr_ttl 30 [seconds] >> thread_pool_add_delay 20 [milliseconds] >> thread_pool_add_threshold 2 [requests] >> thread_pool_fail_delay 200 [milliseconds] >> thread_pool_max 6000 [threads] >> thread_pool_min 10 [threads] >> thread_pool_purge_delay 1000 [milliseconds] >> thread_pool_timeout 300 [seconds] >> thread_pools 2 [pools] >> user nobody (65534) >> vcl_trace off [bool] >> >> stats >> 4571680 Client connections accepted >> 13074671 Client requests received >> 9246516 Cache hits >> 8084 Cache hits for pass >> 159743 Cache misses >> 3768909 Backend connections success >> 0 Backend connections not attempted >> 0 Backend connections too many >> 60064 Backend connections failures >> this is old and it's not changing now >> >> 2610440 Backend connections reuses >> 3471493 Backend connections recycles >> 0 Backend connections unused >> 1053 N struct srcaddr >> 6 N active struct srcaddr >> 1017 N struct sess_mem >> 60 N struct sess >> 852 N struct object >> 4183 N struct objecthead >> 0 N struct smf >> 0 N small free smf >> 0 N large free smf >> 22 N struct vbe_conn >> 494 N struct bereq >> 20 N worker threads >> 4152 N worker threads created >> 0 N worker threads not created >> 0 N worker threads limited >> 0 N queued work requests >> 226847 N overflowed work requests >> 0 N dropped work requests >> 2 N backends >> 159680 N expired objects >> 0 N LRU nuked objects >> 0 N LRU saved objects >> 4008654 N LRU moved objects >> 0 N objects on deathrow >> 0 HTTP header overflows >> 0 Objects sent with sendfile >> 5482793 Objects sent with write >> 0 Objects overflowing workspace >> 4564281 Total Sessions >> 13075788 Total Requests >> 25 Total pipe >> 3669512 Total pass >> 3766737 Total fetch >> 3167362228 Total header bytes >> 129249988603 Total body bytes >> 446282 Session Closed >> 22017 Session Pipeline >> 78623 Session Read Ahead >> 0 Session Linger >> 12628704 Session herd >> 650794078 SHM records >> 44487143 SHM writes >> 627 SHM flushes due to overflow >> 81635 SHM MTX contention >> 283 SHM cycles through buffer >> 0 allocator requests >> 0 outstanding allocations >> 0 bytes allocated >> 0 bytes free >> 7558668 SMA allocator requests >> 122 SMA outstanding allocations >> 1781820 SMA outstanding bytes >> 198293321655 SMA bytes allocated >> 198291539835 SMA bytes free >> 62427 SMS allocator requests >> 0 SMS outstanding allocations >> 18446744073709546036 SMS outstanding bytes >> 29026230 SMS bytes allocated >> 29028555 SMS bytes freed >> 3766947 Backend requests made >> 4 N vcl total >> 4 N vcl available >> 0 N vcl discarded >> 1 N total active purges >> 1 N new purges added >> 0 N old purges deleted >> 0 N objects tested >> 0 N regexps tested against >> 0 N duplicate purges removed >> 0 HCB Lookups without lock >> 0 HCB Lookups with lock >> 0 HCB Inserts >> 0 Objects ESI parsed (unlock) >> 0 ESI parse errors (unlock) >> >> >> I would be grateful if anyone gave me some pointers about where to look >> next. >> _______________________________________________ >> varnish-misc mailing list >> [email protected] >> http://projects.linpro.no/mailman/listinfo/varnish-misc > > _______________________________________________ varnish-misc mailing list [email protected] http://projects.linpro.no/mailman/listinfo/varnish-misc
