503 errors when services available and other connections succeed to same service during failure window
Hello haproxy list: We are running haproxy on Ubuntu 10.04 LTS 32bit with the stock kernel and are experiencing an intermittent problem on the order of once per minute where a connection attempt to the backend servers is unable to be established and a 503 is sent to the end application or user browser. While the problem is happening with a given connection, other connections to the same application are executing successfully on the backend servers from both the perspective of haproxy and the backend applications. I am getting someone else to re-verify this, but we are not seeing the SYN packets without corresponding SYN/ACK to establish the connection to the backend servers and suspect that the packets are not being sent despite the system calls to the contrary or that we are looking for the wrong part of the conversation. We've been trying to track down the source of these problems for a few days, any assistance will be greatly appreciated. -Jerry Jul 31 03:36:45 localhost haproxy[24182]: 172.17.48.32:60377 [31/Jul/2010:03:36:35.160] ssbe-core internal_core/pn1 0/10067/-1/-1/10068 503 212 - - SC-- 16/16/15/7/+10 0/0 POST /observations HTTP/1.1 Jul 31 03:36:45 localhost haproxy[24182]: 172.17.48.32:60379 [31/Jul/2010:03:36:35.252] ssbe-core internal_core/pn2 0/10084/-1/-1/10085 503 212 - - SC-- 15/15/14/7/+10 0/0 GET /clients/H206923/hosts HTTP/1.1 *Jul 31 03:36:55 localhost haproxy[24182]: 172.17.48.31:44757 [31/Jul/2010:03:36:45.498] ssbe-core internal_core/pn1 0/10089/-1/-1/10090 503 212 - - SC-- 15/15/13/7/+10 0/0 GET /hosts/45/metrics HTTP/1.1* Strace usage lb1: ~ 16 - strace -p 24182 -t -e trace=network -o haproxy.strace Here are the relevant lines from the strace (we're only looking at the request for /hosts/45/metrics here, don't be fooled by the coincident 03:36:45 timestamps of the other two SC entries above). 03:36:45 accept(12, {sa_family=AF_INET, sin_port=htons(44757), sin_addr=inet_addr(172.17.48.31)}, [16]) = 34 03:36:45 setsockopt(34, SOL_TCP, TCP_NODELAY, [1], 4) = 0 03:36:45 recvfrom(34, GET /hosts/45/metrics HTTP/1.1\r\n..., 8192, MSG_NOSIGNAL, NULL, NULL) = 428 03:36:45 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 35 03:36:45 setsockopt(35, SOL_TCP, TCP_NODELAY, [1], 4) = 0 03:36:45 connect(35, {sa_family=AF_INET, sin_port=htons(44757), sin_addr=inet_addr(172.17.48.32)}, 16) = -1 EINPROGRESS (Operation now in progress) 03:36:45 sendto(35, GET /hosts/45/metrics HTTP/1.1\r\n..., 454, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable) 03:36:45 recvfrom(34, 0xb5f4b6, 15930, 16384, 0, 0) = -1 EAGAIN (Resource temporarily unavailable) 03:36:45 sendto(35, GET /hosts/45/metrics HTTP/1.1\r\n..., 454, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 ECONNREFUSED(Connection refused) 03:36:46 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 40 03:36:46 setsockopt(40, SOL_TCP, TCP_NODELAY, [1], 4) = 0 03:36:46 connect(40, {sa_family=AF_INET, sin_port=htons(44757), sin_addr=inet_addr(172.17.48.32)}, 16) = -1 EINPROGRESS (Operation now in progress) 03:36:46 sendto(40, GET /hosts/45/metrics HTTP/1.1\r\n..., 454, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable) 03:36:46 recvfrom(39, GET /clients/H205604/hosts HTTP/..., 8192, MSG_NOSIGNAL, NULL, NULL) = 893 03:36:46 sendto(40, GET /hosts/45/metrics HTTP/1.1\r\n..., 454, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 ECONNREFUSED(Connection refused) 03:36:47 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 39 03:36:47 setsockopt(39, SOL_TCP, TCP_NODELAY, [1], 4) = 0 03:36:47 connect(39, {sa_family=AF_INET, sin_port=htons(44757), sin_addr=inet_addr(172.17.48.32)}, 16) = -1 EINPROGRESS (Operation now in progress) 03:36:47 sendto(39, GET /hosts/45/metrics HTTP/1.1\r\n..., 454, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable) 03:36:47 sendto(39, GET /hosts/45/metrics HTTP/1.1\r\n..., 454, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 ECONNREFUSED(Connection refused) 03:36:48 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 57 03:36:48 setsockopt(57, SOL_TCP, TCP_NODELAY, [1], 4) = 0 03:36:48 connect(57, {sa_family=AF_INET, sin_port=htons(44757), sin_addr=inet_addr(172.17.48.32)}, 16) = -1 EINPROGRESS (Operation now in progress) 03:36:48 sendto(57, GET /hosts/45/metrics HTTP/1.1\r\n..., 454, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable) 03:36:48 sendto(57, GET /hosts/45/metrics HTTP/1.1\r\n..., 454, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 ECONNREFUSED(Connection refused) [ repeat connection attempt once per second for 10 total seconds ...] 03:36:55 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 23 03:36:55 setsockopt(23, SOL_TCP, TCP_NODELAY, [1], 4) = 0 03:36:55 connect(23, {sa_family=AF_INET, sin_port=htons(44757), sin_addr=inet_addr(172.17.48.31)}, 16) = -1 EINPROGRESS (Operation now in progress) 03:36:55 sendto(23, GET /hosts/45/metrics HTTP/1.1\r\n..., 454, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable)
Re: HAProxy, Apache, Ruby Server Timeout Errors (sH)
Willy, You rock. Thanks for the help. Good news is that it shows that apache is genuinely taking more then 50 seconds to respond (see my example below). What's interesting is that when I look at the ruby production log... I see about a convenient 50 second gap. That said, it's clear that it's a communication issue between Apache and Ruby. Definitely not an issue with HAProxy. I'll dig in and ferret out the issue, but I wanted to close the loop. Thanks again, PhilD --- Log Entries --- LB01 - HAProxy Error Log (/var/log/haproxy.err): Jul 31 19:10:31 localhost haproxy[17404]: 72.148.201.189:50318 [31/Jul/2010:19:09:41.950] http_proxy primary_farm/WS01 10/0/0/-1/50014 504 194 - - sH-- 0/0/0/0/0 0/0 GET /tidbits.json?API=5Tb44zB-OfBWAyIQYOIOversion=2.5platform_id=631daac59543ed898fb00713ccc4523377af38ffcity_short_name=atlantastate=GA HTTP/1.1 WS01 - Apache Log (/var/log/apache2/access.log): 72.148.201.189 - - [31/Jul/2010:19:09:41 -0400] GET /tidbits.json?API=5Tb44zB-OfBWAyIQYOIOversion=2.5platform_id=631daac59543ed898fb00713ccc4523377af38ffcity_short_name=atlantastate=GA HTTP/1.1 200 0 - Appcelerator Titanium/1.3.3 (iPhone/4.0.1; iPhone OS; en_US;) *50042434 * WS01-VanSkeevy - Request-Log-Analyzer - Ruby Production Log: Jul 31 19:09:41 WS01 rails[23423]: Completed in 111ms (View: 74, DB: 29) | 200 OK [ http://api.scoutmob.com/unexpired/f2ad0a459da29b4d83c86e8134a2315ff2364a884a27e26dee74f374704c0335.json?API=5Tb44zB-OfBWAyIQYOIOversion=2.5platform_id=9f1f94e32ff79c816549263979cc2fccc7c655dacity_short_name=atlantastate=GA ] Jul 31 19:10:31 WS01 rails[23423]: Processing DealsController#unexpired to json (for 72.148.201.189 at 2010-07-31 19:10:31) [GET] PhilD http://www.scoutmob.com/*Phil Dupont* p: 404-273-7079 e: ph...@scoutmob.com In walking, just walk. In sitting, just sit. Above all, don't wobble. -Yun-Men On Fri, Jul 30, 2010 at 5:28 PM, Phil Dupont ph...@scoutmob.com wrote: Thanks Willy. When things quiet down this weekend I'll make some changes to capture the data in the logs and see what that turns up. PhilD On Fri, Jul 30, 2010 at 5:07 PM, Willy Tarreau w...@1wt.eu wrote: On Fri, Jul 30, 2010 at 04:33:16PM -0400, Phil Dupont wrote: I guess that's my point... I don't know where else to check. There's nothing between the HAProxy box and the webservers but a switch. Certainly nothing I can get logs from. If I drill into an example, I end up with nothing. See below for the logs of one scenario I tried to trace. But essentially: - I see an HAProxy 504 error for someone trying to load http://scoutmob.com/new-york/howitworks - Then I check the apache access logs (not the error logs because the error logs don't contain the user's IP addess, just the proxy's IP address) and see the connection was successful (passed a 200) Unfortunately there aren't any timer in your apache logs. What you see is that apache correctly received the request and correctly responded. But you don't know how much time it took for that. If it takes more than 50s you have configured on your haproxy as the server timeout, the response will never get back to the client. From the doc below, it seems it's %D that you need to add to CustomLog to add the timer in the logs : http://httpd.apache.org/docs/2.2/mod/mod_log_config.html - Then I run request-log-analyzer against the ruby log and see that the method that creates the /new-york/howitworks takes a max of .01s to load. Most often when you see a timeout between two levels of proxies and a ruby server, it's because it took ages to the ruby server to accept the request (for whatever reasons, starting with mongrel's 1-connection limit). Thus, while the second proxy waits for the connection to establish, the first one waits for the second proxy to respond, as it does not know it can't even connect. Sometimes even the second proxy can connect but its request remains for ages in a queue, waiting for being dispatched. Your ruby logs just indicate it took 10ms to process the request once it got there. There's a similar problem with some Java-based application servers that make use of thread pools with a front dispatched using a deep queue. But if you look below, there's no doubt it really took 50 seconds to timeout : Jul 29 20:21:47 localhost haproxy[17405]: 68.174.74.20:55350[29/Jul/2010:20:20:57.478] http_proxy primary_farm/WS01 4/0/1/-1/50006 504 194 - - sH-- 14/14/14/13/0 0/0 GET /new-york/howitworks HTTP/1.1 Here, all we know is that apache received the request immediately after haproxy, at the same second : 68.174.74.20 - - [29/Jul/2010:20:20:57 -0400] GET /new-york/howitworks HTTP/1.1 200 0 http://scoutmob.com/new-york/deal; Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4 But we don't know when it got the response. Also, have you noticed how the response was empty ?
Re: 503 errors when services available and other connections succeed to same service during failure window
To rule out bug fixes since 1.3.22, I upgraded to the latest stable release and the problem persists. Is it possible that the switching to epoll will fix this problem? -Jerry r...@lb1:/opt/haproxy/share/man# /opt/haproxy/sbin/haproxy -vv HA-Proxy version 1.4.8 2010/06/16 Copyright 2000-2010 Willy Tarreau w...@1wt.eu Build options : TARGET = linux26 CPU = generic CC = gcc CFLAGS = -O2 -g OPTIONS = USE_PCRE=1 Default settings : maxconn = 2000, bufsize = 16384, maxrewrite = 8192, maxpollevents = 200 Encrypted password support via crypt(3): yes Available polling systems : sepoll : pref=400, test result OK epoll : pref=300, test result OK poll : pref=200, test result OK select : pref=150, test result OK Total: 4 (4 usable), will use sepoll. Jerry Champlin Absolute Performance Inc. -- Enabling businesses to deliver critical applications at lower cost and higher value to their customers.
BALASAN
Dear, Perkenalkan nama saya Ria Citra Saya ringgal di London namun berasal dari Indonesia . Saya bertemu seorang bankir. Mr Clement berniat untuk datang berinvestasi di Indonesia.Dia menasihati saya untuk mencari seorang teman di Indonesia yang dapat membantu dia berinvestasi. Apakaha Anda bisa membantu Mr Clement? Mr Clement sendiri adalah seorang Direktur Executive yang mengurus pengkreditan dan Nota Luar Negeri di Bank Of Citibank (UK) PLC. Beralamatkan di 11 Earl Grey Street Edinburgh EH3 9B. Dia ingin mentransfer uang sebanyak 5,000,000.00 pounds dan menggunakannya untuk berinvestasi di Indonesia . Apabila anda berminat membantu, mohon anda bisa membantu juga dalam: 1. Membantu dalam hal transfer dana yang dimaksud 2. Memberikan arahan dalam hal-hal yang patut diinvestasikan 3. Membantu membeli harta perumahan di Indonesia Apabila anda berniat untuk membantu dia dalam tranksaksi ini, anda diberikan 30% dari jumlah dana 5, 000,000.00 Pounds. Mr Clement akan sangat senang apabila anda bisa membantu dengan memberikan NOMOR TELEPON anda. Ini supaya Mr Clement bisa memberikan informasi lengkap tentang cara, prosedur dan berbagai macam hal yang berhubungan agar dana ini bisa ditransfer atau tersedia untuk anda. Saya menunggu balasan dan reaksi positif anda. Apabila anda BERMINAT membantu saya dan Mr.Clement, silahkan memberikan BALASAN BERUPA EMAIL: KE citra.ria2...@hotmail.com http://uk.mc284.mail.yahoo.com/mc/compose?to=citra.ria2...@hotmail.com Hormat Ria
503 errors when services available and other connections succeed to same service during failure window
I just checked the archive and my original problem statement was not there - just the message with additional info. Since sending this message, I have upgraded to 1.4.8 and tried disabling poll, epoll and sepoll. Non worked. -Jerry -- Forwarded message -- From: Jerry Champlin j...@absolute-performance.com Date: Sun, Aug 1, 2010 at 4:16 AM Subject: 503 errors when services available and other connections succeed to same service during failure window To: haproxy@formilux.org Hello haproxy list: We are running haproxy on Ubuntu 10.04 LTS 32bit with the stock kernel and are experiencing an intermittent problem on the order of once per minute where a connection attempt to the backend servers is unable to be established and a 503 is sent to the end application or user browser. While the problem is happening with a given connection, other connections to the same application are executing successfully on the backend servers from both the perspective of haproxy and the backend applications. I am getting someone else to re-verify this, but we are not seeing the SYN packets without corresponding SYN/ACK to establish the connection to the backend servers and suspect that the packets are not being sent despite the system calls to the contrary or that we are looking for the wrong part of the conversation. We've been trying to track down the source of these problems for a few days, any assistance will be greatly appreciated. -Jerry Jul 31 03:36:45 localhost haproxy[24182]: 172.17.48.32:60377 [31/Jul/2010:03:36:35.160] ssbe-core internal_core/pn1 0/10067/-1/-1/10068 503 212 - - SC-- 16/16/15/7/+10 0/0 POST /observations HTTP/1.1 Jul 31 03:36:45 localhost haproxy[24182]: 172.17.48.32:60379 [31/Jul/2010:03:36:35.252] ssbe-core internal_core/pn2 0/10084/-1/-1/10085 503 212 - - SC-- 15/15/14/7/+10 0/0 GET /clients/H206923/hosts HTTP/1.1 *Jul 31 03:36:55 localhost haproxy[24182]: 172.17.48.31:44757 [31/Jul/2010:03:36:45.498] ssbe-core internal_core/pn1 0/10089/-1/-1/10090 503 212 - - SC-- 15/15/13/7/+10 0/0 GET /hosts/45/metrics HTTP/1.1* Strace usage lb1: ~ 16 - strace -p 24182 -t -e trace=network -o haproxy.strace Here are the relevant lines from the strace (we're only looking at the request for /hosts/45/metrics here, don't be fooled by the coincident 03:36:45 timestamps of the other two SC entries above). 03:36:45 accept(12, {sa_family=AF_INET, sin_port=htons(44757), sin_addr=inet_addr(172.17.48.31)}, [16]) = 34 03:36:45 setsockopt(34, SOL_TCP, TCP_NODELAY, [1], 4) = 0 03:36:45 recvfrom(34, GET /hosts/45/metrics HTTP/1.1\r\n..., 8192, MSG_NOSIGNAL, NULL, NULL) = 428 03:36:45 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 35 03:36:45 setsockopt(35, SOL_TCP, TCP_NODELAY, [1], 4) = 0 03:36:45 connect(35, {sa_family=AF_INET, sin_port=htons(44757), sin_addr=inet_addr(172.17.48.32)}, 16) = -1 EINPROGRESS (Operation now in progress) 03:36:45 sendto(35, GET /hosts/45/metrics HTTP/1.1\r\n..., 454, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable) 03:36:45 recvfrom(34, 0xb5f4b6, 15930, 16384, 0, 0) = -1 EAGAIN (Resource temporarily unavailable) 03:36:45 sendto(35, GET /hosts/45/metrics HTTP/1.1\r\n..., 454, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 ECONNREFUSED(Connection refused) 03:36:46 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 40 03:36:46 setsockopt(40, SOL_TCP, TCP_NODELAY, [1], 4) = 0 03:36:46 connect(40, {sa_family=AF_INET, sin_port=htons(44757), sin_addr=inet_addr(172.17.48.32)}, 16) = -1 EINPROGRESS (Operation now in progress) 03:36:46 sendto(40, GET /hosts/45/metrics HTTP/1.1\r\n..., 454, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable) 03:36:46 recvfrom(39, GET /clients/H205604/hosts HTTP/..., 8192, MSG_NOSIGNAL, NULL, NULL) = 893 03:36:46 sendto(40, GET /hosts/45/metrics HTTP/1.1\r\n..., 454, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 ECONNREFUSED(Connection refused) 03:36:47 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 39 03:36:47 setsockopt(39, SOL_TCP, TCP_NODELAY, [1], 4) = 0 03:36:47 connect(39, {sa_family=AF_INET, sin_port=htons(44757), sin_addr=inet_addr(172.17.48.32)}, 16) = -1 EINPROGRESS (Operation now in progress) 03:36:47 sendto(39, GET /hosts/45/metrics HTTP/1.1\r\n..., 454, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable) 03:36:47 sendto(39, GET /hosts/45/metrics HTTP/1.1\r\n..., 454, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 ECONNREFUSED(Connection refused) 03:36:48 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 57 03:36:48 setsockopt(57, SOL_TCP, TCP_NODELAY, [1], 4) = 0 03:36:48 connect(57, {sa_family=AF_INET, sin_port=htons(44757), sin_addr=inet_addr(172.17.48.32)}, 16) = -1 EINPROGRESS (Operation now in progress) 03:36:48 sendto(57, GET /hosts/45/metrics HTTP/1.1\r\n..., 454, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable) 03:36:48 sendto(57, GET /hosts/45/metrics HTTP/1.1\r\n..., 454, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1
Re: List of sites using haproxy ?
We use haproxy to handle web traffic for peruvian news sites: elcomercio.pe, peru21.pe, etc. Around 2k session rate in peak hours. Haproxy is the most reliable part of our architecture. Héctor Paz Systems Administrator Grupo El Comercio On Wed, Jul 28, 2010 at 11:42 AM, Angelo Höngens a.hong...@netmatch.nl wrote: We use it for www.zoover.nl, www.weeronline.nl, www.snp.nl, etc. (in combination with either squid or varnish caches though). --