503 errors when services available and other connections succeed to same service during failure window

2010-08-01 Thread Jerry Champlin
Hello haproxy list:

We are running haproxy on Ubuntu 10.04 LTS 32bit with the stock kernel and
are experiencing an intermittent problem on the order of once per minute
where a connection attempt to the backend servers is unable to be
established and a 503 is sent to the end application or user browser.  While
the problem is happening with a given connection, other connections to the
same application are executing successfully on the backend servers from both
the perspective of haproxy and the backend applications.  I am getting
someone else to re-verify this, but we are not seeing the SYN packets
without corresponding SYN/ACK to establish the connection to the backend
servers and suspect that the packets are not being sent despite the system
calls to the contrary or that we are looking for the wrong part of the
conversation.  We've been trying to track down the source of these problems
for a few days, any assistance will be greatly appreciated.

-Jerry

Jul 31 03:36:45 localhost haproxy[24182]: 172.17.48.32:60377
[31/Jul/2010:03:36:35.160]
ssbe-core internal_core/pn1 0/10067/-1/-1/10068 503 212 - - SC--
16/16/15/7/+10 0/0 POST /observations HTTP/1.1
Jul 31 03:36:45 localhost haproxy[24182]: 172.17.48.32:60379
[31/Jul/2010:03:36:35.252]
ssbe-core internal_core/pn2 0/10084/-1/-1/10085 503 212 - - SC--
15/15/14/7/+10 0/0 GET /clients/H206923/hosts HTTP/1.1
*Jul 31 03:36:55 localhost haproxy[24182]: 172.17.48.31:44757
[31/Jul/2010:03:36:45.498]
ssbe-core internal_core/pn1 0/10089/-1/-1/10090 503 212 - - SC--
15/15/13/7/+10 0/0 GET /hosts/45/metrics HTTP/1.1*

Strace usage
lb1: ~ 16 - strace -p 24182 -t -e trace=network -o haproxy.strace

Here are the relevant lines from the strace (we're only looking at the
request for /hosts/45/metrics here, don't be fooled by the coincident
03:36:45 timestamps of the other two SC entries above).

03:36:45 accept(12, {sa_family=AF_INET, sin_port=htons(44757),
sin_addr=inet_addr(172.17.48.31)}, [16]) = 34
03:36:45 setsockopt(34, SOL_TCP, TCP_NODELAY, [1], 4) = 0
03:36:45 recvfrom(34, GET /hosts/45/metrics HTTP/1.1\r\n..., 8192,
MSG_NOSIGNAL, NULL, NULL) = 428

03:36:45 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 35
03:36:45 setsockopt(35, SOL_TCP, TCP_NODELAY, [1], 4) = 0
03:36:45 connect(35, {sa_family=AF_INET, sin_port=htons(44757),
sin_addr=inet_addr(172.17.48.32)}, 16) = -1 EINPROGRESS (Operation now in
progress)
03:36:45 sendto(35, GET /hosts/45/metrics HTTP/1.1\r\n..., 454,
MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 EAGAIN (Resource temporarily
unavailable)
03:36:45 recvfrom(34, 0xb5f4b6, 15930, 16384, 0, 0) = -1 EAGAIN (Resource
temporarily unavailable)
03:36:45 sendto(35, GET /hosts/45/metrics HTTP/1.1\r\n..., 454,
MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 ECONNREFUSED(Connection refused)

03:36:46 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 40
03:36:46 setsockopt(40, SOL_TCP, TCP_NODELAY, [1], 4) = 0
03:36:46 connect(40, {sa_family=AF_INET, sin_port=htons(44757),
sin_addr=inet_addr(172.17.48.32)}, 16) = -1 EINPROGRESS (Operation now in
progress)
03:36:46 sendto(40, GET /hosts/45/metrics HTTP/1.1\r\n..., 454,
MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 EAGAIN (Resource temporarily
unavailable)
03:36:46 recvfrom(39, GET /clients/H205604/hosts HTTP/..., 8192,
MSG_NOSIGNAL, NULL, NULL) = 893
03:36:46 sendto(40, GET /hosts/45/metrics HTTP/1.1\r\n..., 454,
MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 ECONNREFUSED(Connection refused)

03:36:47 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 39
03:36:47 setsockopt(39, SOL_TCP, TCP_NODELAY, [1], 4) = 0
03:36:47 connect(39, {sa_family=AF_INET, sin_port=htons(44757),
sin_addr=inet_addr(172.17.48.32)}, 16) = -1 EINPROGRESS (Operation now in
progress)
03:36:47 sendto(39, GET /hosts/45/metrics HTTP/1.1\r\n..., 454,
MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 EAGAIN (Resource temporarily
unavailable)
03:36:47 sendto(39, GET /hosts/45/metrics HTTP/1.1\r\n..., 454,
MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 ECONNREFUSED(Connection refused)

03:36:48 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 57
03:36:48 setsockopt(57, SOL_TCP, TCP_NODELAY, [1], 4) = 0
03:36:48 connect(57, {sa_family=AF_INET, sin_port=htons(44757),
sin_addr=inet_addr(172.17.48.32)}, 16) = -1 EINPROGRESS (Operation now in
progress)
03:36:48 sendto(57, GET /hosts/45/metrics HTTP/1.1\r\n..., 454,
MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 EAGAIN (Resource temporarily
unavailable)
03:36:48 sendto(57, GET /hosts/45/metrics HTTP/1.1\r\n..., 454,
MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 ECONNREFUSED(Connection refused)

[ repeat connection attempt once per second for 10 total seconds ...]

03:36:55 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 23
03:36:55 setsockopt(23, SOL_TCP, TCP_NODELAY, [1], 4) = 0
03:36:55 connect(23, {sa_family=AF_INET, sin_port=htons(44757),
sin_addr=inet_addr(172.17.48.31)}, 16) = -1 EINPROGRESS (Operation now in
progress)
03:36:55 sendto(23, GET /hosts/45/metrics HTTP/1.1\r\n..., 454,
MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 EAGAIN (Resource temporarily
unavailable)

Re: HAProxy, Apache, Ruby Server Timeout Errors (sH)

2010-08-01 Thread Phil Dupont
Willy,

You rock.  Thanks for the help.

Good news is that it shows that apache is genuinely taking more then 50
seconds to respond (see my example below).  What's interesting is that when
I look at the ruby production log... I see about a convenient 50 second gap.

That said, it's clear that it's a communication issue between Apache and
Ruby.  Definitely not an issue with HAProxy.

I'll dig in and ferret out the issue, but I wanted to close the loop.

Thanks again,
PhilD


--- Log Entries ---

LB01 - HAProxy Error Log (/var/log/haproxy.err):


Jul 31 19:10:31 localhost haproxy[17404]: 72.148.201.189:50318
[31/Jul/2010:19:09:41.950]
http_proxy primary_farm/WS01 10/0/0/-1/50014 504 194 - - sH-- 0/0/0/0/0 0/0
GET
/tidbits.json?API=5Tb44zB-OfBWAyIQYOIOversion=2.5platform_id=631daac59543ed898fb00713ccc4523377af38ffcity_short_name=atlantastate=GA
HTTP/1.1



WS01 - Apache Log (/var/log/apache2/access.log):

72.148.201.189 - - [31/Jul/2010:19:09:41 -0400] GET
/tidbits.json?API=5Tb44zB-OfBWAyIQYOIOversion=2.5platform_id=631daac59543ed898fb00713ccc4523377af38ffcity_short_name=atlantastate=GA
HTTP/1.1 200 0 - Appcelerator Titanium/1.3.3 (iPhone/4.0.1; iPhone OS;
en_US;) *50042434 *



WS01-VanSkeevy - Request-Log-Analyzer - Ruby Production Log:

Jul 31 19:09:41 WS01 rails[23423]: Completed in 111ms (View: 74, DB: 29) |
200 OK [
http://api.scoutmob.com/unexpired/f2ad0a459da29b4d83c86e8134a2315ff2364a884a27e26dee74f374704c0335.json?API=5Tb44zB-OfBWAyIQYOIOversion=2.5platform_id=9f1f94e32ff79c816549263979cc2fccc7c655dacity_short_name=atlantastate=GA
]

Jul 31 19:10:31 WS01 rails[23423]: Processing DealsController#unexpired to
json (for 72.148.201.189 at 2010-07-31 19:10:31) [GET]


PhilD

http://www.scoutmob.com/*Phil Dupont*
p: 404-273-7079
e: ph...@scoutmob.com

In walking, just walk. In sitting, just sit. Above all, don't wobble.
-Yun-Men




On Fri, Jul 30, 2010 at 5:28 PM, Phil Dupont ph...@scoutmob.com wrote:

 Thanks Willy.  When things quiet down this weekend I'll make some changes
 to capture the data in the logs and see what that turns up.

 PhilD


 On Fri, Jul 30, 2010 at 5:07 PM, Willy Tarreau w...@1wt.eu wrote:

 On Fri, Jul 30, 2010 at 04:33:16PM -0400, Phil Dupont wrote:
  I guess that's my point... I don't know where else to check.  There's
  nothing between the HAProxy box and the webservers but a switch.
  Certainly
  nothing I can get logs from.
 
  If I drill into an example, I end up with nothing.  See below for the
 logs
  of one scenario I tried to trace.  But essentially:
 
 - I see an HAProxy 504 error for someone trying to load
 http://scoutmob.com/new-york/howitworks
 - Then I check the apache access logs (not the error logs because the
 error logs don't contain the user's IP addess, just the proxy's IP
 address)
 and see the connection was successful (passed a 200)

 Unfortunately there aren't any timer in your apache logs. What you see is
 that apache correctly received the request and correctly responded. But
 you don't know how much time it took for that. If it takes more than 50s
 you have configured on your haproxy as the server timeout, the response
 will never get back to the client. From the doc below, it seems it's %D
 that you need to add to CustomLog to add the timer in the logs :

http://httpd.apache.org/docs/2.2/mod/mod_log_config.html

 - Then I run request-log-analyzer against the ruby log and see that
 the
 method that creates the /new-york/howitworks takes a max of .01s to
 load.

 Most often when you see a timeout between two levels of proxies and a ruby
 server, it's because it took ages to the ruby server to accept the request
 (for whatever reasons, starting with mongrel's 1-connection limit). Thus,
 while the second proxy waits for the connection to establish, the first
 one
 waits for the second proxy to respond, as it does not know it can't even
 connect. Sometimes even the second proxy can connect but its request
 remains
 for ages in a queue, waiting for being dispatched. Your ruby logs just
 indicate it took 10ms to process the request once it got there. There's a
 similar problem with some Java-based application servers that make use of
 thread pools with a front dispatched using a deep queue.

 But if you look below, there's no doubt it really took 50 seconds to
 timeout :

  Jul 29 20:21:47 localhost haproxy[17405]: 
  68.174.74.20:55350[29/Jul/2010:20:20:57.478]
 http_proxy primary_farm/WS01 4/0/1/-1/50006 504 194 - - sH-- 14/14/14/13/0
 0/0 GET /new-york/howitworks HTTP/1.1

 Here, all we know is that apache received the request immediately after
 haproxy,
 at the same second :

  68.174.74.20 - - [29/Jul/2010:20:20:57 -0400] GET /new-york/howitworks
 HTTP/1.1 200 0 http://scoutmob.com/new-york/deal; Mozilla/5.0
 (Macintosh; U; Intel Mac OS X 10_5_8; en-US) AppleWebKit/533.4 (KHTML, like
 Gecko) Chrome/5.0.375.99 Safari/533.4

 But we don't know when it got the response. Also, have you noticed how the
 response was empty ? 

Re: 503 errors when services available and other connections succeed to same service during failure window

2010-08-01 Thread Jerry Champlin
To rule out bug fixes since 1.3.22, I upgraded to the latest stable release
and the problem persists.  Is it possible that the switching to epoll will
fix this problem?

-Jerry

r...@lb1:/opt/haproxy/share/man# /opt/haproxy/sbin/haproxy -vv
HA-Proxy version 1.4.8 2010/06/16
Copyright 2000-2010 Willy Tarreau w...@1wt.eu

Build options :
  TARGET  = linux26
  CPU = generic
  CC  = gcc
  CFLAGS  = -O2 -g
  OPTIONS = USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 8192, maxpollevents = 200

Encrypted password support via crypt(3): yes

Available polling systems :
 sepoll : pref=400,  test result OK
  epoll : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 4 (4 usable), will use sepoll.

Jerry Champlin
Absolute Performance Inc.
--
Enabling businesses to deliver critical applications at lower cost and
higher value to their customers.


BALASAN

2010-08-01 Thread Ria Citra
Dear,  Perkenalkan nama saya Ria Citra Saya ringgal di London namun berasal 
dari Indonesia . Saya bertemu seorang bankir. Mr Clement berniat untuk datang 
berinvestasi di Indonesia.Dia menasihati saya untuk mencari seorang teman di 
Indonesia yang dapat membantu dia berinvestasi. Apakaha Anda bisa membantu Mr 
Clement?   Mr Clement sendiri adalah seorang Direktur Executive yang mengurus 
pengkreditan dan Nota Luar Negeri di Bank Of Citibank (UK) PLC. Beralamatkan di 
 11 Earl Grey Street Edinburgh EH3 9B. Dia ingin mentransfer uang sebanyak 
5,000,000.00 pounds dan menggunakannya untuk berinvestasi di Indonesia .  
Apabila anda berminat membantu, mohon anda bisa membantu juga dalam:  1. 
Membantu dalam hal transfer dana yang dimaksud 2. Memberikan arahan dalam 
hal-hal yang patut diinvestasikan 3. Membantu membeli harta perumahan di 
Indonesia   Apabila anda berniat untuk membantu dia dalam tranksaksi ini, anda 
diberikan 30% dari jumlah dana 5,
 000,000.00 Pounds. Mr Clement akan sangat senang apabila anda bisa membantu 
dengan memberikan NOMOR TELEPON anda. Ini supaya Mr Clement bisa memberikan 
informasi lengkap tentang cara, prosedur dan berbagai macam hal yang 
berhubungan agar dana ini bisa ditransfer atau tersedia untuk anda.   Saya 
menunggu balasan dan reaksi positif anda. Apabila anda BERMINAT membantu saya 
dan Mr.Clement, silahkan memberikan  BALASAN BERUPA EMAIL: KE 
citra.ria2...@hotmail.com 
http://uk.mc284.mail.yahoo.com/mc/compose?to=citra.ria2...@hotmail.com   
Hormat Ria






503 errors when services available and other connections succeed to same service during failure window

2010-08-01 Thread Jerry Champlin
I just checked the archive and my original problem statement was not there -
just the message with additional info.

Since sending this message, I have upgraded to 1.4.8 and tried disabling
poll, epoll and sepoll.  Non worked.

-Jerry

-- Forwarded message --
From: Jerry Champlin j...@absolute-performance.com
Date: Sun, Aug 1, 2010 at 4:16 AM
Subject: 503 errors when services available and other connections succeed to
same service during failure window
To: haproxy@formilux.org


Hello haproxy list:

We are running haproxy on Ubuntu 10.04 LTS 32bit with the stock kernel and
are experiencing an intermittent problem on the order of once per minute
where a connection attempt to the backend servers is unable to be
established and a 503 is sent to the end application or user browser.  While
the problem is happening with a given connection, other connections to the
same application are executing successfully on the backend servers from both
the perspective of haproxy and the backend applications.  I am getting
someone else to re-verify this, but we are not seeing the SYN packets
without corresponding SYN/ACK to establish the connection to the backend
servers and suspect that the packets are not being sent despite the system
calls to the contrary or that we are looking for the wrong part of the
conversation.  We've been trying to track down the source of these problems
for a few days, any assistance will be greatly appreciated.

-Jerry

Jul 31 03:36:45 localhost haproxy[24182]: 172.17.48.32:60377
[31/Jul/2010:03:36:35.160]
ssbe-core internal_core/pn1 0/10067/-1/-1/10068 503 212 - - SC--
16/16/15/7/+10 0/0 POST /observations HTTP/1.1
Jul 31 03:36:45 localhost haproxy[24182]: 172.17.48.32:60379
[31/Jul/2010:03:36:35.252]
ssbe-core internal_core/pn2 0/10084/-1/-1/10085 503 212 - - SC--
15/15/14/7/+10 0/0 GET /clients/H206923/hosts HTTP/1.1
*Jul 31 03:36:55 localhost haproxy[24182]: 172.17.48.31:44757
[31/Jul/2010:03:36:45.498]
ssbe-core internal_core/pn1 0/10089/-1/-1/10090 503 212 - - SC--
15/15/13/7/+10 0/0 GET /hosts/45/metrics HTTP/1.1*

Strace usage
lb1: ~ 16 - strace -p 24182 -t -e trace=network -o haproxy.strace

Here are the relevant lines from the strace (we're only looking at the
request for /hosts/45/metrics here, don't be fooled by the coincident
03:36:45 timestamps of the other two SC entries above).

03:36:45 accept(12, {sa_family=AF_INET, sin_port=htons(44757),
sin_addr=inet_addr(172.17.48.31)}, [16]) = 34
03:36:45 setsockopt(34, SOL_TCP, TCP_NODELAY, [1], 4) = 0
03:36:45 recvfrom(34, GET /hosts/45/metrics HTTP/1.1\r\n..., 8192,
MSG_NOSIGNAL, NULL, NULL) = 428

03:36:45 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 35
03:36:45 setsockopt(35, SOL_TCP, TCP_NODELAY, [1], 4) = 0
03:36:45 connect(35, {sa_family=AF_INET, sin_port=htons(44757),
sin_addr=inet_addr(172.17.48.32)}, 16) = -1 EINPROGRESS (Operation now in
progress)
03:36:45 sendto(35, GET /hosts/45/metrics HTTP/1.1\r\n..., 454,
MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 EAGAIN (Resource temporarily
unavailable)
03:36:45 recvfrom(34, 0xb5f4b6, 15930, 16384, 0, 0) = -1 EAGAIN (Resource
temporarily unavailable)
03:36:45 sendto(35, GET /hosts/45/metrics HTTP/1.1\r\n..., 454,
MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 ECONNREFUSED(Connection refused)

03:36:46 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 40
03:36:46 setsockopt(40, SOL_TCP, TCP_NODELAY, [1], 4) = 0
03:36:46 connect(40, {sa_family=AF_INET, sin_port=htons(44757),
sin_addr=inet_addr(172.17.48.32)}, 16) = -1 EINPROGRESS (Operation now in
progress)
03:36:46 sendto(40, GET /hosts/45/metrics HTTP/1.1\r\n..., 454,
MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 EAGAIN (Resource temporarily
unavailable)
03:36:46 recvfrom(39, GET /clients/H205604/hosts HTTP/..., 8192,
MSG_NOSIGNAL, NULL, NULL) = 893
03:36:46 sendto(40, GET /hosts/45/metrics HTTP/1.1\r\n..., 454,
MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 ECONNREFUSED(Connection refused)

03:36:47 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 39
03:36:47 setsockopt(39, SOL_TCP, TCP_NODELAY, [1], 4) = 0
03:36:47 connect(39, {sa_family=AF_INET, sin_port=htons(44757),
sin_addr=inet_addr(172.17.48.32)}, 16) = -1 EINPROGRESS (Operation now in
progress)
03:36:47 sendto(39, GET /hosts/45/metrics HTTP/1.1\r\n..., 454,
MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 EAGAIN (Resource temporarily
unavailable)
03:36:47 sendto(39, GET /hosts/45/metrics HTTP/1.1\r\n..., 454,
MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 ECONNREFUSED(Connection refused)

03:36:48 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 57
03:36:48 setsockopt(57, SOL_TCP, TCP_NODELAY, [1], 4) = 0
03:36:48 connect(57, {sa_family=AF_INET, sin_port=htons(44757),
sin_addr=inet_addr(172.17.48.32)}, 16) = -1 EINPROGRESS (Operation now in
progress)
03:36:48 sendto(57, GET /hosts/45/metrics HTTP/1.1\r\n..., 454,
MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 EAGAIN (Resource temporarily
unavailable)
03:36:48 sendto(57, GET /hosts/45/metrics HTTP/1.1\r\n..., 454,
MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 

Re: List of sites using haproxy ?

2010-08-01 Thread Hector Danniel Paz Trillo
We use haproxy to handle web traffic for peruvian news sites:
elcomercio.pe, peru21.pe, etc. Around 2k session rate in peak hours.
Haproxy is the most reliable part of our architecture.

Héctor Paz
Systems Administrator
Grupo El Comercio

On Wed, Jul 28, 2010 at 11:42 AM, Angelo Höngens a.hong...@netmatch.nl wrote:
 We use it for www.zoover.nl, www.weeronline.nl, www.snp.nl, etc. (in 
 combination with either squid or varnish caches though).

 --