Re: SNMP Perl script with Centos 6.0

2011-09-13 Thread Willy Tarreau
On Tue, Sep 13, 2011 at 10:40:18AM +1000, Dwyer, Simon wrote:
 Issue resolved.  I thought I had already turned selinux to permissive.  
 Apparently not :)

Wow, good to know. Thanks for the feedback on this issue; apparently
nobody was able to offer any idea on this point, and now we have it in the
ML's archives!

Cheers,
Willy




Re: Stress test

2011-09-13 Thread Willy Tarreau
On Tue, Sep 13, 2011 at 03:13:11PM +1000, Dwyer, Simon wrote:
 Cheers,
 
 I will have a look at ab.  I mostly want to make sure it doesn't crash and 
 burn while it's in test; doing more of a proof of concept atm :)

BTW, you must use different machines for the client, the LB and the
server in your tests. Otherwise you'll see very strange patterns
because all of them will fight for CPU and your numbers may vary
*a lot*.

I'd suggest looking at httperf, though it's harder to use than ab.
There is also the old inject tool on my web page, which has the advantage
of reporting measurements in real time instead of running a blind test
and giving you numbers at the end without letting you know whether the
load was regular or not. However, it does no keep-alive and doesn't scale
well with concurrent connections.
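For reference, a hypothetical httperf invocation and a quick sanity check of its report (the server address and all numbers below are placeholders, not from this thread):

```shell
# hypothetical run against a test LB (192.0.2.10 is a placeholder address):
#   httperf --server 192.0.2.10 --port 80 --uri / --num-conns 20000 --rate 1000
# sanity-check the report: completed connections / test duration = effective rate
conns=20000
duration=20.4   # seconds, as reported at the end of the run
awk -v c="$conns" -v t="$duration" 'BEGIN {printf "%.1f req/s\n", c/t}'
```

If the effective rate falls well below the requested --rate, the client machine itself is saturating, which is exactly why client, LB and server should run on separate hosts.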

On the server side, you should probably use something like nginx,
which will be much faster than apache. Apache generally is the
bottleneck when used in a benchmark platform.

Regards,
Willy




Re: Problems with load balancing on cloud servers

2011-09-13 Thread Willy Tarreau
Hi,

On Tue, Sep 13, 2011 at 11:02:26AM +0800, Liong Kok Foo wrote:
 Top for server 70 (load problem)
 top - 10:51:23 up 32 days, 22:21,  1 user,  load average: 3.09, 2.99, 2.50
 Tasks: 115 total,   3 running, 112 sleeping,   0 stopped,   0 zombie
 Cpu(s): 38.5%us, 11.0%sy,  0.0%ni, 48.2%id,  0.0%wa,  0.0%hi,  2.3%si,  
 0.0%st
 Mem:   2050000k total,  1049708k used,  1000292k free,   264264k buffers
 Swap:  1052248k total,  876k used,  1051372k free,   418272k cached
 
 Sometimes server B's load will shoot up to 20 or more while server A 
 (and the rest) remain at around 5.
 
 Would really appreciate any input on this matter.

When you look at the stats, you notice a much higher retransmit
count than for the other servers. This almost always indicates
connectivity issues. And if there are connectivity issues, the
server has more difficulty pushing responses out to the clients and
accumulates more concurrent processes than the other ones, leading to
higher load and memory usage.

You should run tests between this server and another one: transfer a
large file (500 MB) several times. You should saturate the gigabit link
(~118 MB/s). Do this in both directions. Often you'll notice that one
direction is approximately OK while the other one is terrible. Do this
with other reference servers that work well, and if you find that
communications with this server are the only ones affected, ask the
provider to replace it. Sometimes it's just a cable issue, sometimes a
switch port, sometimes a NIC. Such issues are quite common in
datacenters.
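A rough sketch of such a test (hostnames are placeholders; dd over ssh is just one of several ways to push a stream between two hosts):

```shell
# push ~500 MB of zeroes from this host to serverB and time it:
#   time dd if=/dev/zero bs=1M count=500 | ssh serverB 'cat > /dev/null'
# then convert the elapsed time into throughput; gigabit wire speed is ~118 MB/s
bytes=$((500 * 1024 * 1024))
secs=12   # elapsed seconds from the run above (example value)
awk -v b="$bytes" -v t="$secs" 'BEGIN {printf "%.1f MB/s\n", b / t / 1048576}'
```

A result far under ~118 MB/s in one direction but not the other points at the kind of one-way link problem described above.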

You can also look at the network statistics :

   $ netstat -s|grep retrans

I suspect that you'll see more retransmits on this one than on the
other servers. Be careful: those counters accumulate since the last boot,
so you have to take uptime into account.
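Since the counters accumulate since boot, dividing by uptime gives a rate that is comparable across servers. A sketch with sample numbers (on a real host you would pull them from netstat -s and /proc/uptime as shown in the comments):

```shell
# example values; on a real Linux host:
#   segs=$(netstat -s | awk '/retransmi/ {print $1; exit}')
#   uptime_s=$(awk '{print int($1)}' /proc/uptime)
segs=15000
uptime_s=86400
awk -v s="$segs" -v u="$uptime_s" 'BEGIN {printf "%.3f retransmits/s\n", s/u}'
```

Comparing this per-second rate across all backends makes the outlier obvious even when uptimes differ.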

Regards,
Willy




http_req_first

2011-09-13 Thread Hank A. Paulson

Can you provide some valid examples of using http_req_first?
    acl aclX http_req_first
or
    use_backend beX if http_req_first
does not seem to work for me in 1.4.17.
Thanks.



Establishing connection lasts long

2011-09-13 Thread Tim Korves

Hi there,

we're using haproxy 1.4.15 on an Ubuntu 10.04 box. This box is 
virtualised; HW specs: 1 CPU core (Xeon 2.00GHz), 512MB RAM, 2x 1GBit 
virtual LAN (these are also two different physical NICs in the HV).

Now we've got the problem that the initial connect through haproxy 
seems to be delayed. The HTTP servers behind haproxy are physical ones, 
and they seem to deliver the page quite a lot faster directly than 
through haproxy.


Any ideas or recommendations on how to check whether haproxy is the 
source of the delay?
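One way to narrow this down (a sketch, not from the original thread) is to enable haproxy's HTTP log format, whose per-request timers split the total time into phases:

```
# haproxy.cfg fragment - 'option httplog' logs the Tq/Tw/Tc/Tr/Tt timers
# per request, so a slow initial connect shows up in Tc (connect time)
# rather than Tr (server response time)
defaults
    mode    http
    log     127.0.0.1 local0
    option  httplog
```

With these timers, a delay introduced by haproxy or the network appears in Tc, while a slow backend appears in Tr.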


Regards, Tim

--
Tim Korves
Administrator

whTec
Teutoburger Straße 309
D-46119 Oberhausen
Fon: +49 (40) 70 97 50 35 -0
Fax: +49 (40) 70 97 50 35 -99
SIP: t.kor...@fon.whtec.net

---

Service: serv...@whtec.net
Buchhaltung: buchhalt...@whtec.net
DNS: d...@whtec.net

ACHTUNG:
Anfragen von BOS bitte über b...@whtec.net
Anfragen von NGOs (e.V., gGmbH etc.) bitte über n...@whtec.net



Re: Establishing connection lasts long

2011-09-13 Thread Christophe Rahier
Hi,

I noticed the same thing: the problem happens on the first call of the
page.

After that, the response is immediate.


Christophe


On 13/09/11 13:22, « Tim Korves » t...@whtec.net wrote:

Hi there,

we're using haproxy 1.4.15 on a Ubuntu 10.04 box. This box is
virtualised, HW-specs: 1 CPU-core (Xeon 2.00GHz), 512MB RAM, 2x 1GBit
virtual LAN (these are also two different physical NICs in the HV).

Now we've got the problem, that the initial connect through haproxy
seems to be delayed. The HTTP-Servers behind haproxy are physical one's
and they seem to deliver the page quite a lot faster directly then using
haproxy in front.

Any ideas or recommendations on checking haproxy to be not the source
of the delay?

Regards, Tim








Re: Establishing connection lasts long

2011-09-13 Thread Tim Korves

Hi again,

I noticed the same thing, the problem happens at the first call of 
the

page,


Ok, seems to be a bug? Or what do you think?


After the result is immediate.


I can confirm that.

Any idea?

Thanks, Tim







Re: Establishing connection lasts long

2011-09-13 Thread Christophe Rahier
Hi,

I don't know!

It's very strange. When I check the server load, it is almost zero.

Christophe 



On 13/09/11 13:29, « Tim Korves » t...@whtec.net wrote:

Hi again,

 I noticed the same thing, the problem happens at the first call of
 the
 page,

Ok, seems to be a bug? Or what do you think?

 After the result is immediate.

I can confirm that.

Any idea?

Thanks, Tim









Re: Establishing connection lasts long

2011-09-13 Thread Tim Korves

Hi,


It's very strange. When I check the server load, it is almost zero.


same here... Anyone got information about such an issue?

Regards, Tim





haproxy / Python 'tornado' framework - digging into 502/504 errors

2011-09-13 Thread Alex Davies
Hi,

I am not a haproxy expert, but I have been using it in production for some
time with excellent results, and I wonder if I can seek some expert advice on
running the fairly fast application server http://www.tornadoweb.org/ behind
HAProxy (haproxy-1.3.23 using the EPEL RPM (-1) on RHEL6 x86_64). Haproxy is
working very well for me, but I'm looking for help understanding how I can
diagnose problems with the Tornado application I have running behind it.

I have ~8 tornado processes running on two servers. It's important that one
is active and the other is a failover (some state is stored in memory). The
parts of my haproxy configuration relevant to my question are below. I
notice a large number of entries in the logs like this:

502 errors:
Sep 13 12:42:45 localhost haproxy[15128]:
188.222.50.208:61001[13/Sep/2011:12:42:43.881] main
python_8001/python_8001_fe1 10/0/0/-1/1527
502 204 - - SH-- 6676/6676/2082/2082/0 0/0 POST /xxx/chat/status/updates
HTTP/1.1
Sep 13 12:42:45 localhost haproxy[15128]:
81.246.46.162:29456[13/Sep/2011:12:42:14.289] main
python_8001/python_8001_fe1 28/0/0/-1/31118
502 204 - - SH-- 6675/6675/2081/2081/0 0/0 POST /xxx/chat/status/updates
HTTP/1.1

504 errors:
Sep 13 12:43:08 localhost haproxy[15128]:
180.234.122.248:52888[13/Sep/2011:12:38:08.822] main
python_9004/python_9004_fe1 45/0/0/-1/300045
504 194 - - sH-- 6607/6607/697/697/0 0/0 POST /xxx/chat/message/4/updates
HTTP/1.1
Sep 13 12:43:09 localhost haproxy[15128]:
82.26.136.198:61758[13/Sep/2011:12:38:09.071] main
python_8001/python_8001_fe1 19/0/0/-1/300020
504 194 - - sH-- 6569/6569/2085/2085/0 0/0 POST /xxx/chat/status/updates
HTTP/1.1

It seems to me that all of these involve 0 seconds waiting in a queue, 0
seconds to make a connection to the final app server, and then an aborted
connection to the app server before a complete response could be received.
The total time in milliseconds between accept and last close seems to be
~300 seconds for most of the requests (although far from all of them, as the
first entry shows). If I *restart* (not reload) haproxy, I still get lines
with the fifth of these numbers (Tt in the docs) at ~300,000 (the timeout
server value in the config at the moment I copied the logs above) a few
seconds after the haproxy process starts. I also get lots that seem to end
at almost exactly 300k even when I change both timeout client and timeout
server to very different numbers. It is possible that the application
(jQuery) has a 300s timeout hardcoded, but in any case I do not understand
why the haproxy logs show connections with a duration of 300k failing when
I stop haproxy, increase timeout server by an order of magnitude, and start
it again.

Looking at the next part of the log entries, it seems that bytes_read is
always 204 for 502 errors and 194 for 504 errors. This does seem to be a
fairly regular pattern:
[root@frontend2 log]# cat /var/log/haproxy.log | grep ' 504 194' | wc -l
1975
[root@frontend2 log]# cat /var/log/haproxy.log | grep ' 502 204' | wc -l
12401
[root@frontend2 log]# cat /var/log/haproxy.log | wc -l
18721

My second question is: how do I find out (easily) exactly what is being
returned, i.e. what are those 194/204 bytes? This might give me a hint as to
what is going wrong or timing out on the application server. I guess I could
try tcpdump, but I might struggle to filter down to the correct data
(there are large numbers of successful connections going on).

The next part of the logs is most interesting; ignoring the two cookie
fields, we see that the server-side timeout expired for the 504 errors,
and that the TCP session was unexpectedly aborted by the server, or the
server explicitly refused it, in the case of the 502s. Subject to my
questions above, I have a theory that the 504s are caused by the
long-polling application, but I do not understand why, in the case of the
502, haproxy is not retrying the TCP connection before returning a 502 - I
thought that 'option redispatch' and 'retries 10' would ensure another go.

If anybody is able to shed some thoughts on my two questions I would be very
grateful!

Many thanks,

Alex

# haproxy.conf
global
log 127.0.0.1 local4 debug

chroot  /var/lib/haproxy
pidfile /var/run/haproxy.pid
maxconn 5
user    haproxy
group   haproxy
daemon

defaults
mode  http
option    httplog
#option   tcplog
option    dontlognull
option    dontlog-normal

log   global
retries   10
maxconn   5
timeout connect   20
contimeout    20
clitimeout    9
option forwardfor except 127.0.0.1/32 # Apache running https on
localhost
option httpclose # Required for REMOTE HEADER
option redispatch
timeout connect 1
timeout client  30
timeout server  30

frontend  main *:80
acl url_py_8001   path_beg   -i /url1
acl url_py_8002   path_beg   -i 

Re: haproxy / Python 'tornado' framework - digging into 502/504 errors

2011-09-13 Thread Cyril Bonté
Hi Alex,

Sorry I won't have time to help you now, but...

On Tuesday 13 September 2011 14:26:04, Alex Davies wrote:
 The total time in milliseconds between accept and last close seems to be
 ~300 seconds for most of the requests (although far from all of them, as the
 first entry shows). If I *restart* (not reload) haproxy, I still get lines
 with the fifth of these numbers (Tt in the docs) as ~300,000 (the timeout
 server value in the config at the moment I copied the logs above), a few
 seconds after the haproxy process starts. I also get lots that seem to end
 on almost exactly 300k even when I change both timeout client and timeout
 server to very different numbers.

Have you noticed that your configuration declares the same timeouts several 
times?
The configuration mixes deprecated syntax and new keywords (clitimeout vs 
timeout client, contimeout vs timeout connect, srvtimeout vs timeout server).

If you tried to modify the srvtimeout and clitimeout values, that can explain 
why you still see those 300s timeouts.

 # haproxy.conf
 defaults
 timeout connect   20
 contimeout    20
 clitimeout    9
 option forwardfor except 127.0.0.1/32 # Apache running https on
 localhost
 option httpclose # Required for REMOTE HEADER
 option redispatch
 timeout connect 1
 timeout client  30
 timeout server  30

The latest values declared will apply:
 timeout connect 1
 timeout client  30
 timeout server  30
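A deduplicated defaults section might look like the sketch below: one declaration per timeout, using only the modern keywords (the values here are illustrative, not a recommendation):

```
# sketch: each timeout declared exactly once, deprecated keywords
# (contimeout/clitimeout/srvtimeout) removed entirely
defaults
    mode    http
    option  httplog
    option  redispatch
    retries 10
    timeout connect 10s
    timeout client  300s
    timeout server  300s
```

Keeping a single declaration per timeout removes any doubt about which value actually applies.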

Maybe this can help you for the next steps ;-)

-- 
Cyril Bonté



Re: Establishing connection lasts long

2011-09-13 Thread Baptiste
heh,
This has nothing to do with haproxy; it's more about how your hypervisor
manages VMs that aren't doing anything :)

cheers

On Tue, Sep 13, 2011 at 1:35 PM, Tim Korves t...@whtec.net wrote:
 Hi,

 It's very strange. When I check the server load, it is almost zero.

 same here... Anyone got information about such an issue?

 Regards, Tim






Re: Establishing connection lasts long

2011-09-13 Thread Tim Korves

Hi there,


This has nothing to see with haproxy but more how your hypervisor
manages VMs which doesn't do anything :)


thanks for the information. Do you have any tips regarding VMware ESXi 
4.1?


Best wishes, Tim





Re: haproxy / Python 'tornado' framework - digging into 502/504 errors

2011-09-13 Thread Alex Davies
Hi,

Thank you for your observation - indeed I did notice some of those as I was
writing my email. I have updated my defaults to increase the server timeout
(as we are doing long polling), reduce the others, and remove the
duplicates:

defaults
mode  http
option    httplog
#option   tcplog
option    dontlognull
option    dontlog-normal

log   global
retries   10
maxconn   5
option forwardfor except 127.0.0.1/32 # Apache on https://127.0.0.1
option httpclose  # Required for REMOTE HEADER
option redispatch

timeout connect 1
timeout client  1
timeout server  720

I still notice the same errors in the logs! (Slightly fewer 504s, as I would
expect from the increase in timeout server - but I still don't understand
why I get any at all in the first minute of a new process.)

Cheers,

Alex

On Tue, Sep 13, 2011 at 1:46 PM, Cyril Bonté cyril.bo...@free.fr wrote:

 clitimeout



Re: haproxy / Python 'tornado' framework - digging into 502/504 errors

2011-09-13 Thread Cyril Bonté
Hi again Alex,

On Tuesday 13 September 2011 13:26:04, Alex Davies wrote:
 Hi,
 
 I am not a haproxy expert, but have been using it in production for some
 time with excellent results and I wonder if I can seek some expert advice on
 running the fairly fast application server http://www.tornadoweb.org/
 behind HAproxy (haproxy-1.3.23 using the EPEL RPM (-1) on RHEL6 x86_64).
 Haproxy is working very well for me, but i'm looking for help understanding
 how I can diagnose problems with the Tornado application I have running
 behind it.
 
 I have ~8 tornado processes running on two servers. Its important that one
 is active and the other is failover (some state is stored in memory). The
 parts of my haproxy configuration relevant to my question are below. I
 notice a large number of entries in the logs like this:
 
 502 errors:
 Sep 13 12:42:45 localhost haproxy[15128]:
 188.222.50.208:61001[13/Sep/2011:12:42:43.881] main
 python_8001/python_8001_fe1 10/0/0/-1/1527
 502 204 - - SH-- 6676/6676/2082/2082/0 0/0 POST /xxx/chat/status/updates
 HTTP/1.1
 Sep 13 12:42:45 localhost haproxy[15128]:
 81.246.46.162:29456[13/Sep/2011:12:42:14.289] main
 python_8001/python_8001_fe1 28/0/0/-1/31118
 502 204 - - SH-- 6675/6675/2081/2081/0 0/0 POST /xxx/chat/status/updates
 HTTP/1.1
 
 504 errors:
 Sep 13 12:43:08 localhost haproxy[15128]:
 180.234.122.248:52888[13/Sep/2011:12:38:08.822] main
 python_9004/python_9004_fe1 45/0/0/-1/300045
 504 194 - - sH-- 6607/6607/697/697/0 0/0 POST /xxx/chat/message/4/updates
 HTTP/1.1
 Sep 13 12:43:09 localhost haproxy[15128]:
 82.26.136.198:61758[13/Sep/2011:12:38:09.071] main
 python_8001/python_8001_fe1 19/0/0/-1/300020
 504 194 - - sH-- 6569/6569/2085/2085/0 0/0 POST /xxx/chat/status/updates
 HTTP/1.1
 
 It seems to me that all of these involve 0 seconds waiting in a queue, 0
 seconds to make a connection to the final app server and then a aborted
 connection to the app server before a complete response could be received
 The total time in milliseconds between accept and last close seems to be
 ~300 seconds for most of the requests (although far from all of them, as the
 first entry shows).

I wonder if you've reached a limit on your tornado servers, for example 
the max number of open files. Do the servers go down when it happens (due to 
the 'check' keyword that only performs a layer 4 check in your 
configuration)?
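A quick way to compare a process's open-file usage against its limit on Linux (a sketch: shown against the current shell's own pid, substitute the tornado process pid):

```shell
pid=$$   # substitute the tornado process pid here
limit=$(awk '/Max open files/ {print $4}' /proc/$pid/limits)   # soft limit
used=$(ls /proc/$pid/fd | wc -l)                               # open fds now
echo "open files: $used of $limit"
```

A long-polling server holds one fd per idle client, so usage creeping toward the soft limit would explain connections being refused or aborted.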

 If I *restart* (not reload) haproxy, I still get lines
 with the fifth of these numbers (Tt in the docs) as ~300,000 (the
 timeout server value in the config at the moment I copied the logs
 above), a few seconds after the haproxy process starts.

What you describe makes me think of an asynchronous syslog configuration. That 
could explain why you see logs after the restart.
When it happens, can you verify that the pid logged is really the pid of the 
new instance, or whether it's the old one?

In your example, your instance has the pid 15128.

 I also get lots
 that seem to end on almost exactly 300k even when I change both timeout
 client and timeout server to very different numbers. It is possible that
 the application (jQuery) has a 300s timeout hardcoded, but in any case I do
 not understand why the haproxy logs show connections with a connection of
 300k failing when I stop haproxy, increase timeout server by a order of
 magnitude and start it again.
 
 Looking at the next part of the log entries it seems that bytes_read is
 always 204 for 502 errors and 194 for 504 errors. This does seem to be a
 fairly regular pattern:
 [root@frontend2 log]# cat /var/log/haproxy.log | grep ' 504 194' | wc -l
 1975
 [root@frontend2 log]# cat /var/log/haproxy.log | grep ' 502 204' | wc -l
 12401
 [root@frontend2 log]# cat /var/log/haproxy.log | wc -l
 18721
 
  My second question is how do I find out exactly what is being returned
 (easily), i.e. what are those 194/204 bytes? This might give me a hint as to
 what is going wrong or timing out on the application server.I guess I
 could try to tcpdump but I might struggle to actually filter down the
 correct data (there are large numbers of successful connections going on)

Those sizes correspond respectively to the 504 and 502 responses sent by 
haproxy.

[HTTP_ERR_502] =
	"HTTP/1.0 502 Bad Gateway\r\n"
	"Cache-Control: no-cache\r\n"
	"Connection: close\r\n"
	"Content-Type: text/html\r\n"
	"\r\n"
	"<html><body><h1>502 Bad Gateway</h1>\nThe server returned an invalid or incomplete response.\n</body></html>\n",

[HTTP_ERR_504] =
	"HTTP/1.0 504 Gateway Time-out\r\n"
	"Cache-Control: no-cache\r\n"
	"Connection: close\r\n"
	"Content-Type: text/html\r\n"
	"\r\n"
	"<html><body><h1>504 Gateway Time-out</h1>\nThe server didn't respond in time.\n</body></html>\n",

(you can find them in src/proto_http.c)

 The next part of the logs is most interesting; ignoring the two cookie
 fields we see that the server-side timeout expired for the 504 errors
 and the TCP session was unexpectedly 

Re: [Proposal] Concurrency tuning by adding a limit to http-server-close

2011-09-13 Thread Cyril Bonté
Hi Willy,

A small update on this development.

On Monday 29 August 2011 18:01:23, Willy Tarreau wrote:
   If you're interested in doing this, I'd be glad to merge it and to
   provide help if needed. We need a struct list fe_idle in the
   struct
   proxy and add/remove idle connections there.
  
  Of course I'm interested. I can't promise I'll be available for it for
  the next days but I can start it shortly.
 
 Nice, thank you! Don't forget to take a rest, you're on holidays ;-)

First, I didn't forget to take a rest ;-)
More seriously, I've got a first version that seems to work quite well.
I couldn't reach maxconn keep-alive connections, only maxconn - 1, due to 
the way haproxy pauses the listeners when they are full or when the proxy is.

I still have to optimize some paths added to resume the listeners when a 
connection goes back to the idle list. This matters more for proxies that have 
lots of listeners. I don't know of such configurations but maybe you've already 
met some ;-)

During a test today, I had 2 minutes of panic due to an unexplained segfault, 
but gdb quickly reminded me that I had recompiled the sources with DEBUG_DEV 
enabled! Apart from that, it never crashed.

-- 
Cyril Bonté



RE: SNMP Perl script with Centos 6.0

2011-09-13 Thread Dwyer, Simon
Not a problem Willy.

I should also note in that case that the initial error, error on subcontainer 
'ia_addr' insert (-1), is due to the fact that I am using keepalived and I have 
an IP address without an interface. I believe it's a bug in the snmp libraries 
for CentOS.

Cheers,

Simon :)

From: Willy Tarreau [w...@1wt.eu]
Sent: Tuesday, September 13, 2011 4:27 PM
To: Dwyer, Simon
Cc: haproxy@formilux.org
Subject: Re: SNMP Perl script with Centos 6.0

On Tue, Sep 13, 2011 at 10:40:18AM +1000, Dwyer, Simon wrote:
 Issue resolved.  I thought i had already turn selinux to permissive.  
 Apparently not :)

Wow, good to know. Thanks for the feedback on this issue, apparently
nobody was able to bring any idea on this point, now we have it in the
ML's archives !

Cheers,
Willy





Re: [Proposal] Concurrency tuning by adding a limit to http-server-close

2011-09-13 Thread Willy Tarreau
Hi Cyril,

On Tue, Sep 13, 2011 at 10:13:14PM +0200, Cyril Bonté wrote:
 More seriously, I've got a first version that looks to work quite well.
 I couldn't raise maxconn keep-alive connections but maxconn - 1, due to 
 the way haproxy pauses the listener when they are full or when the proxy is.

I don't know if you have checked 1.5-dev7; there's a function there to wake
up the listeners that are waiting for maxconn to be OK again. It's already
used to apply maxconn without burning CPU cycles. And I don't see anything
there that would prevent you from using up to maxconn connections; in
fact, I even ran some tests with maxconn 1 to check that it worked fine :-)

 I still have to optimize some pathes added to resume the listeners when a 
 connection goes back to the idle list. This is more true for proxies that 
 have 
 lots of listeners. I don't know such configurations but maybe you've already 
 met some ;-)

Check in session.c:process_session, you have this:

    if (s->listener->state == LI_FULL)
        resume_listener(s->listener);

I think you can make use of this for your code. Since listeners are
individually full or not, you don't need to scan the whole listener list
anymore. BTW, the worst config I have ever seen was someone binding to a
*large* port range. This means tens of thousands of listeners...

 During a test today, I had 2 minutes of panic due to an unexplained segfault 
 but gdb quickly reminded me that I recompiled the sources with DEBUG_DEV 
 enabled !

I think I remember about a recent patch from Simon to fix some breakage
in DEBUG_DEV, so 1.5-dev7 might be OK. But I've not used DEBUG_DEV for a
long time now and I don't know what shape it's in.

 Except that, it never crashed.

Fine !

Cheers,
Willy




Re: haproxy / Python 'tornado' framework - digging into 502/504 errors

2011-09-13 Thread Willy Tarreau
Hi Alex,

On Tue, Sep 13, 2011 at 03:18:54PM +0100, Alex Davies wrote:
 Hi,
 
 Thank you for your observation - indeed I did notice some of those as I was
 writing my email - I have updated my globals to increase the server timeout
 (as we are doing long polling) and reduce the others, and remove the
 duplicates:
 
 defaults
 mode  http
 option    httplog
 #option   tcplog
 option    dontlognull
 option    dontlog-normal
 
 log   global
 retries   10
 maxconn   5
 option forwardfor except 127.0.0.1/32 # Apache on https://127.0.0.1
  option httpclose  # Required for REMOTE HEADER
 option redispatch
 
 timeout connect 1
 timeout client  1
 timeout server  720
 
 I still notice the same errors in the logs! (slightly less 504, as I would
 expect through the increase in timeout server - but I still don't
 understand why I get any at all in the first minute of a new process).

To complete Cyril's detailed analysis, I'd like to add that you'll only
see 502s when you restart, and it will take some time before you see 504s
again (eg: 2 hours with the config above).

The 502s mean that the server suddenly aborted the connection (flags SH),
while the 504s indicate that haproxy got fed up with waiting and closed after
the server timeout elapsed.
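Tallying the termination flags across the log makes this split visible at a glance. A sketch, with three inline sample lines standing in for /var/log/haproxy.log:

```shell
# count termination-state codes (SH-- = server abort, sH-- = server timeout);
# the printf lines below are stand-ins for real haproxy log lines
printf 'a 502 204 - - SH-- x\nb 504 194 - - sH-- x\nc 504 194 - - sH-- x\n' \
  | awk '{for (i = 1; i <= NF; i++) if ($i ~ /^[sS]H--$/) print $i}' \
  | sort | uniq -c
```

On real logs the SH-- count should track the 502s and the sH-- count the 504s, so a sudden shift in the ratio points at which failure mode changed.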

So yes, it's very possible that your server has its own timeout, but it should
be in the 30s range from what I saw in your logs. It still does not explain why
some requests never time out on the server; maybe they don't wake the same
components up?

Regards,
Willy