Re: Seeing high CPU usage with obscene number of calls to epoll_wait
On Wed, Dec 5, 2012 at 10:58 PM, Willy Tarreau w...@1wt.eu wrote:
> Hi Bryan,

Thanks a lot for your help, Willy, I really appreciate it. And for haproxy. It is a fantastic tool.

> On Wed, Dec 05, 2012 at 04:22:45PM +0100, Bryan Berry wrote:
> Does this stay that way for a long time? I mean, could it be something like a health check not getting a response (e.g. just a few seconds), or does that seem to match your client/server timeout (500s in your case)?

It does stay high. Here is a graph of CPU usage over the last 24 hours; the left-hand axis is % of CPU time: https://docs.google.com/open?id=0BzPvBvLIIq7NV0QtTkliM3Yxenc

The high CPU usage doesn't appear to correlate with any HTTP 500 status codes, and I wouldn't expect it to, since it seems related to the TCP-mode proxying of our databases.

> Could you please add "level admin" on your stats socket, restart, and issue "show sess all" on the stats socket when the issue happens, and capture the output? It will help *a lot*. The best way to do it is to redirect it to a file, for example like this:
>
> echo show sess all | socat stdio /var/run/haproxy.sock > show-sess.out

Done: https://docs.google.com/document/d/1A3qEq0RmlAtG-fzKJDbZgB0pvmYJnlUuJ0T2IrpBGGg/edit

Here are the IP addresses of the database backend servers. Note they are not the originals but have been munged to protect the innocent: 168.100.2.181, 168.100.2.237, 168.100.2.195, 168.100.2.183

Just by playing with strace, it looks like the following function is being called over and over again with a value of 0 for wait_time:

status = epoll_wait(0, {}, 26, 0)   (line 133, ev_epoll.c)

Hope this helps! Thanks again for your assistance.
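The strace line shows epoll_wait() called with a timeout of 0. With that timeout the call returns at once, even when no descriptor is ready. A loop that keeps computing a zero timeout therefore spins instead of sleeping.

The minimal Linux-only C sketch below times one epoll_wait() call on an empty epoll set, so a 0 ms timeout can be compared with a positive one. It is not haproxy code, and `poll_once_ms` is a hypothetical helper.

```c
#define _GNU_SOURCE
#include <sys/epoll.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: call epoll_wait() once on an epoll set with no
 * registered descriptors and return how long the call took, in ms.
 * The number of ready events is stored in *nready (0 on timeout). */
static double poll_once_ms(int timeout_ms, int *nready)
{
    struct epoll_event events[26];   /* 26 mirrors maxevents in the strace */
    struct timespec t0, t1;
    int epfd = epoll_create1(0);

    if (epfd < 0)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    *nready = epoll_wait(epfd, events, 26, timeout_ms);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(epfd);

    return (t1.tv_sec - t0.tv_sec) * 1000.0 +
           (t1.tv_nsec - t0.tv_nsec) / 1e6;
}
```

The two calls behave differently:

- `poll_once_ms(0, &n)` comes back almost instantly with `n == 0`.
- `poll_once_ms(100, &n)` really sleeps for about 100 ms.

A poller loop made only of the first kind of call is the busy pattern visible in the strace.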
Re: Any interest in option dontlogcrap ?
Hi Willy,

I would prefer to trigger logging through an ACL, like:

dolog if { hdr(Host) www.domain.com }

or

dolog unless { path_end .jpg .png }

:)

cheers

On Thu, Dec 6, 2012 at 1:55 AM, Willy Tarreau w...@1wt.eu wrote:
> SBD's mail made me think about what we'd want not to log; it's basically pollution. I *do* like to log pollution as long as it's low. But in his situation, it can become a nuisance. I was thinking that maybe we could have an option "dontlogcrap" or something like this, which would prevent haproxy from logging requests it does not completely parse.
>
> Do others think this could be useful or am I overlooking something?
>
> Thanks,
> Willy
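Neither `dolog` nor `dontlog` existed as a keyword at this point; both are only proposals in this thread. The small C sketch below makes the proposed `unless { path_end .jpg .png }` condition concrete. It shows suffix matching as `path_end` does it: the condition matches if the path ends with any of the listed strings.

Assumptions behind the sketch:

- `should_log` and `ends_with` are hypothetical names.
- The path has already been extracted without its query string, as haproxy's `path` fetch does.
- Matching is case-sensitive, as with a pattern that has no `-i` flag.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if `path` ends with `suffix` (case-sensitive). */
static bool ends_with(const char *path, const char *suffix)
{
    size_t lp = strlen(path), ls = strlen(suffix);
    return ls <= lp && memcmp(path + lp - ls, suffix, ls) == 0;
}

/* Hypothetical policy for "dolog unless { path_end .jpg .png }":
 * the patterns of one ACL are ORed, so any matching suffix suppresses
 * logging for that request. */
static bool should_log(const char *path, const char *const *suffixes, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (ends_with(path, suffixes[i]))
            return false;   /* condition matched, "unless" => no log */
    return true;
}
```

Flipping the return values gives the `dontlog if ...` form that Willy prefers, since logging is on by default.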
Re: Seeing high CPU usage with obscene number of calls to epoll_wait
On Thu, Dec 6, 2012 at 10:10 AM, Bryan Berry bryan.be...@gmail.com wrote:
> [...]

Hi Willy,

I ran into the same issue at a customer's site yesterday with long-lived TCP connections (Exchange 2010 load balancing). There were roughly 6 open connections on the LB at that time.

cheers
Re: Any interest in option dontlogcrap ?
Hi Baptiste,

On Thu, Dec 06, 2012 at 10:21:35AM +0100, Baptiste wrote:
> Hi Willy,
> I would prefer to trigger logging through an ACL, like
> dolog if { hdr(Host) www.domain.com }
> or
> dolog unless { path_end .jpg .png }

It would rather be "dontlog", as was initially planned, because logging is enabled by default. However, since it was first put on the roadmap, I have realized that several users want it at different places and that we need at least 3 hooks:
- tcp-request content
- http-request
- http-response

SBD suggested off-list that we could also play with the logging level instead of just on/off, which kind of makes sense to me.

Cheers,
Willy
Re: Seeing high CPU usage with obscene number of calls to epoll_wait
On Thu, Dec 6, 2012 at 11:16 AM, Willy Tarreau w...@1wt.eu wrote:
> Hi Bryan,
>
> On Thu, Dec 06, 2012 at 10:10:18AM +0100, Bryan Berry wrote:
>> It does stay high. Here is a graph of CPU usage over the last 24 hours; the left-hand axis is % of CPU time: https://docs.google.com/open?id=0BzPvBvLIIq7NV0QtTkliM3Yxenc
>
> OK, so since the graph does not commonly show 100%, I think that in practice it's oscillating quickly between 100 and zero and is averaged on the graph.

It is not oscillating. I have watched htop (like top) and the CPU usage is stuck at 100% for extended periods.

>> The high CPU usage doesn't appear to correlate with any HTTP 500 status codes, and I wouldn't expect it to, since it seems related to the TCP-mode proxying of our databases.
>
> At first glance in your trace, all sessions seem correct, so I suspect that this is related to the TCP checks. Baptiste encountered a similar issue with another user in TCP mode with raw TCP checks. I'll see if I can reproduce any such issue and/or find an explanation. In the meantime, if you're adventurous enough to try disabling checks on TCP servers to see if the problem disappears, that could help.

I will try that.

> Yes it helps, thank you very much. I'm now back to trying to understand what is happening, and will keep you updated.
>
> In case you're willing to do more intrusive debugging (e.g. with gdb), I might have a few tests to suggest. But I don't want to abuse; I understand that it's a production platform.

I can try debugging with gdb, as this system isn't yet in production but will be soon. I don't have experience using gdb for debugging. Is there a specific command you want me to run?
Re: Seeing high CPU usage with obscene number of calls to epoll_wait
On Thu, Dec 06, 2012 at 12:06:05PM +0100, Bryan Berry wrote:
> I can try debugging with gdb, as this system isn't yet in production but will be soon.

Great!

> I don't have experience using gdb for debugging. Is there a specific command you want me to run?

Not a specific one yet, but I'm currently working on adding some info, and I will then suggest some things for you to try out. Keep in mind that once the program is interrupted by the debugger, nothing works anymore, so the service will be disrupted even for potential developers.

Willy
Re: Fwd: Haproxy hangs for one minute on config reload
Hi Willy,

Nice to know that a fix is on its way. Looking forward to that. We are in the process of migrating from Windows/WebSphere and have another twenty-five Jetty apps that will run in this environment. With health checks from all these applications, the problem might be bigger than it is today.

I have put "option nolinger" in all the backends with backend checks in our test environment. This change will be merged into production on Monday, but it might take some time before we know for sure whether it has improved the situation. There is only one week left to make changes before Christmas, so I am not sure how many reloads there will be before next year.

Thanks for the great help so far. I will update you as soon as we get five or more successful reloads (or, in the worst case, a reload that hangs for one minute again).

Regards
Terje

-----Original Message-----
From: Willy Tarreau [mailto:w...@1wt.eu]
Sent: 5 December 2012 22:43
To: Borgen, Terje
Cc: haproxy@formilux.org
Subject: Re: Fwd: Haproxy hangs for one minute on config reload

Hi Terje,

On Wed, Dec 05, 2012 at 09:33:19AM +0100, Borgen, Terje wrote:
> Hi Willy,
> Thanks for your quick response. I think you might be onto something here. We have a similar setup with haproxy using port 80 and have never experienced this problem in that environment.

OK.

> /proc/sys/net/ipv4/ip_local_port_range says 32768-61000, so nothing special here. We have another similar problem when restarting the Jetty servers on the same server. We always get an error saying that the port is in use, and we have to wait one minute before it can start again. The Jetty ports (as you can see in the config) are also outside the ip_local_port_range. But this might be another problem, since it happens on every restart.

Yes, typically a listening port bound without SO_REUSEADDR. Very common, in fact.

> Some additional info:
> - We have two identical servers running Apache HTTP Server, haproxy and Jetty servers. Most of the traffic hits the main server, and the reload problem has never happened on the failover server. So this problem might be traffic-related.
> - For one week we changed the inter parameter on the clusters from the default 2000 to 6, leaving rise/fall as default. In that period the problem never occurred.

OK, I see. The health checks are causing too many TIME_WAIT sockets. This issue was very recently fixed (in 1.5-dev14): haproxy now closes health check sockets with a TCP reset, thus avoiding the TIME_WAIT. I'm pretty sure they're the ones causing the issue, as I experienced a similar one recently (which is why I fixed it :-)). I have not backported this yet, as I wanted to keep an observation period. However, you can try something: put "option nolinger" in your BACKENDS, not your frontends, otherwise some clients will experience truncated responses!!! All backend connections (including checks) will be closed by a reset, and you should see far fewer TIME_WAIT sockets between haproxy and the servers.

Regards,
Willy
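Willy's two diagnoses map onto two standard socket options:

- **TIME_WAIT from checks.** As far as I understand, `option nolinger` amounts to setting SO_LINGER with a zero timeout on the connection. close() then sends a TCP RST and that side skips TIME_WAIT.
- **Jetty "port in use".** This is the classic case of a listener bound without SO_REUSEADDR.

The C sketch below shows both at the socket-API level. The helper names are hypothetical, and this is not haproxy's or Jetty's code.

```c
#define _GNU_SOURCE
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Abortive close, roughly what "option nolinger" does to a connection:
 * with l_onoff = 1 and l_linger = 0, close() drops any unsent data and
 * sends a TCP RST instead of a FIN, so this side never enters TIME_WAIT.
 * Pending output is lost, which is why it belongs on backends only. */
static int close_with_reset(int fd)
{
    struct linger nolinger = { .l_onoff = 1, .l_linger = 0 };

    if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &nolinger, sizeof(nolinger)) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}

/* Loopback listener with SO_REUSEADDR set before bind(), so a restarted
 * server can re-bind its port while old connections sit in TIME_WAIT.
 * Port 0 lets the kernel choose a free port. */
static int listen_reuseaddr(uint16_t port)
{
    int one = 1;
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) < 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

After close_with_reset(), the peer's next recv() fails with ECONNRESET instead of returning an orderly EOF. That is the "truncated responses" risk Willy warns about if the option is applied to client-facing connections.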
Re: domain based load balancing
On Thu, Dec 06, 2012 at 04:55:42PM +0100, Baptiste wrote:
> you could specify your domain across multiple lines:
> acl foo hdr(Host) name1 name2
> acl foo hdr(Host) name3 name4
> etc...
> a logical OR is applied if ACLs share the same name.

And if there are *that* many, load them from a file; it will be more manageable:

acl foo hdr(Host) -f names.txt

Cheers,
Willy
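Baptiste's point is that several `acl` lines sharing a name are ORed together. Willy's `-f names.txt` simply moves that pattern list into a file. The C sketch below mimics this behaviour:

- It reads one hostname per line from a stream.
- It skips empty lines and lines starting with `#`, which is my understanding of how haproxy treats pattern files.
- It reports a match if the Host value equals any entry exactly (case-sensitive, as without the `-i` flag).

`struct host_list`, `load_names`, `host_matches` and `free_names` are hypothetical names. The linear scan is for illustration only and is not how haproxy stores patterns internally.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NAMES 1024

struct host_list {
    char *names[MAX_NAMES];
    size_t count;
};

/* Read one pattern per line, like "acl foo hdr(Host) -f names.txt".
 * Empty lines and lines starting with '#' are skipped. */
static int load_names(FILE *in, struct host_list *list)
{
    char line[256];

    list->count = 0;
    while (list->count < MAX_NAMES && fgets(line, sizeof(line), in)) {
        line[strcspn(line, "\r\n")] = '\0';   /* strip end of line */
        if (line[0] == '\0' || line[0] == '#')
            continue;
        list->names[list->count] = strdup(line);
        if (!list->names[list->count])
            return -1;
        list->count++;
    }
    return 0;
}

/* Logical OR over all patterns: any exact match wins. */
static bool host_matches(const struct host_list *list, const char *host)
{
    for (size_t i = 0; i < list->count; i++)
        if (strcmp(list->names[i], host) == 0)
            return true;
    return false;
}

static void free_names(struct host_list *list)
{
    for (size_t i = 0; i < list->count; i++)
        free(list->names[i]);
    list->count = 0;
}
```

Putting `name1` through `name4` in the file is equivalent to the two `acl foo` lines in Baptiste's example.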
Re: SYN_RECEIVED / SMTP / Transparent mode
Hi,

On 06.12.2012 16:53, Ozgur Tas wrote:
> Haproxy 1.4.22 on CENTOS 6.3 (kernel 2.6.32-279.14.1) on HYPER-V (with Hyper-V integration)

I know CentOS a little bit and can confirm that this is working.

> Hi, I'm trying to get transparent proxy working. However, looking at my TDC38 (hub) server for connections on port 25, I do see the correct client IP (10.10.0.223) where I'm telnetting from on port 25, but it just shows SYN_RECEIVED and does not establish a connection. I've been looking for a solution for a while and cannot find an answer. (iptables is disabled and I'm not looking to use it in my setup.)

Without iptables you won't get TPROXY, aka transparent proxy, to work. On the machine where haproxy runs you need firewall rules like these:

--
#dns
-A RH-Firewall-1-INPUT -p tcp --dport 25 -j ACCEPT
# -A RH-Firewall-1-INPUT -m udp -p udp --dport 1194 -j ACCEPT
-A RH-Firewall-1-INPUT -j REJECT --reject-with icmp-host-prohibited
COMMIT
*mangle
-N DIVERT
-A PREROUTING -p tcp -m socket -j DIVERT
-A DIVERT -j MARK --set-mark 1
-A DIVERT -j ACCEPT
COMMIT
*nat
:PREROUTING ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
-I POSTROUTING -d 0/0 -j MASQUERADE -o eth0
COMMIT
--

where eth0 is the external and eth1 the internal interface. In /etc/rc.local I have these lines:

--
ip rule add fwmark 1 lookup 100
ip route add local 0.0.0.0/0 dev lo table 100
--

All traffic that runs through TPROXY needs to be routed locally back to haproxy.

> Thanks,
> Oz
>
> On TDC38 (10.10.0.63 - hub server):
> C:\>netstat -ano | findstr 223
> TCP  10.10.0.63:25     10.10.0.223:47217  SYN_RECEIVED  4044   (<-- here)
> TCP  10.10.0.63:443    10.10.0.107:56223  ESTABLISHED   4
> TCP  10.10.0.63:59531  10.10.0.107:42231  ESTABLISHED   3652
> TCP  10.10.0.63:59531  10.10.0.107:44223  ESTABLISHED   3652
> TCP  10.10.0.64:3389   10.10.0.223:60206  ESTABLISHED   5072
>
> HAPROXY config:
> global
>   #uid 99
>   #gid 99
>   daemon
>   stats socket /var/run/haproxy.stat mode 600 level admin
>   maxconn 4
>   ulimit-n 81000
>   pidfile /var/run/haproxy.pid
>
> defaults
>   #log global
>   mode http
>   retries 3
>   contimeout 4000
>   clitimeout 360
>   srvtimeout 360
>   balance roundrobin
>   option tcp-smart-accept
>   option tcp-smart-connect
>
> frontend ft_smtp
>   mode tcp
>   bind 0.0.0.0:25
>   #source 0.0.0.0 usesrc clientip
>   #log global
>   #option tcplog
>   #tcp-request inspect-delay 30s
>   #acl content_present req_len gt 0
>   #tcp-request content reject if content_present
>   default_backend bk_smtp

On the frontend you don't need any "source 0.0.0.0" line.

> backend bk_smtp
>   mode tcp
>   balance roundrobin
>   source 0.0.0.0 usesrc clientip
>   log global
>   option tcplog
>   option smtpchk HELO morrisonhershfield.com
>   default-server inter 3s rise 2 fall 3
>   server TDC38 10.10.0.63:25 check
>
> listen stats :7000
>   stats enable
>   stats show-node TDCLB01
>   stats show-desc MASTER node for Exchange
>   #stats hide-version
>   #stats realm Haproxy\ Statistics
>   stats uri /
>   stats refresh 5s
>   option httpclose

Please ensure that TDC38 (10.10.0.63) uses the haproxy machine as its default gateway.

Hope this helps you,
cheers
thomas
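On the haproxy side, `source 0.0.0.0 usesrc clientip` means the outgoing connection to the server is bound to the client's address before connecting. On Linux TPROXY this relies on the IP_TRANSPARENT socket option. That option lets a socket bind to a non-local address and needs privileges (CAP_NET_ADMIN). The iptables mark and the `ip rule`/`ip route` lines are what steer the server's replies back into that socket.

The C sketch below shows only the socket part. `open_spoofed_socket` is a hypothetical helper, and the address used in the test (192.0.2.10) is a documentation address. haproxy's own code differs in detail.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef IP_TRANSPARENT
#define IP_TRANSPARENT 19   /* value from <linux/in.h> */
#endif

/* Open a TCP socket bound to `client_ip` (possibly a non-local address)
 * with an ephemeral port, as a transparent proxy does before connect()ing
 * to the server. Returns the fd, or -1 with errno set: EINVAL for a bad
 * address, EPERM when the process lacks the needed capability. */
static int open_spoofed_socket(const char *client_ip)
{
    int one = 1;
    struct sockaddr_in src;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&src, 0, sizeof(src));
    src.sin_family = AF_INET;
    src.sin_port = 0;                       /* kernel picks the port */
    if (inet_pton(AF_INET, client_ip, &src.sin_addr) != 1) {
        close(fd);
        errno = EINVAL;
        return -1;
    }

    if (setsockopt(fd, IPPROTO_IP, IP_TRANSPARENT, &one, sizeof(one)) < 0 ||
        bind(fd, (struct sockaddr *)&src, sizeof(src)) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

Binding alone only makes the SYN leave with the client's address. The server must also send its SYN-ACK back through the haproxy host (hence the default-gateway advice), otherwise the connection stays in SYN_RECEIVED as in the netstat output.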
Re: domain based load balancing
Hi Willy,

Thank you for that great software!

On Wed, Dec 5, 2012 at 8:11 PM, Willy Tarreau w...@1wt.eu wrote:
>> Is there a better way of accomplishing this?
> Have you thought about hashing the Host header (for example)? Just an idea, I don't know how that fits your need.

Can you give an example of that idea?

Regards,
Alexandre
RE: domain based load balancing
balance hdr(host)

It would round robin, but stick anything with the same value to the first server to get the request for that domain.

-----Original Message-----
From: Alexandre Biancalana [mailto:biancal...@gmail.com]
Sent: Thursday, December 6, 2012 12:51 PM
To: Willy Tarreau
Cc: Daniel Alfonso; haproxy@formilux.org
Subject: Re: domain based load balancing

[...]
Re: domain based load balancing
On Thu, Dec 06, 2012 at 01:45:51PM -0500, Daniel Alfonso wrote:
> balance hdr(host)
> It would round robin, but stick anything with the same value to the first server to get the request for that domain.

More precisely, it would not round robin: it would hash the value of the Host header and use the result to select a server.

Regards,
Willy
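Willy's correction is the key point: `balance hdr(host)` is deterministic hashing, not round robin with stickiness. The C sketch below illustrates the idea with a hypothetical `pick_server`. It hashes the Host value with FNV-1a and takes the result modulo the number of servers.

FNV-1a is chosen here only for illustration. haproxy uses its own hash function and server map, so the server it picks for a given name will differ. Only the property "same Host, same server while the farm is unchanged" carries over.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a over the header value (illustrative choice). */
static uint32_t fnv1a(const char *s)
{
    uint32_t h = 2166136261u;
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;
    }
    return h;
}

/* Map-based selection: the same Host always yields the same index as
 * long as nservers does not change; no round robin is involved. */
static size_t pick_server(const char *host, size_t nservers)
{
    return fnv1a(host) % nservers;
}
```

With this map-based scheme, adding or removing a server changes `nservers` and remaps most hostnames. Limiting that effect is what haproxy's `hash-type consistent` setting is for.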
Re: Seeing high CPU usage with obscene number of calls to epoll_wait
Thanks a lot, Willy. I will try it out tomorrow and let you know.

On Dec 6, 2012 8:31 PM, Willy Tarreau w...@1wt.eu wrote:
> Hi Bryan,
>
> I have some good news. I can reliably reproduce it and I have a workaround. It's not a fix yet until the issue is completely qualified. It happens with error delivery that tends to propagate via the send() channel without disabling send activity because there is nothing to send, hence the loop afterwards. The temporary fix consists in leaving the ERR check only on the recv path and disabling it on the send path:
>
> src/ev_epoll.c:
> -        if (fdtab[fd].ev & (FD_POLL_OUT|FD_POLL_ERR))
> +        if (fdtab[fd].ev & (FD_POLL_OUT/*|FD_POLL_ERR*/))
>                  fd_ev_set(fd, DIR_WR);
>
> At least your CPU issues will go away, but I need to completely understand the root cause of the issue before committing this.
>
> Hoping this helps,
> Willy
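The deliberately simplified C model below shows why dropping FD_POLL_ERR from the write-side test matters. It is not haproxy's ev_epoll.c: the flag values are made up, and the two functions only reproduce the condition before and after the patch.

- **Original test:** an error-only event re-enables the write direction even though there is nothing to send. Presumably this keeps the fd active and the poller on a zero timeout.
- **Patched test:** only a real POLL_OUT enables the write direction.

```c
#include <stdbool.h>

/* Hypothetical flag values; haproxy defines its own FD_POLL_* constants. */
enum {
    FD_POLL_IN  = 0x01,
    FD_POLL_OUT = 0x04,
    FD_POLL_ERR = 0x08,
};

/* Condition before the patch: an error alone enables the write side. */
static bool wants_write_orig(unsigned ev)
{
    return (ev & (FD_POLL_OUT | FD_POLL_ERR)) != 0;
}

/* Condition after the patch: only writability enables the write side;
 * per Willy's description, errors are left to the recv path. */
static bool wants_write_patched(unsigned ev)
{
    return (ev & FD_POLL_OUT) != 0;
}
```

For an event carrying only FD_POLL_ERR, the two functions disagree. That case is exactly the one the workaround changes.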
Re: ssl for ver 1.5 question
Thanks, I have it working.

Alex

Baptiste wrote:
> Hi Alex,
>
> By default, IIS will export the cert in PKCS12 format; you have to convert it into PEM format. When exporting, don't forget to export the private key as well:
>
> openssl pkcs12 -in key_and_cert.pfx -out key_andcert.pem -nodes
>
> cheers
>
> On Thu, Dec 6, 2012 at 2:43 PM, DeMarco, Alex alex.dema...@suny.edu wrote:
>> Hello,
>> I am trying to set up a test of haproxy terminating SSL for an IIS website. The IIS site already has an SSL cert bound to it. Do I just export the cert from IIS and then point haproxy to the cert file? Like:
>> bind 0.0.0.0:443 ssl crt ./mycert.crt prefer-server-ciphers
>> I am not well versed in SSL tech, so thanks for all help.
>> - Alex
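A common stumbling block is the PEM file itself: the file given to `crt` must contain the certificate and its private key together. That is presumably also why `-nodes` (leave the key unencrypted) matters in Baptiste's command.

The C sketch below is a hypothetical sanity check, not part of haproxy or OpenSSL. It scans a PEM stream for two kinds of header line:

- a `-----BEGIN CERTIFICATE-----` line;
- a `-----BEGIN ... PRIVATE KEY-----` line that is not the PKCS#8 `ENCRYPTED` form.

It does not parse or validate the blocks themselves.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical check: true if the stream holds at least one certificate
 * block and one unencrypted private key block ("PRIVATE KEY",
 * "RSA PRIVATE KEY", ...). Only the PKCS#8 "ENCRYPTED PRIVATE KEY"
 * header is recognised as encrypted; legacy encrypted RSA keys are not. */
static bool pem_has_cert_and_key(FILE *in)
{
    char line[512];
    bool cert = false, key = false;

    while (fgets(line, sizeof(line), in)) {
        if (strncmp(line, "-----BEGIN CERTIFICATE-----", 27) == 0)
            cert = true;
        else if (strncmp(line, "-----BEGIN ", 11) == 0 &&
                 strstr(line, "PRIVATE KEY-----") != NULL &&
                 strstr(line, "ENCRYPTED") == NULL)
            key = true;
    }
    return cert && key;
}
```

Running the check on the `.pem` produced by `openssl pkcs12 ... -nodes` should return true. A certificate-only export from IIS would return false.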