Re: Seeing high CPU usage with obscene number of calls to epoll_wait

2012-12-06 Thread Bryan Berry
On Wed, Dec 5, 2012 at 10:58 PM, Willy Tarreau w...@1wt.eu wrote:

 Hi Bryan,

Thanks a lot for your help Willy, I really appreciate it. And thanks for
haproxy, it is a fantastic tool.


 On Wed, Dec 05, 2012 at 04:22:45PM +0100, Bryan Berry wrote:

 Does this stay that way for a long time ? I mean, could it be something
 like a health check not getting a response (eg: just a few seconds) or
 does that seem to match your client/server timeout (500s in your case) ?


It does stay high. Here is a graph of CPU usage over the last 24 hours; the
left-hand axis shows % of CPU time:
https://docs.google.com/open?id=0BzPvBvLIIq7NV0QtTkliM3Yxenc

The high cpu usage doesn't appear to correlate to any HTTP 500 status codes
and I wouldn't expect it to since it seems related to the TCP mode proxying
of our databases.



 Could you please add level admin on your stats socket, restart and issue
 a show sess all on the stats socket when the issue happens, and capture
 the output. It will help *a lot*. The best way to do it is to redirect it
 to a file, for example like this :

echo "show sess all" | socat stdio /var/run/haproxy.sock > show-sess.out
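If socat is not available, the same query can be sent from a small C program. This is a minimal sketch, assuming the stats socket path used above and that haproxy, in its default non-interactive mode, closes the connection after answering a single command. stats_query() is a hypothetical helper, not part of haproxy.

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: sends one newline-terminated command to a UNIX
 * stats socket and copies the whole reply to `out`, like
 * `echo "cmd" | socat stdio <path>`. Returns 0 on success, -1 on error. */
int stats_query(const char *path, const char *cmd, FILE *out)
{
    struct sockaddr_un addr;
    char buf[4096];
    size_t len = strlen(cmd);
    ssize_t n;
    int fd;

    if (strlen(path) >= sizeof(addr.sun_path))
        return -1;                       /* path too long for sun_path */

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        write(fd, cmd, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }

    /* Assumed behavior: haproxy closes the connection after its answer,
     * so reading until EOF collects the complete output. */
    while ((n = read(fd, buf, sizeof(buf))) > 0)
        fwrite(buf, 1, (size_t)n, out);

    close(fd);
    return n < 0 ? -1 : 0;
}
```

Called as stats_query("/var/run/haproxy.sock", "show sess all\n", f) with f opened for writing on show-sess.out, it should produce the same file as the socat pipeline.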


done

https://docs.google.com/document/d/1A3qEq0RmlAtG-fzKJDbZgB0pvmYJnlUuJ0T2IrpBGGg/edit

Here are the IP addresses of the database backend servers. Note they are
not the originals but have been munged to protect the innocent.

168.100.2.181, 168.100.2.237, 168.100.2.195, 168.100.2.183

Just by playing with strace, it looks like the following call is being made
over and over again with a value of 0 for wait_time:

status = epoll_wait(0, {}, 26, 0)

Line 133, ev_epoll.c
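The strace line reads as: epoll fd 0, no events returned, room for 26 events, and a timeout of 0 ms. A zero timeout makes epoll_wait() return immediately even when nothing is ready, so a loop that keeps computing a zero timeout never sleeps and burns a full CPU. A minimal Linux sketch contrasting the two cases; time_empty_polls() is a hypothetical helper, not haproxy code.

```c
#define _GNU_SOURCE
#include <sys/epoll.h>
#include <time.h>
#include <unistd.h>

/* Milliseconds elapsed between two CLOCK_MONOTONIC readings. */
static double elapsed_ms(struct timespec a, struct timespec b)
{
    return (double)(b.tv_sec - a.tv_sec) * 1e3 +
           (double)(b.tv_nsec - a.tv_nsec) / 1e6;
}

/* Calls epoll_wait() `iterations` times on an epoll set with no registered
 * fds, using `timeout_ms` each time (maxevents 26, as in the strace line).
 * Returns the total time spent in milliseconds, or -1.0 on error. */
double time_empty_polls(int iterations, int timeout_ms)
{
    struct epoll_event events[26];
    struct timespec start, end;
    int epfd = epoll_create1(0);

    if (epfd < 0)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < iterations; i++) {
        /* Nothing is registered, so every call ends by timing out (returns 0). */
        if (epoll_wait(epfd, events, 26, timeout_ms) != 0) {
            close(epfd);
            return -1.0;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    close(epfd);
    return elapsed_ms(start, end);
}
```

Thousands of zero-timeout calls typically complete in well under a second, while two calls with a 50 ms timeout take at least 100 ms: the first pattern is what a spinning event loop looks like.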

hope this helps! thanks again for your assistance


Re: Any interest in option dontlogcrap ?

2012-12-06 Thread Baptiste
Hi Willy,

I would prefer to trigger logging through an ACL, like
 dolog if { hdr(Host) www.domain.com }
or
 dolog unless { path_end .jpg .png }

:)

cheers


On Thu, Dec 6, 2012 at 1:55 AM, Willy Tarreau w...@1wt.eu wrote:
 SBD's mail made me think about what we'd want not to log, it's basically
 pollution. I *do* like to log pollution as long as it's low. But in his
 situation, it can become a nuisance.

 I was thinking that maybe we could have an option dontlogcrap or something
 like this, which would prevent haproxy from logging requests it does not
 completely parse.

 Do others think this could be useful or am I overlooking something ?

 Thanks,
 Willy





Re: Seeing high CPU usage with obscene number of calls to epoll_wait

2012-12-06 Thread Baptiste

Hi Willy,

I got the same issue at a customer's site yesterday with long-term TCP
connections (Exchange 2010 load balancing).
There were roughly 6 open connections on the LB at that time.

cheers



Re: Any interest in option dontlogcrap ?

2012-12-06 Thread Willy Tarreau
Hi Baptiste,

On Thu, Dec 06, 2012 at 10:21:35AM +0100, Baptiste wrote:
 Hi Willy,
 
 I would prefer trigger logging through an ACL, like
  dolog if { hdr(Host) www.domain.com }
 or
  dolog unless { path_end .jpg .png }

It would rather be "dontlog", as was initially planned, because logging is
enabled by default. However, since this was put on the roadmap long ago, I
have realized that several users want it at different places and that we need
at least 3 hooks :
  - tcp-request content
  - http-request
  - http-response

SBD suggested off-list that we could also play with the logging level instead
of just on/off, which kind of makes sense to me.

Cheers,
Willy




Re: Seeing high CPU usage with obscene number of calls to epoll_wait

2012-12-06 Thread Bryan Berry
On Thu, Dec 6, 2012 at 11:16 AM, Willy Tarreau w...@1wt.eu wrote:

 Hi Bryan,

 On Thu, Dec 06, 2012 at 10:10:18AM +0100, Bryan Berry wrote:
  It does stay high, here is a graph of cpu performance over the last 24
  hours, the left-hand side are % of CPU time
  https://docs.google.com/open?id=0BzPvBvLIIq7NV0QtTkliM3Yxenc

 OK so since the graph does not commonly show 100%, I think that in
 practice it's oscillating quickly between 100 and zero and is averaged
 on the graph.

It is not oscillating; I have watched htop (like top) and the CPU usage is
stuck at 100% for extended periods.


  The high cpu usage doesn't appear to correlate to any HTTP 500 status codes
  and I wouldn't expect it to since it seems related to the TCP mode proxying
  of our databases.

 At first glance in your trace, all sessions seem correct, so I suspect that
 this is related to the TCP checks. Baptiste encountered a similar issue with
 another user in TCP mode with raw TCP checks. I'll see if I can reproduce any
 such issue and/or find an explanation.


 In the meantime, if you're adventurous enough to try to disable checks on
 TCP servers to see if the problem disappears, that could help.

I will try that.


 Yes it helps, thank you very much. I'm now back to trying to understand
 what is happening, and will keep you updated. In case you volunteer
 for more intrusive debugging (eg: with gdb), I might have a few tests
 to suggest. But I don't want to ask too much, as I understand that it's a
 production platform.


I can try debugging w/ gdb as this system isn't yet in production but will
be soon.

I don't have experience using gdb for debugging. Is there a specific
command u want me to run?


Re: Seeing high CPU usage with obscene number of calls to epoll_wait

2012-12-06 Thread Willy Tarreau
On Thu, Dec 06, 2012 at 12:06:05PM +0100, Bryan Berry wrote:
 I can try debugging w/ gdb as this system isn't yet in production but will
 be soon.

Great!

 I don't have experience using gdb for debugging. Is there a specific
 command u want me to run?

Nothing specific yet, but I'm currently working on adding some info and
will then suggest a few things to try out. Keep in mind that once the
program is interrupted by the debugger, nothing works anymore, so the
service will be disrupted, even for potential developers.

Willy




RE: FW: Haparoxy hangs in one minute on config reload

2012-12-06 Thread Borgen, Terje
Hi Willy,
Nice to know that a fix is on its way. Looking forward to it. We are in the
process of migrating from Windows/WebSphere and have another twenty-five
Jetty apps that will run in this environment. With health checks from all these
applications, the problem might become bigger than it is today.

I have put option nolinger in all the backends that have health checks in our
test environment. This change will be merged into production on Monday, but it
might take some time before we know for sure whether it has improved the
situation. There is only one week left to make changes before Christmas, so I am
not sure how many reloads there will be before next year.

Thanks for the great help so far. I will update you as soon as we get five or more
successful reloads (or, worst case, a reload that hangs for one minute again).

Regards
Terje

-Original Message-
From: Willy Tarreau [mailto:w...@1wt.eu] 
Sent: 5 December 2012 22:43
To: Borgen, Terje
Cc: haproxy@formilux.org
Subject: Re: FW: Haparoxy hangs in one minute on config reload

Hi Terje,

On Wed, Dec 05, 2012 at 09:33:19AM +0100, Borgen, Terje wrote:
 Hi Willy,
 Thanks for your quick response.
 I think you might be onto something here. We have a similar setup with 
 haproxy using port 80 and have never experienced this problem in that 
 environment.

OK.

 /proc/sys/net/ipv4/ip_local_port_range says 32768-61000, so nothing 
 special here. We have another similar problem when restarting the 
 Jetty-servers on the same server. We always get an error saying that 
 the port is in use and we have to wait one minute before it can start 
 again. The Jetty ports (as you can see in the config) are also outside 
 the ip_local_port_range. But this might be another problem since it happens 
 every restart.

Yes, typically a listening port bound without SO_REUSEADDR. Very common in fact.
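On Linux, a listener that sets SO_REUSEADDR before bind() can be restarted while old connections on its port are still in TIME_WAIT; without it, bind() fails with EADDRINUSE until they expire, which matches the one-minute wait described above. A minimal IPv4 sketch in C; listen_reuseaddr() is a hypothetical helper (Jetty itself is Java, so there the equivalent would be a reuse-address setting on its server socket).

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns a listening TCP socket bound to addr:port,
 * or -1 on error. SO_REUSEADDR must be set BEFORE bind() to have effect. */
int listen_reuseaddr(const char *addr, unsigned short port)
{
    struct sockaddr_in sin;
    int one = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) < 0)
        goto fail;

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_port = htons(port);
    if (inet_pton(AF_INET, addr, &sin.sin_addr) != 1)
        goto fail;                       /* not a dotted-quad IPv4 address */
    if (bind(fd, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
        listen(fd, 128) < 0)
        goto fail;
    return fd;

fail:
    close(fd);
    return -1;
}
```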

 Some additional info:
 - We have two identical servers running apache http server, haproxy 
 and jetty servers. Most of the traffic hits the main server, and the 
 reload problem have never happened on the failover server. So this 
 problem might be traffic-related.
 - For one week we changed the inter-parameter on the clusters from 
 default 2000 to 6 leaving rise/fall as default. In that period the 
 problem never occurred.

OK, I see. The health checks are causing too many time-wait sockets.
This issue was very recently fixed (in 1.5-dev14) as haproxy now closes health 
check sockets with a TCP reset, thus avoiding the TIME_WAIT. I'm pretty sure 
they're the ones causing the issue, as I've experienced a similar one recently 
(which is why I fixed it :-)).

I have not backported this yet as I wanted to keep an observation period.

However you can try something : put option nolinger in your BACKENDS, not 
your frontends, otherwise some clients will experience truncated responses!!! 
All backend connections (including checks) will be closed by a reset and you 
should see far fewer TIME_WAIT sockets between haproxy and the servers.
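As I understand it, the mechanism behind option nolinger is the SO_LINGER socket option with a zero timeout: close() then aborts the connection with an RST instead of a FIN, so the closing side keeps no TIME_WAIT entry. A minimal sketch with hypothetical helper names, not haproxy's own code:

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: enables "linger with zero timeout" on fd, so that a
 * later close() resets the connection instead of closing it gracefully.
 * Returns 0 on success, -1 on error. */
int set_nolinger(int fd)
{
    struct linger nolinger = { .l_onoff = 1, .l_linger = 0 };
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &nolinger, sizeof(nolinger));
}

/* Closes fd with a reset (abortive close). Any data still queued in the
 * send buffer is discarded. */
int close_with_reset(int fd)
{
    if (set_nolinger(fd) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

The discarded send buffer is also why this must stay on the backend side: on a frontend, an abortive close can drop the tail of a response that has not yet reached the client.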

Regards,
Willy




Re: domain based load balancing

2012-12-06 Thread Willy Tarreau
On Thu, Dec 06, 2012 at 04:55:42PM +0100, Baptiste wrote:
 you could specify your domains across multiple lines:
 acl foo hdr(Host) name1 name2
 acl foo hdr(Host) name3 name4
 etc...
 
 a logical OR is applied if ACLs share the same name.

And if there are *that* many, load them from a file, it will be more
manageable :

   acl foo hdr(Host) -f names.txt
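Whether the patterns sit on one line, on several lines sharing the ACL name, or in a file, the ACL matches as soon as any single pattern matches. A minimal C sketch of that OR logic, assuming exact case-sensitive comparison (as for hdr(Host) without -i) and a file with one name per line in which blank lines and lines starting with '#' are ignored. load_patterns() and acl_match() are hypothetical names, not haproxy functions.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reads one pattern per line from f into pats (at most max entries).
 * Blank lines and '#' comments are skipped; lines longer than 255 bytes
 * are split, which is acceptable for a sketch. Returns the count loaded.
 * The caller frees each pats[i]. */
size_t load_patterns(FILE *f, char *pats[], size_t max)
{
    char line[256];
    size_t n = 0;

    while (n < max && fgets(line, sizeof(line), f)) {
        line[strcspn(line, "\r\n")] = '\0';   /* strip the line ending */
        if (line[0] == '\0' || line[0] == '#')
            continue;
        pats[n] = strdup(line);
        if (!pats[n])
            break;
        n++;
    }
    return n;
}

/* Logical OR over all patterns: returns 1 if any of them equals host. */
int acl_match(const char *host, char *const pats[], size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(host, pats[i]) == 0)
            return 1;
    return 0;
}
```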

Cheers,
Willy




Re: SYN_RECEIVED / SMTP / Transparent mode

2012-12-06 Thread Thomas Heil
Hi,

On 06.12.2012 16:53, Ozgur Tas wrote:
 Haproxy 1.4.22  on CENTOS 6.3 (kernel 2.6.32-279.14.1 ) on HYPER-V  (with 
 Hyper-V integration)
 -
I know Centos a little bit and can confirm that this is working.
 Hi,
 I'm trying to get transparent proxy working, however looking at my TDC38 
 (hub) server for connections on port 25, I do see the correct client IP 
 (10.10.0.223) where im telneting from on port 25, but just shows 
 SYN_RECEIVED, does not establish a connection.  Been looking for a solution 
 for a while and cannot find an answer.   (iptables is disabled and not 
 looking to use it on my setup).
Without iptables you won't get TPROXY (aka transparent proxy) to work. On
the machine where haproxy runs, you need firewall rules like
these:
--
#smtp
-A RH-Firewall-1-INPUT -p tcp --dport 25 -j ACCEPT
#
-A RH-Firewall-1-INPUT -m udp -p udp --dport 1194 -j ACCEPT
-A RH-Firewall-1-INPUT -j REJECT --reject-with icmp-host-prohibited
COMMIT
*mangle
-N DIVERT
-A PREROUTING -p tcp -m socket -j DIVERT
-A DIVERT -j MARK --set-mark 1
-A DIVERT -j ACCEPT
COMMIT
*nat
:PREROUTING ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
-I POSTROUTING -d 0/0 -j MASQUERADE -o eth0
COMMIT
--
where eth0 is the external and eth1 the internal interface.

In /etc/rc.local I have these lines:
--
ip rule add fwmark 1 lookup 100
ip route add local 0.0.0.0/0 dev lo table 100
--
All traffic running through TPROXY needs to be routed locally back
to haproxy.
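The reason for the DIVERT rules and the local routing table: with "usesrc clientip", haproxy's outgoing connection to the server carries the client's IP as its source address, so the server's replies are addressed to the client and must be diverted back to the local socket. On Linux this relies, to my understanding, on the IP_TRANSPARENT socket option, which lets a socket bind to a non-local address and requires CAP_NET_ADMIN. A minimal sketch; transparent_socket() is a hypothetical helper, not haproxy's code.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef IP_TRANSPARENT
#define IP_TRANSPARENT 19   /* Linux value, for old headers */
#endif

/* Hypothetical helper: creates a TCP socket whose source address is the
 * client's IP, ready for connect() to the real server. Returns the fd, or
 * -1 with errno set (typically EPERM without CAP_NET_ADMIN). */
int transparent_socket(const char *client_ip)
{
    struct sockaddr_in src;
    int one = 1, saved;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    /* Allows bind() to an address that is not configured on this host. */
    if (setsockopt(fd, IPPROTO_IP, IP_TRANSPARENT, &one, sizeof(one)) < 0)
        goto fail;

    memset(&src, 0, sizeof(src));
    src.sin_family = AF_INET;
    src.sin_port = 0;                    /* let the kernel pick a source port */
    if (inet_pton(AF_INET, client_ip, &src.sin_addr) != 1) {
        errno = EINVAL;
        goto fail;
    }
    if (bind(fd, (struct sockaddr *)&src, sizeof(src)) < 0)
        goto fail;
    return fd;

fail:
    saved = errno;                       /* keep the original error for the caller */
    close(fd);
    errno = saved;
    return -1;
}
```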
 Thanks,
 Oz

 On TDC38(10.10.0.63 - hub server)
 C:\>netstat -ano | findstr 223
   TCP    10.10.0.63:25       10.10.0.223:47217   SYN_RECEIVED    4044   ( -- here)
   TCP    10.10.0.63:443      10.10.0.107:56223   ESTABLISHED     4
   TCP    10.10.0.63:59531    10.10.0.107:42231   ESTABLISHED     3652
   TCP    10.10.0.63:59531    10.10.0.107:44223   ESTABLISHED     3652
   TCP    10.10.0.64:3389     10.10.0.223:60206   ESTABLISHED     5072
 =

 HAPROXY config::

 global
 #uid 99
 #gid 99
 daemon
 stats socket /var/run/haproxy.stat mode 600 level admin
 maxconn 4
 ulimit-n 81000
 pidfile /var/run/haproxy.pid

 defaults
  #log global
  mode http
  retries 3
 contimeout  4000
 clitimeout  360
 srvtimeout  360
 balance roundrobin
 option tcp-smart-accept
 option tcp-smart-connect

 frontend ft_smtp
   mode tcp
   bind 0.0.0.0:25 
   #source 0.0.0.0 usesrc clientip
   #log global
   #option tcplog
   #tcp-request inspect-delay 30s
   #acl content_present req_len gt 0
   #tcp-request content reject if content_present
   default_backend bk_smtp
On the frontend you don't need any "source 0.0.0.0" line.
 backend bk_smtp
   mode tcp
   balance roundrobin
   source 0.0.0.0 usesrc clientip
   log global
   option tcplog
   option smtpchk HELO morrisonhershfield.com
   default-server inter 3s rise 2 fall 3
   server TDC38 10.10.0.63:25 check

 listen stats :7000
 stats   enable
   stats show-node TDCLB01
   stats show-desc MASTER node for Exchange
   #stats hide-version
   #stats realm Haproxy\ Statistics
 stats   uri /
 stats refresh 5s
 option  httpclose




Please ensure that the server TDC38 (10.10.0.63) uses the haproxy machine as
its default gateway.

hope this helps you,

cheers
thomas



Re: domain based load balancing

2012-12-06 Thread Alexandre Biancalana
Hi Willy,

  Thank you for that great software !

On Wed, Dec 5, 2012 at 8:11 PM, Willy Tarreau w...@1wt.eu wrote:

 Is there a better way of accomplishing this?

 Have you thought about hashing the Host header (for example) ? Just an
 idea, I don't know how that fits your need.

Can you give an example of that idea ?

Regards,
Alexandre



RE: domain based load balancing

2012-12-06 Thread Daniel Alfonso
balance hdr(host)

it would round robin but stick anything with the same value to the first server to 
get the request for that domain




Re: domain based load balancing

2012-12-06 Thread Willy Tarreau
On Thu, Dec 06, 2012 at 01:45:51PM -0500, Daniel Alfonso wrote:
 balance hdr(host)
 
 it would round robin but sticky anything with the same value to first server 
 to get the request for that domain

More precisely, it would not round robin: it would hash the value of the
Host header and use the result to select a server.
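A minimal sketch of the idea with all servers at equal weight: hash the Host value and reduce it modulo the number of servers, so a given domain keeps landing on the same server as long as the server list does not change. The sdbm-style hash and pick_server() are illustrative assumptions, not haproxy's exact code.

```c
#include <stddef.h>

/* sdbm-style string hash (illustrative choice of hash function). */
static unsigned long hash_str(const char *s)
{
    unsigned long h = 0;
    while (*s)
        h = (unsigned char)*s++ + (h << 6) + (h << 16) - h;
    return h;
}

/* Maps a Host value onto one of nb_srv servers (nb_srv must be > 0).
 * Same input, same server list: same result, with no stored state. */
size_t pick_server(const char *host, size_t nb_srv)
{
    return (size_t)(hash_str(host) % nb_srv);
}
```

Unlike round robin, nothing is remembered between requests: stickiness comes purely from the hash being deterministic. As I recall, the haproxy documentation for hdr(<name>) also describes a round-robin fallback when the header is absent; this sketch does not model that.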

Regards,
Willy




Re: Seeing high CPU usage with obscene number of calls to epoll_wait

2012-12-06 Thread Bryan Berry
Thanks a lot Willy

I will try it out tomorrow and let you know
On Dec 6, 2012 8:31 PM, Willy Tarreau w...@1wt.eu wrote:

 Hi Bryan,

 I have some good news. I can reliably reproduce it and I have a
 workaround. It's not a fix yet, since the issue is not completely
 qualified. It happens with error delivery that tends to propagate via the
 send() channel without disabling send activity, because there is nothing
 to send, hence the loop afterwards.

 The temporary fix consists of leaving the ERR check only on the recv path
 and disabling it on the send path :

 src/ev_epoll.c:

 -   if (fdtab[fd].ev & (FD_POLL_OUT|FD_POLL_ERR))
 +   if (fdtab[fd].ev & (FD_POLL_OUT/*|FD_POLL_ERR*/))
 fd_ev_set(fd, DIR_WR);

 At least your CPU issues will go, but I need to completely understand the
 root cause of the issue before committing this.
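The kernel side of such a loop can be shown in isolation: epoll always reports EPOLLERR and EPOLLHUP, even when no events were requested, and in level-triggered mode it reports them again on every call until the application closes or deregisters the fd. If the code path that receives the event has nothing to do (here, nothing to send), the next epoll_wait() returns at once. A minimal Linux sketch of that general behavior, not haproxy's code and not necessarily the exact sequence found above; count_sticky_hup() is a hypothetical helper.

```c
#define _GNU_SOURCE
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Registers one end of a socketpair with an empty event mask, closes the
 * other end, then calls epoll_wait() `rounds` times with a 1 s timeout.
 * Returns how many calls reported EPOLLHUP, or -1 on setup error. */
int count_sticky_hup(int rounds)
{
    struct epoll_event ev = { 0 };   /* events == 0: no interest requested */
    int sv[2], epfd, hits = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    epfd = epoll_create1(0);
    ev.data.fd = sv[0];
    if (epfd < 0 || epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev) < 0) {
        if (epfd >= 0)
            close(epfd);
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    close(sv[1]);                    /* the peer goes away: sv[0] is hung up */

    for (int i = 0; i < rounds; i++) {
        struct epoll_event out;
        /* Level-triggered: the hang-up is reported again, immediately,
         * because nothing here closes or removes sv[0]. */
        if (epoll_wait(epfd, &out, 1, 1000) == 1 && (out.events & EPOLLHUP))
            hits++;
    }

    close(epfd);
    close(sv[0]);
    return hits;
}
```

Calling count_sticky_hup(3) should return 3 without waiting for the timeouts, since each call finds the hang-up still pending: a loop built this way spins, much like the strace output earlier in the thread.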

 Hoping this helps,
 Willy




Re: ssl for ver 1.5 question

2012-12-06 Thread DeMarco, Alex
Thanks, I have it working.
Alex
Baptiste wrote:
Hi Alex,

By default, IIS will export the cert in PKCS12 format; you have to
convert it into PEM format.
When exporting, don't forget to export the private key as well.

openssl pkcs12 -in key_and_cert.pfx -out key_and_cert.pem -nodes


cheers

On Thu, Dec 6, 2012 at 2:43 PM, DeMarco, Alex alex.dema...@suny.edu wrote:
 Hello,



 I am trying to set up a test of haproxy terminating SSL for an IIS website.
 The IIS site already has an SSL cert bound to it.  Do I just export the cert
 from IIS and then point haproxy to the cert file? Like:

 bind 0.0.0.0:443 ssl crt ./mycert.crt prefer-server-ciphers



 I am not well versed in SSL tech so thanks for all help.



 - Alex