Re: HAproxy constant memory leak

2015-02-06 Thread Georges-Etienne Legendre
Hi Willy,

Yes, please send me the script.

Thanks!

-- Georges-Etienne

 Le 2015-02-06 à 01:55, Willy Tarreau w...@1wt.eu a écrit :
 
 Hi Georges-Etienne,
 
 On Thu, Feb 05, 2015 at 09:10:25PM -0500, Georges-Etienne Legendre wrote:
 Hi Willy,
 
 I'm not sure how to document this leak. I don't know exactly how is
 implemented the firewall SSL health check... Would the Wireshark trace be
 enough to report the issue?
 
 Yes I think it will definitely help, given that the exchange is very short,
 it basically is an SSL hello. It is important to report the strace output
 as well and to mention that haproxy is running in a chroot.
 
 If that helps you, I can send you the script I wrote from your capture.
 You could try it to confirm it does the same effect as the firewall's
 checks. Once confirmed, it will be much easier to build a bug report
 given that you'll simply have to attach the script as a reproducer.
 
 Best regards,
 Willy
 



Re: HAproxy constant memory leak

2015-02-06 Thread Pavlos Parissis
On 06/02/2015 11:19 πμ, Georges-Etienne Legendre wrote:
 Hi Willy,
 
 Yes, please send me the script.
 

Willy,
If it isn't against the policies of this ML to send attachments and the
script is few kilobytes size, could you please send it to the list?

Thanks,
Pavlos




signature.asc
Description: OpenPGP digital signature


Re: HAproxy constant memory leak

2015-02-05 Thread Willy Tarreau
Hi Georges-Etienne,

On Thu, Feb 05, 2015 at 09:10:25PM -0500, Georges-Etienne Legendre wrote:
 Hi Willy,
 
 I'm not sure how to document this leak. I don't know exactly how is
 implemented the firewall SSL health check... Would the Wireshark trace be
 enough to report the issue?

Yes I think it will definitely help, given that the exchange is very short,
it basically is an SSL hello. It is important to report the strace output
as well and to mention that haproxy is running in a chroot.

If that helps you, I can send you the script I wrote from your capture.
You could try it to confirm it does the same effect as the firewall's
checks. Once confirmed, it will be much easier to build a bug report
given that you'll simply have to attach the script as a reproducer.

Best regards,
Willy




Re: HAproxy constant memory leak

2015-02-05 Thread Georges-Etienne Legendre
Hi Willy,

I'm not sure how to document this leak. I don't know exactly how is
implemented the firewall SSL health check... Would the Wireshark trace be
enough to report the issue?

Thanks!

-- Georges-Etienne

On Tue, Feb 3, 2015 at 5:52 PM, Willy Tarreau w...@1wt.eu wrote:

 Hi Georges-Etienne,

 On Tue, Feb 03, 2015 at 08:09:15AM -0500, Georges-Etienne Legendre wrote:
  Hi Willy,
 
  Thanks a lot for this investigation, it was really helpful.
 
  My OpenSSL is up-to-date on this server. I first tried to remove the
 chroot
  statement. I'm pretty sure this in itself solved the leak, but I no
 longer
  have the traces and couple of hours after, our Ops changed the SSL check
 to
  a simple TCP check on port 443. So, I cannot confirm 100%.
 
  I can however confirm that I no longer experience the leak. I put back
 the
  chroot command to be safer.

 OK that's great.

  This also prompted me to tweak the SSL ciphers. I now use a more
 thoughtful
  list of ciphers (
  https://mozilla.github.io/server-side-tls/ssl-config-generator/) and
  disabled SSLv3. This indeed disables KRB5.
 
  ssl-default-bind-ciphers
 
 ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:AES:CAMELLIA:DES-CBC3-SHA:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK:!aECDH:!EDH-DSS-DES-CBC3-SHA:!EDH-RSA-DES-CBC3-SHA:!KRB5-DES-CBC3-SHA
  ssl-default-bind-options no-sslv3

 Wow! When we introduced SSL, I expected that a lot of difficulties
 would come from it, but not that the ugliest config statements would
 come with it as well :-)

  I will keep a close eye on the memory usage... HAproxy has been running
 for
  about 16 hours now, and here is the ps output:
  # ps -u nobody u
  USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
  nobody   63985  0.5  0.0  53868 10960 ?Ss   Feb02   5:19
  /usr/sbin/haproxy -D -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid
 
  Looks good :-)

 Yes indeed. Now I think it will really be important to report this leak
 to whomever it concerns (probably the distro vendor so that they decide
 whether it's in their own patches or in openssl upstream). My openssl
 version doesn't have krb5 and I have never understood what is needed to
 enable it nor what it provides. Crypto libs tend to be cryptic ...

 Willy




Re: HAproxy constant memory leak

2015-02-03 Thread Willy Tarreau
Hi Georges-Etienne,

On Tue, Feb 03, 2015 at 08:09:15AM -0500, Georges-Etienne Legendre wrote:
 Hi Willy,
 
 Thanks a lot for this investigation, it was really helpful.
 
 My OpenSSL is up-to-date on this server. I first tried to remove the chroot
 statement. I'm pretty sure this in itself solved the leak, but I no longer
 have the traces and couple of hours after, our Ops changed the SSL check to
 a simple TCP check on port 443. So, I cannot confirm 100%.
 
 I can however confirm that I no longer experience the leak. I put back the
 chroot command to be safer.

OK that's great.

 This also prompted me to tweak the SSL ciphers. I now use a more thoughtful
 list of ciphers (
 https://mozilla.github.io/server-side-tls/ssl-config-generator/) and
 disabled SSLv3. This indeed disables KRB5.
 
 ssl-default-bind-ciphers
 ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:AES:CAMELLIA:DES-CBC3-SHA:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK:!aECDH:!EDH-DSS-DES-CBC3-SHA:!EDH-RSA-DES-CBC3-SHA:!KRB5-DES-CBC3-SHA
 ssl-default-bind-options no-sslv3

Wow! When we introduced SSL, I expected that a lot of difficulties
would come from it, but not that the ugliest config statements would
come with it as well :-)

 I will keep a close eye on the memory usage... HAproxy has been running for
 about 16 hours now, and here is the ps output:
 # ps -u nobody u
 USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
 nobody   63985  0.5  0.0  53868 10960 ?Ss   Feb02   5:19
 /usr/sbin/haproxy -D -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid
 
 Looks good :-)

Yes indeed. Now I think it will really be important to report this leak
to whomever it concerns (probably the distro vendor so that they decide
whether it's in their own patches or in openssl upstream). My openssl
version doesn't have krb5 and I have never understood what is needed to
enable it nor what it provides. Crypto libs tend to be cryptic ...

Willy




Re: HAproxy constant memory leak

2015-02-02 Thread Georges-Etienne Legendre
Thanks for your help.

The configuration is now back to 5000 maxconn, and Haproxy has been running
with this config over the last weekend. The memory footprint is now 1G.

# ps -u nobody u
USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
nobody9103  0.7  3.9 1334192 1291740 ? Ss   Jan30  30:03
/usr/sbin/haproxy -D -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid

# telnet localhost 1935
show pools
Dumping pools usage. Use SIGQUIT to flush them.
  - Pool pipe (32 bytes) : 5 allocated (160 bytes), 5 used, 3 users [SHARED]
  - Pool capture (64 bytes) : 0 allocated (0 bytes), 0 used, 1 users
[SHARED]
  - Pool channel (80 bytes) : 168 allocated (13440 bytes), 6 used, 1 users
[SHARED]
  - Pool task (112 bytes) : 149 allocated (16688 bytes), 67 used, 1 users
[SHARED]
  - Pool uniqueid (128 bytes) : 0 allocated (0 bytes), 0 used, 1 users
[SHARED]
  - Pool connection (320 bytes) : 168 allocated (53760 bytes), 6 used, 1
users [SHARED]
  - Pool hdr_idx (416 bytes) : 84 allocated (34944 bytes), 2 used, 1 users
[SHARED]
  - Pool session (864 bytes) : 85 allocated (73440 bytes), 3 used, 1 users
[SHARED]
  - Pool requri (1024 bytes) : 21 allocated (21504 bytes), 0 used, 1 users
[SHARED]
  - Pool buffer (16416 bytes) : 168 allocated (2757888 bytes), 6 used, 1
users [SHARED]
Total: 10 pools, 2971824 bytes allocated, 111984 used.



I've also executed the tcpdump + strace for about 30 seconds. There should
be no confidential info, but to be sure, I will send them to you personally.

Let me know if I can capture anything else that could be helpful.

Thanks!

-- Georges-Etienne

On Sat, Jan 31, 2015 at 8:40 AM, Willy Tarreau w...@1wt.eu wrote:

 On Sat, Jan 31, 2015 at 12:59:34AM +0100, Lukas Tribus wrote:
   The maxconn was set to 4096 before, and after 45 days, haproxy was
   using 20gigs...
 
  Ok, can you set maxconn back to 4096, reproduce the leak (to at least
  a few gigabytes) and a run show pools a few times to see where
  exactly the memory consumption comes from?

 Also, could you please send a network capture of the checks from
 the firewall to haproxy (if possible, taken on the haproxy side) ?
 It is possible that there is a specific sequence leading to an
 improper close (eg: some SSL structs not being released at certain
 steps in the handhskake, etc).

 Please use this to take your capture :

 tcpdump -vs0 -pi eth0 -w checks.cap host firewall-ip and port
 local-port

 Wait for several seconds, then Ctrl-C. Be careful, your capture
 will contain all the traffic flowing between haproxy and the
 firewall's address facing it, so there might be confidential
 information there, only send to the list if you think it's OK.

 Ideally, in parallel you can try to strace haproxy during this
 capture :

strace -tts200 -o checks.log -p $(pgrep haproxy)

 Thanks,
 Willy




Re: HAproxy constant memory leak

2015-02-02 Thread Willy Tarreau
Hi Georges-Etienne,

On Mon, Feb 02, 2015 at 08:35:21AM -0500, Georges-Etienne Legendre wrote:
 Thanks for your help.
 
 The configuration is now back to 5000 maxconn, and Haproxy has been running
 with this config over the last weekend. The memory footprint is now 1G.

OK, so there's no doubt about it.

 # ps -u nobody u
 USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
 nobody9103  0.7  3.9 1334192 1291740 ? Ss   Jan30  30:03
 /usr/sbin/haproxy -D -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid
 
 # telnet localhost 1935
 show pools
 Dumping pools usage. Use SIGQUIT to flush them.
   - Pool pipe (32 bytes) : 5 allocated (160 bytes), 5 used, 3 users [SHARED]
   - Pool capture (64 bytes) : 0 allocated (0 bytes), 0 used, 1 users
 [SHARED]
   - Pool channel (80 bytes) : 168 allocated (13440 bytes), 6 used, 1 users
 [SHARED]
   - Pool task (112 bytes) : 149 allocated (16688 bytes), 67 used, 1 users
 [SHARED]
   - Pool uniqueid (128 bytes) : 0 allocated (0 bytes), 0 used, 1 users
 [SHARED]
   - Pool connection (320 bytes) : 168 allocated (53760 bytes), 6 used, 1
 users [SHARED]
   - Pool hdr_idx (416 bytes) : 84 allocated (34944 bytes), 2 used, 1 users
 [SHARED]
   - Pool session (864 bytes) : 85 allocated (73440 bytes), 3 used, 1 users
 [SHARED]
   - Pool requri (1024 bytes) : 21 allocated (21504 bytes), 0 used, 1 users
 [SHARED]
   - Pool buffer (16416 bytes) : 168 allocated (2757888 bytes), 6 used, 1
 users [SHARED]
 Total: 10 pools, 2971824 bytes allocated, 111984 used.

Impressive, nothing *seems* to be used here. So we're leaking somewhere else.
I spent the whole afternoon on Friday trying to reproduce various cases using
connection probes, with/without data, with/without request, etc, but couldn't
come to anything related to what you're seeing.

 I've also executed the tcpdump + strace for about 30 seconds. There should
 be no confidential info, but to be sure, I will send them to you personally.

Great, thank you, I'll study all this. I really appreciate your help on
this bug! I hope I can reproduce it and spot what is happening.

Best regards,
Willy




Re: HAproxy constant memory leak

2015-02-02 Thread Willy Tarreau
Georges-Etienne,

your captures were extremely informative. While I cannot reproduce the
behaviour here even by reinjecting the same health check requests, I'm
seeing two really odd things in your trace below :

We accept an SSL connection from the firewall :

08:15:52.297357 accept(6, {sa_family=AF_INET, sin_port=htons(32764), 
sin_addr=inet_addr(firewall)}, [16]) = 1

It sends 48 bytes :

08:15:52.297717 read(1, \200.\1\3\0\0\25\0\0\0\20, 11) = 11
08:15:52.297831 read(1, 
\0\0\3\0\0\10\0\0\6\4\0\200\0\0\4\0\0\5O\0\0@\202J#i\242K7)\300\2536o\245=\23,
 37) = 37

Then we're checking for /etc/krb5.conf :

08:15:52.297984 stat(/etc/krb5.conf, 0x7fff544b1990) = -1 ENOENT (No such 
file or directory)

Then trying to read some random :

08:15:52.298082 open(/dev/urandom, O_RDONLY) = -1 ENOENT (No such file or 
directory)

Then trying to figure the local host name :

08:15:52.298187 uname({sys=Linux, node=node's local hostname, ...}) = 0

Then doing some netlink-based studd :

08:15:52.298316 socket(PF_NETLINK, SOCK_RAW, 0) = 2
08:15:52.298395 bind(2, {sa_family=AF_NETLINK, pid=0, groups=}, 12) = 0
08:15:52.298471 getsockname(2, {sa_family=AF_NETLINK, pid=9103, 
groups=}, [12]) = 0
08:15:52.298550 sendto(2, \24\0\0\0\26\0\1\3\210x\317T\0\0\0\0\0\0\0\0, 20, 
0, {sa_family=AF_NETLINK, pid=0, groups=}, 12) = 20
08:15:52.298650 recvmsg(2, {msg_name(12)={sa_family=AF_NETLINK, pid=0, 
groups=}, 
msg_iov(1)=[{0\0\0\0\24\0\2\0\210x\317T\217#\0\0\2\10\200\376\1\0\0\0\10\0\1\0\177\0\0\1\10\0\2\0\177\0\0\1\7\0\3\0lo\0\0\0\0\0\24\0\2\0\210x\317T\217#\0\0\2\32\200\0\n\0\0\0\10\0\1\0\n\0\35\22\10\0\2\0\n\0\35\22\10\0\4\0\n\0\35?\n\0\3\0bond0\0\0\0\0\0\0\24\0\2\0\210x\317T\217#\0\0\2\32\200\0\f\0\0\0\10\0\1\0\n\2\177\217\10\0\2\0\n\2\177\217\10\0\4\0\n\2\177\277\n\0\3\0bond2\0\0\0\0\0\0\24\0\2\0\210x\317T\217#\0\0\2\32\200\0\r\0\0\0\10\0\1\0\nZ\6j...,
 4096}], msg_controllen=0, msg_flags=0}, 0) = 356
08:15:52.298841 recvmsg(2, {msg_name(12)={sa_family=AF_NETLINK, pid=0, 
groups=}, 
msg_iov(1)=[{@\0\0\0\24\0\2\0\210x\317T\217#\0\0\n\200\200\376\1\0\0\0\24\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\24\0\6\0\377\377\377\377\377\377\377\377H\3\0\0H\3\0\0@\0\0\0\24\0\2\0\210x\317T\217#\0\0\n@\200\375\n\0\0\0\24\0\1\0\376\200\0\0\0\0\0\0\3064k\377\376\256\37@\24\0\6\0\377\377\377\377\377\377\377\377f\5\0\0f\5\0\0@\0\0\0\24\0\2\0\210x\317T\217#\0\0\n@\200\375\v\0\0\0\24\0\1\0\376\200\0\0\0\0\0\0\3064k\377\376\256\37A\24\0\6\0\377\377\377\377\377\377\377\377\232\5\0\0\232\5\0\0@\0\0\0\24\0\2\0...,
 4096}], msg_controllen=0, msg_flags=0}, 0) = 448
08:15:52.299059 recvmsg(2, {msg_name(12)={sa_family=AF_NETLINK, pid=0, 
groups=}, 
msg_iov(1)=[{\24\0\0\0\3\0\2\0\210x\317T\217#\0\0\0\0\0\0\1\0\0\0\24\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\24\0\6\0\377\377\377\377\377\377\377\377H\3\0\0H\3\0\0@\0\0\0\24\0\2\0\210x\317T\217#\0\0\n@\200\375\n\0\0\0\24\0\1\0\376\200\0\0\0\0\0\0\3064k\377\376\256\37@\24\0\6\0\377\377\377\377\377\377\377\377f\5\0\0f\5\0\0@\0\0\0\24\0\2\0\210x\317T\217#\0\0\n@\200\375\v\0\0\0\24\0\1\0\376\200\0\0\0\0\0\0\3064k\377\376\256\37A\24\0\6\0\377\377\377\377\377\377\377\377\232\5\0\0\232\5\0\0@\0\0\0\24\0\2\0...,
 4096}], msg_controllen=0, msg_flags=0}, 0) = 20
08:15:52.299242 close(2)= 0

Then trying to open nsswitch.conf :

08:15:52.299353 open(/etc/nsswitch.conf, O_RDONLY) = -1 ENOENT (No such file 
or directory)

Then does the netlink + nsswitch dance a second time, followed by about
10 times the following with various domain name suffixes :

08:15:52.300841 open(/etc/resolv.conf, O_RDONLY) = -1 ENOENT (No such file or 
directory)
08:15:52.300938 socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 2
08:15:52.301018 connect(2, {sa_family=AF_INET, sin_port=htons(53), 
sin_addr=inet_addr(127.0.0.1)}, 16) = 0
08:15:52.301100 poll([{fd=2, events=POLLOUT}], 1, 0) = 1 ([{fd=2, 
revents=POLLOUT}])
08:15:52.301179 sendto(2, \327\r\1\0\0\1\0\0\0\0\0\0\t_kerberos\fvarious 
domain suffixes\0\0\20\0\1, 51, MSG_NOSIGNAL, NULL, 0) = 51
08:15:52.301296 poll([{fd=2, events=POLLIN}], 1, 5000) = 1 ([{fd=2, 
revents=POLLERR}])
08:15:52.301373 close(2)= 0

Etc. It does that *a lot*. A few times we're seeing brk() with an
increasing value though it's not huge enough to prove everything leaks
there, but it proves that it happens inside openssl, since it's between
a read() performed by openssl and a stat() performed by it as well :

08:16:02.055371 epoll_wait(0, {{EPOLLIN, {u32=6, u64=6}}, {EPOLLIN, {u32=5, 
u64=5}}}, 200, 159) = 2
08:16:02.055457 accept(6, {sa_family=AF_INET, sin_port=htons(13053), 
sin_addr=inet_addr(some-public-address)}, [16]) = 2
08:16:02.00 fcntl(2, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
08:16:02.055658 accept(6, 0x7fff544b3e70, [128]) = -1 EAGAIN (Resource 
temporarily unavailable)
08:16:02.055806 read(2, \200.\1\3\0\0\25\0\0\0\20, 11) = 11
08:16:02.055908 read(2, \0\0\3\0\0\10\0\0\6\4\0\200\0\0\4\0\0\5\233G\314..., 
37) = 37
08:16:02.056005 

RE: HAproxy constant memory leak

2015-02-02 Thread Lukas Tribus
 OpenSSL sometimes acts stupidly like this inside a chroot. We've
 encountered a few issues in the past with openssl doing totally crazy
 stuff inside a chroot, including abort() on krb5-related things. From
 what I understood (others, please correct me if I'm wrong), such
 processing may be altered by the type of key or ciphers.
 
 In my opinion, you should attempt two things :
 
 1) ensure that your ssl library is up to date (double checking doesn't
 cost much)
 
 2) try it again without the chroot statement to see if when openssl finds
 what it's looking for, the leak stops.
 
 3) maybe file a report to the openssl list about a memory leak in that
 exact situation, with the traces you sent to me. Maybe they'll want
 to have your public key as well to verify some assumptions about
 what could be done inside the lib with its properties.

I suppose you can disable KRB5 ciphers to be part of the negotiation by
adding !KRB5 to your cipher configuration. If that works, it is probably
the simplest workaround, unless you actually need KRB5.


Lukas

  

Re: HAproxy constant memory leak

2015-01-31 Thread Georges-Etienne Legendre
The maxconn was set to 4096 before, and after 45 days, haproxy was using
20gigs...

What else could it be?

-- Georges-Etienne

On Fri, Jan 30, 2015 at 1:49 PM, Lukas Tribus luky...@hotmail.com wrote:

 With maxconn 5 this is expected behavior, because haproxy will use
 RAM up to an amount that is justified for 5 concurrent connections.
 Configure maxconn to a proper and real value and the RAM usage will be
 predictable. Lukas



RE: HAproxy constant memory leak

2015-01-31 Thread Lukas Tribus
 The maxconn was set to 4096 before, and after 45 days, haproxy was  
 using 20gigs... 

Ok, can you set maxconn back to 4096, reproduce the leak (to at least
a few gigabytes) and a run show pools a few times to see where
exactly the memory consumption comes from?

Lukas
  


Re: HAproxy constant memory leak

2015-01-31 Thread Willy Tarreau
On Sat, Jan 31, 2015 at 12:59:34AM +0100, Lukas Tribus wrote:
  The maxconn was set to 4096 before, and after 45 days, haproxy was  
  using 20gigs... 
 
 Ok, can you set maxconn back to 4096, reproduce the leak (to at least
 a few gigabytes) and a run show pools a few times to see where
 exactly the memory consumption comes from?

Also, could you please send a network capture of the checks from
the firewall to haproxy (if possible, taken on the haproxy side) ?
It is possible that there is a specific sequence leading to an
improper close (eg: some SSL structs not being released at certain
steps in the handhskake, etc).

Please use this to take your capture :

tcpdump -vs0 -pi eth0 -w checks.cap host firewall-ip and port local-port

Wait for several seconds, then Ctrl-C. Be careful, your capture
will contain all the traffic flowing between haproxy and the
firewall's address facing it, so there might be confidential
information there, only send to the list if you think it's OK.

Ideally, in parallel you can try to strace haproxy during this
capture :

   strace -tts200 -o checks.log -p $(pgrep haproxy)

Thanks,
Willy




RE: HAproxy constant memory leak

2015-01-31 Thread Lukas Tribus
With maxconn 5 this is expected behavior, because haproxy will use
RAM up to an amount that is justified for 5 concurrent connections.

Configure maxconn to a proper and real value and the RAM usage will be
predictable.


Lukas

  

HAproxy constant memory leak

2015-01-28 Thread Georges-Etienne Legendre
Hi all,

We're deploying HAproxy and we're experiencing what apprears to be a memory
leak. After couple of days, HAproxy is consuming gigs of RAM.

Running a ps command returns:

# ps -u nobody u
USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
nobody   29960  0.2  1.0 387220 343472 ?   Rs   Jan27   2:40
/usr/sbin/haproxy -D -f /etc/haproxy/haproxy


Yesterday, I've been tracking the VSZ memory and haproxy leaks ~655 bytes
per 30 minutes. Today, we're at 387220 bytes.

14:00 128396
14:10 131624
14:20 134868
14:30 138100


What is totally strange is that we have the same version of HAproxy running
in another environment, and there is no leak there... However, there is
just one difference: In the environment that leaks, there is a firewall
pinging HAproxy every second (TCP open/close). I see that HAproxy is
reporting those ping as request errors (ereq). Here is a trace when running
haproxy with the -d flag:

:web.accept(0005)=0009 from [10.90.19.3:25611]
:web.clicls[0009:]
:web.closed[0009:]
0002:web.accept(0005)=000a from [10.90.19.2:52213]
0002:web.clicls[000a:]
0002:web.closed[000a:]
0004:web.accept(0005)=000a from [10.90.19.2:52357]
0004:web.clicls[000a:]
0004:web.closed[000a:]
0007:web.accept(0005)=000a from [10.90.19.2:52502]
0007:web.clicls[000a:]
0007:web.closed[000a:]


Anyway idea what could be wrong?
Thanks!



Our configuration is pretty simple:

global
log 127.0.0.1 local1 info
chroot /usr/share/haproxy
maxconn 5
uid 99
gid 99
daemon
tune.ssl.default-dh-param 1024
stats socket :1935 level admin
stats timeout 2m

defaults
log global
modehttp
retries 3
timeout connect 5s
timeout client 60s
timeout server 120s
timeout queue 60s
timeout http-request 15s
timeout http-keep-alive 15s
balance roundrobin
option http-keep-alive
option forwardfor
option httpchk OPTIONS /
option httplog clf
option dontlognull

frontend web
bind *:80
bind *:443 ssl crt /etc/pki/tls/wildcardssl.pem

use_backend app_auth if { path_dir app-authnz }
use_backend app_stats if { path_dir app-stats }
use_backend app_search if { path_dir app-search }
use_backend app_settings if { path_dir app-settings }

backend app_auth
reqrep ^([^\ :]*)\ /[^/]*/(.*) \1\ /\2
server fe01 127.0.0.1:4 check
server fe02 10.2.127.144:4 check backup
server fe03 10.2.127.145:4 check backup
server fe04 10.2.127.146:4 check backup


Here is some additional details:

# haproxy -vv
HA-Proxy version 1.5.9 2014/11/25
Copyright 2000-2014 Willy Tarreau w...@1wt.eu

Build options :
  TARGET  = linux26
  CPU = generic
  CC  = gcc
  CFLAGS  = -m64 -march=x86-64 -O2 -g -fno-strict-aliasing
  OPTIONS = USE_ZLIB=1 USE_OPENSSL=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 8192, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.3
Compression algorithms supported : identity, deflate, gzip
Built with OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
Running on OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 7.8 2008-09-05
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with transparent proxy support using: IP_TRANSPARENT IP_FREEBIND

Available polling systems :
  epoll : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.


--


# telnet localhost 1935
Trying ::1...
telnet: connect to address ::1: Connection refused
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
show info;show stat;show pools
Name: HAProxy
Version: 1.5.9
Release_date: 2014/11/25
Nbproc: 1
Process_num: 1
Pid: 29960
Uptime: 0d 0h12m25s
Uptime_sec: 745
Memmax_MB: 0
Ulimit-n: 100033
Maxsock: 100033
Maxconn: 5
Hard_maxconn: 5
CurrConns: 1
CumConns: 1630
CumReq: 1684
MaxSslConns: 0
CurrSslConns: 0
CumSslConns: 856
Maxpipes: 0
PipesUsed: 0
PipesFree: 0
ConnRate: 4
ConnRateLimit: 0
MaxConnRate: 9
SessRate: 4
SessRateLimit: 0
MaxSessRate: 9
SslRate: 2
SslRateLimit: 0
MaxSslRate: 4
SslFrontendKeyRate: 0
SslFrontendMaxKeyRate: 1
SslFrontendSessionReuse_pct: 100
SslBackendKeyRate: 0
SslBackendMaxKeyRate: 0
SslCacheLookups: 10
SslCacheMisses: 0
CompressBpsIn: 0
CompressBpsOut: 0
CompressBpsRateLim: 0
ZlibMemUsage: 0
MaxZlibMemUsage: 0
Tasks: 18
Run_queue: 1
Idle_pct: 100
node: toro63nsfe01.pf.spop.ca
description:

#