Re: [ANNOUNCE] haproxy 1.4-dev5 with keep-alive :-) memory comsuption
Hi Hank,

as I suspected, the problem was with the header captures. They were not released before being erased when the session was cleared for a new keep-alive request. I have fixed it in git if you want to try the snapshot again:

  http://haproxy.1wt.eu/git?p=haproxy.git;a=snapshot;h=6fe60182aa150deec7fd367ad4b574d38dc80356;sf=tgz

or, if you just want the patch against last night's snapshot:

  http://haproxy.1wt.eu/git?p=haproxy.git;a=commitdiff_plain;h=6fe60182aa150deec7fd367ad4b574d38dc80356

I'd like to thank you for the efforts you made to help me troubleshoot this issue.

Best regards,
Willy
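The bug described above is a classic pool leak: a session holds pointers to buffers taken from a pool, and resetting the session for the next keep-alive request cleared those pointers without handing the buffers back. The C sketch below is a simplified, hypothetical model (the pool, `struct session`, `capture_header()` and `session_reset_for_keepalive()` are illustrative names, not haproxy's actual code) showing the release-then-erase order that the fix restores:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified pool of fixed-size capture buffers. This is not
 * haproxy's pool API; it only models "objects taken from a pool must be
 * given back explicitly". Released buffers are kept on a free list. */
#define CAP_LEN  64   /* assumed max length of one captured header */
#define MAX_CAPS 4    /* assumed number of header capture slots */

struct pool {
    void *free_list;  /* released buffers, linked through their first bytes */
    int   used;       /* buffers currently handed out */
};

void *pool_get(struct pool *p)
{
    void *obj = p->free_list;
    if (obj)
        p->free_list = *(void **)obj;  /* reuse a released buffer */
    else
        obj = malloc(CAP_LEN);         /* free list empty: allocate */
    if (obj)
        p->used++;
    return obj;
}

void pool_put(struct pool *p, void *obj)
{
    if (!obj)
        return;                        /* empty capture slot: nothing to do */
    assert(p->used > 0);
    *(void **)obj = p->free_list;      /* push onto the free list */
    p->free_list = obj;
    p->used--;
}

struct session {
    char *req_cap[MAX_CAPS];           /* captured request header values */
};

/* Copy a header value into capture slot i, taking a buffer if needed. */
int capture_header(struct pool *p, struct session *s, int i, const char *val)
{
    if (i < 0 || i >= MAX_CAPS)
        return -1;
    if (!s->req_cap[i] && !(s->req_cap[i] = pool_get(p)))
        return -1;
    strncpy(s->req_cap[i], val, CAP_LEN - 1);
    s->req_cap[i][CAP_LEN - 1] = '\0';
    return 0;
}

/* Prepare the session for the next keep-alive request. Doing only the
 * memset() reproduces the leak: the pointers are lost while the buffers
 * are still counted as used, so every request consumes new ones. */
void session_reset_for_keepalive(struct pool *p, struct session *s)
{
    for (int i = 0; i < MAX_CAPS; i++)
        pool_put(p, s->req_cap[i]);    /* release first... */
    memset(s->req_cap, 0, sizeof(s->req_cap)); /* ...then erase */
}
```

With this order, any number of capture/reset cycles leaves `used` at zero after each reset. Removing the `pool_put()` loop makes `used` grow by one buffer per captured header per request, which is the kind of steady growth that only shows up under keep-alive.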
Re: [ANNOUNCE] haproxy 1.4-dev5 with keep-alive :-) memory comsuption
Definitely the haproxy process: nothing else runs on there, and the older version remains stable for days/weeks:

F S UID      PID  PPID  C PRI NI ADDR      SZ WCHAN  STIME TTY      TIME CMD
1 S nobody 15547     1 18  80  0 -    1026097 epoll_ 10:54 ?    00:54:30 /usr/sbin/haproxy14d5 -D -f /etc/haproxy/haproxyka.cfg -p /var/run/haproxy.pid -sf 15536
1 S nobody 20631     1 29  80  0 -      17843 epoll_ 13:48 ?    00:33:37 /usr/sbin/haproxy -D -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -sf 15547

On 1/5/10 11:10 PM, Willy Tarreau wrote:
> On Tue, Jan 05, 2010 at 11:00:30PM -0800, Hank A. Paulson wrote:
>> Using git 034550b7420c24625a975f023797d30a14b80830
>> "[BUG] stats: show UP/DOWN status also in tracking servers" 6 hours ago...
>>
>> I am still seeing continuous memory consumption (about 1+ GB/hr) at 50-60
>> Mbps even after the number of connections has stabilized:
>
> OK. Is this memory used by the haproxy process itself? If so, could you please send me your exact configuration so that I may have a chance to spot something in the code related to what you use? A memory leak is very unlikely in haproxy, though not impossible. Everything works with pools which are released when the session closes. But maybe something in this area escaped my radar (e.g. header captures in keep-alive, etc.).
>
>>   69 CLOSE_WAIT
>>    9 CLOSING
>> 4807 ESTABLISHED
>>   35 FIN_WAIT1
>>    4 FIN_WAIT2
>>  255 LAST_ACK
>>   10 LISTEN
>> 3410 SYN_RECV
>
> This one is really impressive. 3410 SYN_RECV basically means you're under a SYN flood, or your network stack is not correctly tuned and you're slowing your users down a lot, because they need to wait 3s before retransmitting.
>
> Regards,
> Willy

Thanks, we pride ourselves on our huge SYN queue... :)
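To put the two SZ values from the `ps -l` output above on a common scale: SZ is reported in pages, so converting it needs the page size. The thread does not state it; the sketch below assumes 4096-byte pages (typical on x86 Linux) and takes the page size as a parameter so it can be adjusted. `pages_to_mib()` is a hypothetical helper name.

```c
#include <assert.h>

/* Convert a `ps -l` SZ value (in pages) to MiB for a given page size.
 * The page size of the host in the thread is an assumption (4096 bytes
 * is typical on x86 Linux; `getconf PAGESIZE` prints the real value). */
double pages_to_mib(unsigned long pages, unsigned long page_bytes)
{
    assert(page_bytes > 0);
    /* Use floating point to avoid overflowing the byte count. */
    return (double)pages * (double)page_bytes / (1024.0 * 1024.0);
}
```

Under that assumption, `pages_to_mib(1026097, 4096)` is about 4008 MiB for the 1.4-dev5 process, against about 70 MiB for `pages_to_mib(17843, 4096)` on the older one, which fits the growth being inside the haproxy process itself.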
Re: [ANNOUNCE] haproxy 1.4-dev5 with keep-alive :-) memory comsuption
On Tue, Jan 05, 2010 at 11:00:30PM -0800, Hank A. Paulson wrote:
> Using git 034550b7420c24625a975f023797d30a14b80830
> "[BUG] stats: show UP/DOWN status also in tracking servers" 6 hours ago...
>
> I am still seeing continuous memory consumption (about 1+ GB/hr) at 50-60
> Mbps even after the number of connections has stabilized:

OK. Is this memory used by the haproxy process itself? If so, could you please send me your exact configuration so that I may have a chance to spot something in the code related to what you use? A memory leak is very unlikely in haproxy, though not impossible. Everything works with pools which are released when the session closes. But maybe something in this area escaped my radar (e.g. header captures in keep-alive, etc.).

>   69 CLOSE_WAIT
>    9 CLOSING
> 4807 ESTABLISHED
>   35 FIN_WAIT1
>    4 FIN_WAIT2
>  255 LAST_ACK
>   10 LISTEN
> 3410 SYN_RECV

This one is really impressive. 3410 SYN_RECV basically means you're under a SYN flood, or your network stack is not correctly tuned and you're slowing your users down a lot, because they need to wait 3s before retransmitting.

Regards,
Willy
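The 3s figure above is the initial SYN retransmission timeout clients used at the time, and standard TCP backs the timer off by doubling it after each unanswered attempt. A minimal C sketch, assuming a 3s initial timeout and pure doubling (real clients vary by OS and version), gives the total wait a user sees before each retransmission when SYNs keep getting lost; `syn_wait_before_retry()` is a hypothetical name:

```c
#include <assert.h>

/* Assumed initial SYN retransmission timeout, in seconds. */
#define INITIAL_RTO_S 3u

/* Total seconds waited before the n-th SYN retransmission is sent (n >= 1),
 * assuming the timeout doubles after each unanswered attempt:
 * 3 + 6 + 12 + ... = 3 * (2^n - 1). */
unsigned int syn_wait_before_retry(unsigned int n)
{
    assert(n >= 1 && n <= 16);  /* keep the result well within range */
    return INITIAL_RTO_S * ((1u << n) - 1u);
}
```

So one lost SYN costs a user 3s, two cost 9s and three cost 21s before the connection even starts, which is why a large SYN_RECV count deserves attention beyond memory usage.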
Re: [ANNOUNCE] haproxy 1.4-dev5 with keep-alive :-) memory comsuption
On 1/4/10 9:15 PM, Willy Tarreau wrote:
> On Mon, Jan 04, 2010 at 07:05:48PM -0800, Hank A. Paulson wrote:
>> On 1/4/10 2:43 PM, Willy Tarreau wrote:
>>> - Maybe this new timeout should have a default value to prevent infinite keep-alive connections.
>>> - For this timeout, haproxy could display a warning (at startup) if the value is greater than the client timeout.
>>>
>>> In fact I think that using http-request by default is fine and even desired. After all, it's the time we accept to keep a connection waiting for a request, which exactly matches that purpose. The ability to have a distinct value for keep-alive is just a bonus.
>>
>> But please do have it as a separate settable timeout for situations like the one I have on a few servers, where 80% or more of the traffic comes from a few IPs. If they have keep-alive capability, I am willing to wait a relatively long time (longer than the http-request timeout) for that connection to send more requests, because the server spent time opening up the TCP window to a good value for decent throughput, and I don't want to have to start that process over again unnecessarily.
>
> That's an interesting point. So basically you're confirming that we don't want a min() of the two values, but rather an override.
>
> Hank, if you're interested in trying keep-alive again, please use snapshot 20100105 from here:
>
>   http://haproxy.1wt.eu/download/1.4/src/snapshot/
>
> The only suspected remaining issue, reported by Cyril, seems after some tests not to be one. I could reproduce the same behaviour, but the CLOSE_WAIT connections were the ones pending in the system, which got delayed due to SYN_SENT retries and were processed long after the initial one; they all eventually cleared (at least in my situation). And all the cases you reported with stuck sessions and increasing memory seem to be gone now.
>
> Regards,
> Willy

Using git 034550b7420c24625a975f023797d30a14b80830
"[BUG] stats: show UP/DOWN status also in tracking servers" 6 hours ago...
I am still seeing continuous memory consumption (about 1+ GB/hr) at 50-60 Mbps even after the number of connections has stabilized:

# w;date;free -m;netstat -ant | fgrep -v connections | fgrep -v Proto | awk '{print $6}' | sort | uniq -c
 12:51:16 up 10 days, 19:33,  1 user,  load average: 0.00, 0.00, 0.00
USER     TTY      FROM          LOGIN@   IDLE   JCPU   PCPU WHAT
root     pts/1    10.x.y.z      09:25    0.00s  0.23s  0.00s w
Wed Jan  6 12:51:16 WIT 2010
             total       used       free     shared    buffers     cached
Mem:          5129       3718       1411          0         94        185
-/+ buffers/cache:       3438       1691
Swap:            0          0          0
     69 CLOSE_WAIT
      9 CLOSING
   4807 ESTABLISHED
     35 FIN_WAIT1
      4 FIN_WAIT2
    255 LAST_ACK
     10 LISTEN
   3410 SYN_RECV
   2493 TIME_WAIT

# w;date;free -m;netstat -ant | fgrep -v connections | fgrep -v Proto | awk '{print $6}' | sort | uniq -c
 13:40:58 up 10 days, 20:23,  1 user,  load average: 0.00, 0.00, 0.00
USER     TTY      FROM          LOGIN@   IDLE   JCPU   PCPU WHAT
root     pts/1    10.x.y.z      09:25    0.00s  0.26s  0.01s w
Wed Jan  6 13:40:58 WIT 2010
             total       used       free     shared    buffers     cached
Mem:          5129       4831        298          0         94        185
-/+ buffers/cache:       4550        578
Swap:            0          0          0
     86 CLOSE_WAIT
     10 CLOSING
   4510 ESTABLISHED
     40 FIN_WAIT1
      7 FIN_WAIT2
    390 LAST_ACK
     10 LISTEN
   3062 SYN_RECV
   2256 TIME_WAIT
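The two snapshots above make the "1+ GB/hr" figure checkable. Using the "-/+ buffers/cache" used column (memory used excluding buffers and page cache), usage went from 3438 MB at 12:51:16 to 4550 MB at 13:40:58: +1112 MB over 2982 s, or roughly 1342 MB per hour. A small C sketch of that arithmetic, assuming both samples were taken on the same day (they were, per the `date` lines); the helper names are hypothetical:

```c
#include <assert.h>

/* Seconds since midnight for a wall-clock hh:mm:ss reading. */
long hms_to_seconds(int h, int m, int s)
{
    return h * 3600L + m * 60L + s;
}

/* Average growth in MB per hour between two samples on the same day. */
double growth_mb_per_hour(long used0_mb, long t0_s, long used1_mb, long t1_s)
{
    assert(t1_s > t0_s);  /* samples must be in chronological order */
    return (double)(used1_mb - used0_mb) * 3600.0 / (double)(t1_s - t0_s);
}
```

Calling `growth_mb_per_hour(3438, hms_to_seconds(12, 51, 16), 4550, hms_to_seconds(13, 40, 58))` yields about 1342 MB/h, consistent with the rate reported above.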