Re: [ANNOUNCE] haproxy 1.4-dev5 with keep-alive :-) memory consumption

2010-01-07 Thread Willy Tarreau
Hi Hank,

As I suspected, the problem was with the header captures: they were not being
released before being overwritten when the session was cleared for a new
keep-alive request. I have fixed it in git, if you want to try the snapshot
again:

    http://haproxy.1wt.eu/git?p=haproxy.git;a=snapshot;h=6fe60182aa150deec7fd367ad4b574d38dc80356;sf=tgz

or, if you just want the patch against last night's snapshot:

    http://haproxy.1wt.eu/git?p=haproxy.git;a=commitdiff_plain;h=6fe60182aa150deec7fd367ad4b574d38dc80356
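
For reference, the leak presumably only shows up when header captures are
configured. A minimal sketch of the kind of frontend that exercises this code
path (names and sizes here are illustrative, not taken from Hank's
configuration):

    frontend www
        bind :80
        mode http
        option httplog
        # each keep-alive request allocates capture storage; before the fix,
        # the previous request's captures were overwritten without being freed
        capture request header Host len 64
        capture request header User-Agent len 128
        default_backend servers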

I'd like to thank you for the efforts you made to help me troubleshoot this
issue.

Best regards,
Willy




Re: [ANNOUNCE] haproxy 1.4-dev5 with keep-alive :-) memory consumption

2010-01-06 Thread Hank A. Paulson
Definitely the haproxy process; nothing else runs on that machine, and the
older version remains stable for days/weeks:


F S UID        PID  PPID  C PRI  NI ADDR      SZ WCHAN  STIME TTY          TIME CMD
1 S nobody   15547     1 18  80   0 -    1026097 epoll_ 10:54 ?        00:54:30 /usr/sbin/haproxy14d5 -D -f /etc/haproxy/haproxyka.cfg -p /var/run/haproxy.pid -sf 15536
1 S nobody   20631     1 29  80   0 -      17843 epoll_ 13:48 ?        00:33:37 /usr/sbin/haproxy -D -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -sf 15547
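
For scale, ps reports SZ in pages; assuming 4 KB pages, that is roughly 4 GB
of address space for the keep-alive build versus about 70 MB for the stable
one:

    # SZ is in pages; with 4 KB pages:
    $ echo $((1026097 * 4 / 1024)) MB    # keep-alive build: 4008 MB
    $ echo $((17843 * 4 / 1024)) MB      # stable build: 69 MB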


On 1/5/10 11:10 PM, Willy Tarreau wrote:

On Tue, Jan 05, 2010 at 11:00:30PM -0800, Hank A. Paulson wrote:

Using git 034550b7420c24625a975f023797d30a14b80830
"[BUG] stats: show UP/DOWN status also in tracking servers" 6 hours ago...

I am still seeing continuous memory consumption (about 1+ GB/hr) at 50-60
Mbps even after the number of connections has stabilized:


OK. Is this memory used by the haproxy process itself?
If so, could you please send me your exact configuration so that
I may have a chance to spot something in the code related to what
you use? A memory leak is very unlikely in haproxy, though
it's not impossible. Everything works with pools, which are released
when the session closes. But maybe something in this area escaped
my radar (e.g. header captures in keep-alive, etc.).


     69 CLOSE_WAIT
      9 CLOSING
   4807 ESTABLISHED
     35 FIN_WAIT1
      4 FIN_WAIT2
    255 LAST_ACK
     10 LISTEN
   3410 SYN_RECV


This one is really impressive. 3410 SYN_RECV basically means you're
under a SYN flood, or your network stack is not correctly tuned and
you're slowing down your users a lot because they need to wait 3s
before retransmitting.

Regards,
Willy


Thanks, we pride ourselves on our huge SYN queue...   :)



Re: [ANNOUNCE] haproxy 1.4-dev5 with keep-alive :-) memory consumption

2010-01-05 Thread Willy Tarreau
On Tue, Jan 05, 2010 at 11:00:30PM -0800, Hank A. Paulson wrote:
> Using git 034550b7420c24625a975f023797d30a14b80830
> "[BUG] stats: show UP/DOWN status also in tracking servers" 6 hours ago...
> 
> I am still seeing continuous memory consumption (about 1+ GB/hr) at 50-60 
> Mbps even after the number of connections has stabilized:

OK. Is this memory used by the haproxy process itself?
If so, could you please send me your exact configuration so that
I may have a chance to spot something in the code related to what
you use? A memory leak is very unlikely in haproxy, though
it's not impossible. Everything works with pools, which are released
when the session closes. But maybe something in this area escaped
my radar (e.g. header captures in keep-alive, etc.).
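
A quick way to check that, for example (standard procps tools; the PID in the
pmap line is illustrative):

    # per-process virtual and resident size, in KB
    $ ps -C haproxy -o pid,vsz,rss,args
    # finer-grained breakdown of one process's address space
    $ pmap -x 15547 | tail -2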

>      69 CLOSE_WAIT
>       9 CLOSING
>    4807 ESTABLISHED
>      35 FIN_WAIT1
>       4 FIN_WAIT2
>     255 LAST_ACK
>      10 LISTEN
>    3410 SYN_RECV

This one is really impressive. 3410 SYN_RECV basically means you're
under a SYN flood, or your network stack is not correctly tuned and
you're slowing down your users a lot because they need to wait 3s
before retransmitting.
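
If it is a tuning issue rather than a real SYN flood, the usual Linux knobs
are the SYN backlog, the accept queue and syncookies; a sketch with
illustrative values, not a recommendation for this particular box:

    sysctl -w net.ipv4.tcp_max_syn_backlog=16384   # pending SYN_RECV entries
    sysctl -w net.core.somaxconn=8192              # accept queue per listener
    sysctl -w net.ipv4.tcp_syncookies=1            # survive genuine SYN floods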

Regards,
Willy




Re: [ANNOUNCE] haproxy 1.4-dev5 with keep-alive :-) memory consumption

2010-01-05 Thread Hank A. Paulson

On 1/4/10 9:15 PM, Willy Tarreau wrote:

On Mon, Jan 04, 2010 at 07:05:48PM -0800, Hank A. Paulson wrote:

On 1/4/10 2:43 PM, Willy Tarreau wrote:

- Maybe this new timeout should have a default value to prevent infinite
keep-alive connections.
- For this timeout, haproxy could display a warning (at startup) if the
value is greater than the client timeout.


In fact I think that using http-request by default is fine and even desired.
After all, it's the time we accept to keep a connection waiting for a request,
which exactly matches that purpose. The ability to have a distinct value for
keep-alive is just a bonus.


But please do make it a separately settable timeout, for situations like the
one I have on a few servers where 80% or more of the traffic comes from a few
IPs. If those clients support keep-alive, I am willing to wait a relatively
long time (longer than the HTTP request timeout) for the connection to send
more requests, because the server spent time opening the TCP window up to a
good value for decent throughput, and I don't want to start that process over
again unnecessarily.


That's an interesting point. So basically you're confirming that we don't
want a min() of the two values, but rather an override.
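
A sketch of what that override could look like in a configuration, assuming
the separate keyword ends up being named timeout http-keep-alive (as it
eventually was in 1.4):

    defaults
        mode http
        timeout http-request    10s   # time allowed to receive a complete request
        timeout http-keep-alive 5m    # overrides http-request between requests
        timeout client          30s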

Hank, if you're interested in trying keep-alive again, please use snapshot
20100105 from here:

http://haproxy.1wt.eu/download/1.4/src/snapshot/

After some tests, the only suspected remaining issue, reported by Cyril,
seems not to be one after all. I could reproduce the same behaviour, but the
CLOSE_WAIT connections were the ones pending in the system which got delayed
due to SYN_SENT retries and were processed long after the initial one, and
all of them eventually went away (at least in my situation). And all the
cases you reported with stuck sessions and memory increasing seem to be gone
now.
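
One way to double-check that no sessions stay stuck is to count the entries
reported on the stats socket, assuming one is configured (the socket path
here is illustrative):

    # requires e.g. 'stats socket /var/run/haproxy.sock' in the global section
    $ echo "show sess" | socat stdio unix-connect:/var/run/haproxy.sock | wc -l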

Regards,
Willy


Using git 034550b7420c24625a975f023797d30a14b80830
"[BUG] stats: show UP/DOWN status also in tracking servers" 6 hours ago...

I am still seeing continuous memory consumption (about 1+ GB/hr) at 50-60 Mbps
even after the number of connections has stabilized:


# w; date; free -m; netstat -ant | fgrep -v connections | fgrep -v Proto | awk '{print $6}' | sort | uniq -c


12:51:16 up 10 days, 19:33,  1 user,  load average: 0.00, 0.00, 0.00
USER TTY  FROM  LOGIN@   IDLE   JCPU   PCPU WHAT
root pts/110.x.y.z  09:250.00s  0.23s  0.00s w
Wed Jan  6 12:51:16 WIT 2010

             total       used       free     shared    buffers     cached
Mem:          5129       3718       1411          0         94        185
-/+ buffers/cache:       3438       1691
Swap:            0          0          0

     69 CLOSE_WAIT
      9 CLOSING
   4807 ESTABLISHED
     35 FIN_WAIT1
      4 FIN_WAIT2
    255 LAST_ACK
     10 LISTEN
   3410 SYN_RECV
   2493 TIME_WAIT


# w; date; free -m; netstat -ant | fgrep -v connections | fgrep -v Proto | awk '{print $6}' | sort | uniq -c


13:40:58 up 10 days, 20:23,  1 user,  load average: 0.00, 0.00, 0.00
USER TTY  FROM  LOGIN@   IDLE   JCPU   PCPU WHAT
root pts/110.x.y.z  09:250.00s  0.26s  0.01s w
Wed Jan  6 13:40:58 WIT 2010

             total       used       free     shared    buffers     cached
Mem:          5129       4831        298          0         94        185
-/+ buffers/cache:       4550        578
Swap:            0          0          0

     86 CLOSE_WAIT
     10 CLOSING
   4510 ESTABLISHED
     40 FIN_WAIT1
      7 FIN_WAIT2
    390 LAST_ACK
     10 LISTEN
   3062 SYN_RECV
   2256 TIME_WAIT
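
As a rough cross-check of the "about 1+ GB/hr" figure: the -/+ buffers/cache
used value grows from 3438 MB at 12:51 to 4550 MB at 13:40, i.e. about
1112 MB in roughly 50 minutes:

    # 50-minute delta extrapolated to an hourly rate (values from above)
    $ echo $(( (4550 - 3438) * 60 / 50 )) MB/hr   # ~1.3 GB/hr
    1334 MB/hr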