Re: HAProxy - 504 Gateway Timeout error.

2011-07-18 Thread gidot
Willy Tarreau w at 1wt.eu writes:

 
 Hi,
 
 On Thu, Jul 14, 2011 at 10:21:56AM +, gidot wrote:
  Thanks Willy,
  
  I managed to fix the problem with the log. It's due to my entry in
  /etc/syslogd.conf. This thread http://www.serverphorums.com/read.php?10,127228,127867
  helped me out :).
  
  Here is the excerpt of my haproxy.log. Hope someone can enlighten me if
  there's anything obvious from this log that can help me to troubleshoot my
  problem. Here's from grep'ing 504:
  
  Jul 13 20:37:57 localhost haproxy[98507]: 213.47.109.71:51261 [13/Jul/2011:20:37:07.967] webjailfarm webjailfarm/wj01 7/0/0/-1/+50009 504 +194 - - sHVN 62/62/52/12/0 0/0 GET /main.php?location=war HTTP/1.1
 (...)
 
 All of these logs indicate that the server is simply not responding within
 50 seconds. As frustrating as this can be, this is something quite common
 when servers get overloaded or when they try to access a locked resource.
 The two following ones, however, are more concerning:
 
  Jul 13 20:41:25 localhost haproxy[98507]: 188.123.218.31:4180 [13/Jul/2011:20:40:35.132] webjailfarm webjailfarm/wj08 2/0/0/-1/+50005 504 +194 - - sHVN 78/78/61/4/0 0/0 GET /images/gamefavicons.png HTTP/1.1
  Jul 13 20:43:36 localhost haproxy[98507]: 217.246.8.81:2284 [13/Jul/2011:20:42:46.796] webjailfarm webjailfarm/wj05 1/0/0/-1/+50003 504 +194 - - sHVN 56/56/46/0/0 0/0 GET /emptyicon.gif HTTP/1.1
 
 I think that such resources are purely static and have no reason not to be
 quickly delivered. Is there any possibility that the same servers are
 accessed via other backend sections, or even directly without passing via
 haproxy ? I'm asking because what I suspect is that the server's connection
 limit is reached due to other activity, but the listening socket is not yet
 saturated, so our request lies in the server's backlog until a connection
 is released so that a process (or thread) can process the pending request
 (which did not happen in time here).
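
 A quick way to confirm this on a backend server itself is to watch the listen
 queue of the web server's socket; a sketch, assuming a Linux backend with
 iproute2's ss available (port 80 is just an example):

    # Send-Q is the configured backlog, Recv-Q the connections currently
    # waiting to be accepted; Recv-Q repeatedly hitting Send-Q means requests
    # are piling up in the kernel backlog as described above.
    watch -n1 "ss -ltn 'sport = :80'"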
 
  And some others:
  
  Jul 13 20:38:08 localhost haproxy[98507]: 89.228.101.118:51199 [13/Jul/2011:20:38:08.893] webjailfarm webjailfarm/wj05 0/0/-1/-1/+1 503 +212 - - SCDN 44/44/37/7/+3 0/0 POST /login.php HTTP/1.1
 
 The connection was referencing a server which was already detected as DOWN
 (hence the D flag), so the health checks have noticed the event. The
 connection was redispatched onto another server (wj05) but the connection
 failed there. It could be the same thing as above, but with the backlog full,
 so the system is rejecting extra connections instead of queuing them. It
 could also be that you restarted the server and the connections were attempted
 while the port was not bound.
 
  Jul 13 20:38:14 localhost haproxy[98507]: 188.101.27.150:61567 [13/Jul/2011:20:38:14.883] webjailfarm webjailfarm/wj06 13/0/1/-1/+14 502 +204 - - SHVN 53/53/47/7/0 0/0 GET /js/scriptaculous.js?load=effects,slider HTTP/1.1
 
 The 502s normally indicate that the server broke the connection without
 responding. This can be the consequence of a server restart, or it can
 indicate dying processes.
 
  At the moment we're still having problems with clients receiving 502 and 504
  errors. It was quiet for the first few days after we tuned the box, but
  since 2 days ago, they're back.
 
 If you check your stats page, you should see that your servers' states are
 changing a lot. A server must not flap, it must have a steady state. In
 my opinion, the fact that they're seen down is not the cause of the problem
 but one of the consequences : something is blocking your servers or making
 them process requests slowly and at one point they can't even process health
 checks anymore. Requests are aborted on timeouts and checks fail, causing
 the server to be marked down.
 
 This is normally what happens when a server's connection limit gets overrun.
 You may want to try to increase your MaxClients or equivalent. Be careful
 though, as this can imply a higher memory usage.
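
 For example, with Apache prefork this would be the MaxClients/ServerLimit
 pair; a sketch only, the numbers are placeholders rather than recommendations,
 and each extra slot costs roughly one process worth of RAM:

    <IfModule mpm_prefork_module>
        ServerLimit          512
        MaxClients           512
        MaxRequestsPerChild  4000
    </IfModule>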
 
 Another solution people generally like is to split dynamic/static contents,
 which is called content switching. You build a farm out of a very fast and
 scalable server such as nginx and send the static requests there. You keep
 the rest on the current servers, and the load should drop quite a bit.
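
 A minimal sketch of such a split in haproxy terms (the backend name, server
 address and extension list are only examples):

    frontend webjailfarm
        acl is_static path_end .png .gif .jpg .ico .css .js
        use_backend static_farm if is_static
        default_backend webjailfarm

    backend static_farm
        # a small nginx instance dedicated to static objects
        server static1 10.0.0.10:80 check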
 
  Btw, I tried to run the command echo show sess | socat stdio /var/run/haproxy/haproxy.sock,
  and some entries show that it's not forwarding to any server (none). Is this normal?
  
  [/root] # echo show sess | socat stdio /var/run/haproxy/haproxy.sock
  0x800fbfc00: proto=tcpv4 src=178.190.178.184:52094 fe=webjailfarm be=webjailfarm srv=none ts=02 age=46s calls=1 rq[f=501000h,l=0,an=0eh,rx=3s,wx=,ax=] rp[f=001000h,l=0,an=00h,rx=,wx=,ax=] s0=[7,18h,fd=55,ex=] s1=[0,0h,fd=-1,ex=] exp=3s
 (...)
 
 It is normal for connections which have not yet sent a full request. In your
 case, the request buffer is empty so nothing was received from the client.
 Until you don't see 

Re: nbproc>1, ksoftirqd and response time - just can't get it

2011-07-18 Thread John Helliwell
Are you running irqbalance? It may help distribute network interface interrupts

Sent from my iPhone

On 18 Jul 2011, at 03:42, Dmitriy Samsonov dmitriy.samso...@gmail.com wrote:

 My test setup is three Dell r410 servers (dual Intel(R) Xeon(R) CPU X5650 @
 2.67GHz - 24 threads total, 128Gb RAM), all connected to a 1Gbps network.
 
 One server is haproxy, configured to block all requests with 
 'Accept-Encoding: none':
 
 global
   daemon
   maxconn 8
   option forwardfor
   retries 10
 
 frontend public
 bind 192.168.0.1:80
 default_backend nginx
 acl   accepts_none hdr(Accept-Encoding) -i none
 errorfile 403 /raid/emptypage.txt
 block if accepts_none
 
 backend nginx
   server srv 127.0.0.1:80 maxconn 8192
 
 File /raid/emptypage.txt is an empty file made with 'touch /raid/emptypage.txt'.
 
 I'm doing ab2 -c 1000 -H 'Accept-Encoding: None' -n 100 http://192.168.0.1/
 on two other servers and get the following:
 
 When nbproc = 1, haproxy saturates 100% of the cpu core it runs on, but the
 server is running nicely; I'm able to get a reply from the nginx behind it by
 using curl on my machine (curl http://192.168.0.1/). ab reports 16833
 requests/second each and the longest request is around 14 seconds.
 
 When I change nbproc to higher values (the maximum is 24 as there are 24
 threads total) I can see the ksoftirqd/0 process saturating a cpu core, the
 network becomes slow on the server, ab reports the same 16k-17k
 requests/second for each client, but the longest request is always around
 20-30 seconds.
 
 I've seen such things with ksoftirqd/0 running at 100% and the network almost
 down during DDoS attacks when there were too many iptables rules, but what is
 happening now? And what number should I use for nbproc? Is it ok to have
 haproxy running at 100%? It looks like I can get 30k requests per second in
 my setup, is there any way to make it higher? I've done some basic tuning
 like tcp_max_tw_buckets = 1024*1024, tcp_tw_reuse = 1, tcp_max_syn_backlog =
 3. Am I running out of options?


Re: [PATCH 0/6] Free memory on exit

2011-07-18 Thread Willy Tarreau
On Fri, Jul 15, 2011 at 01:14:05PM +0900, Simon Horman wrote:
 The motivation for this is that when soft-restart is merged it will become
 more important to free all relevant memory in deinit(

All applied, thank you Simon !

Willy




Re: nbproc>1, ksoftirqd and response time - just can't get it

2011-07-18 Thread Willy Tarreau
On Mon, Jul 18, 2011 at 08:54:15AM +0100, John Helliwell wrote:
 Are you running irqbalance? It may help distribute network interface interrupts.

It's often worse for low-latency processing such as haproxy. irqbalance is
nice when each interrupt induces very long CPU processing, but here we have
almost a 1-to-1 affinity between packets and processed requests (especially
in keep-alive mode), and hard-fixing interrupt and CPU affinity always yields
the best results, by far.

Regards,
Willy




unsubscribe

2011-07-18 Thread Sergio Toledo



Thoughts on the master-worker model

2011-07-18 Thread Willy Tarreau
Hi Simon !

Last week I could find some time with a quiet place at work to dig again
into the master-worker patch series.

I went back to what we discussed a few weeks ago concerning the pid management
and the cmdline parsing which must be performed only once, until I realized
that I was having trouble again with pid management due to two things that
we cannot easily cover :
  - a new process starting with -sf which would replace the old processes
running in master/worker mode.

  - binding issues within the master process when reloading a configuration

The first point causes a simple issue : if we start a new process which wants
to replace the old ones that were working in master-worker mode, it will send
them a SIGTTOU+SIGUSR1. The issue there is that the master should release all
of its listening sockets, forward the signal to all children then passively
wait for all of them to leave (including old ones). This is not undoable but
adds a bit of complexity, which I was planning on implementing anyway.
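
For reference, the soft-restart sequence in question is the one triggered today
by something like this (paths are only examples):

    # start a new haproxy and ask the processes listed in the old pidfile to
    # release their listening sockets (SIGTTOU) then finish gracefully (SIGUSR1)
    haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid \
            -sf $(cat /var/run/haproxy.pid)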

The second point is more complex. I realized that there normally is the same
config in the master process and in all of its active children. What happens
if the master fails to reload a configuration? I'm thinking about the worst
issue, the one related to binding IP:ports, which can only be detected by an
active process. If the issue is just a conflict with another socket, then
surely the new process can restart with -sf instead and get rid of the issue
(back to point 1). But still, the issue of keeping a master process alive with
a config that has nothing to do with what its children are doing is messy at
best, and very dangerous.

My long-term solution would be to have everything related to the configuration
behind a pointer, and have that pointer referenced in sessions, health checks
etc... That way, each session would use the config it was instantiated with
and it would be much easier to allow an old and a new conf to coexist. It
would not completely fix the issues with some global settings though (eg:
nbproc, pollers, tuning options, ...) but it would be a gain. That way, if
the new config fails, we can switch pointers back to old config and forget
everything that was attempted.

Then my thinking went a bit further : before doing what is described above,
we could have the current master fork a new master which would handle the
new config and issue new processes. If for any reason that new process fails
to start, it simply disappears and nothing changes. What I like with this
method is that it also implicitly allows changing many global settings, even
the master-worker mode may be changed. New sessions would simply use the new
config from the new process and old sessions would remain on the old one. The
only downside I can think of is that it will never provide any possibility to
maintain stats or any state between the old and the new process, but that's
secondary as the master-worker model is not meant for that either.

Another advantage I was seeing in forking a new master from the old one was
that we could probably keep the soft-restart semantics for the situations
where the new process cannot bind: it could send a SIGTTOU to the old
processes to relinquish the ports and a SIGTTIN in case of failure.

That thinking gave me another idea that I have not developed yet. The core
of your work is the socket cache, and it is what makes the system reliable. One
possibility would be that the master doesn't own any configuration at all, just
the sockets. It would be the new processes that would connect to the socket
cache to grab some ports, then fork the new workers as it is done today.
I must say I'm not completely at ease with such a model because I think that
an instantiator is needed, but I like the idea of an autonomous socket cache,
which becomes sort of an interface between haproxy and the kernel. It will
also help if one day we want to implement FTP support, as we'll have to be
able to bind outgoing sockets to local port 20, and that could be performed
by the central socket cache.

With all this in mind, I think we need to discuss a bit more before going
back to the keyboard. Given that a number of bugs have been fixed since
1.5-dev6, I'll probably issue -dev7 soon and that should not stop us from
trying to elaborate a model that suits all needs.

As usual, I'm very interested in getting your insights, comments, opinions,
ideas, etc...

Best regards,
Willy




Re: nbproc>1, ksoftirqd and response time - just can't get it

2011-07-18 Thread Dmitriy Samsonov
Hi!

2011/7/18 Willy Tarreau w...@1wt.eu

 Hi,

 On Mon, Jul 18, 2011 at 06:42:54AM +0400, Dmitriy Samsonov wrote:
  My test setup is three Dell r410 servers (dual Intel(R) Xeon(R) CPU X5650 @
  2.67GHz - 24 threads total, 128Gb RAM) all connected to 1Gbps network.
 
  One server is haproxy, configured to block all requests with
  'Accept-Encoding: none':
 
  global
   daemon
  maxconn 8
  option forwardfor
   retries 10

 Something is missing here above, probably defaults and maybe a few other
 options before option forwardfor, such as mode http and even a few
 timeouts.


Yes, the defaults section is the following:
defaults
mode http
maxconn 79500
timeout client 20s
timeout server 15s
timeout queue  60s
timeout connect 4s
timeout http-request 5s
timeout http-keep-alive 250
#option httpclose
option abortonclose
balance roundrobin
option forwardfor
retries 10

Also, the conntrack module was loaded - a friend of mine was playing with
iptables and did not remove it. Now there is no iptables at all:

dex9 ipv4 # sysctl -a | grep conntrack | wc -l
0
dex9 ipv4 # lsmod | grep xt_ | wc -l
0
dex9 ipv4 # lsmod | grep nf_ | wc -l
0

I followed your recommendation and set affinity for processes:
dex9 ipv4 # schedtool 16638 # haproxy's process
PID 16638: PRIO   0, POLICY N: SCHED_NORMAL, NICE   0, AFFINITY 0x2

dex9 ipv4 # schedtool 3  # ksoftirqd/0
PID 3: PRIO   0, POLICY N: SCHED_NORMAL, NICE   0, AFFINITY 0x1

Now in top it looks like this:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
16638 root      20   0  100m  36m  520 R   93  0.0  19:33.18 haproxy
    3 root      20   0     0    0    0 R   38  0.0   7:11.76 ksoftirqd/0

93% for haproxy and 38% for ksoftirqd/0

I was lucky enough to reach a session rate of 66903 (max), and the average
value for Cur is around 40-42k.

Typical output of one of the two running ab2 instances is:

Server Software:
Server Hostname:        nohost
Server Port:            80

Document Path:          /
Document Length:        0 bytes

Concurrency Level:      1000
Time taken for tests:   470.484 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      0 bytes
HTML transferred:       0 bytes
Requests per second:    21254.72 [#/sec] (mean)
Time per request:       47.048 [ms] (mean)
Time per request:       0.047 [ms] (mean, across all concurrent requests)
Transfer rate:          0.00 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   34 275.9     11   21086
Processing:     0   13  17.8     11     784
Waiting:        0    0   0.0      0       0
Total:          2   47 276.9     22   21305

Percentage of the requests served within a certain time (ms)
  50% 22
  66% 26
  75% 28
  80% 30
  90% 37
  95% 41
  98% 47
  99%266
 100%  21305 (longest request)

Typical output of vmstat is:
dex9 ipv4 # vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0      0 131771328  46260  64016    0    0     2     1  865  503  1  6 94  0
 1  0      0 131770688  46260  64024    0    0     0     0 40496 6323  1  9 90  0
 1  0      0 131771920  46260  64024    0    0     0     0 47046 7212  1  8 91  0
 2  0      0 131771704  46260  64024    0    0     0     0 40864 6143  1  8 91  0
 1  0      0 131771688  46268  64024    0    0     0    12 36547 5268  1  8 91  0
 2  0      0 131771696  46268  64032    0    0     0     0 53189 8979  0  5 95  0
 3  0      0 131771584  46268  64024    0    0     0     0 31633 4025  0  4 96  0
 2  0      0 131771840  46268  64020    0    0     0     0 49723 9290  1  9 91  0
 2  0      0 131772448  46268  64028    0    0     0     0 44484 7008  1  8 91  0
 2  0      0 131771688  46276  64032    0    0     0    20 40132 4531  1  8 92  0
 2  0      0 131771456  46276  64028    0    0     0     0 36006 4445  0  8 91  0
 2  0      0 131772208  46276  64024    0    0     0     0 41325 5902  1  8 91  0
 2  0      0 131771208  46276  64032    0    0     0     0 44262 7427  1  5 94  0
 1  0      0 131771456  46276  64028    0    0     0     0 42403 5422  1  8 91  0
 2  0      0 131771944  46284  64020    0    0     0    12 46907 7419  1  7 93  0
 1  0      0 131772960  46284  64020    0    0     0     0 42772 6663  1  8 91  0
 2  0      0 131772832  46284  64024    0    0     0     0 45298 6695  1  8 91  0
 2  0      0 131772712  46284  64028    0    0     0     0 44604 5361  0  6 94  0
 1  0      0 131772464  46284  64028    0    0     0     0 39798 5105  0  4 96  0

Also, I've checked version of NIC's firmware:
dex9 ipv4 # ethtool -i eth0
driver: bnx2
version: 2.0.21
firmware-version: 6.2.12 bc 5.2.3
bus-info: :01:00.0


Moreover, I've tried launching two ab2 locally:
dex9 ipv4 # ab2 -c 1000 -H 

Re: nbproc>1, ksoftirqd and response time - just can't get it

2011-07-18 Thread Willy Tarreau
Hi Dmitriy,

On Mon, Jul 18, 2011 at 10:01:47PM +0400, Dmitriy Samsonov wrote:
 defaults
 mode http
 maxconn 79500
 timeout client 20s
 timeout server 15s
 timeout queue  60s
 timeout connect 4s
 timeout http-request 5s
 timeout http-keep-alive 250
 #option httpclose
 option abortonclose
 balance roundrobin
 option forwardfor

OK, by using option http-server-close, you'll benefit from an active
close to the servers, which will improve things a lot when using real
servers. Still it will not change anything in your tests.

 retries 10
 
 Also there was conntrack module loaded - friend of mine was playing with
 iptables and did not remove it. Now there is no iptables at all:
 
 dex9 ipv4 # sysctl -a | grep conntrack | wc -l
 0
 dex9 ipv4 # lsmod | grep xt_ | wc -l
 0
 dex9 ipv4 # lsmod | grep nf_ | wc -l
 0

OK fine.

 I followed your recommendation and set affinity for processes:
 dex9 ipv4 # schedtool 16638 # haproxy's process
 PID 16638: PRIO   0, POLICY N: SCHED_NORMAL, NICE   0, AFFINITY 0x2
 
 dex9 ipv4 # schedtool 3  # ksoftirqd/0
 PID 3: PRIO   0, POLICY N: SCHED_NORMAL, NICE   0, AFFINITY 0x1

I'm not sure that applying schedtool to kernel threads has any effect.
Normally you should echo 1 > /proc/irq/XXX/smp_affinity to force
interrupts to a specific core.
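
A minimal sketch of that pinning (the IRQ number and CPU masks are examples;
check /proc/interrupts for the real ones):

    # find the NIC's IRQ number(s)
    grep eth0 /proc/interrupts

    # bind that IRQ to CPU0 (mask 0x1)...
    echo 1 > /proc/irq/45/smp_affinity

    # ...and keep haproxy on CPU1 (mask 0x2), away from the interrupt work
    # (16638 is the haproxy pid from the top output above)
    taskset -p 2 16638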

 Now in top it looks like this:
   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 16638 root      20   0  100m  36m  520 R   93  0.0  19:33.18 haproxy
     3 root      20   0     0    0    0 R   38  0.0   7:11.76 ksoftirqd/0
 
 93% for haproxy and 38% for ksoftirqd/0

OK. As long as you don't reach 100%, there's something perturbing the
tests. Possibly your IRQs are spread over all cores.

 I was lucky enough to reach 66903 session rate (max) and average value for
 Cur is around 40-42k.

Fine, this is a lot better now. Since you're running at 2000 concurrent
connections, the impact on the cache is noticeable (at 32kB per connection
for haproxy, it's 64MB of RAM possibly touched each second, maybe only 16MB
since requests are short and fit in a single page). Could you recheck at
only 250 concurrent connections in total (125 per ab) ? This usually is
the optimal point I observe. I'm not saying that it should be your target,
but we're chasing the issues :-)

 Typical output of one of two ab2 running is:
 
 Server Software:
 Server Hostname:        nohost
 Server Port:            80
 
 Document Path:          /
 Document Length:        0 bytes
 
 Concurrency Level:      1000
 Time taken for tests:   470.484 seconds
 Complete requests:      1000
 Failed requests:        0
 Write errors:           0
 Total transferred:      0 bytes
 HTML transferred:       0 bytes
 Requests per second:    21254.72 [#/sec] (mean)
 Time per request:       47.048 [ms] (mean)
 Time per request:       0.047 [ms] (mean, across all concurrent requests)
 Transfer rate:          0.00 [Kbytes/sec] received
 
 Connection Times (ms)
               min  mean[+/-sd] median   max
 Connect:        0   34 275.9     11   21086

This one means there is packet loss on SYN packets. Some requests
take up to 4 SYN to pass (0+3+6+9 seconds). Clearly something is
wrong, either on the network or more likely net.core.somaxconn.
You have to restart haproxy after you change this default setting.
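
For example (the value is only an illustration):

    # raise the kernel-wide cap on listen() backlogs, then restart haproxy
    # so its listening sockets are re-created with the larger backlog
    sysctl -w net.core.somaxconn=4096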

Does dmesg say anything on either the clients or the proxy machine ?

 Processing: 0   13  17.8 11 784
 Waiting:00   0.0  0   0
 Total:  2   47 276.9 22   21305
 
 Percentage of the requests served within a certain time (ms)
   50% 22
   66% 26
   75% 28
   80% 30
   90% 37
   95% 41
   98% 47
   99%266
  100%  21305 (longest request)
 
 Typical output of vmstat is:
 dex9 ipv4 # vmstat 1
 procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
  1  0      0 131771328  46260  64016    0    0     2     1  865  503  1  6 94  0
  1  0      0 131770688  46260  64024    0    0     0     0 40496 6323  1  9 90  0

OK, so 1% user, 9% system, 90% idle, 0% wait at 40k int/s. Since this is
scaled to 100% for all cores, it means that we're saturating a core in the
system (which is expected with short connections).

I don't remember if I asked you what version of haproxy and what kernel you
were using. Possibly that some TCP options can improve things a bit.

 Also, I've checked version of NIC's firmware:
 dex9 ipv4 # ethtool -i eth0
 driver: bnx2
 version: 2.0.21
 firmware-version: 6.2.12 bc 5.2.3
 bus-info: :01:00.0

OK, let's hope it's fine. I remember having seen apparently good results
with version 4.4, so this one should be OK.

 Moreover, I've tried launching two ab2 localy:
 dex9 ipv4 # ab2 -c 1000 -H 'Accept-Encoding: None' -n 1000
 http://localweb/
 This 

haproxy response buffering

2011-07-18 Thread P.R.
Hi all,
I am having a similar problem as this guy:
http://forums.rightscale.com/showthread.php?t=665

Basically, when sending a chunked response and not ending the connection,
haproxy will buffer the initial chunks for 3 seconds; after that, chunks go
out instantly.

I tinkered with the buffer size settings (tune.*) but that didn't help. One
workaround that does work is manually sending about 8k first.

How do I fix this the right way? Thanks.


Re: haproxy response buffering

2011-07-18 Thread Willy Tarreau
Hi,

On Mon, Jul 18, 2011 at 11:46:12PM +0300, P.R. wrote:
 Hi all,
 I am having a similar problem as this guy:
 http://forums.rightscale.com/showthread.php?t=665
 
 Basically, when sending chunked response, and not ending the connection,
 haproxy will buffer initial chunks for 3 seconds, after that chunks go out
 instantly.

It does not specifically buffer for any length of time, but sets the MSG_MORE
flag on output so that the system avoids sending incomplete frames.

 I tinkered with buffer size settings (tune.*) but that didn't help. One
 solution to work is manually sending about 8k.
 
 How to fix this the right way? Thanks.

Please note that the right way to fix it is to use HTTP the correct way.
There must be no assumption on the way chunks are delivered, and any
intermediary between the server and the client is free to re-chunk them
and even to wait for the whole message to be buffered before sending it
at once. So fixing it the right way means fixing the application or the
way it's used.

One workaround for such applications was recently introduced in haproxy.
You can use option http-no-delay either in the frontend or the backend,
and it will instruct the system not to buffer requests/responses and to
send them without waiting. This means that incomplete TCP segments might
be sent over the network and that more ACKs will be sent by the client,
but it can allow you to get a more acceptable behaviour during the time
it takes to fix the application.
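
A minimal sketch of that workaround (section and server names are only
examples):

    backend chunked_app
        # disable the send-side buffering optimisation for this traffic only;
        # expect a few more small packets and ACKs on the wire
        option http-no-delay
        server app1 10.0.0.20:8080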

Please also note that doing so will only work around the problem at the
haproxy level, but will not change anything for other network components
along the path. Any proxy (transparent or not) may do the same thing.

If you're interested in the subject, this incorrect usage has been explicitly
covered by the HTTPbis working group, because such breakage has already been
reported with other applications and intermediaries:

   http://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging-15#section-2.2

Regards,
Willy




Unreliable soft-restarts on Ubuntu 10.04 with haproxy 1.4.8 and 1.4.15

2011-07-18 Thread Jonathan Simms
Hello all,

I've been tearing my hair out trying to get soft-restarts to work reliably, and
have been frustrated for about a week solid. I'm running 1.4.15 on Ubuntu 10.04
amd64, kernel version 2.6.32-32-generic.

The behavior I'm seeing is that -sf restarts will work sporadically from the
command-line.

it prints the output:

[ALERT] 198/213054 (23213) : Starting frontend http-in: cannot bind socket
[ALERT] 198/213054 (23213) : Starting frontend openstack-ssl-in:
cannot bind socket
[ALERT] 198/213054 (23213) : Starting frontend mother-ssl-in: cannot bind socket
[ALERT] 198/213054 (23213) : Starting frontend authoritae-ssl-in:
cannot bind socket
[ALERT] 198/213054 (23213) : Starting frontend encoders-in: cannot bind socket
[ALERT] 198/213054 (23213) : Starting frontend encoders-v2-in: cannot
bind socket
[ALERT] 198/213054 (23213) : Starting frontend redis-in: cannot bind socket


An strace of the restart shows:

17492 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 5
17492 fcntl(5, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
17492 setsockopt(5, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
17492 setsockopt(5, SOL_SOCKET, 0xf /* SO_??? */, [1], 4) = -1 ENOPROTOOPT (Protocol not available)
17492 bind(5, {sa_family=AF_INET, sin_port=htons(6379), sin_addr=inet_addr(0.0.0.0)}, 16) = -1 EADDRINUSE (Address already in use)
17492 close(5)  = 0
17492 kill(13512, SIGTTOU)  = 0


It does this for every frontend I have defined.


my config is:

global
maxconn 32768
pidfile /var/run/haproxy.pid
stats socket /var/run/haproxy-admin.sock uid 0 gid 0 mode 600
daemon
user haproxy
group haproxy
log localhost daemon emerg debug
spread-checks 3

defaults
mode http
retries 3
option redispatch
maxconn 32768
contimeout 5000ms
clitimeout 30ms
srvtimeout 300ms

# backend definitions elided, all of the default_backend lines below
# point to valid backend names

frontend http-in
bind 0.0.0.0:80
monitor-uri /_haproxy_check
default_backend mother

frontend openstack-ssl-in
bind 0.0.0.0:8443
mode tcp
default_backend openstack-ssl-swift

frontend mother-ssl-in
bind 0.0.0.0:11443
mode tcp
default_backend mother-ssl

frontend authoritae-ssl-in
bind 0.0.0.0:10443
mode tcp
default_backend authoritae-ssl

frontend encoders-in
bind 0.0.0.0:3000
default_backend encoders

frontend encoders-v2-in
bind 0.0.0.0:3002
default_backend encoders-v2

frontend redis-in
bind 0.0.0.0:6379
mode tcp
default_backend redis

# -

output of haproxy -vv

[20110718-21.36.25]# haproxy -vv
HA-Proxy version 1.4.8 2010/06/16
Copyright 2000-2010 Willy Tarreau w...@1wt.eu

Build options :
  TARGET  = linux26
  CPU = generic
  CC  = gcc
  CFLAGS  = -O2 -g
  OPTIONS = USE_LINUX_SPLICE=1 USE_LINUX_TPROXY=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 8192, maxpollevents = 200

Encrypted password support via crypt(3): yes

Available polling systems :
 sepoll : pref=400,  test result OK
  epoll : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 4 (4 usable), will use sepoll.


I've been going totally crazy trying to get this working. The exact same build
of haproxy on debian lenny seems to restart reliably. I've tried this same
config with 1.4.15 and it exhibits the same behavior. Is there some sysctl I
need to set or something?

Help would be greatly appreciated.


Thanks very much,
Jonathan



Re: Unreliable soft-restarts on Ubuntu 10.04 with haproxy 1.4.8 and 1.4.15

2011-07-18 Thread Willy Tarreau
On Mon, Jul 18, 2011 at 05:46:55PM -0400, Jonathan Simms wrote:
 Hello all,
 
 I've been tearing my hair out trying to get soft-restarts to work reliably, 
 and
 have been frustrated for about a week solid. I'm running 1.4.15 on Ubuntu 
 10.04
 amd64, kernel version 2.6.32-32-generic.
 
 The behavior I'm seeing is that -sf restarts will work sporadically from the
 command-line.
 
 it prints the output:
 
 [ALERT] 198/213054 (23213) : Starting frontend http-in: cannot bind socket
 [ALERT] 198/213054 (23213) : Starting frontend openstack-ssl-in:
 cannot bind socket
 [ALERT] 198/213054 (23213) : Starting frontend mother-ssl-in: cannot bind 
 socket
 [ALERT] 198/213054 (23213) : Starting frontend authoritae-ssl-in:
 cannot bind socket
 [ALERT] 198/213054 (23213) : Starting frontend encoders-in: cannot bind socket
 [ALERT] 198/213054 (23213) : Starting frontend encoders-v2-in: cannot
 bind socket
 [ALERT] 198/213054 (23213) : Starting frontend redis-in: cannot bind socket

Wow, this is very concerning: this bug was a regression in 2.6.38, and it looks
like it was backported to their 2.6.32-32 kernel! It was fixed in 2.6.38.8 if
my memory serves me right. It seems rather strange that they'd backport a patch
that is known to break some products (haproxy and amavis are at least two
identified victims), but it's possible they didn't notice the later fix.

It would be nice to know what exact kernel their 2.6.32 is based on, and ideally
what patches were applied on top of that.

Regards,
Willy




Re: Unreliable soft-restarts on Ubuntu 10.04 with haproxy 1.4.8 and 1.4.15

2011-07-18 Thread Jonathan Simms
On Mon, Jul 18, 2011 at 6:06 PM, Willy Tarreau w...@1wt.eu wrote:
 On Mon, Jul 18, 2011 at 05:46:55PM -0400, Jonathan Simms wrote:
 Hello all,

 I've been tearing my hair out trying to get soft-restarts to work reliably, 
 and
 have been frustrated for about a week solid. I'm running 1.4.15 on Ubuntu 
 10.04
 amd64, kernel version 2.6.32-32-generic.

 The behavior I'm seeing is that -sf restarts will work sporadically from the
 command-line.

 it prints the output:

 [ALERT] 198/213054 (23213) : Starting frontend http-in: cannot bind socket
 [ALERT] 198/213054 (23213) : Starting frontend openstack-ssl-in:
 cannot bind socket
 [ALERT] 198/213054 (23213) : Starting frontend mother-ssl-in: cannot bind 
 socket
 [ALERT] 198/213054 (23213) : Starting frontend authoritae-ssl-in:
 cannot bind socket
 [ALERT] 198/213054 (23213) : Starting frontend encoders-in: cannot bind 
 socket
 [ALERT] 198/213054 (23213) : Starting frontend encoders-v2-in: cannot
 bind socket
 [ALERT] 198/213054 (23213) : Starting frontend redis-in: cannot bind socket

 Wow this is very concerning, this bug was a regression in 2.6.38, and it looks
 like it was backported to their 2.6.32-32 kernel ! It was fixed in 2.6.38.8 if
 my memory serves me right. It seems rather strange they'd backport a patch 
 that
 is known to break some products (haproxy and amavis are at least two 
 identified
 victims), but it's possible they didn't notice the later fix.

 It would be nice to know what exact kernel their 2.6.32 is based on, and 
 ideally
 what patches were applied on top of that.

 Regards,
 Willy

Willy,

I looked at the previous bug report here
http://comments.gmane.org/gmane.comp.web.haproxy/5439
based on 2.6.38 and checked the ubuntu 2.6.32 kernel for the offending patch
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=c191a836a908d1dd6b40c503741f91b914de3348
and I didn't see it applied to the kernel I'm using.

Is there any other explanation, or some information I can find for you?


Thanks,
Jonathan



Re: nbproc>1, ksoftirqd and response time - just can't get it

2011-07-18 Thread Dmitriy Samsonov
Hi!


 Fine, this is a lot better now. Since you're running at 2000 concurrent
 connections, the impact on the cache is noticeable (at 32kB per connection
 for haproxy, it's 64MB of RAM possibly touched each second, maybe only 16MB
 since requests are short and fit in a single page). Could you recheck at
 only 250 concurrent connections in total (125 per ab) ? This usually is
 the optimal point I observe. I'm not saying that it should be your target,
 but we're chasing the issues :-)


Using one/two clients with ab2 -c 250 -H 'Accept-Encoding: None' -n 1
http://testhost I can get:
Requests per second:    25360.45 [#/sec] (mean)
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    6  94.8      3    9017
Processing:     0    4  16.7      3    5002
Waiting:        0    0  10.3      0    5002
Total:          0   10  96.3      6    9022
Percentage of the requests served within a certain time (ms)
  50%      6
  66%      7
  75%      8
  80%      8
  90%      9
  95%     10
  98%     11
  99%     13
 100%   9022 (longest request)

Maximum session rate is 49000 (exact value).

Also, I've found the reason why SYN packets were lost. It looks like it
happened because of slow interrupt handling (the same ksoftirqd/0 at 100%);
ifconfig eth0 reports dropped packets. I've tried to set ethtool -g
eth0 rx 2040 and the dropped packets are gone, at least according to
ifconfig.

Also, I've upgraded the kernel from 2.6.38-r6 (gentoo) to 2.6.39-r3
(gentoo) - nothing changed. At all. The haproxy version is 1.4.8.

Altering somaxconn also didn't change anything.

Only changing the affinity of the irq/haproxy affects the system; the maximum
rate changes from 25k to 49k. And that's it...

I'm including sysctl -a output, but I think all this happens because
of some trouble with the bnx2 driver - I just don't see any explanation for
why 70-80Mbps is saturating haproxy and irq handling (lost packets!).
I have the option to try a 'High Performance 1000PT Intel Network Card' -
could it be any better, or should I try to find a solution for the current
configuration?

My final task is to handle DDoS attacks with a flexible and robust
filter. Haproxy is already helping me to stay alive under
~8-10k DDoS bots (I'm using two servers and DNS RR in production), but the
attackers are not sleeping and I'm expecting the attacks to continue with
more bots. I bet they will stop at 20-25k bots. Such a botnet will
generate an approx. 500k session rate and ~1Gbps of bandwidth, so I was
dreaming of handling it on this one server with two NICs bonded, giving
me 2Gbps for traffic :)



  Typical output of one of two ab2 running is:
 
  Server Software:
  Server Hostname:        nohost
  Server Port:            80
 
  Document Path:          /
  Document Length:        0 bytes
 
  Concurrency Level:      1000
  Time taken for tests:   470.484 seconds
  Complete requests:      1000
  Failed requests:        0
  Write errors:           0
  Total transferred:      0 bytes
  HTML transferred:       0 bytes
  Requests per second:    21254.72 [#/sec] (mean)
  Time per request:       47.048 [ms] (mean)
  Time per request:       0.047 [ms] (mean, across all concurrent requests)
  Transfer rate:          0.00 [Kbytes/sec] received
 
  Connection Times (ms)
                min  mean[+/-sd] median   max
  Connect:        0   34 275.9     11   21086

 This one means there is packet loss on SYN packets. Some requests
 take up to 4 SYN to pass (0+3+6+9 seconds). Clearly something is
 wrong, either on the network or more likely net.core.somaxconn.
 You have to restart haproxy after you change this default setting.

 Does dmesg say anything on either the clients or the proxy machine ?

  Processing:     0   13  17.8     11     784
  Waiting:        0    0   0.0      0       0
  Total:          2   47 276.9     22   21305
 
  Percentage of the requests served within a certain time (ms)
    50%     22
    66%     26
    75%     28
    80%     30
    90%     37
    95%     41
    98%     47
    99%    266
   100%  21305 (longest request)
 
  Typical output of vmstat is:
  dex9 ipv4 # vmstat 1
  procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
   r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
   1  0      0 131771328  46260  64016    0    0     2     1  865  503  1  6 94  0
   1  0      0 131770688  46260  64024    0    0     0     0 40496 6323  1  9 90  0

 OK, so 1% user, 9% system, 90% idle, 0% wait at 40k int/s. Since this is
 scaled to 100% for all cores, it means that we're saturating a core in the
 system (which is expected with short connections).

 I don't remember if I asked you what version of haproxy and what kernel you
 were using. Possibly that some TCP options can improve things a bit.

  Also, I've checked version of NIC's firmware:
  dex9 ipv4 # ethtool -i eth0
  driver: bnx2
  version: 2.0.21
  firmware-version: 6.2.12 bc 5.2.3
  bus-info: :01:00.0

 OK, let's hope it's fine. I remember having seen apparently 

Re: nbproc>1, ksoftirqd and response time - just can't get it

2011-07-18 Thread Hank A. Paulson

On 7/18/11 5:25 PM, Dmitriy Samsonov wrote:

My final task is to handle DDoS attacks with flexible and robust
filter available. Haproxy is already helping me to stay alive under
~8-10k DDoS bots (I'm using two servers and DNS RR in production), but
attackers are not sleeping and I'm expecting attacks to continue with
more bots. I bet they will stop at 20-25k bots. Such botnet will
generate approx. 500k session rate. and ~1Gbps bandwidth so I was
dreaming to handle it on this one server with two NIC's bonded giving
me 2Gbps for traffic:)


I think if that is your goal then you should definitely move to the Intel
NICs; people seem to have problems with those bnx NICs on Linux.


Since you are using a new-ish kernel, you might also want to look at the 
splice options and the smart accept/smart* options for haproxy.
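
For example, something along these lines in the defaults section; a sketch
only, check the 1.4 documentation for the exact semantics before enabling:

    defaults
        # zero-copy forwarding between sockets when the kernel supports it
        option splice-auto
        # save a packet during connection setup on each side
        option tcp-smart-accept
        option tcp-smart-connect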


Since dDOS mitigation is your goal, if you have the money you may want to try
the 10Gb NICs, since as Willy said they seem to perform better even at lower
traffic levels.


If you have a non-Dell machine with fewer cores and a faster processor you
might want to test that to see if it will work better in this scenario.
Also, on all machines, try with hyperthreading on/off at the BIOS level to see
if that makes a difference. You can also reduce the cores/cpus used in the BIOS
and grub level settings, so you might try going down to 2 cores, 1 cpu, no
hyperthreading and see if that makes a difference. Also, if you do use an
Intel card/Intel onboard NIC, there are some settings (IT/AO) that may affect
performance.


If this is for dDOS mitigation, are you going to be tarpitting, blocking, or
passing the majority of the connections on to a real backend server? You may
be testing a scenario that does not map well to your real-world usage. I would
suggest putting keepalived on the current machine (if there is one) and on any
new machine you are thinking of using to replace the existing one; then you
can switch to the new one easily and switch back if you find any show-stopper
issues.


Also, for dDOS mitigation you probably want to increase these:
net.ipv4.tcp_max_syn_backlog
net.ipv4.ip_local_port_range
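
For instance (values are illustrative, not recommendations):

    sysctl -w net.ipv4.tcp_max_syn_backlog=65536
    sysctl -w net.ipv4.ip_local_port_range="1024 65535"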

Here is a facebook note about scaling memcache connections per second:
http://www.facebook.com/note.php?note_id=39391378919



No Attachments Please

2011-07-18 Thread Lyris ListManager

You sent an email to the Exchange list
with an attachment. We have disabled this
option as recently a virus was attached.
Please resend your posting without it?

Thanks! 




Re: Unreliable soft-restarts on Ubuntu 10.04 with haproxy 1.4.8 and 1.4.15

2011-07-18 Thread Willy Tarreau
On Mon, Jul 18, 2011 at 06:33:33PM -0400, Jonathan Simms wrote:
 Willy,
 
 I looked at the previous bug report here
 http://comments.gmane.org/gmane.comp.web.haproxy/5439
 based on 2.6.38 and checked the ubuntu 2.6.32 kernel for the offending patch
 http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=c191a836a908d1dd6b40c503741f91b914de3348
 and I didn't see it applied to the kernel I'm using.

OK, then that's already a good thing, but we have to find out what
could cause a similar issue on a specific distro!

 Is there any other explanation, or some information I can find for you?

Do all the listeners have the same issue or only a few? And did the
config change between the working one and the reloaded one? What could
cause the same issue to happen is a copy-paste of a bind line in the
same file, which would cause a conflict when trying to bind the second
one.

Also, please check that you don't have more than one process running
when the issue appears. It could be that another old process still holds
the ports open and does not get the signal to release them. But this would
be surprising considering that your config only allows one process.
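
Two quick checks along those lines (the config path is just an example):

    # is any bind line duplicated in the configuration ?
    grep -E '^[[:space:]]*bind ' /etc/haproxy/haproxy.cfg | sort | uniq -d

    # is more than one haproxy still alive after the reload ?
    ps -C haproxy -o pid,ppid,etime,args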

In your trace below, you only have the expected part :

17492 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 5
17492 fcntl(5, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
17492 setsockopt(5, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
17492 setsockopt(5, SOL_SOCKET, 0xf /* SO_??? */, [1], 4) = -1 ENOPROTOOPT (Protocol not available)
17492 bind(5, {sa_family=AF_INET, sin_port=htons(6379), sin_addr=inet_addr(0.0.0.0)}, 16) = -1 EADDRINUSE (Address already in use)
17492 close(5)  = 0
17492 kill(13512, SIGTTOU)  = 0

The first bind() fails, then the new process sends a SIGTTOU signal to the
old one asking it to release the ports, then haproxy tries to bind again for
a certain time, and only complains if it fails for too long. Ideally, a full
strace of the issue could help, but please take it with strace -tt so that
we get the timers.

Regards,
Willy