Randomness in hard-stop-after

2018-02-08 Thread Samuel Reed
When reconfiguring across a large web tier, it's possible that many
HAProxy reloads are initiated at nearly the same time. We have a large
number of long-lived TCP sessions, so we must use hard-stop-after to
eventually kill them off, or HAProxy instances will remain open for days.

It would be very helpful to have an option like
`spread-hard-stop-after`, similar to `spread-checks`, to introduce some
random variance into the timeout such that the disconnection of these
sessions does not cause a major traffic event across the stack.
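
For illustration only, here is roughly how this could sit in a global section; `hard-stop-after` and `spread-checks` are existing keywords, while `spread-hard-stop-after` is just the name proposed above and does not exist in any HAProxy release:

global
    hard-stop-after 30m     # existing: old processes are killed after 30 minutes
    spread-checks 5         # existing: adds up to 5% random jitter to health check intervals

    # proposed (hypothetical keyword, not implemented):
    # apply up to 10% random variance to hard-stop-after so that instances
    # reloaded at the same time do not all drop their sessions at once
    spread-hard-stop-after 10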

Also, thank you very much for the 1.8.4 release!




Re: High load average under 1.8 with multiple draining processes

2018-01-15 Thread Samuel Reed
Reload is done via the sysvinit wrapper, which executes:

$HAPROXY -f "$CONFIG" -p $PIDFILE -sf $(cat $PIDFILE) -D $EXTRAOPTS

No $EXTRAOPTS are specified. We don't use unix sockets and we don't use
master-worker. I'll send an anonymized haproxy config directly to you
two along with the results of "show sess" et al.

We don't use peers but if we fall back to nbproc we likely will.

More info to come shortly.

On 1/15/18 8:22 AM, Willy Tarreau wrote:
> On Mon, Jan 15, 2018 at 08:14:40AM -0600, Samuel Reed wrote:
>> Thank you for the patch and your quick attention to this issue. Results
>> after a few reloads, 8 threads on 16 core machine, both draining and new
>> process have patches.
>>
>> New process:
>>
>> % time     seconds  usecs/call     calls    errors syscall
>> ------ ----------- ----------- --------- --------- ----------------
>>  96.24    0.432348          16     26917           epoll_wait
>>   3.60    0.016158          16      1023         7 recvfrom
>>   0.12    0.000524          31        17           sendto
>>   0.04    0.000190           0      1126         3 write
>>   0.01    0.000036           1        70        23 read
>>   0.00    0.000000           0        21           close
>>   0.00    0.000000           0         7           socket
>>   0.00    0.000000           0         7         7 connect
>>   0.00    0.000000           0        13           sendmsg
>>   0.00    0.000000           0        17           setsockopt
>>   0.00    0.000000           0         7           fcntl
>>   0.00    0.000000           0         9           epoll_ctl
>>   0.00    0.000000           0         5         5 accept4
>> ------ ----------- ----------- --------- --------- ----------------
>> 100.00    0.449256                 29239        45 total
>>
>>
>> Draining process:
>>
>> % time     seconds  usecs/call     calls    errors syscall
>> ------ ----------- ----------- --------- --------- ----------------
>>  78.26    0.379045          16     23424           epoll_wait
>>  13.94    0.067539           7      9877         4 recvfrom
>>   7.80    0.037764           4     10471         6 write
>>   0.00    0.000007           0        29        10 read
>>   0.00    0.000000           0         9           close
>>   0.00    0.000000           0         5           sendto
>>   0.00    0.000000           0         3           shutdown
>>   0.00    0.000000           0        20           epoll_ctl
>> ------ ----------- ----------- --------- --------- ----------------
>> 100.00    0.484355                 43838        20 total
>>
>>
>> I ran this a few times while both processes were live and the numbers
>> weren't significantly different. The new process still has a remarkably
>> high proportion of epoll_wait.
> Thank you Samuel for the test. It's sad, but it may indicate something
> completely different. Christopher, at least I'm willing to integrate your
> fix to rule out this corner case in the future.
>
> Among the differences possible between an old and a new process, we can
> enumerate very few things, for example the peers, which work differently
> for new and old processes. Do you use peers in your config ? It would also
> be possible that we pass an fd corresponding to a more or less closed
> listener or something like this. Do you reload with -x to pass FDs across
> processes ? Do you use master-worker ? Just trying to rule out a number of
> hypotheses. An anonymized version of your config will definitely help here
> I'm afraid.
>
> Thanks!
> Willy




Re: High load average under 1.8 with multiple draining processes

2018-01-15 Thread Samuel Reed
Thank you for the patch and your quick attention to this issue. Results
after a few reloads, 8 threads on 16 core machine, both draining and new
process have patches.

New process:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 96.24    0.432348          16     26917           epoll_wait
  3.60    0.016158          16      1023         7 recvfrom
  0.12    0.000524          31        17           sendto
  0.04    0.000190           0      1126         3 write
  0.01    0.000036           1        70        23 read
  0.00    0.000000           0        21           close
  0.00    0.000000           0         7           socket
  0.00    0.000000           0         7         7 connect
  0.00    0.000000           0        13           sendmsg
  0.00    0.000000           0        17           setsockopt
  0.00    0.000000           0         7           fcntl
  0.00    0.000000           0         9           epoll_ctl
  0.00    0.000000           0         5         5 accept4
------ ----------- ----------- --------- --------- ----------------
100.00    0.449256                 29239        45 total


Draining process:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 78.26    0.379045          16     23424           epoll_wait
 13.94    0.067539           7      9877         4 recvfrom
  7.80    0.037764           4     10471         6 write
  0.00    0.000007           0        29        10 read
  0.00    0.000000           0         9           close
  0.00    0.000000           0         5           sendto
  0.00    0.000000           0         3           shutdown
  0.00    0.000000           0        20           epoll_ctl
------ ----------- ----------- --------- --------- ----------------
100.00    0.484355                 43838        20 total


I ran this a few times while both processes were live and the numbers
weren't significantly different. The new process still has a remarkably
high proportion of epoll_wait.


On 1/15/18 7:48 AM, Christopher Faulet wrote:
> Le 12/01/2018 à 18:51, Willy Tarreau a écrit :
>> On Fri, Jan 12, 2018 at 11:06:32AM -0600, Samuel Reed wrote:
>>> On 1.8-git, similar results on the new process:
>>>
>>> % time     seconds  usecs/call     calls    errors syscall
>>> ------ ----------- ----------- --------- --------- ----------------
>>>  93.75    0.265450          15     17805           epoll_wait
>>>   4.85    0.013730          49       283           write
>>>   1.40    0.003960          15       266        12 recvfrom
>>>   0.01    0.000018           0        42        12 read
>>>   0.00    0.000000           0        28           close
>>>   0.00    0.000000           0        12           socket
>>>   0.00    0.000000           0        12        12 connect
>>>   0.00    0.000000           0        19         1 sendto
>>>   0.00    0.000000           0        12           sendmsg
>>>   0.00    0.000000           0         6           shutdown
>>>   0.00    0.000000           0        35           setsockopt
>>>   0.00    0.000000           0         7           getsockopt
>>>   0.00    0.000000           0        12           fcntl
>>>   0.00    0.000000           0        13           epoll_ctl
>>>   0.00    0.000000           0         2         2 accept4
>>> ------ ----------- ----------- --------- --------- ----------------
>>> 100.00    0.283158                 18554        39 total
>>>
>>> Cursory look through the strace output looks the same, with the same
>>> three types as in the last email, including the cascade.
>>
>> OK thank you for testing. On Monday we'll study this with Christopher.
>>
>
> Hi Samuel,
>
> Here are 2 patches that may solve your problem. The idea is to set the
> poller timeout to 0 for a specific thread only when some processing is
> expected for this thread. The job was already done for tasks and
> applets using bitfields. Now, we do the same for the FDs.
>
> For now, we don't know whether it addresses your problem, but it should
> keep a thread from looping for nothing. Could you check if it works please ?
>




Re: High load average under 1.8 with multiple draining processes

2018-01-12 Thread Samuel Reed
On 1.8-git, similar results on the new process:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 93.75    0.265450          15     17805           epoll_wait
  4.85    0.013730          49       283           write
  1.40    0.003960          15       266        12 recvfrom
  0.01    0.000018           0        42        12 read
  0.00    0.000000           0        28           close
  0.00    0.000000           0        12           socket
  0.00    0.000000           0        12        12 connect
  0.00    0.000000           0        19         1 sendto
  0.00    0.000000           0        12           sendmsg
  0.00    0.000000           0         6           shutdown
  0.00    0.000000           0        35           setsockopt
  0.00    0.000000           0         7           getsockopt
  0.00    0.000000           0        12           fcntl
  0.00    0.000000           0        13           epoll_ctl
  0.00    0.000000           0         2         2 accept4
------ ----------- ----------- --------- --------- ----------------
100.00    0.283158                 18554        39 total

Cursory look through the strace output looks the same, with the same
three types as in the last email, including the cascade.


On 1/12/18 10:23 AM, Willy Tarreau wrote:
> On Fri, Jan 12, 2018 at 10:13:55AM -0600, Samuel Reed wrote:
>> Excellent! Please let me know if there's any other output you'd like
>> from this machine.
>>
>> Strace on that new process shows thousands of these types of syscalls,
>> which vary slightly,
>>
>> epoll_wait(3, {{EPOLLIN, {u32=206, u64=206}}}, 200, 239) = 1
> If the u32 value barely varies, that's an uncaught event. We got a report
> for this that we just fixed yesterday; it started to appear after the
> system was upgraded with the Meltdown fixes. That seems unrelated, but
> reverting made the problem disappear.
>
>> and these:
>>
>> epoll_wait(3, {}, 200, 0)   = 0
> This one used to appear in yesterday's report though it could be caused
> by other bugs as well. That's the one I predicted.
>
>> There is also something of a cascade (each repeats about 10-20x before
>> the next):
>>
>> epoll_wait(3, {{EPOLLIN, {u32=47, u64=47}}}, 200, 71) = 1
>> epoll_wait(3, {{EPOLLIN, {u32=93, u64=93}}, {EPOLLIN, {u32=656,
>> u64=656}}}, 200, 65) = 2
>> epoll_wait(3, {{EPOLLIN, {u32=93, u64=93}}, {EPOLLIN, {u32=656,
>> u64=656}}, {EPOLLIN, {u32=227, u64=227}}}, 200, 0) = 3
>> epoll_wait(3, {{EPOLLIN, {u32=93, u64=93}}, {EPOLLIN, {u32=656,
>> u64=656}}, {EPOLLIN, {u32=227, u64=227}}, {EPOLLIN, {u32=785,
>> u64=785}}}, 200, 65) = 4
>> epoll_wait(3, {{EPOLLIN, {u32=93, u64=93}}, {EPOLLIN, {u32=656,
>> u64=656}}, {EPOLLIN, {u32=227, u64=227}}, {EPOLLIN, {u32=785, u64=785}},
>> {EPOLLIN, {u32=639, u64=639}}}, 200, 64) = 5
>>
>> I've seen it go as deep as 15. The trace is absolutely dominated by these.
> OK that's very interesting. Just in doubt, please update to latest
> 1.8-git to see if it makes this issue disappear.
>
> Thanks,
> Willy




Re: High load average under 1.8 with multiple draining processes

2018-01-12 Thread Samuel Reed
Excellent! Please let me know if there's any other output you'd like
from this machine.

Strace on that new process shows thousands of these types of syscalls,
which vary slightly,

epoll_wait(3, {{EPOLLIN, {u32=206, u64=206}}}, 200, 239) = 1

and these:

epoll_wait(3, {}, 200, 0)   = 0

There is also something of a cascade (each repeats about 10-20x before
the next):

epoll_wait(3, {{EPOLLIN, {u32=47, u64=47}}}, 200, 71) = 1
epoll_wait(3, {{EPOLLIN, {u32=93, u64=93}}, {EPOLLIN, {u32=656,
u64=656}}}, 200, 65) = 2
epoll_wait(3, {{EPOLLIN, {u32=93, u64=93}}, {EPOLLIN, {u32=656,
u64=656}}, {EPOLLIN, {u32=227, u64=227}}}, 200, 0) = 3
epoll_wait(3, {{EPOLLIN, {u32=93, u64=93}}, {EPOLLIN, {u32=656,
u64=656}}, {EPOLLIN, {u32=227, u64=227}}, {EPOLLIN, {u32=785,
u64=785}}}, 200, 65) = 4
epoll_wait(3, {{EPOLLIN, {u32=93, u64=93}}, {EPOLLIN, {u32=656,
u64=656}}, {EPOLLIN, {u32=227, u64=227}}, {EPOLLIN, {u32=785, u64=785}},
{EPOLLIN, {u32=639, u64=639}}}, 200, 64) = 5

I've seen it go as deep as 15. The trace is absolutely dominated by these.


On 1/12/18 10:01 AM, Willy Tarreau wrote:
> On Fri, Jan 12, 2018 at 09:50:58AM -0600, Samuel Reed wrote:
>> To accelerate the process, I've increased the number of threads from 4
>> to 8 on a 16-core machine. Ran strace for about 5s on each.
>>
>> Single process (8 threads):
>>
>> $ strace -cp 16807
>> % time     seconds  usecs/call     calls    errors syscall
>> ------ ----------- ----------- --------- --------- ----------------
>>  71.36    0.330172          21     15479           epoll_wait
>>  13.59    0.062861           4     14477         1 write
>>  10.58    0.048955           4     11518        10 recvfrom
>>   4.44    0.020544          38       537       244 read
> This one is OK and shows that quite some time is in fact spent waiting
> for I/O events.
>
>> Two processes (2x8 threads):
>>
>> ## Draining process
>>
>> % time     seconds  usecs/call     calls    errors syscall
>> ------ ----------- ----------- --------- --------- ----------------
>>  48.65    0.544758          30     18359           epoll_wait
>>  28.69    0.321283          14     23540           write
>>  22.60    0.253049          19     13338           recvfrom
>>   0.04    0.000474           1       786       374 read
>>   0.03    0.000287           2       187           sendto
> This one as well.
>
>> ## "New" process
>>
>> % time     seconds  usecs/call     calls    errors syscall
>> ------ ----------- ----------- --------- --------- ----------------
>>  93.87    1.588239          11    149253           epoll_wait
>>   3.84    0.064985          30      2140        31 recvfrom
>>   1.77    0.029905          13      2388           write
>>   0.34    0.005737          10       589       130 read
>>   0.12    0.002018          38        53           close
>>   0.06    0.000960           8       114         2 sendto
> This one is very interesting! So the epoll_wait to other syscalls ratio
> went from roughly 1/2 to 30/1. I'm pretty sure that a regular strace would
> show you a large number of epoll_wait(0)=0 indicating we're missing some
> events. I seem to remember there are situations where a thread may be
> notified by epoll() about an fd it cannot take care of, but I don't
> remember in which case; I'll have to discuss it with Christopher.
>
> But at least now we have an explanation and it's not directly related to
> thread contention but more likely with the mapping of FDs to threads, so
> we may have opportunities to improve the situation here.
>
> Thanks!
> Willy




Re: High load average under 1.8 with multiple draining processes

2018-01-12 Thread Samuel Reed
To accelerate the process, I've increased the number of threads from 4
to 8 on a 16-core machine. Ran strace for about 5s on each.

Single process (8 threads):

$ strace -cp 16807
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 71.36    0.330172          21     15479           epoll_wait
 13.59    0.062861           4     14477         1 write
 10.58    0.048955           4     11518        10 recvfrom
  4.44    0.020544          38       537       244 read
  0.02    0.000094           1       135           sendto
  0.01    0.000051           3        16           epoll_ctl
  0.00    0.000000           0        24           close
  0.00    0.000000           0         9           socket
  0.00    0.000000           0         9         9 connect
  0.00    0.000000           0        25           sendmsg
  0.00    0.000000           0         6           shutdown
  0.00    0.000000           0        24           setsockopt
  0.00    0.000000           0         3           getsockopt
  0.00    0.000000           0         9           fcntl
  0.00    0.000000           0         8         7 accept4
------ ----------- ----------- --------- --------- ----------------
100.00    0.462677                 42279       271 total


Two processes (2x8 threads):

## Draining process

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 48.65    0.544758          30     18359           epoll_wait
 28.69    0.321283          14     23540           write
 22.60    0.253049          19     13338           recvfrom
  0.04    0.000474           1       786       374 read
  0.03    0.000287           2       187           sendto
  0.00    0.000000           0         2           close
  0.00    0.000000           0         1           sendmsg
  0.00    0.000000           0         1           shutdown
  0.00    0.000000           0         1           setsockopt
------ ----------- ----------- --------- --------- ----------------
100.00    1.119851                 56215       374 total

## "New" process

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 93.87    1.588239          11    149253           epoll_wait
  3.84    0.064985          30      2140        31 recvfrom
  1.77    0.029905          13      2388           write
  0.34    0.005737          10       589       130 read
  0.12    0.002018          38        53           close
  0.06    0.000960           8       114         2 sendto
  0.00    0.000031           1        25           shutdown
  0.00    0.000019           0       102           sendmsg
  0.00    0.000019           0        58           epoll_ctl
  0.00    0.000015           0        31           fcntl
  0.00    0.000000           0        31           socket
  0.00    0.000000           0        31        31 connect
  0.00    0.000000           0        94           setsockopt
  0.00    0.000000           0         8           getsockopt
  0.00    0.000000           0        47        29 accept4
------ ----------- ----------- --------- --------- ----------------
100.00    1.691928                154964       223 total


It does indeed appear the new process is contending with the old, even
with just 16 threads on a 16-core box. A third process (oversubscribed
to 24 threads on 16 cores):

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 97.21    0.950863          14     69926           epoll_wait
  2.32    0.022677         208       109           write
  0.42    0.004106          48        85        14 recvfrom
  0.04    0.000439          17        26           close
  0.00    0.000022           1        34           epoll_ctl
  0.00    0.000000           0       136        33 read
  0.00    0.000000           0         1           brk
  0.00    0.000000           0        14           socket
  0.00    0.000000           0        14        14 connect
  0.00    0.000000           0        15         1 sendto
  0.00    0.000000           0        11           sendmsg
  0.00    0.000000           0        13         1 shutdown
  0.00    0.000000           0        50           setsockopt
  0.00    0.000000           0         3           getsockopt
  0.00    0.000000           0        14           fcntl
  0.00    0.000000           0        34        22 accept4
------ ----------- ----------- --------- --------- ----------------
100.00    0.978107                 70485        85 total


During this time, each of the three processes was running at roughly
250-350% CPU.


On 1/12/18 9:34 AM, Willy Tarreau wrote:
> On Fri, Jan 12, 2018 at 09:28:54AM -0600, Samuel Reed wrote:
>> Thanks for your quick answer, Willy.
>>
>> That's a shame to hear but makes sense. We'll try

Re: High load average under 1.8 with multiple draining processes

2018-01-12 Thread Samuel Reed
Thanks for your quick answer, Willy.

That's a shame to hear but makes sense. We'll try out some ideas for
reducing contention. We don't use cpu-map with nbthread; I considered it
best to let the kernel take care of this, especially since there are
some other processes on that box. I don't really want to fall back to
nbproc but we may have to, at least until we get the number of reloads down.


On 1/12/18 8:55 AM, Willy Tarreau wrote:
> Hi Samuel,
>
> On Thu, Jan 11, 2018 at 08:29:15PM -0600, Samuel Reed wrote:
>> Is there a regression in the 1.8 series with SO_REUSEPORT and nbthread
>> (we didn't see this before with nbproc) or somewhere we should start
>> looking?
> In fact no, nbthread is simply new so it's not a regression but we're
> starting to see some side effects. One possibility I can easily imagine
> is that at most places we're using spinlocks because the locks are very
> short-lived and very small, so they have to be cheap. One limit of
> spinlocks is that you must not have more threads than cores, so that your
> threads are never scheduled out while holding a lock, leaving another one
> to spin for nothing for a whole timeslice.
>
> The reload makes an interesting case: if you use cpu-map to bind your
> threads to CPU cores, then during the soft-stop period they have to
> coexist on the same cores, and a thread of one process regularly disturbs
> a thread of the other process by stealing its CPU.
>
> I can't say I'm seeing any easy solution to this in the short term, that's
> something we have to add to the list of things to improve in the future.
> Maybe something as simple as starting with SCHED_FIFO to prevent threads
> from being preempted outside of the poll loop, and dropping it upon reload
> could help a lot, but that's just speculation.
>
> We'll have to continue to think about this I guess. It may be possible
> that if your old processes last very long you'd continue to get a better
> experience using nbproc than nbthread :-/
>
> Willy




High load average under 1.8 with multiple draining processes

2018-01-11 Thread Samuel Reed
We've recently upgraded to HAProxy 1.8.3, which we run with `nbthread 4`
(we used to run nbproc 4 with older releases). This has generally been
good, especially for stick tables & stats.

We terminate SSL and proxy a large number of long-running TCP
connections (websockets). When configuration changes (usually a server
going up or down), a reload occurs. For many hours, there may be 2-5
active HAProxy instances as the old ones drain. We use hard-stop-after
to keep it reasonable.

I have noticed that load average gets *very* high while these processes
are still present. Our web tier is usually at a load average of 5 with
16 cores. Across the board, load averages go up if stale HAProxy
instances are active. I saw it go as high as 34 on one machine that had 5
instances running, with 100% CPU, most of it sys. Even with just 2, the
loadavg is double what it is with 1. Terminating the old process immediately
brings the load down.

Is there a regression in the 1.8 series with SO_REUSEPORT and nbthread
(we didn't see this before with nbproc) or somewhere we should start
looking? We make (relatively) heavy use of stick tables for DDoS
protection purposes and terminate SSL, but aside from that our
configuration is pretty vanilla. Nothing changed from 1.7 to 1.8 except
changing nbproc to nbthread.
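
For reference, the change was confined to the global section; a minimal sketch of the before/after (the thread count is ours, the hard-stop-after value here is only illustrative):

global
    # 1.7 and earlier: four independent processes
    # nbproc 4

    # 1.8.3: a single process with four threads
    nbthread 4

    # old processes still draining websocket connections are killed after this delay
    hard-stop-after 4h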

Thanks!




Re: Adding HSTS or custom headers on redirect

2014-12-13 Thread Samuel Reed
Pavlos Parissis pavlos.parissis@... writes:

> On 2 December 2014 at 09:17, Samuel Reed samuel.trace.reed at gmail.com wrote:
>> I'm running the latest 1.5 release.
>>
>> Our site runs primarily on the `www` subdomain, but we want to enable HSTS
>> for all subdomains (includeSubdomains). Unfortunately, due to the way HSTS
>> works, the HSTS header MUST be present on the redirect from
>> https://example.com to https://www.example.com. I am using configuration
>> like:
>>
>> rspadd Strict-Transport-Security:\ max-age=31536000;\ includeSubDomains
>> redirect prefix https://www.example.com code 301 if \
>>     { hdr(host) -i example.com }
>>
>> For whatever reason, even when the rspadd line is before the redirect, no
>> headers are added to the redirect, making this impossible. I've considered
>> a fake backend with a fake 503 file to get around this - something like:
>>
>> HTTP/1.1 301 Moved Permanently
>> Cache-Control: no-cache
>> Content-Length: 0
>> Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
>> Location: https://www.example.com/
>> Connection: close
>>
>> While this will work, it feels really hacky. Is there a better way to add a
>> header on a redirect?
>
> Have a look at the thread 'add response header based on presence of request
> header', your case matches the case I mentioned there.
>
> Cheers,
> Pavlos

Hi Pavlos - unfortunately this does not match my use case. I always want to
send the HSTS header; it is not dependent on the request. I just need to
ensure that this header is *always* sent, even on a redirect.





Adding HSTS or custom headers on redirect

2014-12-02 Thread Samuel Reed
I'm running the latest 1.5 release.

Our site runs primarily on the `www` subdomain, but we want to enable HSTS for
all subdomains (includeSubdomains). Unfortunately, due to the way HSTS works,
the HSTS header MUST be present on the redirect from https://example.com to 
https://www.example.com. I am using configuration like:

rspadd Strict-Transport-Security:\ max-age=31536000;\ includeSubDomains
redirect prefix https://www.example.com code 301 if \
{ hdr(host) -i example.com }

For whatever reason, even when the rspadd line is before the redirect, no 
headers are added to the redirect, making this impossible. I've considered 
a fake backend with a fake 503 file to get around this - something like: 

HTTP/1.1 301 Moved Permanently
Cache-Control: no-cache
Content-Length: 0
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
Location: https://www.example.com/
Connection: close

While this will work, it feels really hacky. Is there a better way to add a
header on a redirect?
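
In case it's useful, a minimal sketch of that workaround, assuming the hand-written 301 response above is saved as /etc/haproxy/hsts-redirect.http (frontend/backend names and file paths here are made up for the example): a backend with no servers always answers 503, and overriding its 503 errorfile makes haproxy return the canned 301, HSTS header included, verbatim.

frontend fe_https
    bind :443 ssl crt /etc/haproxy/example.com.pem
    # bare-domain requests go to the backend that serves the canned redirect
    use_backend bk_hsts_redirect if { hdr(host) -i example.com }
    default_backend bk_www

backend bk_hsts_redirect
    # no servers defined, so every request gets a 503; its error page is
    # replaced by the full 301 response above, which haproxy sends as-is
    errorfile 503 /etc/haproxy/hsts-redirect.http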

Thanks
Sam




Stick-table misconfiguration; haproxy should throw error

2014-10-17 Thread Samuel Reed
I just wanted to report an issue that had me hitting my head on the table for a 
few hours:

I reconfigured a development environment to use Chef to generate haproxy config 
files. Every 2-3 minutes or so, depending on load, haproxy would crash with a 
`trap divide error` (divide by zero).

It worked fine previously, and during those few minutes it worked perfectly, but
the trap divide error persisted. Tried a new machine, still got it. Tried every 
1.5 version from 1.5.5 all the way to the early dev builds, still got it.

Turns out the culprit was a misnamed variable on my end that was causing my 
stick table definition to read `stick-table type ip 
,bytes-in-rate(),http_err_rate(10s)`. That is, there was no period defined on 
the bytes-in-rate.
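
For comparison, the corrected line would look something like the following (size and expire values are illustrative); note that the actual store keywords use underscores and that each *_rate counter takes a mandatory period:

# broken (a missing template variable left bytes-in-rate without a period):
#   stick-table type ip ,bytes-in-rate(),http_err_rate(10s)

# intended:
stick-table type ip size 1m expire 10m store bytes_in_rate(60s),http_err_rate(10s)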

It would be very nice if haproxy would regard this as a configuration error and 
throw on start rather than crash minutes later with a cryptic trap divide error.

Thanks for a great project.

Best,
Sam

