Re: High p99 latency with HAProxy 1.9 in http mode compared to 1.8

2019-03-25 Thread Willy Tarreau
Hi Ashwin,

On Mon, Mar 25, 2019 at 02:51:17PM -0700, Ashwin Neerabail wrote:
> Hi Willy,
> 
> I tested against the latest version in the haproxy source repo.
> Things got significantly worse. Even median latencies have shot up to 150ms
> (compared to 4ms for HAProxy 1.8), and p99 shot up above 1 second.
> One strange thing I observed in the stats page is that nbthread shows up as
> 64 (it's 1 for HAProxy 1.8). I am using the exact same configuration across
> both versions.

That's expected since 2.0-dev1; please have a look at the dev2 announcement I
just sent. If you want a single thread, just set "nbthread 1".
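
For example, a minimal global section forcing the previous single-threaded
behaviour would simply be (keeping the rest of your configuration unchanged):

    global
        nbthread 1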

> ctime, rtime are reporting higher values for 2.0 (though 1.8 is fine
> against the same backends at the same time)

What exact version did you try? I've just issued 2.0-dev2 with important
fixes for issues that were causing some streams to starve for a while, and
now I can't reproduce any such delay anymore. But given that some of the
pending issues were addressed yesterday evening, you can guess why I'm
interested in knowing which exact version you tried ;-)

Thanks,
Willy



[ANNOUNCE] haproxy-2.0-dev2

2019-03-25 Thread Willy Tarreau
Hi,

HAProxy 2.0-dev2 was released on 2019/03/26. It added 176 new commits
after version 2.0-dev1.

This version starts a series of more important changes. One of the most
visible ones is that haproxy will now automatically start with threads
enabled if neither "nbthread" nor "nbproc" is configured: it checks the
number of CPUs it is allowed to run on and starts as many threads. This
means it will no longer be necessary to adjust the configuration to match
the number of threads to the bound CPUs; by just setting the affinity in
the service's configuration, haproxy will automatically adapt and use the
same number of threads. On systems where haproxy cannot retrieve the
affinity information it will still default to a single thread.
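
As an illustration, on a systemd-based system a simple drop-in like the
sketch below (path and CPU list chosen arbitrarily) would restrict the
service to four CPUs, and haproxy would then start with four threads:

    # /etc/systemd/system/haproxy.service.d/cpu.conf  (example path)
    [Service]
    CPUAffinity=0 1 2 3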

A small byproduct of this change is that nbproc and nbthread are now
exclusive. We experimented with both at the same time in 1.8 and it was
totally pointless, since it keeps all the problems caused by processes and
causes many internal difficulties, as one can imagine. I'm still thinking
about how we could further simplify the "process" and "cpu-map" directives
based on this, though keeping "1/X" in the values is not a big deal.
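
For reference, an explicit threaded setup using the current syntax could
look like this sketch, where "1/X" designates thread X of the single
process (the thread count and CPU numbers are only an example):

    global
        nbthread 4
        cpu-map 1/1 0
        cpu-map 1/2 1
        cpu-map 1/3 2
        cpu-map 1/4 3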

Some of you might be wondering "but how am I supposed to bind the
listeners now?". The answer is that there is now thread load balancing in
the listener's accept queue. Thus by default you can have a single "bind"
line with no "process" setting and multiple threads, and the accept() code
will distribute incoming connections to the threads based on their
respective number of connections. I found this to address an issue I had
been facing with h2load from the very beginning, by which the traffic was
never evenly spread: the test would start fast and slow down at the end,
because some threads used to have more connections than others and at the
end of the test only one or two threads were finishing alone. Now this
issue is gone because all threads get the same number of connections, and
the performance is extremely stable across tests. It's so stable that I
managed to get more than one million requests per second out of the
cache on my laptop ;-)
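
Concretely, a configuration as trivial as the following sketch (names and
addresses made up) now spreads incoming connections across all threads
without any per-thread "bind" lines:

    frontend fe_web
        bind :8443 ssl crt /etc/haproxy/site.pem
        default_backend be_app

    backend be_app
        server srv1 192.168.0.10:8080 check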

One nice effect of this automatic traffic distribution is that haproxy can
now share its CPUs with the network stack much better. In the past, either
you had a single socket and the traffic was not evenly spread, or you had
multiple sockets with a "process" directive and the traffic was distributed
in round-robin by the system. But when some cores are highly loaded and
others less so (e.g. due to SSL traffic), the round robin gives quite bad
results and overloads already loaded threads. Here instead the traffic
remains very smooth since highly loaded threads will get fewer new
connections.

Of course it is still possible to bind the sockets by hand, and it still
gives slightly higher raw performance since it skips the incoming load
balancing step. But when we're talking about hundreds of thousands of
connections per second, most people don't care about a difference that sets
the limits 100 to 1000 times higher than their needs, and I expect to see
trivial configs re-appear over time.
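
For those who want the last few percent, the pre-2.0 style of one "bind"
line per thread remains possible, along these lines (illustrative only,
thread numbers made up):

    frontend fe_web
        bind :8443 process 1/1
        bind :8443 process 1/2
        bind :8443 process 1/3
        bind :8443 process 1/4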

Another visible change in -dev2 is that the frontend's and global maxconn
values are now set automatically. We far too often see people not setting
the global maxconn value and thus keeping an inappropriately low limit,
while at the same time those of us who develop are used to seeing warnings
all the time that their maxconn is too high for their ulimit. In other
words, the default value of 2000 is suitable for nobody. So now, when no
global maxconn is set, a default value is automatically calculated based on
the number of FDs allocated to the process by "ulimit -n". This limit can
be set in the service settings on many systems (for example, with systemd
it seems to be LimitNOFILE), so this is another resource limit that will no
longer require a configuration change.
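
For instance, with a systemd drop-in along the lines below (value picked
arbitrarily), the computed global maxconn will follow the FD limit (roughly
half of the usable FDs, since a proxied connection needs a socket on each
side) without touching haproxy.cfg:

    [Service]
    LimitNOFILE=200000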

And the frontend's default maxconn, which most people don't set because
they believe it's the same as the global maxconn, will now indeed be the
global maxconn (configured or calculated). So a config which doesn't set
any maxconn will now have the maximum number of possible connections set
correctly by default. I've heard that maxconn was one of the most difficult
settings to get right in Docker images, so let's hope it will be much more
turn-key now :-)
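
In practice the difference is simply this kind of sketch:

    global
        # no "maxconn": derived from "ulimit -n" / LimitNOFILE

    frontend fe_web
        bind :8080
        # no frontend "maxconn" either: it now inherits the global value
        # instead of silently stopping at the old default of 2000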

This version also contains a significant number of fixes, part of which
were already merged into 1.9.5 and others which I expect to see soon
in 1.9.6.

Among these fixes, we managed to address the trouble caused to the
abortonclose option in 1.8 when H2 was introduced. In short, we now have a
separate flag and no longer pretend that an input stream is closed at the
end of the request: we make the difference with an end-of-message input.
We intend to backport this to the next 1.9 if no issue is reported, which
I'm now confident in given the time we spent chasing various issues that
this could 

Re: DNS Resolver Issues

2019-03-25 Thread Baptiste
>
> A reload of the HAProxy instance also forces the instances to query all
> records from the resolver.
>
>
Hi Bruno,

Actually, this is true only when you don't use the 'resolvers' section, or
for parameters which don't benefit from the resolvers section, here the
'addr' parameter.
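
To illustrate, in a setup like the sketch below (hostnames invented), only
the server's own address is re-resolved at run time through the 'resolvers'
section; an 'addr' used for the health check would still be resolved only
by the libc at startup or on reload:

    resolvers mydns
        nameserver dns1 10.0.0.2:53
        hold valid 10s

    backend be_app
        server app1 app.internal.example:8080 check resolvers mydns init-addr last,libc,none
        # adding "addr healthcheck.internal.example" here would not benefit
        # from the resolvers section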

Baptiste


Re: DNS Resolver Issues

2019-03-25 Thread Baptiste
Hi all,

Thanks @daniel for your very detailed report and @Piba for your help.
As Piba pointed out, the issue is related to the 'addr' parameter.
Currently, the only component in HAProxy which can benefit from dynamic
resolution at run time is the 'server', which means any other object using
a DNS hostname which does not resolve at startup may trigger an error, as
you discovered with 'addr'.
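
In other words, run-time resolution currently applies only to 'server' and
'server-template' lines such as the sketch below (names invented, assuming
a 'resolvers mydns' section is defined), and not to things like 'addr':

    backend be_dynamic
        server-template app 3 app.internal.example:8080 check resolvers mydns init-addr last,libc,none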

@Piba, feel free to file a feature request on GitHub and Cc me there, so
we can discuss this point.

Baptiste



On Sat, Mar 23, 2019 at 2:53 PM PiBa-NL  wrote:

> Hi Daniel, Baptiste,
>
> @Daniel, can you remove the 'addr loadbalancer-internal.xxx.yyy' from
> the server check? It seems to me that that name is not being resolved by
> the 'resolvers'. And even if it were, it would be somewhat redundant, as
> in the example it is the same as the server name. I am not sure how far
> the scenarios below are all explained by this, though.
>
> @Baptiste, is it intentional that a wrong 'addr' DNS name makes haproxy
> fail to start despite having the supposedly never-failing
> 'default-server init-addr last,libc,none'? Would it be a good feature
> request to support re-resolving a DNS name for the addr setting as well?
>
> Regards,
> PiBa-NL (Pieter)
>
> Op 21-3-2019 om 20:37 schreef Daniel Schneller:
> > Hi!
> >
> > Thanks for the response. I had looked at the "hold" directives, but
> > since they all seem to have reasonable defaults, I did not touch them.
> > I specified 10s explicitly, but it did not make a difference.
> >
> > I did some more tests, however, and it seems to have more to do with the
> number of responses for the initial(?) DNS queries.
> > Hopefully these three tables make sense and don't get mangled in the
> mail. The "templated"
> > proxy is defined via "server-template" with 3 "slots". The "regular" one
> just as "server".
> >
> >
> > Test 1: Start out  with both "valid" and "broken" DNS entries. Then
> comment out/add back
> > one at a time as described in (1)-(5).
> > Each time after changing /etc/hosts, restart dnsmasq and check haproxy
> via hatop.
> > Haproxy started fresh once dnsmasq was set up to (1).
> >
> > |  state   state
> >  /etc/hosts |  regular templated
> > |-
> > (1) BRK|  UP/L7OK DOWN/L4TOUT
> >  VALID  |  MAINT/resolution
> > |  UP/L7OK
> > |
> >
> > (2) BRK|  DOWN/L4TOUT DOWN/L4TOUT
> >  #VALID |  MAINT/resolution
> > |  MAINT/resolution
> > |
> > (3) #BRK   |  UP/L7OK UP/L7OK
> >  VALID  |  MAINT/resolution
> > |  MAINT/resolution
> > |
> > (4) BRK|  UP/L7OK UP/L7OK
> >  VALID  |  DOWN/L4TOUT
> > |  MAINT/resolution
> > |
> > (5) BRK|  DOWN/L4TOUT DOWN/L4TOUT
> >  #VALID |  MAINT/resolution
> > |  MAINT/resolution
> >
> > This all looks normal and as expected. As soon as the "VALID" DNS entry
> is present, the
> > UP state follows within a few seconds.
> >
> >
> >
> > Test 2: Start out "valid only" (1) and proceed as described in (2)-(5),
> again restarting
> > dnsmasq each time, and haproxy reloaded after dnsmasq was set up to (1).
> >
> > |  state   state
> >  /etc/hosts |  regular templated
> > |
> > (1) #BRK   |  UP/L7OK MAINT/resolution
> >  VALID  |  MAINT/resolution
> > |  UP/L7OK
> > |
> > (2) BRK|  UP/L7OK DOWN/L4TOUT
> >  VALID  |  MAINT/resolution
> > |  UP/L7OK
> > |
> > (3) #BRK   |  UP/L7OK MAINT/resolution
> >  VALID  |  MAINT/resolution
> > |  UP/L7OK
> > |
> > (4) BRK|  UP/L7OK DOWN/L4TOUT
> >  VALID  |  MAINT/resolution
> > |  UP/L7OK
> > |
> > (5) BRK|  DOWN/L4TOUT 

Re: High p99 latency with HAProxy 1.9 in http mode compared to 1.8

2019-03-25 Thread Ashwin Neerabail
Hi Willy,

I tested against the latest version in the haproxy source repo.
Things got significantly worse. Even median latencies have shot up to 150ms
(compared to 4ms for HAProxy 1.8), and p99 shot up above 1 second.
One strange thing I observed in the stats page is that nbthread shows up as
64 (it's 1 for HAProxy 1.8). I am using the exact same configuration across
both versions.
ctime, rtime are reporting higher values for 2.0 (though 1.8 is fine
against the same backends at the same time)

-Ashwin



Thanks,
Ashwin

On Fri, Mar 22, 2019 at 11:03 AM Ashwin Neerabail  wrote:

> Hey Willy,
>
> Thats great news. Thanks for the quick action.
> I will verify and get back.
>
> Thanks,
> Ashwin
>
> On Fri, Mar 22, 2019 at 10:19 AM Willy Tarreau  wrote:
>
>> Hi Ashwin,
>>
>> We have found the root cause of this. The H2 streams were not getting
>> the fairness they deserved due to their wake-up ordering: it happened
>> very often that a stream interrupted on a mux buffer full condition could
>> be placed at the end of the list and/or have its place preempted by
>> another stream trying to send for the first time.
>>
>> We've pushed all the fixes for this in 2.0-dev for now and I'll backport
>> them to 1.9 early next week. It would be nice if you could give it a try
>> to confirm that it's now OK for you.
>>
>> Cheers,
>> Willy
>>
>


[PR] DOC: The option httplog is no longer valid in a backend.

2019-03-25 Thread PR Bot
Dear list!

Author: Freddy Spierenburg 
Number of patches: 1

This is an automated relay of the Github pull request:
   DOC: The option httplog is no longer valid in a backend.

Patch title(s): 
   DOC: The option httplog is no longer valid in a backend.

Link:
   https://github.com/haproxy/haproxy/pull/68

Edit locally:
   wget https://github.com/haproxy/haproxy/pull/68.patch && vi 68.patch

Apply locally:
   curl https://github.com/haproxy/haproxy/pull/68.patch | git am -

Description:
   Inside the Proxy keywords matrix it looks like the option httplog is
   still valid within a backend. This is no longer the case, hence this
   update to the documentation.

Instructions:
   This github pull request will be closed automatically; patch should be
   reviewed on the haproxy mailing list (haproxy@formilux.org). Everyone is
   invited to comment, even the patch's author. Please keep the author and
   list CCed in replies. Please note that in absence of any response this
   pull request will be lost.