Re: High p99 latency with HAProxy 1.9 in http mode compared to 1.8
Hi Ashwin,

On Mon, Mar 25, 2019 at 02:51:17PM -0700, Ashwin Neerabail wrote:
> Hi Willy,
>
> I tested against the latest version in the haproxy source repo.
> Things got significantly worse. Even median latencies have shot up to 150ms
> (compared to 4ms for haproxy 1.8). p99 shot up above 1 second.
> One strange thing I observed in the stats page is nbthread shows up as 64
> (it's 1 for HAProxy 1.8). I am using the exact same configuration across
> both versions.

That's expected after 2.0-dev1, please have a look at the dev2 announce I just sent. If you want a single thread, please just set "nbthread 1".

> ctime, rtime are reporting higher values for 2.0 (though 1.8 is fine
> against the same backends at the same time)

What exact version did you try? I've just issued 2.0-dev2 with important fixes for issues that were causing some streams to starve for a while, and now I can't reproduce any such delay anymore. But given that some of the pending issues were addressed yesterday evening, you can guess why I'm interested in knowing which exact version you tried ;-)

Thanks,
Willy
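[For anyone hitting the same surprise: the sketch below illustrates the one-line change Willy refers to. The surrounding lines are illustrative context only, not taken from Ashwin's config.]

```
global
    nbthread 1    # pin haproxy to a single thread, matching 1.8's default
```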
[ANNOUNCE] haproxy-2.0-dev2
Hi,

HAProxy 2.0-dev2 was released on 2019/03/26. It added 176 new commits after version 2.0-dev1.

This version starts more important changes. One of the most visible ones is that haproxy will now automatically start with threads enabled if neither "nbthread" nor "nbproc" is configured: it checks the number of CPUs it's running on and starts as many threads. This means it is no longer necessary to adjust the configuration to match the number of threads and bound CPUs; by just setting the affinity in the service's configuration, haproxy will automatically adapt and use the same number of threads. On systems where haproxy cannot retrieve the affinity information it will still default to a single thread.

A small byproduct of this change is that nbproc and nbthread are now exclusive. We experimented with both at the same time in 1.8 and it was totally pointless, since it retains all the problems caused by processes and causes many internal difficulties, as one can imagine. I'm still thinking about how we could further simplify the "process" and "cpu-map" directives based on this, though keeping "1/X" in the values is not a big deal.

Some of you might be wondering "but how am I supposed to bind the listeners now?". The answer is that there is now some thread load balancing in the listener's accept queue. Thus by default you can have a single "bind" line with no "process" setting and multiple threads, and the accept() code will distribute the connection load to the threads based on their respective number of connections. I found this to address an issue I had been facing with h2load from the very beginning, by which the traffic was never evenly spread: the test would start fast and slow down at the end, because some threads used to have more connections than others, and at the end of the test only one or two threads were finishing alone.
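[As a rough illustration of the "trivial configs" described above, the sketch below assumes 2.0-dev2 semantics; the frontend/backend names and addresses are made up.]

```
# No "nbthread", "nbproc" or per-"bind" "process" settings anywhere:
# haproxy starts one thread per CPU it may run on, and the listener's
# accept queue spreads incoming connections across those threads.

frontend fe_web
    bind :80                  # a single bind line is enough
    default_backend be_app

backend be_app
    server s1 192.0.2.10:80
```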
Now this issue is gone because all threads get the same number of connections, and the performance is extremely stable across tests. It's so stable that I managed to get more than one million requests per second out of the cache on my laptop ;-)

One nice effect of this automatic traffic distribution is that haproxy can now share its CPUs with the network stack much better. In the past, either you had one single socket and the traffic was not evenly spread, or you had multiple sockets with a "process" directive and the traffic was distributed in round-robin by the system. But when some cores are highly loaded and others less so (e.g. due to SSL traffic), the round robin gives quite bad results and overloads already loaded threads. Here instead the traffic remains very smooth, since highly loaded threads will get fewer new connections. Of course it is still possible to bind the sockets by hand, and doing so still gives slightly higher raw performance since it skips the incoming load-balancing step. But when we're talking about hundreds of thousands of connections per second, most people don't care about a difference that sets limits 100 to 1000 times higher than their needs, and I expect to see trivial configs re-appear over time.

Another visible change in -dev2 is that the frontend's and global maxconn values are now automatically set. We see far too often that people don't set the global maxconn value, keeping an inappropriately low limit, while at the same time those of us who develop are used to seeing warnings all the time that our maxconn is too high for our ulimit. So the default value of 2000 is suitable for nobody. Now, when there is no maxconn, the default value is automatically calculated from the number of FDs allocated to the process by "ulimit -n". This can be set in the service settings on many systems, so this is another resource limit that will not require a configuration change anymore.
For example, on systemd it appears to be LimitNOFILE. And the frontend's default maxconn, which most people don't set because they believe it's the same as the global maxconn, will now be the global maxconn (configured or calculated). This means that a config which doesn't set a maxconn will now have the maximum number of possible connections set correctly by default. I've heard that maxconn was one of the most difficult settings to get right in Docker images, so let's hope that it will be much more turn-key now :-)

This version also contains a significant number of fixes, part of which were already merged into 1.9.5 and others which I expect to see soon in 1.9.6. Among these fixes, we managed to address the trouble caused to the abortonclose option in 1.8 when H2 was introduced. In short, we now have a separate flag and don't pretend that an input stream is closed at the end of the request; we make the difference with an end-of-message input. We intend to backport this to the next 1.9 if no issue is reported, which I'm now confident in given the time we spent chasing various issues that this could
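[To illustrate the FD-based calculation above, raising the limit at the service level would look roughly like this under systemd; the drop-in path and value are illustrative.]

```
# /etc/systemd/system/haproxy.service.d/limits.conf (illustrative path)
[Service]
LimitNOFILE=100000
```

With no "maxconn" set anywhere in the haproxy config, the global maxconn is then derived from this FD limit, and the frontends inherit that value.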
Re: DNS Resolver Issues
> A reload of the HAProxy instance also forces the instances to query all
> records from the resolver.

Hi Bruno,

Actually, this is true only when you don't use the 'resolvers' section, or for the parameters which don't benefit from the resolvers section, here the 'addr' parameter.

Baptiste
Re: DNS Resolver Issues
Hi all,

Thanks @daniel for your very detailed report and @Piba for your help.

As Piba pointed out, the issue is related to the 'addr' parameter. Currently, the only component in HAProxy which can benefit from dynamic resolution at run time is the 'server', which means any other object using a DNS hostname which does not resolve at start-up may trigger an error, as you discovered with 'addr'.

@Piba, feel free to file a feature request on github and Cc me there, so we can discuss this point.

Baptiste

On Sat, Mar 23, 2019 at 2:53 PM PiBa-NL wrote:
> Hi Daniel, Baptiste,
>
> @Daniel, can you remove the 'addr loadbalancer-internal.xxx.yyy' from
> the server check? It seems to me that that name is not being resolved by
> the 'resolvers'. And even if it were, it would be kinda redundant as it
> is in the example, since it is the same as the server name. Not sure how
> far the scenarios below are all explained by this though..
>
> @Baptiste, is it intentional that a wrong 'addr' dns name makes haproxy
> fail to start despite having the supposedly never-failing
> 'default-server init-addr last,libc,none'? Is it possibly a good
> feature request to support re-resolving a dns name for the addr setting
> as well?
>
> Regards,
> PiBa-NL (Pieter)
>
> On 21-3-2019 at 20:37, Daniel Schneller wrote:
> > Hi!
> >
> > Thanks for the response. I had looked at the "hold" directives, but
> > since they all seem to have reasonable defaults, I did not touch them.
> > I specified 10s explicitly, but it did not make a difference.
> >
> > I did some more tests, however, and it seems to have more to do with
> > the number of responses for the initial(?) DNS queries.
> > Hopefully these tables make sense and don't get mangled in the mail.
> > The "templated" proxy is defined via "server-template" with 3 "slots".
> > The "regular" one just as "server".
> >
> > Test 1: Start out with both "valid" and "broken" DNS entries. Then
> > comment out/add back one at a time as described in (1)-(5).
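[To make the distinction concrete, here is a minimal sketch of the kind of setup being discussed; addresses, names and timings are illustrative, not taken from Daniel's config.]

```
resolvers mydns
    nameserver dns1 127.0.0.1:53
    hold valid 10s

backend be_app
    default-server init-addr last,libc,none
    # The server address itself is re-resolved at run time via "resolvers"...
    server-template srv 3 app.example.com:80 check resolvers mydns
    # ...but a hostname passed to "addr" (for health checks) is only
    # resolved at startup, so a broken name there can keep haproxy
    # from starting, as observed in this thread.
```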
> > Each time after changing /etc/hosts, restart dnsmasq and check haproxy
> > via hatop. Haproxy started fresh once dnsmasq was set up to (1).
> >
> >            |  state         state
> > /etc/hosts |  regular       templated
> > -----------|------------------------------
> > (1) BRK    |  UP/L7OK       DOWN/L4TOUT
> >     VALID  |                MAINT/resolution
> >            |                UP/L7OK
> >
> > (2) BRK    |  DOWN/L4TOUT   DOWN/L4TOUT
> >     #VALID |                MAINT/resolution
> >            |                MAINT/resolution
> >
> > (3) #BRK   |  UP/L7OK       UP/L7OK
> >     VALID  |                MAINT/resolution
> >            |                MAINT/resolution
> >
> > (4) BRK    |  UP/L7OK       UP/L7OK
> >     VALID  |                DOWN/L4TOUT
> >            |                MAINT/resolution
> >
> > (5) BRK    |  DOWN/L4TOUT   DOWN/L4TOUT
> >     #VALID |                MAINT/resolution
> >            |                MAINT/resolution
> >
> > This all looks normal and as expected. As soon as the "VALID" DNS entry
> > is present, the UP state follows within a few seconds.
> >
> > Test 2: Start out "valid only" (1) and proceed as described in (2)-(5),
> > again restarting dnsmasq each time, and haproxy reloaded after dnsmasq
> > was set up to (1).
> >
> >            |  state         state
> > /etc/hosts |  regular       templated
> > -----------|------------------------------
> > (1) #BRK   |  UP/L7OK       MAINT/resolution
> >     VALID  |                MAINT/resolution
> >            |                UP/L7OK
> >
> > (2) BRK    |  UP/L7OK       DOWN/L4TOUT
> >     VALID  |                MAINT/resolution
> >            |                UP/L7OK
> >
> > (3) #BRK   |  UP/L7OK       MAINT/resolution
> >     VALID  |                MAINT/resolution
> >            |                UP/L7OK
> >
> > (4) BRK    |  UP/L7OK       DOWN/L4TOUT
> >     VALID  |                MAINT/resolution
> >            |                UP/L7OK
> >
> > (5) BRK    |  DOWN/L4TOUT
Re: High p99 latency with HAProxy 1.9 in http mode compared to 1.8
Hi Willy,

I tested against the latest version in the haproxy source repo. Things got significantly worse. Even median latencies have shot up to 150ms (compared to 4ms for haproxy 1.8). p99 shot up above 1 second. One strange thing I observed in the stats page is that nbthread shows up as 64 (it's 1 for HAProxy 1.8). I am using the exact same configuration across both versions. ctime, rtime are reporting higher values for 2.0 (though 1.8 is fine against the same backends at the same time).

Thanks,
Ashwin

On Fri, Mar 22, 2019 at 11:03 AM Ashwin Neerabail wrote:
> Hey Willy,
>
> That's great news. Thanks for the quick action.
> I will verify and get back.
>
> Thanks,
> Ashwin
>
> On Fri, Mar 22, 2019 at 10:19 AM Willy Tarreau wrote:
>> Hi Ashwin,
>>
>> We have found the root cause of this. The H2 streams were not getting
>> the fairness they deserved due to their wake-up ordering: it happened
>> very often that a stream interrupted on a mux buffer-full condition
>> could be placed at the end of the list and/or have its place preempted
>> by another stream trying to send for the first time.
>>
>> We've pushed all the fixes for this into 2.0-dev for now and I'll
>> backport them to 1.9 early next week. It would be nice if you could
>> give it a try to confirm that it's now OK for you.
>>
>> Cheers,
>> Willy
[PR] DOC: The option httplog is no longer valid in a backend.
Dear list!

Author: Freddy Spierenburg
Number of patches: 1

This is an automated relay of the Github pull request:
DOC: The option httplog is no longer valid in a backend.

Patch title(s):
DOC: The option httplog is no longer valid in a backend.

Link:
https://github.com/haproxy/haproxy/pull/68

Edit locally:
wget https://github.com/haproxy/haproxy/pull/68.patch && vi 68.patch

Apply locally:
curl https://github.com/haproxy/haproxy/pull/68.patch | git am -

Description:
Inside the Proxy keywords matrix it looks like the option httplog is still
valid within a backend. This is no longer the case, hence this update to
the documentation.

Instructions:
This github pull request will be closed automatically; the patch should be
reviewed on the haproxy mailing list (haproxy@formilux.org). Everyone is
invited to comment, even the patch's author. Please keep the author and
the list CCed in replies. Please note that in the absence of any response
this pull request will be lost.
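[For readers checking their own configs, the placement the patch documents is sketched below; section names and addresses are illustrative.]

```
frontend fe_web
    option httplog            # valid in defaults, frontend and listen sections
    bind :80
    default_backend be_app

backend be_app
    # "option httplog" is no longer accepted here
    server s1 192.0.2.10:80
```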