Re: Upgrade from 1.7 to 2.0 = increased CPU usage
On Thu, Jul 25, 2019 at 02:36:49AM +0200, Elias Abacioglu wrote:
> Hi Willy,
>
> This would explain the 503s
> ```
> # change a 503 response into a 204 (a friendly decline).
> errorfile 503 /etc/haproxy/errors/204.http
>
> acl is_disable path_beg /getuid/rogue-ad-exchange
> # http-request deny defaults to 403, change it to a 503,
> # which is a masked 204 since haproxy doesn't have a 204 errorfile.
> http-request deny deny_status 503 if is_disable
> ```
> also
> ```
> backend robotstxt
>     errorfile 503 /etc/haproxy/errors/200.robots.http
> backend crossdomainxml
>     errorfile 503 /etc/haproxy/errors/200.crossdomain.http
> backend emptygif
>     errorfile 503 /etc/haproxy/errors/200.emptygif.http
> ```
> Basically I use 503 if I want to block a sender in a friendly way (i.e.
> making them believe we just declined the transaction) and to host 3 tiny
> files: robots.txt, crossdomain.xml and empty.gif.

But I'm pretty sure I've seen 503s *received* by haproxy, indicating that the next component sent them, so these cannot be the ones you produce with your configuration.

> It felt excessive to set up redundant webservers for a total of 703 bytes
> of files, and it also felt wasteful to have it in the Java backend. So I
> cheated with haproxy's errorfiles.

Oh, don't worry, you're not the only one to do that :-) I've even seen an auto-generated config using one backend per file and an error file matching the contents of each file of a directory, to replace a web server!

> So I don't think that the 503s cause retries for our clients, it's just
> me abusing haproxy.

I'm really speaking about 503s being received by haproxy and delivered as 503 to the clients, not about 503s in the logs that were in fact rewritten differently. Look here:

10:51:13.776098 recvfrom(44797, "HTTP/1.1 503 Service Unavailable"..., 16320, 0, NULL, NULL) = 55
10:51:13.776184 recvfrom(19524, "HTTP/1.1 503 Service Unavailable"..., 16320, 0, NULL, NULL) = 55
10:51:13.776272 recvfrom(57869, "HTTP/1.1 503 Service Unavailable"..., 16320, 0, NULL, NULL) = 55
10:51:13.776391 recvfrom(35693, "HTTP/1.1 503 Service Unavailable"..., 16320, 0, NULL, NULL) = 55
10:51:13.776613 recvfrom(8041, "HTTP/1.1 503 Service Unavailable"..., 16320, 0, NULL, NULL) = 55

then:

10:51:13.844586 sendto(61292, "HTTP/1.1 503 Service Unavailable"..., 112, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 112
10:51:13.844617 sendto(62213, "HTTP/1.1 503 Service Unavailable"..., 112, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 112
10:51:13.844646 sendto(62685, "HTTP/1.1 503 Service Unavailable"..., 112, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 112
10:51:13.844672 sendto(65490, "HTTP/1.1 503 Service Unavailable"..., 112, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 112

So this is why I was asking.

> We receive transactional requests, ad exchanges sending us requests.

OK, so such services generally do not retry.

> Also real browsers connecting to us when cookie syncing.

OK.

> So the transactional clients we want to keep-alive, so they send multiple
> HTTP requests per connection.

Of course.

> And the browser clients we want to close the connection to after each
> request+response. So the browser clients' backend has "option forceclose".
> Which would explain the short connections.

OK, makes sense.

> Currently we have "http-reuse safe" in the defaults section and "http-reuse
> never" in a tcp mode listener that forwards all :443 traffic to another set
> of haproxies that has more cores and does TLS termination. And this is to
> not mess up the X-Forwarded-For headers.
There is no http-reuse in TCP mode, you probably even get a warning.

> I will try "http-reuse always" in the defaults, but not in the tcp mode
> listener as we rely on X-Forwarded-For.

It must have no other effect than emitting a warning for your TCP mode. Additionally, the reuse is per request, so your XFF header will remain valid since each request will emit its own XFF header. Reuse is only about reusing a keep-alive connection.

> Even if I get better performance, it still wouldn't answer why the HAProxy
> CPU usage would increase with the same config in v2.0 compared to v1.7.

That's why I was asking whether or not the 503s can induce client retries.

> Assuming that the "http-reuse always" might help performance in 2.0, it's
> not fair to compare a better-tuned v2.0 against a less-tuned v1.7.

That's not my goal. I want to make sure we're not accumulating lots of unused server-side connections in the server pools, which could in turn make the servers sick and deliver 503s. With reuse safe this can definitely happen; with reuse always it will not.

In fact I'm really interested in knowing whether you still receive lots of 503s like this, and whether you really have that many concurrent connections. In your trace I'm seeing file descriptors as high as approximately 84000, and if for any reason this is not normal, it could explain a difference. We could even imagine that there are connect retries
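To make Willy's point concrete, here is a minimal sketch (not from the thread; the backend name and address are illustrative) of how "http-reuse always" coexists with X-Forwarded-For: reuse acts on idle server-side connections, while "option forwardfor" adds a fresh header to each individual request.

```
defaults
    mode http
    option forwardfor        # a new X-Forwarded-For is appended per request

backend be_app               # hypothetical backend
    http-reuse always        # idle keep-alive connections to servers may be
                             # shared by requests from different clients
    server s1 10.0.2.1:8080  # illustrative address
```

Because the header is emitted per request while reuse happens per connection, a shared connection can legitimately carry requests with different XFF values.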
Send http 413 response
Hello list. I'm trying to send an HTTP 413 to the user based on the hdr(Content-Length). What I've tried so far:

1. Create an http413 backend containing only `errorfile 400` + `http-request deny deny_status 400`. In the frontend, configure a `use_backend http413 if `. This is my current approach, but it wastes some time in the frontend for every single request of every single backend - we have about 1000 backends and only about 10% need to check Content-Length (with distinct limits, by the way).

2. Use the `errorfile 400` approach in the same backend that does the load balancing. This doesn't sound good because I'm overwriting some internal response code and its payload. And what if I need another 2, 3 or 10 HTTP response codes?

3. Use some creativity, e.g. `errorfile 413` + `deny_status 413`, or `use_backend` inside another backend. The latter doesn't make sense, and it's a pity the former isn't supported.

Is there another way to deny an HTTP request with a custom status and HTML content that I'm missing? Thanks! ~jm
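For reference, approach 1 as described might look like the following sketch (the ACL, limit and file path are hypothetical; the 413.http file would have to contain a complete raw "413 Request Entity Too Large" response, since errorfiles are served verbatim):

```
backend http413
    # the 400 slot is hijacked to carry a 413 response body
    errorfile 400 /etc/haproxy/errors/413.http
    http-request deny deny_status 400

frontend fe_main
    bind :80
    # evaluated on every request, which is the overhead being discussed
    acl too_large req.hdr_val(content-length) gt 1048576
    use_backend http413 if too_large
    default_backend be_app
```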
Re: Upgrade from 1.7 to 2.0 = increased CPU usage
Hi Willy,

This would explain the 503s

```
# change a 503 response into a 204 (a friendly decline).
errorfile 503 /etc/haproxy/errors/204.http

acl is_disable path_beg /getuid/rogue-ad-exchange
# http-request deny defaults to 403, change it to a 503,
# which is a masked 204 since haproxy doesn't have a 204 errorfile.
http-request deny deny_status 503 if is_disable
```

also

```
backend robotstxt
    errorfile 503 /etc/haproxy/errors/200.robots.http

backend crossdomainxml
    errorfile 503 /etc/haproxy/errors/200.crossdomain.http

backend emptygif
    errorfile 503 /etc/haproxy/errors/200.emptygif.http
```

Basically I use 503 if I want to block a sender in a friendly way (i.e. making them believe we just declined the transaction) and to host 3 tiny files: robots.txt, crossdomain.xml and empty.gif. It felt excessive to set up redundant webservers for a total of 703 bytes of files, and it also felt wasteful to have it in the Java backend. So I cheated with haproxy's errorfiles. So I don't think that the 503s cause retries for our clients, it's just me abusing haproxy.

We receive transactional requests, ad exchanges sending us requests. Also real browsers connecting to us when cookie syncing. The transactional clients we want to keep alive, so they send multiple HTTP requests per connection. And for the browser clients we want to close the connection after each request+response, so the browser clients' backend has "option forceclose". Which would explain the short connections.

Currently we have "http-reuse safe" in the defaults section and "http-reuse never" in a tcp mode listener that forwards all :443 traffic to another set of haproxies that has more cores and does TLS termination. This is to not mess up the X-Forwarded-For headers. I will try "http-reuse always" in the defaults, but not in the tcp mode listener as we rely on X-Forwarded-For.

Even if I get better performance, it still wouldn't answer why the HAProxy CPU usage would increase from v1.7 to v2.0 with the same config. And assuming that "http-reuse always" might help performance in 2.0, it's not fair to compare a better-tuned v2.0 against a less-tuned v1.7.

Thanks,
Elias

On Wed, Jul 24, 2019 at 8:07 PM Willy Tarreau wrote:
> Hi Elias,
>
> On Wed, Jul 24, 2019 at 11:01:22AM +0200, Elias Abacioglu wrote:
> > Hi Lukas,
> >
> > 2.0.3 still has the same issue, after 1-3 minutes it goes to using 100%
> > of its available cores.
> > I've created a new strace file. Will send it to you and Willy.
>
> Thanks for testing. I've looked at your trace. I'm not seeing any abnormal
> behaviour there. However I'm seeing lots of 503 responses returned by the
> server. Could it be that your client retries on 503, leading to an increase
> of the load? It could also possibly explain why this happens after some
> time (i.e. if the servers start to fall after some time).
>
> Also I'm seeing that you're having a lot of short connections. Maybe you're
> accumulating a large number of idle connections to the backend servers.
> Could you please try to add "http-reuse always" to your backend(s) to see
> if that improves the situation?
>
> Thanks,
> Willy
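As an aside, the "errorfile as tiny web server" trick works because an errorfile must contain a complete, raw HTTP response that haproxy sends as-is. A file such as 200.robots.http might look like this (contents are illustrative, not taken from the thread):

```
HTTP/1.0 200 OK
Content-Type: text/plain
Cache-Control: max-age=86400
Content-Length: 26

User-agent: *
Disallow: /
```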
2.0.3 High CPU Usage
Hello, I am currently running Haproxy 1.6.14-1ppa1~xenial-66af4a1 2018/01/06. There are many features implemented in 1.8, 1.9 and 2.0 that would benefit my deployments. I tested 2.0.3-1ppa1~xenial last night but unfortunately found it to be using excessive amounts of CPU and had to revert.

For this implementation, I have two separate use cases in haproxy: the first being external HTTP/HTTPS load balancing to a cluster from external clients, the second being internal HTTP load balancing between two different applications (for simplicity's sake we can call them front and back). The excessive CPU was noticed on the second implementation, HTTP between the front and back applications. I previously leveraged nbproc and cpu-map to isolate the use cases, but in 2.0 moved to nbthread (default) and cpu-map (auto) to isolate. The CPU usage was so excessive that I had to move the second implementation to two cores to not utilize 100% of the processor, and still I was getting timeouts.

It took some time to rewrite the config files from 1.6 to 2.0 but I was able to get them all configured properly, and I leveraged top and mpstat to ensure threads and use cases were on the proper cores. Because of the problems with use case #2 I did not even get a chance to evaluate use case #1, but again, I use cpu-map and 'process' to isolate these use cases as much as possible. Upon reverting back to 1.6 (install and configs) everything worked as expected.

Here is the CPU usage on 1.6 from mpstat -P ALL 5:

08:33:02 PM  CPU   %usr  %nice   %sys %iowait   %irq  %soft %steal %guest %gnice  %idle
08:33:07 PM    0   7.48   0.00  16.63    0.00   0.00   0.00   0.00   0.00   0.00  75.88

Here is the CPU usage on 2.0.3 when using one thread:

08:29:35 PM  CPU   %usr  %nice   %sys %iowait   %irq  %soft %steal %guest %gnice  %idle
08:29:40 PM   39  35.28   0.00  55.24    0.00   0.00   0.00   0.00   0.00   0.00   9.48

Here is the CPU usage on 2.0.3 when using two threads (the front application still experienced timeouts to the back application even without 100% CPU utilization on the cores):

08:30:48 PM  CPU   %usr  %nice   %sys %iowait   %irq  %soft %steal %guest %gnice  %idle
08:30:53 PM    0  22.93   0.00  19.75    0.00   0.00   0.00   0.00   0.00   0.00  57.32
08:30:53 PM   39  21.60   0.00  25.10    0.00   0.00   0.00   0.00   0.00   0.00  53.29

Also note, our front generally keeps connections open to our back for an extended period of time as it pools them internally, so many requests are sent over each connection via HTTP/1.1 keep-alive. I think we had roughly ~1000 connections established during these tests.

Some configurations that might be relevant to your analysis (there are more, but they are pretty much standard, such as user, group, stats, log, chroot, etc.):

global
    cpu-map auto:1/1-40 0-39
    maxconn 50
    spread-checks 2
    server-state-file global
    server-state-base /var/lib/haproxy/

defaults
    option dontlognull
    option dontlog-normal
    option redispatch
    option tcp-smart-accept
    option tcp-smart-connect
    timeout connect 2s
    timeout client 50s
    timeout server 50s
    timeout client-fin 1s
    timeout server-fin 1s

This part has been sanitized and I reduced the number of servers from 14 to 2.
listen back
    bind 10.0.0.251:8080 defer-accept process 1/40
    bind 10.0.0.252:8080 defer-accept process 1/40
    bind 10.0.0.253:8080 defer-accept process 1/40
    bind 10.0.0.254:8080 defer-accept process 1/40
    mode http
    maxconn 65000
    fullconn 65000
    balance leastconn
    http-reuse safe
    source 10.0.1.100
    option httpchk GET /ping HTTP/1.0
    http-check expect string OK
    server s1 10.0.2.1:8080 check agent-check agent-port 8009 agent-inter 250ms inter 500ms fastinter 250ms downinter 1000ms weight 100 source 10.0.1.100
    server s2 10.0.2.2:8080 check agent-check agent-port 8009 agent-inter 250ms inter 500ms fastinter 250ms downinter 1000ms weight 100 source 10.0.1.101

To configure multiple cores, I changed the bind lines to add 'process 1/1'. I also removed 'process 1/1' from the other use case.

The OS is Ubuntu 16.04.3 LTS, procs are 2x E5-2630, 64GB of RAM. The output from haproxy -vv looked very typical between both: epoll, OpenSSL 1.0.2g (not used in this case), etc.

Please let me know if there is any additional information I can provide to assist in isolating the cause of this issue. Thank you!

Nick
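For context, the nbthread/cpu-map isolation Nick describes could be sketched roughly as follows in 2.0 (a sketch only; the thread ranges, names and addresses are illustrative, not his actual values):

```
global
    nbthread 40
    cpu-map auto:1/1-40 0-39      # pin thread N to CPU N-1

# hypothetical split of the two use cases across thread sets
frontend external_lb
    bind :443 process 1/1-36      # use case #1 on threads 1-36

listen back
    bind 10.0.0.251:8080 defer-accept process 1/37-40   # use case #2 on threads 37-40
```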
Re: Upgrade from 1.7 to 2.0 = increased CPU usage
Hi Elias,

On Wed, Jul 24, 2019 at 11:01:22AM +0200, Elias Abacioglu wrote:
> Hi Lukas,
>
> 2.0.3 still has the same issue, after 1-3 minutes it goes to using 100% of
> its available cores.
> I've created a new strace file. Will send it to you and Willy.

Thanks for testing. I've looked at your trace. I'm not seeing any abnormal behaviour there. However I'm seeing lots of 503 responses returned by the server. Could it be that your client retries on 503, leading to an increase of the load? It could also possibly explain why this happens after some time (i.e. if the servers start to fall after some time).

Also I'm seeing that you're having a lot of short connections. Maybe you're accumulating a large number of idle connections to the backend servers. Could you please try to add "http-reuse always" to your backend(s) to see if that improves the situation?

Thanks,
Willy
Partnership - guest posts or sponsored content
Hello, I already tried to get in touch but didn't get any response... not sure if you received this? If not interested, please just let me know.

My name is Dennis and I'm a blog outreach specialist. I saw your blog on haproxy.com and decided to get in touch. I think that a lot of my clients would be interested in regular article placements, either paid guest posts (we provide an article) or sponsored content (you/your team write it for us). This could potentially bring some good regular income for you or your business. Would you be interested?

Kind Regards,
Dennis P.
Outreach Specialist
email: den...@blogoutreach.net
BlogOutreach.net
https://blogoutreach.net/
Cannot enable a config "disabled" frontend via socket command
Exactly this problem: https://www.mail-archive.com/haproxy@formilux.org/msg19356.html is still true for frontends, so I can't start a frontend in "disabled" mode and later enable it via the socket.

Tested version: 1.8.19 on Debian buster.

Best regards,
Martin
load-server-state-from-file "automatic" transfer?
Hi! I have been looking into load-server-state-from-file to prevent 500 errors being reported after a service reload. Currently we are seeing these because the new instance comes up and first wants to see the minimum configured number of health checks succeed for a backend server before it hands requests to it.

From what I can tell, the state file needs to be saved manually before a service reload, so that the new process coming up can read it back. I can do that, of course, but I was wondering what the reasoning was for not transferring this data to a new process in a similar fashion as file handles or stick-tables (via peers)?

Thanks a lot!
Daniel

--
Daniel Schneller
Principal Cloud Engineer
GPG key at https://keybase.io/dschneller

CenterDevice GmbH
Rheinwerkallee 3
53227 Bonn
www.centerdevice.com
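For readers following along, the moving parts Daniel refers to are roughly these (a minimal sketch with illustrative paths; the dump step at the end is the manual part in question):

```
global
    server-state-base /var/lib/haproxy/
    server-state-file global.state       # read back when haproxy starts

defaults
    # every backend restores its servers' state from the global file
    load-server-state-from-file global

# the state still has to be dumped by hand before each reload, e.g.
# from a reload hook:
#   echo "show servers state" | socat /var/run/haproxy.sock - \
#       > /var/lib/haproxy/global.state
```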
Subscribe
Subscribe
Re: Upgrade from 1.7 to 2.0 = increased CPU usage
Hi Lukas,

2.0.3 still has the same issue: after 1-3 minutes it goes to using 100% of its available cores. I've created a new strace file. Will send it to you and Willy.

Thanks,
Elias

On Tue, Jul 23, 2019 at 8:31 PM Lukas Tribus wrote:
> Hello Elias,
>
> could you try 2.0.3 please?
>
> It was just released today and fixes a CPU hogging issue.
>
> cheers,
> lukas
Re: FreeBSD CI builds fail
On Wed, Jul 24, 2019 at 10:01:33AM +0200, Tim Düsterhus wrote:
> On 24.07.19 at 05:55, Willy Tarreau wrote:
> > I also noticed the build failure but couldn't find any link to the build
> > history to figure when it started to fail. How did you figure that the
> > commit above was the first one ?
>
> While I did it as Ilya did by scrolling through GitHub's commit list,

That was the least natural way for me to do it. Thanks to Ilya for the screenshot, by the way. I clicked on the red cross, then the freebsd link reporting the failure, and searched the history there, but couldn't find it.

> there is also:
>
> Travis: https://travis-ci.com/haproxy/haproxy/builds
> Cirrus: https://cirrus-ci.com/github/haproxy/haproxy

Ah yes, this one is more useful, that's what I was looking for. I just cannot figure out how to reach it when I'm on the build status page :-/

> Keep in mind for both that only the current head after a push is being
> built, so larger pushes might hide issues from CI.

Of course! But the goal is not to build every single commit either, but to detect early that something went wrong instead of discovering it after a version is released, as we used to in the past.

> In this specific case the offending patch was pushed together with
> 7764a57d3292b6b4f1e488b ("BUG/MEDIUM: threads: cpu-map designating a
> single") and only the latter was tested.

Yep!

> > Ideally we'd need a level of failure in CI builds. Some should be just of
> > level "info" and not cause a build error because we'd know they are likely
> > to fail but are still interested in the logs. But I don't think we can do
> > this.
>
> I'm not sure this is possible either, but I also don't think it's a good
> idea, because then you get used to this kind of issue and ignore it. For
> example this one would probably have been written off as "ah, it's just
> flaky" instead of actually investigating what's wrong:
> https://github.com/haproxy/haproxy/issues/118

It's true. But what is also true is that the tests are not meant to be run in the CI build environment but on developers' machines first. Being able to run in the CI env is a bonus. As a side effect of some technical constraints imposed by such environments (slow VMs with flaky timings, hosts enforcing at least a little bit of security, etc.) we do expect that some tests will randomly fail. These ones could be tagged as such and just report a counter of failures among the more or less expected ones. When you're used to seeing that 4 to 6 tests usually fail and suddenly you find 13 that have failed, you can be interested in having a look there, even if it's possible to just start them again to confirm. And these ones should not fail at all in more controlled environments. There's nothing really problematic here in the end, this just constantly reminds us that not all tests can be automated.

By the way, maybe we could have some form of exclusion for tags instead of deciding that a test only belongs to one type. Because the reality is that we do *not* want to run certain tests. The most common ones we don't want to run locally are "slow" and "bug", which are already exclusive to each other. But by tagging tests with multiple labels we could then decide to exclude some labels during the build. And in this case we could tag some tests as "flaky-on-cirrus", "flaky-on-travis", "flaky-in-vm", "flaky-in-container", "flaky-firewall" etc. and ignore them in such environments.

Cheers,
Willy
Re: FreeBSD CI builds fail
Willy,

On 24.07.19 at 05:55, Willy Tarreau wrote:
> I also noticed the build failure but couldn't find any link to the build
> history to figure when it started to fail. How did you figure that the
> commit above was the first one ?

While I did it as Ilya did, by scrolling through GitHub's commit list, there is also:

Travis: https://travis-ci.com/haproxy/haproxy/builds
Cirrus: https://cirrus-ci.com/github/haproxy/haproxy

Keep in mind for both that only the current head after a push is being built, so larger pushes might hide issues from CI. In this specific case the offending patch was pushed together with 7764a57d3292b6b4f1e488b ("BUG/MEDIUM: threads: cpu-map designating a single") and only the latter was tested.

>> This one fails because there's a L4 timeout, I can probably update the
>> regex to take that into account, the interesting part is the failure and
>> the step at which it fails, but for now we expect a connection failure
>> and not a timeout.
>
> There's always the possibility (especially in CI environments) that some
> rules are in place on the system to prevent connections to unexpected ports.
>
> Ideally we'd need a level of failure in CI builds. Some should be just of
> level "info" and not cause a build error because we'd know they are likely
> to fail but are still interested in the logs. But I don't think we can do
> this.

I'm not sure this is possible either, but I also don't think it's a good idea, because then you get used to this kind of issue and ignore it. For example this one would probably have been written off as "ah, it's just flaky" instead of actually investigating what's wrong: https://github.com/haproxy/haproxy/issues/118

Best regards
Tim Düsterhus
haproxy=2.0.3: ereq counter grow in tcp-mode since haproxy=2.0
Hi! I've noticed that since moving from 1.9.8 to the 2.0 branch of haproxy, the ereq counter of tcp-mode frontend sections began to grow. I had zeroes in that counter before haproxy 2.0; now the number of "error requests" is much higher. Example:

listen sample.service:1234
    bind ipv6@xxx:yyy
    mode tcp
    balance leastconn
    timeout server 1h
    timeout client 1h
    option tcp-check
    default-server weight 1 inter 2s rise 3
    server server1 server1:1234 weight 100 check
    server server2 server2:1234 weight 100 check

"show errors" shows nothing:

Total events captured on [24/Jul/2019:09:56:12.544] : 0

And there are no errors in my log file either. But look at the error counters in the output of 'show stat':

$ echo "show stat sample.service:1234 7 -1 typed" | sudo socat unix-connect:/var/run/haproxy.sock stdio | egrep -v :0$
F.194.0.0.pxname.1:KNS:str:sample.service:1234
F.194.0.1.svname.1:KNS:str:FRONTEND
F.194.0.5.smax.1:MMP:u32:2
F.194.0.6.slim.1:CLP:u32:4096
F.194.0.7.stot.1:MCP:u64:37
F.194.0.12.ereq.1:MCP:u64:37
F.194.0.17.status.1:SGP:str:OPEN
F.194.0.26.pid.1:KGP:u32:1
F.194.0.27.iid.1:KGS:u32:194
F.194.0.35.rate_max.1:MMP:u32:1
F.194.0.75.mode.1:CGS:str:tcp
F.194.0.78.conn_rate_max.1:MMP:u32:1
F.194.0.79.conn_tot.1:MCP:u64:37
S.194.1.0.pxname.1:KNS:str:sample.service:1234
S.194.1.1.svname.1:KNS:str:server1:1234
S.194.1.17.status.1:SGP:str:UP
S.194.1.18.weight.1:MaP:u32:100
S.194.1.19.act.1:SGP:u32:1
S.194.1.23.lastchg.1:MAP:u32:184
S.194.1.26.pid.1:KGP:u32:1
S.194.1.27.iid.1:KGS:u32:194
S.194.1.28.sid.1:KGS:u32:1
S.194.1.32.type.1:CGS:u32:2
S.194.1.36.check_status.1:MOP:str:L4OK
S.194.1.55.lastsess.1:MAP:s32:-1
S.194.1.56.last_chk.1:MOP:str:
S.194.1.65.check_desc.1:MOP:str:Layer4 check passed
S.194.1.67.check_rise.1:CGS:u32:3
S.194.1.68.check_fall.1:CGS:u32:3
S.194.1.69.check_health.1:CGS:u32:5
S.194.1.73.addr.1:CGS:str:[zzz]:yyy
S.194.1.75.mode.1:CGS:str:tcp
S.194.2.0.pxname.1:KNS:str:sample.service:1234
S.194.2.1.svname.1:KNS:str:server2:1234
S.194.2.17.status.1:SGP:str:UP
S.194.2.18.weight.1:MaP:u32:1
S.194.2.19.act.1:SGP:u32:1
S.194.2.23.lastchg.1:MAP:u32:184
S.194.2.26.pid.1:KGP:u32:1
S.194.2.27.iid.1:KGS:u32:194
S.194.2.28.sid.1:KGS:u32:2
S.194.2.32.type.1:CGS:u32:2
S.194.2.36.check_status.1:MOP:str:L4OK
S.194.2.38.check_duration.1:MDP:u64:3
S.194.2.55.lastsess.1:MAP:s32:-1
S.194.2.56.last_chk.1:MOP:str:
S.194.2.65.check_desc.1:MOP:str:Layer4 check passed
S.194.2.67.check_rise.1:CGS:u32:3
S.194.2.68.check_fall.1:CGS:u32:3
S.194.2.69.check_health.1:CGS:u32:5
S.194.2.73.addr.1:CGS:str:[xxx]:yyy
S.194.2.75.mode.1:CGS:str:tcp
B.194.0.0.pxname.1:KNS:str:sample.service:1234
B.194.0.1.svname.1:KNS:str:BACKEND
B.194.0.5.smax.1:MMP:u32:1
B.194.0.6.slim.1:CLP:u32:410
B.194.0.7.stot.1:MCP:u64:37
B.194.0.17.status.1:SGP:str:UP
B.194.0.18.weight.1:MaP:u32:101
B.194.0.19.act.1:MGP:u32:2
B.194.0.23.lastchg.1:MAP:u32:184
B.194.0.26.pid.1:KGP:u32:1
B.194.0.27.iid.1:KGS:u32:194
B.194.0.32.type.1:CGS:u32:1
B.194.0.35.rate_max.1:MGP:u32:1
B.194.0.49.cli_abrt.1:MCP:u64:37
B.194.0.55.lastsess.1:MAP:s32:-1
B.194.0.75.mode.1:CGS:str:tcp
B.194.0.76.algo.1:CGS:str:leastconn

How can I find out the real reason for these errors? Could it happen if a client uses RST instead of a regular FIN/ACK sequence to close the session?
Example of such behavior (a tcp-connect probe from keepalived to haproxy):

10:42:48.086780 IP6 client.43930 > haproxy.12346: Flags [S], seq 4136298350, win 28800, options [mss 1440,nop,nop,sackOK,nop,wscale 7], length 0
10:42:48.086837 IP6 haproxy.12346 > client.43930: Flags [S.], seq 2234519198, ack 4136298351, win 26520, options [mss 8840,nop,nop,sackOK,nop,wscale 11], length 0
10:42:48.087177 IP6 client.43930 > haproxy.12346: Flags [.], ack 1, win 225, length 0
10:42:48.087181 IP6 client.43930 > haproxy.12346: Flags [R.], seq 1, ack 1, win 225, length 0

$ haproxy -vvv
HA-Proxy version 2.0.3-1 2019/07/24 - https://haproxy.org/
Build options :
  TARGET  = linux-glibc
  CPU     = generic
  CC      = gcc
  CFLAGS  = -O2 -g -O2 -fPIE -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-old-style-declaration -Wno-ignored-qualifiers -Wno-clobbered -Wno-missing-field-initializers -Wtype-limits
  OPTIONS = USE_PCRE2=1 USE_PCRE2_JIT=1 USE_REGPARM=1 USE_GETADDRINFO=1 USE_OPENSSL=1 USE_LUA=1 USE_ZLIB=1 USE_TFO=1 USE_SYSTEMD=1

Feature list : +EPOLL -KQUEUE -MY_EPOLL -MY_SPLICE +NETFILTER -PCRE -PCRE_JIT +PCRE2 +PCRE2_JIT +POLL -PRIVATE_CACHE +THREAD -PTHREAD_PSHARED +REGPARM -STATIC_PCRE -STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H -VSYSCALL +GETADDRINFO +OPENSSL +LUA +FUTEX +ACCEPT4 -MY_ACCEPT4 +ZLIB -SLZ +CPU_AFFINITY +TFO +NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL +SYSTEMD -OBSOLETE_LINKER +PRCTL +THREAD_DUMP -EVPORTS

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support
Haproxy reload and maps
Hi,

We are using maps extensively in our architecture to map Host headers to backends. The maps are seeded dynamically by a Lua handler that queries an external service as requests arrive; there are no pre-seeded values in the maps, and the physical map file is empty.

On haproxy reload at peak traffic, the maps are emptied, and I guess that is expected. But this causes a stampede to the external service, which causes some failures. Is there a way to prevent the emptying of the maps when we do an haproxy reload?

Thanks,
Sachin
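For readers unfamiliar with the pattern, the host-to-backend mapping itself usually looks like this minimal sketch (the map path and backend name are hypothetical; the Lua seeding Sachin describes is not shown):

```
frontend fe_main
    bind :80
    # look the Host header up in the map file; fall back to be_default
    # when no entry has been learned yet
    use_backend %[req.hdr(host),lower,map(/etc/haproxy/maps/hosts.map,be_default)]
```

Since the learned entries live only in the old process's memory, one conceivable mitigation (not discussed in the thread) is to dump them with the "show map" socket command and write them to the map file just before reloading, so the new process starts pre-seeded.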
Re: FreeBSD CI builds fail
On Wed, Jul 24, 2019 at 08:55, Willy Tarreau wrote:
> Hi guys,
>
> On Tue, Jul 23, 2019 at 08:37:37PM +0200, Jerome Magnin wrote:
> > On Tue, Jul 23, 2019 at 07:09:57PM +0200, Tim Düsterhus wrote:
> > > Jérôme,
> > > Ilya,
> > >
> > > I noticed that FreeBSD CI fails since
> > > https://github.com/haproxy/haproxy/commit/885f64fb6da0a349dd3182d21d337b528225c517 .
> > >
> > > One example is here: https://github.com/haproxy/haproxy/runs/169980019
>
> I also noticed the build failure but couldn't find any link to the build
> history to figure when it started to fail. How did you figure that the
> commit above was the first one ?

[image: Screenshot from 2019-07-24 11-43-30.png]

> > This one fails because there's a L4 timeout, I can probably update the
> > regex to take that into account, the interesting part is the failure and
> > the step at which it fails, but for now we expect a connection failure
> > and not a timeout.
>
> There's always the possibility (especially in CI environments) that some
> rules are in place on the system to prevent connections to unexpected
> ports.
>
> Ideally we'd need a level of failure in CI builds. Some should be just of
> level "info" and not cause a build error because we'd know they are likely
> to fail but are still interested in the logs. But I don't think we can do
> this.
>
> Willy