Re: 1.8.7 http-tunnel doesn't seem to work? (but default http-keep-alive does)
Using poll (startup with -dk) the request works properly. On Fri, 13 Apr 2018 08:29, Willy Tarreau wrote: > Hi Pieter, > > On Fri, Apr 13, 2018 at 01:12:41AM +0200, PiBa-NL wrote: > > To clarify the issue is not that haproxy uses cpu by looping, the issue > is > > that haproxy prevents the page from loading in the browser. The 'fix' on > the > > old version after the commit introducing the issue was to call the EV_SET > > write delete *less* often. Or maybe my understanding of what is does is > just > > wrong :). > > Ah, sorry, maybe I'm mixing multiple problems, I understood it was looping. > So if it's indeed missing events, it's different. Maybe we have an issue > with a sequence of enable/disable on the polling, causing it to be disabled > and not re-enabled sometimes. If you can test with "-dk" to disable kqueue > and switch to poll, that will be great because I'll be able to test under > the same conditions. And if it doesn't fail it will tell us it's really > related to the kqueue poller. > > Willy >
Re: 1.8.7 http-tunnel doesn't seem to work? (but default http-keep-alive does)
Hi Pieter, On Fri, Apr 13, 2018 at 01:12:41AM +0200, PiBa-NL wrote: > To clarify the issue is not that haproxy uses cpu by looping, the issue is > that haproxy prevents the page from loading in the browser. The 'fix' on the > old version after the commit introducing the issue was to call the EV_SET > write delete *less* often. Or maybe my understanding of what is does is just > wrong :). Ah, sorry, maybe I'm mixing multiple problems, I understood it was looping. So if it's indeed missing events, it's different. Maybe we have an issue with a sequence of enable/disable on the polling, causing it to be disabled and not re-enabled sometimes. If you can test with "-dk" to disable kqueue and switch to poll, that will be great because I'll be able to test under the same conditions. And if it doesn't fail it will tell us it's really related to the kqueue poller. Willy
Problem installing 1.8.7 -- systemd changes
I have a script on my system that I use to handle compiling and installing a new haproxy version. That script has "EXTRA=haproxy-systemd-wrapper" on the line that does the install. It looks like that's no longer part of haproxy, and that the systemd service definition (included in contrib) just calls haproxy directly. USE_SYSTEMD must be set on the compile command when using the contrib systemd script, and the systemd development library must be present. But I don't see anything in CHANGELOG or README that talks about it at all, and definitely nothing saying that the wrapper has been removed, and nothing mentioning the new compile requirements. Was there somewhere else that I should have looked for information about how to get 1.8 working with systemd? Thanks, Shawn
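For anyone hitting the same wall, the change appears to boil down to two steps. This is a sketch, assuming the stock 1.8 Makefile and the unit file shipped under contrib/systemd — the TARGET value and install paths are assumptions to verify against your checkout, and the contrib unit may be a `.in` template whose paths need substituting before install:

```
# Build with native systemd support (requires the libsystemd headers):
make TARGET=linux2628 USE_SYSTEMD=1
make install

# Install the bundled unit file instead of the removed wrapper:
cp contrib/systemd/haproxy.service /etc/systemd/system/
systemctl daemon-reload
```

With USE_SYSTEMD set, haproxy itself talks to systemd, so the unit invokes the binary directly rather than going through haproxy-systemd-wrapper.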
Re: 1.8.7 http-tunnel doesn't seem to work? (but default http-keep-alive does)
Hi Willy, And a second mail as I just thought of one extra thing you wrote that maybe I misunderstand, or perhaps I confused you with a small remark about cpu usage in my earlier mail (that was a side effect of my other earlier but totally wrong code change..). I'm suspecting we could have something wrong with the polled_mask, maybe sometimes it's removed too early somewhere, preventing the delete(write) from being performed, which would explain why it loops. To clarify the issue is not that haproxy uses cpu by looping, the issue is that haproxy prevents the page from loading in the browser. The 'fix' on the old version after the commit introducing the issue was to call the EV_SET write delete *less* often. Or maybe my understanding of what it does is just wrong :). On 13-4-2018 at 0:57, PiBa-NL wrote: Hi Willy, On 13-4-2018 at 0:22, Willy Tarreau wrote: I'm suspecting we could have something wrong with the polled_mask, maybe sometimes it's removed too early somewhere, preventing the delete(write) from being performed, which would explain why it loops. By the way you must really not try to debug an old version but stick to the latest fixes. Okay testing from now on with current master, just thought it would be easier to backtrack if I knew what particular new/missing event would possibly cause it. And it could have been simpler to find a fix just after the problem was introduced, but it seems it ain't that simple :). I'm seeing two things that could be of interest to test : - remove the two "if (fdtab[fd].polled_mask & tid_bit)" conditions to delete the events. It will slightly inflate the list of events but not that much. If it fixes the problem it means that the polled_mask is sometimes wrong. Please do that with the updated master. Removing the 'if polled_mask' does not fix the issue, in fact that makes it worse. The "srvrep[0007:0008]: HTTP/1.1 401 Unauthorized" is also not shown anymore without those checks..
- switch to poll() just to see if you have the same so that we can figure if only the kqueue code triggers the issue. poll() doesn't rely on polled_mask at all. Using poll (startup with -dk) the request works properly. Many thanks for your tests. Willy Regards, PiBa-NL (Pieter) Regards, PiBa-NL (Pieter)
Health Checks not run before attempting to use backend
Hi, we're evaluating haproxy for use as the load balancer in front of our mesos cluster. What we are finding is that even though we have requested the check option in the server line, haproxy attempts to serve traffic to the server on startup until the first healthcheck completes. server slot1 10.40.40.2:7070 check inter 1000 rise 3 fall 3 maxconn 32 This is because we are adding servers to haproxy as they are started in mesos, but before our backend application itself is ready to serve connections. This results in spurious 503's being handed to clients as we add backends via the admin socket or haproxy restart. I looked into possibly forcing a healthcheck during the cfgparse constructors, but that seems like it would require some significant rearchitecting. Is there a way to force haproxy to not use a backend until it passes a healthcheck? I'm also worried about the side effects this might cause as requests start to queue up in haproxy. Thanks, Dave
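One pattern worth trying (a sketch — the backend name and socket path below are assumptions, and the exact behavior after leaving maintenance should be verified against your version's management guide) is to declare the slot administratively down so haproxy never routes to it, and only flip it via the admin socket once the application is actually listening:

```
# haproxy.cfg: the slot starts in maintenance and receives no traffic
server slot1 10.40.40.2:7070 check inter 1000 rise 3 fall 3 maxconn 32 disabled
```

```
# later, from the deployment tooling, once the app is confirmed up:
echo "set server my_backend/slot1 state ready" | socat stdio /var/run/haproxy.sock
```

This sidesteps the window between adding the server and its first passing health check, at the cost of the tooling having to decide when the application is ready.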
Re: 1.8.7 http-tunnel doesn't seem to work? (but default http-keep-alive does)
Hi Willy, On 13-4-2018 at 0:22, Willy Tarreau wrote: By the way you must really not try to debug an old version but stick to the latest fixes. Okay testing from now on with current master, just thought it would be easier to backtrack if I knew what particular new/missing event would possibly cause it. And it could have been simpler to find a fix just after the problem was introduced, but it seems it ain't that simple :). I'm seeing two things that could be of interest to test : - remove the two "if (fdtab[fd].polled_mask & tid_bit)" conditions to delete the events. It will slightly inflate the list of events but not that much. If it fixes the problem it means that the polled_mask is sometimes wrong. Please do that with the updated master. Removing the 'if polled_mask' does not fix the issue, in fact that makes it worse. The "srvrep[0007:0008]: HTTP/1.1 401 Unauthorized" is also not shown anymore without those checks.. - switch to poll() just to see if you have the same so that we can figure if only the kqueue code triggers the issue. poll() doesn't rely on polled_mask at all. Using poll (startup with -dk) the request works properly. Many thanks for your tests. Willy Regards, PiBa-NL (Pieter)
Re: 1.8.7 http-tunnel doesn't seem to work? (but default http-keep-alive does)
Hi Pieter, On Fri, Apr 13, 2018 at 12:05:26AM +0200, PiBa-NL wrote: > Hi Willy, > > Okay did some more digging.. > > Before and after the 'offending' commit the EV_SET calls are as follows in > the attached screenshot, these seem to be the main cause of some things going > wrong. > (I think also without using http-tunnel there are some things that don't fully > work right, but I haven't investigated that to get something reproducible > though, and possibly caused by the same kqueue issue..) The current issue is > 100% reproducible so far in a (for me) easy way. OK that's nice (except for you of course). > I made some changes to haproxy's output, and captured and compared the > output. > The 'START State' is written and also the eo and en variables are added to > the output. > The READ ADD/DEL WRITE ADD/DEL output is written with fprintf statements > just before the actual EV_SET call.. > Left side 'WORKS', right side doesn't.. > > As you can see there are some different calls to the EV_SET with the > different eo and en states.. I'm suspecting we could have something wrong with the polled_mask, maybe sometimes it's removed too early somewhere, preventing the delete(write) from being performed, which would explain why it loops. > I could 'fix' the behavior after the offending commit by adding back a check > for the eo in the second screenshot.. (At least for this 1 particular page > request..) > > But in current master that 'eo' variable no longer exists. I'm kinda at the > end of my abilities here.. :) Indeed, it took us years to get rid of this duplicate information! And as you can see, apparently doing so revealed that some corner cases were still relying on it. By the way you must really not try to debug an old version but stick to the latest fixes. It's too easy to be caught by another issue that was since fixed. The fact that you could put "eo" there in your test indicates to me that the test was made on a commit prior to its removal.
I just want to be sure you don't waste your time. > I hope at least some of it makes sense to you? In part, yes! I have to sit down in front of it and scratch my head, but my context switches are rather hard, so I'm very happy that you provide such detailed information! I'm seeing two things that could be of interest to test : - remove the two "if (fdtab[fd].polled_mask & tid_bit)" conditions to delete the events. It will slightly inflate the list of events but not that much. If it fixes the problem it means that the polled_mask is sometimes wrong. Please do that with the updated master. - switch to poll() just to see if you have the same so that we can figure if only the kqueue code triggers the issue. poll() doesn't rely on polled_mask at all. Many thanks for your tests. Willy
Re: HAProxy 1.8.X crashing
On Thu, Apr 12, 2018 at 01:14:58PM +0200, Olivier Houchard wrote: > > > @@ -3718,6 +3719,8 @@ int http_process_request(struct stream *s, struct > > > channel *req, int an_bit) > > > > > > return 0; > > > } > > > + /* XXX: We probably need a better mux */ > > > + conn_install_mux(conn, &mux_pt_ops, objt_cs(s->si[1].end)); > > > > > > path = http_get_path(txn); > > > url2sa(req->buf->p + msg->sl.rq.u, > > > > While I can understand how missing this can have caused trouble, I'm not > > sure it's the best solution, and I'm a bit worried about having various > > code places choose a mux and install it. I suspect we're "simply" missing > > a test in the backend code to cover the case where the connection already > > exists but a mux was not yet installed (a situation that didn't exist > > prior to muxes which is why we didn't notice this). I think that this > > happens in connect_server() around line 1177 (return SF_ERR_INTERNAL) > > when conn_xprt_ready() indicates the transport was not yet initialized. > > It's likely that the error unrolling fails to consider the case where > > the mux wasn't initialized, leading to the crash Praveen experienced. > > > > I'm not sure about this. My point was to try to install the mux when we can > choose one, the day we have more relevant muxes, if it ever happens. > This certainly fixes Praveen's crash. I totally understand, I just claim that I think we're likely buying a carpet to put it on the dust instead of removing the dust. It's more of a conceptual issue : you're forced to select an arbitrary mux at a moment where you have no idea which one will be needed for this outgoing connection. In 1.9 it could be H2, later it could be FCGI, etc. And generally when we have to fill arbitrary information, we pay it the high price later because it creates unnatural bonds between some unrelated elements. The single fact that we're forced to pick an arbitrary mux to avoid a crash somewhere else tells me we have an issue. 
If one specific code path doesn't support being called with a muxless connection, this same code path might also be triggered when the malloc() to create the mux fails and we want to gracefully abort and try to close this muxless connection after sending a 500 Server Error status. Also, the only reason why we create the connection this early is that we need a place where to put the IP addresses, and given that connections are small (only very slightly larger than the addresses they convey), it totally makes sense to pre-populate the address there. If we start to add more complex muxes, we'll even possibly end up allocating more resources (think h2+hpack contexts), trying to connect and to initiate a handshake to a target before the rest is finished or whatever funny stuff. Now don't get me wrong, it could very well be that for now, given the current state of the code, we're missing many checks and this solution will be an efficient short-term fix. But for the reasons above it doesn't sound logical to me. Maybe we're simply missing an "if (conn->mux)" or "if (cs->mux)" somewhere on the free path before calling something for the mux because it was assumed that it did obviously exist.
If so, I suspect that this approach would be more robust and would better match the validity tests on conn->mux that are performed at many places in the connection code:

    if (cs->conn->mux) {
            cs->conn->mux->detach(cs);
    } else {
            conn_stop_tracking(cs->conn);
            conn_xprt_close(cs->conn);
            conn_free(cs->conn);
    }
    cs_free(cs);

> > - one above the return SF_ERR_INTERNAL above in connect_server() to > > handle the case where the connection already exists but not the mux. > I'm not sure how this can happen, and what we should do if that happens. At least in the current situation we pre-create a connection just to start to fill some information (source/destination address), without knowing the mux yet. It's an initialization process, it may look beautiful or ugly depending on the perspective, but in any case we must ensure that the code overall remains reliable in this situation. And to me it looks less ugly than picking a possibly wrong mux just to have a valid pointer avoiding a segfault. But if it turns out that stabilizing the destroy path (or error path) is not sufficient
Re: [PATCH][REORG/MINOR]: config: Run postparser once per section instance
On Thu, Apr 12, 2018 at 3:06 PM, Willy Tarreau wrote: > Hi Ben, > > On Thu, Apr 12, 2018 at 02:25:58PM -0600, Ben Draut wrote: > > This changes the parser to run section postparsers once per section > > instance, rather than only when the section type changes. > > > > This is motivated by the work discussed in > > https://www.mail-archive.com/haproxy@formilux.org/msg29527.html. It > should > > make it easy to produce the warning mentioned in the summary at most once > > per section. > > Hmmm I don't much like this principle, because it's not a post-section > parser anymore in this case, it's a post config parser. Note that it is > very possible that you need a different type of post-processing to help > with the resolvers stuff, but I really think this approach is mistaken > because it removes the ability to perform post-section processing. From > what I'm seeing by quickly grepping in the code, at least the cache > relies on this. > Ah, I missed the cache usage, thanks for pointing that out. > I *think* we do have a way to register a post-config call, but to be > honest I really don't remember, as we've added a few such hooks over > time. Maybe you'd need to call this thing before entering > check_config_validity() which resolves every cross-dependency between > the config elements and which detect inconsistencies. We may imagine > having a pre_final_check and a post_final_check callback for this maybe. Great, I'll dig some more and give it some more thought. Thanks!
Re: [PATCH][REORG/MINOR]: config: Run postparser once per section instance
Hi Ben, On Thu, Apr 12, 2018 at 02:25:58PM -0600, Ben Draut wrote: > This changes the parser to run section postparsers once per section > instance, rather than only when the section type changes. > > This is motivated by the work discussed in > https://www.mail-archive.com/haproxy@formilux.org/msg29527.html. It should > make it easy to produce the warning mentioned in the summary at most once > per section. Hmmm I don't much like this principle, because it's not a post-section parser anymore in this case, it's a post config parser. Note that it is very possible that you need a different type of post-processing to help with the resolvers stuff, but I really think this approach is mistaken because it removes the ability to perform post-section processing. From what I'm seeing by quickly grepping in the code, at least the cache relies on this. I *think* we do have a way to register a post-config call, but to be honest I really don't remember, as we've added a few such hooks over time. Maybe you'd need to call this thing before entering check_config_validity() which resolves every cross-dependency between the config elements and which detects inconsistencies. We may imagine having a pre_final_check and a post_final_check callback for this maybe. Just a few ideas. Thanks! Willy
[PATCH][REORG/MINOR]: config: Run postparser once per section instance
This changes the parser to run section postparsers once per section instance, rather than only when the section type changes. This is motivated by the work discussed in https://www.mail-archive.com/haproxy@formilux.org/msg29527.html. It should make it easy to produce the warning mentioned in the summary at most once per section. Thanks, Ben 0001-REORG-MINOR-config-Run-postparser-once-per-section-i.patch Description: Binary data
Re: 1.8.7 http-tunnel doesn't seem to work? (but default http-keep-alive does)
Hi Willy, On 12-4-2018 at 1:19, Willy Tarreau wrote: Thank you very much for pointing the exact line that causes you trouble. Well, exact line.. probably not the right one. And yes, just removing that line indeed breaks something else. (as expected..) Would you have the ability to try the latest 1.9-dev just by chance ? Yes, when I started compiling from source, that's where I started: the current master branch. Though (sadly) it has the same broken effect. I'll try and investigate a bit more.. Though if you think of a possible patch, I'm happy to try anything out ;) Regards, PiBa-NL (Pieter)
Question regarding haproxy backend behaviour
Hi, I have a question regarding haproxy backend connection behaviour. We have the following setup:

```
+---------+      +-------+
| haproxy | ---> | nginx |
+---------+      +-------+
```

We use a haproxy cluster for ssl off-loading and then load balance requests to the nginx cluster. We are currently benchmarking this setup with 3 nodes for the haproxy cluster and 1 nginx node. Each haproxy node has two frontend/backend pairs. The first frontend is a router for ssl connections which redistributes requests to the second frontend in the haproxy cluster. The second frontend is for ssl handshake and routing requests to the nginx servers. Our configuration is as follows:

```
global
    maxconn 10
    user haproxy
    group haproxy
    nbproc 2
    cpu-map 1 1
    cpu-map 2 2

defaults
    mode http
    option forwardfor
    timeout connect 5s
    timeout client 30s
    timeout server 30s
    timeout tunnel 30m
    timeout client-fin 5s

frontend ssl_sess_id_router
    bind *:443
    bind-process 1
    mode tcp
    maxconn 10
    log global
    option tcp-smart-accept
    option splice-request
    option splice-response
    default_backend ssl_sess_id_router_backend

backend ssl_sess_id_router_backend
    bind-process 1
    mode tcp
    fullconn 5
    balance roundrobin
    ..
    option tcp-smart-connect
    server lbtest01 :8443 weight 1 check send-proxy
    server lbtest02 :8443 weight 1 check send-proxy
    server lbtest03 :8443 weight 1 check send-proxy

frontend nginx_ssl_fe
    bind *:8443 ssl
    maxconn 10
    bind-process 2
    option tcp-smart-accept
    option splice-request
    option splice-response
    option forwardfor
    reqadd X-Forwarded-Proto:\ https
    timeout client-fin 5s
    timeout http-request 8s
    timeout http-keep-alive 30s
    default_backend nginx_backend

backend nginx_backend
    bind-process 2
    balance roundrobin
    http-reuse safe
    option tcp-smart-connect
    option splice-request
    option splice-response
    timeout tunnel 30m
    timeout http-request 8s
    timeout http-keep-alive 30s
    server testnginx :80 weight 1 check
```

The nginx node has nginx with 4 workers and 8192 max clients, therefore the max number of connections it can accept is 32768.
For the benchmark, we are generating ~3k new connections per second, where each connection makes 1 http request and then holds the connection for the next 30 seconds. This results in a high established connection count on the first frontend, ssl_sess_id_router, ~25k per haproxy node (total ~77k connections on 3 haproxy nodes). The second frontend (nginx_ssl_fe) receives the same number of connections on the frontend. On the nginx node, we see active connections increase to ~32k. Our understanding is that haproxy should keep a 1:1 connection mapping for each new connection in frontend/backend. But there is a connection count mismatch between haproxy and nginx (total 77k connections in all 3 haproxy nodes for both frontends vs 32k connections in nginx made by nginx_backend). We are still not facing any major 5xx or connection errors. We are assuming that this is happening because haproxy is terminating old idle ssl connections to serve the new ones. We have the following questions: 1. How are the nginx_backend connections being terminated to serve the new connections? 2. Why is haproxy not terminating connections on the frontend to keep them at 32k for a 1:1 mapping? Thanks Ayush Goyal
Re: Segfault in haproxy v1.8 with Lua
But using an applet on a request will prevent the request from being sent to the backend servers; I still want backend servers to receive the request. On Thu, Apr 12, 2018 at 9:16 AM, Christopher Faulet wrote: > On 12/04/2018 at 14:51, Hessam wrote: > >> Thanks for your response Christopher. >> >> What I want is to overwrite each http response with another custom >> response. I'm not talking about manipulating headers only; I want to >> overwrite the whole response. >> >> > AKAIK, using Lua it is no possible. I guess you want to alter the > response. If so, the only way to achieve this is to write a filter to do > what you want. But the current API is far to be easy to use, especially for > the body rewriting. We planned to make things simpler but it takes time and > needs internal (and important) refactoring first. > > On the other hand, if you just need to generate a custom response on some > criteria of the request, you can use a Lua applet. > > -- > Christopher Faulet >
Re: DNS resolver and mixed case responses
At the moment, AWS's provided DNS servers and Route53 appear to always match the question case in the answer. (As do Google's DNS servers.) Bind seems to be the odd man out in not doing that by default. On Thu, Apr 12, 2018 at 7:08 AM, Jim Freeman wrote: > It will be important to know which behavior AWS's Route53/DNS servers use ? > > Using stock Debian/Stretch BIND9 (1:9.10.3.dfsg.P4-12.3+deb9u4), we see > haproxy downing backend servers with > "Server is going DOWN for maintenance (unspecified DNS error)." > https://github.com/haproxy/haproxy/search?q=unspecified+dns+error > > We're expecting/testing to see if bind9's "no-case-compress { any; }" > directive > addresses this, but many folks do not control their DNS services (and as > requisite > AWS/Route53 capabilities mature, neither will we). > > > On Tue, Apr 10, 2018 at 3:11 PM, Ben Draut wrote: > >> It's interesting that the default behavior of HAProxy resolvers can >> conflict with the default behavior of bind. (If you're unlucky with >> whatever bind has cached) >> >> By default, bind uses case-insensitive compression, which can cause it to >> use a different case in the ANSWER than in the QUESTION. (See >> 'no-case-compress': https://ftp.isc.org/isc/bind9/cur/9.9/doc/arm/Bv9ARM.ch06.html) We were impacted by this recently. >> >> Also interesting: https://indico.dns-oarc.net/event/20/session/2/contribution/12/material/slides/0.pdf >> >> >> On Mon, Apr 9, 2018 at 2:12 AM, Baptiste wrote: >> >>> So, it seems that responses that do not match the case should be >>> dropped: >>> https://twitter.com/PowerDNS_Bert/status/983254222694240257 >>> >>> Baptiste >> >> >
Re: Segfault in haproxy v1.8 with Lua
On 12/04/2018 at 14:51, Hessam wrote: Thanks for your response Christopher. What I want is to overwrite each http response with another custom response. I'm not talking about manipulating headers only; I want to overwrite the whole response. AFAIK, using Lua this is not possible. I guess you want to alter the response. If so, the only way to achieve this is to write a filter to do what you want. But the current API is far from easy to use, especially for the body rewriting. We planned to make things simpler but it takes time and needs internal (and important) refactoring first. On the other hand, if you just need to generate a custom response on some criteria of the request, you can use a Lua applet. -- Christopher Faulet
Re: DNS resolver and mixed case responses
It will be important to know which behavior AWS's Route53/DNS servers use ? Using stock Debian/Stretch BIND9 (1:9.10.3.dfsg.P4-12.3+deb9u4), we see haproxy downing backend servers with "Server is going DOWN for maintenance (unspecified DNS error)." https://github.com/haproxy/haproxy/search?q=unspecified+dns+error We're expecting/testing to see if bind9's "no-case-compress { any; }" directive addresses this, but many folks do not control their DNS services (and as requisite AWS/Route53 capabilities mature, neither will we). On Tue, Apr 10, 2018 at 3:11 PM, Ben Draut wrote: > It's interesting that the default behavior of HAProxy resolvers can > conflict with the default behavior of bind. (If you're unlucky with > whatever bind has cached) > > By default, bind uses case-insensitive compression, which can cause it to > use a different case in the ANSWER than in the QUESTION. (See > 'no-case-compress': https://ftp.isc.org/isc/bind9/cur/9.9/doc/arm/Bv9ARM.ch06.html) We were impacted by this recently. > > Also interesting: https://indico.dns-oarc.net/event/20/session/2/contribution/12/material/slides/0.pdf > > > On Mon, Apr 9, 2018 at 2:12 AM, Baptiste wrote: > >> So, it seems that responses that do not match the case should be >> dropped: >> https://twitter.com/PowerDNS_Bert/status/983254222694240257 >> >> Baptiste >> > >
Re: Segfault in haproxy v1.8 with Lua
Thanks for your response Christopher. What I want is to overwrite each http response with another custom response. I'm not talking about manipulating headers only; I want to overwrite the whole response. Thanks, Seyed > On Apr 12, 2018, at 8:44 AM, Christopher Faulet wrote: > >> On 11/04/2018 at 16:11, Hessam Mirsadeghi wrote: >> Hi Christopher, >> You're right; that segfault happens with the build at the faulty commit and >> not later versions such as v1.8.5. >> However, version v1.8.5 does segfault with the attached modified Lua script. >> As far as I can tell, the problem arises after any call to "txn.res:set()". >> In the attached Lua script, if you remove the call to either of >> "txn.res:set(txn.res:get())" or "txn.res:forward(txn.res:get_in_len())", the >> segfault will disappear. >> Also, when I only have a call to "txn.res:set(txn.res:get())" in the script, >> haproxy becomes unresponsive to all but the first request on each persistent >> connection. That is, something like "curl -sig localhost:80 localhost:80" >> will only get the response for the first request; the second one times out >> on the existing connection and succeeds only on a second connection >> established by curl. > > So, with this script, I have a segfault. But not only with HAProxy 1.8 and > higher. It also crashes on HAProxy 1.7. Actually, it will crash on all > versions. This is a Lua problem, not an HTTP one. > > IMHO, "Channel.set" or "Channel.get" must never be used on HTTP messages. > These functions totally hijack the HTTP parser. There is an attempt to > restore the HTTP parser state. But this is only partially done (the headers' > analysis is done, but not all of the ensuing analysis). So the result is > undefined. It will seem to work for simpler cases but will fail most of the time. > Same remarks are true for "Channel.getline" and "Channel.append". > > "Channel.forward" (and "Channel.send") must also never be used on HTTP > messages.
When called, the message analysis is in progress. The HTTP parser > expects to have all the headers at this stage. Mixing message analysis and > data forward is totally unexpected. > > I will let Thierry (the Lua maintainer) decide what to do. But, I guess he > will restrict usage of all these functions to TCP streams only. > > So now, maybe you can describe what you are trying to do. There is probably another > way to achieve it. I hope so, because it will never work this way. > > -- > Christopher Faulet
Re: Segfault in haproxy v1.8 with Lua
On 11/04/2018 at 16:11, Hessam Mirsadeghi wrote: Hi Christopher, You're right; that segfault happens with the build at the faulty commit and not later versions such as v1.8.5. However, version v1.8.5 does segfault with the attached modified Lua script. As far as I can tell, the problem arises after any call to "txn.res:set()". In the attached Lua script, if you remove the call to either of "txn.res:set(txn.res:get())" or "txn.res:forward(txn.res:get_in_len())", the segfault will disappear. Also, when I only have a call to "txn.res:set(txn.res:get())" in the script, haproxy becomes unresponsive to all but the first request on each persistent connection. That is, something like "curl -sig localhost:80 localhost:80" will only get the response for the first request; the second one times out on the existing connection and succeeds only on a second connection established by curl. So, with this script, I have a segfault. But not only with HAProxy 1.8 and higher. It also crashes on HAProxy 1.7. Actually, it will crash on all versions. This is a Lua problem, not an HTTP one. IMHO, "Channel.set" or "Channel.get" must never be used on HTTP messages. These functions totally hijack the HTTP parser. There is an attempt to restore the HTTP parser state. But this is only partially done (the headers' analysis is done, but not all of the ensuing analysis). So the result is undefined. It will seem to work for simpler cases but will fail most of the time. Same remarks are true for "Channel.getline" and "Channel.append". "Channel.forward" (and "Channel.send") must also never be used on HTTP messages. When called, the message analysis is in progress. The HTTP parser expects to have all the headers at this stage. Mixing message analysis and data forward is totally unexpected. I will let Thierry (the Lua maintainer) decide what to do. But I guess he will restrict usage of all these functions to TCP streams only. So now, maybe you can describe what you are trying to do.
There is probably another way to achieve it. I hope so, because it will never work this way. -- Christopher Faulet
Re: HAProxy 1.8.X crashing
Hi Willy, On Thu, Apr 12, 2018 at 08:53:51AM +0200, Willy Tarreau wrote: > Hi Olivier, > > On Wed, Apr 11, 2018 at 05:29:15PM +0200, Olivier Houchard wrote: > > From 7c9f06727cf60acf873353ac71283ff9c562aeee Mon Sep 17 00:00:00 2001 > > From: Olivier Houchard > > Date: Wed, 11 Apr 2018 17:23:17 +0200 > > Subject: [PATCH] BUG/MINOR: connection: Setup a mux when in proxy mode. > > > > We were allocating a new connection when in proxy mode, but did not provide > > it a mux, thus crashing later. > > > > This should be backported to 1.8. > > --- > > src/proto_http.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/src/proto_http.c b/src/proto_http.c > > index 80e001d69..817692c48 100644 > > --- a/src/proto_http.c > > +++ b/src/proto_http.c > > @@ -62,6 +62,7 @@ > > #include > > #include > > #include > > +#include > > #include > > #include > > #include > > @@ -3718,6 +3719,8 @@ int http_process_request(struct stream *s, struct > > channel *req, int an_bit) > > > > return 0; > > } > > + /* XXX: We probably need a better mux */ > > + conn_install_mux(conn, &mux_pt_ops, objt_cs(s->si[1].end)); > > > > path = http_get_path(txn); > > url2sa(req->buf->p + msg->sl.rq.u, > > While I can understand how missing this can have caused trouble, I'm not > sure it's the best solution, and I'm a bit worried about having various > code places choose a mux and install it. I suspect we're "simply" missing > a test in the backend code to cover the case where the connection already > exists but a mux was not yet installed (a situation that didn't exist > prior to muxes which is why we didn't notice this). I think that this > happens in connect_server() around line 1177 (return SF_ERR_INTERNAL) > when conn_xprt_ready() indicates the transport was not yet initialized. > It's likely that the error unrolling fails to consider the case where > the mux wasn't initialized, leading to the crash Praveen experienced. > I'm not sure about this. 
My point was to try to install the mux when we can choose one, the day we have more relevant muxes, if it ever happens. This certainly fixes Praveen's crash. I don't even know how we could reach line 1177. > If this is right, it would mean two fixes : > - one in the error unrolling path to ensure we don't destroy a non > allocated mux ; And what do we do then ? Without the mux we can't destroy the cs nor the connection. > - one above the return SF_ERR_INTERNAL above in connect_server() to > handle the case where the connection already exists but not the mux. I'm not sure how this can happen, and what we should do if that happens. Regards, Olivier