Re: Drain L4 host that fronts a L7 cluster
Hi Abhijeet,

On Mon, May 22, 2023 at 12:30:52PM -0700, Abhijeet Rastogi wrote:
> Hi Willy,
>
> Thank you for the response. It's great to know that this might be
> considered as a feature request in future versions, pending
> prioritization though.
>
> Could you comment on why this isn't already a feature yet?

Huh? It's not easy to comment on an existing state. It's the combination
of what people needed, what is already available, and what can cause
trouble.

> It is hard to believe that we're the first to come across this
> draining problem when using a chain of "L4 to L7 proxies". Are there
> any alternative approaches that we should look at, to get around the
> current limitations?

Others are likely doing this using a regular soft-stop. During the
soft-stop, there will be a combination of active and passive closures on
idle connections (you can even decide over how long a time you want to
close them all, so that you don't suddenly close too many of them and
cause an inrush of traffic on other nodes).

So the "normal" way to stop traffic to an L7 node in an L4+L7 setup is:

  1) disable sending new connections to the L7 node (set its weight to
     zero or make it fail a health check, for example)
  2) send the soft-stop signal to haproxy
  3) wait for the process to be gone after the last connection is closed

If you're dealing with a reload, the old process automatically passes
through the soft-stop phase and does this without having to fiddle with
a front L4 LB.

The case where users would like to close H2 connections is actually more
when they want some connections to re-establish on another node without
reloading the first one, typically when moving a small portion of the
traffic to a node running a new config just to test it. In such cases it
would indeed be desirable to make it possible to close connections
earlier.

But doing so is not without consequences. For example, if you close
immediately after finishing the last stream, you could make your local
TCP stack send RSTs, because the client keeps sending WINDOW_UPDATEs to
acknowledge receipt of the data, and TCP doesn't know that
WINDOW_UPDATEs are just ACKs that convey no useful data and could be
drained like empty ACKs.

I do have some plans to work around this protocol deficiency, already
noted in an old issue (I think it's #5 but I'm not sure), which would
consist in sending a frame that the peer has to acknowledge after the
GOAWAY (or in any case before closing), so that we know when the peer
received the last frame. This could be a PING or a SETTINGS frame (more
interesting, as we could advertise max_stream=0 there). But it means
adding a new state to the existing state machine and validating that we
don't break existing implementations, for example.

Nothing is impossible, and I would really like to have a way to
gracefully close connections, at least because there are situations
where it's desirable. But it must not be seen as a one-size-fits-all
solution. In your case, if it's only for taking a node off an L4 LB
farm, I can't see any limitation to the existing solution.

Regards,
Willy
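A minimal sketch of the sequence Willy describes, assuming the front L4
LB is itself an HAProxy with its runtime socket at
/var/run/haproxy-l4.sock, a backend/server named l7_farm/l7a for the L7
node being drained, and the L7 HAProxy's pid in /var/run/haproxy-l7.pid
(all of these names and paths are illustrative only):

   # 1) stop sending new connections to the L7 node from the L4 tier
   echo "set weight l7_farm/l7a 0" | socat stdio /var/run/haproxy-l4.sock

   # 2) trigger the soft-stop on the L7 haproxy; idle connections are
   #    then closed, optionally spread over time with close-spread-time
   PID=$(cat /var/run/haproxy-l7.pid)
   kill -USR1 "$PID"

   # 3) wait for the process to exit once the last connection is gone
   while kill -0 "$PID" 2>/dev/null; do sleep 1; done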
Re: Drain L4 host that fronts a L7 cluster
Hi Willy,

Thank you for the response. It's great to know that this might be
considered as a feature request in future versions, pending
prioritization though.

Could you comment on why this isn't already a feature yet? It is hard
to believe that we're the first to come across this draining problem
when using a chain of "L4 to L7 proxies". Are there any alternative
approaches that we should look at, to get around the current
limitations?

Thanks,
Abhijeet

On Sun, May 7, 2023 at 10:25 PM Willy Tarreau wrote:
>
> On Fri, May 05, 2023 at 04:18:25PM -0700, Abhijeet Rastogi wrote:
> > Thanks for the response Tristan.
> >
> > For the future reader of this thread, a feature request was created
> > for this. https://github.com/haproxy/haproxy/issues/2146
>
> I've looked again at the code and am seeing that in modern versions
> (2.8) we can provoke a soft-stop of a connection by just calling
> sc_must_kill_conn() which will set SE_FL_KILL_CONN on the stream
> endpoint. This is only performed by the "reject" action in the
> http_action_reject() function.
>
> We could imagine having an action that sets this; it would also need to
> support the after-response rules so that it can easily be centralized.
> We'll need to test it to make sure it doesn't kill connections a bit
> too fast.
>
> Another approach would be to reuse "connection: close" for this, which
> would make sure that the same method already used on h1 also works on
> newer protocols. I'm just thinking that the risk to introduce side
> effects by accident is probably not worth it, compared to the low cost
> of having a specific action to say "notify the peer that we're going
> to close ASAP". There are at least 2/3 levels of active close though:
>
> - notify the client we'd like to stop (GOAWAY(2^31-1)): the client
>   will close by itself when it wants, but knows we'd prefer to use
>   a new connection. That's the best one for reloads or migrations
>   since there's zero impact.
>
> - notify the client that the last received stream is the last one
>   and that other ones will be discarded (GOAWAY(last_id)): this
>   forces the client to resend new requests that might be in flight.
>
> - close immediately (typically upon an attack or so). That's normally
>   used by rejects, though it will still cause active requests to be
>   completed, so in practice it might be the same as the previous one.
>
> This will require more experiment, but maybe we could do something
> like this for 2.8 (but as usual the devil is in the details).
>
> Willy

--
Cheers,
Abhijeet (https://abhi.host)
Re: Drain L4 host that fronts a L7 cluster
On Fri, May 05, 2023 at 04:18:25PM -0700, Abhijeet Rastogi wrote:
> Thanks for the response Tristan.
>
> For the future reader of this thread, a feature request was created
> for this. https://github.com/haproxy/haproxy/issues/2146

I've looked again at the code and am seeing that in modern versions
(2.8) we can provoke a soft-stop of a connection by just calling
sc_must_kill_conn(), which will set SE_FL_KILL_CONN on the stream
endpoint. This is only performed by the "reject" action in the
http_action_reject() function.

We could imagine having an action that sets this; it would also need to
support the after-response rules so that it can easily be centralized.
We'll need to test it to make sure it doesn't kill connections a bit
too fast.

Another approach would be to reuse "connection: close" for this, which
would make sure that the same method already used on h1 also works on
newer protocols. I'm just thinking that the risk of introducing side
effects by accident is probably not worth it, compared to the low cost
of having a specific action to say "notify the peer that we're going
to close ASAP". There are at least 2/3 levels of active close though:

- notify the client we'd like to stop (GOAWAY(2^31-1)): the client
  will close by itself when it wants, but knows we'd prefer it to use
  a new connection. That's the best one for reloads or migrations
  since there's zero impact.

- notify the client that the last received stream is the last one
  and that other ones will be discarded (GOAWAY(last_id)): this
  forces the client to resend new requests that might be in flight.

- close immediately (typically upon an attack or so). That's normally
  used by rejects, though it will still let active requests be
  completed, so in practice it might be the same as the previous one.

This will require more experimentation, but maybe we could do something
like this for 2.8 (but as usual the devil is in the details).

Willy
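For reference, the only L7 rule that takes that kill-connection path
today is the "reject" action mentioned above. Reusing the txn.src_l4_ip
variable and the draining-clients stick table from Tristan's earlier
sketch (both assumptions from elsewhere in this thread), it would look
roughly like this; note that this is the brutal third level in the list
above, not a graceful drain:

   frontend foo
      tcp-request connection set-var(txn.src_l4_ip) src

      # hard-close connections coming from draining L4 nodes: the
      # matching request gets no response and the connection is killed
      http-request reject if { var(txn.src_l4_ip),in_table(draining-clients) }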
Re: Drain L4 host that fronts a L7 cluster
Isn't it a similar request to https://github.com/haproxy/haproxy/issues/969,
as I mentioned in the issue https://github.com/haproxy/haproxy/issues/2149?

On 06.05.23 01:18, Abhijeet Rastogi wrote:
> Thanks for the response Tristan.
>
> For the future reader of this thread, a feature request was created
> for this. https://github.com/haproxy/haproxy/issues/2146
>
> On Fri, May 5, 2023 at 4:09 PM Tristan wrote:
> >
> > > however, our reason to migrate to HAproxy is adding gRPC
> > > compliance to the stack, so H2 support is a must. Thanks for the
> > > workarounds, indeed interesting, I'll check them out.
> >
> > From a cursory look at the gRPC spec it seems like you would indeed
> > really need the GOAWAY to get anywhere
> >
> > >> trigger the GOAWAY H2 frames (which isn't possible at
> > >> the moment, as far as I can tell)
> > >
> > > *Is this a valid feature request for HAProxy?*
> > > Maybe, we can provide "setting CLO mode" via the "http-request" directive?
> >
> > I can't make that call, but at least it sounds quite useful to me indeed.
> >
> > And in particular, being able to set CLO mode is likely a little bit
> > nicer in the long run than something like a hypothetical 'http-request
> > send-h2-goaway', since CLO mode can account for future protocols or spec
> > changes transparently as those eventually get added to HAProxy.
> >
> > Interesting problem either way!
> >
> > Cheers,
> > Tristan
>
> --
> Cheers,
> Abhijeet (https://abhi.host)
Re: Drain L4 host that fronts a L7 cluster
Thanks for the response Tristan.

For the future reader of this thread, a feature request was created
for this. https://github.com/haproxy/haproxy/issues/2146

On Fri, May 5, 2023 at 4:09 PM Tristan wrote:
>
> > however, our reason to migrate to HAproxy is adding gRPC
> > compliance to the stack, so H2 support is a must. Thanks for the
> > workarounds, indeed interesting, I'll check them out.
>
> From a cursory look at the gRPC spec it seems like you would indeed
> really need the GOAWAY to get anywhere
>
> >> trigger the GOAWAY H2 frames (which isn't possible at
> >> the moment, as far as I can tell)
> >
> > *Is this a valid feature request for HAProxy?*
> > Maybe, we can provide "setting CLO mode" via the "http-request" directive?
>
> I can't make that call, but at least it sounds quite useful to me indeed.
>
> And in particular, being able to set CLO mode is likely a little bit
> nicer in the long run than something like a hypothetical 'http-request
> send-h2-goaway', since CLO mode can account for future protocols or spec
> changes transparently as those eventually get added to HAProxy.
>
> Interesting problem either way!
>
> Cheers,
> Tristan

--
Cheers,
Abhijeet (https://abhi.host)
Re: Drain L4 host that fronts a L7 cluster
> however, our reason to migrate to HAproxy is adding gRPC
> compliance to the stack, so H2 support is a must. Thanks for the
> workarounds, indeed interesting, I'll check them out.

From a cursory look at the gRPC spec it seems like you would indeed
really need the GOAWAY to get anywhere

> > trigger the GOAWAY H2 frames (which isn't possible at
> > the moment, as far as I can tell)
>
> *Is this a valid feature request for HAProxy?*
> Maybe, we can provide "setting CLO mode" via the "http-request" directive?

I can't make that call, but at least it sounds quite useful to me indeed.

And in particular, being able to set CLO mode is likely a little bit
nicer in the long run than something like a hypothetical 'http-request
send-h2-goaway', since CLO mode can account for future protocols or spec
changes transparently as those eventually get added to HAProxy.

Interesting problem either way!

Cheers,
Tristan
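Purely for illustration, the two shapes being weighed might look
something like this in a config; neither keyword exists in HAProxy as of
this thread, and both action names (as well as the txn.src_l4_ip
variable and draining-clients table reused from earlier) are
hypothetical:

   # hypothetical: protocol-agnostic "finish this transaction, then close
   # the connection gracefully" (CLO mode), i.e. Connection: close on H1,
   # GOAWAY on H2, and whatever future protocols define
   http-after-response set-conn-close if { var(txn.src_l4_ip),in_table(draining-clients) }

   # hypothetical: H2-specific alternative, would not cover H1 or H3
   http-request send-h2-goaway if { var(txn.src_l4_ip),in_table(draining-clients) }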
Re: Drain L4 host that fronts a L7 cluster
Hi Tristan,

Thanks for the *excellent* reply. Indeed, the map-based solution can
work for H1; however, our reason to migrate to HAproxy is adding gRPC
compliance to the stack, so H2 support is a must. Thanks for the
workarounds, indeed interesting, I'll check them out.

> trigger the GOAWAY H2 frames (which isn't possible at
> the moment, as far as I can tell)

*Is this a valid feature request for HAProxy?*
Maybe, we can provide "setting CLO mode" via the "http-request" directive?

H1: https://sourcegraph.com/github.com/haproxy/haproxy/-/blob/src/mux_h1.c?L1185
H2: https://sourcegraph.com/github.com/haproxy/haproxy/-/blob/src/mux_h2.c?L4041
QUIC: https://sourcegraph.com/github.com/haproxy/haproxy/-/blob/src/mux_quic.c?L2181

It appears that there isn't a lot going on when we set a specific
connection for CLO mode. Although, I definitely lack a holistic view of
HAproxy around HTX.

Excited to hear responses about this, thanks!

Cheers,
Abhijeet

On Fri, May 5, 2023 at 7:06 AM Tristan wrote:
>
> Hi Abhijeet,
>
> > Problem statement is, how do you drain a node [...] L7 constructs like
> > "Connection: close" or "GOAWAY h2 frames"
> [...]
> > * For any map (L4 client IP lookup) based solution, I was unable to
> > find any http-request operation that sets "drain mode".
>
> Indeed the managed drain mode is an all-or-nothing at the moment. The
> "only" missing feature to fully replicate it with http-request rules
> would be a way to trigger the GOAWAY H2 frames (which isn't possible at
> the moment, as far as I can tell).
>
> But in the meantime, the following might do the job:
>
> frontend foo
>    # tcp-request connection sees the "real" src in accept-proxy
>    # listeners, according to 7.3.3#src
>    tcp-request connection set-var(txn.src_l4_ip) src
>
>    # assuming a stick-table here, but should work just fine with
>    # maps/lists too
>    http-after-response set-header Connection close if { var(txn.src_l4_ip),in_table(draining-clients) }
>
> At that point we have the HTTP/1.1 part handled but that was easy
> enough... Now for H2 there's no good way that I can see.
>
> But I can think of a few hacky solutions that might work (ie would need
> to test them in practice):
>
> 1. Returning an HTTP 421, which H2 clients should interpret as being
>    instructed to use a different connection to issue the request, which
>    should also be automatically retried in browsers
> 2. Adding an Alt-Svc to those responses advertising http/1.1 explicitly,
>    which clients should pick up and prefer (then they either land on
>    another LB while establishing the new connection, or get
>    Connection-close'd)
> 3. Returning an HTTP 426, instructing the client to "upgrade" to
>    http/1.1 (not sure how that interacts with ALPN but the spec seems to
>    suggest that the client would give the upgrade request priority),
>    which ends up with the same outcome as the Alt-Svc solution if it works
>
> Of course it'd be many times better to be able to just trigger a GOAWAY
> instead, but it was fun to think about more "creative" options, so here
> they are.
>
> Tristan

--
Cheers,
Abhijeet (https://abhi.host)
Re: Drain L4 host that fronts a L7 cluster
Hi Abhijeet,

> Problem statement is, how do you drain a node [...] L7 constructs like
> "Connection: close" or "GOAWAY h2 frames"
[...]
> * For any map (L4 client IP lookup) based solution, I was unable to
> find any http-request operation that sets "drain mode".

Indeed the managed drain mode is an all-or-nothing at the moment. The
"only" missing feature to fully replicate it with http-request rules
would be a way to trigger the GOAWAY H2 frames (which isn't possible at
the moment, as far as I can tell).

But in the meantime, the following might do the job:

frontend foo
   # tcp-request connection sees the "real" src in accept-proxy
   # listeners, according to 7.3.3#src
   tcp-request connection set-var(txn.src_l4_ip) src

   # assuming a stick-table here, but should work just fine with
   # maps/lists too
   http-after-response set-header Connection close if { var(txn.src_l4_ip),in_table(draining-clients) }

At that point we have the HTTP/1.1 part handled but that was easy
enough... Now for H2 there's no good way that I can see.

But I can think of a few hacky solutions that might work (ie would need
to test them in practice):

1. Returning an HTTP 421, which H2 clients should interpret as being
   instructed to use a different connection to issue the request, which
   should also be automatically retried in browsers
2. Adding an Alt-Svc to those responses advertising http/1.1 explicitly,
   which clients should pick up and prefer (then they either land on
   another LB while establishing the new connection, or get
   Connection-close'd)
3. Returning an HTTP 426, instructing the client to "upgrade" to
   http/1.1 (not sure how that interacts with ALPN but the spec seems to
   suggest that the client would give the upgrade request priority),
   which ends up with the same outcome as the Alt-Svc solution if it works

Of course it'd be many times better to be able to just trigger a GOAWAY
instead, but it was fun to think about more "creative" options, so here
they are.

Tristan
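As a rough, untested sketch of how the first of those hacky options
could sit on top of the config above, reusing the same txn.src_l4_ip
variable and an assumed stick table named draining-clients, the
fc_http_major fetch could keep the Connection-close trick for H1 clients
while answering 421 to H2 clients:

   frontend foo
      tcp-request connection set-var(txn.src_l4_ip) src

      # H1 clients from a draining L4 node: close after this response
      http-after-response set-header Connection close if { var(txn.src_l4_ip),in_table(draining-clients) } { fc_http_major eq 1 }

      # H2 clients from a draining L4 node: 421 Misdirected Request, so
      # the client retries over a new connection, which the L4 tier will
      # send to another node
      http-request return status 421 if { var(txn.src_l4_ip),in_table(draining-clients) } { fc_http_major eq 2 }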
Drain L4 host that fronts a L7 cluster
Hi HAproxy community,

We have the following production setup, clusters of:

IPVS (dsr) -> L4 HAproxy (tcp termination) -> L7 Proxy (via pp, L7 done
here. Not HAproxy today, but it soon will be)

The problem statement is: how do you drain a node in the L4 tier
gracefully, given that it has no concept of L7 constructs like
"Connection: close" or "GOAWAY h2 frames" and can't send those on the
L7 proxy's behalf?

In our current L7 proxy, we have custom logic written to inspect the
client IP of the request (= the L4 node) and apply the drain behavior
only to those requests matching the client IP of the L4 node that's
going under maintenance. However, as we're migrating to L7 HAproxy, I am
unable to find a way to initiate "soft-stop" behavior only for requests
coming from a specific upstream L4 node.

Note, we already have measures in place to stop sending new requests to
an L4 node as soon as we initiate maintenance mode; this is about
gracefully draining the existing connections on that L4 node, which are
spread across the whole L7 cluster.

It'll be great if I can get some pointers here on what to look at to
solve this problem.

* I already checked the HAproxy code around soft-stop and
  close-spread-time etc; it doesn't have any support to only drain
  specific clients.
* In Lua, I didn't find any methods around initiating drain for
  specific requests.
* For any map (L4 client IP lookup) based solution, I was unable to
  find any http-request operation that sets "drain mode".

Cheers,
Abhijeet (https://abhi.host)
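For the map/stick-table lookup mentioned in the last bullet, the table
itself and the per-L4-node "maintenance" toggle could look roughly like
this once the L7 tier is HAProxy; the table name matches the one Tristan
suggests earlier in the thread, the IP and socket path are illustrative,
and this only covers the lookup side, not the GOAWAY problem discussed
above:

   backend draining-clients
      # dummy backend holding one entry per L4 node currently in maintenance
      stick-table type ip size 100 expire 24h store gpc0

Entries can then be flagged and unflagged from the runtime API while
maintenance is in progress, for example:

   echo "set table draining-clients key 192.0.2.10 data.gpc0 1" | socat stdio /var/run/haproxy.sock
   echo "clear table draining-clients key 192.0.2.10" | socat stdio /var/run/haproxy.sock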