Re: Drain L4 host that fronts a L7 cluster

2023-05-23 Thread Willy Tarreau
Hi Abhijeet,

On Mon, May 22, 2023 at 12:30:52PM -0700, Abhijeet Rastogi wrote:
> Hi Willy,
> 
> Thank you for the response. It's great to know that this might be
> considered as a feature request in future versions, pending
> prioritization though.
> 
> Could you comment on why this isn't already a feature yet?

Huh? It's not easy to comment on an existing state: it's the combination
of what people needed, what was already available, and what could cause
trouble.

> It is hard
> to believe that we're the first to come across this draining problem
> when using a chain of "L4 to L7 proxies". Are there any alternative
> approaches that we should look at, to get around the current
> limitations?

Others are likely doing this using a regular soft-stop. During the
soft-stop, there will be a combination of active and passive closures
of idle connections (you can even decide over how long a period you
want to close them all, so that you don't suddenly close too many of
them at once and cause an inrush of traffic on other nodes). So the
"normal" way to stop traffic to an L7 node in an L4+L7 setup is:

  1) disable sending new connections to the L7 node (set its weight
     to zero or make it fail a health check for example)

  2) send the soft-stop signal to haproxy

  3) wait for the process to be gone after the last connection is
     closed

If you're dealing with a reload, the old process automatically passes
through the soft-stop phase and does this without having to fiddle with
a front L4 LB.
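
For illustration, a minimal sketch of that procedure when the front LB
is itself an HAProxy instance (backend/server names, the socket path and
the timings are just examples):

  # 1) on the front LB, stop sending new connections to the L7 node
  echo "set server l7_nodes/node3 state drain" | socat stdio /var/run/haproxy.sock

  # 2) on the node being drained, send the soft-stop signal; optionally
  #    spread the closing of idle connections by setting, in its global
  #    section:
  #        close-spread-time 30s
  kill -USR1 "$(cat /var/run/haproxy.pid)"

  # 3) wait for the old process to exit once the last connection is closed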

The case where users actually want to close H2 connections is more about
making some connections re-establish on another node without reloading
the first one, typically when moving a small portion of the traffic to a
node running a new config just to test it. In such cases it would indeed
be desirable to make it possible to close connections earlier. But doing
so is not without consequences. For example, if you close immediately
after finishing the last stream, you could make your local TCP stack
send RSTs, because the client will still send WINDOW_UPDATE frames to
acknowledge receipt of the data, and TCP doesn't know that WINDOW_UPDATEs
are just ACKs that do not convey useful data which could be drained like
empty ACKs. I do have some plans to work around this protocol deficiency,
already noted in an old issue (I think it's #5 but not sure), which would
consist of sending a frame that must be acknowledged after the GOAWAY
(or in any case before closing), so that we know when the peer has
received the last frame. This could be a PING or a SETTINGS frame (more
interesting as we could advertise max_stream=0 there). But it means
adding a new state to the existing state machine and validating that we
don't break existing implementations, for example.

Nothing is impossible and I would really like to have a way to gracefully
close connections, at least because there are situations where it's
desirable. But it must not be seen as a one-size-fits-all solution. In
your case, if it's only for taking a node out of an L4 LB farm, I can't
see any limitation to the existing solution.

Regards,
Willy



Re: Drain L4 host that fronts a L7 cluster

2023-05-22 Thread Abhijeet Rastogi
Hi Willy,

Thank you for the response. It's great to know that this might be
considered as a feature request in future versions, pending
prioritization though.

Could you comment on why this isn't already a feature yet? It is hard
to believe that we're the first to come across this draining problem
when using a chain of "L4 to L7 proxies". Are there any alternative
approaches that we should look at, to get around the current
limitations?

Thanks,
Abhijeet

On Sun, May 7, 2023 at 10:25 PM Willy Tarreau  wrote:
>
> On Fri, May 05, 2023 at 04:18:25PM -0700, Abhijeet Rastogi wrote:
> > Thanks for the response Tristan.
> >
> > For the future reader of this thread, a feature request was created
> > for this. https://github.com/haproxy/haproxy/issues/2146
>
> I've looked again at the code and am seeing that in modern versions
> (2.8) we can provoke a soft-stop of a connection by just calling
> sc_must_kill_conn() which will set SE_FL_KILL_CONN on the stream
> endpoint. This is only performed by the "reject" action in the
> http_action_reject() function.
>
> We could imagine having an action that sets this; it would also need to
> support the after-response rules so that it can easily be centralized.
> We'll need to test it to make sure it doesn't kill connections a bit
> too fast.
>
> Another approach would be to reuse "connection: close" for this, which
> would make sure that the same method already used on h1 also works on
> newer protocols. I'm just thinking that the risk to introduce side
> effects by accident is probably not worth it, compared to the low cost
> of having a specific action to say "notify the peer that we're going
> to close ASAP". There are at least two or three levels of active close though:
>   - notify the client we'd like to stop (GOAWAY(2^31-1)): the client
> will close by itself when it wants, but knows we'd prefer to use
> a new connection. That's the best one for reloads or migrations
> since there's zero impact.
>
>   - notify the client that the last received stream is the last one
> and that other ones will be discarded (GOAWAY(last_id)): this
> forces the client to resend new requests that might be in flight.
>
>   - close immediately (typically upon an attack or so). That's normally
> used by rejects, though it will still cause active requests to be
> completed, so in practice it might be the same as the previous one.
>
> This will require more experimentation, but maybe we could do something
> like this for 2.8 (but as usual the devil is in the details).
>
> Willy



-- 
Cheers,
Abhijeet (https://abhi.host)



Re: Drain L4 host that fronts a L7 cluster

2023-05-07 Thread Willy Tarreau
On Fri, May 05, 2023 at 04:18:25PM -0700, Abhijeet Rastogi wrote:
> Thanks for the response Tristan.
> 
> For the future reader of this thread, a feature request was created
> for this. https://github.com/haproxy/haproxy/issues/2146

I've looked again at the code and am seeing that in modern versions
(2.8) we can provoke a soft-stop of a connection by just calling
sc_must_kill_conn() which will set SE_FL_KILL_CONN on the stream
endpoint. This is only performed by the "reject" action in the
http_action_reject() function.

We could imagine having an action that sets this; it would also need to
support the after-response rules so that it can easily be centralized.
We'll need to test it to make sure it doesn't kill connections a bit
too fast.
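
For reference, the only configuration directive that triggers this today
is the "reject" action mentioned above; a minimal sketch (the ACL file
is just an example):

  # today, the only way to set SE_FL_KILL_CONN from the configuration is
  # the "reject" action, which kills the connection immediately rather
  # than closing it gracefully
  http-request reject if { src -f /etc/haproxy/abusers.lst }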

Another approach would be to reuse "connection: close" for this, which
would make sure that the same method already used on h1 also works on
newer protocols. I'm just thinking that the risk to introduce side
effects by accident is probably not worth it, compared to the low cost
of having a specific action to say "notify the peer that we're going
to close ASAP". There are at least two or three levels of active close though:
  - notify the client we'd like to stop (GOAWAY(2^31-1)): the client
will close by itself when it wants, but knows we'd prefer to use 
a new connection. That's the best one for reloads or migrations
since there's zero impact.

  - notify the client that the last received stream is the last one
and that other ones will be discarded (GOAWAY(last_id)): this
forces the client to resend new requests that might be in flight.

  - close immediately (typically upon an attack or so). That's normally
used by rejects, though it will still cause active requests to be
completed, so in practice it might be the same as the previous one.

This will require more experimentation, but maybe we could do something
like this for 2.8 (but as usual the devil is in the details).

Willy



Re: Drain L4 host that fronts a L7 cluster

2023-05-05 Thread Aleksandar Lazic
Isn't it a similar request to
https://github.com/haproxy/haproxy/issues/969, as I mentioned in
issue https://github.com/haproxy/haproxy/issues/2149?


On 06.05.23 01:18, Abhijeet Rastogi wrote:

Thanks for the response Tristan.

For the future reader of this thread, a feature request was created
for this. https://github.com/haproxy/haproxy/issues/2146


On Fri, May 5, 2023 at 4:09 PM Tristan  wrote:

however, our reason to migrate to HAproxy is adding gRPC
compliance to the stack, so H2 support is a must. Thanks for the
workarounds, indeed interesting, I'll check them out.

  From a cursory look at the gRPC spec it seems like you would indeed
really need the GOAWAY to get anywhere


trigger the GOAWAY H2 frames (which isn't possible at
the moment, as far as I can tell)

*Is this a valid feature request for HAProxy?*
Maybe, we can provide "setting CLO mode" via the "http-request" directive?

I can't make that call, but at least it sounds quite useful to me indeed.

And in particular, being able to set CLO mode is likely a little bit
nicer in the long run than something like a hypothetical 'http-request
send-h2-goaway', since CLO mode can account for future protocols or spec
changes transparently as those eventually get added to HAProxy.

Interesting problem either way!

Cheers,
Tristan



--
Cheers,
Abhijeet (https://abhi.host)





Re: Drain L4 host that fronts a L7 cluster

2023-05-05 Thread Abhijeet Rastogi
Thanks for the response Tristan.

For the future reader of this thread, a feature request was created
for this. https://github.com/haproxy/haproxy/issues/2146


On Fri, May 5, 2023 at 4:09 PM Tristan  wrote:
>
> > however, our reason to migrate to HAproxy is adding gRPC
> > compliance to the stack, so H2 support is a must. Thanks for the
> > workarounds, indeed interesting, I'll check them out.
>
>  From a cursory look at the gRPC spec it seems like you would indeed
> really need the GOAWAY to get anywhere
>
> > trigger the GOAWAY H2 frames (which isn't possible at
> > the moment, as far as I can tell)
> >
> > *Is this a valid feature request for HAProxy?*
> > Maybe, we can provide "setting CLO mode" via the "http-request" directive?
>
> I can't make that call, but at least it sounds quite useful to me indeed.
>
> And in particular, being able to set CLO mode is likely a little bit
> nicer in the long run than something like a hypothetical 'http-request
> send-h2-goaway', since CLO mode can account for future protocols or spec
> changes transparently as those eventually get added to HAProxy.
>
> Interesting problem either way!
>
> Cheers,
> Tristan
>


--
Cheers,
Abhijeet (https://abhi.host)



Re: Drain L4 host that fronts a L7 cluster

2023-05-05 Thread Tristan

however, our reason to migrate to HAproxy is adding gRPC
compliance to the stack, so H2 support is a must. Thanks for the
workarounds, indeed interesting, I'll check them out.


From a cursory look at the gRPC spec it seems like you would indeed 
really need the GOAWAY to get anywhere



trigger the GOAWAY H2 frames (which isn't possible at
the moment, as far as I can tell)

*Is this a valid feature request for HAProxy?*
Maybe, we can provide "setting CLO mode" via the "http-request" directive?


I can't make that call, but at least it sounds quite useful to me indeed.

And in particular, being able to set CLO mode is likely a little bit 
nicer in the long run than something like a hypothetical 'http-request 
send-h2-goaway', since CLO mode can account for future protocols or spec 
changes transparently as those eventually get added to HAProxy.


Interesting problem either way!

Cheers,
Tristan



Re: Drain L4 host that fronts a L7 cluster

2023-05-05 Thread Abhijeet Rastogi
Hi Tristan,

Thanks for the *excellent* reply. Indeed, the map-based solution can
work for H1; however, our reason to migrate to HAproxy is adding gRPC
compliance to the stack, so H2 support is a must. Thanks for the
workarounds, indeed interesting, I'll check them out.

> trigger the GOAWAY H2 frames (which isn't possible at
> the moment, as far as I can tell)

*Is this a valid feature request for HAProxy?*
Maybe, we can provide "setting CLO mode" via the "http-request" directive?

H1: https://sourcegraph.com/github.com/haproxy/haproxy/-/blob/src/mux_h1.c?L1185
H2: https://sourcegraph.com/github.com/haproxy/haproxy/-/blob/src/mux_h2.c?L4041
QUIC: https://sourcegraph.com/github.com/haproxy/haproxy/-/blob/src/mux_quic.c?L2181

It appears that there isn't a lot going on when we set a specific
connection to CLO mode. That said, I definitely lack a holistic view
of HAproxy internals around HTX.
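
To make the proposal concrete, here is a purely hypothetical example of
what such a directive could look like (this syntax does not exist in
HAProxy today; the table name reuses the one from Tristan's example):

  # hypothetical action, for illustration only -- not valid HAProxy config:
  http-request set-conn-mode clo if { src,in_table(draining-clients) }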

Excited to hear responses about this, thanks!

Cheers,
Abhijeet


On Fri, May 5, 2023 at 7:06 AM Tristan  wrote:
>
> Hi Abhijeet,
>
> > Problem statement is, how do you drain a node [...] L7 constructs like
> > "Connection: close" or "GOAWAY h2 frames"
> > [...]
> > * For any map (L4 client IP lookup) based solution, I was unable to
> > find any http-request operation that sets "drain mode".
>
> Indeed the managed drain mode is an all-or-nothing at the moment. The
> "only" missing feature to fully replicate it with http-request rules
> would be a way to trigger the GOAWAY H2 frames (which isn't possible at
> the moment, as far as I can tell).
>
> But in the meantime, the following might do the job:
>
> frontend foo
>    # tcp-request connection sees the "real" src in accept-proxy
>    # listeners, according to 7.3.3#src
>    tcp-request connection set-var(txn.src_l4_ip) src
>
>    # assuming a stick-table here, but should work just fine with
>    # maps/lists too
>    http-after-response set-header Connection close if { var(txn.src_l4_ip),in_table(draining-clients) }
>
> At that point we have the HTTP/1.1 part handled but that was easy
> enough... Now for H2 there's no good way that I can see.
>
> But I can think of a few hacky solutions that might work (ie would need
> to test them in practice):
> 1. Returning an HTTP 421, which H2 clients should interpret as being
> instructed to use a different connection to issue the request, which
> should also be automatically retried in browsers
> 2. Adding an Alt-Svc to those responses advertising http/1.1 explicitly,
> which clients should pick up and prefer (then they either land on
> another LB while establishing the new connection, or get Connection-close'd)
> 3. Returning an HTTP 426, instructing the client to "upgrade" to
> http/1.1 (not sure how that interacts with ALPN but the spec seems to
> suggest that the client would give the upgrade request priority), which
> ends up with the same outcome as the Alt-Svc solution if it works
>
> Of course it'd be many times better to be able to just trigger a GOAWAY
> instead, but it was fun to think about more "creative" options, so here
> they are.
>
> Tristan
>


--
Cheers,
Abhijeet (https://abhi.host)



Re: Drain L4 host that fronts a L7 cluster

2023-05-05 Thread Tristan

Hi Abhijeet,


Problem statement is, how do you drain a node [...] L7 constructs like
"Connection: close" or "GOAWAY h2 frames"
[...]
* For any map (L4 client IP lookup) based solution, I was unable to
find any http-request operation that sets "drain mode".


Indeed the managed drain mode is an all-or-nothing at the moment. The 
"only" missing feature to fully replicate it with http-request rules 
would be a way to trigger the GOAWAY H2 frames (which isn't possible at 
the moment, as far as I can tell).


But in the meantime, the following might do the job:

frontend foo
  # tcp-request connection sees the "real" src in accept-proxy
  # listeners, according to 7.3.3#src
  tcp-request connection set-var(txn.src_l4_ip) src

  # assuming a stick-table here, but should work just fine with
  # maps/lists too
  http-after-response set-header Connection close if { var(txn.src_l4_ip),in_table(draining-clients) }
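
For completeness, a sketch of what the referenced table and one way to
populate it could look like (the backend name matches the table used
above; the size, expiry and socket path are just examples):

  # dummy backend holding the table of draining L4 node addresses
  backend draining-clients
    stick-table type ip size 1k expire 24h

  # mark an L4 node as draining via the runtime API
  echo "set table draining-clients key 192.0.2.10" | socat stdio /var/run/haproxy.sock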


At that point we have the HTTP/1.1 part handled but that was easy 
enough... Now for H2 there's no good way that I can see.


But I can think of a few hacky solutions that might work (ie would need 
to test them in practice):
1. Returning an HTTP 421, which H2 clients should interpret as being 
instructed to use a different connection to issue the request, which 
should also be automatically retried in browsers
2. Adding an Alt-Svc to those responses advertising http/1.1 explicitly, 
which clients should pick up and prefer (then they either land on 
another LB while establishing the new connection, or get Connection-close'd)
3. Returning an HTTP 426, instructing the client to "upgrade" to 
http/1.1 (not sure how that interacts with ALPN but the spec seems to 
suggest that the client would give the upgrade request priority), which 
ends up with the same outcome as the Alt-Svc solution if it works
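
For example, option 1 could be expressed with the return action, reusing
the variable and table from the snippet above (untested sketch):

  # H2 clients treat 421 as "wrong connection" and should retry the
  # request on a fresh connection
  http-request return status 421 if { var(txn.src_l4_ip),in_table(draining-clients) }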


Of course it'd be many times better to be able to just trigger a GOAWAY 
instead, but it was fun to think about more "creative" options, so here 
they are.


Tristan



Drain L4 host that fronts a L7 cluster

2023-05-04 Thread Abhijeet Rastogi
Hi HAproxy community,

We have the following production setup, composed of clusters of:

IPVS (dsr)  ->
L4 HAproxy (tcp termination) ->
L7 Proxy (via proxy protocol, L7 done here. Not HAproxy today, but it soon will be)
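
For context, a minimal sketch of what the L4 HAproxy tier looks like
(addresses, names and ports are made up):

  # L4 tier: TCP termination, fan-out to the L7 cluster via proxy protocol
  frontend fe_l4
    mode tcp
    bind :443
    default_backend be_l7

  backend be_l7
    mode tcp
    balance roundrobin
    server l7a 10.0.0.11:443 send-proxy-v2
    server l7b 10.0.0.12:443 send-proxy-v2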

The problem statement is: how do you drain a node in the L4 tier
gracefully, given that it has no concept of L7 constructs like
"Connection: close" or "GOAWAY h2 frames" and can't send those on the
L7 proxy's behalf? In our current L7 proxy, we have custom logic that
inspects the client IP of the request (= the L4 node) and applies the
drain behavior only to requests matching the client IP of the L4 node
that's going under maintenance.

However, as we're migrating to L7 HAproxy, I am unable to find a way
to initiate "soft-stop" behavior only for requests coming from a
specific upstream L4 node. Note, we already have measures in place to
stop sending new requests to the L4 node as soon as we initiate the
maintenance mode; this is about gracefully draining the existing
connections from that L4 node, which are spread across the whole L7
cluster.

It would be great if I could get some pointers here on what to look at
to solve this problem.

* I already checked the HAproxy code around soft-stop and
close-spread-time etc.; it doesn't have any support for draining only
specific clients.
* In Lua, I didn't find any methods for initiating a drain for
specific requests.
* For any map (L4 client IP lookup) based solution, I was unable to
find any http-request operation that sets "drain mode".

Cheers,
Abhijeet (https://abhi.host)