Re: maint, drain: the right approach
Hi Willy,

> > Seems that it's considered an expected behavior to consider
> > optimistically the server as UP when leaving MAINT mode, even if
> > the L4 health checks are not completed yet.
>
> Normally using the existing API you could forcefully mark the server's
> check as down using this before leaving maintenance:
>
>   set server <backend>/<server> health [ up | stopping | down ]
>
> Doesn't it work to force it to down before leaving maintenance and wait
> for it to succeed its checks? That would give this to leave maintenance:
>
>   set server blah health down; set server blah state ready
>
> By the way, that reminds me that a long time ago we envisioned a server
> option such as "init-state down", but with the ability to add servers on
> the fly via the CLI it seemed a bit moot, precisely because you should be
> able to do the above. But again, do not hesitate to tell me if I'm wrong
> somewhere; my goal is not to reject any change but to make sure we're not
> trying to do something that's already possible (and possibly not obvious,
> I concede).

I just did some tests to share with you:

1) Forcing health "DOWN" before exiting "MAINT" mode

COMMANDS:
  set server test_backend/s1 state maint
  set server test_backend/s1 health down
  set server test_backend/s1 state ready

LOG:
  Server test_backend/s1 is going DOWN for maintenance. 1 active and 0
  backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
  Server test_backend/s1 is UP/READY (leaving forced maintenance).

In this case, forcing health to "DOWN" while in MAINT mode isn't evaluated
by haproxy, and the server optimistically comes back "UP" once its state is
set to READY.

2) Entering "MAINT" mode when health is "DOWN"

COMMANDS:
  set server test_backend/s1 health down
  set server test_backend/s1 state maint
  set server test_backend/s1 state ready

LOG:
  Server test_backend/s1 is DOWN, changed from CLI. 1 active and 0 backup
  servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
  Server test_backend/s1 was DOWN and now enters maintenance.
  Server test_backend/s1 is UP/READY (leaving forced maintenance).

In this case, health is successfully forced to "DOWN" before entering MAINT
mode, but it's again optimistically restored to "UP" once the state is set
to READY.

3) Exiting "MAINT" mode through "DRAIN", forcing health "DOWN"

COMMANDS:
  set server test_backend/s1 state maint
  set server test_backend/s1 state drain
  set server test_backend/s1 health down
  set server test_backend/s1 state ready

LOG:
  Server test_backend/s1 is going DOWN for maintenance. 1 active and 0
  backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
  Server test_backend/s1 is UP/DRAIN (leaving forced maintenance).
  Server test_backend/s1 remains in forced drain mode.
  Server test_backend/s1 is DOWN, changed from CLI. 1 active and 0 backup
  servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
  Server test_backend/s1 remains in forced drain mode.
  Server test_backend/s1 is DOWN (leaving forced drain).

This one works as intended: the server exits MAINT with health "DOWN", and
haproxy then evaluates the health checks before marking it "UP". But it
passes through "DRAIN", and I don't know whether that's intended.

Do you think something should behave differently in the first two tests?
Maybe the forced health "DOWN" should have been honored when moving from
"MAINT" to "READY"?

Thanks,

Matteo
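For convenience, the working sequence from test 3 can be scripted. The sketch below only emits the runtime-API commands, to be piped into the stats socket; the socket path shown in the comment, and the test_backend/s1 name from my test setup, are assumptions:

```shell
#!/bin/sh
# Emit the runtime-API commands for the drain-based exit from MAINT
# (test 3 above). Usage, assuming a stats socket at /var/run/haproxy.sock:
#   sh leave-maint.sh | socat stdio /var/run/haproxy.sock
# "test_backend/s1" is the backend/server pair from my test setup.
srv="test_backend/s1"
printf 'set server %s state drain\n' "$srv"
printf 'set server %s health down\n' "$srv"
printf 'set server %s state ready\n' "$srv"
```

Emitting the commands rather than talking to the socket directly keeps the script easy to dry-run and to reuse with any socket path.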
Re: maint, drain: the right approach
> Hi Matteo,

Hi Aurelien,

thanks for your reply on my issue.

> > Once the activity on the underlying service has been completed and it
> > is starting up, I switch back from MAINT to READY (without waiting for
> > the service to be really up).
> > The haproxy backend immediately goes back into the roundrobin pool,
> > even if the L4 and L7 checks still find that the underlying service is
> > DOWN (the service is still starting, which can take time).
>
> I would wait for others to confirm or infirm, but meanwhile the
> situation you described makes me think of an issue that was revived on
> Github a few weeks ago: https://github.com/haproxy/haproxy/issues/51
> (server assumed UP when leaving MAINT before the checks are performed)

Yes, I saw that issue on Github, but I can also see lukastribus's comment
on it:

  When health checks are used, but did not start yet or the status is not
  yet determined, then the current server status will be UP. This is
  documented and expected behavior.

So it seems to be considered expected behavior to optimistically treat the
server as UP when leaving MAINT mode, even if the L4 health checks are not
completed yet. I find that quite an annoying feature, but maybe I'm
approaching this the wrong way. I'll wait for others to comment on the
issue so I can understand it better.

Thanks,

Matteo
Re: maint, drain: the right approach
Hi all,

still trying to figure out the right way to do this.
Any suggestions to share with me?

Thanks,

Matteo

----- Original message -----
From: "Matteo Piva"
To: "HAProxy"
Sent: Thursday, May 11, 2023 11:04:11
Subject: maint, drain: the right approach

Hi,

I'm trying to work out the right maintenance procedure for taking an HTTP
backend down for maintenance.

When I put one of the two backend servers in MAINT mode (disabled), traffic
is immediately routed only to the active server, including persistent
connections.

Once the activity on the underlying service has been completed and it is
starting up, I switch back from MAINT to READY (without waiting for the
service to be really up). The haproxy backend server immediately rejoins
the roundrobin pool, even though the L4 and L7 checks still find the
underlying service DOWN (it is still starting, which can take time).

So we have:

- HAPROXY backend: ready->maint
- Stopping service
- Starting service
- HAPROXY backend: maint->ready
  From now on the backend is in the pool, while haproxy is still checking
  whether the service is UP or DOWN (L4)
- We get http/503s when calling the frontend
- HAPROXY backend: down (checked, L4)
- Service up
- HAPROXY backend: up (checked, L4/L7)

During the window between maint->ready and the L4 check marking the server
DOWN, clients get http/503 responses when routed to the starting backend.

Now, I know that with DRAIN the L4/L7 checks keep running, which would
solve this issue. But it also means I can't prevent persistent connections
from being routed to this server, so I could get http/503s during the
maintenance window.

Which is the right approach?

Is there a way to make the maint->ready transition pessimistic, waiting for
the checks to complete before the server goes back into the pool? Or is
there a way to make drain behave like maint, so that persistent connections
are also forcibly routed only to the active servers?

Thanks,

Matteo
maint, drain: the right approach
Hi,

I'm trying to work out the right maintenance procedure for taking an HTTP
backend down for maintenance.

When I put one of the two backend servers in MAINT mode (disabled), traffic
is immediately routed only to the active server, including persistent
connections.

Once the activity on the underlying service has been completed and it is
starting up, I switch back from MAINT to READY (without waiting for the
service to be really up). The haproxy backend server immediately rejoins
the roundrobin pool, even though the L4 and L7 checks still find the
underlying service DOWN (it is still starting, which can take time).

So we have:

- HAPROXY backend: ready->maint
- Stopping service
- Starting service
- HAPROXY backend: maint->ready
  From now on the backend is in the pool, while haproxy is still checking
  whether the service is UP or DOWN (L4)
- We get http/503s when calling the frontend
- HAPROXY backend: down (checked, L4)
- Service up
- HAPROXY backend: up (checked, L4/L7)

During the window between maint->ready and the L4 check marking the server
DOWN, clients get http/503 responses when routed to the starting backend.

Now, I know that with DRAIN the L4/L7 checks keep running, which would
solve this issue. But it also means I can't prevent persistent connections
from being routed to this server, so I could get http/503s during the
maintenance window.

Which is the right approach?

Is there a way to make the maint->ready transition pessimistic, waiting for
the checks to complete before the server goes back into the pool? Or is
there a way to make drain behave like maint, so that persistent connections
are also forcibly routed only to the active servers?

Thanks,

Matteo
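For reference, the kind of backend involved can be sketched as a minimal config; the names, addresses, and check intervals below are made-up placeholders, not my real setup:

```
backend test_backend
    mode http
    balance roundrobin
    option httpchk GET /health
    http-check expect status 200
    server s1 192.0.2.11:80 check inter 10s fall 3 rise 2
    server s2 192.0.2.12:80 check inter 10s fall 3 rise 2
```

With checks like these, the maint->ready transition reinstates the server before the first check even runs, and the gap until the first failed L4 check is exactly the http/503 window described above.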
monitor-uri: the right way to approach maintenance in cascaded haproxys
Hi all,

I'm trying to set up our haproxys according to the following cascaded
schema:

HTTP CLIENT
  --> HAPROXY1 HTTP frontend
      --> HAPROXY1 backend (/health monitoring, RR to haproxy2's and
          haproxy3's frontends)
          --> HAPROXY2 HTTP frontend (exposing /health monitor-uri)
              --> HAPROXY2 backend (RR) --> backend servers (SITE 1)
          --> HAPROXY3 HTTP frontend (exposing /health monitor-uri)
              --> HAPROXY3 backend (RR) --> backend servers (SITE 2)

HAPROXY1:

  backend client-backend-name
      mode http
      balance roundrobin
      option httpchk GET /health
      http-check expect status 200
      timeout check 6s
      server haproxy2 1.2.3.2:80 check inter 10s fall 3 rise 2
      server haproxy3 1.2.3.3:80 check inter 10s fall 3 rise 2

HAPROXY2 & HAPROXY3:

  frontend haproxy2-frontend-name
      mode http
      bind 1.2.3.2:80
      monitor-uri /health
      monitor fail if { nbsrv(haproxy2-backend-name) eq 0 }
      use_backend haproxy2-backend-name

  backend haproxy2-backend-name
      mode http
      balance roundrobin
      option httpchk GET /health
      http-check expect status 200
      timeout check 6s
      server one 1.2.3.10:80 check inter 10s fall 3 rise 2
      server two 1.2.3.11:80 check inter 10s fall 3 rise 2

I have set up haproxy2's and haproxy3's HTTP frontends to expose a
monitor-uri that returns an HTTP status != 200 when all the servers in the
corresponding backend are unavailable. Haproxy1 can then tell when one of
them no longer has backend servers available to handle requests.

Now I'd like to force the monitor-uri on haproxy2 or haproxy3 to return an
HTTP status != 200 on demand, to let haproxy1 know that we are about to
shut down the site/node for maintenance, before the service actually
becomes unavailable.

Can you please suggest the cleanest way to control the status returned by
monitor-uri, forcing it to something != 200 when needed, while keeping the
default behavior tied to nbsrv(backend-name) eq 0?

Thanks,

Matteo
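One possible approach, sketched below under assumptions (the extra backend name "maint-flag" and its dummy server are my own inventions, and I haven't tested this against this exact setup): add a "flag" backend containing a single server with no health check, which therefore stays UP until it is forced into MAINT from the CLI, and OR its nbsrv into the monitor fail condition:

```
# Dummy backend used only as a maintenance flag; its server has no
# "check" keyword, so haproxy considers it UP until it is forced
# into MAINT through the runtime API. No traffic is ever sent to it.
backend maint-flag
    server flag 127.0.0.1:1

frontend haproxy2-frontend-name
    mode http
    bind 1.2.3.2:80
    monitor-uri /health
    # Fail the monitor-uri when the real backend is empty OR when the
    # flag server has been put into maintenance from the CLI.
    monitor fail if { nbsrv(haproxy2-backend-name) eq 0 } || { nbsrv(maint-flag) eq 0 }
    use_backend haproxy2-backend-name
```

Then, from the runtime API, "set server maint-flag/flag state maint" should make /health return a non-200 status, and "set server maint-flag/flag state ready" should restore the default nbsrv-driven behavior.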