Re: maint, drain: the right approach

2023-06-07 Thread Matteo Piva
Hi Willy,

> > Seems that it's considered expected behavior to optimistically mark the 
> > server as UP when leaving MAINT mode, even if the L4 health checks have 
> > not completed yet. 

> Normally using the existing API you could forcefully 
> mark the server's check as down using this before leaving maintenance: 

> set server <backend>/<server> health [ up | stopping | down ] 

> Doesn't it work to force it to down before leaving maintenance and wait 
> for it to succeed its checks? That would give this to leave maintenance: 

> set server blah health down; set server blah state ready 

> By the way that reminds me that a long time ago we envisioned a server 
> option such as "init-state down" but with the ability to add servers on 
> the fly via the CLI it seemed a bit moot precisely because you should be 
> able to do the above. But again, do not hesitate to tell me if I'm wrong 
> somewhere, my goal is not to reject any change but to make sure we're not 
> trying to do something that's already possible (and possibly not obvious, 
> I concede). 
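
For reference, all the "set server" commands in the tests below are sent to 
the runtime API via socat (this assumes a stats socket configured with 
"level admin", bound at /var/run/haproxy.sock), e.g.: 

echo "set server test_backend/s1 health down" | socat stdio /var/run/haproxy.sock 
echo "set server test_backend/s1 state ready" | socat stdio /var/run/haproxy.sock 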


I just did some tests to share with you:


1) - Forcing health "DOWN" before exiting "MAINT" mode -
COMMANDS:
set server test_backend/s1 state maint
set server test_backend/s1 health down
set server test_backend/s1 state ready

LOG:
Server test_backend/s1 is going DOWN for maintenance. 1 active and 0 backup 
servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Server test_backend/s1 is UP/READY (leaving forced maintenance).

In this case, the forced health "DOWN" set while in MAINT mode isn't honored by 
haproxy, and the server optimistically comes back "UP" once the state is set to READY.


2) - Entering "MAINT" mode when health is "DOWN" -
COMMANDS:
set server test_backend/s1 health down
set server test_backend/s1 state maint
set server test_backend/s1 state ready

LOG:
Server test_backend/s1 is DOWN, changed from CLI. 1 active and 0 backup servers 
left. 0 sessions active, 0 requeued, 0 remaining in queue.
Server test_backend/s1 was DOWN and now enters maintenance.
Server test_backend/s1 is UP/READY (leaving forced maintenance).

In this case, health is successfully forced to "DOWN" before entering MAINT 
mode, but it's again optimistically restored to "UP" once the state is set to READY.


3) - Exiting "MAINT" mode passing through "DRAIN", forcing health "DOWN" -
COMMANDS:
set server test_backend/s1 state maint
set server test_backend/s1 state drain
set server test_backend/s1 health down
set server test_backend/s1 state ready

LOG:
Server test_backend/s1 is going DOWN for maintenance. 1 active and 0 backup 
servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Server test_backend/s1 is UP/DRAIN (leaving forced maintenance).
Server test_backend/s1 remains in forced drain mode.
Server test_backend/s1 is DOWN, changed from CLI. 1 active and 0 backup servers 
left. 0 sessions active, 0 requeued, 0 remaining in queue.
Server test_backend/s1 remains in forced drain mode.
Server test_backend/s1 is DOWN (leaving forced drain).

That one works as intended, since the server exits MAINT with health "DOWN", 
and haproxy then evaluates the health checks before marking it "UP". 
... but it passes through "DRAIN", and I don't know if that's intended.
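
Since that's the only sequence giving me the pessimistic behavior I'm after, for 
now I've wrapped it in a small helper script. Just a sketch: it assumes the 
server is already in MAINT, and that socat and an admin-level stats socket at 
/var/run/haproxy.sock are available: 

#!/bin/sh 
# Usage: ./reenable-server.sh test_backend/s1 
SRV="$1" 
SOCK="/var/run/haproxy.sock" 

hap() { echo "$1" | socat stdio "$SOCK"; } 

hap "set server $SRV state drain"   # leave MAINT through DRAIN so checks keep running 
hap "set server $SRV health down"   # force health DOWN so the checks must pass first 
hap "set server $SRV state ready"   # the server now stays DOWN until its checks succeed 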



Do you think something should behave differently in the first two tests I did? 
Maybe the forced health "DOWN" should have been honored when moving from "MAINT" 
to "READY"?


Thanks,

Matteo


Re: maint, drain: the right approach

2023-05-23 Thread Matteo Piva


> Hi Matteo, 

Hi Aurelien, thanks for your reply to my issue.


> > Once the activity on the underlying service has been completed and it is 
> > starting up, I switch back from MAINT to READY (without waiting for the 
> > service to actually be up). 
> > The haproxy backend server is immediately put back into the round-robin 
> > pool, even though the L4 and L7 checks still show the underlying service 
> > as DOWN (the service is still starting, which can take time). 

> I would wait for others to confirm or infirm, but meanwhile the 
> situation you described makes me think of an issue that was revived on 
> Github a few weeks ago: https://github.com/haproxy/haproxy/issues/51 
> (server assumed UP when leaving MAINT before the checks are performed) 


Yes, I saw that issue on Github, but I can also see lukastribus's comment 
on it: 

  When health checks are used, but did not start yet or the status is not 
  yet determined, then the current server status will be UP. This is 
  documented and expected behavior. 


Seems that it's considered expected behavior to optimistically mark the 
server as UP when leaving MAINT mode, even if the L4 health checks have not 
completed yet.

I find that quite an annoying feature, but maybe I'm approaching this the 
wrong way.

I'll wait for others to comment on this issue so I can understand it better.


Thanks, 
Matteo 



Re: maint, drain: the right approach

2023-05-23 Thread Matteo Piva
Hi all, 

still trying to figure out the right way to do this. 

Any suggestions to share with me? 


Thanks, 
Matteo 

- Original message -


From: "Matteo Piva"  
To: "HAProxy"  
Sent: Thursday, May 11, 2023 11:04:11 
Subject: maint, drain: the right approach 




maint, drain: the right approach

2023-05-11 Thread Matteo Piva
Hi, 

I'm trying to work out the right procedure for taking an HTTP backend server 
down for maintenance. 

When I put one of the two backend servers in MAINT mode (disabled), traffic is 
immediately routed only to the active server, and this includes persistent 
connections as well. 

Once the activity on the underlying service has been completed and it is 
starting up, I switch back from MAINT to READY (without waiting for the 
service to actually be up). 
The haproxy backend server is immediately put back into the round-robin pool, 
even though the L4 and L7 checks still show the underlying service as DOWN 
(the service is still starting, which can take time). 
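
Concretely, I'm toggling the state from the runtime API, roughly like this 
(hypothetical backend/server names, assuming an admin-level stats socket at 
/var/run/haproxy.sock): 

echo "set server test_backend/s1 state maint" | socat stdio /var/run/haproxy.sock 
# ... stop the service, perform the maintenance, start the service ... 
echo "set server test_backend/s1 state ready" | socat stdio /var/run/haproxy.sock 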

So we have: 

- HAPROXY backend: ready->maint 
- Stopping service 
- Starting service 
- HAPROXY backend: maint->ready 
- From now on the backend is back in the pool while haproxy is still checking 
  whether the service is UP or DOWN (L4) - we get http/503s calling the frontend 
- HAPROXY backend: down (checked, L4) 
- Service up 
- HAPROXY backend: up (checked, L4/L7) 

During the window between maint->ready and the server being marked DOWN by the 
L4 check, clients get http/503 responses when they are routed to the 
still-starting server. (If I read the check logic correctly, with settings like 
"inter 10s fall 3" that window can last up to roughly fall x inter = 30 seconds.) 



Now, I know that with DRAIN the L4/L7 checks keep running, which could solve 
this issue. 
But it also means that I can't prevent persistent connections from being routed 
to this server, so I could still get http/503s during the maintenance window. 

Which is the right approach? 
Is there a way to make the maint->ready transition pessimistic, so it waits for 
the checks to complete before the server goes back into the pool? 
Or is there a way to make drain behave like maint, so that persistent 
connections are also forcibly routed only to active servers? 


Thanks, 

Matteo 


monitor-uri: right way to approach a maintenance in cascaded haproxys

2023-05-02 Thread Matteo Piva
Hi all, 

I'm trying to set up our haproxys according to the following cascaded schema: 

HTTP CLIENT: 
--> HAPROXY1 HTTP Frontend --> HAPROXY1 Backend (/health monitoring, RR to 
    Haproxy2's and Haproxy3's frontends) 
--> HAPROXY2 HTTP Frontend (exposing /health monitor-uri) --> HAPROXY2 Backend 
    (RR) --> Backend Servers (SITE 1) 
--> HAPROXY3 HTTP Frontend (exposing /health monitor-uri) --> HAPROXY3 Backend 
    (RR) --> Backend Servers (SITE 2) 


HAPROXY1: 
backend client-backend-name 
    mode http 
    balance roundrobin 
    option httpchk GET /health 
    http-check expect status 200 
    timeout check 6s 
    server haproxy2 1.2.3.2:80 check inter 10s fall 3 rise 2 
    server haproxy3 1.2.3.3:80 check inter 10s fall 3 rise 2 


HAPROXY2 & HAPROXY3: 
frontend haproxy2-frontend-name 
    mode http 
    bind 1.2.3.2:80 
    monitor-uri /health 
    monitor fail if { nbsrv(haproxy2-backend-name) eq 0 } 
    use_backend haproxy2-backend-name 

backend haproxy2-backend-name 
    mode http 
    balance roundrobin 
    option httpchk GET /health 
    http-check expect status 200 
    timeout check 6s 
    server one 1.2.3.10:80 check inter 10s fall 3 rise 2 
    server two 1.2.3.11:80 check inter 10s fall 3 rise 2 




I have set up Haproxy2's and Haproxy3's HTTP frontends to expose a monitor-uri 
that returns an http status != 200 when all the servers in the corresponding 
backend are unavailable. 
Haproxy1 can then tell when one of them no longer has any backend servers 
available to handle requests. 

Now I'd like to force the monitor-uri on Haproxy2 or Haproxy3 to return an http 
status != 200, just to let Haproxy1 know that we are about to shut the site/node 
down for maintenance, before the service actually becomes unavailable. 
Can you please suggest the cleanest way to control the status returned by the 
monitor-uri, forcing it to something != 200 when needed while keeping the 
default nbsrv(backend-name) eq 0 behavior the rest of the time? 
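
The only idea I've come up with so far is to add a second monitor fail 
condition driven by a process-wide variable that I can flip from the runtime 
API. This is an untested sketch, and it assumes a haproxy version where 
"set var" is available on the CLI (2.4+; depending on the version it may first 
need "experimental-mode on"): 

frontend haproxy2-frontend-name 
    mode http 
    bind 1.2.3.2:80 
    monitor-uri /health 
    # fail when the site has no servers left, as before 
    monitor fail if { nbsrv(haproxy2-backend-name) eq 0 } 
    # ...or when the maintenance flag has been raised from the CLI 
    monitor fail if { var(proc.maint) -m int eq 1 } 
    use_backend haproxy2-backend-name 

Then, before starting maintenance on that site: 

echo "experimental-mode on; set var proc.maint int(1)" | socat stdio /var/run/haproxy.sock 

and to clear the flag once the site is back: 

echo "experimental-mode on; set var proc.maint int(0)" | socat stdio /var/run/haproxy.sock 

An unset variable simply never matches, so the default nbsrv condition keeps 
working as before. 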


Thanks, 
Matteo