On 6/12/26 12:15 PM, Matteo Perin wrote:
> On Fri, 12 Jun 2026 at 10:07, Dumitru Ceara <[email protected]> wrote:
> 
>> I have a few questions, not necessarily on the implementation but more
>> on the support choice we're making here.
>>
>>> On Wed, Jun 10, 2026 at 9:40 AM Matteo Perin via dev
>>> <[email protected]> wrote:
>>>>
>>>> By default OVN "fails open" for load balancer health checks: a backend
>>>> whose service monitor status is not yet known (empty) is treated as
>>>> available, so traffic is forwarded to it while the first health check
>>>> probes are still pending. This avoids disrupting traffic when health
>>>> checks are first enabled, but it also means traffic can be sent to
>>
>> I really wonder if this is a valid use case we should care about.  If
>> I'm reading this correctly you're saying that the CMS would configure a
>> load balancer (no health check config), then some traffic passes through
>> the LB, then way later the CMS adds the health check config to the LB,
>> right?
>>
>> I agree, traffic would get disrupted if we make "fail closed" the default.
>>
>> But OTOH, is this really something that will happen in practice?  The
>> CMS either configures LBs without health check or configures them with
>> health check from the beginning.
> 
> 
> Hi Dumitru,
> 

Hi Matteo,

> I agree. I was not trying to argue for the current default behavior with
> the CMS example; quite the opposite, actually.
> 
> To give you the full picture, this feature request is coming from our
> internal LXD team.
> In LXD, OVN load balancers are already supported, but the team is about to
> add an LB pool concept. You can, of course, configure health checks on
> those pools.
> When adding instances (VMs and/or containers) to a pool, LXD configures the
> OVN LB accordingly.
> However, to stress this point, a user may want to add an instance to a pool
> before any workload is running inside it on the port the LB is targeting,
> and we do not want to forward traffic to such instances (yet).
> 
> We only care about the "fail closed" scenario, to avoid useless/unwanted
> traffic.
> 

Thanks for the additional context!

>>>> backends that have never been successfully probed during the initial
>>>> monitoring grace period.
>>>>
>>
>> To be honest, I consider this to be more of a bug in the existing health
>> check implementation.  I can't really imagine a good reason for blindly
>> sending traffic towards a backend when HC is enabled but we couldn't
>> probe the status of the backend yet.
>>
>>>> Add a new "forward_unknown" key to the options column of the
>>>> Load_Balancer_Health_Check table. When set to "false", a backend with
>>>> an unknown (empty) service monitor status is treated as unavailable and
>>>> excluded from the load balancer backends until its first successful
>>>> probe. The default ("true") preserves the existing fail-open behavior,
>>>> so this change is fully backward compatible and opt-in.
>>>>
>>>> No schema change is required as this reuses the existing options map.
>>>>
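
For readers following the thread, the semantics described in the commit
message above can be sketched roughly as follows. This is only an
illustration, not the code from the patch or from ovn-northd: the function
names are hypothetical, and the sketch assumes that a known service monitor
status is one of "online", "offline" or "error" and that only "online"
counts as healthy.

```python
from typing import Mapping, Optional


def parse_forward_unknown(options: Mapping[str, str]) -> bool:
    """Read the proposed "forward_unknown" key from a health check's
    options map. A missing key means "true" (the existing fail-open
    behavior). Hypothetical helper, for illustration only."""
    return options.get("forward_unknown", "true").strip().lower() != "false"


def backend_available(status: Optional[str], forward_unknown: bool) -> bool:
    """Decide whether a backend should receive traffic.

    status is the service monitor status for the backend; None or an
    empty string means no probe result is known yet.
    """
    if not status:
        # Unknown status: fail open or fail closed, depending on the option.
        return forward_unknown
    # Assumption: only "online" is treated as healthy.
    return status == "online"


# Fail open (current default): an unprobed backend still gets traffic.
default_opts = {}
print(backend_available("", parse_forward_unknown(default_opts)))

# Fail closed (proposed opt-in): an unprobed backend is excluded.
closed_opts = {"forward_unknown": "false"}
print(backend_available("", parse_forward_unknown(closed_opts)))
```

Making fail-closed the default, as discussed further down in this thread,
would amount to dropping the option and always returning False for an
unknown status in the sketch above.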
>>>> Signed-off-by: Matteo Perin <[email protected]>
>>>> ---
>>>> This patch is a proposal for a simple addition to the LB health check
>>>> configuration.
>>>>
>>
>> So, I guess, my question is: should we consider _not_ adding yet another
>> knob and just fixing the behavior for all cases?
> 
> 
> I considered this myself but did not go through with changing the default
> because I did not take part in the discussion about the original
> implementation (I looked for references about this design choice, without
> finding any) and I did not want to preclude any "on the fly" enablement of
> LB health checks in HA environments, where you may not want to disrupt
> traffic even during the initial discovery period (I agree, arguably not
> something really seen in practice).
> 
> That said, after reading your points, I would be in favor of making the
> proposed behavior the default, given the peculiarity of the above case.
> 
> If you confirm that this is acceptable and the community opinion is
> positive on this, I will rework the patch to make the fail-closed behavior
> the default one and drop the added configuration option.

I'm adding Numan to the discussion as he's the original author of the
Health Check feature:

https://github.com/ovn-org/ovn/commit/ba0d6ea

> 
> Best Regards,
> Matteo
> 

Regards,
Dumitru

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
