This patch resets both counters, n_success and n_failures, whenever either
one reaches its configured threshold (n_count >= configured_count),
regardless of the svc state.

Health checks (hc) against a Windows server used as a load balancer backend
show an issue with the counters, e.g.:

ovn-controller --> win_srv | SYN
ovn-controller <-- win_srv | SYN_ACK <- increase n_success
ovn-controller --> win_srv | RST_ACK
ovn-controller <-- win_srv | SYN_ACK (TCP Retransmission) <- increase n_success
ovn-controller --> win_srv | RST_ACK
ovn-controller <-- win_srv | SYN_ACK (TCP Retransmission) <- increase n_success
                             n_success == success_count => status = online;
                             n_success = 0
ovn-controller --> win_srv | RST_ACK
ovn-controller <-- win_srv | RST (win_srv doesn't receive ACK and sends RST)
                             increase n_failures (count = 1)
After wait_time:
ovn-controller --> win_srv | SYN
ovn-controller <-- win_srv | SYN_ACK <- increase n_success
ovn-controller --> win_srv | RST_ACK
ovn-controller <-- win_srv | SYN_ACK (TCP Retransmission) <- increase n_success
ovn-controller --> win_srv | RST_ACK
ovn-controller <-- win_srv | SYN_ACK (TCP Retransmission) <- increase n_success
                             n_success == success_count => status = online;
                             n_success = 0
ovn-controller --> win_srv | RST_ACK
ovn-controller <-- win_srv | RST (win_srv doesn't receive ACK and sends RST)
                             increase n_failures (count = 2)
After wait_time:
ovn-controller --> win_srv | SYN
ovn-controller <-- win_srv | SYN_ACK <- increase n_success
ovn-controller --> win_srv | RST_ACK
ovn-controller <-- win_srv | SYN_ACK (TCP Retransmission) <- increase n_success
ovn-controller --> win_srv | RST_ACK
ovn-controller <-- win_srv | SYN_ACK (TCP Retransmission) <- increase n_success
                             n_success == success_count => status = online;
                             n_success = 0
ovn-controller --> win_srv | RST_ACK
ovn-controller <-- win_srv | RST (win_srv doesn't receive ACK and sends RST)
                             increase n_failures (count = 3)
                             n_failures == failure_count => status = offline;
                             n_failures = 0

So, the main point is that svc resets only the counter for the current
state: n_success is reset when n_success >= success_count, but if an lb
backend for some reason sends RST to ovn-controller, n_failures just
increases and is reset only when the state is offline and
n_failures >= failure_count. The same applies to SYN_ACK and n_success.
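
The counter drift in the trace above can be reproduced with a small
standalone model. This is only a sketch, not the pinctrl.c code: the struct,
the helper names (on_syn_ack, on_rst, replay) and the reset_both switch are
hypothetical, the thresholds are assumed to be 3 as in the trace, and the
real state machine in svc_monitors_run() has more states than are modeled
here.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of the health-check counters.  Field
 * names mirror the ones touched by the patch; everything else is assumed. */
enum mon_status { ST_UNKNOWN, ST_ONLINE, ST_OFFLINE };

struct mon {
    int n_success, n_failures;
    int success_count, failure_count;
    enum mon_status status;
    bool reset_both;    /* true models the behavior with this patch. */
};

/* A SYN_ACK (or its retransmission) arrived from the backend. */
static void
on_syn_ack(struct mon *m)
{
    m->n_success++;
    if (m->n_success >= m->success_count) {
        m->status = ST_ONLINE;
        m->n_success = 0;
        if (m->reset_both) {
            m->n_failures = 0;
        }
    }
}

/* An RST arrived from the backend. */
static void
on_rst(struct mon *m)
{
    m->n_failures++;
    if (m->n_failures >= m->failure_count) {
        m->status = ST_OFFLINE;
        m->n_failures = 0;
        if (m->reset_both) {
            m->n_success = 0;
        }
    }
}

/* Replays 'rounds' probe rounds shaped like the trace: three SYN_ACKs (one
 * reply plus two retransmissions) followed by one RST.  Returns the final
 * status of the monitor. */
static enum mon_status
replay(bool reset_both, int rounds)
{
    struct mon m = { 0, 0, 3, 3, ST_UNKNOWN, reset_both };

    for (int i = 0; i < rounds; i++) {
        on_syn_ack(&m);
        on_syn_ack(&m);
        on_syn_ack(&m);
        on_rst(&m);
    }
    return m.status;
}
```

In this model replay(false, 3) ends in ST_OFFLINE, because the single RST of
each round accumulates in n_failures, while replay(true, 3) stays in
ST_ONLINE, because n_failures is cleared every time the success threshold is
reached.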

Signed-off-by: Evgenii Kovalev <[email protected]>
---
 controller/pinctrl.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/controller/pinctrl.c b/controller/pinctrl.c
index f98f6d70c..158e0ee75 100644
--- a/controller/pinctrl.c
+++ b/controller/pinctrl.c
@@ -7761,7 +7761,9 @@ svc_monitors_run(struct rconn *swconn,
             if (svc_mon->n_success >= svc_mon->success_count) {
                 svc_mon->status = SVC_MON_ST_ONLINE;
                 svc_mon->n_success = 0;
+                svc_mon->n_failures = 0;
             }
+
             if (current_time >= svc_mon->next_send_time) {
                 svc_monitor_send_health_check(swconn, svc_mon);
                 next_run_time = svc_mon->wait_time;
@@ -7773,6 +7775,7 @@ svc_monitors_run(struct rconn *swconn,
         case SVC_MON_S_OFFLINE:
             if (svc_mon->n_failures >= svc_mon->failure_count) {
                 svc_mon->status = SVC_MON_ST_OFFLINE;
+                svc_mon->n_success = 0;
                 svc_mon->n_failures = 0;
             }
 
