Hi,

Problem:
If a client has a state entry in the relayd anchor and the target server goes down, the client will be unable to "fail over" for 10 sec + (10 sec - elapsed time since the last SLA check).

There are two issues here; this patch only fixes the delayed (10 second) failover.

When a host fails the SLA check, it is marked as down. However, it is not removed from the anchor until the next SLA check.
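
To make the timing concrete, here is a small sketch of the arithmetic. The helper names are hypothetical and are not part of relayd. The unpatched delay follows the formula given above; the patched delay is my assumption that the table is synced at the same check that detects the failure.

```c
/* Hypothetical helpers for illustration; not part of relayd. */

/*
 * Seconds until failover without the patch: the rest of the current
 * check interval, plus one more full interval before the table is synced.
 */
static int
delay_unpatched(int interval, int elapsed)
{
	return interval + (interval - elapsed);
}

/*
 * Seconds until failover with the patch, assuming the table is synced
 * at the check that detects the failure.
 */
static int
delay_patched(int interval, int elapsed)
{
	return interval - elapsed;
}
```

With a 10 second interval and a host that dies 3 seconds after the last check, this gives 17 seconds without the patch and 7 seconds with it.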

Reproduce:
Start relayd with -dvvv, let it run for 10-20 seconds, then make a host fail its SLA check. relayd will mark the host as down when it reaches the next SLA check, but sync_table() will not be called until 10 seconds later (at the following SLA check).

Solution:
The logic is already in the code, but right now it only handles the statistics and marks the host as down.

Call pfe_sync(), which in turn calls sync_table(), when a host goes from UP to DOWN.
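
The idea can be sketched with a simplified stand-alone model. This is not relayd code: every name below (model_host, model_sync, model_check) is made up for illustration, and pending_del only stands in for the F_DEL flag.

```c
/* Hypothetical, simplified model; none of these names exist in relayd. */
enum host_state { HOST_DOWN = 0, HOST_UP = 1 };

struct model_host {
	enum host_state	 up;		/* last known state */
	int		 in_table;	/* 1 while the host is still in the table */
	int		 pending_del;	/* stands in for F_DEL */
};

static int sync_calls;			/* how often the model synced */

/* Stand-in for the table sync: apply a pending deletion. */
static void
model_sync(struct model_host *h)
{
	sync_calls++;
	if (h->pending_del) {
		h->in_table = 0;
		h->pending_del = 0;
	}
}

/*
 * Process one check result. With sync_on_down set, the table is synced
 * as soon as the host goes from UP to DOWN (the patched behaviour);
 * otherwise the deletion stays pending until a later sync.
 */
static void
model_check(struct model_host *h, enum host_state result, int sync_on_down)
{
	if (h->up == HOST_UP && result == HOST_DOWN) {
		h->pending_del = 1;
		if (sync_on_down)
			model_sync(h);
	}
	h->up = result;
}
```

In this model, a host that goes down with sync_on_down set to 0 is marked down but stays in the table, while with sync_on_down set to 1 it is removed during the same check.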


Index: pfe.c
===================================================================
RCS file: /cvs/src/usr.sbin/relayd/pfe.c,v
retrieving revision 1.79.2.1
diff -u -p -u -p -r1.79.2.1 pfe.c
--- pfe.c       20 Sep 2015 11:20:16 -0000      1.79.2.1
+++ pfe.c       1 Oct 2015 10:48:59 -0000
@@ -152,6 +152,7 @@ pfe_dispatch_hce(int fd, struct privsep_
                        table->conf.flags |= F_CHANGED;
                        host->flags |= F_DEL;
                        host->flags &= ~(F_ADD);
+                       pfe_sync();
                }

                host->up = st.up;


If you need more details or want to fix the scheduler issue, please contact me :)


--
bsv
