Re: relayd patch - delayed failover
> i believe i committed the correct one, i just replied to the wrong mail
> here on the list. Here is what i put in:
>
> http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/usr.sbin/relayd/pfe.c.diff?r1=1.82&r2=1.83&sortby=date
>
> /Benno

Correct, thank you.
Re: relayd patch - delayed failover
Brian S. Vangsgaard(b...@avalanic.dk) on 2015.12.04 09:04:19 +0100:
> Hi Sebastian
>
> You committed the wrong patch.
>
> Please see http://marc.info/?l=openbsd-tech&m=144378086813524&w=2
>
> The patch below results in a relayd panic if more than one host is
> available in the group.

i believe i committed the correct one, i just replied to the wrong mail
here on the list. Here is what i put in:

http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/usr.sbin/relayd/pfe.c.diff?r1=1.82&r2=1.83&sortby=date

/Benno
Re: relayd patch - delayed failover
Hi Sebastian

You committed the wrong patch.

Please see http://marc.info/?l=openbsd-tech&m=144378086813524&w=2

The patch below results in a relayd panic if more than one host is
available in the group.

Sebastian Benoit wrote on 2015-12-03 17:43:
> thanks, committed
>
> Brian S. Vangsgaard(b...@avalanic.dk) on 2015.10.01 13:27:12 +0200:
>> Solution:
>> The logic is already in the code, but right now it only handles the
>> statistics and sets the host as being down.
>>
>> Call sync_table() when a host goes from UP to DOWN.
>>
>> Index: pfe.c
>> ===
>> RCS file: /cvs/src/usr.sbin/relayd/pfe.c,v
>> retrieving revision 1.79.2.1
>> diff -u -p -u -p -r1.79.2.1 pfe.c
>> --- pfe.c 20 Sep 2015 11:20:16 - 1.79.2.1
>> +++ pfe.c 1 Oct 2015 10:48:59 -
>> @@ -152,6 +152,7 @@ pfe_dispatch_hce(int fd, struct privsep_
>>  			table->conf.flags |= F_CHANGED;
>>  			host->flags |= F_DEL;
>>  			host->flags &= ~(F_ADD);
>> +			pfe_sync();
>>  		}
>>
>>  		host->up = st.up;

--
bsv
Re: relayd patch - delayed failover
thanks, committed

Brian S. Vangsgaard(b...@avalanic.dk) on 2015.10.01 13:27:12 +0200:
> Solution:
> The logic is already in the code, but right now it only handles the
> statistics and sets the host as being down.
>
> Call sync_table() when a host goes from UP to DOWN.
>
> Index: pfe.c
> ===
> RCS file: /cvs/src/usr.sbin/relayd/pfe.c,v
> retrieving revision 1.79.2.1
> diff -u -p -u -p -r1.79.2.1 pfe.c
> --- pfe.c 20 Sep 2015 11:20:16 - 1.79.2.1
> +++ pfe.c 1 Oct 2015 10:48:59 -
> @@ -152,6 +152,7 @@ pfe_dispatch_hce(int fd, struct privsep_
>  			table->conf.flags |= F_CHANGED;
>  			host->flags |= F_DEL;
>  			host->flags &= ~(F_ADD);
> +			pfe_sync();
>  		}
>
>  		host->up = st.up;
Re: relayd patch - delayed failover
Hi again,

I just found a bug in the patch. While testing I only used one host in
each group, with failover to another group. That works, but with
multiple hosts in a group (which we want :) ), calling sync_table()
causes the parent to exit.

I'll rework the patch and do more testing before submitting again.

> Solution:
> The logic is already in the code, but right now it only handles the
> statistics and sets the host as being down.
>
> Call sync_table() when a host goes from UP to DOWN.
>
> Index: pfe.c
> ===
> RCS file: /cvs/src/usr.sbin/relayd/pfe.c,v
> retrieving revision 1.79.2.1
> diff -u -p -u -p -r1.79.2.1 pfe.c
> --- pfe.c 20 Sep 2015 11:20:16 - 1.79.2.1
> +++ pfe.c 1 Oct 2015 10:48:59 -
> @@ -152,6 +152,7 @@ pfe_dispatch_hce(int fd, struct privsep_
>  			table->conf.flags |= F_CHANGED;
>  			host->flags |= F_DEL;
>  			host->flags &= ~(F_ADD);
> +			pfe_sync();
>  		}
>
>  		host->up = st.up;

--
bsv
relayd patch - delayed failover
Hi,

Problem:
If a client has a state entry in the relayd anchor and the target
server goes down, the client will be unable to "failover" for 10 sec +
(10 sec - elapsed time since last SLA check).

There are two issues here; this patch only fixes the 10-second
failover delay.

When a host fails the SLA check, it is marked as being down. However,
it is not removed from the anchor before the next SLA check.

Reproduce:
Start relayd with -dvvv, let it run for 10-20 seconds, then make a host
fail its SLA check. Relayd will mark the host as being down when it
reaches the next SLA check, but sync_table() will not be called until
10 sec. later (at the following SLA check).

Solution:
The logic is already in the code, but right now it only handles the
statistics and sets the host as being down.

Call sync_table() when a host goes from UP to DOWN.


Index: pfe.c
===
RCS file: /cvs/src/usr.sbin/relayd/pfe.c,v
retrieving revision 1.79.2.1
diff -u -p -u -p -r1.79.2.1 pfe.c
--- pfe.c 20 Sep 2015 11:20:16 - 1.79.2.1
+++ pfe.c 1 Oct 2015 10:48:59 -
@@ -152,6 +152,7 @@ pfe_dispatch_hce(int fd, struct privsep_
 			table->conf.flags |= F_CHANGED;
 			host->flags |= F_DEL;
 			host->flags &= ~(F_ADD);
+			pfe_sync();
 		}

 		host->up = st.up;


If you need more details or want to fix the scheduler issue, please
contact me :)

--
bsv
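
The timing described in this report can be illustrated with a small toy model. This is only a sketch under stated assumptions, not relayd code: the struct, the toy_* functions, removal_delay() and the flag values are hypothetical, the check interval is taken as the 10 seconds mentioned above, and the periodic sync is assumed to run at each check before the new result is handled. It does not model the multi-host case at all.

```c
#include <assert.h>

#define CHECK_INTERVAL 10	/* seconds between checks, as in the report */

/* Hypothetical flag values for this sketch; not relayd's real ones. */
#define F_ADD	0x01
#define F_DEL	0x02

struct toy_host {
	int up;		/* last state seen by the dispatch logic */
	int flags;	/* pending F_ADD / F_DEL */
	int in_table;	/* 1 while clients can still be sent to the host */
};

/* Stand-in for a table sync: apply a pending removal. */
static void
toy_sync(struct toy_host *h)
{
	if (h->flags & F_DEL) {
		h->in_table = 0;
		h->flags &= ~F_DEL;
	}
}

/* Handle one check result; sync_now models the added pfe_sync() call. */
static void
toy_dispatch(struct toy_host *h, int new_up, int sync_now)
{
	if (h->up && !new_up) {
		h->flags |= F_DEL;
		h->flags &= ~F_ADD;
		if (sync_now)
			toy_sync(h);
	}
	h->up = new_up;
}

/*
 * Seconds between the failure and the removal from the table, for a
 * host that dies `elapsed` seconds after the check at t = 0.
 */
static int
removal_delay(int elapsed, int sync_now)
{
	struct toy_host h = { 1, 0, 1 };
	int t;

	assert(elapsed >= 0 && elapsed <= CHECK_INTERVAL);
	for (t = CHECK_INTERVAL; t <= 3 * CHECK_INTERVAL; t += CHECK_INTERVAL) {
		toy_sync(&h);		/* assumed: periodic sync runs first */
		if (!h.in_table)
			return (t - elapsed);
		/* the host still answers only if it has not failed yet */
		toy_dispatch(&h, t < elapsed, sync_now);
		if (!h.in_table)
			return (t - elapsed);
	}
	return (-1);			/* not reached for valid input */
}
```

Under these assumptions, removal_delay(3, 0) returns 17, which is 10 + (10 - 3), while removal_delay(3, 1) returns 7: syncing on the UP to DOWN transition removes the extra check interval but not the wait for the check that detects the failure.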