On Wed, Dec 16, 2020 at 10:41:42PM +0000, Job Snijders wrote: > On Tue, Dec 15, 2020 at 05:02:19PM +0100, Claudio Jeker wrote: > > On Mon, Dec 14, 2020 at 06:22:09PM +0000, Job Snijders wrote: > > > This patch appears to be a very elegant solution to a thorny subtle > > > problem: what to do when a peer is not accepting new routing > > > information from you? > > > > One thing I'm unsure about is the value of the SendHold timer. I reused > > the hold timer value with the assumption that for dead connections the > > regular hold timer expires before the SendHold timer (the send buffer > > needs to be full before send starts blocking). > > Let's be conservative while being progressive! :-) > > If the 'Send Hold Timer' value is moved from 'infinite' to 90 seconds > ("The suggested default value for the HoldTime", RFC 4271), I think > we'll be able to see benefits. > > > People should look out for cases where the SendHold timer triggered before > > either a NOTIFICATION form the peer arrived or where the SendHold timer > > triggered before the HoldTimer. Now that may be tricky since both SendHold > > and Hold timer trigger the same EVNT_TIMER_HOLDTIME event so they can not > > be distinguished easily. > > Separation of the cases might be helpful. > > > I think that the SendHold timer will almost never trigger and if it does > > only for the case where a session only works in one direction. > > If it is rare, maybe it should be logged as a unique message: > > "SendHoldTimer Expired". >
This diff does both suggestions. Adds a new event and a uses the default hold time of 90 sec for the send timeout (unless holdtime is larger than 90 sec in which case holdtime is used). This will show up in the logs like this: neighbor (badjojo): sending notification: HoldTimer expired neighbor (badjojo): state change Established -> Idle, reason: SendHoldTimer expired In bgpctl show nei badjojo it will show this: Last error sent: HoldTimer expired Since there is no NOTIFICATION error code for SendHoldTimer expired this is about the best we can do for now. -- :wq Claudio Index: bgpd.h =================================================================== RCS file: /cvs/src/usr.sbin/bgpd/bgpd.h,v retrieving revision 1.405 diff -u -p -r1.405 bgpd.h --- bgpd.h 5 Nov 2020 11:52:59 -0000 1.405 +++ bgpd.h 17 Dec 2020 12:01:38 -0000 @@ -1377,6 +1377,7 @@ static const char * const eventnames[] = "ConnectRetryTimer expired", "HoldTimer expired", "KeepaliveTimer expired", + "SendHoldTimer expired", "OPEN message received", "KEEPALIVE message received", "UPDATE message received", @@ -1467,6 +1468,7 @@ static const char * const timernames[] = "ConnectRetryTimer", "KeepaliveTimer", "HoldTimer", + "SendHoldTimer", "IdleHoldTimer", "IdleHoldResetTimer", "CarpUndemoteTimer", Index: session.c =================================================================== RCS file: /cvs/src/usr.sbin/bgpd/session.c,v retrieving revision 1.406 diff -u -p -r1.406 session.c --- session.c 11 Dec 2020 12:00:01 -0000 1.406 +++ session.c 17 Dec 2020 12:11:35 -0000 @@ -375,6 +375,9 @@ session_main(int debug, int verbose) case Timer_Hold: bgp_fsm(p, EVNT_TIMER_HOLDTIME); break; + case Timer_SendHold: + bgp_fsm(p, EVNT_TIMER_SENDHOLD); + break; case Timer_ConnectRetry: bgp_fsm(p, EVNT_TIMER_CONNRETRY); break; @@ -597,6 +600,7 @@ bgp_fsm(struct peer *peer, enum session_ switch (event) { case EVNT_START: timer_stop(&peer->timers, Timer_Hold); + timer_stop(&peer->timers, Timer_SendHold); timer_stop(&peer->timers, Timer_Keepalive); timer_stop(&peer->timers, Timer_IdleHold); @@ -709,6 +713,7 @@ bgp_fsm(struct peer *peer, enum session_ change_state(peer, STATE_IDLE, event); break; case EVNT_TIMER_HOLDTIME: + case EVNT_TIMER_SENDHOLD: session_notification(peer, ERR_HOLDTIMEREXPIRED, 0, NULL, 0); change_state(peer, STATE_IDLE, event); @@ -749,6 +754,7 @@ bgp_fsm(struct peer *peer, enum session_ change_state(peer, STATE_IDLE, event); break; case EVNT_TIMER_HOLDTIME: + case EVNT_TIMER_SENDHOLD: session_notification(peer, ERR_HOLDTIMEREXPIRED, 0, NULL, 0); change_state(peer, STATE_IDLE, event); @@ -784,6 +790,7 @@ bgp_fsm(struct peer *peer, enum session_ change_state(peer, STATE_IDLE, event); break; case EVNT_TIMER_HOLDTIME: + case EVNT_TIMER_SENDHOLD: session_notification(peer, ERR_HOLDTIMEREXPIRED, 0, NULL, 0); change_state(peer, STATE_IDLE, event); @@ -875,6 +882,7 @@ change_state(struct peer *peer, enum ses timer_stop(&peer->timers, Timer_ConnectRetry); timer_stop(&peer->timers, Timer_Keepalive); timer_stop(&peer->timers, Timer_Hold); + timer_stop(&peer->timers, Timer_SendHold); timer_stop(&peer->timers, Timer_IdleHold); timer_stop(&peer->timers, Timer_IdleHoldReset); session_close_connection(peer); @@ -923,6 +931,7 @@ change_state(struct peer *peer, enum ses timer_stop(&peer->timers, Timer_ConnectRetry); timer_stop(&peer->timers, Timer_Keepalive); timer_stop(&peer->timers, Timer_Hold); + timer_stop(&peer->timers, Timer_SendHold); timer_stop(&peer->timers, Timer_IdleHold); timer_stop(&peer->timers, Timer_IdleHoldReset); session_close_connection(peer); @@ -1780,6 +1789,10 @@ session_dispatch_msg(struct pollfd *pfd, return (1); } p->stats.last_write = getmonotime(); + if (p->holdtime > 0) + timer_set(&p->timers, Timer_SendHold, + p->holdtime < INTERVAL_HOLD ? INTERVAL_HOLD : + p->holdtime); if (p->throttled && p->wbuf.queued < SESS_MSG_LOW_MARK) { if (imsg_rde(IMSG_XON, p->conf.id, NULL, 0) == -1) log_peer_warn(&p->conf, "imsg_compose XON"); Index: session.h =================================================================== RCS file: /cvs/src/usr.sbin/bgpd/session.h,v retrieving revision 1.148 diff -u -p -r1.148 session.h --- session.h 11 Dec 2020 12:00:01 -0000 1.148 +++ session.h 17 Dec 2020 12:00:46 -0000 @@ -59,6 +59,7 @@ enum session_events { EVNT_TIMER_CONNRETRY, EVNT_TIMER_HOLDTIME, EVNT_TIMER_KEEPALIVE, + EVNT_TIMER_SENDHOLD, EVNT_RCVD_OPEN, EVNT_RCVD_KEEPALIVE, EVNT_RCVD_UPDATE, @@ -180,6 +181,7 @@ enum Timer { Timer_ConnectRetry, Timer_Keepalive, Timer_Hold, + Timer_SendHold, Timer_IdleHold, Timer_IdleHoldReset, Timer_CarpUndemote,