On 2023/05/25 16:15, Claudio Jeker wrote: > On Thu, May 25, 2023 at 02:20:37PM +0100, Stuart Henderson wrote: > > On 2023/05/25 15:06, Claudio Jeker wrote: > > > sthen@ reported a bgpd SE crash to me and after inspection of the report > > > it looks like he managed to trigger a mistake in session_process_msg(). > > > When for example a NOTIFICATION message is received then the state change > > > clears the rbuf. Now normally the for loop starts over afterwards and the > > > if (p->rbuf == NULL) at the top causes the function to return. > > > In his case the peer had enough messages queued that the if (++processed > > > > MSG_PROCESS_LIMIT) limit triggered after the NOTIFICATION was processed. > > > This hits a break from the for loop and the code at the end of the > > > function adjusting the rbuf trips over the NULL rbuf pointer. > > > > > > This can be fixed in many ways, I decided to just check at the end of the > > > loop that rbuf is still valid. > > > > Thanks for looking into this. OK. > > > > > Triggering this bug is not trivial so it is hard test but the problem is > > > obvious. > > > > indeed, I don't think I have hit this one at all before. > > > > Running sessions over routes maintained by ospf6d seems a fairly > > good way to trigger a number of issues ;) > > I do run ospf6d on some systems but it seems my setup is too small to > trigger all those strange issues. :(
Somehow I hit that same SE crash twice more overnight, I don't think I've spotted that one before! I have now updated a handful of routers (some to 7.1 or 2022/06/13 i.e. pre-kroute-changes plus your diff, some to -current) all built with debug symbols (I don't usually get even function names from bgpd cores without symbols). I think my main issues come around LS_REFRESH_TIME intervals, when there's loads of churn and "ospf6d: ospf engine" can be busy for minutes at a time (not always, but very often). Don't know if that rings any bells for anyone... (I am now reminded that RTM_DESYNC isn't handled by ospf6d which probably doesn't help matters). With log verbose, the fsm log messages look like this. 2023-05-26T10:30:34.648Z ospf6d[21331]: if_fsm: event NEIGHBORCHANGE resulted in action ELECT and changing state for interface vlan760 from BCKUP to DR 2023-05-26T10:30:34.648Z ospf6d[21331]: nbr_fsm: event 1_WAY_RECEIVED resulted in action CLEAR_LISTS and changing state for neighbor ID xxxxxxxxxx.13 (vlan760) from FULL to INIT 2023-05-26T10:30:34.651Z ospf6d[21331]: if_fsm: event NEIGHBORCHANGE resulted in action ELECT and changing state for interface vlan760 from DR to DR 2023-05-26T10:30:34.660Z ospf6d[21331]: nbr_fsm: event 2_WAY_RECEIVED resulted in action EVAL and changing state for neighbor ID xxxxxxxxxx.13 (vlan760) from INIT to EXSTA 2023-05-26T10:30:34.660Z ospf6d[21331]: if_fsm: event NEIGHBORCHANGE resulted in action ELECT and changing state for interface vlan760 from DR to OTHER 2023-05-26T10:30:34.660Z ospf6d[21331]: nbr_fsm: event NEGOTIATION_DONE resulted in action SNAPSHOT and changing state for neighbor ID xxxxxxxxxx.13 (vlan760) from EXSTA to SNAP 2023-05-26T10:30:35.145Z ospf6d[21331]: nbr_fsm: event SNAPSHOT_DONE resulted in action SNAPSHOT_DONE and changing state for neighbor ID xxxxxxxxxx.13 (vlan760) from SNAP to EXCHG 2023-05-26T10:30:42.642Z ospf6d[21331]: nbr_fsm: event EXCHANGE_DONE resulted in action EXCHANGE_DONE and changing state for neighbor ID xxxxxxxxxx.13 (vlan760) from EXCHG to LOAD 2023-05-26T10:30:55.520Z ospf6d[21331]: nbr_fsm: event LOADING_DONE resulted in action NOTHING and changing state for neighbor ID xxxxxxxxxx.13 (vlan760) from LOAD to FULL 2023-05-26T10:30:56.911Z ospf6d[21331]: if_fsm: event NEIGHBORCHANGE resulted in action ELECT and changing state for interface vlan761 from DR to DR 2023-05-26T10:30:56.911Z ospf6d[21331]: nbr_fsm: event 1_WAY_RECEIVED resulted in action CLEAR_LISTS and changing state for neighbor ID xxxxxxxxxx.7 (vlan761) from FULL to INIT 2023-05-26T10:30:58.075Z ospf6d[21331]: nbr_fsm: event 2_WAY_RECEIVED resulted in action EVAL and changing state for neighbor ID xxxxxxxxxx.7 (vlan761) from INIT to EXSTA 2023-05-26T10:30:58.075Z ospf6d[21331]: if_fsm: event NEIGHBORCHANGE resulted in action ELECT and changing state for interface vlan761 from DR to DR 2023-05-26T10:30:58.142Z ospf6d[21331]: nbr_fsm: event NEGOTIATION_DONE resulted in action SNAPSHOT and changing state for neighbor ID xxxxxxxxxx.7 (vlan761) from EXSTA to SNAP 2023-05-26T10:30:58.195Z ospf6d[21331]: nbr_fsm: event SNAPSHOT_DONE resulted in action SNAPSHOT_DONE and changing state for neighbor ID xxxxxxxxxx.7 (vlan761) from SNAP to EXCHG 2023-05-26T10:31:24.853Z ospf6d[21331]: nbr_fsm: event EXCHANGE_DONE resulted in action EXCHANGE_DONE and changing state for neighbor ID xxxxxxxxxx.7 (vlan761) from EXCHG to FULL 2023-05-26T11:00:32.724Z ospf6d[21331]: if_fsm: event NEIGHBORCHANGE resulted in action ELECT and changing state for interface vlan760 from OTHER to OTHER 2023-05-26T11:00:32.725Z ospf6d[21331]: nbr_fsm: event 1_WAY_RECEIVED resulted in action CLEAR_LISTS and changing state for neighbor ID xxxxxxxxxx.13 (vlan760) from FULL to INIT 2023-05-26T11:00:32.758Z ospf6d[21331]: nbr_fsm: event 2_WAY_RECEIVED resulted in action EVAL and changing state for neighbor ID xxxxxxxxxx.13 (vlan760) from INIT to 2-WAY 2023-05-26T11:00:32.758Z ospf6d[21331]: nbr_fsm: event ADJ_OK resulted in action EVAL and changing state for neighbor ID xxxxxxxxxx.13 (vlan760) from 2-WAY to EXSTA 2023-05-26T11:00:32.758Z ospf6d[21331]: if_fsm: event NEIGHBORCHANGE resulted in action ELECT and changing state for interface vlan760 from OTHER to OTHER 2023-05-26T11:00:32.758Z ospf6d[21331]: nbr_fsm: event NEGOTIATION_DONE resulted in action SNAPSHOT and changing state for neighbor ID xxxxxxxxxx.13 (vlan760) from EXSTA to SNAP 2023-05-26T11:00:32.812Z ospf6d[21331]: nbr_fsm: event SNAPSHOT_DONE resulted in action SNAPSHOT_DONE and changing state for neighbor ID xxxxxxxxxx.13 (vlan760) from SNAP to EXCHG 2023-05-26T11:00:34.585Z ospf6d[21331]: nbr_fsm: event EXCHANGE_DONE resulted in action EXCHANGE_DONE and changing state for neighbor ID xxxxxxxxxx.13 (vlan760) from EXCHG to LOAD 2023-05-26T11:01:12.579Z ospf6d[21331]: nbr_fsm: event LOADING_DONE resulted in action NOTHING and changing state for neighbor ID xxxxxxxxxx.13 (vlan760) from LOAD to FULL 2023-05-26T11:03:42.207Z ospf6d[21331]: if_fsm: event NEIGHBORCHANGE resulted in action ELECT and changing state for interface vlan761 from DR to DR 2023-05-26T11:03:42.207Z ospf6d[21331]: nbr_fsm: event 1_WAY_RECEIVED resulted in action CLEAR_LISTS and changing state for neighbor ID xxxxxxxxxx.7 (vlan761) from FULL to INIT 2023-05-26T11:03:42.217Z ospf6d[21331]: nbr_fsm: event 2_WAY_RECEIVED resulted in action EVAL and changing state for neighbor ID xxxxxxxxxx.7 (vlan761) from INIT to EXSTA 2023-05-26T11:03:42.217Z ospf6d[21331]: if_fsm: event NEIGHBORCHANGE resulted in action ELECT and changing state for interface vlan761 from DR to DR 2023-05-26T11:03:42.285Z ospf6d[21331]: nbr_fsm: event NEGOTIATION_DONE resulted in action SNAPSHOT and changing state for neighbor ID xxxxxxxxxx.7 (vlan761) from EXSTA to SNAP 2023-05-26T11:03:42.340Z ospf6d[21331]: nbr_fsm: event SNAPSHOT_DONE resulted in action SNAPSHOT_DONE and changing state for neighbor ID xxxxxxxxxx.7 (vlan761) from SNAP to EXCHG 2023-05-26T11:03:49.011Z ospf6d[21331]: nbr_fsm: event EXCHANGE_DONE resulted in action EXCHANGE_DONE and changing state for neighbor ID xxxxxxxxxx.7 (vlan761) from EXCHG to FULL 2023-05-26T11:30:37.115Z ospf6d[21331]: if_fsm: event NEIGHBORCHANGE resulted in action ELECT and changing state for interface vlan760 from OTHER to BCKUP 2023-05-26T11:30:41.869Z ospf6d[21331]: if_fsm: event NEIGHBORCHANGE resulted in action ELECT and changing state for interface vlan760 from BCKUP to BCKUP 2023-05-26T11:30:41.873Z ospf6d[21331]: nbr_fsm: event SEQ_NUM_MISMATCH resulted in action RESET_DD and changing state for neighbor ID xxxxxxxxxx.13 (vlan760) from FULL to EXSTA 2023-05-26T11:30:47.621Z ospf6d[21331]: nbr_fsm: event NEGOTIATION_DONE resulted in action SNAPSHOT and changing state for neighbor ID xxxxxxxxxx.13 (vlan760) from EXSTA to SNAP 2023-05-26T11:30:47.675Z ospf6d[21331]: nbr_fsm: event SNAPSHOT_DONE resulted in action SNAPSHOT_DONE and changing state for neighbor ID xxxxxxxxxx.13 (vlan760) from SNAP to EXCHG 2023-05-26T11:30:49.601Z ospf6d[21331]: nbr_fsm: event EXCHANGE_DONE resulted in action EXCHANGE_DONE and changing state for neighbor ID xxxxxxxxxx.13 (vlan760) from EXCHG to LOAD 2023-05-26T11:30:55.030Z ospf6d[21331]: nbr_fsm: event LOADING_DONE resulted in action NOTHING and changing state for neighbor ID xxxxxxxxxx.13 (vlan760) from LOAD to FULL 2023-05-26T12:00:33.259Z ospf6d[21331]: if_fsm: event NEIGHBORCHANGE resulted in action ELECT and changing state for interface vlan760 from BCKUP to DR 2023-05-26T12:00:33.259Z ospf6d[21331]: nbr_fsm: event 1_WAY_RECEIVED resulted in action CLEAR_LISTS and changing state for neighbor ID xxxxxxxxxx.13 (vlan760) from FULL to INIT 2023-05-26T12:00:33.262Z ospf6d[21331]: if_fsm: event NEIGHBORCHANGE resulted in action ELECT and changing state for interface vlan760 from DR to DR 2023-05-26T12:00:33.308Z ospf6d[21331]: nbr_fsm: event 2_WAY_RECEIVED resulted in action EVAL and changing state for neighbor ID xxxxxxxxxx.13 (vlan760) from INIT to EXSTA 2023-05-26T12:00:33.309Z ospf6d[21331]: if_fsm: event NEIGHBORCHANGE resulted in action ELECT and changing state for interface vlan760 from DR to OTHER 2023-05-26T12:00:33.309Z ospf6d[21331]: nbr_fsm: event NEGOTIATION_DONE resulted in action SNAPSHOT and changing state for neighbor ID xxxxxxxxxx.13 (vlan760) from EXSTA to SNAP 2023-05-26T12:00:34.359Z ospf6d[21331]: nbr_fsm: event SNAPSHOT_DONE resulted in action SNAPSHOT_DONE and changing state for neighbor ID xxxxxxxxxx.13 (vlan760) from SNAP to EXCHG 2023-05-26T12:00:42.988Z ospf6d[21331]: nbr_fsm: event EXCHANGE_DONE resulted in action EXCHANGE_DONE and changing state for neighbor ID xxxxxxxxxx.13 (vlan760) from EXCHG to LOAD 2023-05-26T12:00:54.868Z ospf6d[21331]: nbr_fsm: event LOADING_DONE resulted in action NOTHING and changing state for neighbor ID xxxxxxxxxx.13 (vlan760) from LOAD to FULL 2023-05-26T12:30:33.093Z ospf6d[21331]: if_fsm: event NEIGHBORCHANGE resulted in action ELECT and changing state for interface vlan760 from OTHER to OTHER 2023-05-26T12:30:33.093Z ospf6d[21331]: nbr_fsm: event 1_WAY_RECEIVED resulted in action CLEAR_LISTS and changing state for neighbor ID xxxxxxxxxx.13 (vlan760) from FULL to INIT 2023-05-26T12:30:33.096Z ospf6d[21331]: if_fsm: event NEIGHBORCHANGE resulted in action ELECT and changing state for interface vlan760 from OTHER to BCKUP 2023-05-26T12:30:38.112Z ospf6d[21331]: nbr_fsm: event 2_WAY_RECEIVED resulted in action EVAL and changing state for neighbor ID xxxxxxxxxx.13 (vlan760) from INIT to EXSTA 2023-05-26T12:30:38.112Z ospf6d[21331]: if_fsm: event NEIGHBORCHANGE resulted in action ELECT and changing state for interface vlan760 from BCKUP to BCKUP 2023-05-26T12:30:38.112Z ospf6d[21331]: nbr_fsm: event NEGOTIATION_DONE resulted in action SNAPSHOT and changing state for neighbor ID xxxxxxxxxx.13 (vlan760) from EXSTA to SNAP 2023-05-26T12:30:38.113Z ospf6d[21331]: if_fsm: event NEIGHBORCHANGE resulted in action ELECT and changing state for interface vlan760 from BCKUP to BCKUP 2023-05-26T12:30:38.166Z ospf6d[21331]: nbr_fsm: event SNAPSHOT_DONE resulted in action SNAPSHOT_DONE and changing state for neighbor ID xxxxxxxxxx.13 (vlan760) from SNAP to EXCHG 2023-05-26T12:30:39.822Z ospf6d[21331]: nbr_fsm: event EXCHANGE_DONE resulted in action EXCHANGE_DONE and changing state for neighbor ID xxxxxxxxxx.13 (vlan760) from EXCHG to LOAD 2023-05-26T12:30:54.063Z ospf6d[21331]: nbr_fsm: event LOADING_DONE resulted in action NOTHING and changing state for neighbor ID xxxxxxxxxx.13 (vlan760) from LOAD to FULL 2023-05-26T12:31:48.241Z ospf6d[21331]: if_fsm: event NEIGHBORCHANGE resulted in action ELECT and changing state for interface vlan760 from BCKUP to DR 2023-05-26T12:31:48.241Z ospf6d[21331]: nbr_fsm: event 1_WAY_RECEIVED resulted in action CLEAR_LISTS and changing state for neighbor ID xxxxxxxxxx.13 (vlan760) from FULL to INIT 2023-05-26T12:31:48.242Z ospf6d[21331]: if_fsm: event NEIGHBORCHANGE resulted in action ELECT and changing state for interface vlan760 from DR to DR 2023-05-26T12:31:48.251Z ospf6d[21331]: nbr_fsm: event 2_WAY_RECEIVED resulted in action EVAL and changing state for neighbor ID xxxxxxxxxx.13 (vlan760) from INIT to EXSTA 2023-05-26T12:31:48.251Z ospf6d[21331]: if_fsm: event NEIGHBORCHANGE resulted in action ELECT and changing state for interface vlan760 from DR to OTHER 2023-05-26T12:31:48.251Z ospf6d[21331]: nbr_fsm: event NEGOTIATION_DONE resulted in action SNAPSHOT and changing state for neighbor ID xxxxxxxxxx.13 (vlan760) from EXSTA to SNAP 2023-05-26T12:31:48.305Z ospf6d[21331]: nbr_fsm: event SNAPSHOT_DONE resulted in action SNAPSHOT_DONE and changing state for neighbor ID xxxxxxxxxx.13 (vlan760) from SNAP to EXCHG 2023-05-26T12:31:50.920Z ospf6d[21331]: nbr_fsm: event EXCHANGE_DONE resulted in action EXCHANGE_DONE and changing state for neighbor ID xxxxxxxxxx.13 (vlan760) from EXCHG to LOAD 2023-05-26T12:31:52.620Z ospf6d[21331]: nbr_fsm: event LOADING_DONE resulted in action NOTHING and changing state for neighbor ID xxxxxxxxxx.13 (vlan760) from LOAD to FULL