On 2023/05/25 16:15, Claudio Jeker wrote:
> On Thu, May 25, 2023 at 02:20:37PM +0100, Stuart Henderson wrote:
> > On 2023/05/25 15:06, Claudio Jeker wrote:
> > > sthen@ reported a bgpd SE crash to me and after inspection of the report
> > > it looks like he managed to trigger a mistake in session_process_msg().
> > > When for example a NOTIFICATION message is received then the state change
> > > clears the rbuf. Now normally the for loop starts over afterwards and the
> > > if (p->rbuf == NULL) at the top causes the function to return.
> > > In his case the peer had enough messages queued that the if (++processed >
> > > MSG_PROCESS_LIMIT) limit triggered after the NOTIFICATION was processed.
> > > This hits a break from the for loop and the code at the end of the
> > > function adjusting the rbuf trips over the NULL rbuf pointer.
> > > 
> > > This can be fixed in many ways, I decided to just check at the end of the
> > > loop that rbuf is still valid.
> > 
> > Thanks for looking into this. OK.
> > 
> > > Triggering this bug is not trivial so it is hard test but the problem is
> > > obvious.
> > 
> > indeed, I don't think I have hit this one at all before.
> > 
> > Running sessions over routes maintained by ospf6d seems a fairly
> > good way to trigger a number of issues ;)
> 
> I do run ospf6d on some systems but it seems my setup is too small to
> trigger all those strange issues. :(

Somehow I hit that same SE crash twice more overnight, I don't think
I've spotted that one before! I have now updated a handful of routers
(some to 7.1 or 2022/06/13 i.e. pre-kroute-changes plus your diff,
some to -current) all built with debug symbols (I don't usually get
even function names from bgpd cores without symbols).

I think my main issues come around LS_REFRESH_TIME intervals, when
there's loads of churn and "ospf6d: ospf engine" can be busy for
minutes at a time (not always, but very often). Don't know if that rings
any bells for anyone... (I am now reminded that RTM_DESYNC isn't handled
by ospf6d which probably doesn't help matters).

With log verbose, the fsm log messages look like this.

2023-05-26T10:30:34.648Z  ospf6d[21331]: if_fsm: event NEIGHBORCHANGE resulted 
in action ELECT and changing state for interface vlan760 from BCKUP to DR
2023-05-26T10:30:34.648Z  ospf6d[21331]: nbr_fsm: event 1_WAY_RECEIVED resulted 
in action CLEAR_LISTS and changing state for neighbor ID xxxxxxxxxx.13 
(vlan760) from FULL to INIT
2023-05-26T10:30:34.651Z  ospf6d[21331]: if_fsm: event NEIGHBORCHANGE resulted 
in action ELECT and changing state for interface vlan760 from DR to DR
2023-05-26T10:30:34.660Z  ospf6d[21331]: nbr_fsm: event 2_WAY_RECEIVED resulted 
in action EVAL and changing state for neighbor ID xxxxxxxxxx.13 (vlan760) from 
INIT to EXSTA
2023-05-26T10:30:34.660Z  ospf6d[21331]: if_fsm: event NEIGHBORCHANGE resulted 
in action ELECT and changing state for interface vlan760 from DR to OTHER
2023-05-26T10:30:34.660Z  ospf6d[21331]: nbr_fsm: event NEGOTIATION_DONE 
resulted in action SNAPSHOT and changing state for neighbor ID xxxxxxxxxx.13 
(vlan760) from EXSTA to SNAP
2023-05-26T10:30:35.145Z  ospf6d[21331]: nbr_fsm: event SNAPSHOT_DONE resulted 
in action SNAPSHOT_DONE and changing state for neighbor ID xxxxxxxxxx.13 
(vlan760) from SNAP to EXCHG
2023-05-26T10:30:42.642Z  ospf6d[21331]: nbr_fsm: event EXCHANGE_DONE resulted 
in action EXCHANGE_DONE and changing state for neighbor ID xxxxxxxxxx.13 
(vlan760) from EXCHG to LOAD
2023-05-26T10:30:55.520Z  ospf6d[21331]: nbr_fsm: event LOADING_DONE resulted 
in action NOTHING and changing state for neighbor ID xxxxxxxxxx.13 (vlan760) 
from LOAD to FULL
2023-05-26T10:30:56.911Z  ospf6d[21331]: if_fsm: event NEIGHBORCHANGE resulted 
in action ELECT and changing state for interface vlan761 from DR to DR
2023-05-26T10:30:56.911Z  ospf6d[21331]: nbr_fsm: event 1_WAY_RECEIVED resulted 
in action CLEAR_LISTS and changing state for neighbor ID xxxxxxxxxx.7 (vlan761) 
from FULL to INIT
2023-05-26T10:30:58.075Z  ospf6d[21331]: nbr_fsm: event 2_WAY_RECEIVED resulted 
in action EVAL and changing state for neighbor ID xxxxxxxxxx.7 (vlan761) from 
INIT to EXSTA
2023-05-26T10:30:58.075Z  ospf6d[21331]: if_fsm: event NEIGHBORCHANGE resulted 
in action ELECT and changing state for interface vlan761 from DR to DR
2023-05-26T10:30:58.142Z  ospf6d[21331]: nbr_fsm: event NEGOTIATION_DONE 
resulted in action SNAPSHOT and changing state for neighbor ID xxxxxxxxxx.7 
(vlan761) from EXSTA to SNAP
2023-05-26T10:30:58.195Z  ospf6d[21331]: nbr_fsm: event SNAPSHOT_DONE resulted 
in action SNAPSHOT_DONE and changing state for neighbor ID xxxxxxxxxx.7 
(vlan761) from SNAP to EXCHG
2023-05-26T10:31:24.853Z  ospf6d[21331]: nbr_fsm: event EXCHANGE_DONE resulted 
in action EXCHANGE_DONE and changing state for neighbor ID xxxxxxxxxx.7 
(vlan761) from EXCHG to FULL
2023-05-26T11:00:32.724Z  ospf6d[21331]: if_fsm: event NEIGHBORCHANGE resulted 
in action ELECT and changing state for interface vlan760 from OTHER to OTHER
2023-05-26T11:00:32.725Z  ospf6d[21331]: nbr_fsm: event 1_WAY_RECEIVED resulted 
in action CLEAR_LISTS and changing state for neighbor ID xxxxxxxxxx.13 
(vlan760) from FULL to INIT
2023-05-26T11:00:32.758Z  ospf6d[21331]: nbr_fsm: event 2_WAY_RECEIVED resulted 
in action EVAL and changing state for neighbor ID xxxxxxxxxx.13 (vlan760) from 
INIT to 2-WAY
2023-05-26T11:00:32.758Z  ospf6d[21331]: nbr_fsm: event ADJ_OK resulted in 
action EVAL and changing state for neighbor ID xxxxxxxxxx.13 (vlan760) from 
2-WAY to EXSTA
2023-05-26T11:00:32.758Z  ospf6d[21331]: if_fsm: event NEIGHBORCHANGE resulted 
in action ELECT and changing state for interface vlan760 from OTHER to OTHER
2023-05-26T11:00:32.758Z  ospf6d[21331]: nbr_fsm: event NEGOTIATION_DONE 
resulted in action SNAPSHOT and changing state for neighbor ID xxxxxxxxxx.13 
(vlan760) from EXSTA to SNAP
2023-05-26T11:00:32.812Z  ospf6d[21331]: nbr_fsm: event SNAPSHOT_DONE resulted 
in action SNAPSHOT_DONE and changing state for neighbor ID xxxxxxxxxx.13 
(vlan760) from SNAP to EXCHG
2023-05-26T11:00:34.585Z  ospf6d[21331]: nbr_fsm: event EXCHANGE_DONE resulted 
in action EXCHANGE_DONE and changing state for neighbor ID xxxxxxxxxx.13 
(vlan760) from EXCHG to LOAD
2023-05-26T11:01:12.579Z  ospf6d[21331]: nbr_fsm: event LOADING_DONE resulted 
in action NOTHING and changing state for neighbor ID xxxxxxxxxx.13 (vlan760) 
from LOAD to FULL
2023-05-26T11:03:42.207Z  ospf6d[21331]: if_fsm: event NEIGHBORCHANGE resulted 
in action ELECT and changing state for interface vlan761 from DR to DR
2023-05-26T11:03:42.207Z  ospf6d[21331]: nbr_fsm: event 1_WAY_RECEIVED resulted 
in action CLEAR_LISTS and changing state for neighbor ID xxxxxxxxxx.7 (vlan761) 
from FULL to INIT
2023-05-26T11:03:42.217Z  ospf6d[21331]: nbr_fsm: event 2_WAY_RECEIVED resulted 
in action EVAL and changing state for neighbor ID xxxxxxxxxx.7 (vlan761) from 
INIT to EXSTA
2023-05-26T11:03:42.217Z  ospf6d[21331]: if_fsm: event NEIGHBORCHANGE resulted 
in action ELECT and changing state for interface vlan761 from DR to DR
2023-05-26T11:03:42.285Z  ospf6d[21331]: nbr_fsm: event NEGOTIATION_DONE 
resulted in action SNAPSHOT and changing state for neighbor ID xxxxxxxxxx.7 
(vlan761) from EXSTA to SNAP
2023-05-26T11:03:42.340Z  ospf6d[21331]: nbr_fsm: event SNAPSHOT_DONE resulted 
in action SNAPSHOT_DONE and changing state for neighbor ID xxxxxxxxxx.7 
(vlan761) from SNAP to EXCHG
2023-05-26T11:03:49.011Z  ospf6d[21331]: nbr_fsm: event EXCHANGE_DONE resulted 
in action EXCHANGE_DONE and changing state for neighbor ID xxxxxxxxxx.7 
(vlan761) from EXCHG to FULL
2023-05-26T11:30:37.115Z  ospf6d[21331]: if_fsm: event NEIGHBORCHANGE resulted 
in action ELECT and changing state for interface vlan760 from OTHER to BCKUP
2023-05-26T11:30:41.869Z  ospf6d[21331]: if_fsm: event NEIGHBORCHANGE resulted 
in action ELECT and changing state for interface vlan760 from BCKUP to BCKUP
2023-05-26T11:30:41.873Z  ospf6d[21331]: nbr_fsm: event SEQ_NUM_MISMATCH 
resulted in action RESET_DD and changing state for neighbor ID xxxxxxxxxx.13 
(vlan760) from FULL to EXSTA
2023-05-26T11:30:47.621Z  ospf6d[21331]: nbr_fsm: event NEGOTIATION_DONE 
resulted in action SNAPSHOT and changing state for neighbor ID xxxxxxxxxx.13 
(vlan760) from EXSTA to SNAP
2023-05-26T11:30:47.675Z  ospf6d[21331]: nbr_fsm: event SNAPSHOT_DONE resulted 
in action SNAPSHOT_DONE and changing state for neighbor ID xxxxxxxxxx.13 
(vlan760) from SNAP to EXCHG
2023-05-26T11:30:49.601Z  ospf6d[21331]: nbr_fsm: event EXCHANGE_DONE resulted 
in action EXCHANGE_DONE and changing state for neighbor ID xxxxxxxxxx.13 
(vlan760) from EXCHG to LOAD
2023-05-26T11:30:55.030Z  ospf6d[21331]: nbr_fsm: event LOADING_DONE resulted 
in action NOTHING and changing state for neighbor ID xxxxxxxxxx.13 (vlan760) 
from LOAD to FULL
2023-05-26T12:00:33.259Z  ospf6d[21331]: if_fsm: event NEIGHBORCHANGE resulted 
in action ELECT and changing state for interface vlan760 from BCKUP to DR
2023-05-26T12:00:33.259Z  ospf6d[21331]: nbr_fsm: event 1_WAY_RECEIVED resulted 
in action CLEAR_LISTS and changing state for neighbor ID xxxxxxxxxx.13 
(vlan760) from FULL to INIT
2023-05-26T12:00:33.262Z  ospf6d[21331]: if_fsm: event NEIGHBORCHANGE resulted 
in action ELECT and changing state for interface vlan760 from DR to DR
2023-05-26T12:00:33.308Z  ospf6d[21331]: nbr_fsm: event 2_WAY_RECEIVED resulted 
in action EVAL and changing state for neighbor ID xxxxxxxxxx.13 (vlan760) from 
INIT to EXSTA
2023-05-26T12:00:33.309Z  ospf6d[21331]: if_fsm: event NEIGHBORCHANGE resulted 
in action ELECT and changing state for interface vlan760 from DR to OTHER
2023-05-26T12:00:33.309Z  ospf6d[21331]: nbr_fsm: event NEGOTIATION_DONE 
resulted in action SNAPSHOT and changing state for neighbor ID xxxxxxxxxx.13 
(vlan760) from EXSTA to SNAP
2023-05-26T12:00:34.359Z  ospf6d[21331]: nbr_fsm: event SNAPSHOT_DONE resulted 
in action SNAPSHOT_DONE and changing state for neighbor ID xxxxxxxxxx.13 
(vlan760) from SNAP to EXCHG
2023-05-26T12:00:42.988Z  ospf6d[21331]: nbr_fsm: event EXCHANGE_DONE resulted 
in action EXCHANGE_DONE and changing state for neighbor ID xxxxxxxxxx.13 
(vlan760) from EXCHG to LOAD
2023-05-26T12:00:54.868Z  ospf6d[21331]: nbr_fsm: event LOADING_DONE resulted 
in action NOTHING and changing state for neighbor ID xxxxxxxxxx.13 (vlan760) 
from LOAD to FULL
2023-05-26T12:30:33.093Z  ospf6d[21331]: if_fsm: event NEIGHBORCHANGE resulted 
in action ELECT and changing state for interface vlan760 from OTHER to OTHER
2023-05-26T12:30:33.093Z  ospf6d[21331]: nbr_fsm: event 1_WAY_RECEIVED resulted 
in action CLEAR_LISTS and changing state for neighbor ID xxxxxxxxxx.13 
(vlan760) from FULL to INIT
2023-05-26T12:30:33.096Z  ospf6d[21331]: if_fsm: event NEIGHBORCHANGE resulted 
in action ELECT and changing state for interface vlan760 from OTHER to BCKUP
2023-05-26T12:30:38.112Z  ospf6d[21331]: nbr_fsm: event 2_WAY_RECEIVED resulted 
in action EVAL and changing state for neighbor ID xxxxxxxxxx.13 (vlan760) from 
INIT to EXSTA
2023-05-26T12:30:38.112Z  ospf6d[21331]: if_fsm: event NEIGHBORCHANGE resulted 
in action ELECT and changing state for interface vlan760 from BCKUP to BCKUP
2023-05-26T12:30:38.112Z  ospf6d[21331]: nbr_fsm: event NEGOTIATION_DONE 
resulted in action SNAPSHOT and changing state for neighbor ID xxxxxxxxxx.13 
(vlan760) from EXSTA to SNAP
2023-05-26T12:30:38.113Z  ospf6d[21331]: if_fsm: event NEIGHBORCHANGE resulted 
in action ELECT and changing state for interface vlan760 from BCKUP to BCKUP
2023-05-26T12:30:38.166Z  ospf6d[21331]: nbr_fsm: event SNAPSHOT_DONE resulted 
in action SNAPSHOT_DONE and changing state for neighbor ID xxxxxxxxxx.13 
(vlan760) from SNAP to EXCHG
2023-05-26T12:30:39.822Z  ospf6d[21331]: nbr_fsm: event EXCHANGE_DONE resulted 
in action EXCHANGE_DONE and changing state for neighbor ID xxxxxxxxxx.13 
(vlan760) from EXCHG to LOAD
2023-05-26T12:30:54.063Z  ospf6d[21331]: nbr_fsm: event LOADING_DONE resulted 
in action NOTHING and changing state for neighbor ID xxxxxxxxxx.13 (vlan760) 
from LOAD to FULL
2023-05-26T12:31:48.241Z  ospf6d[21331]: if_fsm: event NEIGHBORCHANGE resulted 
in action ELECT and changing state for interface vlan760 from BCKUP to DR
2023-05-26T12:31:48.241Z  ospf6d[21331]: nbr_fsm: event 1_WAY_RECEIVED resulted 
in action CLEAR_LISTS and changing state for neighbor ID xxxxxxxxxx.13 
(vlan760) from FULL to INIT
2023-05-26T12:31:48.242Z  ospf6d[21331]: if_fsm: event NEIGHBORCHANGE resulted 
in action ELECT and changing state for interface vlan760 from DR to DR
2023-05-26T12:31:48.251Z  ospf6d[21331]: nbr_fsm: event 2_WAY_RECEIVED resulted 
in action EVAL and changing state for neighbor ID xxxxxxxxxx.13 (vlan760) from 
INIT to EXSTA
2023-05-26T12:31:48.251Z  ospf6d[21331]: if_fsm: event NEIGHBORCHANGE resulted 
in action ELECT and changing state for interface vlan760 from DR to OTHER
2023-05-26T12:31:48.251Z  ospf6d[21331]: nbr_fsm: event NEGOTIATION_DONE 
resulted in action SNAPSHOT and changing state for neighbor ID xxxxxxxxxx.13 
(vlan760) from EXSTA to SNAP
2023-05-26T12:31:48.305Z  ospf6d[21331]: nbr_fsm: event SNAPSHOT_DONE resulted 
in action SNAPSHOT_DONE and changing state for neighbor ID xxxxxxxxxx.13 
(vlan760) from SNAP to EXCHG
2023-05-26T12:31:50.920Z  ospf6d[21331]: nbr_fsm: event EXCHANGE_DONE resulted 
in action EXCHANGE_DONE and changing state for neighbor ID xxxxxxxxxx.13 
(vlan760) from EXCHG to LOAD
2023-05-26T12:31:52.620Z  ospf6d[21331]: nbr_fsm: event LOADING_DONE resulted 
in action NOTHING and changing state for neighbor ID xxxxxxxxxx.13 (vlan760) 
from LOAD to FULL

Reply via email to