Hello,

ospfd can't handle changing router-ids.

Consider the following setup with two machines, border3 and superm:

border3:
--------
border3# cat /etc/ospfd.conf
router-id 10.12.95.250
redistribute static set type 2
redistribute connected set type 2
redistribute 10.12.95.250/32 set type 2
redistribute default set { metric 300 type 2 }
area 0.0.0.0 {
        interface em0
}

border3# ifconfig lo0
[...]
        inet 10.12.95.250 netmask 0xffffffff

border3# ifconfig em0
[...]
        inet 10.12.95.162 netmask 0xffffffe0 broadcast 10.12.95.191

superm:
-------
superm# cat /etc/ospfd.conf
router-id 10.12.80.1
area 0.0.0.0 {
        interface re0
}
superm# ifconfig re0
[...]
        inet 10.12.95.165 netmask 0xffffffe0 broadcast 10.12.95.191

superm# ifconfig em0
[...]
        inet 10.12.80.1 netmask 0xffffff00 broadcast 10.12.80.255

Starting ospfd on both machines results in:

border3# ospfctl sh nei
ID              Pri State        DeadTime Address         Iface     Uptime
10.12.80.1      1   FULL/BCKUP   00:00:36 10.12.95.165    em0       00:08:52

superm# ospfctl sh nei
ID              Pri State        DeadTime Address         Iface     Uptime
10.12.95.250    1   FULL/DR      00:00:30 10.12.95.162    re0       00:09:47

Changing the router-id on superm from 10.12.80.1 to 10.12.95.165 and
restarting ospfd on superm results in:

border3# ospfctl sh nei
ID              Pri State        DeadTime Address         Iface     Uptime
10.12.80.1      1   EXSTA/OTHER  00:00:35 10.12.95.165    em0       -

border3# tail /var/log/daemon
Jan  6 13:08:43 border3 ospfd[21043]: nbr_adj_timer: failed to form adjacency with 10.12.80.1

superm# ospfctl sh nei
ID              Pri State        DeadTime Address         Iface     Uptime
10.12.95.250    1   INIT/OTHER   00:00:37 10.12.95.162    re0       -

Note that border3 still tries to talk to the router under its old id,
10.12.80.1.

Restarting ospfd on border3 (where no config changes were done!) is a
workaround.

The problem is in hello.c, which looks up the neighbor by source IP
(which didn't change) instead of by router-id. The patch below appears
to fix the problem. (I couldn't figure out why hello.c matches on the
source IP in this case rather than on the router-id, as is already done
in other places.)

Index: hello.c
===================================================================
RCS file: /cvs/src/usr.sbin/ospfd/hello.c,v
retrieving revision 1.15
diff -u -r1.15 hello.c
--- hello.c     31 Jan 2009 08:55:00 -0000      1.15
+++ hello.c     6 Jan 2010 13:46:30 -0000
@@ -175,11 +175,11 @@
        case IF_TYPE_BROADCAST:
        case IF_TYPE_NBMA:
        case IF_TYPE_POINTOMULTIPOINT:
-               /* match src IP */
+               /* match router-id */
                LIST_FOREACH(nbr, &iface->nbr_list, entry) {
                        if (nbr == iface->self)
                                continue;
-                       if (nbr->addr.s_addr == src.s_addr)
+                       if (nbr->id.s_addr == rtr_id)
                                break;
                }
                break;
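
To illustrate the difference between the two lookups, here is a minimal,
self-contained sketch. It is not ospfd code: struct nbr is reduced to two
fields, and ip4(), find_by_src() and find_by_id() are hypothetical helpers
invented for this example. The assumption is only that each neighbor entry
stores both the router-id and the source address it was learned from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified neighbor record; not ospfd's struct nbr. */
struct nbr {
	uint32_t id;	/* router-id announced in the Hello header */
	uint32_t addr;	/* source IP address of the Hello packet */
};

/* Build an IPv4 address (host byte order) from its four octets. */
static uint32_t
ip4(uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
	assert(a < 256 && b < 256 && c < 256 && d < 256);
	return (a << 24) | (b << 16) | (c << 8) | d;
}

/* Unpatched behaviour: identify the neighbor by the packet's source IP. */
static struct nbr *
find_by_src(struct nbr *tbl, size_t n, uint32_t src)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].addr == src)
			return &tbl[i];
	return NULL;
}

/* Patched behaviour: identify the neighbor by its router-id. */
static struct nbr *
find_by_id(struct nbr *tbl, size_t n, uint32_t rtr_id)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].id == rtr_id)
			return &tbl[i];
	return NULL;
}
```

With a single entry { id 10.12.80.1, addr 10.12.95.165 } and a Hello
arriving from 10.12.95.165 that now announces router-id 10.12.95.165,
find_by_src() returns the stale entry (whose id no longer matches the
sender), while find_by_id() returns NULL, so the sender would be treated
as a new neighbor.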

Stopping both ospfds, changing the router-id on superm back to
10.12.80.1 and starting both ospfds (superm: -current ospfd, border3
ospfd with the patch applied) results in:

[r...@border3.ffm2-test.hsgate.de:/usr/src/usr.sbin/ospfd]# ospfctl sh nei
ID              Pri State        DeadTime Address         Iface     Uptime
10.12.80.1      1   FULL/BCKUP   00:00:31 10.12.95.165    em0       00:00:36

[r...@superm.build.hsgate.de:/etc]# ospfctl sh nei
ID              Pri State        DeadTime Address         Iface     Uptime
10.12.95.250    1   FULL/DR      00:00:38 10.12.95.162    re0       00:00:50

If the router-id on superm is now changed to 10.12.95.165 it looks
like this:

border3# ospfctl sh nei
ID              Pri State        DeadTime Address         Iface     Uptime
10.12.95.165    1   FULL/BCKUP   00:00:33 10.12.95.165    em0       00:03:47
10.12.80.1      1   DOWN/OTHER   00:03:17 10.12.95.165    em0       -

superm# ospfctl sh nei
ID              Pri State        DeadTime Address         Iface     Uptime
10.12.95.250    1   FULL/DR      00:00:35 10.12.95.162    re0       00:04:14

The DOWN/OTHER line will disappear after 24 hours.

--------------------------------------------------

While debugging this I observed another problem which I currently
cannot reproduce completely:

Changing the router-id in the reverse order (starting with
10.12.95.165 and then changing it to 10.12.80.1) results in the
following log entry (with the hello.c patch applied):

Jan  6 13:28:36 border3 ospfd[4393]: nbr_fsm: neighbor ID 10.12.95.165, event ADJ_TIMEOUT not expected in state INIT

The state changes to DOWN/OTHER so the line will disappear after 24
hours.

Somehow I managed to get the neighbor stuck in INIT/OTHER(?), with ospfd
logging "event ADJ_TIMEOUT not expected in state INIT" every n minutes.
I can no longer reproduce this case, but I did write a patch for it,
too:

Index: neighbor.c
===================================================================
RCS file: /cvs/src/usr.sbin/ospfd/neighbor.c,v
retrieving revision 1.39
diff -u -r1.39 neighbor.c
--- neighbor.c  30 Sep 2009 14:39:07 -0000      1.39
+++ neighbor.c  6 Jan 2010 13:46:30 -0000
@@ -62,6 +62,7 @@
     {NBR_STA_ACTIVE,   NBR_EVT_HELLO_RCVD,     NBR_ACT_RST_ITIMER,     0},
     {NBR_STA_BIDIR,    NBR_EVT_2_WAY_RCVD,     NBR_ACT_NOTHING,        0},
     {NBR_STA_INIT,     NBR_EVT_1_WAY_RCVD,     NBR_ACT_NOTHING,        0},
+    {NBR_STA_INIT,     NBR_EVT_ADJTMOUT,       NBR_ACT_DEL,            NBR_STA_DOWN},
     {NBR_STA_DOWN,     NBR_EVT_HELLO_RCVD,     NBR_ACT_STRT_ITIMER,    NBR_STA_INIT},
     {NBR_STA_ATTEMPT,  NBR_EVT_HELLO_RCVD,     NBR_ACT_RST_ITIMER,     NBR_STA_INIT},
     {NBR_STA_INIT,     NBR_EVT_2_WAY_RCVD,     NBR_ACT_EVAL,           0},
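
As a rough illustration of what the added row changes, here is a
stripped-down, table-driven state machine. It is only a sketch under
assumptions: the enum values, struct fsm_row and fsm_step() are
hypothetical names, the action column is left out, and I assume that
states are bit masks and that a new state of 0 means "no state change".

```c
#include <assert.h>
#include <stddef.h>

/* States as bit flags so that one row could cover several states. */
enum nbr_state {
	STA_DOWN = 0x01,
	STA_INIT = 0x02
};

enum nbr_event { EVT_HELLO_RCVD, EVT_1_WAY_RCVD, EVT_ADJTMOUT };

struct fsm_row {
	int		state;		/* mask of states the row applies to */
	enum nbr_event	event;
	int		new_state;	/* 0: stay in the current state */
};

/* Table without a row for (INIT, ADJTMOUT). */
static const struct fsm_row tbl_before[] = {
	{ STA_INIT, EVT_1_WAY_RCVD, 0 },
	{ STA_DOWN, EVT_HELLO_RCVD, STA_INIT },
};

/* Same table with the extra row, mirroring the patch. */
static const struct fsm_row tbl_after[] = {
	{ STA_INIT, EVT_1_WAY_RCVD, 0 },
	{ STA_INIT, EVT_ADJTMOUT,   STA_DOWN },
	{ STA_DOWN, EVT_HELLO_RCVD, STA_INIT },
};

/* Return the next state, or -1 if no row handles (state, event). */
static int
fsm_step(const struct fsm_row *tbl, size_t n, int state, enum nbr_event ev)
{
	assert(tbl != NULL);
	for (size_t i = 0; i < n; i++) {
		if ((tbl[i].state & state) && tbl[i].event == ev)
			return tbl[i].new_state ? tbl[i].new_state : state;
	}
	return -1;	/* the "event not expected in state" case */
}
```

In this sketch, fsm_step(tbl_before, 2, STA_INIT, EVT_ADJTMOUT) returns -1,
which corresponds to the "not expected" log message, whereas the same call
on tbl_after with 3 rows returns STA_DOWN.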



--------------------------------------------------

The canonical case where this becomes a problem is forgetting to set the
router-id in the config, so that ospfd picks the smallest configured
IP. This gets much more interesting if the smallest IP happens to be
configured on a carp interface ;)

Regards,
Florian
