On 4/22/2016 12:24 AM, Joakim Tjernlund wrote:
On Thu, 2016-04-21 at 16:22 -0500, Jafar Al-Gharaibeh wrote:
  ospfd keeps a list of neighbor routers for each configured interface. This
  list is indexed using the neighbor router id in case of point-to-point and
  virtual link types, otherwise the list is indexed using the neighbor's
  source IP (RFC 2328, page 96). The router adds itself as a "pseudo" neighbor
  on each link, and also keeps a pointer called (nbr_self) to the neighbor
  structure. This takes place when the interface is first configured. Currently
  ospfd adds this pseudo neighbor before the link parameters are fully 
configure,
  including whether the link type is point-to-point or virtual link. This causes
  the pseudo neighbor to be always indexed using the source IP address 
regardless
  of th link type. For point-to-point and virtual links, this causes the lookup
  for the pseudo neighbor to always fail because the lookup is done using the
  router id whereas the neighbor was added using its source IP address.
  This becomes really problematic if there is a state change that requires a
  rebuild of nbr_self, changing the router id for example. When resetting
  nbr_self, the router first tries to remove the pseudo neighbor form its
  neighbor list on each link by looking it up and resetting any references to it
  before freeing the neighbor structure. since the lookup fails to retrieve any
  references in the case of point-to-point and virtual links the neighbor
  structure is freed leaving dangling references to it. Any access to the
  neighbor list after that is bound to stumble over this dangling pointer
  causing ospfd to crash.

Signed-off-by: Jafar Al-Gharaibeh <[email protected]>
---
  ospfd/ospf_interface.c |  8 +++++---
  ospfd/ospf_neighbor.c  | 33 ++++++++++++++++++++++++++++++++-
  ospfd/ospfd.c          |  6 +++---
  3 files changed, 40 insertions(+), 7 deletions(-)

diff --git a/ospfd/ospf_interface.c b/ospfd/ospf_interface.c
index f41c4f1..a221851 100644
--- a/ospfd/ospf_interface.c
+++ b/ospfd/ospf_interface.c
@@ -232,8 +232,8 @@ ospf_if_new (struct ospf *ospf, struct interface *ifp, 
struct prefix *p)
    /* Set default values. */
    ospf_if_reset_variables (oi);
- /* Add pseudo neighbor. */
-  oi->nbr_self = ospf_nbr_new (oi);
+  /* Set pseudo neighbor to Null */
+  oi->nbr_self = NULL;
oi->ls_upd_queue = route_table_init ();
    oi->t_ls_upd_event = NULL;
@@ -910,7 +910,9 @@ ospf_vl_new (struct ospf *ospf, struct ospf_vl_data 
*vl_data)
    if (IS_DEBUG_OSPF_EVENT)
      zlog_debug ("ospf_vl_new(): set associated area to the backbone");
- ospf_nbr_add_self (voi);
+  /* Add pseudo neighbor. */
+  ospf_nbr_self_reset (voi);
+
    ospf_area_add_if (voi->area, voi);
ospf_if_stream_set (voi);
diff --git a/ospfd/ospf_neighbor.c b/ospfd/ospf_neighbor.c
index 23a6417..4d87d27 100644
--- a/ospfd/ospf_neighbor.c
+++ b/ospfd/ospf_neighbor.c
@@ -181,6 +181,35 @@ ospf_nbr_delete (struct ospf_neighbor *nbr)
route_unlock_node (rn);
      }
+  else
+    {
+      /*
+       * This neighbor was not found, but before we move on and
+       * free the neighbor structre, make sure that it was not
+       * indexed incorrectly and ended up in the "worng" place
+       */
Is this part still needed? You seem to have fixed the underlying problem now so
this should not happen any more?

  Yes, still needed.

If it still is needed I think an error log msg is appropriate so the remaining 
errors
can be eliminated.

If you have seen my earlier patch, that was what I did, an assert to capture such cases. While my earlier fix was mainly targeting nbr_self and initial setup of p2p links and also changing the router id I suspected that there are other cases that get us into the same bad state, and in fact I did find more scenarios that causes the same crash, like flipping the link type one way or another anytime during runtime.

After analyzing the code and the scenarios of how that happens, I could see different ways to fix the problem, most of them are more intrusive than I'd like. I decided to stop treating the wrong index as a bug, but as a change of state, so instead of finding the neighbor directly (first part of the if statement) and freeing it, we try to find it using a different index and deal with it and free it int the same we we do in the first part of the if statement (no logging there). We just make sure we don't free a neighbor with live pointers.

If you think this still should be treated as a bug and reported I'd be happy to log it, maybe under debug ospf nsm?

I will probably revisit this problem in the [not near] future and test more use case scenarios and probably come up with a new solution, but for now this covered all tested cases on my end, and worked really well.
ospfd doesn't crash anymore because of it.

Cheers,
Jafar


   Jocke
_______________________________________________
Quagga-dev mailing list
[email protected]
https://lists.quagga.net/mailman/listinfo/quagga-dev



_______________________________________________
Quagga-dev mailing list
[email protected]
https://lists.quagga.net/mailman/listinfo/quagga-dev

Reply via email to