Li Zhao wrote:
I added a new protocol and I can start it in CLI by command "create protocol XXX", but 
the rtrmgr crashed after command "delete protocol XXX".
I can also easily reproduce the exactlt same crash via the following steps:

0. I am running xorp processes on an embedded system.
1. start rtrmgr from linux shell on the system;
2. manually start xorp_static_routes from linux shell. This static will hijack 
the xrl channels to rtrmgr;
3. use cli command "create protocol static" to start a second 
xorp_static_routes.
4. use cli command "delete protocol static" to stop static. both 
xorp_static_routes were terminated. depended process like fea, rib and policy were also 
terminated. rtrmgr crash.
Ok, the crash is because if you do a pop_front() on an empty list, it's going to crash.

I'm not sure why the list is empty here. Seems to indicate task-manager logic is busted with regard to task list management and/or callbacks are being called against a wrong
task-manager.

Do you actually need to do this operation for your project? If so, you probably will want to investigate task-manager logic in detail to figure out why this is happening.

The attached patch fixes the crash, but the underlying bug persists. Most of the patch is debugging code, but I'm leaving it in my tree because it will help next time we hit a similar problem.

Thanks,
Ben

--
Ben Greear <[email protected]> Candela Technologies Inc http://www.candelatech.com


diff --git a/libxorp/SConscript b/libxorp/SConscript
index 7043ed0..0a7f6c9 100644
--- a/libxorp/SConscript
+++ b/libxorp/SConscript
@@ -47,6 +47,7 @@ sources = [
 	# C++ files
 	'asyncio.cc',
 	'buffered_asyncio.cc',
+        'bug_catcher.cc',
 	'c_format.cc',
 	'callback.cc',
 	'clock.cc',
diff --git a/libxorp/bug_catcher.cc b/libxorp/bug_catcher.cc
new file mode 100644
index 0000000..bb853da
--- /dev/null
+++ b/libxorp/bug_catcher.cc
@@ -0,0 +1,6 @@
+
+#include "bug_catcher.hh"
+
+
+unsigned int BugCatcher::_cnt = 0;
+
diff --git a/libxorp/bug_catcher.hh b/libxorp/bug_catcher.hh
new file mode 100644
index 0000000..a206db7
--- /dev/null
+++ b/libxorp/bug_catcher.hh
@@ -0,0 +1,24 @@
+#ifndef __XORP_BUG_CATCHER_INC__
+#define __XORP_BUG_CATCHER_INC__
+
+#include <assert.h>
+
+class BugCatcher {
+private:
+   unsigned int magic;
+   static unsigned int _cnt;
+
+public:
+#define X_BUG_CATCHER_MAGIC 0x1234543
+   BugCatcher() { magic = X_BUG_CATCHER_MAGIC; _cnt++; }
+   BugCatcher(const BugCatcher& rhs) { magic = rhs.magic; _cnt++; }
+   virtual ~BugCatcher() { assert_not_deleted(); magic = 0xdeadbeef; _cnt--; }
+
+   virtual void assert_not_deleted() const {
+      assert(magic == X_BUG_CATCHER_MAGIC);
+   }
+
+   static int get_instance_count() { return _cnt; }
+};
+
+#endif
diff --git a/rtrmgr/task.cc b/rtrmgr/task.cc
index 12fef6b..0e1d7be 100644
--- a/rtrmgr/task.cc
+++ b/rtrmgr/task.cc
@@ -2245,6 +2245,7 @@ TaskManager::run_task()
 void
 TaskManager::task_done(bool success, const string& errmsg)
 {
+    assert_not_deleted();
     debug_msg("TaskManager::task_done, success: %i errmsg: %s\n", (int)(success), errmsg.c_str());
 
     if (! success) {
@@ -2253,7 +2254,24 @@ TaskManager::task_done(bool success, const string& errmsg)
 	reset();
 	return;
     }
-    _tasklist.pop_front();
+    if (_tasklist.empty()) {
+	/** This indicates we are badly out of sync.  We got some callback we
+	 * weren't expecting, basically.  That this happens with the scenario below
+	 * indicates the task-manager & task logic is busted somewhere.
+	 * Reported by:  Li Zhao
+	 * 1. start rtrmgr from linux shell on the system;
+	 * 2. manually start xorp_static_routes from linux shell. This static will hijack the xrl channels to rtrmgr;
+	 * 3. use cli command "create protocol static" to start a second xorp_static_routes.
+	 * 4. use cli command "delete protocol static" to stop static. both xorp_static_routes
+	 *    were terminated. depended process like fea, rib and policy were also terminated. rtrmgr crash.
+	 *
+	 * With this check for empty task list, it at least doesn't crash, but the logic is still busted somewhere.
+	 */
+	XLOG_ERROR("ERROR:  _tasklist empty in TaskManager::task_done.");
+    }
+    else {
+	_tasklist.pop_front();
+    }
     run_task();
 }
 
diff --git a/rtrmgr/task.hh b/rtrmgr/task.hh
index dc636fe..6424786 100644
--- a/rtrmgr/task.hh
+++ b/rtrmgr/task.hh
@@ -25,7 +25,7 @@
 
 #include <map>
 #include <vector>
-
+#include "libxorp/bug_catcher.hh"
 #include "libxorp/run_command.hh"
 #include "libxorp/status_codes.h"
 
@@ -505,7 +505,7 @@ private:
     bool	_verbose;	 // Set to true if output is verbose
 };
 
-class TaskManager {
+class TaskManager : public BugCatcher {
     typedef XorpCallback2<void, bool, string>::RefPtr CallBack;
 
 public:
_______________________________________________
Xorp-hackers mailing list
[email protected]
http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/xorp-hackers

Reply via email to