Li Zhao wrote:
I added a new protocol and I can start it in CLI by command "create protocol XXX", but
the rtrmgr crashed after command "delete protocol XXX".
I can also easily reproduce the exactlt same crash via the following steps:
0. I am running xorp processes on an embedded system.
1. start rtrmgr from linux shell on the system;
2. manually start xorp_static_routes from linux shell. This static will hijack
the xrl channels to rtrmgr;
3. use cli command "create protocol static" to start a second
xorp_static_routes.
4. use cli command "delete protocol static" to stop static. both
xorp_static_routes were terminated. depended process like fea, rib and policy were also
terminated. rtrmgr crash.
Ok, the crash is because if you do a pop_front() on an empty list, it's
going to crash.
I'm not sure why the list is empty here. Seems to indicate task-manager
logic is busted
with regard to task list management and/or callbacks are being called
against a wrong
task-manager.
Do you actually need to do this operation for your project? If so, you
probably will want
to investigate task-manager logic in detail to figure out why this is
happening.
The attached patch fixes the crash, but the underlying bug persists.
Most of the patch is debugging
code, but I'm leaving it in my tree because it will help next time we
hit a similar problem.
Thanks,
Ben
--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com
diff --git a/libxorp/SConscript b/libxorp/SConscript
index 7043ed0..0a7f6c9 100644
--- a/libxorp/SConscript
+++ b/libxorp/SConscript
@@ -47,6 +47,7 @@ sources = [
# C++ files
'asyncio.cc',
'buffered_asyncio.cc',
+ 'bug_catcher.cc',
'c_format.cc',
'callback.cc',
'clock.cc',
diff --git a/libxorp/bug_catcher.cc b/libxorp/bug_catcher.cc
new file mode 100644
index 0000000..bb853da
--- /dev/null
+++ b/libxorp/bug_catcher.cc
@@ -0,0 +1,6 @@
+
+#include "bug_catcher.hh"
+
+
+unsigned int BugCatcher::_cnt = 0;
+
diff --git a/libxorp/bug_catcher.hh b/libxorp/bug_catcher.hh
new file mode 100644
index 0000000..a206db7
--- /dev/null
+++ b/libxorp/bug_catcher.hh
@@ -0,0 +1,24 @@
+#ifndef __XORP_BUG_CATCHER_INC__
+#define __XORP_BUG_CATCHER_INC__
+
+#include <assert.h>
+
+class BugCatcher {
+private:
+ unsigned int magic;
+ static unsigned int _cnt;
+
+public:
+#define X_BUG_CATCHER_MAGIC 0x1234543
+ BugCatcher() { magic = X_BUG_CATCHER_MAGIC; _cnt++; }
+ BugCatcher(const BugCatcher& rhs) { magic = rhs.magic; _cnt++; }
+ virtual ~BugCatcher() { assert_not_deleted(); magic = 0xdeadbeef; _cnt--; }
+
+ virtual void assert_not_deleted() const {
+ assert(magic == X_BUG_CATCHER_MAGIC);
+ }
+
+ static int get_instance_count() { return _cnt; }
+};
+
+#endif
diff --git a/rtrmgr/task.cc b/rtrmgr/task.cc
index 12fef6b..0e1d7be 100644
--- a/rtrmgr/task.cc
+++ b/rtrmgr/task.cc
@@ -2245,6 +2245,7 @@ TaskManager::run_task()
void
TaskManager::task_done(bool success, const string& errmsg)
{
+ assert_not_deleted();
debug_msg("TaskManager::task_done, success: %i errmsg: %s\n", (int)(success), errmsg.c_str());
if (! success) {
@@ -2253,7 +2254,24 @@ TaskManager::task_done(bool success, const string& errmsg)
reset();
return;
}
- _tasklist.pop_front();
+ if (_tasklist.empty()) {
+ /** This indicates we are badly out of sync. We got some callback we
+ * weren't expecting, basically. That this happens with the scenario below
+ * indicates the task-manager & task logic is busted somewhere.
+ * Reported by: Li Zhao
+ * 1. start rtrmgr from linux shell on the system;
+ * 2. manually start xorp_static_routes from linux shell. This static will hijack the xrl channels to rtrmgr;
+ * 3. use cli command "create protocol static" to start a second xorp_static_routes.
+ * 4. use cli command "delete protocol static" to stop static. both xorp_static_routes
+ * were terminated. depended process like fea, rib and policy were also terminated. rtrmgr crash.
+ *
+ * With this check for empty task list, it at least doesn't crash, but the logic is still busted somewhere.
+ */
+ XLOG_ERROR("ERROR: _tasklist empty in TaskManager::task_done.");
+ }
+ else {
+ _tasklist.pop_front();
+ }
run_task();
}
diff --git a/rtrmgr/task.hh b/rtrmgr/task.hh
index dc636fe..6424786 100644
--- a/rtrmgr/task.hh
+++ b/rtrmgr/task.hh
@@ -25,7 +25,7 @@
#include <map>
#include <vector>
-
+#include "libxorp/bug_catcher.hh"
#include "libxorp/run_command.hh"
#include "libxorp/status_codes.h"
@@ -505,7 +505,7 @@ private:
bool _verbose; // Set to true if output is verbose
};
-class TaskManager {
+class TaskManager : public BugCatcher {
typedef XorpCallback2<void, bool, string>::RefPtr CallBack;
public:
_______________________________________________
Xorp-hackers mailing list
[email protected]
http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/xorp-hackers