I tried the example code from the link, and the SIGABRT stack is very similar. The only difference is in frame #4: in malloc_printerr(), str = munmap_chunk() instead of free().
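
One way to test the hypothesis quoted below (that the TaskManager is already destroyed when task_done() runs) is to record constructor and destructor calls and check liveness just before the callback is dispatched. The following is only a rough sketch under assumptions: LifetimeTracked, id() and is_live() are made-up names for illustration and are not part of XORP, and it uses C++11 syntax, so it would need adapting to the real TaskManager and build.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <set>

// Hypothetical debugging aid, not XORP code: records the address of every
// live instance so a late callback can be checked without touching the object.
class LifetimeTracked {
public:
    LifetimeTracked() {
        bool inserted = live().insert(id()).second;
        assert(inserted);   // the same address must not be registered twice
        (void)inserted;     // avoid an unused-variable warning under NDEBUG
        std::fprintf(stderr, "ctor %p\n", static_cast<void*>(this));
    }

    ~LifetimeTracked() {
        live().erase(id());
        std::fprintf(stderr, "dtor %p\n", static_cast<void*>(this));
    }

    // Copies would not be registered, so forbid them in this sketch.
    LifetimeTracked(const LifetimeTracked&) = delete;
    LifetimeTracked& operator=(const LifetimeTracked&) = delete;

    // Address as an integer; it can be stored and compared after the
    // object has been destroyed without dereferencing anything.
    std::uintptr_t id() const {
        return reinterpret_cast<std::uintptr_t>(this);
    }

    // True if an instance is currently alive at that address.
    static bool is_live(std::uintptr_t addr) {
        return live().count(addr) != 0;
    }

private:
    // Function-local static avoids initialization-order problems.
    static std::set<std::uintptr_t>& live() {
        static std::set<std::uintptr_t> instances;
        return instances;
    }
};
```

A callback would store id() when it is created and call is_live() on that value before dispatching. A false result means the owner was destroyed first. A true result is weaker evidence, because a new object may have been allocated at the same address in the meantime.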

--- On Mon, 11/2/09, Li Zhao <[email protected]> wrote:

> From: Li Zhao <[email protected]>
> Subject: Re: [Xorp-hackers] rtrmgr crash on SIGABRT because of pop_front in task_done
> To: "Ben Greear" <[email protected]>
> Cc: [email protected]
> Date: Monday, November 2, 2009, 12:31 PM
>
> This is a good link which might be interesting.
>
> http://www.ece.ucsb.edu/~kastner/labyrinth/bug1.txt
>
> --- On Fri, 10/30/09, Li Zhao <[email protected]> wrote:
>
> > From: Li Zhao <[email protected]>
> > Subject: Re: [Xorp-hackers] rtrmgr crash on SIGABRT because of pop_front in task_done
> > To: "Ben Greear" <[email protected]>
> > Cc: [email protected]
> > Date: Friday, October 30, 2009, 10:30 AM
> >
> > I thought task manager was fine. But it might be that the first node
> > was deleted twice, one of which is this pop_front and another hidden one.
> >
> > --- On Thu, 10/29/09, Ben Greear <[email protected]> wrote:
> >
> > > From: Ben Greear <[email protected]>
> > > Subject: Re: [Xorp-hackers] rtrmgr crash on SIGABRT because of pop_front in task_done
> > > To: "Li Zhao" <[email protected]>
> > > Cc: [email protected]
> > > Date: Thursday, October 29, 2009, 1:26 PM
> > >
> > > On 10/29/2009 08:16 AM, Li Zhao wrote:
> > > > I am puzzled by operator delete(prt=0x0). But inside
> > > > deallocate(this=0x8d55238, __p=0x8d55238), the __p is not 0x0.
> > > > pop_front means "removes and deletes". So somewhere else this
> > > > list node was deleted again?
> > > >
> > > > --- On Thu, 10/29/09, Li Zhao<[email protected]> wrote:
> > > >
> > > >> From: Li Zhao<[email protected]>
> > > >> Subject: [Xorp-hackers] rtrmgr crash on SIGABRT because of pop_front in task_done
> > > >> To: [email protected]
> > > >> Date: Thursday, October 29, 2009, 10:54 AM
> > > >>
> > > >> I added a new protocol and I can start it in CLI by command
> > > >> "create protocol XXX", but the rtrmgr crashed after command
> > > >> "delete protocol XXX".
> > > >> I can also easily reproduce the exact same crash via the
> > > >> following steps:
> > > >>
> > > >> 0. I am running xorp processes on an embedded system.
> > > >> 1. start rtrmgr from linux shell on the system;
> > > >> 2. manually start xorp_static_routes from linux shell. This
> > > >>    static will hijack the xrl channels to rtrmgr;
> > > >> 3. use cli command "create protocol static" to start a
> > > >>    second xorp_static_routes.
> > > >> 4. use cli command "delete protocol static" to stop static.
> > > >>    both xorp_static_routes were terminated. depended process
> > > >>    like fea, rib and policy were also terminated. rtrmgr crash.
> > >
> > > I ran under valgrind, and saw this info:
> > >
> > > ==27820== Invalid free() / delete / delete[]
> > > ==27820==    at 0x4A05E3F: operator delete(void*) (vg_replace_malloc.c:342)
> > > ==27820==    by 0x463531: __gnu_cxx::new_allocator<std::_List_node<Task*> >::deallocate(std::_List_node<Task*>*, unsigned long) (new_allocator.h:95)
> > > ==27820==    by 0x462427: std::_List_base<Task*, std::allocator<Task*> >::_M_put_node(std::_List_node<Task*>*) (stl_list.h:320)
> > > ==27820==    by 0x46143B: std::list<Task*, std::allocator<Task*> >::_M_erase(std::_List_iterator<Task*>) (stl_list.h:1431)
> > > ==27820==    by 0x45FF0B: std::list<Task*, std::allocator<Task*> >::pop_front() (stl_list.h:906)
> > > ==27820==    by 0x45DB73: TaskManager::task_done(bool, std::string const&) (task.cc:2256)
> > > ==27820==    by 0x465970: XorpMemberCallback2B0<void, TaskManager, bool, std::string const&>::dispatch(bool, std::string const&) (callback_nodebug.hh:4636)
> > > ==27820==    by 0x45C540: Task::step8_report() (task.cc:1998)
> > > ==27820==    by 0x4659DF: XorpMemberCallback0B0<void, Task>::dispatch() (callback_nodebug.hh:306)
> > > ==27820==    by 0x449613: Module::terminate_with_prejudice(ref_ptr<XorpCallback0<void> >) (module_manager.cc:218)
> > > ==27820==    by 0x44F63C: XorpMemberCallback0B1<void, Module, ref_ptr<XorpCallback0<void> > >::dispatch() (callback_nodebug.hh:598)
> > > ==27820==    by 0x549D72: OneoffTimerNode2::expire(XorpTimer&, void*) (timer.cc:167)
> > > ==27820==  Address 0x50c9340 is 80 bytes inside a block of size 200 alloc'd
> > > ==27820==    at 0x4A06FFC: operator new(unsigned long) (vg_replace_malloc.c:230)
> > > ==27820==    by 0x42C81F: MasterConfigTree::MasterConfigTree(std::string const&, MasterTemplateTree*, ModuleManager&, XorpClient&, bool, bool) (master_conf_tree.cc:119)
> > > ==27820==    by 0x406ED6: Rtrmgr::run() (main_rtrmgr.cc:319)
> > > ==27820==    by 0x407E57: main (main_rtrmgr.cc:665)
> > >
> > > It appears to me that the task-manager object (this) is already
> > > deleted when the taskmanager::task_done() method is called.
> > >
> > > Could probably add some debugging to the destructors and
> > > constructors of TaskManager to verify. I have some other things
> > > to do first..but will look at this a bit later if no one beats
> > > me to it.
> > >
> > > Thanks,
> > > Ben
> > >
> > > --
> > > Ben Greear <[email protected]>
> > > Candela Technologies Inc http://www.candelatech.com

_______________________________________________
Xorp-hackers mailing list
[email protected]
http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/xorp-hackers
