Hi Andreas, we had a public holiday over here yesterday. Thanks for your analysis, this sounds very interesting (and probably explains why I haven't seen it so far). I need to review the code, it's quite a while since I looked at that place the last time, seems it has "worked" for a couple of years. I'll do so later today and let you know the findings. We may need to further instrument/try out some patches...
rainer > -----Original Message----- > From: Andreas Piesk [mailto:[email protected]] > Sent: Tuesday, November 01, 2011 10:28 AM > To: [email protected]; Rainer Gerhards > Subject: Re: [rsyslog] rsyslog hangs at startup > > On 31.10.2011 14:09, Andreas Piesk wrote: > > > > after looking at runtime/modules.c i suspected that the mutex doesn't > get unlocked. > > > > after inserting some more debug printf: > > > > 2426.965520000:2ad3d4b29320: cfline: '$ModLoad imfile' > > 2426.965545000:2ad3d4b29320: Requested to load module 'imfile' > > 2426.965548000:2ad3d4b29320: setting module load/unlock lock > > 2426.965551000:2ad3d4b29320: module load/unload lock set > > 2426.965555000:2ad3d4b29320: loading module > '/lib64/rsyslog/imfile.so' > > 2426.965591000:419d5940: strm 0x2ad3e9475220: file 8 read 537 bytes > > 2426.965733000:419d5940: MsgSetTAG in: len 9, pszBuf: rsyslogd: > > 2426.965740000:419d5940: MsgSetTAG exit: pMsg->iLenTAG 9, pMsg- > >TAG.szBuf: rsyslogd: > > 2426.965776000:419d5940: XXXXX: tryDoAction 0x2ad3e94705a0, pnElem > 1, nElem 1 > > 2426.965780000:419d5940: Action 0x2ad3e94705a0 transitioned to state: > itx > > 2426.965783000:419d5940: entering actionCalldoAction(), state: itx > > 2426.965797000:419d5940: 10.10.0.254 > > 2426.965818000:419d5940: caller requested object 'nsd_ptcp', not > found (iRet -3003) > > 2426.965843000:419d5940: Requested to load module 'lmnsd_ptcp' > > 2426.965846000:419d5940: setting module load/unlock lock > > > > i don't see "imfile: version %s initializing\n", so the problem seems > to be imfile because it > > doesn't get initialized and blocks the mutex. > > > > i've debugged the issue a little bit further and it turned out that > imfile:modInit() hangs at > UseObj(), or to be precise at pthread_mutex_lock(&mutObjGlobalOp). > > i think it's a deadlock between mutObjGlobalOp and mutLoadUnload: > > - thread 2ad3d4b29320 calls modules:Load() which acquires lock > mutLoadUnload > - thread 419d5940 calls obj:UseObj() which aquires lock mutObjGlobalOp > - thread 2ad3d4b29320 calls imfile:modInit() which calls obj:UseObj() > and hangs at lock mutObjGlobalOp > - thread 419d5940 calls modules:Load() and hangs at lock mutLoadUnload > > Rainer, am i right so far? if yes, do you have an idea how to fix it? > > regards, > -ap > > BTW: the issue is not easily reproducible because the timing has to be > right. in a test-VM i > ususally get a hanging rsyslogd every 5 reboots. _______________________________________________ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com

