I'm seeing a strange startup deadlock. When rsyslog starts at boot, it hangs
while its /etc/init.d script is being processed.
This is with the Adiscon CentOS/Red Hat repo for CentOS 5. I only encountered
this problem after upgrading from rsyslog v7 to rsyslog v8.
1) rsyslog v7 does not have this problem.
2) I'm using the v7_stable and v8_stable RPMs.
3) It hangs on some of the servers but not all. On the affected servers it is
repeatable, though.
4) When it happens, it happens on both CentOS 5.9 and 5.10.
5) I'm using the stock rsyslog config from the RPMs.
6) If I kill the child, it becomes a zombie (<defunct>).
It is some kind of race condition. Here's what it looks like:
DR/UAT [root@drukrf56 ~]# ps -efL| grep rsyslog
root 2699 2203 2699 0 1 12:37 ? 00:00:00 /bin/bash /etc/rc3.d/S26rsyslog start
root 2702 2699 2702 0 1 12:37 ? 00:00:00 /bin/bash -c ulimit -S -c 0 >/dev/null 2>&1 ; /sbin/rsyslogd -i /var/run/rsyslogd.pid -d
root 2703 2702 2703 0 1 12:37 ? 00:00:00 /sbin/rsyslogd -i /var/run/rsyslogd.pid -d
root 2704 2703 2704 0 4 12:37 ? 00:00:00 /sbin/rsyslogd -i /var/run/rsyslogd.pid -d
root 2704 2703 2705 0 4 12:37 ? 00:00:00 /sbin/rsyslogd -i /var/run/rsyslogd.pid -d
root 2704 2703 2706 0 4 12:37 ? 00:00:00 /sbin/rsyslogd -i /var/run/rsyslogd.pid -d
root 2704 2703 2707 0 4 12:37 ? 00:00:00 /sbin/rsyslogd -i /var/run/rsyslogd.pid -d
root 2912 2845 2912 0 1 13:47 pts/0 00:00:00 grep rsyslog
DR/UAT [root@drukrf56 ~]# strace -p 2703
Process 2703 attached - interrupt to quit
futex(0x2b27ab66a200, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
Process 2703 detached
DR/UAT [root@drukrf56 ~]# strace -p 2704
Process 2704 attached - interrupt to quit
select(1, NULL, NULL, NULL, {83914, 94000} <unfinished ...>
Process 2704 detached
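
For anyone trying to reproduce this, grouping the thread IDs by process makes the `ps -efL` listing easier to read: it separates the single-threaded parent from the multi-threaded child. This is only a minimal sketch. It assumes the procps column order UID PID PPID LWP C NLWP STIME TTY TIME CMD, and `threads_per_pid` is a hypothetical helper name, not something shipped with rsyslog or procps.

```shell
# Hypothetical helper: count threads (LWPs) per PID from `ps -efL` output on stdin.
# Assumes column 2 is the PID and column 4 is the LWP (thread ID).
# Lines whose 2nd and 4th fields are not numeric (for example wrapped
# command-line continuations) are ignored.
threads_per_pid() {
    awk '$2 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ { n[$2]++ }
         END { for (p in n) print p, n[p] }' | sort -n
}
```

It can be fed directly from ps, for example `ps -efL | grep '[r]syslogd' | threads_per_pid`, which prints one "PID thread-count" pair per line.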
Messages are being saved to /var/log/messages as expected. It just seems like
the parent doesn't realize the child has completed startup.
If I kill the child, the parent doesn't notice.
DR/UAT [root@drukrf56 ~]# kill 2704
DR/UAT [root@drukrf56 ~]# ps aux | grep rsys
root 2699 0.0 0.0 10924 1448 ? S 12:37 0:00 /bin/bash /etc/rc3.d/S26rsyslog start
root 2702 0.0 0.0 10788 1172 ? S 12:37 0:00 /bin/bash -c ulimit -S -c 0 >/dev/null 2>&1 ; /sbin/rsyslogd -i /var/run/rsyslogd.pid -d
root 2703 0.0 0.0 42724 1916 ? S 12:37 0:00 /sbin/rsyslogd -i /var/run/rsyslogd.pid -d
root 2704 0.0 0.0 0 0 ? Z 12:37 0:00 [rsyslogd] <defunct>
root 2939 0.0 0.0 6056 624 pts/0 S+ 13:53 0:00 grep rsys
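
The Z state means the child has exited but its parent has not yet collected the exit status with wait(), which fits a parent that is still blocked. To check several servers for the same symptom, something like the following could pull zombie PIDs out of `ps aux` output. It is a rough sketch that assumes the usual `ps aux` layout (PID in column 2, STAT in column 8); `zombie_pids` is a hypothetical helper name.

```shell
# Hypothetical helper: print the PID of every zombie in `ps aux` output on stdin.
# Assumes column 2 is the PID and column 8 is STAT; a STAT value starting
# with "Z" marks a defunct process that its parent has not reaped yet.
zombie_pids() {
    awk '$2 ~ /^[0-9]+$/ && $8 ~ /^Z/ { print $2 }'
}
```

For example: `ps aux | grep '[r]sys' | zombie_pids`.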
If I kill -9 both the child and the parent, then start rsyslog by hand
(/etc/init.d/rsyslog start), it starts normally.
One other interesting thing: if I remove rsyslog from the normal startup
sequence and put /etc/init.d/rsyslog start in /etc/rc.local (i.e., at the end
of boot), it works.
Has anyone else seen this kind of problem?
Alan Edmonds