I'm seeing a strange startup deadlock issue.  The basic problem is that when
rsyslog starts, it hangs during /etc/init.d script processing.

This is using the CentOS/Red Hat Adiscon repo for CentOS 5.  I only encountered
this problem when upgrading from rsyslog v7 to rsyslog v8.

1) rsyslog 7 does not have this problem.
2) I'm using the v7_stable and v8_stable RPMs.
3) It hangs on some of the servers but not all.  It is repeatable, though.
4) It happens on both CentOS 5.9 and 5.10.
5) I'm using the stock rsyslog config from the RPMs.
6) If I kill the child, it becomes a zombie (<defunct>).

It seems to be some kind of race condition.  Here's what it looks like:

DR/UAT [root@drukrf56 ~]# ps -efL| grep rsyslog
root      2699  2203  2699  0    1 12:37 ?        00:00:00 /bin/bash /etc/rc3.d/S26rsyslog start
root      2702  2699  2702  0    1 12:37 ?        00:00:00 /bin/bash -c ulimit -S -c 0 >/dev/null 2>&1 ; /sbin/rsyslogd -i /var/run/rsyslogd.pid -d
root      2703  2702  2703  0    1 12:37 ?        00:00:00 /sbin/rsyslogd -i /var/run/rsyslogd.pid -d
root      2704  2703  2704  0    4 12:37 ?        00:00:00 /sbin/rsyslogd -i /var/run/rsyslogd.pid -d
root      2704  2703  2705  0    4 12:37 ?        00:00:00 /sbin/rsyslogd -i /var/run/rsyslogd.pid -d
root      2704  2703  2706  0    4 12:37 ?        00:00:00 /sbin/rsyslogd -i /var/run/rsyslogd.pid -d
root      2704  2703  2707  0    4 12:37 ?        00:00:00 /sbin/rsyslogd -i /var/run/rsyslogd.pid -d
root      2912  2845  2912  0    1 13:47 pts/0    00:00:00 grep rsyslog

DR/UAT [root@drukrf56 ~]# strace -p 2703
Process 2703 attached - interrupt to quit
futex(0x2b27ab66a200, FUTEX_WAIT_PRIVATE, 1, NULL

 <unfinished ...>
Process 2703 detached
DR/UAT [root@drukrf56 ~]# strace -p 2704
Process 2704 attached - interrupt to quit
select(1, NULL, NULL, NULL, {83914, 94000}

 <unfinished ...>
Process 2704 detached
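Each strace run above shows a single thread, while the child (2704) has four threads according to the NLWP column of ps -efL.  A small helper along these lines could list every thread of the parent and child with its state and kernel wait channel, which might show where each one is blocked.  This is only a sketch: show_threads is a hypothetical name, and it assumes a Linux /proc that exposes task/*/stat and task/*/wchan.

```sh
#!/bin/sh
# Sketch: list every thread (LWP) of the given PIDs with its scheduler state
# and kernel wait channel, using only /proc (Linux-specific).
# show_threads is a hypothetical helper name, not an rsyslog tool.
show_threads() {
    for pid in "$@"; do
        for task in /proc/"$pid"/task/*; do
            [ -d "$task" ] || continue    # skip PIDs that do not exist
            tid=${task##*/}
            # The state letter is the first field after the ")" that closes
            # the command name in /proc/.../stat.
            state=$(sed 's/.*) //; s/ .*//' "$task/stat")
            # wchan names the kernel function the thread is sleeping in
            # (may be "0" or empty if the kernel hides it).
            wchan=$(cat "$task/wchan" 2>/dev/null)
            printf '%s\t%s\t%s\t%s\n' "$pid" "$tid" "$state" "${wchan:-?}"
        done
    done
}

# Parent and child PIDs from the ps listing above.
show_threads 2703 2704
```

Output columns are PID, thread ID, state letter and wait channel, separated by tabs.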

Messages are being written to /var/log/messages as expected.  It just seems
that the parent (2703) never realizes the child (2704) has completed startup, so
it never exits and the init script never returns.
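For readers unfamiliar with how a daemon's parent "realizes" that startup is complete: one common generic pattern is for the child to send a readiness message back over a pipe or FIFO while the parent blocks until it arrives, then exits.  The sketch below shows that pattern in plain sh.  It is NOT rsyslog's actual mechanism (the strace above shows the parent blocked in a futex wait, not a read); the FIFO and the "ready" message are assumptions for illustration only.

```sh
#!/bin/sh
# Hypothetical sketch of a generic "child tells parent it is ready" handshake.
# NOT rsyslog's code: the FIFO and the "ready" message are assumptions.
dir=$(mktemp -d)
fifo=$dir/ready
mkfifo "$fifo"

# Child: finish startup, report readiness, then keep running as the daemon.
(
    sleep 1                  # stand-in for reading config, opening files, ...
    echo ready > "$fifo"     # readiness notification to the parent
    exec sleep 300           # stand-in for the daemon's main loop
) >/dev/null 2>&1 &
child=$!

# Parent: block until the child reports readiness, then return so the init
# script can move on. If the notification never arrives, this read blocks
# forever and the init script hangs.
read -r msg < "$fifo"
echo "parent: child $child reported '$msg', startup complete"
rm -rf "$dir"
```

Opening a FIFO for reading blocks until a writer opens it, so the parent cannot get past the read before the child has signalled readiness.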

If I kill the child, the parent doesn't notice and never reaps it:

DR/UAT [root@drukrf56 ~]# kill 2704
DR/UAT [root@drukrf56 ~]# ps aux | grep rsys
root      2699  0.0  0.0  10924  1448 ?        S    12:37   0:00 /bin/bash /etc/rc3.d/S26rsyslog start
root      2702  0.0  0.0  10788  1172 ?        S    12:37   0:00 /bin/bash -c ulimit -S -c 0 >/dev/null 2>&1 ; /sbin/rsyslogd -i /var/run/rsyslogd.pid -d
root      2703  0.0  0.0  42724  1916 ?        S    12:37   0:00 /sbin/rsyslogd -i /var/run/rsyslogd.pid -d
root      2704  0.0  0.0      0     0 ?        Z    12:37   0:00 [rsyslogd] <defunct>
root      2939  0.0  0.0   6056   624 pts/0    S+   13:53   0:00 grep rsys
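The zombie follows from the parent being stuck: a killed child stays <defunct> until its parent calls wait(), and the parent (2703) is blocked and evidently never does.  The sh sketch below reproduces that symptom with sleep processes standing in for rsyslogd.  It is an illustration under the assumption of a Linux /proc, not rsyslog code.

```sh
#!/bin/sh
# Demo: a killed child stays a zombie (state Z, "<defunct>" in ps) for as long
# as its parent never calls wait(). sleep stands in for rsyslogd; this is an
# illustration, not rsyslog code. Linux-specific (/proc).

# "Parent": a shell that backgrounds a child sleep and then exec()s into a
# second sleep, which never calls wait() on that child.
sh -c 'sleep 300 & exec sleep 300' >/dev/null 2>&1 &
parent=$!
sleep 1                       # give the parent time to fork its child

# Find the child: field 4 of /proc/PID/stat is the parent PID.
child=
for stat in /proc/[0-9]*/stat; do
    fields=$(cat "$stat" 2>/dev/null) || continue
    set -f; set -- $fields; set +f   # split on whitespace without globbing
    if [ "$4" = "$parent" ]; then child=$1; fi
done
[ -n "$child" ] || { echo "child not found" >&2; kill "$parent"; exit 1; }

kill "$child"                 # like "kill 2704" above
sleep 1
# Field 3 of /proc/PID/stat is the state letter; Z means zombie.
state=$(cut -d' ' -f3 "/proc/$child/stat")
echo "child $child state after kill: $state"

# Cleanup: when the parent exits, the zombie is re-parented to init (PID 1),
# which normally reaps it.
kill "$parent"
```

This mirrors the observation that kill -9 on the parent as well clears things up: once the parent is gone, init takes over the child and reaps it.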


If I kill -9 both the child and the parent and then start rsyslog manually
(/etc/init.d/rsyslog start), it starts normally.

One other interesting thing: if I remove rsyslog from the startup sequence and
instead put /etc/init.d/rsyslog start in /etc/rc.local (i.e., at the very end of
boot), it works.

Has anyone else seen this kind of problem?

Alan Edmonds





_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.
