Issue #10418 has been updated by Drew Gibson.

Assignee changed from eric sorenson to Patrick Otto

Hi Eric,

I have a group of 18 CentOS 5 servers that are managed together, and some others with similar base builds. I updated my test server and, with no issues immediately apparent, updated the rest of the group. Our monitoring system started alerting some time later because puppetdlock was present and the server report files were too old.

From /var/log/yum.log:

    Aug 10 17:33:47 Updated: glibc-common-2.5-81.el5_8.4.i386
    Aug 10 17:33:49 Updated: 1:facter-1.6.11-1.el5.i686
    Aug 10 17:33:55 Updated: glibc-2.5-81.el5_8.4.i686
    Aug 10 17:33:55 Updated: nspr-4.9.1-4.el5_8.i386
    Aug 10 17:33:56 Updated: nss-3.13.5-4.el5_8.i386
    Aug 10 17:33:58 Updated: initscripts-8.45.42-1.el5.centos.1.i386
    Aug 10 17:33:59 Updated: nss-tools-3.13.5-4.el5_8.i386
    Aug 10 17:33:59 Updated: 12:dhclient-3.0.5-31.el5_8.1.i386
    Aug 10 17:34:00 Updated: sudo-1.7.2p1-14.el5_8.2.i386
    Aug 10 17:34:00 Updated: nscd-2.5-81.el5_8.4.i386

After finding there was an issue, I updated a similar server with the above packages excluded, then added the suspect packages one by one while checking for the file /var/lib/puppet/state/puppetdlock, the content of last_run_report.yaml and the server-side reports.

On failure, the lockfile is created and none of the logs are touched until the service is stopped (service puppet stop), at which point last_run_report.yaml and the server-side report are written with "message: Caught TERM; calling stop" and "status: failed".

This was repeated on two more servers with the same results. Each server worked until after the facter update.

On the test server, I cleared the SSL keys, restarted the puppet service and even rebooted the whole machine. There are no problems with "puppetd --test" or with "listen=false".

My initial thought that x86_64 systems were not affected was due to alerts that had been disabled in our monitoring system during previous testing.

Removed the custom fact plugin, no change.

    yum downgrade facter
    <snip>
    Removed:
      facter.i686 1:1.6.11-1.el5
    Installed:
      facter.i686 1:1.6.10-1.el5
    Complete!

and all is well again.

I would be happy to run "strace" or another tool, but I am not familiar with its use.

----------------------------------------
Bug #10418: Puppet agent hangs when listen is true and reading from /proc filesystem on redhat
https://projects.puppetlabs.com/issues/10418#change-69233

Author: Jo Rhett
Status: Re-opened
Priority: Normal
Assignee: Patrick Otto
Category: agent
Target version:
Affected Puppet version: 2.6.12
Keywords: enabledisable hang select proc listen redhat
Branch:

    Mon Oct 31 23:03:31 +0000 2011 Puppet (notice): Caught TERM; calling stop

Ever since the 2.6.12 upgrade I've been seeing these reports reach us, on about a hundred of roughly five hundred machines. Most of the time we find that $vardir/state/puppetdlock is in place and blocking further puppet runs, which requires manual resolution.

I wrote a quick cron script to look for puppetdlock files older than one hour, remove them and mail me a report, and I've received several dozen reports in the last few hours.

Something is clearly broken in 2.6.12, so we are downgrading our systems to 2.6.11.

No, I have no other information than that it crosses all of our machine types, and we have had no significant changes in our modules in this time period. Many of the machines which have failed have had zero module or manifest changes which would apply to them.

I cannot get this to replicate on the command line.
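
The original report mentions a cron script that removes puppetdlock files older than one hour and mails a report, but the script itself is not included in the ticket. The following is only a rough sketch of what such a workaround could look like, not the reporter's actual script. The lockfile path is the one quoted in the comment above (the report gives it as $vardir/state/puppetdlock), and the function names `lock_age` and `remove_if_stale` are made up for illustration. It prints a line instead of sending mail, on the assumption that cron mails a job's output to its owner.

```python
import os
import time

# Assumed path, taken from the comment in this ticket; adjust to your $vardir.
LOCKFILE = "/var/lib/puppet/state/puppetdlock"
MAX_AGE_SECONDS = 60 * 60  # one hour, as described in the original report


def lock_age(path, now=None):
    """Return the lockfile's age in seconds, or None if it cannot be stat'ed."""
    try:
        mtime = os.stat(path).st_mtime
    except OSError:
        # Missing (or unreadable) lockfile: nothing to clean up.
        return None
    if now is None:
        now = time.time()
    return now - mtime


def remove_if_stale(path=LOCKFILE, max_age=MAX_AGE_SECONDS, now=None):
    """Delete the lockfile if it is older than max_age.

    Returns a one-line report if a file was removed, otherwise None.
    """
    age = lock_age(path, now)
    if age is None or age <= max_age:
        return None
    os.remove(path)
    return "removed stale lock %s (age %d s)" % (path, age)


if __name__ == "__main__":
    report = remove_if_stale()
    if report:
        # Any output from a cron job is normally mailed to the job's owner.
        print(report)
```

Note that this only clears the symptom: it does not check whether an agent run is genuinely still in progress, so the age threshold should be well above the longest expected run.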
