Issue #10418 has been updated by Jason Smith.

I observed exactly the same behavior, right down to an identical strace for 
the hung puppet daemon.  Looking at lsof and the strace together, I see that 
puppet is waiting on the puppet agent listening port (as expected) and on a 
read fd open to a /proc path, often /proc/cpuinfo, but not always.  At first I 
also suspected the updated puppet version, 2.6.12, but after testing several 
combinations I think I have narrowed it down to the RHEL 5.7 kernel version 
(2.6.18-274.7.1.el5).  Every system running this kernel hangs, no matter which 
puppet version I try.  If I reboot a system with a hung puppet daemon into an 
earlier RHEL 5.7 kernel, puppet starts working again.  Note that the suspect 
RHEL 5.7 kernel was released just a few days before the most recent puppet 
security update, on October 20th; see 
[RHSA-2011-1386](http://rhn.redhat.com/errata/RHSA-2011-1386.html).  Could 
puppet be hung waiting to read from /proc because this kernel has a bug 
somewhere in /proc?  I searched Red Hat's bugzilla and did not see any 
obviously related bugs yet, but it has only been two weeks since the kernel 
was released.
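
For anyone who wants to check their own hung agents for the same symptom 
without reading lsof output by hand, here is a rough sketch that lists a 
process's open descriptors pointing into /proc.  The function name and the 
proc_root parameter are my own inventions for illustration; it only reads the 
standard Linux /proc/<pid>/fd symlinks and has to run as root or as the user 
that owns the process.

```python
import os


def proc_fds(pid, proc_root="/proc"):
    """Return {fd: target} for descriptors of `pid` whose target lies under proc_root.

    proc_root is a hypothetical parameter, present only so the function can
    be pointed at a fake directory tree for testing.
    """
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    found = {}
    for name in os.listdir(fd_dir):
        try:
            # Each entry in /proc/<pid>/fd is a symlink to the open file.
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed between listdir and readlink
        if target.startswith(proc_root + "/"):
            found[int(name)] = target
    return found
```

Calling proc_fds(pid) with the PID of a hung daemon should show an entry such 
as /proc/cpuinfo if the agent is stuck the same way as described above.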
----------------------------------------
Bug #10418: "Caught TERM; calling stop" with state/puppetdlock left in place
https://projects.puppetlabs.com/issues/10418

Author: Jo Rhett
Status: Investigating
Priority: Normal
Assignee: 
Category: agent
Target version: 
Affected Puppet version: 2.6.12
Keywords: enabledisable
Branch: 


Mon Oct 31 23:03:31 +0000 2011 Puppet (notice): Caught TERM; calling stop

Ever since the 2.6.12 upgrade I've been seeing these reports reach us, on 
about a hundred of roughly five hundred machines. Most of the time we find that 
$vardir/state/puppetdlock is in place and blocking further puppet runs, which 
requires a manual resolution.

I wrote a quick cron script to look for puppetdlock files older than one hour, 
remove them, and mail me a report, and I've received several dozen reports in 
the last few hours. Something is clearly broken in 2.6.12; we are downgrading 
our systems to 2.6.11.
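
A cleanup job of the kind described above might look roughly like the 
following.  This is a sketch, not the script actually in use: the function 
names are made up, and /var/lib/puppet is only an assumed value for $vardir, 
so check the vardir setting on your own systems.  It relies on cron mailing 
any output to the job owner rather than sending mail itself.

```python
import os
import time

LOCK_NAME = "puppetdlock"
MAX_AGE_SECONDS = 3600  # one hour, the threshold mentioned in the report


def lock_age(path, now=None):
    """Return the age of `path` in seconds, or None if it cannot be stat'ed."""
    now = time.time() if now is None else now
    try:
        return now - os.stat(path).st_mtime
    except OSError:
        return None  # missing or unreadable: treat as "no stale lock"


def clear_stale_lock(state_dir, max_age=MAX_AGE_SECONDS, now=None):
    """Remove state_dir/puppetdlock if older than max_age; return a report line or None."""
    lock = os.path.join(state_dir, LOCK_NAME)
    age = lock_age(lock, now)
    if age is None or age <= max_age:
        return None
    os.remove(lock)
    return "removed stale lock %s (age %d s)" % (lock, age)


if __name__ == "__main__":
    # Assumed location of $vardir/state; adjust to match your configuration.
    report = clear_stale_lock("/var/lib/puppet/state")
    if report:
        print(report)  # cron mails whatever the job prints
```

Note that this only works around the symptom.  It does not check whether an 
agent run is genuinely still in progress, so the age threshold should be 
longer than your longest legitimate run.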

No, I have no other information beyond the fact that it affects all of our 
machine types, and that we have had no significant changes in our modules in 
this time period.  Many of the machines that have failed have had zero module 
or manifest changes that would apply to them.  I cannot reproduce this from 
the command line.

