Issue #10418 has been updated by Drew Gibson.

Assignee changed from eric sorenson to Patrick Otto

Hi Eric, I have a group of 18 CentOS 5 servers that are managed together, plus 
some others with similar base builds.
I updated my test server and, with no issues immediately apparent, updated the 
rest of the group. Some time later, the monitoring system started alerting 
because puppetdlock was present and the server report files were too old.

From /var/log/yum.log:

`Aug 10 17:33:47 Updated: glibc-common-2.5-81.el5_8.4.i386
Aug 10 17:33:49 Updated: 1:facter-1.6.11-1.el5.i686
Aug 10 17:33:55 Updated: glibc-2.5-81.el5_8.4.i686
Aug 10 17:33:55 Updated: nspr-4.9.1-4.el5_8.i386
Aug 10 17:33:56 Updated: nss-3.13.5-4.el5_8.i386
Aug 10 17:33:58 Updated: initscripts-8.45.42-1.el5.centos.1.i386
Aug 10 17:33:59 Updated: nss-tools-3.13.5-4.el5_8.i386
Aug 10 17:33:59 Updated: 12:dhclient-3.0.5-31.el5_8.1.i386
Aug 10 17:34:00 Updated: sudo-1.7.2p1-14.el5_8.2.i386
Aug 10 17:34:00 Updated: nscd-2.5-81.el5_8.4.i386`
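Holding back suspect packages means turning the package names in a log like the one above into a yum exclude list. The Python 3 sketch below does that step. It assumes the `Updated:` line format shown above and strips the epoch, version, release and arch from each entry. The helper names are hypothetical. `--exclude` is yum's standard option for skipping packages during an update.

```python
import re

# Matches the "Updated:" lines yum writes to /var/log/yum.log, e.g.
# "Aug 10 17:33:49 Updated: 1:facter-1.6.11-1.el5.i686"
UPDATED_RE = re.compile(r"Updated:\s+(\S+)")


def package_name(nevra: str) -> str:
    """Reduce an [epoch:]name-version-release.arch string to its name."""
    if ":" in nevra:
        nevra = nevra.split(":", 1)[1]      # drop the epoch, e.g. "1:"
    base, _arch = nevra.rsplit(".", 1)      # drop the arch, e.g. ".i686"
    name, _version, _release = base.rsplit("-", 2)
    return name


def updated_packages(log_text: str) -> list:
    """Return package names from 'Updated:' lines, in log order."""
    return [package_name(m.group(1)) for m in UPDATED_RE.finditer(log_text)]


def exclude_args(names: list) -> list:
    """Build yum --exclude arguments so the listed packages are held back."""
    return [f"--exclude={name}" for name in names]


if __name__ == "__main__":
    sample = """\
Aug 10 17:33:47 Updated: glibc-common-2.5-81.el5_8.4.i386
Aug 10 17:33:49 Updated: 1:facter-1.6.11-1.el5.i686
Aug 10 17:33:59 Updated: 12:dhclient-3.0.5-31.el5_8.1.i386
"""
    names = updated_packages(sample)
    print(names)  # ['glibc-common', 'facter', 'dhclient']
    print(" ".join(["yum", "update"] + exclude_args(names)))
```

Removing one name at a time from the exclude list gives the one-package-at-a-time procedure described in this report.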

After finding the issue, I updated a similar server with the above packages 
excluded, then added the suspect packages back one by one. After each one I 
checked for the file /var/lib/puppet/state/puppetdlock, the contents of 
last_run_report.yaml and the server-side reports. On failure, the lockfile is 
created and none of the logs are touched until the service is stopped (service 
puppet stop). At that point last_run_report.yaml and the server-side report are 
written with "message: Caught TERM; calling stop" and "status: failed".
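The per-step check above can be scripted. This is a minimal sketch with hypothetical function names. It assumes the default RHEL paths under /var/lib/puppet/state; the issue gives only the file name for last_run_report.yaml, so that location is an assumption. The sketch rests on the observation above: a hung run creates the lock but does not rewrite the report. After `service puppet stop`, it treats the two quoted strings as the failure signature.

```python
import os

# Assumed default locations on a RHEL/CentOS Puppet 2.6 agent.
LOCKFILE = "/var/lib/puppet/state/puppetdlock"
REPORT = "/var/lib/puppet/state/last_run_report.yaml"


def looks_hung(lockfile: str = LOCKFILE, report: str = REPORT) -> bool:
    """True if the lock exists and the report has not been rewritten since."""
    if not os.path.exists(lockfile):
        return False  # no run in progress
    if not os.path.exists(report):
        return True  # locked, and no report has been written yet
    return os.path.getmtime(report) < os.path.getmtime(lockfile)


def report_shows_term_failure(report_text: str) -> bool:
    """Plain text search for the two markers quoted in this issue.

    Puppet reports are Ruby-tagged YAML, so this avoids a YAML parser.
    """
    return ("Caught TERM; calling stop" in report_text
            and "status: failed" in report_text)


if __name__ == "__main__":
    # Run after each package is added back; "hung" means the run never finished.
    print("hung" if looks_hung() else "ok")
```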

This was repeated on two more servers with the same results.

Each server kept working until the facter update was applied. On the test 
server, I cleared the SSL keys, restarted the puppet service and even rebooted 
the whole machine. There were no problems with "puppetd --test" or with 
"listen=false". I initially thought x86_64 systems were not affected, but that 
was only because their alerts had been disabled in our monitoring system during 
earlier testing.

Removing the custom fact plugin made no difference.

`yum downgrade facter
<snip>
Removed:
  facter.i686 1:1.6.11-1.el5

Installed:
  facter.i686 1:1.6.10-1.el5

Complete!`

and all is well again.

I would be happy to run "strace" or another tool, but I am not familiar with using it.


----------------------------------------
Bug #10418: Puppet agent hangs when listen is true and reading from /proc 
filesystem on redhat
https://projects.puppetlabs.com/issues/10418#change-69233

Author: Jo Rhett
Status: Re-opened
Priority: Normal
Assignee: Patrick Otto
Category: agent
Target version: 
Affected Puppet version: 2.6.12
Keywords: enabledisable hang select proc listen redhat
Branch: 


Mon Oct 31 23:03:31 +0000 2011 Puppet (notice): Caught TERM; calling stop

Ever since the 2.6.12 upgrade, I've been seeing these reports from about a 
hundred of our roughly five hundred machines. Most of the time we find that 
$vardir/state/puppetdlock is in place and blocking further puppet runs, which 
requires manual intervention.

I wrote a quick cron script that looks for puppetdlock files older than one 
hour, removes them and mails me a report. I've received several dozen such 
reports in the last few hours. Something is clearly broken in 2.6.12, so we are 
downgrading our systems to 2.6.11.
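A cleanup job of the kind described might look roughly like the sketch below. It is hypothetical and is not the actual script from this report. The lock path and the one-hour threshold come from the text. Instead of sending mail itself, it prints a line and relies on cron mailing any job output to the crontab owner. It only clears the symptom: a hung agent process presumably still needs a restart.

```python
import os
import socket
import time

# Lock path ($vardir/state/puppetdlock) assuming vardir is /var/lib/puppet.
DEFAULT_LOCK = "/var/lib/puppet/state/puppetdlock"
MAX_AGE = 3600  # one hour, as in the report


def remove_stale_lock(lockfile=DEFAULT_LOCK, max_age=MAX_AGE, now=None):
    """Delete lockfile if it is older than max_age seconds.

    Returns a one-line report string when a lock was removed, otherwise None.
    """
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(lockfile)
    except FileNotFoundError:
        return None  # no lock, nothing to do
    if age <= max_age:
        return None  # a normal run may still be in progress
    try:
        os.remove(lockfile)
    except FileNotFoundError:
        return None  # the agent released the lock in the meantime
    return f"{socket.gethostname()}: removed {lockfile} (age {int(age)}s)"


if __name__ == "__main__":
    message = remove_stale_lock()
    if message:
        # cron mails any output to the crontab owner (or MAILTO)
        print(message)
```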

No, I have no other information, except that it affects all of our machine 
types and we have made no significant changes to our modules in this time 
period. Many of the machines that have failed had no module or manifest changes 
applying to them at all. I cannot replicate this on the command line.


-- 
You received this message because you are subscribed to the Google Groups 
"Puppet Bugs" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/puppet-bugs?hl=en.
