Issue #10418 has been updated by Jo Rhett.
More info: not all affected systems have puppetdlock files. However, the agents
always seem to stop after a run. They still accept kicks, but do nothing with
them. Here's an example of "grep puppet /var/log/messages":
Nov 2 01:55:59 us0101acdc008 puppet-agent[143582]: (/File[/local/tomcat/webapps/abregistrar/WEB-INF/property-configurer.xml]) Filebucketed /local/tomcat/webapps/abregistrar/WEB-INF/property-configurer.xml to puppet with sum 83231d183f9d40f0ed44db880504a3f4
Nov 2 01:55:59 us0101acdc008 puppet-agent[143582]: (/File[/local/tomcat/webapps/abregistrar/WEB-INF/property-configurer.xml]/content) content changed '{md5}83231d183f9d40f0ed44db880504a3f4' to '{md5}1a4d31b2b0df03846e51017226a7ead6'
Nov 2 01:55:59 us0101acdc008 puppet-agent[143582]: (/Stage[main]/Webapps::Deploy/File[webinf]) Scheduling refresh of Exec[start-tomcat]
Nov 2 01:55:59 us0101acdc008 puppet-agent[143582]: (/Stage[main]/Webapps::Deploy/Exec[start-tomcat]/returns) executed successfully
Nov 2 01:55:59 us0101acdc008 puppet-agent[143582]: (/Stage[main]/Webapps::Deploy/Exec[start-tomcat]) Triggered 'refresh' from 127 events
Nov 2 01:56:14 us0101acdc008 puppet-agent[143582]: Finished catalog run in 91.72 seconds
Nov 2 03:26:31 us0101acdc008 puppet-agent[90299]: triggered run
[04:23 root@us0101acdc008 ~]$
As you can see, it observed the kick request but did nothing about it. The
system was all but idle during the same period:
[04:26 root@us0101acdc008 ~]$ sar
Linux 2.6.18-274.7.1.el5 (us0101acdc008.tangome.gbl) 11/02/2011
12:00:01 AM  CPU   %user   %nice  %system  %iowait  %steal   %idle
12:10:01 AM  all    1.25    0.00     0.79     0.00    0.00   97.96
12:20:01 AM  all    1.36    0.00     0.83     0.00    0.00   97.81
12:30:01 AM  all    2.32    0.00     0.90     0.00    0.00   96.79
12:40:01 AM  all    1.48    0.00     0.92     0.00    0.00   97.60
12:50:01 AM  all    1.34    0.00     0.84     0.00    0.00   97.82
01:00:01 AM  all    1.24    0.00     0.78     0.00    0.00   97.98
01:10:01 AM  all    1.31    0.00     0.82     0.00    0.00   97.87
01:20:01 AM  all    1.18    0.00     0.71     0.00    0.00   98.11
01:30:01 AM  all    1.27    0.00     0.89     0.00    0.00   97.84
01:40:01 AM  all    1.09    0.00     0.66     0.00    0.00   98.25
01:50:01 AM  all    0.91    0.00     0.59     0.00    0.00   98.50
02:00:01 AM  all    4.82    0.00     0.50     0.03    0.00   94.65
02:10:01 AM  all    0.14    0.00     0.16     0.00    0.00   99.70
02:20:01 AM  all    0.12    0.00     0.17     0.00    0.00   99.72
02:30:01 AM  all    1.22    0.00     0.42     0.00    0.00   98.35
02:40:01 AM  all    0.87    0.00     0.57     0.00    0.00   98.56
02:50:01 AM  all    0.78    0.00     0.53     0.00    0.00   98.69
03:00:01 AM  all    0.65    0.00     0.51     0.00    0.00   98.84
03:10:01 AM  all    0.69    0.00     0.47     0.00    0.00   98.84
03:20:01 AM  all    0.78    0.00     0.53     0.00    0.00   98.69
03:30:01 AM  all    0.83    0.00     0.58     0.00    0.00   98.59
03:40:01 AM  all    0.84    0.00     0.57     0.00    0.00   98.59
03:50:01 AM  all    0.58    0.00     0.43     0.00    0.00   98.99
04:00:01 AM  all    0.65    0.00     0.45     0.00    0.00   98.90
04:10:01 AM  all    0.60    0.00     0.42     0.14    0.00   98.83
04:20:01 AM  all    1.20    0.00     0.40     0.00    0.00   98.39
Average:     all    1.13    0.00     0.59     0.01    0.00   98.26
----------------------------------------
Bug #10418: "Caught TERM; calling stop" with state/puppetdlock left in place
https://projects.puppetlabs.com/issues/10418
Author: Jo Rhett
Status: Unreviewed
Priority: Normal
Assignee:
Category: agent
Target version:
Affected Puppet version: 2.6.12
Keywords:
Branch:
Mon Oct 31 23:03:31 +0000 2011 Puppet (notice): Caught TERM; calling stop
Ever since the 2.6.12 upgrade I've been seeing these reports reach us, from
about a hundred of our roughly five hundred machines. Most of the time we find
that $vardir/state/puppetdlock is in place and blocking further puppet runs,
which requires manual resolution.
I wrote a quick cron script to look for puppetdlock files older than one hour,
remove them, and mail me a report, and I've received several dozen in the last
few hours. Something is clearly broken in 2.6.12; we are downgrading our
systems to 2.6.11.
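
For anyone who needs the same stopgap, here is a rough sketch of what such a
cleanup job might look like. It is not the script mentioned above, which was
not posted. The lock path assumes vardir is /var/lib/puppet, and the function
names are made up for illustration. It relies on cron mailing any output from
the job to its owner, so printing the report line is enough to get the mail.

```python
#!/usr/bin/env python3
"""Remove a stale puppetdlock file and report it (illustrative sketch only)."""
import os
import time

# Assumption: vardir is /var/lib/puppet; adjust to match your vardir setting.
LOCKFILE = "/var/lib/puppet/state/puppetdlock"
MAX_AGE_SECONDS = 60 * 60  # one hour, the threshold described above


def lock_age(path, now=None):
    """Return the lock file's age in seconds, or None if it does not exist."""
    try:
        mtime = os.stat(path).st_mtime
    except FileNotFoundError:
        return None
    if now is None:
        now = time.time()
    return now - mtime


def remove_if_stale(path, max_age=MAX_AGE_SECONDS, now=None):
    """Delete the lock if it is older than max_age.

    Returns a one-line report if the file was removed, otherwise None.
    """
    age = lock_age(path, now)
    if age is None or age <= max_age:
        return None
    try:
        os.remove(path)
    except FileNotFoundError:
        # The agent released the lock between the stat and the remove.
        return None
    return "removed stale lock %s (age %d minutes)" % (path, age // 60)


if __name__ == "__main__":
    report = remove_if_stale(LOCKFILE)
    if report:
        # cron mails this output to the job owner (or MAILTO).
        print(report)
```

Run from cron every few minutes, this stays silent unless it actually removes
a lock, so each mail corresponds to one stuck agent. It only clears the
symptom; it does not explain why the lock was left behind.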
No, I have no other information beyond the fact that it affects all of our
machine types, and we have had no significant changes in our modules in this
time period. Many of the machines which have failed have had zero module or
manifest changes which would apply to them. I cannot reproduce this on the
command line.