Issue #16315 has been updated by Andreas Unterkircher.
> Hello Andreas, I have not heard of this happening before. Can you post this
> report to puppet-users and reference the bug in case other users have seen
> this?

Good tip! I will try.

> One thing I don't understand: if it is your agents, why do they have a
> /var/log/puppet/http.log logrotate, and a reload of puppetmaster afterwards,
> configured on them at all?

The puppetmaster reload only happens if puppetmaster is installed, which is a separate package on Debian (probably related to #12591).

> It almost seems like the name resolution is failing after the reload, like it
> is trying to connect to a server name that times out.

I will strace one of them now. Maybe that gives a hint about what the agents are doing when they stumble.

> When you remove this logrotate entry on the agents, they run normally through
> the night?

Exactly. Then they keep on running.
----------------------------------------
Bug #16315: agents stop by execution-expired after logrotate having reloading them
https://projects.puppetlabs.com/issues/16315#change-70798

Author: Andreas Unterkircher
Status: Needs More Information
Priority: Normal
Assignee: Andreas Unterkircher
Category: 
Target version: 
Affected Puppet version: 
Keywords: 
Branch: 

We have about 350 active agents in our network. For some weeks now, all of them have been suffering from the same problem after being reloaded by logrotate on Sunday morning. Agents and master are running version 2.7.18 on Debian Squeeze (ruby 1.8.7).

Our agent config is:

<pre>
[main]
logdir=/var/log/puppet
vardir=/var/lib/puppet
rundir=/var/run/puppet

[agent]
server=puppet.example.com
storeconfigs=false
listen=false
report=false
splay=true
runinterval = 3600
syslogfacility = local3
masterport=8140
</pre>

Because so many of our agents were affected, I first thought the problem was related to our master. I switched it from WEBrick (which had been running very well until then) to Passenger, but the issue remained the same.

Here is what happens: every Sunday at 6:25, cron runs logrotate.
Logrotate then rotates /var/log/puppet/http.log according to /etc/logrotate.d/puppet:

<pre>
/var/log/puppet/*log {
  missingok
  create 0644 puppet puppet
  compress
  rotate 4

  postrotate
    [ -e /etc/init.d/puppetmaster ] && /etc/init.d/puppetmaster reload >/dev/null 2>&1 || true
    [ -e /etc/init.d/puppet ] && /etc/init.d/puppet reload > /dev/null 2>&1 || true
  endscript
}
</pre>

After being reloaded, every agent completes one (and only one) more successful run. After that, all of them start logging execution-expired messages and stop working. In syslog it looks like this:

<pre>
>>>> after being reloaded at 6:25, this is the first run
Sep 9 06:38:47 server puppet-agent[3316]: (/Stage[main]/Nagios-common/Modulefile[/var/run/nagios.tick]/File[/var/run/nagios.tick]/content) content changed '{md5}611d0e76b249daa2ba57118ccd63bc4b' to '{md5}0e11de68be51c68e1d3d2f947479b756'
Sep 9 06:39:00 server puppet-agent[3316]: Finished catalog run in 15.11 seconds

>>>> starting from here, only execution-expired's
Sep 9 07:27:18 server puppet-agent[3316]: Could not retrieve catalog from remote server: execution expired
Sep 9 07:27:19 server puppet-agent[3316]: Using cached catalog
Sep 9 08:27:18 server puppet-agent[3316]: Could not retrieve catalog from remote server: execution expired
Sep 9 08:27:19 server puppet-agent[3316]: Using cached catalog
Sep 9 08:27:19 server puppet-agent[3316]: Could not retrieve catalog; skipping run
Sep 9 09:27:22 server puppet-agent[3316]: Could not retrieve catalog from remote server: execution expired
Sep 9 09:27:23 server puppet-agent[3316]: Using cached catalog
Sep 9 09:27:23 server puppet-agent[3316]: Could not retrieve catalog; skipping run
Sep 9 10:27:28 server puppet-agent[3316]: Could not retrieve catalog from remote server: execution expired
Sep 9 10:27:28 server puppet-agent[3316]: Using cached catalog
</pre>

All of our other nodes show the same picture. I can trigger a successful manual run on those machines at any time.
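With around 350 agents affected, it may help to pick out the stuck ones from syslog instead of checking hosts one by one. Below is a minimal Python sketch that assumes the syslog line format shown above. The regular expression and the names `failures_since_last_success` and `stuck_agents` are hypothetical helpers, not part of Puppet. For each host and agent PID, the sketch records the time of the last `Finished catalog run` and counts the `execution expired` messages that have followed it.

```python
import re

# Assumption: traditional syslog lines as shown in this report, e.g.
#   "Sep 9 07:27:18 server puppet-agent[3316]: Could not retrieve catalog ..."
# Captures the timestamp, host, agent PID and message text.
AGENT_LINE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+puppet-agent\[(?P<pid>\d+)\]:\s+(?P<msg>.*)$"
)


def failures_since_last_success(lines):
    """Map (host, pid) to (timestamp of last finished run, expired count).

    Every "Finished catalog run" message resets the count to 0, and every
    message containing "execution expired" increments it. Other messages
    (e.g. "Using cached catalog") leave the state unchanged.
    """
    state = {}
    for line in lines:
        match = AGENT_LINE.match(line.strip())
        if match is None:
            continue  # not a puppet-agent line (or a ">>>>" annotation)
        key = (match["host"], match["pid"])
        last_ok, expired = state.get(key, (None, 0))
        msg = match["msg"]
        if msg.startswith("Finished catalog run"):
            state[key] = (match["ts"], 0)
        elif "execution expired" in msg:
            state[key] = (last_ok, expired + 1)
    return state


def stuck_agents(lines, threshold=2):
    """Return sorted (host, pid) pairs with at least `threshold`
    execution-expired failures since their last successful run."""
    state = failures_since_last_success(lines)
    return sorted(key for key, (_, expired) in state.items() if expired >= threshold)


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python scan_agents.py /var/log/syslog [more files...]
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            for host, pid in stuck_agents(handle):
                print(f"{path}: {host} puppet-agent[{pid}] looks stuck")
```

The threshold of 2 is an arbitrary choice: a single timeout can happen during normal operation, whereas the pattern above repeats on every hourly run.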
Since manual runs succeed, I don't think the master is the cause.

I have already tried changing "reload" to "restart" in the logrotate config, but the issue remains the same. Even running the agent in the foreground with debug and trace enabled gives me no hint. It just starts logging those execution-expired messages and does not even attempt to connect to the master (verified with tcpdump).

It is also hard to reproduce. Sending a manual SIGHUP to a running agent does not trigger this behaviour. The problem seems to occur only when an agent has been running for some time (days) and is then reloaded or restarted by logrotate.

Does anyone have a hint on how I can track down this issue further?

-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here: http://projects.puppetlabs.com/my/account

-- 
You received this message because you are subscribed to the Google Groups "Puppet Bugs" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/puppet-bugs?hl=en.
