My ceil theory didn't pan out. But here's a working fix: --- puppet-3.0.1.orig/lib/puppet/daemon.rb 2012-10-20 00:57:50.000000000 -0400 +++ puppet-3.0.1/lib/puppet/daemon.rb 2012-12-21 08:27:17.000000000 -0500 @@ -164,6 +164,8 @@ next_reparse = 0 reparse_interval = Puppet[:filetimeout] + next_event = 0 + loop do now = Time.now.to_i @@ -172,7 +174,9 @@ # the math messes up or something; it should be inexpensive enough to # wake once an hour, then go back to sleep after doing nothing, if # someone only wants listen mode. - next_event = now + 60 * 60 + if now >= next_event || next_event == 0 + next_event = now + 60 * 60 + end # Handle reparsing of configuration files, if desired and required. # `reparse` will just check if the action is required, and would be @@ -194,7 +198,7 @@ # next time it is scheduled to run, just in case. In the event that # we made no change the result will be a zero second adjustment. new_run_interval = [Puppet[:runinterval], 0].max - next_agent_run += agent_run_interval - new_run_interval + next_agent_run += new_run_interval - agent_run_interval agent_run_interval = new_run_interval end
On Thursday, December 20, 2012 4:09:09 PM UTC-5, David Mesler wrote: > > I’ve added some debugging messages to run_event_loop and figured out what > was going on. I’m going to reference line numbers in > https://github.com/puppetlabs/puppet/blob/master/lib/puppet/daemon.rb to > make it easier on me to explain. I found that occasionally the “if” > statement on line 180 was failing because the value of “now” was one second > behind “next_reparse”. I believe this is because line 168 uses “to_i” when > it should use “ceil”. I’m deploying a patched version of puppet with this > change to my servers now. > > But I think a bigger issue is line 175 where next_event is set to the > current time plus one hour. That’s a pretty arbitrary and unpleasant value. > Why not set next_event to the lower vale of :runinterval or :filetimeout? > The way it is now, if line 180 fails, you’re stuck with an hour long wait > even though your :runinterval may be far less. > > Also, line 197 is backwards. It should be “next_agent_run += > new_run_interval – agent_run_interval”. > > On Tuesday, December 18, 2012 7:00:42 PM UTC-5, David Mesler wrote: >> >> I've noticed when I strace a puppet agent that has failed to run after >> its 900 second runinterval, it's blocking on a really long select call. >> Like: >> select(1, [], [], [], {1032, 350000} <unfinished ...> >> >> When that's done it finally re-reads pupet.conf and stars a catalog run. >> I have no idea where that long select call comes from. >> >> On Friday, December 14, 2012 3:15:25 PM UTC-5, David Mesler wrote: >>> >>> I've recently upgraded from 2.6.9 to 3.0.1 and have noticed an oddity. >>> Our puppet agents are configured with a runinterval of 900 and a splaylimit >>> of 450. Since upgrading I've noticed that once or twice a day our puppet >>> agents simply won't run for about an hour or so. Has anyone else >>> experienced anything like this? >>> >>> --david >>> >> -- You received this message because you are subscribed to the Google Groups "Puppet Users" group. To view this discussion on the web visit https://groups.google.com/d/msg/puppet-users/-/mcnppG9SNo4J. To post to this group, send email to puppet-users@googlegroups.com. To unsubscribe from this group, send email to puppet-users+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.