On Fri, May 02, 2014 at 07:37:48AM -0700, jcbollinger wrote:
> On Thursday, May 1, 2014 9:42:39 AM UTC-5, Christopher Wood wrote:
>
> (inline)
>
> On Wed, Apr 30, 2014 at 08:21:15AM -0700, jcbollinger wrote:
> > On Tuesday, April 29, 2014 10:15:35 AM UTC-5, Christopher Wood wrote:
> >
> > Can't hosts already stagger their agent checkin times by using per-host
> > runinterval settings?
> >
> > No. Different agents with different runintervals will still all hit the
> > server at nearly the same time when they are started together, and they
> > will do so again periodically thereafter (just not every run). Moreover,
> > it's nasty to use a policy knob such as runinterval to address a
> > technical issue such as avoiding a thundering herd effect.
>
> In theory the agent runs will intersect and kill the puppetmaster in the
> timespan around when the least common multiple of all the
> runintervals comes around.
>
> Yes, subject to a fuzz factor that is correlated with how expensive your
> catalogs are to compile. The master is more sensitive to load leveling
> problems when catalog compilation is expensive, which is exactly the
> condition in which balancing via a spread of runinterval is least
> effective.

I will have to keep that in mind. So far I haven't loaded the puppetmaster
hosts too much but that could happen.

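For what it's worth, the arithmetic behind that intersection can be sketched in
a few lines of Python. This is only an illustration: the three runinterval
values are made up, and it ignores run duration and the fuzz factor mentioned
above. Agents started together with different intervals all line up again at
the least common multiple of those intervals.

```python
from functools import reduce
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def next_full_collision(intervals):
    """Seconds until agents that started together all check in together again."""
    return reduce(lcm, intervals)


# Hypothetical runintervals (seconds) for three groups of agents.
intervals = [1800, 2100, 2400]
collision = next_full_collision(intervals)
print(f"all groups coincide every {collision} s ({collision / 3600:.1f} h)")
```

With these made-up values the full herd re-forms every 50400 seconds (14
hours), while the 1800 and 2100 groups alone meet every 12600 seconds.
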
> In practice if this ever happens to me (hasn't so far) I will shrug and
> say to wait for the next agent run in less than an hour. Right now my
> runinterval defaults to (1800 + fqdn_rand(600)), implying that any LCM
> intersections aren't so frequent yet the agents update pretty
> frequently.
>
> Out of personal preference I don't make a distinction between technical
> and policy issues; they are both components of the same services.
>
> > Puppet does have the 'splay' and 'splaylimit' configuration settings as
> > a possible solution, however. If you can accept some variation in the
> > interval between one agent run and the next then those are pretty
> > effective, albeit non-deterministic.
>
> I abandoned splay use when it interfered with 'puppet kick'.
>
> That's a fair criticism, where applicable. Personally, I don't much like
> 'puppet kick', and I don't configure agents to respond to it. If I wanted
> something along those lines then I would set up something more general,
> such as MCollective, or something that leverages nodes' ordinary features,
> such as a script around 'ssh'. YMMV.

I don't have any agents responding to kick either, but with the bug still open
I don't really have any reason to revisit splay.

> I probably wouldn't use it these days because it's not obvious how long
> the splay will wait
>
> Well, yes, that's the point of splay.
>
> and I'm trying to get away from using inferences in preference to
> literal values.
>
> Ok. I have no problem with that.
>
> If you follow that path, however, then doesn't it lead past fixed run
> intervals, on to scheduling each agent's runs at specific times? That's
> even more literal, and it's independent of most timing variables (when was
> Puppet started, how long does each run take, etc.).

If I start running into trouble with the current setup, sure. At scale I wind
up with interesting scheduling issues: fqdn_rand doesn't seem to distribute
things evenly (just like round-robin load balancing), my number of hosts may
not fit tidily into cron, and managing schedules for a respectable number of
hosts stops being trivial. (Those are of course functions of my skill level.)
Consider, with fictional numbers:
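
  # Emit 3000 fqdn_rand(60, seed) calls, one per fictional host seed.
  for x in $(seq 4000 6999); do echo "notice(fqdn_rand(60, ${x}))"; done \
    >/tmp/xx.pp
  # Count how many seeds land in each slot, busiest slot first.
  puppet apply --color=false /tmp/xx.pp | grep Scope | awk '{print $3}' | sort |
    uniq -c | sort -rn >/tmp/y
  head -5 /tmp/y    # busiest slots
  tail -5 /tmp/y    # quietest slots
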
There are obviously ways around all of those, but they seem to create a lot of
work for much the same result. I'm less concerned about individual agent runs
than about everything staying in sync over the course of a day.
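
To put a number on how uneven "uneven" should look, here is a rough Python
sketch of the same experiment. It is an assumption-laden stand-in: it hashes
each seed with MD5 and takes the result modulo 60, which is not Puppet's
fqdn_rand implementation, only something that behaves like a uniform draw. It
then compares the busiest and quietest slots with what a truly uniform
distribution would be expected to produce.

```python
import hashlib
from collections import Counter
from math import sqrt


def bucket(seed, slots=60):
    """Map a seed to a slot via MD5 (a stand-in, not Puppet's fqdn_rand)."""
    digest = hashlib.md5(str(seed).encode()).hexdigest()
    return int(digest, 16) % slots


def slot_counts(seeds, slots=60):
    """Count how many seeds land in each slot, including empty slots."""
    counts = Counter(bucket(s, slots) for s in seeds)
    return [counts.get(slot, 0) for slot in range(slots)]


def expected_spread(n, slots=60):
    """Mean and standard deviation of one slot's count under a uniform draw."""
    p = 1 / slots
    return n * p, sqrt(n * p * (1 - p))


counts = slot_counts(range(4000, 7000))  # same 3000 seeds as the shell loop
mean, sd = expected_spread(3000)
print(f"busiest slot: {max(counts)}, quietest slot: {min(counts)}")
print(f"uniform expectation: {mean:.0f} per slot, standard deviation {sd:.1f}")
```

For 3000 hosts over 60 slots a uniform draw averages 50 per slot with a
standard deviation of about 7, so a gap of a couple of dozen hosts between the
busiest and quietest slots is ordinary random scatter rather than a sign of a
broken generator.
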
> I sometimes don't infer the same results as other people do.
>
> [1]https://groups.google.com/forum/#!topic/puppet-users/EaoiHSd4eEM
> [2]http://projects.puppetlabs.com/issues/1100
>
> >
> >
> > At some point, sure, agents may not be the best path forward but I don't
> > see when I'd reach that point.
>
> (agents as in puppet agent daemons/services)
>
> > I'm uncertain whether by "agents" you mean running the agent as a
> > service, or whether you mean using the agent at all (as opposed to using
> > "puppet apply"). Garrett was not suggesting the latter; he was suggesting
> > using cron to schedule runs of the agent in non-daemon mode. You can also
> > schedule runs of "puppet apply" that way, but that's a whole different
> > ball game.
>
> This gets back to, how do I schedule my agent runs to avoid puppetmaster
> service issues due to load? There are only so many cron slots in a day,
> manually scheduling that many machines could get boring, and doing it
> automatically could lead to the same herd issues if I get the algorithm
> wrong.
>
> You can't be serious. Why on earth would anyone consider doing it
> manually? If you are going to schedule Puppet runs via cron, then you use
> fqdn_rand() or something similar to set the schedules. The likelihood of
> fqdn_rand() producing a problematically non-uniform distribution is
> minute, and it is anyway equally applicable to setting run intervals.

Not manually as in me with my spreadsheet; manually as in something requiring
more supervision and human time. Team A wants their agent runs before 03:00,
team B wants them at 09:00, et cetera.

Automatically as in making it happen by itself. For a cheap example:

  - list of hosts from puppetdb
  - arbitrary time interval (86400 seconds)
  - desired time between agent runs
  - algorithm to schedule them given the above
  - scheduling is recalculated when somebody adds/removes a host

This is where I start furrowing my brow. Perhaps I'm overthinking.

(I know I keep harping on all the hard work, but chasing after minutiae takes
its toll. I hear about the Cloudflare/WhatsApp people:result ratio and get
jealous.)
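
A minimal sketch of the kind of algorithm I mean, in Python, under loud
assumptions: the host list is handed in as plain strings (fetching it from
puppetdb is left out), the function name and host names are hypothetical, and
"fair" simply means evenly spaced start times within one run interval.

```python
def build_schedule(hosts, run_interval, day=86400):
    """Return {host: [start offsets in seconds since midnight]}.

    Hosts are spaced evenly across one run_interval, then each host repeats
    every run_interval seconds until the day is used up.
    """
    ordered = sorted(set(hosts))
    if not ordered:
        return {}
    step = run_interval / len(ordered)
    schedule = {}
    for index, host in enumerate(ordered):
        first = int(index * step)
        schedule[host] = list(range(first, day, run_interval))
    return schedule


# Hypothetical hosts; three of them with a 30-minute interval start 10 minutes apart.
plan = build_schedule(["a.example.com", "b.example.com", "c.example.com"], 1800)
for host, starts in plan.items():
    print(host, starts[:3])
```

The weak spot is the last item in the list above: because each offset depends
on the host's position in the sorted list, adding or removing one host moves
the slot of every host sorted after it, so the whole schedule has to be pushed
out again.
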
> The answer sounds like it might be a control host kicking off an "mco
> puppet runall $limit" job via cron but I'm not there yet.
>
> That's the general direction in which I might turn for an alternative to
> 'puppet kick', but I don't like it for scheduling regular agent runs. For
> one thing, although it caps the load on the master, it doesn't do a good
> job of balancing unless you tune it carefully, and keep it in tune.
>
>
>
> > There is a lot to be said for scheduling the agent via cron. In addition
> > to possible applications in load leveling, it can make Puppet more
> > resilient. For example, I was recently working on the provider of a
> > custom type, and I managed to let a broken version escape to some of my
> > systems, where it crashed the agent daemon. I had to manually restart the
> > daemons on those systems. If I were launching Puppet from cron then the
> > Puppet runs would still have failed, but the next runs, when a fixed
> > version of the provider was available, would have gone fine without any
> > manual intervention.
>
> This is where I plug how you should already be killing your own daemons
> so that you can build service resilience via some form of watcher, and
> doing dev->stage->prod incremental rollouts so that most of your
> problems don't happen in production. In my case monit would have brought
> up puppet on the affected dev hosts, and those hosts would have grabbed
> the new provider as soon as it was available on the puppetmasters.
> (Obviously the first time I tried this technique I learned the hard way
> about monit supervising sshd and init supervising monit.)
>
> That's all good advice, but monit would not be relevant for puppet if I
> weren't running the agent as a daemon in the first place. Since I don't
> want 'puppet kick', I have no reason to do so other than scheduling, which
> cron can achieve for me just fine, yielding also resiliency benefits
> comparable to what monit could give me (for Puppet) and other benefits.

Given the chaos monkey, I've seen a puppet agent killed while its agent-run
child was still alive hours later, but I take your point.
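
For the record, the cron approach being described would look roughly like the
sketch below. It is an illustration only: the helper names, the host name, the
/usr/bin/puppet path, and the use of SHA-256 to pick the minute are all my
assumptions (in a manifest the minute would more likely come from fqdn_rand()
on a cron resource). The agent is run once in the foreground with --onetime
and --no-daemonize, so a crashed run does not affect the next one.

```python
import hashlib


def cron_minutes(fqdn, runs_per_hour=2):
    """Pick stable, evenly spaced minutes for a host from a hash of its name."""
    period = 60 // runs_per_hour
    offset = int(hashlib.sha256(fqdn.encode()).hexdigest(), 16) % period
    return [offset + i * period for i in range(runs_per_hour)]


def cron_line(fqdn, puppet="/usr/bin/puppet"):
    """Render a crontab entry that runs the agent once, in the foreground."""
    minutes = ",".join(str(m) for m in cron_minutes(fqdn))
    return f"{minutes} * * * * {puppet} agent --onetime --no-daemonize"


# Hypothetical host name; the same name always yields the same minutes.
print(cron_line("web01.example.com"))
```
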
> John
>
> --
> You received this message because you are subscribed to the Google Groups
> "Puppet Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [3][email protected].
> To view this discussion on the web visit
>
> [4]https://groups.google.com/d/msgid/puppet-users/d8744267-58bc-44a1-b496-7d0f140ee773%40googlegroups.com.
> For more options, visit [5]https://groups.google.com/d/optout.