On Fri, May 02, 2014 at 07:37:48AM -0700, jcbollinger wrote:
>    On Thursday, May 1, 2014 9:42:39 AM UTC-5, Christopher Wood wrote:
> 
>      (inline)
> 
>      On Wed, Apr 30, 2014 at 08:21:15AM -0700, jcbollinger wrote:
>      >    On Tuesday, April 29, 2014 10:15:35 AM UTC-5, Christopher Wood
>      wrote:
>      >
>      >      Can't hosts already stagger their agent checkin times by using
>      per-host
>      >      runinterval settings?
>      >
>      >    No.  Different agents with different runintervals will still all
>      hit the
>      >    server at nearly the same time when they are started together, and
>      they
>      >    will do so again periodically thereafter (just not every run). 
>      Moreover,
>      >    it's nasty to use a policy knob such as runinterval to address a
>      technical
>      >    issue such as avoiding a thundering herd effect.
> 
>      In theory the agent runs will intersect and kill the puppetmaster
>      in the timespan around when the least common multiple of all the
>      runintervals comes around.
> 
>    Yes, subject to a fuzz factor that is correlated with how expensive your
>    catalogs are to compile.  The master is more sensitive to load leveling
>    problems when catalog compilation is expensive, which is exactly the
>    condition in which balancing via a spread of runinterval is least
>    effective.

I will have to keep that in mind. So far I haven't loaded the puppetmaster 
hosts too much but that could happen.

>      In practice if this ever happens to me (hasn't so far) I will shrug
>      and say to wait for the next agent run in less than an hour. Right
>      now my runinterval defaults to (1800 + fqdn_rand(600)), implying
>      that such coincidences are rare while the agents still update
>      pretty frequently.
> 
>      Out of personal preference I don't draw a distinction between
>      technical and policy issues; they are both components of the same
>      services.
> 
>      >    Puppet does have the 'splay' and 'splaylimit' configuration
>      settings as a
>      >    possible solution, however.  If you can accept some variation in
>      the
>      >    interval between one agent run and the next then those are pretty
>      >    effective, albeit non-deterministic.
> 
>      I abandoned splay use when it interfered with 'puppet kick'.
> 
>    That's a fair criticism, where applicable.  Personally, I don't much like
>    'puppet kick', and I don't configure agents to respond to it.  If I wanted
>    something along those lines then I would set up something more general,
>    such as MCollective, or something that leverages nodes' ordinary features,
>    such as a script around 'ssh -c'.  YMMV.

I don't have any agents responding to kick either, but I don't really have 
any reason to revisit splay, what with the open bug and all.

>      I probably wouldn't use it these days because it's not obvious how long
>      the splay will wait
> 
>    Well, yes, that's the point of splay.
> 
>     
> 
>      and I'm trying to get away from inferred values in favor of
>      literal ones.
> 
>    Ok.  I have no problem with that.
> 
>    If you follow that path, however, then doesn't it lead past fixed run
>    intervals, on to scheduling each agent's runs at specific times?  That's
>    even more literal, and it's independent of most timing variables (when was
>    Puppet started, how long does each run take, etc.).

If I start running into trouble with the current setup, sure. At scale I wind 
up with interesting scheduling issues: fqdn_rand doesn't seem to distribute 
things evenly (just as with round-robin load balancing), my number of hosts 
may not fit tidily into cron's slots, and managing schedules for a 
respectable number of hosts stops being trivial. (Those are of course 
functions of my skill level.) Consider, with fictional numbers:

for x in `seq 4000 6999`; do echo "notice(fqdn_rand(60, ${x}))"; done >/tmp/xx.pp
puppet apply --color=false /tmp/xx.pp | grep Scope | awk '{print $3}' | sort | uniq -c | sort -rn >/tmp/y
head -5 /tmp/y
tail -5 /tmp/y

There are obviously ways around all those, but they seem to create a lot of 
work for much the same result. I'm not so concerned about individual agent runs 
as I am that everything stays in sync over the course of a day.

>      I sometimes don't infer the same results as other people do.
> 
>      https://groups.google.com/forum/#!topic/puppet-users/EaoiHSd4eEM
>      http://projects.puppetlabs.com/issues/1100
> 
>      >     
>      >
>      >      At some point, sure, agents may not be the best path forward but
>      I don't
>      >      see when I'd reach that point.
> 
>      (agents as in puppet agent daemons/services)
>       
>      >    I'm uncertain whether by "agents" you mean running the agent as a
>      service,
>      >    or whether you mean using the agent at all (as opposed to using
>      "puppet
>      >    apply").  Garrett was not suggesting the latter; he was suggesting
>      using
>      >    cron to schedule runs of the agent in non-daemon mode.  You can
>      also
>      >    schedule runs of "puppet apply" that way, but that's a whole
>      different
>      >    ball game.
> 
>      This gets back to, how do I schedule my agent runs to avoid puppetmaster
>      service issues due to load? There are only so many cron slots in a day,
>      manually scheduling that many machines could get boring, and doing it
>      automatically could lead to the same herd issues if I get the algorithm
>      wrong.
> 
>    You can't be serious.  Why on earth would anyone consider doing it
>    manually?  If you are going to schedule Puppet runs via cron, then you use
>    fqdn_rand() or something similar to set the schedules.  The likelihood of
>    fqdn_rand() producing a problematically non-uniform distribution is
>    minute, and it is anyway equally applicable to setting run intervals.

Not manually as in me with my spreadsheet, manually as in something requiring 
more supervision and human time. Team A wants their agent runs before 03:00, 
team B wants them at 09:00, et cetera.

Automatically as in making it happen by itself; a cheap example:

list of hosts from puppetdb
arbitrary time interval (86400 seconds)
desired time between agent runs
algorithm to schedule them given the above
scheduling is recalculated when somebody adds/removes a host

This is where I start furrowing my brow. Perhaps I'm overthinking.
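That cheap example, sketched as a script. Everything here is a placeholder: 
hosts.txt and the four host names stand in for a PuppetDB query result, and 
re-running the script after a host is added or removed recalculates every 
slot:

```shell
#!/bin/sh
# Sketch: spread agent runs evenly over a fixed window and emit one
# cron time slot per host.
printf 'db1\ndb2\nweb1\nweb2\n' > hosts.txt   # placeholder host list

INTERVAL=86400                  # the arbitrary time interval, in seconds
N=$(wc -l < hosts.txt)          # current host count
STEP=$((INTERVAL / N))          # resulting time between agent runs

i=0
while read -r host; do
  offset=$((i * STEP))
  # convert the offset into a cron minute and hour field
  printf '%d %d * * * root puppet agent --onetime --no-daemonize # %s\n' \
    $(( (offset % 3600) / 60 )) $(( offset / 3600 )) "$host"
  i=$((i + 1))
done < hosts.txt
```

With four hosts the runs land six hours apart; the same arithmetic gives a 
sub-minute spacing at a few thousand hosts, which is where cron's one-minute 
granularity starts forcing hosts to share slots.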

(I know I keep harping on all the hard work, but chasing after minutiae takes 
its toll. I hear about the Cloudflare/Whatsapp people:result ratio and get 
jealous.) 

>      The answer sounds like it might be a control host kicking off an "mco
>      puppet runall $limit" job via cron but I'm not there yet.
> 
>    That's the general direction in which I might turn for an alternative to
>    'puppet kick', but I don't like it for scheduling regular agent runs.  For
>    one thing, although it caps the load on the master, it doesn't do a good
>    job of balancing unless you tune it carefully, and keep it in tune.
> 
>     
> 
>      >    There is a lot to be said for scheduling the agent via cron.  In
>      addition
>      >    to possible applications in load leveling, it can make Puppet more
>      >    resilient.  For example, I was recently working on the provider of
>      a
>      >    custom type, and I managed to let a broken version escape to some
>      of my
>      >    systems, where it crashed the agent daemon.  I had to manually
>      restart the
>      >    daemons on those systems.  If I were launching Puppet from cron
>      then the
>      >    Puppet runs would still have failed, but the next runs, when a
>      fixed
>      >    version of the provider was available, would have gone fine without
>      any
>      >    manual intervention.
> 
>      This is where I plug how you should already be killing your own daemons
>      so that you can build service resilience via some form of watcher, and
>      doing dev->stage->prod incremental rollouts so that most of your
>      problems don't happen in production. In my case monit would have brought
>      up puppet on the affected dev hosts, and those hosts would have grabbed
>      the new provider as soon as it was available on the puppetmasters.
>      (Obviously the first time I tried this technique I learned the hard way
>      about monit supervising sshd and init supervising monit.)
> 
>    That's all good advice, but monit would not be relevant for puppet if I
>    weren't running the agent as a daemon in the first place.  Since I don't
>    want 'puppet kick', I have no reason to do so other than scheduling,
>    which cron can achieve for me just fine, while also yielding resiliency
>    benefits comparable to what monit could give me (for Puppet), among
>    other benefits.

Thanks to the chaos monkey I've seen a puppet agent killed with its 
agent-run child still alive hours later, but I take your point.

>    John
> 

-- 
You received this message because you are subscribed to the Google Groups 
"Puppet Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/puppet-users/20140502175238.GA1720%40iniquitous.heresiarch.ca.
For more options, visit https://groups.google.com/d/optout.
