[Puppet-dev] Re: Rethinking caching in Puppet

Peter Hoeg Wed, 09 Apr 2008 10:05:32 -0700

Luke, I'm not trying to pretend that I know anything concerning the
fine art of cache theory, so this is purely based on my layman's
interpretation and you should therefore take it with a truckload of
salt.


But before we enter the discussion of what (and how) caching should
take place, wouldn't it make sense to see where the current
bottlenecks are and then see how those bottlenecks could be addressed?

Based on my experience (admittedly running on fairly low-end
hardware), the server spends a fairly large amount of time creating
the catalog. So would it make sense to simply compile the catalog
ahead of time, in parallel, in a separate thread/daemon? One example
would be:

a) I know the clients connect every [runinterval].

b) [runinterval - 5 minutes] after a client's last run, the process
starts compiling the manifest for that client, which will connect
again in 5 minutes.

c) When the client connects, the facts it delivers are combined with
the pre-compiled main catalogue, and the result is downloaded.
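
To make steps a) to c) a bit more concrete, here is a rough Ruby
sketch of the scheduling part only. Everything in it is an assumption
for illustration: PrecompileScheduler, record_checkin, due,
precompile_due and catalog_for are hypothetical names, not existing
Puppet classes or methods, and the block passed to precompile_due
stands in for whatever the real compiler call would be. It does not
attempt step c), merging the client's facts into the pre-compiled
catalogue.

```ruby
# Hypothetical sketch: none of these names exist in Puppet.
# Times are Time objects; runinterval and lead_time are in seconds.
class PrecompileScheduler
  def initialize(runinterval:, lead_time:)
    unless lead_time < runinterval
      raise ArgumentError, "lead_time must be shorter than runinterval"
    end
    @runinterval = runinterval
    @lead_time   = lead_time
    @last_seen   = {}  # node name => time of its last check-in
    @precompiled = {}  # node name => catalogue built for its next run
  end

  # Step a): remember when a client connected. Any catalogue that was
  # pre-compiled for this run is now used up, so drop it.
  def record_checkin(node, at)
    @last_seen[node] = at
    @precompiled.delete(node)
  end

  # Step b): nodes whose next run is at most lead_time away and that
  # have no pre-compiled catalogue waiting yet.
  def due(now)
    threshold = @runinterval - @lead_time
    @last_seen.select { |node, seen|
      !@precompiled.key?(node) && now >= seen + threshold
    }.keys
  end

  # Compile every due node. The block is a stand-in for the real
  # (expensive) compile step and must return the catalogue.
  def precompile_due(now)
    due(now).each { |node| @precompiled[node] = yield(node) }
  end

  # What the server would hand out (before step c) when the client
  # connects; nil if nothing was pre-compiled in time.
  def catalog_for(node)
    @precompiled[node]
  end
end
```

With a runinterval of 1800 seconds and a lead_time of 300, a client
that checked in at time t becomes due at t + 1500. A background
thread or daemon would call precompile_due periodically; if it
returns nothing for a node, the server would presumably fall back to
compiling on demand as it does today.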

I know I'm completely ignoring the issue of how to force a compilation
on demand in a development scenario, and the fact that I don't know
where the time goes when compiling the catalogue.

Does it make sense?

/peter

>  On 09/04/2008, Luke Kanies <[EMAIL PROTECTED]> wrote:
>  >
>  >  Hi all,
>  >
>  >  Those who follow my blog have some inkling of this already[1], but I'm
>  >  nearly done with refactoring how the Indirector code handles caching.
>  >  This email is a quick description of how caching now works, gives you
>  >  an opportunity to comment, and provides the stepping stone for later
>  >  discussion. I've got at least one more email about catalog caching
>  >  primed, but figured I should start with this one.
>  >
>  >  The purpose of this email is background; it's not really intended to
>  >  spur discussion, but please ask any questions you might have.
>  >
>  >  Also, does anyone have any interest in starting some documentation on
>  >  the indirector and caching and such?  Should I just copy this text
>  >  into a wiki page?
>  >
>  >  First, a brief refresher on the Indirector:
>  >
>  >  0.24 introduced a new Indirector module, whose job is to provide an
>  >  indirect interface between information stores and the classes that
>  >  model them.  For instance, there's a Node class, and we can have Node
>  >  data in ldap, external node sources, etc.  The Indirector module adds
>  >  'find', 'search', and 'destroy' methods to the class, so you can do
>  >  "node = Puppet::Node.find('mynode')", and depending on how it's
>  >  configured, that'll look in different sources transparently.
>  >
>  >  This is obviously how we're going to provide network transparency for
>  >  the REST support (which is nearly done, finished by Rick Bradley,
>  >  waiting for me to merge it in).  REST will become one of those sources.
>  >
>  >  Now, how does caching play a role there?
>  >
>  >  Two ways:
>  >
>  >  First, because we're dealing with network transparency and external
>  >  information sources, some of these sources are relatively expensive.
>  >  We don't want to, say, run the external node command every time we
>  >  need node information, especially if we need it often.
>  >
>  >  Second, there are cases where you need information that you can't
>  >  collect yourself.  For instance, the server needs client facts in
>  >  multiple situations, but it can't collect them directly from the
>  >  client; instead, it has to wait for the client to push them up.  In
>  >  this situation, it makes sense for the server to cache the facts, so
>  >  that they're always available.
>  >
>  >  Another case is the client caching its catalog -- sometimes the client
>  >  is not connected to the 'net but it still needs the catalog.
>  >
>  >  How do we decide whether to use the cache?
>  >
>  >  Generally, you're only concerned about the cache when you're seeking
>  >  information, and the big question is always, should I use the cache,
>  >  or hit the source directly?  In some situations (like #2 above), you
>  >  don't really have a choice, but generally if the cache is dirty you
>  >  can hit the source.
>  >
>  >  Previously, Puppet had some arbitrary "version", and tried to compare
>  >  versions of the cached and fresh objects, but this was stupid because
>  >  it's expensive (you always have to talk to the source to get the
>  >  version of the instance in the source).
>  >
>  >  TTL
>  >
>  >  The changes I'll (hopefully) be merging into 0.24.x today introduce a
>  >  TTL for all instances that go through the Indirector.  This TTL is
>  >  used to calculate an expiration date for each instance.  This makes it
>  >  essentially trivial to determine whether a cached instance is still
>  >  valid -- is the current time later than the expiration date?
>  >
>  >  The TTL and expiration are handled entirely transparently by the
>  >  Indirector (except configuration; see below), and it just never
>  >  returns expired information from the cache.  This way the caching is
>  >  completely transparent, as long as the TTLs are configured correctly.
>  >
>  >  Configuring the TTL
>  >
>  >  Here's kinda the kicker, though -- how do you make sure your TTLs are
>  >  good?  For client facts, it's pretty easy -- the client should upload
>  >  new data every half an hour (or whatever the runinterval is), so set
>  >  the TTL to the runinterval and you're basically done.
>  >
>  >  It's pretty simple for catalogs, too, when caching on the client -- it
>  >  gets a new catalog every runinterval, so the ttl of a given catalog
>  >  should be the runinterval again.
>  >
>  >  Other classes might need a different TTL.  Or, even better, you might
>  >  want to keep your runinterval at half an hour but only recompile once
>  >  a day.
>  >
>  >  At some point, Puppet will likely need to expose configuration points
>  >  for the ttl for most, if not all, of the indirected classes.  If
>  >  you've got a custom node source, you might want a ttl of 30 seconds,
>  >  for instance.
>  >
>  >  Configuring a Cache
>  >
>  >  Like the normal indirection, the cache is usually hard-coded into the
>  >  system.  Each executable would normally use a different set of sources
>  >  -- e.g., for catalogs, the server would use the compiler as the
>  >  ultimate source, with ActiveRecord as the cache, while the client
>  >  would use REST as the ultimate source with a YAML cache.
>  >
>  >  Plenty of classes wouldn't have any caches.
>  >
>  >  Expiring a Cache
>  >
>  >  There's currently no interface you can use to say "don't use the
>  >  cache", either globally or individually.  This is clearly a problem,
>  >  at least for some situations, because you want to be able to do things
>  >  like force a recompile when testing new Puppet configurations.  This
>  >  is mostly what my next email will be about.
>  >
>  >  That's basically it, in terms of caching and the indirector.
>  >
>  >  1 - http://www.madstop.com/programming/caching_and_rest.html
>  >
>  >  --
>  >  Levy's Law:
>  >      The truth is always more interesting than your preconception of
>  >      what it might be.
>  >  ---------------------------------------------------------------------
>  >  Luke Kanies | http://reductivelabs.com | http://madstop.com


-- 
/peter
