Hi all,
Those who follow my blog have some inkling of this already[1], but I'm
nearly done with refactoring how the Indirector code handles caching.
This email gives a quick description of how caching now works, offers
you an opportunity to comment, and provides a stepping stone for later
discussion. I've got at least one more email about catalog caching
primed, but figured I should start with this one.
The purpose of this email is background; it's not really intended to
spur discussion, but please ask any questions you might have.
Also, does anyone have any interest in starting some documentation on
the indirector and caching and such? Should I just copy this text
into a wiki page?
First, a brief refresher on the Indirector:
0.24 introduced a new Indirector module, whose job is to provide an
indirect interface between information stores and the classes that
model them. For instance, there's a Node class, and we can have Node
data in ldap, external node sources, etc. The Indirector module adds
'find', 'search', and 'destroy' methods to the class, so you can do
"node = Puppet::Node.find('mynode')", and depending on how it's
configured, that'll look in different sources transparently.
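To make the pattern concrete, here's a minimal sketch of the idea (the names are illustrative only; this is not Puppet's actual implementation):

```ruby
# A toy version of the indirection pattern: a model class gains
# find/search/destroy methods that delegate to a configurable
# backend ("terminus"), so callers never care where the data lives.
module Indirector
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    attr_accessor :terminus   # the configured information source

    def find(key)
      terminus.find(key)
    end

    def search(pattern)
      terminus.search(pattern)
    end

    def destroy(key)
      terminus.destroy(key)
    end
  end
end

class Node
  include Indirector
end

# A hypothetical in-memory terminus standing in for ldap, an external
# node command, REST, etc.
class MemoryTerminus
  def initialize(data)
    @data = data
  end

  def find(key)
    @data[key]
  end

  def search(pattern)
    @data.keys.grep(Regexp.new(pattern))
  end

  def destroy(key)
    @data.delete(key)
  end
end

Node.terminus = MemoryTerminus.new("mynode" => { :environment => "production" })
node = Node.find("mynode")   # transparently hits whatever source is configured
```

Swapping ldap for REST is then just a matter of assigning a different terminus; the calling code never changes.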
This is obviously how we're going to provide network transparency for
the REST support (which Rick Bradley has nearly finished; it's waiting
for me to merge it in). REST will become one of those sources.
Now, how does caching play a role there?
Two ways:
First, because we're dealing with network transparency and external
information sources, some of these sources are relatively expensive.
We don't want to, say, run the external node command every time we
need node information, especially if we need it often.
Second, there are cases where you need information that you can't
collect yourself. For instance, the server needs client facts in
multiple situations, but it can't collect them directly from the
client; instead, it has to wait for the client to push them up. In
this situation, it makes sense for the server to cache the facts, so
that they're always available.
Another case is the client caching its catalog -- sometimes the client
is not connected to the 'net but it still needs the catalog.
How do we decide whether to use the cache?
Generally, you're only concerned about the cache when you're seeking
information, and the big question is always, should I use the cache,
or hit the source directly? In some situations (like #2 above), you
don't really have a choice, but generally if the cache is dirty you
can hit the source.
Previously, Puppet had an arbitrary "version" and tried to compare the
versions of the cached and fresh objects, but this was stupid because
it's expensive: you always have to talk to the source just to get the
version of the instance it holds.
TTL
The changes I'll (hopefully) be merging into 0.24.x today introduce a
TTL for all instances that go through the Indirector. This TTL is
used to calculate an expiration date for each instance. This makes it
essentially trivial to determine whether a cached instance is still
valid -- is the current time later than the expiration date?
The TTL and expiration are handled entirely transparently by the
Indirector (except configuration; see below), and it just never
returns expired information from the cache. This way the caching is
completely transparent, as long as the TTLs are configured correctly.
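In code, the expiration check really is that simple. A sketch, with invented class and method names (the real ones may differ):

```ruby
# A cached instance records its own expiration date, computed from
# the TTL at save time.
class CachedInstance
  attr_reader :value, :expiration

  def initialize(value, ttl)
    @value = value
    @expiration = Time.now + ttl   # TTL is in seconds
  end

  def expired?
    Time.now > @expiration
  end
end

# A cache that never returns expired data; when it returns nil, the
# caller falls through to the ultimate source.
class Cache
  def initialize(ttl)
    @ttl = ttl
    @store = {}
  end

  def save(key, value)
    @store[key] = CachedInstance.new(value, @ttl)
  end

  def find(key)
    instance = @store[key]
    return nil if instance.nil? || instance.expired?
    instance.value
  end
end

cache = Cache.new(1800)          # a TTL of half an hour
cache.save("mynode", "facts...")
cache.find("mynode")             # the facts, while still fresh; nil once expired
```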
Configuring the TTL
Here's kinda the kicker, though -- how do you make sure your TTLs are
good? For client facts, it's pretty easy -- the client should upload
new data every half an hour (or whatever the runinterval is), so set
the TTL to the runinterval and you're basically done.
It's pretty simple for catalogs, too, when caching on the client -- it
gets a new catalog every runinterval, so the TTL of a given catalog
should again be the runinterval.
Other classes might need a different TTL. Or, even better, you might
want to keep your runinterval at half an hour but only recompile once
a day.
At some point, Puppet will likely need to expose configuration points
for the TTL of most, if not all, of the indirected classes. If
you've got a custom node source, for instance, you might want a TTL
of 30 seconds.
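As a sketch, the relationship between the runinterval and the various TTLs might look something like this (the setting names and values here are purely illustrative, not real Puppet settings):

```ruby
# Hypothetical TTL configuration, keyed by indirected class.
runinterval = 1800                  # seconds; the default half hour

ttls = {
  :facts   => runinterval,          # client uploads new facts every run
  :catalog => runinterval,          # client fetches a new catalog every run
  :node    => 30                    # e.g. a fast-changing custom node source
}

# Freshness is then a single comparison against the save time plus TTL.
def fresh?(cached_at, ttl, now = Time.now)
  now < cached_at + ttl
end
```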
Configuring a Cache
Like the normal indirection, the cache is usually hard-coded into the
system. Each executable would normally use a different set of sources
-- e.g., for catalogs, the server would use the compiler as the
ultimate source, with ActiveRecord as the cache, while the client
would use REST as the ultimate source with a YAML cache.
Plenty of classes wouldn't have any caches.
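A sketch of that wiring, with stand-in termini (the real terminus classes and their names differ):

```ruby
# An indirection pairs an ultimate source with an optional cache:
# consult the cache first, fall through to the source, and save the
# result back into the cache.
class Indirection
  attr_accessor :terminus, :cache

  def find(key)
    if cache && (result = cache.find(key))
      return result                   # fresh cached copy
    end
    result = terminus.find(key)       # hit the ultimate source
    cache.save(key, result) if cache && result
    result
  end
end

# A trivial hash-backed terminus; imagine the compiler or REST as the
# source and ActiveRecord or YAML files as the cache.
class HashTerminus
  def initialize(data = {})
    @data = data
  end

  def find(key)
    @data[key]
  end

  def save(key, value)
    @data[key] = value
  end
end

catalogs = Indirection.new
catalogs.terminus = HashTerminus.new("mynode" => "compiled catalog")
catalogs.cache    = HashTerminus.new

catalogs.find("mynode")   # compiled once, then served from the cache
```

The server and client would simply assign different terminus and cache objects to the same indirection; classes that don't need a cache just leave it nil.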
Expiring a Cache
There's currently no interface you can use to say "don't use the
cache", either globally or individually. This is clearly a problem,
at least for some situations, because you want to be able to do things
like force a recompile when testing new Puppet configurations. This
is mostly what my next email will be about.
That's basically it, in terms of caching and the indirector.
1 - http://www.madstop.com/programming/caching_and_rest.html
--
Levy's Law:
The truth is always more interesting than your preconception of
what it might be.
---------------------------------------------------------------------
Luke Kanies | http://reductivelabs.com | http://madstop.com
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"Puppet Developers" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at
http://groups.google.com/group/puppet-dev?hl=en
-~----------~----~----~----~------~----~------~--~---