Hi all,

Those who follow my blog have some inkling of this already[1], but I'm  
nearly done with refactoring how the Indirector code handles caching.   
This email gives a quick description of how caching now works, offers  
you an opportunity to comment, and provides a stepping stone for later  
discussion. I've got at least one more email about catalog caching  
primed, but figured I should start with this one.

The purpose of this email is background; it's not really intended to  
spur discussion, but please ask any questions you might have.

Also, does anyone have any interest in starting some documentation on  
the indirector and caching and such?  Should I just copy this text  
into a wiki page?

First, a brief refresher on the Indirector:

0.24 introduced a new Indirector module, whose job is to provide an  
indirect interface between information stores and the classes that  
model them.  For instance, there's a Node class, and we can have Node  
data in ldap, external node sources, etc.  The Indirector module adds  
'find', 'search', and 'destroy' methods to the class, so you can do  
"node = Puppet::Node.find('mynode')", and depending on how it's  
configured, that'll look in different sources transparently.
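To make that concrete, here's a minimal, illustrative sketch of the indirection pattern in plain Ruby -- the real Puppet::Indirector is richer than this, and the names (MemoryTerminus, terminus=) are my own for the example, not Puppet's API.  The model class delegates its class-level find/search/destroy to whichever "terminus" it's configured with, so callers never care where the data actually lives.

```ruby
# Illustrative only -- not Puppet's actual implementation.
module Indirector
  def terminus=(terminus)
    @terminus = terminus
  end

  def find(key)
    @terminus.find(key)
  end

  def search(pattern)
    @terminus.search(pattern)
  end

  def destroy(key)
    @terminus.destroy(key)
  end
end

# A toy in-memory terminus standing in for ldap, external node tools, etc.
class MemoryTerminus
  def initialize(data)
    @data = data
  end

  def find(key)
    @data[key]
  end

  def search(pattern)
    @data.keys.grep(pattern)
  end

  def destroy(key)
    @data.delete(key)
  end
end

class Node
  extend Indirector
end

Node.terminus = MemoryTerminus.new("mynode" => { "environment" => "production" })
node = Node.find("mynode")   # transparently hits whatever source is configured
```

Swapping the terminus (to ldap, an external node command, REST, ...) changes where the data comes from without touching any of the calling code.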

This is obviously how we're going to provide network transparency for  
the REST support (which is nearly done: Rick Bradley has finished it,  
and it's waiting for me to merge in).  REST will become one of those  
sources.

Now, how does caching play a role there?

Two ways:

First, because we're dealing with network transparency and external  
information sources, some of these sources are relatively expensive.   
We don't want to, say, run the external node command every time we  
need node information, especially if we need it often.

Second, there are cases where you need information that you can't  
collect yourself.  For instance, the server needs client facts in  
multiple situations, but it can't collect them directly from the  
client; instead, it has to wait for the client to push them up.  In  
this situation, it makes sense for the server to cache the facts, so  
that they're always available.

Another case is the client caching its catalog -- sometimes the client  
is not connected to the 'net but it still needs the catalog.

How do we decide whether to use the cache?

Generally, you're only concerned about the cache when you're seeking  
information, and the big question is always, should I use the cache,  
or hit the source directly?  In some situations (like #2 above), you  
don't really have a choice, but generally if the cache is dirty you  
can hit the source.

Previously, Puppet had some arbitrary "version", and tried to compare  
versions of the cached and fresh objects, but this was stupid because  
it's expensive (you always have to talk to the source to get the  
version of the instance in the source).

TTL

The changes I'll (hopefully) be merging into 0.24.x today introduce a  
TTL for all instances that go through the Indirector.  This TTL is  
used to calculate an expiration date for each instance.  This makes it  
essentially trivial to determine whether a cached instance is still  
valid -- is the current time later than the expiration date?

The TTL and expiration are handled entirely transparently by the  
Indirector (except configuration; see below), and it just never  
returns expired information from the cache.  This way the caching is  
completely transparent, as long as the TTLs are configured correctly.
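The expiration check itself is about as small as it sounds.  Here's a sketch of the logic, using names I've made up for the example (CachedInstance, expired?) -- the actual Indirector attributes may differ.  The TTL becomes an absolute expiration date at storage time, so validity is a single time comparison:

```ruby
# Illustrative sketch of TTL-based expiration; not Puppet's actual code.
class CachedInstance
  attr_reader :value, :expiration

  def initialize(value, ttl)
    @value      = value
    @expiration = Time.now + ttl   # TTL is in seconds
  end

  # A cached instance is only usable while the clock hasn't passed its
  # expiration date; the cache never returns expired information.
  def expired?
    Time.now > @expiration
  end
end

facts = CachedInstance.new({ "hostname" => "mynode" }, 1800)  # 30-minute TTL
facts.expired?   # false until half an hour from now
```

Note there's no call back to the source here at all -- that's the whole point, compared to the old version-comparison scheme.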

Configuring the TTL

Here's kinda the kicker, though -- how do you make sure your TTLs are  
good?  For client facts, it's pretty easy -- the client should upload  
new data every half an hour (or whatever the runinterval is), so set  
the TTL to the runinterval and you're basically done.

It's pretty simple for catalogs, too, when caching on the client -- it  
gets a new catalog every runinterval, so the TTL of a given catalog  
should again be the runinterval.

Other classes might need a different TTL.  Or, even better, you might  
want to keep your runinterval at half an hour but only recompile once  
a day.

At some point, Puppet will likely need to expose configuration points  
for the TTL for most, if not all, of the indirected classes.  If  
you've got a custom node source, you might want a TTL of 30 seconds,  
for instance.

Configuring a Cache

Like the normal indirection, the cache is usually hard-coded into the  
system.  Each executable would normally use a different set of sources  
-- e.g., for catalogs, the server would use the compiler as the  
ultimate source, with ActiveRecord as the cache, while the client  
would use REST as the ultimate source with a YAML cache.

Plenty of classes wouldn't have any caches.
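Here's a sketch of how a cache sits in front of the ultimate source -- again, the class and method names are assumptions for illustration, not Puppet's actual termini.  Reads consult the cache first, fall back to the expensive source on a miss or an expired entry, and refresh the cache on the way back:

```ruby
# Illustrative sketch; the cache hash stands in for YAML/ActiveRecord termini.
class CachingIndirection
  Entry = Struct.new(:value, :expiration)

  def initialize(source, ttl)
    @source = source   # ultimate source: compiler, REST, etc.
    @ttl    = ttl      # seconds a cached entry stays valid
    @cache  = {}
  end

  def find(key)
    entry = @cache[key]
    # Never return expired information from the cache.
    return entry.value if entry && Time.now < entry.expiration

    value = @source.call(key)   # hit the expensive source
    @cache[key] = Entry.new(value, Time.now + @ttl)
    value
  end
end

calls = 0
compiler = lambda { |node| calls += 1; "catalog for #{node}" }
indirection = CachingIndirection.new(compiler, 1800)

indirection.find("mynode")   # first call hits the source
indirection.find("mynode")   # second call is served from the cache
calls                        # => 1
```

In real use, the server would plug the compiler in as the source with ActiveRecord as the cache, and the client would plug in REST with a YAML cache -- same shape, different termini.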

Expiring a Cache

There's currently no interface you can use to say "don't use the  
cache", either globally or individually.  This is clearly a problem,  
at least for some situations, because you want to be able to do things  
like force a recompile when testing new Puppet configurations.  This  
is mostly what my next email will be about.

That's basically it, in terms of caching and the indirector.

1 - http://www.madstop.com/programming/caching_and_rest.html

-- 
Levy's Law:
     The truth is always more interesting than your preconception of
     what it might be.
---------------------------------------------------------------------
Luke Kanies | http://reductivelabs.com | http://madstop.com

