Re: [Puppet-dev] Environment Caching - RFC

Andy Parker Mon, 21 Apr 2014 14:38:14 -0700

On Mon, Apr 21, 2014 at 2:29 PM, Henrik Lindberg <
[email protected]> wrote:


> Hi,
> We have been looking into environment caching and have some thoughts and
> ideas about how this can be done. Love to get your input on those ideas,
> and your thoughts about their usefulness.
>
> There is a google document that has the long story - it is open for
> commenting. It is not required reading as the essence is outlined here.
> The doc is here:
> https://docs.google.com/a/puppetlabs.com/document/d/1G-4Z6vi6Tv5xZtzVh7aT2zNWbOxJ3BGfJu31pAHxS7g/edit?disco=AAAAAGtMYOI#heading=h.rpgaxghcfqol
>
> The current state of caching environments
> ---
> A legacy environment caches the result of parsing manifests and loading
> functions / types, and reacts to changed files. It does this by recording
> the mtime of each file as it is parsed / read. Later, if the same file
> would be parsed again, it uses the cached result instead. If
> the file is stale, the entire cache is cleared and it starts from scratch.
>
> It does not, however, react to added files. It also does not recognize
> changes in files evaluated as a consequence of evaluating ruby logic (i.e.
> if a function, type, etc. required something, that is not recorded).
>
>
It will react to added files, but only after the filetimeout has expired on
another file that will cause it to pick up the new file. It all gets very
complicated.
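
To make the legacy behavior concrete, here is a minimal sketch of mtime-based caching as described above. It is written in Python for illustration and is not Puppet's actual Ruby code; the class name MtimeCache and its methods are hypothetical, and the filetimeout throttling is left out.

```python
import os


class MtimeCache:
    """Hypothetical stand-in for the legacy per-file cache."""

    def __init__(self, parse):
        self._parse = parse   # callable: path -> parsed result
        self._results = {}    # path -> cached result
        self._mtimes = {}     # path -> mtime recorded when the file was parsed

    def _is_stale(self):
        # One stat call per recorded file. Files that were never parsed
        # are not recorded, so a newly added file is not noticed here.
        for path, seen in self._mtimes.items():
            try:
                if os.stat(path).st_mtime_ns != seen:
                    return True
            except FileNotFoundError:
                return True
        return False

    def get(self, path):
        if self._is_stale():
            # Any stale file clears the entire cache.
            self._results.clear()
            self._mtimes.clear()
        if path not in self._results:
            self._mtimes[path] = os.stat(path).st_mtime_ns
            self._results[path] = self._parse(path)
        return self._results[path]
```

Every lookup stats every recorded file, which is where the cost comes from when an environment contains many files.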


> The new directory based environments do not support caching (and now we
> want to address this).
>
> The problem with caching
> ---
> The problem with caching is that cache validity can be quite costly to
> compute, and we found that different scenarios benefit from different
> caching strategies.
>
> In an environment where only a small fraction of the modules/manifests
> present in the environment is actually used by each individual node,
> checking the cache can be slower than starting with a clean slate every time.
>
>
In all of the following strategies, does this also involve removing the
known_resource_types WatchedFile caching system?


> Proposed Strategies
> ---
> We think there is a core set of strategies that a user should be able to
> select. These should cover the typical usage scenarios.
>
> * NONE - no caching, each catalog compilation starts with a clean slate.
>   This is the current state of directory based environments, and it
>   could also be made to apply to legacy environments. This is good in
>   a very dynamic environment / development or low "signal/noise" ratio.
>
> * REBOOT - (the opposite of NONE) - cache everything, never check for
>   changes. A reboot of the master is required for it to react to
>   changes.
>   This is good for a static configuration, and where the organization
>   always takes down the master for other reasons when there are changes.
>   This strategy avoids scanning, and is thus a speed improvement for
>   configurations with a large set of files.
>
> * TIMEOUT - cache all environments with a 'time to live' (TTL). When a
>   request is made for an environment where the TTL has expired it
>   starts that environment with a clean slate.
>   This is a compromise - it will pick up all changes (even additions),
>   but it will take one "TTL" before they are picked up (say 5 minutes;
>   configurable).
>
> These three schemes are believed to cover the different usage scenarios.
> They all have the benefit that they do not require watching any files
> (thereby drastically reducing the number of stat calls).
>
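For discussion, the three strategies can be sketched as a single cache with a selectable policy. This is a minimal Python illustration, not the proposed implementation; the names Strategy, EnvironmentCache, load, ttl and clock are all assumptions made for the example.

```python
import time
from enum import Enum


class Strategy(Enum):
    NONE = "none"        # never cache
    REBOOT = "reboot"    # cache until the process restarts
    TIMEOUT = "timeout"  # cache until the TTL expires


class EnvironmentCache:
    """Hypothetical cache keyed by environment name."""

    def __init__(self, load, strategy, ttl=300.0, clock=time.monotonic):
        self._load = load          # callable: name -> loaded environment
        self._strategy = strategy
        self._ttl = ttl            # seconds; only used by TIMEOUT
        self._clock = clock
        self._entries = {}         # name -> (environment, time loaded)

    def get(self, name):
        if self._strategy is Strategy.NONE:
            # Clean slate on every request.
            return self._load(name)
        entry = self._entries.get(name)
        if entry is not None:
            env, loaded_at = entry
            if self._strategy is Strategy.REBOOT:
                return env
            if self._clock() - loaded_at < self._ttl:
                return env
        # First request, or TTL expired: start from a clean slate.
        env = self._load(name)
        self._entries[name] = (env, self._clock())
        return env
```

None of the three branches touches the file system to decide whether the cache is valid, which is the point made above about avoiding stat calls.
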
> Strategy that is probably not needed:
>
> * ENVDIRCHANGE - watches the directory that represents
>   the environment. Reloads if the directory itself is stale (using
>   filetimeout setting to cap the number of times it checks). Thus, it
>   will react to changes to the environment root only (which typically
>   does not happen when changing content in the environment, but is
>   triggered if the environment's configuration file is added or removed).
>   To pick up any other changes, the user would need to touch the
>   directory.
>
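A rough sketch of what ENVDIRCHANGE would check, again in Python and purely illustrative. The class name EnvDirWatcher is hypothetical, and the 15 second default for filetimeout is an arbitrary value chosen for the example.

```python
import os
import time


class EnvDirWatcher:
    """Hypothetical watcher for a single environment root directory."""

    def __init__(self, env_dir, filetimeout=15.0, clock=time.monotonic):
        self._dir = env_dir
        self._filetimeout = filetimeout   # minimum seconds between stat calls
        self._clock = clock
        self._mtime = os.stat(env_dir).st_mtime_ns
        self._last_check = clock()

    def changed(self):
        now = self._clock()
        if now - self._last_check < self._filetimeout:
            # Checked recently; skip the stat call.
            return False
        self._last_check = now
        current = os.stat(self._dir).st_mtime_ns
        if current != self._mtime:
            self._mtime = current
            return True
        return False
```

Only the directory entry itself is examined, so edits to files deeper in the environment go unnoticed until someone touches the root directory.
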
> Strategies we think are not needed:
>
> * SCAN - like today where every file is watched.
> * CONFCHANGE - watch/scan all configuration files.
>
> Feedback ?
> ---
> Here are a couple of questions to start with...
>
> * What do you think of the proposed strategies?
> * If you like the scanning strategy, which use cases do you see it
> benefiting that the proposed strategies do not handle?
> * Any other ideas?
> * Any use cases you feel strongly about? Scenarios we need to consider...
>
> Regards
> - henrik
>
> --
> You received this message because you are subscribed to the Google Groups
> "Puppet Developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit https://groups.google.com/d/
> msgid/puppet-dev/lj42jj%24b59%241%40ger.gmane.org.
> For more options, visit https://groups.google.com/d/optout.
>



-- 
Andrew Parker
[email protected]
Freenode: zaphod42
Twitter: @aparker42
Software Developer

