[Puppet-dev] Re: Environment Caching - RFC

Henrik Lindberg Mon, 21 Apr 2014 16:18:07 -0700

On 2014-21-04 23:37, Andy Parker wrote:

On Mon, Apr 21, 2014 at 2:29 PM, Henrik Lindberg <[email protected]> wrote:

Hi,
We have been looking into environment caching and have some thoughts
and ideas about how this can be done. Love to get your input on
those ideas, and your thoughts about their usefulness.

There is a google document that has the long story - it is open for
commenting. It is not required reading as the essence is outlined here.
The doc is here:

https://docs.google.com/a/puppetlabs.com/document/d/1G-4Z6vi6Tv5xZtzVh7aT2zNWbOxJ3BGfJu31pAHxS7g/edit?disco=AAAAAGtMYOI#heading=h.rpgaxghcfqol

The current state of caching environments
---
A legacy environment caches the result of parsing manifests and
loading functions / types, and reacts to changed files. It does this
by recording the mtime of each file as it is parsed / read. Later,
if the same file would be parsed again, it uses the already
cached result. If the file is stale, the entire cache is
cleared and it starts from scratch.

It does not, however, react to added files. It also does not recognize
changes in files evaluated as a consequence of evaluating ruby logic
(i.e. if a function, type, etc. required something, that is not
recorded).

It will react to added files, but only after the filetimeout has expired
on another file that will cause it to pick up the new file. It all gets
very complicated.
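
To make the behaviour described above concrete, here is a minimal sketch in Python (not Puppet's Ruby, and not the actual implementation). The class name `MtimeCache` and the `parse` callback are hypothetical; the sketch only assumes what the text states: the mtime is recorded when a file is parsed, an unchanged file reuses its cached result, and one stale file clears the entire cache. It models the mtime bookkeeping only and leaves filetimeout out.

```python
import os


class MtimeCache:
    """Hypothetical stand-in for the legacy per-file cache (not Puppet's code)."""

    def __init__(self, parse):
        self.parse = parse      # function: path -> parsed result
        self.entries = {}       # path -> (mtime recorded at parse time, result)

    def load(self, path):
        mtime = os.stat(path).st_mtime_ns
        cached = self.entries.get(path)
        if cached is not None:
            if cached[0] == mtime:
                return cached[1]        # unchanged: reuse the cached result
            self.entries.clear()        # stale: drop the entire cache
        result = self.parse(path)
        self.entries[path] = (mtime, result)
        return result
```

Note that the sketch only ever looks at files it is asked to load, so a newly added file goes unnoticed until something else requests it, and every cache hit still costs one stat call.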

The new directory based environments do not support caching. (And
now we want to address this.)

The problem with caching
---
The problem with caching is that it can be quite costly to compute
and we found that different scenarios benefit from different
caching strategies.

In an environment where the number of modules/manifests actually used
per individual node is low relative to the number present in the
environment, checking the cache can be slower than starting with a
clean slate every time.

In all of the following strategies, does this also involve removing the
known_resource_types WatchedFile caching system?

Yes, that is the idea. NONE and REBOOT are simple. TIMEOUT operates
on the request for the environment only (it does not need to look at
any files).

ENVDIRCHANGE would only need to look at a single file (the env
directory).

As for the strategies that we think are not needed: SCAN obviously
needs to watch files, and we do not want that. CONFCHANGE would need to
watch configuration files (unclear exactly which). That is also somewhat
fuzzy, and also something it seems we do not need to support if we have
the others.


    Proposed Strategies
    ---
    We think there is a core set of strategies that a user should be
    able to select. These should cover the typical usage scenarios.

    * NONE - no caching, each catalog production starts with a clean slate.
       This is the current state of directory based environments, and it
       could also be made to apply to legacy environments. This is good in
       a very dynamic environment / development, or with a low
       "signal/noise" ratio.

    * REBOOT - (the opposite of NONE) - cache everything, never check for
       changes. A reboot of the master is required for it to react to
       changes.
       This is good for a static configuration, and where the organization
       always takes down the master for other reasons when there are
       changes.
       This strategy avoids scanning, and is thus a speed improvement for
       configurations with a large set of files.

    * TIMEOUT - cache all environments with a 'time to live' (TTL). When a
       request is made for an environment where the TTL has expired it
       starts that environment with a clean slate.
       This is a compromise - it will pick up all changes (even additions),
       but it will take one "TTL" before they are picked up (say 5 minutes;
       configurable).

    These three schemes are believed to cover the different usage
    scenarios. They all have the benefit that they do not require
    watching any files (thereby drastically reducing the number of stat
    calls).
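
As an illustration of how the three proposed strategies differ, here is a minimal sketch in Python (not Puppet's Ruby, and not a proposed implementation). `EnvironmentCache`, `loader`, and the injectable `clock` are hypothetical names introduced only for this example; the 300-second default mirrors the "say 5 minutes" TTL mentioned above.

```python
import time


class EnvironmentCache:
    """Hypothetical cache keyed by environment name; not Puppet's actual API."""

    def __init__(self, loader, strategy="TIMEOUT", ttl=300.0, clock=time.monotonic):
        if strategy not in ("NONE", "REBOOT", "TIMEOUT"):
            raise ValueError("unknown strategy: " + strategy)
        self.loader = loader        # function: env name -> freshly loaded environment
        self.strategy = strategy
        self.ttl = ttl              # seconds; only used by TIMEOUT
        self.clock = clock
        self.entries = {}           # env name -> (time loaded, environment)

    def get(self, name):
        if self.strategy == "NONE":
            return self.loader(name)            # clean slate on every request
        cached = self.entries.get(name)
        if cached is not None:
            loaded_at, env = cached
            if self.strategy == "REBOOT":
                return env                      # never expires within this process
            if self.clock() - loaded_at < self.ttl:
                return env                      # TIMEOUT: still within its TTL
        env = self.loader(name)                 # first request, or TTL expired
        self.entries[name] = (self.clock(), env)
        return env
```

None of the three branches touches the file system to decide whether the cache is valid, which is the point made above about avoiding stat calls.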

    Strategy that is probably not needed:

    * ENVDIRCHANGE - watches the directory that represents
       the environment. Reloads if the directory itself is stale (using
       filetimeout setting to cap the number of times it checks). Thus, it
       will react to changes to the environment root only (which typically
       does not happen when changing content in the environment, but is
       triggered if the environment's configuration file is added or
       removed).
       To pick up any other changes, the user would need to touch the
       directory.
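
A rough sketch of the ENVDIRCHANGE check, again in Python and purely illustrative: `EnvDirWatcher` is a hypothetical name, and the 15-second `filetimeout` default is just a placeholder value for the example. It stats only the environment directory itself, and at most once per `filetimeout` interval.

```python
import os
import time


class EnvDirWatcher:
    """Hypothetical helper: reports whether an environment directory changed."""

    def __init__(self, path, filetimeout=15.0, clock=time.monotonic):
        self.path = path
        self.filetimeout = filetimeout   # minimum seconds between stat calls
        self.clock = clock
        self.last_checked = clock()
        self.mtime = os.stat(path).st_mtime_ns

    def stale(self):
        now = self.clock()
        if now - self.last_checked < self.filetimeout:
            return False                 # checked recently: skip the stat call
        self.last_checked = now
        current = os.stat(self.path).st_mtime_ns
        if current == self.mtime:
            return False
        self.mtime = current             # remember the new state; caller reloads
        return True
```

Because a directory's mtime only changes when entries directly inside it are added, removed, or renamed, edits deeper in the environment are invisible to this check, which is why the user would have to touch the directory to force a reload.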

    Strategies we think are not needed:

    * SCAN - like today where every file is watched.
    * CONFCHANGE - watch/scan all configuration files.

    Feedback ?
    ---
    Here are a couple of questions to start with...

    * What do you think of the proposed strategies?
    * If you like the scanning strategy, what use cases do you see it
       would benefit that the proposed strategies do not handle?
    * Any other ideas?
    * Any use cases you feel strongly about? Scenarios we need to
       consider...

    Regards
    - henrik


