On Sunday, January 12, 2014 7:31:03 PM UTC-6, Patrick wrote:
>
>
>
> On Friday, January 10, 2014 3:57:03 PM UTC-5, jcbollinger wrote:
>>
>>
>> The key idea here seems to be to improve the master's average response 
>> time for catalog requests by pre-compiling and caching catalogs, then 
>> serving those cached catalogs on demand.  There is a secondary idea of the 
>> master automatically keeping its cache fresh, and a tertiary idea of the 
>> master hinting to agents when it thinks their most recently-retrieved 
>> catalog may be out of date.  There is a separate, but coupled idea that the 
>> agent might keep track of when significant facts change, and thereby be 
>> better informed from its side whether its catalog is out of date.
>>
> The key idea is also about not wasting resources. The clients waste a lot 
> of resources doing puppet runs that change nothing. The puppet master is 
> generally a box meant to compile catalogs. It's going to have the 
> resources, and so it should spend its time doing just that so the clients 
> don't waste their time.
> You can have a hundred clients frequently doing puppet runs because they 
> can't know if the catalog has been updated (which also uses the master's 
> resources), or you can have a few masters chewing through the catalogs and 
> notify the clients when they need to run. Which do you think is more 
> efficient? 
>  
>


Continuously recompiling catalogs on the master is not a good way to 
approach the issue of client-side resource consumption.

For one thing, the client already caches a copy of its most recently 
applied catalog.  It would be straightforward to have the client check its 
own cached catalog against the new one to see whether anything changed.  
That would be simpler to implement, and would spread out the workload more 
evenly over the whole site.
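
Schematically, the check could be as simple as comparing digests of the 
serialized catalogs.  A minimal sketch in Python (illustrative only; the 
two catalog literals are stand-ins for whatever the agent fetched and 
cached):

    import hashlib
    import json

    def catalog_digest(catalog):
        # Serialize deterministically so that logically identical
        # catalogs hash the same regardless of key ordering.
        blob = json.dumps(catalog, sort_keys=True).encode("utf-8")
        return hashlib.sha256(blob).hexdigest()

    # Hypothetical stand-ins for the freshly fetched catalog and the
    # agent's locally cached copy.
    new_catalog = {"resources": [{"type": "File", "title": "/etc/motd"}]}
    cached_catalog = {"resources": [{"type": "File", "title": "/etc/motd"}]}

    if catalog_digest(new_catalog) == catalog_digest(cached_catalog):
        # Nothing in the catalog changed -- though, as discussed below,
        # the agent still has to apply it to correct drift.
        print("catalog unchanged")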

More importantly, however, a change in catalog is by no means the only 
thing that requires the agent to act.  One of its key responsibilities is 
to maintain the node's target state in the face of unwanted changes *applied 
by other processes*.  Checking whether it needs to do that for any declared 
resource is an essential component of each catalog run -- indeed, in the 
common case where nothing actually needs to be changed, it is the only 
component.  In other words, that checking usually consumes the bulk of the 
agent's runtime.  There are only two ways to reduce that, both of which are 
largely orthogonal to catalog changes:

   1. Improve nodes' catalogs to be less burdensome, and
   2. Increase the agent's run interval (see the snippet below).
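
For reference, (2) is a one-line setting in the agent's puppet.conf -- for 
example, to run hourly instead of at the 30-minute default:

    [agent]
    # Seconds between scheduled catalog runs (the default is 30 minutes).
    runinterval = 3600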

Note in particular that (2) is not much related to the timeliness of obtaining 
changed catalogs.  If rapid response to catalog changes is essential then 
you already need some means to trigger catalog runs outside of or instead 
of scheduled periodic runs, and in that case the scheduled run interval 
isn't even relevant to catalog freshness.  On the other hand, if rapid 
response to catalog changes is not essential then that's not a big factor 
in choosing a run interval. 

Moreover, where catalog runtimes are large, there are usually a lot of 
gains available via improving catalogs (item 1).  That's usually the best 
solution to excessive agent runtimes.

 

>> Fundamentally, this is really about catalog caching.  The specific 
>> suggestion to implement the master's part of it via a continuously cycling 
>> queue of compilation jobs is mostly an implementation detail.  Although 
>> it's hard to argue with the high-level objectives, I'm not enthusiastic 
>> about the proposed particulars.
>>
>> As far as I can determine, the main point of a *continuously-cycling* queue 
>> of compile jobs is to enable the master to detect changes in the 
>> compiled results.  That's awfully brute-force.  Supposing that manifests, 
>> data, and relevant node facts change infrequently, compilation results also 
>> will rarely change, and therefore continuously recompiling catalogs would 
>> mostly be a waste of resources.  The master could approach the problem a 
>> lot more intelligently by monitoring the local artifacts that contribute to 
>> catalog compilation -- ideally on a per-catalog basis -- to inform it when 
>> to invalidate cached catalogs.  Exported resource collections present a bit 
>> of a challenge in that regard, but one that should be surmountable.
>>
> There is no way around it. You can't just rely on a change in facts or 
> exported resources as the trigger for re-compiling a catalog. Let's say you 
> have a function that generates different results based on some external 
> variable. No fact or exported resource has changed, but the catalog will be 
> different. While this may not be common, I think it would be a bad idea to 
> build an implementation that can't handle it.
>


It would be possible to keep a digest for each catalog compilation of 
everything that could possibly change the result, even function calls.  At 
some point it would cease to provide much performance advantage to do so, 
however, and before that it would probably become prohibitive to develop 
and maintain.  So I'll accept that it's not practical for the master to 
determine whether a cached catalog is stale, except by recompiling.
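
For what it's worth, the digest idea itself is easy enough to sketch 
(illustrative Python; all inputs here are hypothetical stand-ins).  The 
sticking point is the third input: you can obtain arbitrary function 
results only by evaluating the functions, at which point you have done 
much of the work of a compile anyway:

    import hashlib

    def compilation_digest(manifests, facts, function_results):
        # Fold in every input that could influence the compiled catalog.
        h = hashlib.sha256()
        for path, source in sorted(manifests.items()):
            h.update(path.encode("utf-8") + source.encode("utf-8"))
        for name, value in sorted(facts.items()):
            h.update(name.encode("utf-8") + str(value).encode("utf-8"))
        # These results are knowable only by running the functions,
        # which is most of a compile in itself.
        for result in function_results:
            h.update(str(result).encode("utf-8"))
        return h.hexdigest()

    digest = compilation_digest(
        {"site.pp": "include ntp"},
        {"osfamily": "RedHat"},
        ["result-of-some-external-lookup"],
    )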

But how, then, is it any more reasonable to serve a cached catalog that was 
built via a continuously-cycling catalog compiler?  The master still cannot 
guarantee that it is up to date.  We have just established that 
demand-built catalogs are the only ones we can be confident are up to date 
as of the time of the request.

 

> However, it would be trivial to change it so that instead of constantly 
> compiling catalogs, it re-compiles any catalog that is more than X minutes 
> old. This would be equivalent to the puppet agent cron job on a client.
>  
>


Roughly.  It would also compile catalogs for nodes that have been taken out 
of service, nodes with an intentionally longer run interval (or that run 
the agent itself only on demand), and probably for other nodes I haven't 
thought of that otherwise would not have catalogs compiled for them.  And 
the master *still* could not serve those catalogs with confidence that they 
were up to date.

 

>
>> Furthermore, if we accept that master-side catalog caching will be 
>> successful in the first place, then it is distinctly unclear that compiling 
>> catalogs prospectively would provide a net advantage over compiling them on 
>> demand.  Prospective compilation's only real advantage over on-demand 
>> compilation is better load leveling in the face of clumped catalog requests 
>> (but only where multiple requests in the clump cannot be served from 
>> cache).  On the other hand, prospective compilation will at least 
>> occasionally yield slower catalog service as a catalog request has to wait 
>> on other catalog compilations.  
>>
> I completely lost you. Why would it be slower? Pre-compiling catalogs 
> means the client doesn't have to wait at all. Instead of having to wait 30 
> seconds for a catalog to be compiled, it'll take less than 1 second, 
> whatever the response time is to fetch it from the cache and transfer it 
> over the network.
>


The client has to wait if its facts have changed, unless it's willing to 
accept a known-stale catalog.  It also has to wait (or should) if the 
master knows its cached catalog is stale, but the recompilation job hasn't 
yet reached the front of the queue. Even if the request gets pushed to the 
front of the queue in such cases (which doesn't necessarily work well when 
the master is hit by a bunch of requests at nearly the same time), the 
continuously-running compiler almost certainly is initially busy compiling 
some other node's catalog.  The client therefore has to wait longer than it 
otherwise would.

And of course, that all assumes that the master actually can tell whether a 
node's cached catalog is stale, which I've agreed is impractical.
 

>  
>
>> Also, prospective compilation will sometimes devote resources to 
>> compiling catalogs that ultimately go unused.
>>
> That's the goal. If resources are unused, then they're being wasted.
>  
>


If resources go to building something that goes unused then they are 
certainly wasted.  Some resources, however, are conserved for future use if 
they go unused now, or can be used for other purposes if not allocated to 
catalog compilation, or carry a monetary, environmental, or other cost to 
use that would be better avoided where possible.


John
