[Puppet - Refactor #4499] extlookup parses files multiple times

tickets Thu, 19 Aug 2010 00:53:11 -0700

Issue #4499 has been updated by Alan Barrett.

Markus Roberts wrote:
> There are tricky design issues here (thread and environment safety, for one 
> thing).

Even if the cache were private to the thread that compiles the catalog, it 
would be a huge win over not having caching at all.
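
A thread-private cache of this kind could be sketched as follows. This is only an illustration under stated assumptions: the accessor name `compile_thread_cache` and the key `:extlookup_cache` are hypothetical, and `thread_variable_get`/`thread_variable_set` require Ruby 2.0 or later, which is newer than the Ruby that Puppet 0.25.x typically ran on.

```ruby
# Hypothetical accessor, not part of Puppet: returns a cache hash that is
# private to the calling thread, so no locking is needed around it.
# thread_variable_get/set are truly per-thread; Thread.current[:key]
# would be per-fiber instead.
def compile_thread_cache
  cache = Thread.current.thread_variable_get(:extlookup_cache)
  if cache.nil?
    cache = {}
    Thread.current.thread_variable_set(:extlookup_cache, cache)
  end
  cache
end
```

Each compiling thread would see only its own entries, which sidesteps the thread-safety concern at the cost of parsing each file once per thread.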

> If, for example, the data is read in once per catalog and stored in a puppet 
> variable, caching wouldn't help (it would actually hurt slightly).

Could you explain the reasoning behind the idea that caching would hurt?

> Do we have example use cases where the performance impact is being felt, that 
> we might use to think about possible solutions?

Let me give my case in more detail.  I have more than 200 calls to extlookup(), 
most of which are used to fetch version numbers for packages, which are used in 
constructs like this:
<pre>
package { "FOO": ensure => extlookup("FOO_version"), [more options] }
</pre>
There are more than 200 such variables in the CSV files used by extlookup().  
Every time puppetmasterd compiles the catalog, it parses the CSV files more 
than 200 times.

According to the output from "ruby -r profile", more than 50% of the run time 
is spent in parsing CSV files over and over.   A typical "puppetmasterd 
--compile" run takes about 90 seconds, and halving this through improved 
caching would make a noticeable difference.
----------------------------------------
Refactor #4499: extlookup parses files multiple times
http://projects.puppetlabs.com/issues/4499

Author: Alan Barrett
Status: Needs design decision
Priority: Normal
Assigned to: James Turnbull
Category: functions
Target version: queued
Affected version: 0.25.5
Branch: 

According to "ruby -r profile", more than 50% of the runtime for "puppetmasterd 
--compile someclient" is in CSV#parse_body.  In a typical example, extlookup is 
called 204 times, CSV::Reader#each is called 225 times (only a little more than 
the number of calls to extlookup, because almost all variables are found in the 
first file), CSV#parse_row is called 42486 times (which is not much less than 
the product of {number of lines in the file} * {number of calls to extlookup}), 
and CSV#parse_body is called 84747 times (which is approximately double the 
number of calls to CSV#parse_row).
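
The ratios behind those figures can be checked with a few lines of Ruby. The variable names are only for illustration; the numbers are the profile counts quoted above.

```ruby
# Call counts taken from the profile output quoted in this ticket.
extlookup_calls  = 204
parse_row_calls  = 42_486
parse_body_calls = 84_747

# Average number of CSV rows parsed for each extlookup() call.
rows_per_lookup = parse_row_calls.to_f / extlookup_calls

# How many parse_body calls happen per parse_row call.
body_per_row = parse_body_calls.to_f / parse_row_calls

puts format("rows parsed per extlookup call: %.1f", rows_per_lookup)
puts format("parse_body calls per parse_row call: %.2f", body_per_row)
```

This gives roughly 208 rows parsed per lookup and just under two parse_body calls per row, so nearly all of the parsing work is repeated on every call.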

The code in extlookup.rb appears to read and parse the data files on every 
call.  It would be better if it kept a cache of {file name, variable name, raw 
value} tuples, invalidated the cache when the file timestamp changed, and 
otherwise avoided unnecessary re-reading of unchanged files.
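
A minimal sketch of such a cache, assuming a modern Ruby with the standard csv library. The class name `CsvLookupCache` and its methods are hypothetical and are not taken from extlookup.rb. The first column of each row is treated as the variable name and the remaining columns as its raw values.

```ruby
require 'csv'

# Hypothetical helper, not part of Puppet: caches parsed CSV data per file
# and re-parses a file only when its modification time has changed.
class CsvLookupCache
  Entry = Struct.new(:mtime, :table)

  # Number of times any file was actually parsed (useful for checking hits).
  attr_reader :parse_count

  def initialize
    @entries = {}
    @parse_count = 0
  end

  # Returns the array of raw values stored for +key+ in the CSV file at
  # +path+, or nil if the file or the key does not exist.
  def lookup(path, key)
    return nil unless File.exist?(path)

    mtime = File.mtime(path)
    entry = @entries[path]
    if entry.nil? || entry.mtime != mtime
      # First use of this file, or the file changed on disk: parse it again.
      entry = Entry.new(mtime, parse(path))
      @entries[path] = entry
    end
    entry.table[key]
  end

  private

  # Parses the whole file once into a hash of variable name => raw values.
  def parse(path)
    @parse_count += 1
    table = {}
    CSV.foreach(path) do |row|
      next if row.empty? || row[0].nil?
      # Keep the first occurrence of a name, as a top-to-bottom scan would.
      table[row[0]] ||= row[1..-1]
    end
    table
  end
end
```

With this arrangement, 200 lookups against an unchanged file cost one parse plus 200 stat calls and hash lookups. Comparing only the timestamp can miss a change made within the filesystem's timestamp resolution, so a real implementation might also compare the file size.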

The decision about which files to search should probably still be done on every 
call, in case extlookup_datadir or extlookup_precedence has changed.  Expansion 
of "#{variable}" embedded in the raw value should also be done on every call.
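
The split between per-call work and cacheable work could look roughly like this. It is a sketch under stated assumptions: the function names are hypothetical, the placeholder pattern follows the "#{variable}" notation used in this ticket and may need adjusting to whatever syntax the installed extlookup accepts, and `raw_lookup` stands for any callable that returns a raw value for a file and variable name (a cache, for instance).

```ruby
# Placeholder pattern following this ticket's "#{variable}" notation.
# This is an assumption; adjust it to the syntax your extlookup accepts.
PLACEHOLDER = /\#\{(\w+)\}/

# Substitutes placeholders in +raw+ from the +vars+ hash.
# Names that are not in +vars+ expand to the empty string.
def expand(raw, vars)
  raw.gsub(PLACEHOLDER) { vars.fetch($1, "") }
end

# Builds the list of files to search. Done on every call, so a change to
# the data directory or the precedence list takes effect immediately.
def candidate_files(datadir, precedence, vars)
  precedence.map { |name| File.join(datadir, expand(name, vars) + ".csv") }
end

# Looks up +key+ in each candidate file in order. +raw_lookup+ is any
# callable taking (path, key) and returning a raw string or nil.
# Only the raw value is meant to be cached; expansion happens per call.
def resolve(key, datadir, precedence, vars, raw_lookup)
  candidate_files(datadir, precedence, vars).each do |path|
    raw = raw_lookup.call(path, key)
    return expand(raw, vars) unless raw.nil?
  end
  nil
end
```

Because only the raw values are cached, two clients with different variable values can share the same cached file contents and still get different expanded results.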

-- 
You have received this notification because you have either subscribed to it, 
or are involved in it.
To change your notification preferences, please click here: 
http://projects.puppetlabs.com/my/account

-- 
You received this message because you are subscribed to the Google Groups 
"Puppet Bugs" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/puppet-bugs?hl=en.
