Hi,
I just created #5464 [0]. One reason is that I still have the old thread
about optimizing file resource calls in mind [1].
Another issue, which might be related to this feature request and might
be addressable through it, is something I ran into recently while
evaluating an existing puppet installation for performance problems. In
short: the conclusion was that puppet compile (and run) time could be
improved heavily by rearranging the manifests, simply to change what
ends up in the compiled catalog.
In a nutshell the situation was similar to the following code snippet:
  define our_user($uid) {
    user { $name:
      uid     => $uid,
      comment => "User managed by puppet!",
    }
  }
And then they managed about 2000 our_user resources:
  our_user {
    'user1':
      uid => '10001';
    'user2':
      [...]
    'user2000':
      uid => '12001';
  }
Actually their define structure was a bit more complex: it also included
one class per user and a lot more parameters, nearly all of which were
left at their defaults. But in the end the outcome is the same and this
example should give you the picture.
This kind of setup is quite common, and at least for me defines are the
usual way to abstract things away. I have a lot of modules which
provide an abstract interface to a stack of defines with multiple
levels, where each one might tune a resource or parameter according to
its task.
So what is the "problem" with that?
After compiling, you end up with this nested structure in the catalog
as well. This means that for 2000 our_user resources you get 2000
our_user define instances and 2000 user resources, with all their
default values etc. However, the instances of the "our_user" define are
actually more or less useless (except for reporting, more about that
later) as they do not manage a "real" resource themselves and simply
pass parameters down to the user resource.
While benchmarking master and client runs we figured out that both
spend a lot of time (de-)serializing the catalog (uhh, remember
#2892 [2]). The catalog was also "quite huge" (~6MB of YAML), and we had
already optimized pretty much everything else (latest 2.6, passenger,
serving files directly, etc.).
As I was looking at the catalog YAML it became clear to me that the
our_user define within the catalog was completely useless for the client
(it only passes variables) and only makes the manifests more readable
for the sysadmins/coders. So we started getting rid of all the defines
which were similar to my our_user example. As we had like 3 to 4 levels
of defines with a lot of parameters, we were able to reduce the size of
the catalog by a factor of 3 (only 2MB instead of 6MB) and, more
importantly, to reduce compile (and run) time by a factor of 4. For
example, before the optimization the master needed about 16s per node
to compile; afterwards it was less than 4s.
With up to 1k nodes that each need 12s less on the master, we can
already save a few cores and some memory. (And use them for example for
the dashboard :p )
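To illustrate what "getting rid of the defines" could look like for the
our_user example above, here is a minimal sketch. It is not the actual
manifest from that installation; it assumes the fixed comment is the
only thing the define adds, and carries it in a resource default
instead:

```puppet
# Sketch only: flattened version of the our_user example.
# Assumption: the define adds nothing but the fixed comment, so a
# resource default on the user type can carry it instead.
User {
  comment => "User managed by puppet!",
}

# The user resources are declared directly, so no our_user instances
# end up in the compiled catalog.
user {
  'user1':
    uid => '10001';
  'user2000':
    uid => '12001';
}
```

Note that a resource default applies to every user resource in that
scope, which is broader than what the define did, so this only fits if
all users there should get the same comment.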
So I'm questioning whether it is really necessary to send the plain
catalog down to the client the way we do now. For sure it contains a lot
of useful information for reporting etc. But I still think not
everything is necessary and most things could be passed differently,
especially if you're not interested in such detailed reports.
For example, can't we flatten the catalog? Meaning: send only "real"
resources such as user, file, exec down to the client. Why do we need to
send down defines? All the relations for the graph could be mapped down
to the "real" resources contained within the defines, couldn't they?
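A rough sketch of what I mean by mapping relations down, using the
our_user example (the file resource and its path are made up for
illustration, and the second form is only what I imagine a flattened
catalog could express, not something the master does today):

```puppet
# Hypothetical resource that depends on one of the our_user instances.
# In the manifest the relationship points at the define:
file { '/home/user1/.profile':
  ensure  => present,
  require => Our_user['user1'],
}

# In a flattened catalog the same edge could point straight at the
# "real" resource that the define contains:
#
#   file { '/home/user1/.profile':
#     ensure  => present,
#     require => User['user1'],
#   }
```

A define that contains several resources would need one such edge per
contained resource, so the number of relationships can grow even while
the number of catalog entries shrinks.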
The calculation I do is simple: less "noise" in the compiled catalog
means less overhead during serialization and fewer things for the client
to parse. Take for example a define such as [3], which is a nice
interface to a complex hierarchy of defines, where each of these defines
does its little job. If you actually look at it, all the defines could
be put together into one huge define, which would make the whole thing
less readable (if not unreadable). However, the size of the catalog
could probably be reduced to a fifth (or even less) of its current size,
simply because in the end only a couple of resources are managed, while
their state is tweaked by each define according to its task.
For sure this is not really a problem if we are talking about a bunch of
resources on a bunch of nodes. But once we talk about > 10k resources
(hence wrapped in > 50k defines) on a couple of hundred nodes, this is
going to matter.
As I said before, I'm an advocate of abstracting things out into
defines, as this makes manifests more readable and manageable, just as
you know it from software engineering. However, having discovered these
possible performance improvements, which could be done by software, I
think we should have a possibility to address these issues.
One solution is to write less readable manifests (doh!); the other
would be for the master to take care of such optimizations of the
catalog. Hence the feature request mentioned above.
For sure, this would cost some more cycles on the master, but it would
probably also make serialization much faster, as well as
de-serialization on the client. Which means that we end up with faster
puppet runs. (Sooooooo easy.... ;) )
And in combination with the other discussion in [1]: if we had such an
architecture to optimize catalogs before sending them down to the
client, this would open up many other opportunities for optimization.
Probably the problem and the solution are not as easy as I have stated,
and there are probably good reasons why certain things are the way they
are now. However, I was still amazed by how much we could improve the
performance of that infrastructure, and it raised a couple of questions
and ideas that I wanted to bring up and share here.
So tell me that I'm wrong....
~pete
[0] http://projects.puppetlabs.com/issues/5464
[1] http://groups.google.com/group/puppet-dev/browse_thread/thread/1c8ac2c2d6fab46#2c8d8baa39d1d717
[2] http://projects.puppetlabs.com/issues/2892
[3] http://git.puppet.immerda.ch/?p=module-webhosting.git;a=blob;f=manifests/static.pp;h=82ad8f3a956d3ace64c07c2417b0ecd39ec830ef;hb=f5ee140e01885dfc6cf6729ea0fdd4a7e726d50a