On 16/06/12 20:42, Patrick Carlisle wrote:
> 
> 
> On Sat, Jun 16, 2012 at 9:11 AM, Brice Figureau
> <[email protected] <mailto:[email protected]>>
> wrote:
> 
>     On 13/06/12 23:46, Daniel Pittman wrote:
>     > On Wed, Jun 13, 2012 at 1:32 PM, Peter Meier
>     <[email protected] <mailto:[email protected]>> wrote:
>     >>
>     >>> Actually, this is a group where y'all probably *can* deliver on
>     >>> this request:
>     >>>
>     >>> If you are experiencing poor performance, it would be awesome if
>     >>> you could generate some profiling data for us.
>     >>>
>     >>> perftools.rb is the nicest profiler for Ruby I have found, and if
>     >>> y'all could install it, grab a sample of the system while it is
>     >>> being slow, and give us the data that would be awesome.
>     >>
>     >> I can do that. What would be helpful is to get a step by step
>     >> description on how to do that. Otherwise, I need more time to figure
>     >> that out and I will probably send you the right thing in the
>     first place.
>     >
>     > It is pretty simple: `gem install perftools.rb` - once that completes
>     > successfully, generate the profile:
>     >
>     > env CPUPROFILE=/tmp/puppet-agent.profile \
>     >     CPUPROFILE_REALTIME=1 \
>     >     CPUPROFILE_FREQUENCY=500 \
>     >     RUBYOPT="-r`gem which perftools | tail -1`"  =
>     >     /usr/bin/puppet ...
>     >
>     > Then you can generate a report - which needs graphviz and ps2pdf (or
>     > ghostscript) with:
>     >
>     > pprof.rb --pdf /tmp/puppet-agent.profile > puppet-agent.pdf
>     > pprof.rb --callgrind /tmp/puppet-agent.profile > puppet-agent.grind
>     >
>     > The will show the details of where wall-time was spent.  The PDF is
>     > immediately useful.  The callgrind report is useful for investigation
>     > with kcachegrind or qcachegrind, which we have here.
>     >
>     > The 'CPUPROFILE' variable sets where the output goes; that is the
>     > input for pprof.rb reporting.
> 
>     I spent a couple of hours profiling a test puppet master v2.7.16 running
>     on my mac with one of my largest host. This is on ruby 1.8.7, without
>     storeconfigs and running in cputime, not walltime. I profiled only the
>     call to P::U::Compiler#compile.
> 
>     The profiler pointed at:
>     Puppet::Util::Autoload#searchpath
>     as the most consuming single method (nearly 15% of the whole compilation
>     cputime).
>     The profiler also pointed at the require call in the
>     P::U::Autoload#module_directories.
> 
>     In this test env, compilation time is:
>     5.85s at first run (which includes parsing)
>     2.85s for subsequent runs
> 
>     When I improperly fixed the above two callsites, I was able to reduce
>     those compilation times to:
>     4.77s first run,
>     2.30s subsequent run
>     So that's about 20% improvements.
> 
>     And then I tried to understand why we were called that many time the
>     searchpath method.
>     And found out that searchpath is most of the time called from
>     P::U::Autoload#load.
> 
>     I then added some "puts" to see what kind of file we were trying to
>     autoload. That's when I noticed that we were trying to load a lot of
>     time in a row some of my manifest define as if they were ruby files.
> 
>     So I added more puts to find how that was possible, and found that the
>     sole caller was:
>     Puppet::MetaType::Manager#type
> 
>     This method is in turn called mostly by:
>     P::U::Resource#resource_type
>     which in turn is called by several part of P::U::Resource.
> 
>     Then comes the first problem, P::U::Resource#resource_type calls the
>     P::Metatype::Manager#type like this:
>        Puppet::Type.type(type.to_s.downcase.to_sym)
>     And P::Metatype::Manager#type already does this:
>        name = name.to_s.downcase.to_sym
> 
>     I assume we could really make sure to do this only once...
> 
>     So now lets have a better look to P::Metatype::Manager. This entity is
>     responsible of loading new types when needed. In our case (ie called to
>     check definitions), no ruby files can be loaded (of course), but since
>     there is no negative-caching, each time we evaluate a definition, we try
>     to load a non-existent ruby file, and at the same time we do a call to
>     searchpath with does a lot of file related things.
> 
>     By adding negative caching in P::Metatype::Manager (and removing my ugly
>     searchpath memoization), I was able to cut down my compile time of about
>     23%.
>     Here's the patch:
>     diff --git a/lib/puppet/metatype/manager.rb
>     b/lib/puppet/metatype/manager.rb
>     index 597a89f..3d7ccc1 100644
>     --- a/lib/puppet/metatype/manager.rb
>     +++ b/lib/puppet/metatype/manager.rb
>     @@ -107,9 +107,14 @@ module Manager
>       # Return a Type instance by name.
>       def type(name)
>         @types ||= {}
>     +    @notfound ||= {}
> 
>         name = name.to_s.downcase.to_sym
> 
>     +    if @notfound.include?(name)
>     +      return
>     +    end
>     +
>         if t = @types[name]
>           return t
>         else
>     @@ -117,6 +122,10 @@ module Manager
>             Puppet.warning "Loaded puppet/type/#{name} but no class was
>     created" unless @types.include? name
>           end
> 
>     +      unless @types[name]
>     +        @notfound[name] = true
>     +      end
>     +
>           return @types[name]
>         end
>       end
> 
> 
>     I also got even better (around 32% improvement in the catalog
>     compilation time), by memoizing the P::Resource#resource_type result.
>     I'm not sure there aren't any side-effects, though. I need to do more
>     research...
> 
>     Anyway, that looks like a good starting point for the next performance
>     sprint :)
> 
>     Oh, also I have the various perftools trace available if needed.
> 
> 
> This is great. My initial profiling had pointed to the same place but
> you've gotten a lot further. I'll take a look at testing this patch out
> on Monday.

There's also several other paths that I think might be worth following:

* there are several resource.type.to_s.downcase kind of call in the
compiler code, which could well be memoized at the P::Resource level.
Maybe it is time to refactor this resource type symbol in its own class?

* if you profile with walltime (and not CPU time as I did earlier), the
largest hotspot becomes the GC runs (25% of the compilation time). It
might be a deficiency of the MRI 1.8.7 GC, but it's also possible that
we're creating a lot of transient objects (without noticing). I'll
conduct more allocation profiling if that's ever possible.

Thanks,
-- 
Brice Figureau
My Blog: http://www.masterzen.fr/

-- 
You received this message because you are subscribed to the Google Groups 
"Puppet Developers" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/puppet-dev?hl=en.

Reply via email to