[Puppet-dev] Community PR Triage for 2015-06-30

2015-06-30 Thread Josh Cooper
The PR triage for puppet/facter/hiera/puppet-server will be starting at 10:00 
AM PDT today at http://links.puppetlabs.com/pr-triage

Josh

-- 
Josh Cooper
Developer, Puppet Labs

PuppetConf 2015 (http://2015.puppetconf.com/) is coming to Portland, 
Oregon! Join us October 5-9.
Register now to take advantage of the Early Adopter discount and save $349:
https://www.eventbrite.com/e/puppetconf-2015-october-5-9-tickets-13115894995?discount=EarlyAdopter



[Puppet-dev] Re: Catalog Deserialization performance

2015-06-30 Thread Romain F.
I've already benchmarked and profiled Catalog's from_data_hash and 
to_data_hash methods using the benchmark framework.
Most of the time is spent in from_data_hash (we already knew that), but 
there are no big pitfalls where Ruby loses its time.

My callgrind file shows that the top 5 methods (by self time) are:
- Array.flatten (55000 calls)
- Array.each (115089 calls)
- Puppet::Resource.initialize (15000 calls)
- String.=~ (65045 calls)
- Hash[]= (115084 calls)

These top 5 account for ~30% of the total time.

As you can see, this can be difficult to optimize. IMHO, the 
benchmark-tweak-benchmark way of optimizing is not sufficient here. I think 
the way a catalog is (de)serialized needs a deep refactor.
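For anyone who wants to reproduce this kind of profile, a rough sketch with 
the ruby-prof gem looks like the following (the catalog.json path, the way 
the catalog was dumped, and the exact printer options are assumptions on my 
side):

  require 'json'
  require 'ruby-prof'
  require 'puppet'

  # A catalog previously dumped to disk as its plain data hash
  # (the path and the dump mechanism are placeholders).
  data = JSON.parse(File.read('catalog.json'))

  # Profile only the Hash -> Catalog step, which is where the time goes.
  result = RubyProf.profile do
    Puppet::Resource::Catalog.from_data_hash(data)
  end

  # Emit callgrind-compatible output for kcachegrind/qcachegrind;
  # the print signature varies a bit between ruby-prof versions.
  RubyProf::CallTreePrinter.new(result).print(path: Dir.pwd, profile: 'catalog_from_data_hash')

That should give a self-time breakdown comparable to the one above.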

Cheers,

On Tuesday, June 30, 2015 at 04:23:42 UTC+2, henrik lindberg wrote:

 On 2015-29-06 22:41, Trevor Vaughan wrote: 
  If you get a profiling suite together (aka, bunch of random patches) 
  could you release it? 
  

 It is not difficult actually. Look at the benchmarks in the puppet code 
 base. Many of them are suitable for profiling with a ruby profiler. 
 I don't think we have any benchmarks targeting the agent side though, so 
 the first thing to do (for someone) is to write one. 

 What is more difficult is coming up with a benchmark that does not 
 involve real/complex resources - but deserialization and up to actually 
 applying should be possible to work with in a simple way. 

 Profiling is then just running that benchmark with the ruby profiler 
 turned on and analyzing the result, make changes, run again... (repeat 
 until happy). 

 - henrik 


  I've been curious about this for quite some time but never quite got 
  around to dealing with it. 
  
  My concern is very much client-side performance, since the more you 
  manage on a client, the less the client gets to do its actual job. 
  
  Thanks, 
  
  Trevor 
  
  On Mon, Jun 29, 2015 at 4:35 PM, Henrik Lindberg 
  henrik@cloudsmith.com wrote: 
  
  On 2015-29-06 16:48, Romain F. wrote: 
  
  Hi everyone, 
  
  I am trying to optimize our Puppet runs by running some benchmarks and 
  patching the puppet core (if possible), but I have some difficulties 
  around catalog serialization/deserialization. 
  
  In fact, in 3.7.5 or 3.8.x, config retrieval takes roughly 7 secs and 
  only 4 secs of that is on the master side. Same thing in 4.2, but with 
  9 secs of config retrieval and still 4 secs on the master side. 
  
  My first thought was "Okay, time to try MsgPack". No improvements. 
  
  I've instrumented the code in the master branch a bit around this, and 
  I've found out that, of my 9 secs of config retrieval, 3.61 secs is lost 
  in catalog deserialization and 2 secs in the catalog conversion. But it's 
  not the real deserialization (PSON to Hash) that takes ages, it's the 
  creation of the Catalog object itself (Hash to catalog). Benchmarks 
  show that the time to deserialize MsgPack (or PSON) is negligible 
  compared to the catalog deserialization time. 
  
  So here is my question: is that a known issue? Is there any reason for 
  the regression in 4.x (future parser creating more objects, ...)? 
  
  The parser=future setting only makes a difference when compiling the 
  catalog - the catalog itself does not contain more or different data 
  (except possibly using numbers instead of strings for some 
 attributes). 
  
  The best way to optimize this is to write a benchmark using the 
  benchmark framework and measure the time it takes to deserialize a 
  given catalog. Then run that benchmark with Ruby profiling turned 
 on. 
  
  There are quite a few things going on at the agent side in addition 
  to taking the catalog PSON and turning it into a catalog that it can 
  apply (loading types, resolving providers, etc). Make sure to 
  benchmark these separately if possible. 
  
  Regards 
  - henrik 
  
  Cheers, 
  

[Puppet-dev] Re: Catalog Deserialization performance

2015-06-30 Thread Henrik Lindberg

On 2015-30-06 16:17, Romain F. wrote:

I've already benchmarked and profiled Catalog's from_data_hash and
to_data_hash methods using the benchmark framework.
Most of the time is spent in from_data_hash (we already knew that), but
there are no big pitfalls where Ruby loses its time.

My callgrind file shows that the top 5 methods (by self time) are:
- Array.flatten (55000 calls)
- Array.each (115089 calls)
- Puppet::Resource.initialize (15000 calls)
- String.=~ (65045 calls)
- Hash[]= (115084 calls)

These top 5 account for ~30% of the total time.

As you can see, this can be difficult to optimize. IMHO, the
benchmark-tweak-benchmark way of optimizing is not sufficient
here. I think the way a catalog is (de)serialized needs a deep refactor.



There is probably a lot of duplicated work going on at the levels above, 
and that is what causes those generic methods to light up (except 
Puppet::Resource.initialize).


There is both the deserialization process as such to optimize, and also 
the Resource implementation itself, which is far from optimal.


The next thing would be to focus on Resource.initialize/from_data_hash.
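To isolate that, a quick micro-benchmark along these lines could help (a 
sketch on my part; it assumes the resource entries in a dumped catalog.json 
are the hashes Puppet::Resource.from_data_hash expects, and it uses Ruby's 
stdlib Benchmark rather than the puppet benchmark framework):

  require 'json'
  require 'benchmark'
  require 'puppet'

  # Resource hashes pulled out of a previously dumped catalog
  # (the catalog.json path and the 'resources' key are assumptions).
  resources = JSON.parse(File.read('catalog.json'))['resources']

  # Time only the Hash -> Puppet::Resource step, repeated to get a stable number.
  puts Benchmark.measure {
    100.times { resources.each { |r| Puppet::Resource.from_data_hash(r) } }
  }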

I think it is also relevant to establish some kind of "world record" - 
say, serializing and deserializing a hash using MsgPack; a hash of data 
cannot be transported faster across the wire than that (unless you also 
stop using Ruby objects to represent the data, at the cost of a lot of 
extra complexity).


I mean, a hash of some complexity will always consume quite a bit of 
processing and memory to get across the wire. Is the current 
implementation close enough to that world record?
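As a rough way to measure that baseline, something like this would do (a 
sketch; it assumes the msgpack gem and, again, a catalog already dumped to 
catalog.json as plain data):

  require 'json'
  require 'benchmark'
  require 'msgpack'

  # The catalog as a plain Ruby Hash, no Puppet model objects involved.
  data = JSON.parse(File.read('catalog.json'))
  packed = data.to_msgpack

  # Pure pack/unpack round trips: the fastest the data can possibly move.
  Benchmark.bm(8) do |bm|
    bm.report('pack')   { 100.times { data.to_msgpack } }
    bm.report('unpack') { 100.times { MessagePack.unpack(packed) } }
  end

Comparing those numbers against the from_data_hash time shows how far the 
Catalog path is from that record.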


- henrik



[Puppet-dev] Re: Community PR Triage for 2015-06-30

2015-06-30 Thread Josh Cooper
On Tuesday, June 30, 2015 at 8:19:23 AM UTC-7, Josh Cooper wrote:

 The PR triage for puppet/facter/hiera/puppet-server will be starting at 10:00 
 AM PDT today at http://links.puppetlabs.com/pr-triage


Notes from today's PR triage are posted: 
https://github.com/puppet-community/community-triage/blob/master/core/notes/2015-06-30.md.

We merged several PRs (including 4 for native Facter on OpenBSD, thanks 
Jasper!) and had good discussions around filebuckets, the static compiler, and 
agent-side profiling. We'll be back next week at the usual time.

Josh

-- 
Josh Cooper
Developer, Puppet Labs

PuppetConf 2015 (http://2015.puppetconf.com/) is coming to Portland, 
Oregon! Join us October 5-9.
Register now to take advantage of the Early Adopter discount and save $349:
https://www.eventbrite.com/e/puppetconf-2015-october-5-9-tickets-13115894995?discount=EarlyAdopter
