[Puppet-dev] Community PR Triage for 2015-06-30
The PR triage for puppet/facter/hiera/puppet-server will be starting at 10:00 AM PDT today at http://links.puppetlabs.com/pr-triage

Josh

--
Josh Cooper
Developer, Puppet Labs
[Puppet-dev] Re: Catalog Deserialization performance
I've already benchmarked and profiled Catalog's from_data_hash and to_data_hash methods using the benchmark framework. Most of the time is spent in from_data_hash (we already knew it), but there are no big pitfalls where Ruby loses its time. My callgrind file shows that the top 5 (in self time) is:

- Array#flatten (55000 calls)
- Array#each (115089 calls)
- Puppet::Resource#initialize (15000 calls)
- String#=~ (65045 calls)
- Hash#[]= (115084 calls)

This top 5 accounts for ~30% of the total time. As you can see, it can be difficult to optimize this. IMHO, the benchmark - tweak - benchmark way of optimizing is not sufficient here. I think the way a catalog is (de)serialized needs a deep refactor.

Cheers,

On Tuesday, June 30, 2015 at 04:23:42 UTC+2, Henrik Lindberg wrote:
> On 2015-06-29 22:41, Trevor Vaughan wrote:
> > If you get a profiling suite together (aka, a bunch of random patches) could you release it?
>
> It is not difficult actually. Look at the benchmarks in the puppet code base. Many of them are suitable for profiling with a ruby profiler. I don't think we have any benchmarks targeting the agent side though, so the first thing to do (for someone) is to write one. What is more difficult is coming up with a benchmark that does not involve real/complex resources, but everything from deserialization up to actually applying should be possible to work with in a simple way. Profiling is then just running that benchmark with the ruby profiler turned on and analyzing the result, making changes, and running again... (repeat until happy).
>
> - henrik
>
> > I've been curious about this for quite some time but never quite got around to dealing with it. My concern is very much client-side performance, since the more you manage on a client, the less the client gets to do its actual job.
> >
> > Thanks,
> > Trevor
> >
> > On Mon, Jun 29, 2015 at 4:35 PM, Henrik Lindberg <henrik@cloudsmith.com> wrote:
> > > On 2015-06-29 16:48, Romain F. wrote:
> > > > Hi everyone, I'm trying to optimize our Puppet runs by running some benchmarks and patching the puppet core (if possible), but I'm having some difficulties around catalog serialization/deserialization. In fact, on 3.7.5 or 3.8.x, config retrieval takes roughly 7 secs and only 4 secs are on the master side. Same in 4.2, but with 9 secs of config retrieval and still 4 secs on the master side. My first thought was: okay, time to try MsgPack. No improvement. I've instrumented the code in the master branch a bit around this, and found that of my 9 secs of config retrieval, 3.61 secs are lost in catalog deserialization and 2 secs in catalog conversion. But it's not the real deserialization (PSON to Hash) that takes ages, it's the creation of the Catalog object itself (Hash to Catalog). Benchmarks show that the time to deserialize MsgPack (or PSON) is negligible compared to the catalog deserialization time. So here is my question: is this a known issue? Is there any reason for the regression in 4.x (future parser creating more objects, ...)?
> > >
> > > The parser=future setting only makes a difference when compiling the catalog - the catalog itself does not contain more or different data (except possibly using numbers instead of strings for some attributes). The best way to optimize this is to write a benchmark using the benchmark framework and measure the time it takes to deserialize a given catalog. Then run that benchmark with Ruby profiling turned on. There are quite a few things going on at the agent side in addition to taking the catalog PSON and turning it into a catalog that it can apply (loading types, resolving providers, etc). Make sure to benchmark these separately if possible.
> > >
> > > Regards - henrik
> > >
> > > > Cheers,
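A minimal sketch of the benchmark-plus-profiler workflow Henrik describes, assuming the ruby-prof gem, the stdlib JSON parser standing in for PSON, and a catalog already captured to a hypothetical catalog.json (none of these specifics come from the thread itself):

    # Sketch: time the two halves of catalog deserialization separately,
    # then profile the expensive half. File name and setup are hypothetical.
    require 'benchmark'
    require 'json'
    require 'puppet'
    require 'ruby-prof'

    Puppet.initialize_settings  # may need more agent setup in practice

    raw  = File.read('catalog.json')  # a previously captured catalog
    data = nil

    Benchmark.bm(18) do |bm|
      # Step 1: wire format -> plain Ruby hash (cheap, per the thread)
      bm.report('JSON -> Hash')    { data = JSON.parse(raw) }
      # Step 2: hash -> Catalog object graph (the expensive part)
      bm.report('Hash -> Catalog') { Puppet::Resource::Catalog.from_data_hash(data) }
    end

    # Profile step 2 to see where the self time goes
    # (Array#flatten, Puppet::Resource#initialize, and friends).
    result = RubyProf.profile do
      Puppet::Resource::Catalog.from_data_hash(data)
    end
    RubyProf::FlatPrinter.new(result).print($stdout)

Timing the two steps separately makes the thread's point visible: the parse itself is cheap, and the profile then points at the object construction underneath from_data_hash.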
[Puppet-dev] Re: Catalog Deserialization performance
On 2015-06-30 16:17, Romain F. wrote:
> I've already benchmarked and profiled Catalog's from_data_hash and to_data_hash methods using the benchmark framework. Most of the time is spent in from_data_hash (we already knew it), but there are no big pitfalls where Ruby loses its time. My callgrind file shows that the top 5 (in self time) is:
>
> - Array#flatten (55000 calls)
> - Array#each (115089 calls)
> - Puppet::Resource#initialize (15000 calls)
> - String#=~ (65045 calls)
> - Hash#[]= (115084 calls)
>
> This top 5 accounts for ~30% of the total time. As you can see, it can be difficult to optimize this. IMHO, the benchmark - tweak - benchmark way of optimizing is not sufficient here. I think the way a catalog is (de)serialized needs a deep refactor.

There is probably lots of duplicated work going on at the levels above, and that is what causes those generic methods to light up (except Puppet::Resource#initialize). There is both the deserialization process as such to optimize and the Resource implementation itself, which is far from optimal. The next thing would be to focus on Resource#initialize/from_data_hash.

I think it is also relevant to establish some kind of world record - say, serializing and deserializing a hash using MsgPack; a hash of data cannot be transported faster across the wire than that (unless also not using Ruby objects to represent the data, with a lot of extra complexity). I mean, a hash of some complexity will always consume quite a bit of processing and memory to get across the wire. How close does the current implementation get to that world record?

- henrik
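A rough sketch of the "world record" baseline, assuming the msgpack gem and the same hypothetical catalog.json as above: round-trip the raw hash with no object construction at all, so any deserializer that builds Puppet::Resource objects can only be slower than this floor.

    # Floor for catalog transport: pack/unpack the plain hash with MsgPack.
    require 'benchmark'
    require 'json'
    require 'msgpack'

    data   = JSON.parse(File.read('catalog.json'))  # hypothetical capture
    packed = data.to_msgpack
    n      = 100

    Benchmark.bm(14) do |bm|
      bm.report("pack x#{n}")   { n.times { data.to_msgpack } }
      bm.report("unpack x#{n}") { n.times { MessagePack.unpack(packed) } }
    end

Comparing the per-iteration unpack time here with the Hash -> Catalog time from the earlier sketch gives a rough measure of how far from_data_hash sits above the transport floor.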
[Puppet-dev] Re: Community PR Triage for 2015-06-30
On Tuesday, June 30, 2015 at 8:19:23 AM UTC-7, Josh Cooper wrote:
> The PR triage for puppet/facter/hiera/puppet-server will be starting at 10:00 AM PDT today at http://links.puppetlabs.com/pr-triage

Notes from today's PR triage are posted: https://github.com/puppet-community/community-triage/blob/master/core/notes/2015-06-30.md

We merged several PRs (including 4 for native facter on OpenBSD, thanks Jasper!) and had good discussions around filebuckets, the static compiler, and agent-side profiling. We'll be back next week at the usual time.

Josh

--
Josh Cooper
Developer, Puppet Labs