On Sun, Sep 15, 2013 at 11:27 PM, Erik Dalén <erik.gustav.da...@gmail.com> wrote:
>
> On 13 September 2013 18:52, Henrik Lindberg
> <henrik.lindb...@cloudsmith.com> wrote:
>
>> Hi,
>> Ideas regarding a potential performance boost that can be gained by
>> performing batch processing of package installs/operations have been
>> floating around in the Puppet ecosystem for quite some time.
>>
>> There is a discussion (and a somewhat dated implementation/proposal) in
>> http://projects.puppetlabs.com/issues/2198 which
>> is good background reading for this topic.
>>
>> In issue #2198 (if you skipped reading it ;-)), the idea is that Puppet
>> should have the feature to install a list of packages given by the user.
>>
>> It seems doable to generalize this idea and let puppet automatically
>> optimize package installs under certain conditions. Performing individual
>> package installs is quite expensive and even if the optimization
>> opportunities may not be extensive (e.g. say that 20% (number completely
>> made up) of packages could at least be paired with one other package) this
>> is still a worthwhile activity.
>>
>> To kick this off, we need to do some research and design. So, here is an
>> attempt to get this started by asking a bunch of questions.
>>
>> Under what conditions can two (or more) package operations be batched?
>> -----------------------------------------------------------------------
>
> In the relationship graph traversal code
> (puppet/graph/relationship_graph#traverse) resources that are ready to run
> are put on the queue. I think that it should be possible to batch every
> package resource currently on the queue when a package resource is
> encountered. It shouldn't matter which class it comes from or anything.
>

Yeah, I think that is the most reasonable initial pass at the optimization.
Take "batchable" resources in the queue and execute them together.

>> What needs to be done to providers?
>> -----------------------------------
>> Clearly the capability to handle multiple requests must be implemented
>> for package managers that support this. What should the API look like?
>>
>> What needs to be done to the Package type?
>> ------------------------------------------
>> Is it all an ordering issue and handing off resources to the provider, or
>> do we need to do things to the Package type as well?
>>
>> Are there situations where it is of value to veto batching per resource?
>> (depending on how much optimization that can be deduced by looking at
>> resource-dependencies).
>
> I would hope this is some sort of generic feature other types & providers
> can opt in to. For example database_user could create several users in a
> single connection without tearing it down between each one.
>

I think it should be as well. I was tracing through the code to try and
figure out where this could be put in. The decision about whether and how to
batch together resource execution is something that the provider would
naturally be responsible for, which would mean that the provider should
control all of how a resource is synced.

Unfortunately, providers don't have that much control right now. After
Puppet::Transaction#evaluate decides that the resource (which is an instance
of the particular subclass of Puppet::Type) needs to be executed (via
Puppet::Graph::RelationshipGraph#traverse), it hands off the resource to
Puppet::Transaction::ResourceHarness#evaluate. That method then iterates over
the properties, calling Puppet::Property#sync in turn. When there is an
:ensure property that isn't in sync, it only tries to sync the ensure
property instead of the other properties individually. The sync on the
property can result in many different behaviors. Syncing the property can
result in executing a block that was defined at the time the property was
declared on the type (often ending in calling something on the provider,
although not always).
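
To make that hand-off concrete, here is a rough toy model of it. It is not Puppet's code: ToyProvider, ToyProperty and toy_harness_evaluate are names made up for illustration, and the only thing the sketch tries to mimic is the rule described above (an out-of-sync :ensure is synced on its own, otherwise each out-of-sync property is synced in turn, and each sync runs a block that usually ends up calling the provider).

```ruby
# Toy model only: these classes are invented for illustration and are not
# Puppet's real Puppet::Type / Puppet::Property / Puppet::Provider.
class ToyProvider
  attr_reader :calls

  def initialize
    @calls = []
  end

  def install
    @calls << :install
  end

  def set_version(value)
    @calls << [:set_version, value]
  end
end

# A property knows its name, whether it is in sync, and carries the block
# that was given when it was declared on the type.
class ToyProperty
  attr_reader :name

  def initialize(name, in_sync, &on_sync)
    @name = name
    @in_sync = in_sync
    @on_sync = on_sync
  end

  def in_sync?
    @in_sync
  end

  def sync(provider)
    @on_sync.call(provider)
    @in_sync = true
  end
end

# Mimics the harness rule: an out-of-sync :ensure is synced alone;
# otherwise every out-of-sync property is synced in turn.
def toy_harness_evaluate(properties, provider)
  ensure_prop = properties.find { |p| p.name == :ensure }
  if ensure_prop && !ensure_prop.in_sync?
    ensure_prop.sync(provider)
  else
    properties.reject(&:in_sync?).each { |p| p.sync(provider) }
  end
  provider.calls
end
```

The point to notice is that every call to the provider is made on behalf of exactly one resource, one property at a time, so the provider never sees more than one resource in a single call.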

Often, syncing the property ends up calling the matching setter on the
provider, i.e. provider.send("#{property.name}=", value). After all of the
properties have been synced, it calls #flush on the resource (after checking
that #flush exists, even though it looks like it always exists since it is
defined in Puppet::Type), which in turn calls Puppet::Provider#flush.

After looking over all of that, getting this in place requires giving more
direct control to the provider. For the most part I think we could restrict
ourselves to the path that executes when the :ensure property needs to be
synced (which is the case of installing a batch of packages with yum, for
instance). Here is (basically) what happens right now when the ensure
property of a package gets synced (sorry for the images, but they are the
most effective way I know to communicate these things):

[image: Inline image 1]

The problem with this picture, as far as batching operations together goes,
is that everything turns into calls on the Puppet::Type instance, which then
makes all of the individual calls to the provider. To batch, we need to group
resources together and then give the groups to the provider. So the basic
flow would look more like this:

[image: Inline image 2]

This is a pretty radical departure from the current type/provider system. How
do we get this in without completely throwing away everything we already
have?
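
As a rough illustration of the "group resources, then give the groups to the provider" flow, here is a minimal sketch. Everything in it is an assumption made for the example: ToyResource, ToyYumProvider, ToyGemProvider, batchable?, ensure_batch, ensure_one and evaluate_ready are hypothetical names, not anything that exists in Puppet, and the providers just return the command line they would run instead of running it.

```ruby
# Sketch only: ToyResource, the Toy*Provider classes, batchable?,
# ensure_batch and ensure_one are hypothetical names, not Puppet API.
ToyResource = Struct.new(:type, :name, :provider_class)

class ToyYumProvider
  def self.batchable?
    true
  end

  # One call for the whole group, i.e. a single install command.
  def self.ensure_batch(resources)
    "yum install #{resources.map(&:name).join(' ')}"
  end

  def self.ensure_one(resource)
    "yum install #{resource.name}"
  end
end

class ToyGemProvider
  def self.batchable?
    false
  end

  def self.ensure_one(resource)
    "gem install #{resource.name}"
  end
end

# Group the resources that are ready to run by provider class, then hand
# each group to its provider in one call when the provider opts in.
def evaluate_ready(ready_resources)
  ready_resources.group_by(&:provider_class).flat_map do |provider, group|
    if provider.batchable? && group.size > 1
      [provider.ensure_batch(group)]
    else
      group.map { |resource| provider.ensure_one(resource) }
    end
  end
end

ready = [
  ToyResource.new(:package, "httpd", ToyYumProvider),
  ToyResource.new(:package, "rake", ToyGemProvider),
  ToyResource.new(:package, "mod_ssl", ToyYumProvider),
]
evaluate_ready(ready)
# => ["yum install httpd mod_ssl", "gem install rake"]
```

Making batching an opt-in on the provider class (the made-up batchable? here) would leave providers that only know how to handle one resource at a time on the existing path, which is one possible answer to the "without throwing everything away" question.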

--
Andrew Parker
a...@puppetlabs.com
Freenode: zaphod42
Twitter: @aparker42

Software Developer

*Join us at PuppetConf 2014, September 23-24 in San Francisco*

--
You received this message because you are subscribed to the Google Groups
"Puppet Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to puppet-dev+unsubscr...@googlegroups.com.
To post to this group, send email to puppet-dev@googlegroups.com.
Visit this group at http://groups.google.com/group/puppet-dev.
For more options, visit https://groups.google.com/groups/opt_out.
<<Ensure_sync.gif>>
<<Ensure_batch.gif>>