Hi,
The idea that a performance boost could be gained by batch processing of package installs/operations has been floating around the Puppet ecosystem for quite some time.

There is a discussion (and a somewhat dated implementation/proposal) at http://projects.puppetlabs.com/issues/2198, which is good background reading for this topic.

In issue #2198 (if you skipped reading it ;-)), the idea is that Puppet should have a feature to install a list of packages given by the user.

It seems doable to generalize this idea and let Puppet automatically optimize package installs under certain conditions. Performing individual package installs is quite expensive, and even if the optimization opportunities turn out to be modest (say, hypothetically, that 20% of packages could be paired with at least one other package), this would still be a worthwhile activity.

To kick this off, we need to do some research and design. So, here is an attempt to get this started by asking a bunch of questions.

Under what conditions can two (or more) package operations be batched?
-----------------------------------------------------------------------
As an example, say that a class contains a series of package resources without any explicit dependencies between them. The idea is that this could be optimized. Are there any conditions that make this impossible?

What if the resources are chained with explicit dependencies? (My guess is that the dependencies were added for a reason, and the operations should then be performed individually.)

What if the packages in the list are of different types? Is a chain of implicit dependencies between packages of the same type required to make it possible to batch them? (Does it depend on the policy for implicit dependencies; parse-order, random, etc.?)

What if there are other implicit dependencies? Can it be deduced that an intermixed resource has no effect on the outcome of a following package operation? (Execs can certainly do things.)

Is it possible to optimize across class boundaries?

Is it enough to look at the queue of actions in the "planned catalog" and simply look ahead for packages handled by the same provider? An unbroken chain of operations handled by the same provider would be collected and then handed off to the provider. Does this provide enough optimization, or are we then likely to miss optimization opportunities?
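To make the look-ahead idea concrete, here is a minimal sketch in Ruby. Everything here is hypothetical: `Resource` is a made-up stand-in for Puppet's real resource objects, and the grouping rule (any non-package resource conservatively breaks the run) is just one possible policy.

```ruby
# Hypothetical sketch only -- Resource is a stand-in for Puppet's real
# resource objects, not an actual Puppet class.
Resource = Struct.new(:type, :title, :provider)

# Collect maximal unbroken runs of package resources that share a
# provider. Any other resource type (e.g. an Exec, which may affect
# the outcome of a later package operation) conservatively breaks the
# run and is applied on its own.
def batch_package_runs(resources)
  batches = []
  run = []
  flush = lambda do
    batches << run unless run.empty?
    run = []
  end
  resources.each do |res|
    if res.type == :package
      # A provider change also ends the current run.
      flush.call if !run.empty? && run.last.provider != res.provider
      run << res
    else
      flush.call
      batches << [res]
    end
  end
  flush.call
  batches
end

plan = [
  Resource.new(:package, 'vim',  :apt),
  Resource.new(:package, 'git',  :apt),
  Resource.new(:package, 'gcc',  :yum),
  Resource.new(:exec,    'post', nil),
  Resource.new(:package, 'curl', :apt)
]
batch_package_runs(plan).map { |b| b.map(&:title) }
# => [["vim", "git"], ["gcc"], ["post"], ["curl"]]
```

Note that this simple look-ahead misses the chance to batch 'vim'/'git' with 'curl'; whether reordering across the Exec would be safe is exactly the dependency question raised above.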

How can we collect metrics for this?

What needs to be done to providers?
-----------------------------------
Clearly the capability to handle multiple requests must be implemented for package managers that support this. What should the API look like?
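One possible shape for such an API — with entirely invented names, not Puppet's actual provider interface — is a capability flag plus a batch method whose default degrades to individual installs, so callers can always hand over a list:

```ruby
# Hypothetical API sketch -- class and method names are invented for
# discussion; this is not Puppet's actual provider interface.
class PackageProvider
  # A batch-capable provider would override this to return true.
  def self.batchable?
    false
  end

  def install(name)
    raise NotImplementedError
  end

  # Default batch behaviour: degrade to one install per package, so
  # non-batching providers need no changes.
  def install_batch(names)
    names.each { |n| install(n) }
  end
end

# A fake apt-style provider that records the commands it would run,
# installing a whole batch in a single invocation.
class FakeAptProvider < PackageProvider
  attr_reader :commands

  def initialize
    @commands = []
  end

  def self.batchable?
    true
  end

  def install(name)
    @commands << "apt-get install #{name}"
  end

  def install_batch(names)
    @commands << "apt-get install #{names.join(' ')}"
  end
end

provider = FakeAptProvider.new
provider.install_batch(%w[vim git curl])
provider.commands
# => ["apt-get install vim git curl"]
```

The open questions then become how errors are reported per package within a batch, and whether the transaction can attribute a failed batch back to individual resources.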

What needs to be done to the Package type?
------------------------------------------
Is it purely a matter of ordering and handing off resources to the provider, or does the Package type itself need changes as well?

Are there situations where it is of value to veto batching per resource
(depending on how much optimization can be deduced by looking at resource dependencies)?

Explicit group/list?
--------------------
If we want users to be able to explicitly give a group of packages to manage, how should that work? A new resource type? An attribute on Package? A defined type?

If we cannot optimize across classes, can we support an explicit grouping/batch operation? (This seems complex with yet another containment hierarchy - or can it be done by introspecting a dependency chain of custom resources/classes, perhaps used specifically for this purpose?)

- henrik


--
You received this message because you are subscribed to the Google Groups "Puppet 
Developers" group.