On Friday, September 13, 2013 11:52:42 AM UTC-5, henrik lindberg wrote:
>
> [...]
> To kick this off, we need to do some research and design. So, here is an 
> attempt to get this started by asking a bunch of questions. 
>
> Under what conditions can two (or more) packages operations be batched? 
> ----------------------------------------------------------------------- 
>


My first inclination is to say that a group of packages serviced by the 
same batch-capable provider can be batched as long as there are no indirect 
dependencies among them that pass through a resource outside the group.  
This pre-supposes that the underlying system command will work out ordering 
issues among the designated packages where necessary; I think that is 
pretty normal.

It may also be necessary to require compatible target 'ensure' states, 
because package managers do not necessarily support mixing installs/updates 
with removals in the same command.  Those that do may require special 
measures to make it happen (e.g. yum).
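
As a rough illustration of those two conditions (same provider, compatible 
target state), here is a minimal Ruby sketch.  Everything in it is 
hypothetical: Pkg is a stand-in struct rather than Puppet's real resource 
class, and the three operation classes are just my guess at a safe 
partition.  It ignores dependencies entirely; it only forms candidate 
groups.

```ruby
# Hypothetical stand-in for a package resource; not Puppet's real class.
Pkg = Struct.new(:name, :provider, :ensure)

# Collapse the target 'ensure' state into a coarse class of operation.
# Assumption: installs and updates can share one command, removals cannot.
def operation_class(ensure_value)
  case ensure_value.to_s
  when 'absent' then :remove
  when 'purged' then :purge
  else :install_or_update # 'installed', 'present', 'latest', or a version
  end
end

# Candidate batches: same provider and same class of operation.
def candidate_batches(packages)
  packages.group_by { |p| [p.provider, operation_class(p.ensure)] }
end

pkgs = [
  Pkg.new('httpd',   :yum, 'installed'),
  Pkg.new('mod_ssl', :yum, 'latest'),
  Pkg.new('telnet',  :yum, 'absent'),
  Pkg.new('rake',    :gem, '10.1.0')
]

candidate_batches(pkgs).each do |key, members|
  puts "#{key.inspect}: #{members.map(&:name).join(', ')}"
end
```

A provider that can mix installs and removals in one command could simply 
use a coarser operation_class.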

 

> As an example, say that a class contains a series of package resources 
> without any explicit dependencies between them. The idea is that this 
> could be optimized. Are there any conditions that makes this impossible? 
>
>

Packages serviced by different providers cannot be meaningfully batched.

Packages ensured 'installed', 'latest', or <version> might not be batchable 
with packages ensured 'absent'.  As a practical matter, packages ensured 
'purged' probably cannot be batched with packages having any other target 
state, though that could be provider-dependent.

Dependency chains that connect two packages in the group and pass through 
any resource outside the group do not allow the two packages in question 
to be applied in the same batch.  However, if all the resources in such a 
chain are packages serviced by the same provider, then it could in some 
cases be possible to resolve it by adding them to the batch.  Also, under 
some circumstances it might be possible to ignore intermediary 
dependencies on containers.
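
To make the dependency-chain condition concrete, here is a small Ruby 
sketch of the check I have in mind.  The edge hash and the resource names 
are made up for illustration, and this is not how Puppet's catalog or 
relationship graph is actually represented.  It reports whether any path 
leaves the proposed batch and later re-enters it; direct edges between 
members are allowed, on the assumption that the package manager sorts 
those out itself.

```ruby
require 'set'

# edges maps a resource to the resources that must be applied after it.
# Returns true when some path runs member -> outsider -> ... -> member.
def chain_through_outsider?(edges, batch)
  members = batch.to_set
  # Start from every outsider that directly follows a batch member.
  stack = members.flat_map { |m| edges.fetch(m, []) }
                 .reject { |r| members.include?(r) }
  seen = Set.new
  until stack.empty?
    node = stack.pop
    next unless seen.add?(node) # skip outsiders already visited
    edges.fetch(node, []).each do |succ|
      return true if members.include?(succ) # the path re-entered the batch
      stack.push(succ)
    end
  end
  false
end

edges = {
  'Package[a]' => ['File[conf]'],
  'File[conf]' => ['Package[b]'],
  'Package[b]' => ['Package[c]']
}

puts chain_through_outsider?(edges, ['Package[a]', 'Package[b]']) # true
puts chain_through_outsider?(edges, ['Package[b]', 'Package[c]']) # false
```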

 

> What if the resources are chained with explicit dependencies? (Guess is 
> that the dependencies were added for a reason, and should be done as 
> individual dependencies). 
>
>

I think it's optimistic to assume that such dependencies were *always* 
added for a reason, and more so to assume that the reason is valid and 
still holds.

Also, where the dependencies correspond only to (possibly-nested) class 
boundaries, I don't see any inherent problem with batching.

Alternatively, the reason for such dependencies could be simply to get 
around problems arising from not having batching.  That's not so much an 
issue for installs, but it is a very real issue where you want to ensure 
packages absent without resorting to ensuring them purged.

I suppose, however, that not batching doesn't ever make anyone worse off 
than they already are, so it would be possible to start with the obviously 
safe cases, and expand from there in later Puppet versions as seems 
reasonable and appropriate.

 

> What if the list of packages are of different type?



What would it mean to batch packages of different types?  They must 
ultimately be applied by different commands anyway, so they cannot be 
batched at that level.  Dependencies permitting, however, they could be 
grouped into two or more batches of homogeneous packages.

 

> Is an chain of 
> implicit dependencies between packages of the same type required to make 
> it possible to batch them? (Does it depend on the policy for implicit 
> dependencies; parse-order, random, etc.)? 
>


Implicit dependencies are an implementation detail, as far as I can see, 
thus the question is one of implementation rather than design.  In 
principle, I don't see any reason why implicit dependencies need to matter 
for the purpose of deciding what can be grouped, but there may be practical 
reasons why it makes sense to involve them.

 

>
> What if there are other implicit dependencies. Can it be deduced that an 
> intermixed resource has no effect on the outcome of a following package 
> operation? (Exec's can for sure do things). 
>


It is reasonable for Puppet to rely on the manifests fed to it to be a 
correct and complete representation of the user's requirements, including 
ordering constraints (even though we know that's not always the case).  As 
far as I am aware, Puppet makes no guarantees about (not) reordering 
resources that have no explicit direct or indirect relationships between 
them.  Therefore, I don't see a reason why Puppet could not batch packages 
from among a group of mixed resources lacking explicit dependencies among 
them.

 

>
> Is it possible to optimize across class boundaries? 
>


In principle, yes.

 

>
> Is it enough to look at the queue of actions in the "planned catalog" 
> and simply look-ahead for packages handled by the same provider. An 
> unbroken chain of operations handled by the same provider is collected 
> and then handed off to the provider? Does this provide enough 
> optimization, or are we then likely to miss optimization opportunities? 
>
>

That approach will certainly miss optimization opportunities.  Fewer 
opportunities will be missed in a highly-constrained catalog than in a more 
loosely-constrained one (because constraints/dependencies limit the 
opportunities available), but for greatest optimization, an approach such 
as you suggest needs to be accompanied by a mechanism to maximize the chain 
lengths within the defined ordering constraints.
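
One simple-minded way to lengthen the chains, sketched in Ruby: do an 
ordinary topological sort, but whenever several resources are ready, 
prefer one that continues the current run.  The data layout here (a hash 
of resource name to "batch key", nil for anything not batchable) is purely 
an assumption for the example, and the greedy choice is only a heuristic; 
it respects the ordering constraints but does not guarantee the longest 
possible runs.

```ruby
# resources: name => batch key (nil for resources that cannot be batched)
# edges:     name => names that must be applied after it
def batch_friendly_order(resources, edges)
  indegree = Hash.new(0)
  edges.each_value { |succs| succs.each { |s| indegree[s] += 1 } }

  ready = resources.keys.select { |r| indegree[r].zero? }
  order = []
  last_key = nil

  until ready.empty?
    # Prefer a ready resource that continues the current run, if any.
    pick = ready.find { |r| !last_key.nil? && resources[r] == last_key }
    pick ||= ready.first
    ready.delete(pick)
    order << pick
    last_key = resources[pick]

    # Release successors whose prerequisites are now all applied.
    edges.fetch(pick, []).each do |s|
      indegree[s] -= 1
      ready << s if indegree[s].zero?
    end
  end
  order
end

resources = {
  'Package[a]' => :yum,
  'File[x]'    => nil,
  'Package[b]' => :yum,
  'Service[s]' => nil
}
edges = { 'Package[a]' => ['Service[s]'] }

puts batch_friendly_order(resources, edges).inspect
```

In this example the two yum packages end up adjacent even though a file 
resource was declared between them, which a plain look-ahead over the 
declared order would not achieve.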

 

> How can we collect metrics for this? 
>
> What needs to be done to providers? 
> ----------------------------------- 
> Clearly the capability to handle multiple requests must be implemented 
> for package managers that support this. What should the API look like? 
>


There are many possibilities, I'm sure.  One would be to endow package 
providers (maybe all providers) with "begin group" and "end group" methods 
that interested providers could hook to form and apply batches, but other 
providers simply ignore.  Indeed, probably an "end group" method alone 
would be sufficient in such a scheme.
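
Here is roughly what I mean by the "end group" variant, as a Ruby sketch.  
None of these classes or method names exist in Puppet; SketchProvider, 
BatchingProvider, end_group, and the "pkgtool" command are all invented 
for the example.  The point is only that a non-batching provider inherits 
a no-op hook, while a batching provider queues requests and flushes them 
when the hook fires.

```ruby
# Hypothetical base class; these are not Puppet's real provider classes.
class SketchProvider
  # Default behavior: act on each package immediately.
  def install(name)
    run_command(['install', name])
  end

  # Called by the (hypothetical) transaction when a group closes.
  # Providers that do not batch simply inherit this no-op.
  def end_group; end

  # Stand-in for executing the package tool; it only reports the command.
  def run_command(args)
    puts "would run: pkgtool #{args.join(' ')}"
    args
  end
end

class BatchingProvider < SketchProvider
  def initialize
    @pending = []
  end

  # Queue the request instead of acting on it right away.
  def install(name)
    @pending << name
    nil
  end

  # Flush everything queued so far as a single command.
  def end_group
    return if @pending.empty?
    names, @pending = @pending, []
    run_command(['install', *names])
  end
end

provider = BatchingProvider.new
provider.install('httpd')
provider.install('mod_ssl')
provider.end_group
```

One consequence of deferring the work this way is that a failure would 
surface when the group is flushed rather than when the individual resource 
is evaluated, so reporting would need some thought.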

 

>
> What needs to be done to the Package type? 
> ------------------------------------------ 
> Is it all an ordering issue and handing off resources to the provider, 
> or do we need to do things to the Package type as well? 
>
>

If Puppet does a reasonably good job of batching packages, then I don't 
think any change to the Package type is needed.  In particular, I think the 
user-defined batches in the proposed patch for issue 2198 are a poor idea 
because of their great potential to conflict with relationships.

 

> Are there situations were it is of value to veto batching per resource? 
> (depending on how much optimization than can be deduced by looking at 
> resource-dependencies). 
>


I don't know about vetoing on a per-resource basis (which is by no means a 
firm "no"), but I can imagine wanting to disable it on a per-run basis.  
Specifically, it may be easier to diagnose problems when packages are 
managed individually instead of in batches.  That could be a Puppet 
configuration option.

 

>
> Explicit group/list? 
> -------------------- 
> If we want users to be able to explicitly give a group of packages to 
> manage - how should that work? A new resource type? An attribute on 
> Package? A defined type? 
>
>

I do not think such a feature should be the first priority.  If automatic 
batching ends up working well, then I don't immediately see the wisdom in 
providing user-directed grouping at all.  If it is to be provided anyway, 
though, then much depends on the automatic batching implementation.  For 
example, if batching is documented to not cross class boundaries, then 
classes would form a natural grouping mechanism.

On the other hand, I think batching really needs to cross defined-type 
boundaries, else everyone who uses a defined-type wrapper around their 
package declarations will miss out on the optimization.

 

> If we cannot optimize across classes, can we support explicit 
> grouping/batch operation? (Seems complex with yet another containment 
> hierarchy - or can this be done by introspecting a dependency chain of 
> custom resources/classes perhaps used specifically for this purpose? 
>
>

As I said, I think it wise to avoid explicit package grouping.  I would 
prefer package batches not to be limited by class boundaries, but if they 
were limited that way, and if PL were committed to that, then classes would 
already provide a reasonable solution for limiting groups.  I don't think 
limiting groups to be contained within classes resolves all the issues that 
could arise from user-specified groups à la the issue 2198 patch.


John
