Re: [Puppet-dev] Ideas for Batch Processing of Packages

Andy Parker Wed, 18 Sep 2013 07:07:51 -0700

On Tue, Sep 17, 2013 at 1:19 PM, John Bollinger
<john.bollin...@stjude.org> wrote:


>
>
> On Monday, September 16, 2013 11:12:58 AM UTC-5, Andy Parker wrote:
>
>> On Mon, Sep 16, 2013 at 6:56 AM, John Bollinger <john.bo...@stjude.org> wrote:
>>
>>>
>>>
>>> On Monday, September 16, 2013 6:48:17 AM UTC-5, Andy Parker wrote:
>>>>
>>>>
>>>> The problem with this picture for being able to batch operations
>>>> together, is that everything turns into calls on the Puppet::Type instance,
>>>> which then makes all of the individual calls to the provider. To batch, we
>>>> need to group resources together and then give the groups to the provider.
>>>>
>>>
>>>
>>> So you don't think my suggestion above to let providers assemble and
>>> apply batches is workable?  I think it requires only one or two extra
>>> signals to the provider (via the Type), to mark batch boundaries.  Most of
>>> the magic would happen in those providers that choose to perform it, and
>>> those that don't choose to perform it can just ignore the new signals.  The
>>> main part of the protocol between the agent core and types/providers
>>> remains unchanged.
>>>
>>> I haven't delved into the details of how exactly it would be
>>> implemented, so perhaps there is a show-stopper there, but I'm not seeing a
>>> flaw in the basic idea.
>>>
>>>
>> No, you are right. I forgot about that one. I was just running through
>> the code, the biggest problem that I can see so far is simply that there
>> isn't "the provider". We end up with a provider instance per resource, as
>> far as I can tell. Others have solved that by tracking data on the provider
>> class.
>>
>> I think that for the batching we just need a way of asking a provider if
>> two resources (for the same provider class) are batchable. The comparison
>> of batchable needs to be transitive (so if A and B are batchable, and so
>> are B and C, then all A, B, and C are). In fact it needs to also be
>> symmetric and reflexive, since it is really just another form of equality.
>> That helps us to define what can be batched together.
>>
>>
>
> I think an equivalence relation may be stronger than is needed.  It should
> be sufficient to be able to answer this weaker question: given a set S of
> mutually batchable resources and a resource R not in S, can R be batched
> together with all the resources of S?  It is possible for a provider type
> to be able to batch resources on that basis, but not meaningfully to batch
> resources based on a full equivalence relation.
>
>
The difference just comes down to how conservative the approach should be.
The equivalence relation rules out a provider saying that, of three
resources A, B, and C, it can batch [A, B] or [B, C] but not [A, B, C]
(because of the transitive constraint). The way that you state it would
allow that. I actually think that leaving the batching that open would lead
to unwanted variations of batches between runs. By expressing the batches
in the form of a comparison operator, we can guarantee consistency (unless
the implementation of the comparison does not conform to the requirements).
There is also a run time problem. The comparison operator allows batches to
be built in O(n) time, where n is the number of resources to consider,
whereas, in general, the set inclusion method would be O(n^2). I think
having the batching criteria a bit more conservative ends up being a win
because of ease of definition and speed of execution, even if it will miss
some cases where it could have created a batch.
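
As a rough illustration of the run time argument, here is a minimal sketch
(not existing Puppet code). Resource, batchable? and build_batches are
made-up names, and the comparison rule (same provider, same install/remove
direction) is only an assumption. Because the comparison is transitive, each
resource only has to be compared with the one before it, so an ordered list
of n resources needs n - 1 comparisons:

```ruby
# Hypothetical stand-in for a resource; not Puppet's real resource class.
Resource = Struct.new(:name, :provider, :ensure_value)

# Assumed comparison: same provider and same direction (install vs. remove).
# It is reflexive, symmetric and transitive, i.e. an equivalence relation.
def batchable?(a, b)
  a.provider == b.provider &&
    (a.ensure_value == :absent) == (b.ensure_value == :absent)
end

# Transitivity means comparing each resource with its predecessor is enough;
# there is no need to re-check it against every member of the current batch.
def build_batches(resources)
  resources.chunk_while { |a, b| batchable?(a, b) }.to_a
end

resources = [
  Resource.new("httpd", :yum, :installed),
  Resource.new("mod_ssl", :yum, :latest),
  Resource.new("telnet", :yum, :absent),
  Resource.new("rsh", :yum, :absent),
  Resource.new("rake", :gem, :installed),
]

batches = build_batches(resources)
batch_names = batches.map { |batch| batch.map(&:name) }
```

With the set inclusion approach the candidate would instead have to be
offered to the whole current set each time, which is where the O(n^2) in the
general case comes from.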


>
>
>> Once we know that, then the problem is how to decide what exactly the
>> batches are. Since we don't actually have a complete view of all of the
>> resources, the decision is going to be based off of incomplete information.
>> Also the batches might need to take into account other factors that are out
>> of the control of the provider such as constraints from the "ordering"
>> selected.
>>
>
>
> There are questions related to what can be batched together that are
> better answered by the agent core, and other parts that can be answered
> only by provider types (as distinguished from provider instances).  There
> must therefore be some type of communication between the two about the
> matter in order to do it right.  I remain enamored of the idea of putting
> the reins in the hands of provider types, partly because I think it affords
> a simpler API, and partly because providers are able to provide appropriate
> specialization.
>
>
I agree, the provider type specifies what can be in a batch (via the
comparison discussed above), the core decides what candidates to try to
batch.


> Consider, for example, the "yum" Package provider.  Because of yum's
> nature, the provider cannot easily support batching out-of-sync packages
> ensured 'absent' with out-of-sync packages ensured 'installed', 'latest',
> or <version>, but as long as external considerations (e.g. relationships
> with other resources) do not preclude it, the yum provider could
> simultaneously build separate batches for the two categories.  That would
> allow for larger batches to be formed under some circumstances, and it
> could be essential for correct operation of removals in others.
>
> More generally, batching under control of providers would allow batches of
> different provider types and even of different resource types to be formed
> simultaneously, provided always that the application order of the relevant
> resources is not constrained.
>
>
Are you thinking that a provider should somehow batch together resources of
different types? That seems like a very accident-prone thing to do.


>
>
>> So, for instance, for "--ordering manifest" we would probably want to do
>> a kind of run length encoding of the resources. Start a batch on a
>> resource, and end the batch when the next resource is not batchable.
>>
>>
>
> That's similar to my idea for how the agent core would choose when to
> signal that batches must be flushed, but my concept does not require the
> core to have an absolute sense of what is batchable.  It can focus on the
> external considerations alone, such as relationships between resources, and
> let providers worry about the other details.  In any case, the fewer
> ordering constraints the agent must obey, the bigger the batches that can
> be formed.  As such, a requirement to adhere strictly to manifest order is
> potentially a great inhibitor of batching.
>
>
One thing that I'm still uncertain about is how to handle failures, and
what should appear in the report. Should all resources get the same event?
Should they all fail or succeed together? Is that something that the
provider gets to decide? I think the provider should be able to decide it,
so that if it is able to separate the parts of the batch, then it can
report on that, and if it isn't, then it just gives them all the same
status.
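
One way to picture that choice, as a sketch only: batch_statuses below is a
hypothetical helper (nothing like it exists in Puppet), and the :success and
:failure symbols are placeholders rather than real report event statuses. If
the provider can name the members that failed, it reports per resource;
otherwise every member of the batch gets the same status.

```ruby
# Hypothetical helper: map the outcome of one batched operation onto
# per-resource statuses. `failed` is nil when the provider cannot tell
# which members of the batch were responsible for the failure.
def batch_statuses(names, succeeded:, failed: nil)
  if succeeded
    names.map { |n| [n, :success] }.to_h
  elsif failed.nil?
    # The batch cannot be separated: everything shares one status.
    names.map { |n| [n, :failure] }.to_h
  else
    # The provider was able to separate the parts of the batch.
    names.map { |n| [n, failed.include?(n) ? :failure : :success] }.to_h
  end
end

uniform = batch_statuses(%w[httpd mod_ssl], succeeded: false)
split   = batch_statuses(%w[httpd mod_ssl], succeeded: false, failed: ["mod_ssl"])
```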

Should the report contain information about what the batch was? What else
might the report contain?

[image: Inline image 1]

Based on this, I think that at a minimum, the new interface for providers
is:

  * Provider::batchable?(resource1, resource2)
  * Provider::batch_start
  * Provider::batch_end

I think the base Provider class can also do something to help the
implementations track the current batch. This could be in the form of
individual methods to track the batch, but I think a generic system for the
provider type to track state would be better (I dislike mutable state
tracking, but I am not seeing a way around this at the moment).
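
A minimal sketch of how that interface might look, with the batch tracked as
state on the provider class. Everything here is hypothetical: BatchingProvider
and FakeYum are made-up classes, add_to_batch and apply_batch are extra
helpers assumed for illustration, and none of this is part of Puppet::Provider
today. The example only builds a command string and does not run anything.

```ruby
# Hypothetical base class showing the three proposed class-level methods.
class BatchingProvider
  class << self
    # Default: nothing is batchable, so providers that do not care about
    # batching can ignore the new signals entirely.
    def batchable?(_resource1, _resource2)
      false
    end

    # Mutable state lives on the provider class, one batch at a time.
    def batch_start
      @current_batch = []
    end

    # Assumed helper for collecting resources between the two signals.
    def add_to_batch(resource)
      @current_batch << resource
    end

    # Hand the collected resources to the subclass and reset the state.
    def batch_end
      batch = @current_batch
      @current_batch = nil
      apply_batch(batch) unless batch.nil? || batch.empty?
    end

    def apply_batch(_resources)
      raise NotImplementedError, "#{name} does not implement apply_batch"
    end
  end
end

# Made-up provider: installs and removals may not share a batch.
class FakeYum < BatchingProvider
  class << self
    def batchable?(r1, r2)
      (r1[:ensure] == :absent) == (r2[:ensure] == :absent)
    end

    def apply_batch(resources)
      verb = resources.first[:ensure] == :absent ? "remove" : "install"
      "yum -y #{verb} #{resources.map { |r| r[:name] }.join(' ')}"
    end
  end
end

FakeYum.batch_start
FakeYum.add_to_batch({ name: "httpd", ensure: :installed })
FakeYum.add_to_batch({ name: "mod_ssl", ensure: :latest })
command = FakeYum.batch_end
```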

I'm a little worried about the implications for noop, purging and deleting
and non-ensure batching (do we make it so that batching is just part of the
ensure branch? I think so). Right now Puppet::Transaction::ResourceHarness
has a *lot* of logic around what to do in various situations.


> John
>
>  --
> You received this message because you are subscribed to the Google Groups
> "Puppet Developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to puppet-dev+unsubscr...@googlegroups.com.
> To post to this group, send email to puppet-dev@googlegroups.com.
> Visit this group at http://groups.google.com/group/puppet-dev.
> For more options, visit https://groups.google.com/groups/opt_out.
>



-- 
Andrew Parker
a...@puppetlabs.com
Freenode: zaphod42
Twitter: @aparker42
Software Developer

*Join us at PuppetConf 2014, September 23-24 in San Francisco*


<<Ensure_evented_batch.gif>>
