On Tue, Sep 17, 2013 at 1:19 PM, John Bollinger <john.bollin...@stjude.org> wrote:
>
> On Monday, September 16, 2013 11:12:58 AM UTC-5, Andy Parker wrote:
>
>> On Mon, Sep 16, 2013 at 6:56 AM, John Bollinger <john.bo...@stjude.org> wrote:
>>
>>> On Monday, September 16, 2013 6:48:17 AM UTC-5, Andy Parker wrote:
>>>>
>>>> The problem with this picture for being able to batch operations
>>>> together, is that everything turns into calls on the Puppet::Type instance,
>>>> which then makes all of the individual calls to the provider. To batch, we
>>>> need to group resources together and then give the groups to the provider.
>>>>
>>>
>>> So you don't think my suggestion above to let providers assemble and
>>> apply batches is workable? I think it requires only one or two extra
>>> signals to the provider (via the Type), to mark batch boundaries. Most of
>>> the magic would happen in those providers that choose to perform it, and
>>> those that don't choose to perform it can just ignore the new signals. The
>>> main part of the protocol between the agent core and types/providers
>>> remains unchanged.
>>>
>>> I haven't delved into the details of how exactly it would be
>>> implemented, so perhaps there is a show-stopper there, but I'm not seeing a
>>> flaw in the basic idea.
>>>
>>
>> No, you are right. I forgot about that one. I was just running through
>> the code, the biggest problem that I can see so far is simply that there
>> isn't "the provider". We end up with a provider instance per resource, as
>> far as I can tell. Others have solved that by tracking data on the provider
>> class.
>>
>> I think that for the batching we just need a way of asking a provider if
>> two resources (for the same provider class) are batchable. The comparison
>> of batchable needs to be transitive (so if A and B are batchable, and so
>> are B and C, then all A, B, and C are). In fact it needs to also be
>> symmetric and reflexive, since it is really just another form of equality.
>> That helps us to define what can be batched together.
>>
>
> I think an equivalence relation may be stronger than is needed. It should
> be sufficient to be able to answer this weaker question: given a set S of
> mutually batchable resources and a resource R not in S, can R be batched
> together with all the resources of S? It is possible for a provider type
> to be able to batch resources on that basis, but not meaningfully to batch
> resources based on a full equivalence relation.
>

The difference just comes down to how conservative the approach should be. The equivalence relation rules out a provider saying that, of three resources A, B, and C, it can batch [A, B] or [B, C] but not [A, B, C] (because of the transitivity constraint). The way that you state it would allow that. I actually think that leaving the batching that open would lead to unwanted variation in batches between runs. By expressing the batches in terms of a comparison operator, we can guarantee consistency (unless the implementation of the comparison does not conform to the requirements).

There is also a run-time problem. The comparison operator allows batches to be built in O(n) time, where n is the number of resources to consider, whereas, in general, the set inclusion method would be O(n^2). I think making the batching criteria a bit more conservative ends up being a win because of ease of definition and speed of execution, even if it misses some cases where it could have created a batch.

>> Once we know that, then the problem is how to decide what exactly the
>> batches are. Since we don't actually have a complete view of all of the
>> resources, the decision is going to be based off of incomplete information.
>> Also the batches might need to take into account other factors that are out
>> of the control of the provider such as constraints from the "ordering"
>> selected.
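
The [A, B] / [B, C] concern discussed above can be illustrated with a small sketch. Everything here is hypothetical (the `greedy_batches` helper, the `LOOSE` and `STRICT` rules, the resource names); none of it is Puppet code. It groups resources greedily, once with a non-transitive rule and once with an equivalence relation, and checks whether the resulting batches depend on the order in which resources are seen.

```ruby
# Greedy grouping: a resource joins the first batch in which it is
# compatible with every member; otherwise it starts a new batch.
def greedy_batches(items, compatible)
  items.each_with_object([]) do |item, batches|
    batch = batches.find { |b| b.all? { |member| compatible.call(member, item) } }
    if batch
      batch << item
    else
      batches << [item]
    end
  end
end

# Order-insensitive view of a set of batches, for comparison.
def normalize(batches)
  batches.map(&:sort).sort
end

# Non-transitive rule (hypothetical): A~B and B~C, but not A~C.
PAIRS = [%w[A B], %w[B C]].freeze
LOOSE = ->(x, y) { x == y || PAIRS.include?([x, y]) || PAIRS.include?([y, x]) }

# Equivalence rule (hypothetical): same action means batchable.
ACTION = { "A" => :install, "B" => :install, "C" => :remove }.freeze
STRICT = ->(x, y) { ACTION[x] == ACTION[y] }

[LOOSE, STRICT].each do |rule|
  forward  = normalize(greedy_batches(%w[A B C], rule))
  backward = normalize(greedy_batches(%w[C B A], rule))
  puts "same batches regardless of order? #{forward == backward}"
end
```

With the non-transitive rule, B lands with A or with C depending on which is seen first. With an equivalence relation the membership check could also be reduced to one representative per batch, since transitivity makes the `all?` scan redundant.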
>>
>
> There are questions related to what can be batched together that are
> better answered by the agent core, and other parts that can be answered
> only by provider types (as distinguished from provider instances). There
> must therefore be some type of communication between the two about the
> matter in order to do it right. I remain enamored of the idea of putting
> the reins in the hands of provider types, partly because I think it affords
> a simpler API, and partly because providers are able to provide appropriate
> specialization.
>

I agree: the provider type specifies what can be in a batch (via the comparison discussed above), and the core decides which candidates to try to batch.

> Consider, for example, the "yum" Package provider. Because of yum's
> nature, the provider cannot easily support batching out-of-sync packages
> ensured 'absent' with out-of-sync packages ensured 'installed', 'latest',
> or <version>, but as long as external considerations (e.g. relationships
> with other resources) do not preclude it, the yum provider could
> simultaneously build separate batches for the two categories. That would
> allow for larger batches to be formed under some circumstances, and it
> could be essential for correct operation of removals in others.
>
> More generally, batching under control of providers would allow batches of
> different provider types and even of different resource types to be formed
> simultaneously, provided always that the application order of the relevant
> resources is not constrained.
>

Are you thinking that a provider should somehow batch together resources of different types? That seems like a very accident-prone thing to do.

>> So, for instance, for "--ordering manifest" we would probably want to do
>> a kind of run length encoding of the resources. Start a batch on a
>> resource, and end the batch when the next resource is not batchable.
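
The run-length idea for "--ordering manifest" can be sketched in a few lines of Ruby. This is only an illustration under assumptions: `Pkg`, `SAME_ACTION`, `MANIFEST`, and `run_length_batches` are made-up names, and the batchability rule is loosely modeled on the yum example (removals are kept apart from installs). `Enumerable#slice_when` is standard Ruby (2.2 and later).

```ruby
# Hypothetical stand-in for a package resource; not Puppet's real class.
Pkg = Struct.new(:name, :ensure_value)

# Assumed rule, loosely modeled on the yum example: removals only batch
# with removals, installs/upgrades only with installs/upgrades.
SAME_ACTION = lambda do |a, b|
  (a.ensure_value == :absent) == (b.ensure_value == :absent)
end

# Walk the resources in manifest order and cut a new batch whenever the
# next resource is not batchable with the previous one.
def run_length_batches(resources, batchable)
  resources.slice_when { |prev, nxt| !batchable.call(prev, nxt) }.to_a
end

# Example manifest order (package names are illustrative only).
MANIFEST = [
  Pkg.new("httpd", :installed),
  Pkg.new("mod_ssl", :latest),
  Pkg.new("telnet", :absent),
  Pkg.new("git", :installed)
].freeze

run_length_batches(MANIFEST, SAME_ACTION).each do |batch|
  puts batch.map(&:name).join(", ")
end
```

Because the order is fixed, "git" ends up in its own batch even though it is batchable with "httpd" and "mod_ssl": the removal of "telnet" sits between them.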
>>
>
> That's similar to my idea for how the agent core would choose when to
> signal that batches must be flushed, but my concept does not require the
> core to have an absolute sense of what is batchable. It can focus on the
> external considerations alone, such as relationships between resources, and
> let providers worry about the other details. In any case, the fewer
> ordering constraints the agent must obey, the bigger the batches that can
> be formed. As such, a requirement to adhere strictly to manifest order is
> potentially a great inhibitor of batching.
>

One thing that I'm still uncertain about is how to handle failures, and what should appear in the report. Should all resources get the same event? Should they all fail or succeed together? Is that something that the provider gets to decide? I think the provider should be able to decide, so that if it is able to separate the parts of the batch it can report on each, and if it isn't, it just gives them all the same status. Should the report contain information about what the batch was? What else might the report contain?

[image: Inline image 1]

Based on this, I think that at a minimum, the new interface for providers is:

* Provider::batchable?(resource1, resource2)
* Provider::batch_start
* Provider::batch_end

I think the base Provider class can also do something to help track the current batch for the implementations. This could be in the form of individual methods to track the batch, but I think a generic system for the provider type to track state would be better (I dislike mutable state tracking, but I am not seeing a way around it at the moment).

I'm a little worried about the implications for noop, purging, deleting, and non-ensure batching (do we make it so that batching is just part of the ensure branch? I think so). Right now Puppet::Transaction::ResourceHarness has a *lot* of logic around what to do in various situations.
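
To make the three-method interface concrete, here is a rough sketch of what a provider class might look like. It is not Puppet's API: `SketchResource`, `SketchPackageProvider`, the `enqueue` hook, and `apply_all` are all hypothetical, and the state is tracked on the provider class as discussed earlier in the thread. `batch_end` returns one status per resource; this version cannot tell the parts of the batch apart, so every resource gets the same status.

```ruby
# Hypothetical resource; not Puppet's real class.
SketchResource = Struct.new(:name, :ensure_value)

# Hypothetical provider with class-level batch state.
class SketchPackageProvider
  @pending = nil # the batch being assembled, or nil when none is open

  class << self
    # Assumed rule: removals and installs go in separate batches.
    def batchable?(a, b)
      (a.ensure_value == :absent) == (b.ensure_value == :absent)
    end

    def batch_start
      @pending = []
    end

    # Hypothetical hook the core would call for each resource in the batch.
    def enqueue(resource)
      raise "no batch in progress" if @pending.nil?
      @pending << resource
    end

    # Apply the whole batch at once and return { name => status }.
    def batch_end
      batch = @pending || []
      @pending = nil
      status = apply_all(batch) ? :success : :failure
      batch.map { |r| [r.name, status] }.to_h
    end

    private

    # Stand-in for a single package-manager invocation; always succeeds here.
    def apply_all(_batch)
      true
    end
  end
end

SketchPackageProvider.batch_start
SketchPackageProvider.enqueue(SketchResource.new("httpd", :installed))
SketchPackageProvider.enqueue(SketchResource.new("mod_ssl", :latest))
p SketchPackageProvider.batch_end
```

A provider that can separate the parts of a batch could return different statuses per resource from `batch_end` instead of a shared one.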
> John
>
> --
> You received this message because you are subscribed to the Google Groups
> "Puppet Developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to puppet-dev+unsubscr...@googlegroups.com.
> To post to this group, send email to puppet-dev@googlegroups.com.
> Visit this group at http://groups.google.com/group/puppet-dev.
> For more options, visit https://groups.google.com/groups/opt_out.
>

--
Andrew Parker
a...@puppetlabs.com
Freenode: zaphod42
Twitter: @aparker42

Software Developer
<<Ensure_evented_batch.gif>>