----- Original Message -----
> On Apr 24, 2011, at 5:27 AM, R.I.Pienaar wrote:
>
> > Sorry for the patch spam, had to make some tweaks to this:
> >
> > - fix the commit message to reflect the code change better
> > - tweak the queue name used for the identity queue to make writing
> >   ACLs easier
>
> This is a significant change in how mcollective can be used, right?
> Before this it only worked with topic queues and now it can have
> per-host individual queues?
>
> This seems like a great addition, and I have some ideas of how it'll
> change usage, but can you provide some examples of what you expect
> people to do with it?
Yes, it's a big change. The tl;dr version is that I want to make MC appropriate as a transport for BPM job systems and the like. The challenge isn't in making a point-to-point job system - that has been done many times over - the challenge is in marrying the current discovery, broadcast and basic technology that makes MC unique with such a P2P system so that they complement each other. I can build a solid point-to-point system that shares Authentication, Authorization and Auditing - and of course the code you wrote in the RPC layer - with the current broadcast system; it's the marriage that is important.

The improvements you gain from this work fall into various categories:

- Optimising the use at scale
- Pluggable/optional discovery
- Jobs aimed at down machines
- Leveraging the RPC layer as a HA layer
- Optimising the resource usage on the nodes
- Making the client side more DOS proof
- Scheduling and long running jobs
- Converting more traditional systems into async systems

We now have a pretty solid RPC system that's tailored for parallel use, and most often we do use the code exposed by RPC in such a manner. Of course in MC you can choose to target 1 machine, or at scale just 0.01% of machines, for a specific RPC request; this will allow me to optimise for those cases - you can address the very same shared RPC code, just in this manner.

The current model is heavily based on discovery. BPM systems have other needs: they know what is out there and just want to talk to the machines they know about. This will let me make the discovery phase pluggable/optional but still let you re-use the RPC code you wrote in this approach. We could keep their databases up to date by using MC registration. Currently MC only cares about what is up - it's unaware of what's down, and so of which machines aren't even receiving its requests. The current API supports bypassing discovery and working from discovery information you provide, but it's a bit of a square peg in a round hole world, so I want to make that better - the sketch below shows roughly what the bypass looks like today.
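To make that concrete, this is roughly what the existing bypass looks like with SimpleRPC's custom_request - a sketch only, with a made-up agent and hostname; in a BPM world the node list would come from your own database instead of a broadcast:

    #!/usr/bin/env ruby
    # Rough sketch of today's discovery bypass via SimpleRPC custom_request;
    # the agent and host name are hypothetical examples.
    require 'mcollective'

    include MCollective::RPC

    mc = rpcclient("puppetd")
    mc.progress = false

    # A node we already know about, e.g. from a registration-fed database,
    # so no discovery broadcast is needed before sending the request
    node = "web1.example.com"

    # custom_request(action, arguments, expected nodes, filter) targets
    # the known node directly instead of discovering it first
    printrpc mc.custom_request("runonce", {:forcerun => true},
                               [node], {"identity" => node})

    mc.disconnect

It works, but as said it's bolted onto a discovery-shaped API; the new work makes this a first-class mode.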
You will be able to submit a job to a down host; the middleware will persist it (with an optional timeout) and if the machine comes back it will get that request and take action. This is very complex and dangerous, and the use cases here are pretty thin on the ground, but I'll implement it and I am sure we'll get a better view of what it enables.

Queues have the side effect that one message will only ever be handled by one consumer. If you had an agent on a MC network that existed to send SMS or Email messages, you could just start up 10 copies of this code on 10 machines and submit jobs to the queue for this code. The middleware will load balance across the 10 copies, giving you HA and scalability more or less for free. Allowing MC RPC agents to be exposed in this manner is a huge win for code reuse in multiple domains.

Today every job is a thread. If I don't make a thread right now I will lose the request and not take action. The whole thing is designed for this aim: there's a guy on a CLI waiting for responses. This works fine - better than it sounds here, really. But as MC gets adopted into wider use by more and more systems in your infrastructure, I will need a style of job that can be throttled. A queue will let me do that: the middleware will hang onto my jobs while I am out of threads and give them to me when I have threads available again.

By its nature this won't be the default, and I can't really do this sanely with the broadcast traffic, but if you're doing jobs they tend to be long running and CPU intensive; submit those using this method and you'll get throttling and constrained concurrency. Again, it's in the marriage, not in one replacing the other.

With replies going out to a topic, your receiving end needs to be fast. There's more or less a 1000 message buffer, and if you're not consuming replies fast enough that buffer fills up and messages might get lost. Given enough machines and big enough responses we would be doing a DOS on the client code. Having responses in a queue means I can take a day to process the result of a RPC request and it won't matter. It's not a problem anyone has run into yet outside of my own experiments, but once the internal infrastructure for the work outlined here is done this is a tiny change to make, and it future proofs us in this regard. The sketch below shows both queue behaviours in miniature.
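To illustrate both behaviours - competing consumers and the broker holding jobs until you have capacity - here is a minimal worker sketch against a bare STOMP/ActiveMQ connection rather than MC's own connector code; the queue name, credentials and the pretend 5 second job are all invented:

    #!/usr/bin/env ruby
    # Minimal sketch (not MCollective code) of queue semantics on ActiveMQ
    # using the stomp gem; queue name and credentials are made-up examples.
    require 'stomp'

    client = Stomp::Client.new("stomp://guest:guest@localhost:61613")

    # Every copy of this worker subscribes to the same queue and the broker
    # load balances across them: run 10 copies and you get HA for free.
    # prefetchSize 1 plus client-side ack means the broker only hands this
    # worker a new job once the previous one is done - that is the throttle.
    client.subscribe("/queue/mcollective.jobs.sms",
                     "ack" => "client", "activemq.prefetchSize" => "1") do |msg|
      puts "working on job: #{msg.body}"
      sleep 5                  # simulate a slow, CPU intensive job
      client.acknowledge(msg)  # done - the broker may now send the next job
    end

    client.join                # park the main thread on the listener

The same queue shape is what makes the slow reply consumer safe: the broker buffers replies until you get around to them, instead of overflowing a topic buffer.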
I want to say: do a yum update at 4am. I want to submit that job the day before and come back the day after and ask the network how it went. I want to be able to create 5 jobs with dependencies. I want to be able to edit scheduled jobs. I want to be able to query the network for machines that will run a scheduled job, etc. We'll use the current broadcast medium to create/query/edit these jobs. We'll possibly use queues to store jobs, but most likely the jobs will live on the nodes' disks. We'll have a BPM-like supervisor that notices new machines being provisioned that should have a certain job, and it would create those using a P2P paradigm.
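None of this is designed yet, but as one possible building block: ActiveMQ 5.4+ has a broker-side scheduler that can hold a message for later delivery. A purely hypothetical sketch of submitting tomorrow's job that way - the destination and payload are invented, and as said the jobs may well live on the nodes' disks instead:

    #!/usr/bin/env ruby
    # Hypothetical sketch only - not a designed MC API. It leans on
    # ActiveMQ's broker-side scheduler (5.4+, schedulerSupport enabled)
    # to hold a job until 4am; destination and payload are invented.
    require 'stomp'
    require 'json'

    client = Stomp::Client.new("stomp://guest:guest@localhost:61613")

    delay_ms = 16 * 60 * 60 * 1000   # pretend 4am is 16 hours away

    client.publish("/queue/mcollective.jobs.node1.example.com",
                   {:agent => "package", :action => "update"}.to_json,
                   "persistent"          => "true",
                   "AMQ_SCHEDULED_DELAY" => delay_ms.to_s)

    client.close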
Today, if we wanted to break Puppet up to use middleware as a transport, MC would not be able to do that, since Puppet's needs are almost always point to point and more a conversation than a series of jobs. With a queue based system and the ability to improve this story a lot, there should be no reason why Puppet or systems like it can't just be APIs callable at the MC layer. These systems wouldn't need to write their own AAA etc; they can just use whichever of the many AAA plugins are in use at the transport - MC - layer.

And finally, if you bring all of this together, you have a system that enables true BPM or job system use that shares code, authorization, authentication and auditing with the broadcast world. BPM systems built on top of MC might have jobs exposed at their layer, but they can expose some of their business logic to users on the CLI/monitoring/dashboards/etc in the current highly interactive way. A BPM system might be as simple as my POC Puppet Type (https://github.com/ripienaar/puppet-mcollective), or maybe we provide a transport for systems like RunDeck, or maybe we become an attractive transport for other 3rd party commercial systems. As the enabling technology - MC - becomes more ubiquitous through adoption into distros, and through people like Canonical building systems like their new Ubuntu Orchestra on MC, I want to see more 3rd parties building systems that use us like they would use TCP/IP today.

Ticket 7226 is an anchor for much of this work. It sounds like a huge job, but it's not really that epic: much of the groundwork is laid and the code changes are relatively small, as I've had much of this stuff in mind for ages already, but the result they enable is huge.

Version 1.2.0 will be out hopefully next week and the new 1.3.x dev series will focus on this work.