+1 This approach makes sense to me.
On 10/30/2017 05:26 PM, Michael Hrivnak wrote:
> While it's on my mind, I just want to get this idea out to others for
> future consideration. I do not think we should necessarily make any
> changes to Pulp 3.0 based on this.
>
> Setup
> -------
>
> What is a Pulp worker? We tend to think of them as a process, or pair of
> processes in parent-child relationship, with a number from 0-7 (or a
> higher number if you configure Pulp as such). Each worker has a systemd
> unit file and a queue. We know how many should be running and monitor
> them. If you have multiple machines, each machine has a defined set of
> numbered workers.
>
> Pulp tracks each worker in the database. Why? For resource reservation.
> For any given resource (usually a repository), all not-complete tasks are
> assigned to the same worker so they go into one FIFO queue, which
> preserves order-of-operation. Having one worker per queue guarantees that
> no more than one task will run at a time for a given resource.
>
> Difficulty arises when we deal with workers going offline. What if a
> worker dies unexpectedly and leaves its queue behind, orphaned? How can we
> quiesce a worker (stop assigning it work) so it can be taken offline
> gracefully? In a clustered environment, such as Pulp running in Kubernetes
> or OpenShift, users will expect the ability to scale the number of workers
> up and down, and so we'll need to address these challenges. The
> containerized-Pulp use case helps clarify, I think, the role of workers
> vs. queues.
>
> Pitch
> ------
>
> Workers are stateless processes. They are a commodity that should come and
> go just as easily as the processes that handle http requests. The only
> long-term state associated with a worker is its queue, and I propose that
> we (eventually) stop defining a queue based on which worker created it.
>
> Today: a worker starts, creates a queue for itself, and informs Pulp it is
> ready to receive work in that queue.
>
> Future: a worker starts, the worker informs Pulp it is ready, and Pulp
> tells the worker which queues it should work from.
>
> Queues become the first-class resource in Pulp that tasks are assigned to.
> Pulp monitors workers to ensure that each queue is assigned to exactly one
> healthy worker, but it does not care as much which one.
>
> Use Cases
> --------------
>
> If a worker process dies and a new one starts up, Pulp can assign the
> orphaned queue to the new worker.
>
> If a worker dies (gracefully or not) and a new one does not show up, Pulp
> can assign the orphaned queue to another worker, which would do
> double-duty until one of the queues was emptied, at which point Pulp could
> choose to delete that queue.
>
> If a new additional worker shows up, Pulp could potentially assign it only
> to the general "celery" queue. Based on some policy, a new
> resource-reserving queue could optionally be created in the future, only
> if/when it was needed, and assigned to that worker.
>
> Pulp as a clustered app would own and manage a pool of queues. The number
> of queues would be influenced by user settings (maybe a min and max), how
> much work is being requested at any given time, and how many processes are
> available to do work. The cluster would manage the full lifecycle of each
> queue.
>
> Pulp would monitor a pool of workers who are effectively anonymous. They
> would have no meaningful identity from a scheduling standpoint. They come
> and go through outside influence, but the application would make no effort
> to manage their lifecycle. Pulp would only tell each worker which queues
> it should work from.
>
> Summary
> -----------
>
> Details aside, the important points are:
>
> - Focus on the queue as the owner of state.
> - For purposes of scheduling tasks, worker processes are anonymous.
> - Pulp manages a pool of queues, monitors a pool of workers, and assigns
>   queues to workers as workers come and go.
>
> Thoughts?
> Would it help to elaborate with concrete examples? Maybe a metaphor...
>
> Black Friday
> ---------------
>
> Extending our familiar Black Friday metaphor... starting with a re-cap.
>
> Customers at a retail store are standing in one long line to check out. A
> traffic-cop at the head of the line tells each person which register to go
> to, based on some rules. (each register represents a worker's queue).
>
> This proposal is that we should think about the line at each register
> separately from the cashier. (the line is a queue, and the cashier is a
> worker process) One cashier coming on duty can take over another's
> register so they can go on break. If a cashier has to close their register
> to go on break, the cashier next-door might run back-and-forth between two
> registers for a while until one of the lines is empty. An entire shift of
> 16 fresh cashiers might show up and relieve the previous shift. (similar
> to migrating worker processes from one machine in a cluster to another;
> the queues stay the same, but they get matched with new anonymous workers)
>
> --
>
> Michael Hrivnak
>
> Principal Software Engineer, RHCE
>
> Red Hat
>
> _______________________________________________
> Pulp-dev mailing list
> [email protected]
> https://www.redhat.com/mailman/listinfo/pulp-dev
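
To make the use cases in the quoted proposal concrete, here is a minimal Python sketch of queue-centric assignment. It is only an illustration under stated assumptions, not Pulp code: the class QueuePool, its method names, and the "fewest queues first" policy are all hypothetical, and a real implementation would also need the database, heartbeats, and the general "celery" queue.

```python
from collections import deque


class QueuePool:
    """Hypothetical scheduler state: queues own the work, workers are anonymous."""

    def __init__(self, queue_names):
        # Each queue is a FIFO of pending tasks; it is the only long-term state.
        self.queues = {name: deque() for name in queue_names}
        # Maps queue name -> worker id, or None while the queue is orphaned.
        self.owner = {name: None for name in queue_names}
        self.workers = set()

    def _load(self, worker):
        # Number of queues currently assigned to this worker.
        return sum(1 for w in self.owner.values() if w == worker)

    def _assign_orphans(self):
        # Assumed policy: give each orphaned queue to the healthy worker that
        # currently serves the fewest queues, so every queue has one owner.
        for name in sorted(self.owner):
            if self.owner[name] is None and self.workers:
                self.owner[name] = min(sorted(self.workers), key=self._load)

    def worker_joined(self, worker):
        # A new worker only announces itself; the pool decides what it serves.
        self.workers.add(worker)
        self._assign_orphans()

    def worker_left(self, worker):
        # Orphan the departed worker's queues, then hand them to the survivors.
        self.workers.discard(worker)
        for name, w in self.owner.items():
            if w == worker:
                self.owner[name] = None
        self._assign_orphans()

    def queue_drained(self, name):
        # A worker doing double duty may drop a queue once that queue is empty.
        worker = self.owner[name]
        if not self.queues[name] and worker is not None and self._load(worker) > 1:
            del self.queues[name]
            del self.owner[name]

    def queues_for(self, worker):
        # What the pool would tell a worker to consume from.
        return sorted(n for n, w in self.owner.items() if w == worker)
```

In this sketch a worker that joins while no queue is orphaned is assigned nothing, which roughly corresponds to the "only the general celery queue" case; when a worker leaves, its queues move to whichever remaining workers are least loaded, and a doubled-up worker can shed a queue once it is empty.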
