Re: [Pulp-dev] rethinking workers vs queues

Jeff Ortel Tue, 31 Oct 2017 07:09:33 -0700

+1

This approach makes sense to me.


On 10/30/2017 05:26 PM, Michael Hrivnak wrote:
> While it's on my mind, I just want to get this idea out to others for
> future consideration. I do not think we should necessarily make any changes
> to Pulp 3.0 based on this.
> 
> Setup
> -------
> 
> What is a Pulp worker? We tend to think of one as a process, or a pair of
> processes in a parent-child relationship, with a number from 0-7 (or a
> higher number if you configure Pulp as such). Each worker has a systemd
> unit file and a queue. We know how many should be running, and we monitor
> them. If you have multiple machines, each machine has a defined set of
> numbered workers.
> 
> Pulp tracks each worker in the database. Why? For resource reservation. For
> any given resource (usually a repository), all incomplete tasks are
> assigned to the same worker so they go into one FIFO queue, which preserves
> order of operations. Having one worker per queue guarantees that no more
> than one task will run at a time for a given resource.
> 
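
To check my reading of the reservation rule above, here is a minimal sketch.
It is a toy model, not Pulp's code: the class name, the method names, and the
"shortest queue" policy for picking a worker are all assumptions of mine.

```python
from collections import defaultdict, deque


class Reservations:
    """Toy model of resource reservation: one FIFO queue per worker."""

    def __init__(self, worker_names):
        self.queues = {name: deque() for name in worker_names}
        self.reserved = {}  # resource -> worker holding its incomplete tasks
        self.pending = defaultdict(int)  # resource -> incomplete task count

    def dispatch(self, resource, task):
        # Reuse the worker that already holds this resource. Otherwise pick
        # the worker with the shortest queue (an assumed policy).
        worker = self.reserved.get(resource)
        if worker is None:
            worker = min(self.queues, key=lambda name: len(self.queues[name]))
            self.reserved[resource] = worker
        self.queues[worker].append((resource, task))
        self.pending[resource] += 1
        return worker

    def complete_next(self, worker):
        # The worker finishes the oldest task in its queue (FIFO order).
        resource, task = self.queues[worker].popleft()
        self.pending[resource] -= 1
        if self.pending[resource] == 0:
            # Nothing left for this resource, so release the reservation.
            del self.pending[resource]
            del self.reserved[resource]
        return task
```

Because every incomplete task for a resource lands in the same queue, and one
worker drains that queue in order, two tasks for the same repository can never
run at the same time in this model.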
> Difficulty arises when we deal with workers going offline. What if a worker
> dies unexpectedly and leaves its queue behind, orphaned? How can we quiesce
> a worker (stop assigning it work) so it can be taken offline gracefully? In
> a clustered environment, such as Pulp running in Kubernetes or OpenShift,
> users will expect the ability to scale the number of workers up and down,
> and so we'll need to address these challenges. The containerized-Pulp use
> case helps clarify, I think, the role of workers vs. queues.
> 
> Pitch
> ------
> 
> Workers are stateless processes. They are a commodity that should come and
> go just as easily as the processes that handle HTTP requests. The only
> long-term state associated with a worker is its queue, and I propose that
> we (eventually) stop defining a queue based on which worker created it.
> 
> Today: a worker starts, creates a queue for itself, and informs Pulp it is
> ready to receive work in that queue.
>
> Future: a worker starts, the worker informs Pulp it is ready, and Pulp
> tells the worker which queues it should work from.
>
> Queues become the first-class resource in Pulp that tasks are assigned to.
> Pulp monitors workers to ensure that each queue is assigned to exactly one
> healthy worker, but it does not care as much which one.
> 
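
A rough sketch of the "Future" handshake, as I understand it. `QueuePool`,
`worker_ready`, and `check_invariant` are hypothetical names, and the policy
of handing a new worker one unassigned queue is my own assumption.

```python
class QueuePool:
    """Toy scheduler state: queues are long-lived, workers are anonymous."""

    def __init__(self, queue_names):
        # queue name -> id of the worker serving it (None means unassigned).
        # A dict allows at most one owner per queue by construction.
        self.owner = {name: None for name in queue_names}

    def worker_ready(self, worker_id):
        """A worker announces itself; return the queues it should work from."""
        unassigned = sorted(q for q, w in self.owner.items() if w is None)
        # Assumed policy: a new worker takes one unassigned queue, if any.
        assigned = unassigned[:1]
        for queue in assigned:
            self.owner[queue] = worker_id
        return assigned

    def check_invariant(self, healthy):
        """True if every queue is currently owned by a healthy worker."""
        return all(worker in healthy for worker in self.owner.values())
```

The point of the sketch is the direction of the conversation: the worker only
says "I'm here", and the pool decides which queues it serves.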
> Use Cases
> --------------
> 
> If a worker process dies and a new one starts up, Pulp can assign the
> orphaned queue to the new worker.
>
> If a worker dies (gracefully or not) and a new one does not show up, Pulp
> can assign the orphaned queue to another worker, which would do double-duty
> until one of the queues was emptied, at which point Pulp could choose to
> delete that queue.
>
> If a new additional worker shows up, Pulp could potentially assign it only
> to the general "celery" queue. Based on some policy, a new
> resource-reserving queue could optionally be created in the future, only
> if/when it was needed, and assigned to that worker.
> 
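
The first two use cases could look something like this. It is a sketch under
my own assumptions: both function names are hypothetical, the state is a plain
dict of queue name to worker id, and "least-loaded worker" is just one
possible policy.

```python
def reassign_orphans(owner, healthy):
    """Give each queue whose worker is gone to the least-loaded healthy worker.

    owner maps queue name -> worker id (or None); healthy is a set of ids.
    """
    load = {worker: 0 for worker in sorted(healthy)}
    for worker in owner.values():
        if worker in load:
            load[worker] += 1
    for queue in sorted(owner):
        if owner[queue] in load:
            continue  # still owned by a healthy worker
        if not load:
            owner[queue] = None  # nobody available, so the queue waits
            continue
        target = min(load, key=load.get)  # ties go to the lowest worker id
        owner[queue] = target
        load[target] += 1


def retire_drained(owner, depth):
    """Delete empty queues held by workers that are doing double duty.

    depth maps queue name -> number of tasks still waiting in that queue.
    """
    by_worker = {}
    for queue in sorted(owner):
        by_worker.setdefault(owner[queue], []).append(queue)
    for worker, queues in by_worker.items():
        if worker is None:
            continue
        remaining = len(queues)
        for queue in queues:
            # Always leave the worker at least one queue to serve.
            if remaining > 1 and depth.get(queue, 0) == 0:
                del owner[queue]
                remaining -= 1
```

If a replacement worker is in `healthy`, it has no load and picks up the
orphan. If not, an existing worker ends up with two queues until
`retire_drained` removes whichever one empties first.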
> Pulp as a clustered app would own and manage a pool of queues. The number
> of queues would be influenced by user settings (maybe a min and max), how
> much work is being requested at any given time, and how many processes are
> available to do work. The cluster would manage the full lifecycle of each
> queue.
> 
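
One way the three inputs could combine, purely as an illustration. The
function and its parameter names are hypothetical, and the rule itself (no
more queues than busy resources or available workers, clamped to the
configured range) is my guess at a reasonable policy, not something proposed
in the mail above.

```python
def target_queue_count(min_queues, max_queues, busy_resources, workers):
    """Return how many resource-reserving queues the pool should hold.

    busy_resources: number of resources with incomplete tasks right now.
    workers: number of worker processes currently available.
    """
    # More queues than busy resources or than workers would sit idle.
    wanted = min(busy_resources, workers)
    # Clamp to the user-configured range.
    return max(min_queues, min(max_queues, wanted))
```

For example, with a range of 2 to 4, ten busy repositories, and sixteen
workers, this returns 4. With nothing queued it falls back to the minimum
of 2.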
> Pulp would monitor a pool of workers that are effectively anonymous. They
> would have no meaningful identity from a scheduling standpoint. They come
> and go through outside influence, but the application would make no effort
> to manage their lifecycle. Pulp would only tell each worker which queues it
> should work from.
> 
> Summary
> -----------
> 
> Details aside, the important points are:
> 
> - Focus on the queue as the owner of state.
> - For purposes of scheduling tasks, worker processes are anonymous.
> - Pulp manages a pool of queues, monitors a pool of workers, and assigns
>   queues to workers as workers come and go.
> 
> Thoughts? Would it help to elaborate with concrete examples? Maybe a 
> metaphor...
> 
> Black Friday
> ---------------
> 
> Extending our familiar Black Friday metaphor... starting with a recap.
> 
> Customers at a retail store are standing in one long line to check out. A
> traffic cop at the head of the line tells each person which register to go
> to, based on some rules. (Each register represents a worker's queue.)
> 
> This proposal is that we should think about the line at each register
> separately from the cashier. (The line is a queue, and the cashier is a
> worker process.) One cashier coming on duty can take over another's
> register so they can go on break. If a cashier has to close their register
> to go on break, the cashier next door might run back and forth between two
> registers for a while until one of the lines is empty. An entire shift of
> 16 fresh cashiers might show up and relieve the previous shift. (This is
> similar to migrating worker processes from one machine in a cluster to
> another; the queues stay the same, but they get matched with new anonymous
> workers.)
> 
> -- 
> 
> Michael Hrivnak
> 
> Principal Software Engineer, RHCE 
> 
> Red Hat
> 
> 
> 
> _______________________________________________
> Pulp-dev mailing list
> [email protected]
> https://www.redhat.com/mailman/listinfo/pulp-dev
>
