None of this sounds particularly difficult, just a SMOP. The interesting bit isn't getting the job server up and running; it's scaling the server. Am I correct to assume that the workers currently pull directly from the queue, and that you would change that to a single queue puller that then redistributes to worker processes? If so, you may end up killing your overall capacity with an artificial bottleneck.

We ended up building a similar system at work (but using 0mq, which has been both good and bad, but mostly bad, heh), with a very small, simple broker written in C for speed that distributes work to individual workers. Instead of having complicated management, we just opted for a round-robin approach. And instead of querying the broker for stats, we set up Graphite and statsd, and the worker processes themselves spit out metrics in a non-blocking fashion (after sending off the reply) so we can monitor things like request processing time, various counts on request resolution, etc.

The workers also have a control channel through which we can address them individually to tell them to pause/resume/stop. This gives us the flexibility to do rolling restarts of our workers when we deploy updated worker code. And since we know we can easily consume all of the cores on a machine, our infrastructure has multiple such machines behind round-robin DNS (with the ability to drop a host out for maintenance if needed).

In the end, I would shy away from monolithic systems, especially in POE. POE is fast, especially with the right event loop, but ultimately it is still a single process. Any time spent processing your management requests is time that it is not spending assigning work or gathering results from a worker.

Some things to consider (and watch out for) if you decide to move forward with this:

Inter-session communication is slow. Really slow. And the number of sessions that the kernel must keep track of also has an impact on performance (POE does a lot of bookkeeping).

You'll need to figure out some sort of serialization mechanism for framing your requests to the workers. By default, POE::Filter::Reference uses Storable, which is really slow and also produces pretty bloated output. You may need to consider a different Filter module (I know of someone building a Filter using Sereal).
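
To make the framing point a bit more concrete, here is a minimal sketch of one common approach: a length prefix in front of each serialized message. It is written in Python purely as a language-neutral illustration. It is not POE code and not the wire format of POE::Filter::Reference, and the names encode_frame and FrameDecoder, the 4-byte header, and the JSON payload are all assumptions made for the example.

```python
import json
import struct

# Assumed header layout: 4-byte big-endian unsigned payload length.
HEADER = struct.Struct("!I")


def encode_frame(message):
    """Serialize one message (JSON here, as a stand-in) and prepend its length."""
    payload = json.dumps(message, separators=(",", ":")).encode("utf-8")
    return HEADER.pack(len(payload)) + payload


class FrameDecoder:
    """Accumulates raw bytes and returns complete messages as they arrive."""

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, data):
        """Add bytes read from a socket or pipe; return the decoded messages."""
        self._buffer.extend(data)
        messages = []
        while True:
            if len(self._buffer) < HEADER.size:
                break  # length prefix not complete yet
            (length,) = HEADER.unpack_from(self._buffer)
            end = HEADER.size + length
            if len(self._buffer) < end:
                break  # payload not complete yet
            payload = bytes(self._buffer[HEADER.size:end])
            messages.append(json.loads(payload.decode("utf-8")))
            del self._buffer[:end]  # drop the consumed frame
        return messages
```

The decoder keeps partial reads in its buffer, so a request split across several reads (or several requests arriving in one read) still comes out as whole messages. Swapping the serializer only touches the two lines that call json.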

Also be prepared to manage timeouts and worker processes going away, and decide what your master process will do in those cases. You will likely end up writing more error-handling code than actual get-work-done code.

For what it is worth, I actually wrote our worker code using Reflex (a better abstraction layer on top of POE). It basically spins up a singleton session and completes as many operations within a timeslice as possible to avoid going through the kernel as often. If latency in your worker processes is a concern, you might consider using Reflex. There are also various POE adaptors, so the two interoperate.

On Thu, 14 Mar 2013 18:23:24 -0700
Kevin Goess <[email protected]> wrote:

> We currently use rabbitmq for message between our web application and
> asynchronous workers. The worker management is somewhat ad hoc, and
> we're looking for a way to get a better handle on them.
>
> It looks like POE has the components that I want, so it's finally
> time for me to learn about POE. But there's an awful lot of POE
> material out there, and I'm afraid if I try to digest it all I'll
> have made a lot of false starts before I find the right path. Can
> anybody tell me if I'm going in the right direction, or if there's
> already something out there that does this?
>
> I think I want to use POE as a job server driven by the AMQP POE
> client, with workers in separate child processes handled by something
> like POE::Component::Daemon (which has a scoreboard) or
> POE::Wheel::Run.
>
> I'd like to be able to query the server on a management port with
> questions like
>
> - How many messages per queue are you receiving
> - What's the completion time for jobs on each queue
> - How idle/busy are your child workers?
>
> It should be able to take commands like "add or drop these queues",
> and it should automatically take care of tasks like making sure no
> queue is being starved in favor of another queue.
>
> Is this the right idea? Is there a general direction for this that
> would be obvious to sketch out that would save me having to
> understand every example in the poe cookbook?
>
> Any pointers would be appreciated. Thanks!

--
Nicholas Perez
XMPP/Email: [email protected]
https://metacpan.org/author/NPEREZ
http://github.com/nperez
