None of this sounds particularly difficult, just a SMOP. The
interesting bits aren't getting the job server up and running. What
will be interesting is scaling the server. Am I correct to assume that
the workers now directly pull from the queue? And then you would be
changing that to a single queue puller that will then redistribute to
worker processes? You may end up killing your overall capacity with an
artificial bottleneck.

We ended up building a similar system (but using 0mq, which has been
both good and bad, but mostly bad, heh) at work with a very small/simple
broker written in C for speed distributing work to individual workers.
But instead of having complicated management, we just opted for a
round-robin approach. And instead of querying the broker for stats, we
set up Graphite and statsd, and the worker processes themselves spit out
metrics in a non-blocking fashion (after sending off the reply) so we
can monitor things like request processing time, various counts on
request resolution, etc.
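
To make the metrics part concrete, here is a minimal sketch of a worker
that replies first and only then emits statsd datagrams over a
non-blocking UDP socket. It is Python purely for illustration, not our
actual worker code. The metric names, the handle_request() helper, and
the 127.0.0.1:8125 address are assumptions; the "name:value|ms" and
"name:value|c" line formats are the standard statsd ones.

```python
import socket
import time


def format_timing(name, millis):
    """Build a statsd timing datagram such as b'worker.request_time:12|ms'."""
    return f"{name}:{int(millis)}|ms".encode("ascii")


def format_counter(name, count=1):
    """Build a statsd counter datagram such as b'worker.requests.ok:1|c'."""
    return f"{name}:{count}|c".encode("ascii")


class MetricsEmitter:
    """Fire-and-forget UDP sender; a lost metric never delays a reply."""

    def __init__(self, host="127.0.0.1", port=8125):  # assumed statsd address
        self.addr = (host, port)
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.sock.setblocking(False)

    def send(self, payload):
        try:
            self.sock.sendto(payload, self.addr)
        except OSError:
            pass  # dropping a metric is acceptable; blocking the worker is not


def handle_request(request, work, reply, emit):
    """Do the work, send the reply, and only then emit metrics."""
    started = time.monotonic()
    result = work(request)
    reply(result)  # the caller gets its answer before any metrics go out
    elapsed_ms = (time.monotonic() - started) * 1000.0
    emit(format_timing("worker.request_time", elapsed_ms))
    emit(format_counter("worker.requests.ok"))
    return result
```

In a real worker, emit would be MetricsEmitter().send. Because it is
UDP, nothing in the request path waits on the metrics pipeline.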

Each worker also has a control channel that we can address individually
to tell it to pause, resume, or stop. This gives us the flexibility to
do rolling restarts of our workers when we deploy updated worker code.
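
As a rough illustration of the control channel idea, the sketch below
models the three commands as a tiny state machine and uses them to
replace workers one at a time. It is Python for illustration only, and
the Worker class, rolling_restart(), and the spawn callback are all
hypothetical names, not what our system is actually called. Draining
in-flight requests between pause and stop is assumed and left out.

```python
class Worker:
    """Stand-in for a worker process reachable over its own control channel."""

    TRANSITIONS = {"pause": "paused", "resume": "running", "stop": "stopped"}

    def __init__(self, name):
        self.name = name
        self.state = "running"

    def control(self, command):
        """Apply one control command and return the resulting state."""
        if command not in self.TRANSITIONS:
            raise ValueError(f"unknown control command: {command!r}")
        if self.state != "stopped":  # a stopped worker stays stopped
            self.state = self.TRANSITIONS[command]
        return self.state

    def accepts_work(self):
        return self.state == "running"


def rolling_restart(workers, spawn):
    """Replace workers in place, one at a time, so only one is ever down."""
    for index, old in enumerate(workers):
        old.control("pause")  # stop accepting new requests
        old.control("stop")   # assumes in-flight work has already drained
        workers[index] = spawn(old.name)  # start the updated code
    return workers
```

With N workers, the pool never drops below N-1 workers accepting
requests during the restart.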

And knowing that we can easily consume all of the cores on one machine,
we run several such machines behind round-robin DNS (with the ability
to drop a host out for maintenance if needed).

In the end, I would shy away from monolithic systems, especially in
POE. POE is fast, especially with the right event loop, but ultimately,
it is still a single process. Any time spent processing your management
requests is time that it is not spending assigning work or
gathering results from a worker.

Some things to consider (and watch out for) if you decide to move
forward with this:

Inter-session communication is slow. Really slow. And the number of
sessions that the kernel must keep track of also has an impact on
performance (POE does a lot of bookkeeping).

You'll need to figure out some sort of serialization mechanism for
framing your requests to the workers. By default,
POE::Filter::Reference uses Storable, which is really slow and also
produces pretty bloated output. You may need to consider a different
Filter module (I know of someone building a Filter using Sereal).

Also be prepared to manage timeouts and worker processes going away,
and decide what your master process will do in those cases. You will
likely end up writing more error handling code than actual
get-work-done code.
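
For the framing point, this is the general shape of what a filter has
to do, sketched in Python rather than as a POE::Filter. It assumes a
4-byte big-endian length prefix and uses JSON only as a stand-in
serializer (not Sereal or Storable); encode_frame() and FrameDecoder
are hypothetical names. The part that matters is that the decoder copes
with a read that ends in the middle of a frame.

```python
import json
import struct

HEADER = struct.Struct("!I")  # 4-byte big-endian body length (assumed format)


def encode_frame(message):
    """Serialize one message and prefix it with its byte length."""
    body = json.dumps(message, separators=(",", ":")).encode("utf-8")
    return HEADER.pack(len(body)) + body


class FrameDecoder:
    """Accumulates raw bytes and yields only complete messages."""

    def __init__(self):
        self.buffer = bytearray()

    def feed(self, chunk):
        """Add bytes from the socket; return the messages completed so far."""
        self.buffer.extend(chunk)
        messages = []
        while len(self.buffer) >= HEADER.size:
            (length,) = HEADER.unpack_from(self.buffer, 0)
            end = HEADER.size + length
            if len(self.buffer) < end:
                break  # partial frame: wait for the next read
            body = bytes(self.buffer[HEADER.size:end])
            messages.append(json.loads(body.decode("utf-8")))
            del self.buffer[:end]
        return messages
```

A real version would also cap the accepted length, so that a corrupt
header cannot make the master buffer without bound.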

For what it is worth, I actually wrote our worker code using Reflex (a
better abstraction layer on top of POE). It basically spins up a
singleton session and completes as many operations within a timeslice
as possible to avoid going through the kernel as often. If latency in
your worker processes is a concern, you might consider using Reflex.
There are also various POE adaptors, so the two interoperate.

On Thu, 14 Mar 2013 18:23:24 -0700
Kevin Goess <[email protected]> wrote:

> We currently use rabbitmq for message between our web application and
> asynchronous workers. The worker management is somewhat ad hoc, and
> we're looking for a way to get a better handle on them.
> 
> It looks like POE has the components that I want, so it's finally
> time for me to learn about POE.  But there's an awful lot of POE
> material out there, and I'm afraid if I try to digest it all I'll
> have made a lot of false starts before I find the right path.  Can
> anybody tell me if I'm going in the right direction, or if there's
> already something out there that does this?
> 
> I think I want to use POE as a job server driven by the AMQP POE
> client, with workers in separate child processes handled by something
> like POE::Component::Daemon (which has a scoreboard) or
> POE::Wheel::Run.
> 
> I'd like to be able to query the server on a management port with
> questions like
> 
>    - How many messages per queue are you receiving
>    - What's the completion time for jobs on each queue
>    - How idle/busy are your child workers?
> 
> It should be able to take commands like "add or drop these queues",
> and it should automatically take care of tasks like making sure no
> queue is being starved in favor of another queue.
> 
> Is this the right idea? Is there a general direction for this that
> would be obvious to sketch out that would save me having to
> understand every example in the poe cookbook?
> 
> Any pointers would be appreciated.  Thanks!


-- 

Nicholas Perez
XMPP/Email: [email protected]
https://metacpan.org/author/NPEREZ
http://github.com/nperez
