It's hard to say; there are a number of variables. Some things to think
about: Are the tasks idempotent? Do they have leases (like SQS)? Is one
process responsible for processing the tasks, or will you have many
vying for the jobs? Are the tasks ordered by creation date, or weighted
by some factor? If processing of a task fails, should another processor
start processing it, or drop the task, or move it to a failed list?
(The last guards against totally blocking processing if, say, two tasks
keep failing due to an error in the processing code.) Etc...
A simple approach might be to have a single queue of tasks:
http://hadoop.apache.org/zookeeper/docs/current/recipes.html#sc_recipes_Queues
where:
1) your task processors look for the first available task
2) if one is found, the processor creates an ephemeral node as a child
of the task node (if the processor dies, the ephemeral node is removed)
3) the processor processes the task, then deletes the task node when
"done"

The ephemeral node created in 2) indicates whether a task is available
or not. Processors set watches on unavailable tasks (the watch is on
the ephemeral node), and re-run 1) when the watch eventually triggers.
(Hint: you have to use exists("task/child", true) for the availability
check.)
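To make the three steps concrete, here is a minimal in-memory sketch in
Python. It simulates just enough znode semantics (a task list plus
ephemeral "claim" children owned by a session) to show the claim /
process / delete cycle; the names FakeZk, try_claim, etc. are
illustrative, not the ZooKeeper API, and a real implementation would use
create() with the EPHEMERAL flag and exists() with a watch instead.

```python
class FakeZk:
    """Simulates the znode state the recipe relies on: task nodes and
    ephemeral claim children that vanish when their session dies."""

    def __init__(self):
        self.tasks = {}   # task name -> payload
        self.claims = {}  # task name -> session id of the claimant

    def create_task(self, name, payload):
        self.tasks[name] = payload

    def try_claim(self, name, session):
        # Step 2): create the ephemeral child; fails if it already exists.
        if name in self.claims:
            return False
        self.claims[name] = session
        return True

    def session_died(self, session):
        # Ephemeral nodes owned by a dead session are removed, so a
        # crashed processor's task becomes available again.
        self.claims = {t: s for t, s in self.claims.items() if s != session}

    def delete_task(self, name):
        # Step 3): remove the task node (and its claim child with it).
        self.tasks.pop(name, None)
        self.claims.pop(name, None)


def process_one(zk, session):
    # Step 1): scan for the first unclaimed task; claim, process, delete.
    for name in sorted(zk.tasks):
        if zk.try_claim(name, session):
            result = "processed " + name  # stand-in for the real work
            zk.delete_task(name)
            return result
    return None  # nothing available; here you would set the watches
```

For example, if two sessions race over two tasks, each ends up
processing one, and a session that dies mid-claim frees its task for
someone else.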
Obviously, if 3) is partially successful (i.e. you process the task and
update state, but fail before deleting the task node) then
non-idempotence is going to be an issue. There are probably other
considerations beyond the short list I gave above.
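As for the "at most 2 tasks in flight" requirement in the question
below, one variation on the queue recipe is to have each processor count
the outstanding ephemeral claim nodes before claiming, and only claim
when fewer than the limit exist. The sketch below is an in-memory
simulation with illustrative names (LimitedQueue, try_start, etc.), not
the ZooKeeper API; a real version would count the children of a
well-known parent znode and must handle the race where two processors
see the same count (e.g. by retrying on a NodeExists error).

```python
class LimitedQueue:
    """Queue where at most `limit` tasks may be in flight at once."""

    def __init__(self, limit):
        self.limit = limit
        self.pending = []       # task names waiting to be processed
        self.in_flight = set()  # tasks with an outstanding ephemeral claim

    def add(self, name):
        self.pending.append(name)

    def try_start(self):
        # Claim only when fewer than `limit` claims exist. The check and
        # the claim are one atomic step here; with real znodes you would
        # retry when the create races with another processor.
        if self.pending and len(self.in_flight) < self.limit:
            name = self.pending.pop(0)
            self.in_flight.add(name)
            return name
        return None  # limit reached (or queue empty); set a watch and wait

    def finish(self, name):
        # Deleting the task and its claim frees a slot; in ZooKeeper the
        # watch on the deleted node would fire and wake the waiters.
        self.in_flight.discard(name)
```

With limit=2 and 10 tasks, the first two try_start calls succeed, the
third returns None, and finishing a task lets the next one start.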
This sounds like a useful recipe to include in src/recipes if you are in
a position to contribute back.
Regards,
Patrick
Alexander Sibiryakov wrote:
Hello everybody!
Please give me advice about designing an application on ZooKeeper. I'd
like to implement a queue with a limit on the number of simultaneous
tasks. For example, I have 10 tasks, and I can process only 2 tasks
simultaneously. When one task finishes processing, the system should
start another, keeping the number of tasks in the processing state at
2. Thanks.