As you look at this, I would be grateful if you could evaluate alternative implementations in which:

a) each task is a separate file, or
b) all tasks are listed and described in a single file that is updated atomically using the standard ZK read-modify-write, repeat-on-failure style, or
c) all tasks are listed in a single file, but their descriptions are kept in separate files whose names are recorded in the single file. Atomic updates occur on the single file, task files are cleaned up as well as possible, and task files that are not deleted in good order (which should be exceedingly rare) can be recognized by the lack of a reference from the single control file.

The trade-offs here occur with large numbers of running tasks, large numbers of pending tasks, or very high task churn rates. Option (a) becomes very bad with many pending tasks because selecting a task may require server round trips proportional to the number of pending tasks. Option (b) might exceed the maximum file size for a moderate number of tasks. Option (c) seems safe except for the occasional need for garbage cleanup if programs fail between updating the control file and deleting the task files.

Mostly people talk about (a), but (c) seems very competitive to me.

All of these alternatives simply implement the "look for" verb in Patrick's excellent outline. What he suggests for the task-working convention is quite reasonable.

On Tue, Jul 14, 2009 at 9:45 AM, Patrick Hunt <ph...@apache.org> wrote:
> 1) your task processors look for the first available task
> 2) if found they create an ephemeral node as a child of the task node
> (if the processor dies the ephemeral node will be removed)
> 3) the processor processes the task then deletes the task when "done"
>
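For concreteness, here is a minimal sketch of the ZK-style read-modify-write, repeat-on-failure loop that options (b) and (c) above rely on. The VersionedStore class is a hypothetical in-memory stand-in for a znode: with a real client (the Java API or Python's kazoo), getData() returns the data together with a version stat, and setData() with that expected version fails with BadVersion if another writer got in first.

```python
class BadVersionError(Exception):
    """Raised when the expected version no longer matches (like ZK's BadVersion)."""

class VersionedStore:
    """Hypothetical in-memory stand-in for a single versioned znode."""
    def __init__(self, data):
        self.data = data
        self.version = 0

    def get(self):
        return self.data, self.version

    def set(self, data, expected_version):
        if expected_version != self.version:
            raise BadVersionError()
        self.data = data
        self.version += 1

def atomic_update(store, mutate):
    """Read the control file, apply mutate(), write back; retry on conflict."""
    while True:
        data, version = store.get()
        try:
            store.set(mutate(data), version)
            return
        except BadVersionError:
            continue  # another processor updated first; re-read and retry

# Example: atomically remove a claimed task name from the control file.
store = VersionedStore("task-001\ntask-002\ntask-003")
atomic_update(store, lambda d: "\n".join(
    t for t in d.split("\n") if t != "task-002"))
print(store.get())  # → ('task-001\ntask-003', 1)
```

With a real ZooKeeper the loop is identical in shape; only the getData/setData calls and the session handling differ.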
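And Patrick's three steps can be sketched like this, again with a hypothetical in-memory ZkLike class standing in for a real client. In real code the claim in step 2 is an EPHEMERAL child znode, so a dead processor's claim vanishes automatically when its session expires; here a NodeExistsError plays the role of ZK's create() failing on an existing node.

```python
class NodeExistsError(Exception):
    pass

class ZkLike:
    """Hypothetical in-memory stand-in for the task tree in ZooKeeper."""
    def __init__(self, tasks):
        self.tasks = set(tasks)   # pending task nodes
        self.claims = set()       # stand-in for ephemeral claim children

    def get_children(self):
        return sorted(self.tasks)

    def create_claim(self, task):
        # like create(task + "/claim", ephemeral=True) on a real client
        if task in self.claims:
            raise NodeExistsError(task)
        self.claims.add(task)

    def delete_task(self, task):
        self.claims.discard(task)
        self.tasks.discard(task)

def take_and_finish_one(zk, work):
    """Steps 1-3: find the first claimable task, claim it, process, delete."""
    for task in zk.get_children():      # 1) look for the first available task
        try:
            zk.create_claim(task)       # 2) claim via ephemeral child
        except NodeExistsError:
            continue                    # someone else owns it; try the next
        work(task)                      # 3) process the task ...
        zk.delete_task(task)            # ... then delete it when "done"
        return task
    return None

zk = ZkLike(["task-a", "task-b"])
done = []
print(take_and_finish_one(zk, done.append))  # → task-a
print(take_and_finish_one(zk, done.append))  # → task-b
print(take_and_finish_one(zk, done.append))  # → None
```

The "look for" in step 1 is exactly where options (a), (b), and (c) differ: here it is a single get_children() call, which is what makes (a) expensive when the pending list is long.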
a) each task is a separate file or b) all tasks are listed and described in a single file that is updated atomically using standard ZK read-modify-write-repeat-on-failure style or c) all tasks are listed in a single file, but their descriptions are kept in separate files whose names are in the single file. Atomic updates occur to the single file, task files are cleaned up as well as possible. And task files that are not deleted in good order (should be exceedingly rare) can be recognized by lack of a reference from the single control file. The trade-offs here occurs with large numbers of running tasks, large numbers of pending tasks or very high task churn rates. Option (a) becomes very bad with many pending tasks because selecting a task may have server round trips proportional to number of pending tasks. Option (b) might exceed the maximum file size for moderate number of tasks. Option (c) seems safe except for the occasional need for garbage cleanup if programs fail between updating the control file and deleting the task files. Mostly people talk about (a), but (c) seems very competitive to me. All of these alternatives simply implement the "look for" verb in Patrick's excellent outline. What he suggests for task working convention is quite reasonable. On Tue, Jul 14, 2009 at 9:45 AM, Patrick Hunt <ph...@apache.org> wrote: > 1) your task processors look for the first available task > 2) if found they create a ephemeral node as a child of the task node > (if the processor dies the ephemeral node will be removed) > 3) the processor processes the task then deletes the task when "done" >