This should roughly work. The one case I have seen cause trouble with this kind of scheme is a process that runs anomalously long.
As such, I would include an expected time of completion as well as the process id in the task's ephemeral file. Then you can run a periodic cleanup process to look for tasks that have outlived their expected span of time. Any task that has run much longer than expected can be killed. That should cause the ephemeral file for that process to vanish, so that other processes can bid for the task.

Of course, you will also need a reliable way to signal completion of the task, and you may need some way to indicate what kinds of output were produced and where they are located. The deletion of the original task file is a natural way to signal completion, but you have to be careful about any other state changes that record the completion: finish those state changes before deleting the task file. That way, if the process is killed, dies, or is disconnected before it has completely recorded the result of the task, nobody will think that the task is done.

On Sat, Jan 23, 2010 at 12:58 AM, Zheng Shao <zsh...@gmail.com> wrote:

> Each node will start 10 processes.
> Each process will list the directory "/mytasks" with a watcher
> If trigger by the watcher, we relist the directory.
> If we found some missing files in the range of "0" to "99", we
> create an EPHEMERAL node with no-overwrite option
> if the creation is successful, then we disable the watcher and
> start processing the corresponding task (if something goes wrong, just
> kill itself and the node will be gone)
> if not, we go back to wait for watcher.
>
> Will this work?
>

--
Ted Dunning, CTO
DeepDyve
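
P.S. A rough sketch of the cleanup check described above, in case it helps. It leaves ZooKeeper out entirely: the dict of payloads stands in for whatever data you read from the ephemeral claim nodes, and the names make_claim and overdue_claims, the JSON layout, and the factor of 2 for "much longer than expected" are all my own assumptions for illustration, not anything ZooKeeper provides.

```python
import json
import time


def make_claim(pid, expected_seconds, now=None):
    """Build the payload a worker would store in its ephemeral claim node.

    Hypothetical layout: process id, start time, and expected run time.
    """
    now = time.time() if now is None else now
    claim = {"pid": pid, "started": now, "expected_seconds": expected_seconds}
    return json.dumps(claim).encode("utf-8")


def overdue_claims(claims, factor=2.0, now=None):
    """Return (task, pid) pairs for claims that have run far past expectation.

    `claims` maps task name -> payload bytes, standing in for the data read
    from each ephemeral node. A claim is overdue once its elapsed time
    exceeds `factor` times the expected run time.
    """
    now = time.time() if now is None else now
    overdue = []
    for task, payload in sorted(claims.items()):
        info = json.loads(payload.decode("utf-8"))
        elapsed = now - info["started"]
        if elapsed > factor * info["expected_seconds"]:
            overdue.append((task, info["pid"]))
    return overdue


if __name__ == "__main__":
    claims = {
        "7": make_claim(4242, expected_seconds=60, now=1000.0),
        "12": make_claim(4243, expected_seconds=600, now=1000.0),
    }
    # 200 seconds after both tasks started, only task "7" is overdue.
    print(overdue_claims(claims, now=1200.0))
```

The cleanup process would then kill each pid it gets back. Note that a bare process id only means something on the machine that runs the process, so in practice you would either run one cleanup process per node or record the host name in the claim as well.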