Have you checked out the distributed queue recipe? It is what I have used to implement a solution to a similar problem. http://hadoop.apache.org/zookeeper/docs/r3.3.2/recipes.html
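
In case it helps, here is a rough sketch of the consume side of that recipe, run against the step 1 to 5 race from the original message. It is not real client code: FakeZk, create_sequential, take and drain are names made up for illustration, and FakeZk is an in-memory stand-in that imitates only two ZooKeeper behaviours, sequential child names and one-shot child watches. The point it tries to show is that the worker should treat a notification as "something changed" and re-read the whole child list while setting the watch again, rather than expecting one notification per job.

```python
class FakeZk:
    """In-memory stand-in for one queue znode (hypothetical, illustration only).

    Simplification: only child creation fires the watch here. In real
    ZooKeeper a child watch also fires when a child is deleted.
    """

    def __init__(self):
        self._children = {}   # child name -> task data
        self._seq = 0         # counter used to build sequential names
        self._watch = None    # at most one pending one-shot watch

    def create_sequential(self, data):
        """Add a task under a monotonically increasing name."""
        name = "qn-%010d" % self._seq
        self._seq += 1
        self._children[name] = data
        # One-shot semantics: clear the watch before firing it.
        watch, self._watch = self._watch, None
        if watch is not None:
            watch()
        return name

    def get_children(self, watch=None):
        """Return the current children and (re)register the watch in one step."""
        if watch is not None:
            self._watch = watch
        return sorted(self._children)

    def take(self, name):
        """Remove a task and return its data, or None if it is already gone."""
        return self._children.pop(name, None)


def drain(zk, handle, on_change=None):
    """Process every task currently queued, lowest sequence number first."""
    for name in zk.get_children(watch=on_change):
        data = zk.take(name)
        if data is not None:   # None: another worker claimed it first
            handle(data)


zk = FakeZk()
done = []            # tasks the worker has processed, in order
notifications = []   # one entry per watch event delivered

def notified():
    notifications.append(True)

drain(zk, done.append, on_change=notified)   # Step 1: watch set, queue empty
zk.create_sequential("X")                    # Steps 2-3: job X, watch fires
zk.create_sequential("Y")                    # Step 4: job Y, no watch is set
drain(zk, done.append, on_change=notified)   # Step 5: re-read and re-watch

print(done)                 # both jobs were processed
print(len(notifications))   # only one notification was delivered
```

Only one notification arrives, for X, but Y is still picked up because step 5 reads the full child list at the moment it sets the new watch. With a real client the same shape applies as long as the list read and the watch registration are a single call, and taking a task has to tolerate another worker having deleted the node first.
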
Are the jobs worker-specific, or can all workers handle all jobs?

The distributed queue protocol is nice and simple. If you have a list of tasks and every worker can handle every task, the workers can just pick tasks off the queue as they become available, and you don't have to worry much about load balancing. Otherwise you can use the same recipe to keep one queue per worker. Either way, I think it will answer some of your questions about how to watch without missing tasks.

C

-----Original Message-----
From: Sabyasachi Ruj [mailto:[email protected]]
Sent: Monday, March 07, 2011 5:42 AM
To: [email protected]
Subject: Task/Job distribution using ZooKeeper

Hi,

I am planning to write an application which will have Worker processes distributed across multiple machines. One of them will be the Leader, which will assign tasks to the other processes.

Designing the Leader election process is quite simple: each process tries to create an ephemeral node at the same path. Whoever succeeds becomes the leader. I got this technique from Mahadev Konar's talk here: http://developer.yahoo.com/blogs/ydn/posts/2009/08/hadoop_summit_zookeeper/ .

But I could not find any discussion about task/job distribution using ZooKeeper.

I'll elaborate a little on the environment setup: Suppose there are 10 worker machines, each running one process, and one of them becomes the Leader. Tasks are submitted to a queue (maybe managed in MySQL); the Leader takes them and assigns each one to a worker. The worker processes get notified whenever a task is submitted by the leader.

I think these jobs can be coordinated as child znodes under each worker's node, like:

/server/worker1/job1
/server/worker1/job2
/server/worker1/job3
/server/worker2/job1
/server/worker2/job2

To get an alert whenever a job is submitted, each worker can watch its corresponding znode. But again I have a doubt here. Is there a chance in this case that some jobs might get lost/delayed?

Step 1: Worker is watching its znode for jobs.
Step 2: Server submits a job X.
Step 3: Worker gets notified.
Step 4: Before the worker sets the watch again, server submits another job Y.
Step 5: Now the worker sets the watch.

So, my questions are:

1. How to design the process of distributing the tasks evenly?
2. Was ZooKeeper designed for this use case?
3. In the example above, is there a chance that the worker may miss the notification for job Y?

--
Sabyasachi
