Hi, I am planning to write an application whose worker processes will be distributed across multiple machines. One of them will be the Leader, which will assign tasks to the other processes. Designing the leader election process is quite simple: each process tries to create an ephemeral node at the same path, and whoever succeeds becomes the leader. I got this technique from Mahadev Konar's talk here: http://developer.yahoo.com/blogs/ydn/posts/2009/08/hadoop_summit_zookeeper/ . However, I could not find any discussion of task/job distribution using ZooKeeper.
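To make the election idea concrete, here is a rough sketch of how I picture it. It uses a toy in-memory stand-in, not the real ZooKeeper client, and the names FakeZooKeeper, create_ephemeral, close_session and the path /server/leader are all made up for illustration.

```python
class NodeExistsError(Exception):
    """Raised when a node already exists at the requested path."""


class FakeZooKeeper:
    """Toy in-memory stand-in for ZooKeeper (hypothetical, not the real client)."""

    def __init__(self):
        self.ephemeral_owner = {}  # path -> id of the session that created it

    def create_ephemeral(self, path, session_id):
        # Only the first create on a given path succeeds.
        if path in self.ephemeral_owner:
            raise NodeExistsError(path)
        self.ephemeral_owner[path] = session_id

    def close_session(self, session_id):
        # Ephemeral nodes go away when the session that created them ends.
        self.ephemeral_owner = {
            path: owner
            for path, owner in self.ephemeral_owner.items()
            if owner != session_id
        }


LEADER_PATH = "/server/leader"  # hypothetical path, chosen for this sketch


def try_to_lead(zk, session_id):
    """Return True if this process created the node and is now the leader."""
    try:
        zk.create_ephemeral(LEADER_PATH, session_id)
        return True
    except NodeExistsError:
        return False


def elect(zk, session_ids):
    """Let every process attempt the create; return those that succeeded."""
    return [s for s in session_ids if try_to_lead(zk, s)]


if __name__ == "__main__":
    zk = FakeZooKeeper()
    print("leader:", elect(zk, ["worker1", "worker2", "worker3"]))
```

If the leader's session ends, its node disappears and one of the remaining processes can succeed on a retry. How the other processes find out that they should retry is not covered by this sketch.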
I'll elaborate a little on the environment setup. Suppose there are 10 worker machines, each running one process, and one of those processes becomes the Leader. Tasks are submitted to a queue (possibly managed in MySQL); the Leader takes them and assigns each one to a worker. The worker processes get notified whenever a task is submitted by the Leader.

I think these jobs can be coordinated as child znodes under each worker's node, like:

    /server/worker1/job1
    /server/worker1/job2
    /server/worker1/job3
    /server/worker2/job1
    /server/worker2/job2

To get an alert whenever a job is submitted, each worker can set a watch on its corresponding znode. But I have a doubt here: is there a chance that some jobs might get lost or delayed in this case?

Step 1: Worker is watching its znode for jobs.
Step 2: Server submits a job X.
Step 3: Worker gets notified.
Step 4: Before the worker sets the watch again, server submits another job Y.
Step 5: Now the worker sets the watch.

So, my questions are:

1. How should the process be designed so that tasks are distributed evenly?
2. Was ZooKeeper designed for this use case?
3. In the example above, is there a chance that the worker may miss the notification for job Y?

-- Sabyasachi
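P.S. Here is a rough sketch of steps 1 to 5 using a toy in-memory model, not the real ZooKeeper client. All the names in it (FakeZooKeeper, Worker, poll, run_scenario) are made up for illustration. I am assuming two things: a watch fires only once, and the call that sets the watch also returns the current list of children, which is how I understand getChildren(path, watch) in the Java API to behave. Under those assumptions the worker gets no separate notification for job Y, but it still sees Y in the list when it re-reads in step 5.

```python
class FakeZooKeeper:
    """Toy in-memory model of one-shot child watches (hypothetical, not the real client)."""

    def __init__(self):
        self.children = {}  # parent path -> list of child names
        self.watches = {}   # parent path -> callbacks waiting for the next change

    def create(self, parent, name):
        self.children.setdefault(parent, []).append(name)
        # A watch fires once and is then removed (assumption stated above).
        for callback in self.watches.pop(parent, []):
            callback(parent)

    def get_children(self, parent, watch=None):
        # Registering the watch and reading the list happen in one call.
        if watch is not None:
            self.watches.setdefault(parent, []).append(watch)
        return list(self.children.get(parent, []))


class Worker:
    def __init__(self, zk, path):
        self.zk = zk
        self.path = path
        self.seen = []          # jobs picked up so far, in order
        self.notifications = 0  # how many times the watch fired

    def on_change(self, path):
        # Step 3: the worker only learns that something changed.
        self.notifications += 1

    def poll(self):
        # Steps 1 and 5: read the children and set the watch again.
        for job in self.zk.get_children(self.path, watch=self.on_change):
            if job not in self.seen:
                self.seen.append(job)


def run_scenario():
    zk = FakeZooKeeper()
    worker = Worker(zk, "/server/worker1")
    worker.poll()                         # Step 1: watch is set
    zk.create("/server/worker1", "jobX")  # Steps 2 and 3: watch fires once
    zk.create("/server/worker1", "jobY")  # Step 4: no watch is set right now
    worker.poll()                         # Step 5: re-read and set the watch
    return worker.notifications, worker.seen


if __name__ == "__main__":
    print(run_scenario())
```

In this toy model the worker has to compare the returned list against the jobs it has already seen, rather than count on one notification per job. Whether the real service behaves the same way is exactly what I am asking in question 3.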
