Re: Using zookeeper to assign a bunch of long-running tasks to nodes (without unhandled tasks and double-handled tasks)
I agree that masterless is ideal, but it somewhat goes against KISS. About error handling: does ZK-22 mean that disconnection will be eliminated from the API and handled solely by the ZK implementation? I am not sure that is such a good idea, though. The application layer needs to be notified that communication with ZK has been broken - things may be out of sync - and enter a safe mode accordingly. I would imagine that in some cases the current fail-fast behavior is desirable. Another layer on top of the ZK API (with overridable behavior, like the ProtocolSupport stuff) seems to strike a balance here... my 2c
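The safe-mode idea above can be sketched as a small policy table mapping connection-state events to application-layer actions. This is a minimal illustration, not anything the ZK API prescribes: the event names mirror ZooKeeper's KeeperState values, but the chosen actions are one plausible policy among several.

```python
from enum import Enum

class ZkEvent(Enum):
    """Connection-state events a ZooKeeper watcher can receive.

    Names mirror ZooKeeper's KeeperState values."""
    DISCONNECTED = "Disconnected"      # transient: the client library keeps reconnecting
    SYNC_CONNECTED = "SyncConnected"   # (re)connected, session still intact
    EXPIRED = "Expired"                # session lost: all ephemeral nodes are gone

def next_action(event: ZkEvent) -> str:
    """One plausible application-layer policy for connection events."""
    if event is ZkEvent.DISCONNECTED:
        # Stop acting on possibly stale state until the session recovers.
        return "enter-safe-mode"
    if event is ZkEvent.SYNC_CONNECTED:
        return "resume"
    if event is ZkEvent.EXPIRED:
        # Ephemeral nodes are gone; a new session must recreate them.
        return "recreate-session-and-ephemerals"
    raise ValueError(event)
```

The key asymmetry is that Disconnected is recoverable (the session, and thus the ephemeral nodes, may survive), while Expired is not.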
Re: Using zookeeper to assign a bunch of long-running tasks to nodes (without unhandled tasks and double-handled tasks)
Thanks for the detailed explanation, Mahadev and Ted. The suggestions are very valuable to us.

One additional question about how ZooKeeper handles errors. Let's say we have 3 ZooKeeper servers Z1, Z2, Z3, and 3 clients C1, C2, C3. C1 is connected to Z1, C2 to Z2, and C3 to Z3. C3 has created an ephemeral node. What will happen if C3 and Z3 are partitioned from the rest of the world? I guess C3 should see some errors, but where will I get them, since C3 is not calling any ZooKeeper functions after the ephemeral node is created?

I am reading http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-user/200807.mbox/%3c386225.96676...@web31804.mail.mud.yahoo.com%3e which says there are 2 types of error that C3 needs to handle: 1. disconnections; 2. session expirations. Is that still valid (since it's over 1.5 years old)? I am also reading http://wiki.apache.org/hadoop/ZooKeeper/FAQ and I guess these are the only 2 types of error that C3 needs to handle. Correct?

Zheng

On Sun, Jan 24, 2010 at 7:08 PM, Mahadev Konar maha...@yahoo-inc.com wrote:

Hi Zheng,

Let's say I have 100 long-running tasks and 20 nodes. I want each node to take up to 10 tasks. Each task should be taken by one and only one node.

This is exactly what one of our users is using ZooKeeper for. You might want to make it more general by saying that a directory /tasks/ will have the list of tasks that need to be processed (in your case 0-99) - basically storing the list of tasks in ZooKeeper as well. The clients can then read this list and try creating ephemeral nodes for tasks in /mytasks/, assigning themselves as the owners of those tasks.

You should also factor in the task dying, or the machine not being able to start that task. In that case the machine should just remove the ephemeral node that it created and let the other machines take up that task. Here is one minor thing that might be useful.
One of the ZooKeeper users who was doing exactly the same thing stored the number of failed attempts to boot a task as data in the /tasks/ znode for that task. This way all the machines can update this count and alert the admin if a task cannot be started or worked on by a given number of machines.

Hope this helps.

Thanks
Mahadev

On 1/23/10 12:58 AM, Zheng Shao zsh...@gmail.com wrote:

Let's say I have 100 long-running tasks and 20 nodes. I want each node to take up to 10 tasks. Each task should be taken by one and only one node. Will the following solution solve the problem?

Create a directory /mytasks in ZooKeeper. Normally there will be 100 EPHEMERAL children in the /mytasks directory, named from 0 to 99. The data of each will be the name of the node and the process id on the node. This data is optional, but it lets us look up the node and process id from the task.

Each node will start 10 processes. Each process will list the directory /mytasks with a watcher. If triggered by the watcher, we relist the directory. If we find some missing entries in the range of 0 to 99, we create an EPHEMERAL node with the no-overwrite option. If the creation is successful, we disable the watcher and start processing the corresponding task (if something goes wrong, the process just kills itself and the node will be gone); if not, we go back to waiting for the watcher.

Will this work?

--
Yours,
Zheng
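The failure-count idea Mahadev describes can be sketched as a small helper. This is a hypothetical illustration: the payload format (a bare decimal count) and the alert threshold are assumptions, not anything from the thread.

```python
MAX_BOOT_FAILURES = 3  # hypothetical threshold before alerting the admin

def record_boot_failure(znode_data: bytes) -> tuple[bytes, bool]:
    """Increment the boot-failure count kept in a /tasks/<id> znode's data.

    `znode_data` stands in for the bytes a getData() call would return;
    here the payload is simply the decimal failure count.  Returns the
    new payload to write back and whether to alert the admin.
    """
    failures = int(znode_data or b"0") + 1
    return str(failures).encode(), failures >= MAX_BOOT_FAILURES
```

In a real deployment the read-increment-write would pass the znode version obtained from the read into the conditional write, so that concurrent updates from different machines are not silently lost.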
Using zookeeper to assign a bunch of long-running tasks to nodes (without unhandled tasks and double-handled tasks)
Let's say I have 100 long-running tasks and 20 nodes. I want each node to take up to 10 tasks. Each task should be taken by one and only one node. Will the following solution solve the problem?

Create a directory /mytasks in ZooKeeper. Normally there will be 100 EPHEMERAL children in the /mytasks directory, named from 0 to 99. The data of each will be the name of the node and the process id on the node. This data is optional, but it lets us look up the node and process id from the task.

Each node will start 10 processes. Each process will list the directory /mytasks with a watcher. If triggered by the watcher, we relist the directory. If we find some missing entries in the range of 0 to 99, we create an EPHEMERAL node with the no-overwrite option. If the creation is successful, we disable the watcher and start processing the corresponding task (if something goes wrong, the process just kills itself and the node will be gone); if not, we go back to waiting for the watcher.

Will this work?

--
Yours,
Zheng
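The "find some missing entries in the range of 0 to 99" step above is a pure computation over the child list a getChildren() call would return. A minimal sketch, assuming the child names are the decimal task ids:

```python
def missing_tasks(children: list[str], total: int = 100) -> list[int]:
    """Return the task ids in [0, total) that have no ephemeral claim yet.

    `children` stands in for the list of child names returned by listing
    /mytasks; the names are assumed to be decimal task ids.
    """
    claimed = {int(name) for name in children}
    return sorted(set(range(total)) - claimed)
```

Each process would then iterate over the result, attempting the create-if-absent on each id until one succeeds. Shuffling the result first is a common trick to reduce collisions when every waiting process wakes on the same watch event.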
Re: Using zookeeper to assign a bunch of long-running tasks to nodes (without unhandled tasks and double-handled tasks)
This should roughly work. The one thing I have seen that would not work well with this is processes that run anomalously long. For that reason, I would include an expected time of completion as well as the process id in the task's ephemeral file. Then you can run a periodic cleanup process to look for tasks that have outlived their expected span of time. Any task that has run much longer than expected can be killed. That should cause the ephemeral file for that process to vanish, and other processes can bid for the task.

Of course you will also need a reliable way to signal completion of the task, and you may need some way to indicate what kinds of output were produced and where they are located. The deletion of the original task file is a natural way to signal completion, but you have to be careful to record the completion in any other state, and to finish those state changes before deleting the task file. That way, if the process is killed, dies, or is disconnected before completely recording the result of the task, nobody will think that the task is done.

On Sat, Jan 23, 2010 at 12:58 AM, Zheng Shao zsh...@gmail.com wrote:

Each node will start 10 processes. Each process will list the directory /mytasks with a watcher. If triggered by the watcher, we relist the directory. If we find some missing entries in the range of 0 to 99, we create an EPHEMERAL node with the no-overwrite option. If the creation is successful, we disable the watcher and start processing the corresponding task (if something goes wrong, the process just kills itself and the node will be gone); if not, we go back to waiting for the watcher. Will this work?

--
Ted Dunning, CTO
DeepDyve
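The periodic cleanup Ted describes reduces to a scan over the per-task metadata. A minimal sketch, where the (start time, expected duration) pair stands in for the data he suggests storing in each ephemeral task file, and the grace factor is a hypothetical choice so a slightly late task is not killed immediately:

```python
def overdue_tasks(tasks: dict, now: float, grace: float = 1.5) -> list:
    """Return ids of tasks that have run well past their expected span.

    `tasks` maps task id -> (start_ts, expected_duration_s), standing in
    for the pid/expected-completion data kept in each ephemeral task file.
    A task is overdue once it has run longer than `grace` times its
    expected duration.
    """
    return sorted(
        task_id
        for task_id, (start_ts, expected_s) in tasks.items()
        if now - start_ts > grace * expected_s
    )
```

The cleanup process would kill the owning process for each returned id; the resulting session loss deletes the ephemeral file, which is what lets other processes bid for the task again.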