Re: Using zookeeper to assign a bunch of long-running tasks to nodes (without unhandled tasks and double-handled tasks)

2010-01-25 Thread Qing Yan
I agree, masterless is ideal, but it somewhat goes against KISS


About error handling: does ZK-22 mean that disconnection will be removed
from the API and handled solely by the ZK implementation?

I am not sure it is such a good idea, though. The application layer needs to be
notified that communication with ZK has been broken (things may be out of sync)
and should enter safe mode accordingly. I would imagine that in some cases the
current fail-fast behavior is desirable.

Another layer on top of the ZK API (with overridable behavior, like the
ProtocolSupport stuff) seems to strike a balance here...

my 2c


Re: Using zookeeper to assign a bunch of long-running tasks to nodes (without unhandled tasks and double-handled tasks)

2010-01-24 Thread Zheng Shao
Thanks for the detailed explanation, Mahadev and Ted. The suggestions
are very valuable to us.


One additional question for how zookeeper handles errors:

Let's say we have 3 zookeeper servers Z1, Z2, Z3, and 3 clients C1, C2, C3.
C1 is connected to Z1.
C2 is connected to Z2.
C3 is connected to Z3.
C3 has created an ephemeral node.
What will happen if C3 and Z3 are partitioned from the rest of the
world? I guess C3 should see some errors, but where will I get them
(since C3 is not calling any zookeeper functions after the ephemeral
node is created)?

I am reading 
http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-user/200807.mbox/%3c386225.96676...@web31804.mail.mud.yahoo.com%3e
There are 2 types of errors that C3 needs to handle: 1. disconnections;
2. session expirations.
Is that still valid (since it's over 1.5 years old)?

I am also reading http://wiki.apache.org/hadoop/ZooKeeper/FAQ and I
guess these are the only 2 types of errors that C3 needs to handle.
Correct?
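(For what it's worth, the two cases do call for different reactions, and in the real client they arrive as session events on the watcher even when no operation is in flight. The following is a toy in-memory state machine, not the actual ZooKeeper client API, sketching how a client like C3 might treat each one; the class and names are illustrative only.)

```python
# Toy model of the two session-error cases a ZooKeeper client sees.
# This does NOT use the real client API; it only mimics the semantics:
# Disconnected = link lost, session (and ephemerals) may still survive;
# Expired = server timed the session out, ephemerals are deleted.

DISCONNECTED = "Disconnected"
EXPIRED = "Expired"

class ToyClient:
    def __init__(self):
        self.safe_mode = False
        self.ephemerals = {"/mytasks/3"}  # nodes this session owns

    def on_session_event(self, state):
        if state == DISCONNECTED:
            # Session may still be alive server-side: pause work and
            # wait for either a reconnect or an expiration.
            self.safe_mode = True
        elif state == EXPIRED:
            # The ensemble deleted our ephemeral nodes; a new session
            # must be created and every ephemeral re-registered.
            self.safe_mode = True
            self.ephemerals.clear()

c = ToyClient()
c.on_session_event(DISCONNECTED)
assert c.safe_mode and c.ephemerals   # ephemerals may still exist
c.on_session_event(EXPIRED)
assert not c.ephemerals               # ephemerals are gone for good
```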


Zheng

On Sun, Jan 24, 2010 at 7:08 PM, Mahadev Konar maha...@yahoo-inc.com wrote:
 Hi  Zheng,
 Let's say I have 100 long-running tasks and 20 nodes.
 I want each of them to take up to 10 tasks. Each task should be
 taken by one and only one node.
 This is exactly what one of our users is using ZooKeeper for. You might want to
 make it more general by saying that a directory /tasks/ will have the list of
 tasks that need to be processed (in your case 0-99), i.e. storing the list of
 tasks in zookeeper as well. The clients can then read this list and try
 creating ephemeral nodes for tasks in mytasks/ to assign themselves as the
 owners of those tasks.
 You also should factor in the task dying or the machine not being able to
 start that task. In that case the machine should just remove the ephemeral
 node that it created and let the other machines take up that task.

 Here is one minor thing that might be useful. One of the zookeeper users who
 was doing exactly the same thing stored the number of failed attempts to boot
 a task as data in that task's /tasks/ znode. This way all the machines can
 update this count and alert the admin if a task cannot be started or worked
 on by a given number of machines.
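 (As a rough sketch of that failure-count idea, here is a toy in-memory model
 with no real ZooKeeper calls; the threshold and names are made up. In the
 real system the count would live as znode data and the increment would be a
 version-guarded read-modify-write to stay race-free.)

```python
# Toy model: /tasks/<id> holds a boot-failure count as its "data", and
# workers raise an alert once too many machines have failed to start it.

ALERT_AFTER = 3  # hypothetical threshold, not from the thread

tasks = {f"/tasks/{i}": 0 for i in range(100)}  # znode path -> failure count
alerts = []

def report_boot_failure(path):
    # In ZooKeeper this would be setData() guarded by the znode version.
    tasks[path] += 1
    if tasks[path] == ALERT_AFTER:
        alerts.append(path)  # notify the admin once

for _ in range(3):
    report_boot_failure("/tasks/7")

assert tasks["/tasks/7"] == 3
assert alerts == ["/tasks/7"]
```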

 Hope this helps.

 Thanks
 Mahadev


 On 1/23/10 12:58 AM, Zheng Shao zsh...@gmail.com wrote:

 Let's say I have 100 long-running tasks and 20 nodes.
 I want each of them to take up to 10 tasks. Each task should be
 taken by one and only one node.

 Will the following solution solve the problem?

 Create a directory /mytasks in zookeeper.
 Normally there will be 100 EPHEMERAL children in the /mytasks directory,
 named from 0 to 99.
 The data of each will be the name of the node and the process id on
 the node. This data is optional but allows us to look up the node and
 process id from the task.


 Each node will start 10 processes.
   Each process will list the directory /mytasks with a watcher.
   If triggered by the watcher, we relist the directory.
   If we find some missing files in the range of 0 to 99, we
 create an EPHEMERAL node with the no-overwrite option:
     if the creation is successful, we disable the watcher and
 start processing the corresponding task (if something goes wrong, the
 process just kills itself and the node will be gone);
     if not, we go back to waiting for the watcher.

 Will this work?







-- 
Yours,
Zheng


Using zookeeper to assign a bunch of long-running tasks to nodes (without unhandled tasks and double-handled tasks)

2010-01-23 Thread Zheng Shao
Let's say I have 100 long-running tasks and 20 nodes.
I want each of them to take up to 10 tasks. Each task should be
taken by one and only one node.

Will the following solution solve the problem?

Create a directory /mytasks in zookeeper.
Normally there will be 100 EPHEMERAL children in the /mytasks directory,
named from 0 to 99.
The data of each will be the name of the node and the process id on
the node. This data is optional but allows us to look up the node and
process id from the task.


Each node will start 10 processes.
  Each process will list the directory /mytasks with a watcher.
  If triggered by the watcher, we relist the directory.
  If we find some missing files in the range of 0 to 99, we
create an EPHEMERAL node with the no-overwrite option:
    if the creation is successful, we disable the watcher and
start processing the corresponding task (if something goes wrong, the
process just kills itself and the node will be gone);
    if not, we go back to waiting for the watcher.

Will this work?
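(One way to see why the scheme avoids double-handling is that ZooKeeper's create() is atomic: exactly one contender wins each znode and the rest get a NodeExists error. The sketch below simulates that semantics with an in-memory stand-in; it is not the real client API, and the class and function names are invented for illustration.)

```python
# In-memory stand-in for the claim algorithm above. create_ephemeral()
# mimics ZooKeeper's no-overwrite create: one winner per path, losers
# get False (NodeExistsException in the real client).

import threading

class FakeZK:
    def __init__(self):
        self._lock = threading.Lock()
        self._nodes = {}  # path -> owner data

    def create_ephemeral(self, path, data):
        with self._lock:
            if path in self._nodes:
                return False  # someone else already owns this task
            self._nodes[path] = data
            return True

def claim_tasks(zk, worker, limit=10, total=100):
    # List the missing task ids and try to claim up to `limit` of them.
    claimed = []
    for i in range(total):
        if len(claimed) == limit:
            break
        if zk.create_ephemeral(f"/mytasks/{i}", worker):
            claimed.append(i)
    return claimed

zk = FakeZK()
a = claim_tasks(zk, "node-A")
b = claim_tasks(zk, "node-B")
assert len(a) == 10 and len(b) == 10
assert not set(a) & set(b)  # no task is double-claimed
```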



-- 
Yours,
Zheng


Re: Using zookeeper to assign a bunch of long-running tasks to nodes (without unhandled tasks and double-handled tasks)

2010-01-23 Thread Ted Dunning
This should roughly work.  The one thing that I have seen that would not
work well with this is processes that run anomalously long.

As such, I would include an expected time of completion as well as the process
id in the task's ephemeral file.  Then you can run a periodic cleanup process
to look for tasks that have outlived their expected span of time.  Any tasks
that have run much longer than expected can be killed.  That should cause
the ephemeral file for that process to vanish, and other processes can bid for
the task.
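(A minimal sketch of that sweeper, again as an in-memory simulation rather than real ZooKeeper reads; the field names and sample deadlines are assumptions.)

```python
# Toy periodic-cleanup pass: each claim records an expected completion
# time; anything well past it gets killed, which in the real system
# would make its ephemeral node vanish and free the task for rebidding.

import time

now = time.time()
claims = {
    "/mytasks/1": {"pid": 4242, "expect_done": now - 100},   # overdue
    "/mytasks/2": {"pid": 4243, "expect_done": now + 3600},  # on time
}

def sweep(claims, now):
    killed = []
    for path, meta in list(claims.items()):
        if now > meta["expect_done"]:
            # kill the runaway process; deleting the claim here stands
            # in for its ephemeral node disappearing
            killed.append(path)
            del claims[path]
    return killed

killed = sweep(claims, now)
assert killed == ["/mytasks/1"]
assert "/mytasks/2" in claims
```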

Of course, you will also need a reliable way to signal completion of the task,
and you may need some way to indicate what kinds of output were produced and
where they are located.  The deletion of the original task file is a
natural way to signal completion, but you have to be careful to finish any
other state changes that record the completion before deleting the task file.
That way, if the process is killed, dies, or is disconnected before completely
recording the result of the task, nobody will think that the task is done.
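(The ordering constraint reduces to two steps that must happen in this sequence; the toy code below shows it with invented stand-ins for the result store and the task files. If the worker dies between the steps, the task is redone rather than silently lost.)

```python
# Completion protocol sketch: record all results durably FIRST, delete
# the task file LAST. A crash between the steps leaves the task file in
# place, so another worker will simply redo the task.

results = {}              # stand-in for durable output storage
task_files = {"task-42"}  # stand-in for the persistent task znodes

def complete(task, output):
    results[task] = output    # step 1: record every state change
    task_files.discard(task)  # step 2: only then signal completion

complete("task-42", "ok")
assert "task-42" in results
assert "task-42" not in task_files
```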

On Sat, Jan 23, 2010 at 12:58 AM, Zheng Shao zsh...@gmail.com wrote:



 Each node will start 10 processes.
  Each process will list the directory /mytasks with a watcher.
  If triggered by the watcher, we relist the directory.
  If we find some missing files in the range of 0 to 99, we
 create an EPHEMERAL node with the no-overwrite option:
    if the creation is successful, we disable the watcher and
 start processing the corresponding task (if something goes wrong, the
 process just kills itself and the node will be gone);
    if not, we go back to waiting for the watcher.

 Will this work?




-- 
Ted Dunning, CTO
DeepDyve