I agree, masterless is ideal, but it somewhat goes against KISS.
About error handling, does ZK-22 mean that disconnection will be eliminated
from the API and will be handled solely
by the ZK implementation?
I am not sure it is such a good idea, though. The application layer needs to be
notified that communication
Thanks for the detailed explanation, Mahadev and Ted. The suggestions
are very valuable to us.
One additional question about how ZooKeeper handles errors:
Let's say we have 3 ZooKeeper servers Z1, Z2, Z3, and 3 clients C1, C2, C3.
C1 is connected to Z1.
C2 is connected to Z2.
C3 is connected to
Let's say I have 100 long-running tasks and 20 nodes.
I want each node to take up to 10 tasks. Each task should be
taken by one and only one node.
Will the following solution solve the problem?
Create a directory /mytasks in ZooKeeper.
Normally there will be 100 EPHEMERAL children in
This should roughly work. The one thing that I have seen that would not
work well with this would be processes that run anomalously long.
As such, I would include an expected time of completion as well as the process
id in the task's ephemeral node. Then you can run a periodic cleanup process to
look
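
For illustration, here is a rough sketch of the scheme discussed above: each task is claimed by creating a child under /mytasks, and the node's data carries the process id and an expected completion time so that a periodic check can spot tasks that are running anomalously long. This is not real ZooKeeper client code. `FakeZk`, `claim_tasks`, `overdue_tasks` and the data layout are all hypothetical names made up for the example, and the in-memory dictionary only imitates the one property the scheme relies on, namely that creating a node that already exists fails.

```python
class NodeExistsError(Exception):
    """Raised when a path is already taken (imitates ZooKeeper's NodeExists)."""


class FakeZk:
    """Hypothetical in-memory stand-in for the small part of ZooKeeper used here."""

    def __init__(self):
        self.nodes = {}  # path -> data dict

    def create_ephemeral(self, path, data):
        # A second create on the same path fails, so only one claimant wins.
        if path in self.nodes:
            raise NodeExistsError(path)
        self.nodes[path] = data

    def children(self, parent):
        prefix = parent.rstrip("/") + "/"
        return sorted(p for p in self.nodes if p.startswith(prefix))


def claim_tasks(zk, worker_id, pid, task_ids, limit, expected_secs, now):
    """Try to claim up to `limit` tasks; return the ids actually claimed."""
    claimed = []
    for task_id in task_ids:
        if len(claimed) >= limit:
            break
        data = {
            "worker": worker_id,
            "pid": pid,
            "expected_done": now + expected_secs,
        }
        try:
            zk.create_ephemeral(f"/mytasks/{task_id}", data)
        except NodeExistsError:
            continue  # another worker already holds this task
        claimed.append(task_id)
    return claimed


def overdue_tasks(zk, now):
    """Return paths of task nodes whose expected completion time has passed."""
    return [
        path
        for path in zk.children("/mytasks")
        if zk.nodes[path]["expected_done"] < now
    ]
```

With a real client, the create call would be an ephemeral-node create that fails when the node already exists, and the ephemeral nodes would disappear on their own when the owning session ends. The cleanup pass is then only needed for workers that are still connected but stuck.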