Hi all,

In my system I have scheduled tasks that only one cluster member should run. I am using the leader election recipe to determine which cluster member should run the scheduled tasks.
The way it works is that every cluster member has the scheduler running. When a scheduled job fires, all cluster members execute the same method. It first checks whether the current node is the leader. If it is, it goes ahead and runs the task; otherwise the method returns.

The tasks themselves can take anywhere from a few milliseconds to tens of minutes. While a task is running, the cluster member running it could lose its leadership. I don't want another cluster member to start running a scheduled leader-only task until the first one has finished.

At first I considered using an ephemeral node as a "task in progress" flag and changing the logic for starting a scheduled task to "if I am the leader AND no task is currently in progress". However, if the znode is ephemeral, it could be lost the same way the leadership was lost. On the other hand, if I use a non-ephemeral node, I need to add logic to detect stale or invalid "task in progress" nodes (check for staleness, plus try to contact the node that is running the task to see whether it responds).

Am I correct in assuming that I cannot use an ephemeral node for the "task in progress" flag, and that a non-ephemeral node with stale checking is the way to go? This seems like a pretty common use case.

Thanks,
-- Eric
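
To make the non-ephemeral option concrete, here is a rough sketch of the start check described above. It is only the decision logic, not real ZooKeeper code. The names `TaskMarker`, `marker_is_stale`, `should_start_task`, `max_age` and `owner_responds` are all made up for illustration. The marker stands in for the data that would be stored in the persistent "task in progress" znode. The sketch also assumes that a marker counts as stale only when it is older than some limit AND its owner does not answer a liveness probe, which is one possible reading of the stale check.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TaskMarker:
    """Hypothetical contents of the persistent "task in progress" node."""
    owner: str         # ID of the cluster member that started the task
    started_at: float  # start time, in seconds since the epoch


def marker_is_stale(marker: TaskMarker, now: float, max_age: float,
                    owner_responds: Callable[[str], bool]) -> bool:
    """Assumed rule: stale means too old AND the owner does not respond."""
    if now - marker.started_at <= max_age:
        # Young enough: assume the task is still legitimately running.
        return False
    # Old marker: only treat it as stale if the owner cannot be reached.
    return not owner_responds(marker.owner)


def should_start_task(is_leader: bool, marker: Optional[TaskMarker],
                      now: float, max_age: float,
                      owner_responds: Callable[[str], bool]) -> bool:
    """'I am the leader AND no (valid) task is currently in progress.'"""
    if not is_leader:
        return False
    if marker is None:
        # No "task in progress" node exists at all.
        return True
    # A marker exists: start only if it turns out to be stale.
    return marker_is_stale(marker, now, max_age, owner_responds)
```

In a real deployment the check and the creation of the marker would also have to be atomic, so that two members cannot both pass the check. Since creating a znode fails if it already exists, the create call itself could probably serve as that guard, but I have not worked that part out.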
