Thanks for the feedback. I'm probably going to modify quartz to work with Zookeeper to start and launch jobs. Architecturally, I don't think persisting Jobs or trigger history in ZK is a very good idea, it's turning it into a persistent data store, which is not designed for. I was thinking I could change the core APIs in the following way.
Implement leader/follower election as a standalone module. Is this already done somewhere? I know there's a recipe but if the code is done that's less for me to do. Implement an abstract JobStore implementation (ZooKeeperJobStore) with the following properties Default Case 1. All calls that deal with returning triggers will use the follower/leader semantics. All nodes (including the leader) will be followers. They will only be returned jobs they should run for the call aquireNextTrigger 2. All calls to writing triggers will write triggers to the datastore and to a trigger queue in ZK 3. The leader will pick up triggers from the queue, and distribute them to the next available node via the ZK trigger queues per node. Each operation will attempt to be wisely partitioned. In the first implementation, it will simply schedule the job on a node that has the least executions near the time specified for the trigger. In the next release, I could use average job duration semantics to try to avoid scheduling overlapping jobs, especially in long running jobs. Failover 1. The leader will scan all current followers when a follower leaves, or after a new leader is designated. 2. For any node with jobs that is not currently a follower, it's triggers will be re-written to the trigger queue from above 3. The redistribution semantics will fire from above Does this sound reasonable? After performing more research I think job semantics such as partitioning and parallel processing are outside the scope of how the scheduler should work. Those semantics are more internal to the job itself, and I think they should remain outside of the scope of this project. todd SENIOR SOFTWARE ENGINEER todd nine| spidertracks ltd | On Tue, 2010-08-24 at 04:20 +0000, Ted Dunning wrote: > These are pretty easy to solve with ZK. Ephemerality, exclusive create, > atomic update and file versions allow you to implement most of the semantics > you need. > > I don't know of any recipes available for this, but they would be worthy > additions to ZK. > > On Mon, Aug 23, 2010 at 11:33 PM, Todd Nine <t...@spidertracks.co.nz> wrote: > > > Solving UC1 and UC2 via zookeeper or some other framework if one is > > recommended. We don't run Hadoop, just ZK and Cassandra as we don't have a > > need for map/reduce. I'm searching for any existing framework that can > > perform standard time based scheduling in a distributed environment. As I > > said earlier, Quartz is the closest model to what we're looking for, but it > > can't be used in a distributed parallel environment. Any suggestions for a > > system that could accomplish this would be helpful. > > > > Thanks, > > Todd > > > > On 24 August 2010 11:27, Mahadev Konar <maha...@yahoo-inc.com> wrote: > > > > > Hi Todd, > > > Just to be clear, are you looking at solving UC1 and UC2 via zookeeper? > > Or > > > is this a broader question for scheduling on cassandra nodes? For the > > latter > > > this probably isnt the right mailing list. > > > > > > Thanks > > > mahadev > > > > > > > > > On 8/23/10 4:02 PM, "Todd Nine" <t...@spidertracks.co.nz> wrote: > > > > > > Hi all, > > > We're using Zookeeper for Leader Election and system monitoring. We're > > > also using it for synchronizing our cluster wide jobs with barriers. > > > We're > > > running into an issue where we now have a single job, but each node can > > > fire > > > the job independently of others with different criteria in the job. In > > the > > > event of a system failure, another node in our application cluster will > > > need > > > to fire this Job. I've used quartz previously (we're running Java 6), > > but > > > it simply isn't designed for the use case we have. I found this article > > on > > > cloudera. > > > > > > http://www.cloudera.com/blog/2008/11/job-scheduling-in-hadoop/ > > > > > > > > > I've looked at both plugins, but they require hadoop. We're not > > currently > > > running hadoop, we only have Cassandra. Here are the 2 basic use cases > > we > > > need to support. > > > > > > UC1: Synchronized Jobs > > > 1. A job is fired across all nodes > > > 2. The nodes wait until the barrier is entered by all participants > > > 3. The nodes process the data and leave > > > 4. On all nodes leaving the barrier, the Leader node marks the job as > > > complete. > > > > > > > > > UC2: Multiple Jobs per Node > > > 1. A Job is scheduled for a future time on a specific node (usually the > > > same > > > node that's creating the trigger) > > > 2. A Trigger can be overwritten and cancelled without the job firing > > > 3. In the event of a node failure, the Leader will take all pending jobs > > > from the failed node, and partition them across the remaining nodes. > > > > > > > > > Any input would be greatly appreciated. > > > > > > Thanks, > > > Todd > > > > > > > >