Re: Non Hadoop scheduling frameworks
Todd Nine: [...]

UC1: Synchronized Jobs
1. A job is fired across all nodes
2. The nodes wait until the barrier is entered by all participants
3. The nodes process the data and leave
4. On all nodes leaving the barrier, the Leader node marks the job as complete.

UC2: Multiple Jobs per Node
1. A Job is scheduled for a future time on a specific node (usually the same node that's creating the trigger)
2. A Trigger can be overwritten and cancelled without the job firing
3. In the event of a node failure, the Leader will take all pending jobs from the failed node, and partition them across the remaining nodes.

Hi Todd,

we've implemented UC2 for an internal project with ZK. I'd love to make the code free, but I have to ask our product owner. It's a small company, so this could go quickly, but I don't know how to convince them. They're so afraid of giving away stuff.

The basic idea is that we have two folders in ZK: a work queue and a lock folder. The items (znodes) in the work queue are timestamp-prefixed. Every node consuming the queue tries to create an ephemeral znode in the lock folder before starting on a work item. Work items are actually URLs, and we lock on the domain. Since we also use a lock pool on every worker that only releases on overflow or timeout, we can reuse locks and also get weak locality for URLs of the same domain.

That's all the magic. Six Java classes on top of our own ZK helper lib.

Best regards,
Thomas Koch, http://www.koch.ro
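The queue-and-lock idea described above can be sketched roughly as follows. This is not Thomas's code (none was posted in this thread) and it does not talk to ZooKeeper: a sorted in-memory map stands in for the work-queue folder, and `putIfAbsent` stands in for the exclusive create of an ephemeral znode in the lock folder. The class and method names (`DomainLockedQueue`, `enqueue`, `claimNext`, `release`) are hypothetical, work items are assumed to be absolute URLs, and the lock pool with overflow/timeout release is left out; a worker simply keeps a domain lock until it calls `release`.

```java
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

/** In-memory stand-in (an assumption, not ZooKeeper) for the work-queue and lock folders. */
class DomainLockedQueue {
    // Work queue: keys are "<zero-padded timestamp>-<sequence>", so key order is time order.
    private final ConcurrentSkipListMap<String, String> workQueue =
            new ConcurrentSkipListMap<String, String>();
    // Lock folder: domain -> worker id. putIfAbsent mimics an exclusive znode create.
    private final ConcurrentHashMap<String, String> locks =
            new ConcurrentHashMap<String, String>();
    private long sequence = 0;

    /** Adds a URL to the queue under a timestamp-prefixed key and returns that key. */
    synchronized String enqueue(long timestampMillis, String url) {
        String key = String.format("%020d-%010d", timestampMillis, sequence++);
        workQueue.put(key, url);
        return key;
    }

    /**
     * Returns the oldest queued URL whose domain lock this worker already holds
     * or can take, or null if every remaining item is locked by another worker.
     */
    String claimNext(String workerId) {
        for (Map.Entry<String, String> item : workQueue.entrySet()) {
            // Assumes an absolute URL; getHost() would be null otherwise.
            String domain = URI.create(item.getValue()).getHost();
            String owner = locks.putIfAbsent(domain, workerId);
            boolean mine = owner == null || owner.equals(workerId);
            // remove(key, value) succeeds for one caller only, so no item is handed out twice.
            if (mine && workQueue.remove(item.getKey(), item.getValue())) {
                return item.getValue();
            }
        }
        return null;
    }

    /** Gives up a domain lock, but only if this worker is the one holding it. */
    void release(String workerId, String domain) {
        locks.remove(domain, workerId);
    }
}
```

Because a worker keeps its domain lock after finishing an item, later URLs of the same domain go to the same worker, which is the weak locality mentioned in the message. In a real ZooKeeper setup the lock would also vanish when the worker's session ends, which this sketch does not model.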
Non Hadoop scheduling frameworks
Hi all,

We're using ZooKeeper for Leader Election and system monitoring. We're also using it for synchronizing our cluster-wide jobs with barriers. We're running into an issue where we now have a single job, but each node can fire the job independently of the others, with different criteria in the job. In the event of a system failure, another node in our application cluster will need to fire this Job.

I've used Quartz previously (we're running Java 6), but it simply isn't designed for the use case we have. I found this article on Cloudera: http://www.cloudera.com/blog/2008/11/job-scheduling-in-hadoop/ I've looked at both plugins, but they require Hadoop. We're not currently running Hadoop; we only have Cassandra.

Here are the 2 basic use cases we need to support.

UC1: Synchronized Jobs
1. A job is fired across all nodes
2. The nodes wait until the barrier is entered by all participants
3. The nodes process the data and leave
4. On all nodes leaving the barrier, the Leader node marks the job as complete.

UC2: Multiple Jobs per Node
1. A Job is scheduled for a future time on a specific node (usually the same node that's creating the trigger)
2. A Trigger can be overwritten and cancelled without the job firing
3. In the event of a node failure, the Leader will take all pending jobs from the failed node, and partition them across the remaining nodes.

Any input would be greatly appreciated.

Thanks, Todd
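To make step 3 of UC2 concrete, here is a small sketch of what the Leader's reassignment could look like. The message does not say how the jobs should be partitioned, so round-robin is purely an assumption, as are the names `FailoverPartitioner` and `reassign` and the idea of holding pending jobs in a map of node id to job ids. How the Leader learns that a node failed (for example through an ephemeral znode disappearing) is outside the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: redistributes a failed node's pending jobs (UC2, step 3). */
class FailoverPartitioner {
    /**
     * Moves every pending job of failedNode onto the surviving nodes, round-robin.
     * pendingByNode maps node id -> mutable list of pending job ids and is
     * modified in place.
     */
    static void reassign(Map<String, List<String>> pendingByNode, String failedNode) {
        List<String> orphaned = pendingByNode.remove(failedNode);
        if (orphaned == null || orphaned.isEmpty()) {
            return; // unknown node or nothing pending: nothing to move
        }
        List<String> survivors = new ArrayList<String>(pendingByNode.keySet());
        if (survivors.isEmpty()) {
            // No node left to take the work: put the jobs back rather than lose them.
            pendingByNode.put(failedNode, orphaned);
            throw new IllegalStateException("no surviving nodes to take over jobs");
        }
        for (int i = 0; i < orphaned.size(); i++) {
            String target = survivors.get(i % survivors.size());
            pendingByNode.get(target).add(orphaned.get(i));
        }
    }
}
```

Round-robin keeps the extra load roughly even across survivors; a scheme that weighs each node's existing backlog would be an equally reasonable choice.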
Re: Non Hadoop scheduling frameworks
Hi Todd,

Just to be clear, are you looking at solving UC1 and UC2 via ZooKeeper? Or is this a broader question about scheduling on Cassandra nodes? For the latter, this probably isn't the right mailing list.

Thanks
mahadev

On 8/23/10 4:02 PM, Todd Nine t...@spidertracks.co.nz wrote: [...]
Re: Non Hadoop scheduling frameworks
These are pretty easy to solve with ZK. Ephemerality, exclusive create, atomic update and file versions allow you to implement most of the semantics you need. I don't know of any recipes available for this, but they would be worthy additions to ZK.

On Mon, Aug 23, 2010 at 11:33 PM, Todd Nine t...@spidertracks.co.nz wrote:

Solving UC1 and UC2 via ZooKeeper, or some other framework if one is recommended. We don't run Hadoop, just ZK and Cassandra, as we don't have a need for map/reduce. I'm searching for any existing framework that can perform standard time-based scheduling in a distributed environment. As I said earlier, Quartz is the closest model to what we're looking for, but it can't be used in a distributed parallel environment. Any suggestions for a system that could accomplish this would be helpful.

Thanks, Todd

On 24 August 2010 11:27, Mahadev Konar maha...@yahoo-inc.com wrote: [...]
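As a rough illustration of how the primitives named in this reply (exclusive create, atomic update, versions) could cover step 2 of UC2, here is an in-memory sketch. It is not a ZooKeeper recipe and makes no ZooKeeper calls; it only mimics the behaviour of a conditional update and delete that succeed when the caller supplies the current version, which is what ZooKeeper's `setData` and `delete` offer through their expected-version argument. The names `TriggerStore`, `create`, `version`, `reschedule` and `removeIfVersion` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in (an assumption, not ZooKeeper) for a folder of trigger znodes. */
class TriggerStore {
    private static final class Node {
        long fireAtMillis;
        int version; // starts at 0 and is bumped on every successful update
        Node(long fireAtMillis) { this.fireAtMillis = fireAtMillis; }
    }

    private final Map<String, Node> triggers = new HashMap<String, Node>();

    /** Exclusive create: fails if a trigger for this job already exists. */
    synchronized boolean create(String jobId, long fireAtMillis) {
        if (triggers.containsKey(jobId)) return false;
        triggers.put(jobId, new Node(fireAtMillis));
        return true;
    }

    /** Current version of the trigger, or -1 if there is none. */
    synchronized int version(String jobId) {
        Node n = triggers.get(jobId);
        return n == null ? -1 : n.version;
    }

    /** Overwrite: applied only if the caller has seen the latest version. */
    synchronized boolean reschedule(String jobId, long newFireAtMillis, int expectedVersion) {
        Node n = triggers.get(jobId);
        if (n == null || n.version != expectedVersion) return false;
        n.fireAtMillis = newFireAtMillis;
        n.version++;
        return true;
    }

    /** Conditional delete: used to cancel a trigger, or to claim it just before firing. */
    synchronized boolean removeIfVersion(String jobId, int expectedVersion) {
        Node n = triggers.get(jobId);
        if (n == null || n.version != expectedVersion) return false;
        triggers.remove(jobId);
        return true;
    }
}
```

The point of the version check is that a worker about to fire a job first claims the trigger with `removeIfVersion`; if the trigger was overwritten or cancelled in the meantime, the claim fails and the stale job does not fire.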