Re: Non Hadoop scheduling frameworks

2010-08-24 Thread Thomas Koch
Todd Nine:
 [...]
 UC1: Synchronized Jobs
 1. A job is fired across all nodes
 2. The nodes wait until the barrier is entered by all participants
 3. The nodes process the data and leave
 4. On all nodes leaving the barrier, the Leader node marks the job as
 complete.
 
 UC2: Multiple Jobs per Node
 1. A Job is scheduled for a future time on a specific node (usually the
 same node that's creating the trigger)
 2. A Trigger can be overwritten and cancelled without the job firing
 3. In the event of a node failure, the Leader will take all pending jobs
 from the failed node, and partition them across the remaining nodes.

Hi Todd,

we've implemented UC2 for an internal project with ZK. I'd love to release the
code as free software, but I have to ask our product owner. It's a small
company, so this could go quickly, but I don't know how to convince them.
They're so afraid of giving stuff away.
The basic idea is that we have two folders in ZK: a work queue and a lock
folder. The items (znodes) in the work queue are prefixed with a timestamp.
Every node consuming the queue tries to create an ephemeral znode in the lock
folder before starting on a work item. Work items are actually URLs, and we
lock on the domain. Since we also use a lock pool on every worker that only
releases locks on overflow or timeout, we can reuse locks and also get weak
locality for URLs of the same domain. That's all the magic: six Java classes
on top of our own ZK helper lib.
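
[Editor's sketch] The scheme described above can be modelled in memory roughly
as follows. This is not the code from the message (which was not published),
and it does not talk to ZooKeeper: a sorted map stands in for the
timestamp-prefixed queue znodes, and `putIfAbsent` stands in for the exclusive
creation of an ephemeral znode in the lock folder. The class and method names
(`DomainLockQueue`, `enqueue`, `poll`, `release`) and the key layout are
assumptions made for illustration.

```java
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

/** In-memory stand-in for the two ZK folders: a work queue and a lock folder. */
final class DomainLockQueue {
    // Queue folder stand-in: keys are "<timestamp>-<sequence>", so sorted order is due order.
    private final ConcurrentSkipListMap<String, String> queue =
            new ConcurrentSkipListMap<String, String>();
    // Lock folder stand-in: domain -> id of the worker holding the lock.
    private final ConcurrentHashMap<String, String> locks =
            new ConcurrentHashMap<String, String>();
    private long sequence = 0;

    /** Enqueue a URL under a zero-padded timestamp prefix so lexical order matches time order. */
    synchronized String enqueue(long dueMillis, String url) {
        String key = String.format("%020d-%010d", dueMillis, sequence++);
        queue.put(key, url);
        return key;
    }

    /** The lock is taken on the URL's host part. */
    static String domainOf(String url) {
        return URI.create(url).getHost();
    }

    /**
     * Take the earliest due item whose domain this worker holds or can lock.
     * Returns the URL, or null if nothing is due or all due domains are locked by others.
     */
    String poll(String workerId, long nowMillis) {
        for (Map.Entry<String, String> e : queue.entrySet()) {
            long due = Long.parseLong(e.getKey().substring(0, 20));
            if (due > nowMillis) break; // everything after this is scheduled later
            String domain = domainOf(e.getValue());
            String holder = locks.putIfAbsent(domain, workerId); // exclusive create
            boolean mine = holder == null || holder.equals(workerId); // reuse a held lock
            if (mine && queue.remove(e.getKey(), e.getValue())) {
                return e.getValue();
            }
        }
        return null;
    }

    /** Release a lock; with real ephemeral znodes it would also vanish when the session ends. */
    void release(String workerId, String domain) {
        locks.remove(domain, workerId);
    }

    public static void main(String[] args) {
        DomainLockQueue q = new DomainLockQueue();
        q.enqueue(100L, "http://a.example/1");
        q.enqueue(150L, "http://b.example/1");
        System.out.println(q.poll("worker-1", 1000L));
        System.out.println(q.poll("worker-2", 1000L));
    }
}
```

Because a worker keeps its domain lock until `release` is called, later URLs
of the same domain tend to go to the same worker, which is the weak locality
mentioned above.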

Best regards,

Thomas Koch, http://www.koch.ro


Non Hadoop scheduling frameworks

2010-08-23 Thread Todd Nine
Hi all,
  We're using ZooKeeper for leader election and system monitoring.  We're
also using it for synchronizing our cluster-wide jobs with barriers.  We're
running into an issue where we now have a single job, but each node can fire
the job independently of the others, with different criteria in the job.  In
the event of a system failure, another node in our application cluster will
need to fire this job.  I've used Quartz previously (we're running Java 6),
but it simply isn't designed for the use case we have.  I found this article
on the Cloudera blog:

http://www.cloudera.com/blog/2008/11/job-scheduling-in-hadoop/


I've looked at both plugins, but they require Hadoop.  We're not currently
running Hadoop; we only have Cassandra.  Here are the two basic use cases we
need to support.

UC1: Synchronized Jobs
1. A job is fired across all nodes
2. The nodes wait until the barrier is entered by all participants
3. The nodes process the data and leave
4. On all nodes leaving the barrier, the Leader node marks the job as
complete.
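
[Editor's sketch] UC1 has the shape of what the ZooKeeper recipes call a
double barrier: participants wait until everyone has entered, do the work,
and then wait until everyone has left. The sketch below is a single-use,
in-process model with threads in place of nodes, assuming a fixed and known
number of participants; it does not use ZooKeeper, and the names
(`DoubleBarrier`, `runJob`) are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

/** Single-use, in-process model of a double barrier for a fixed number of participants. */
final class DoubleBarrier {
    private final int size;
    private final Set<String> members = new HashSet<String>();
    private boolean ready = false; // set once every participant has entered
    private boolean done = false;  // set once every participant has left

    DoubleBarrier(int size) { this.size = size; }

    /** Block until all participants have entered. */
    synchronized void enter(String node) throws InterruptedException {
        members.add(node);
        if (members.size() == size) { ready = true; notifyAll(); }
        while (!ready) wait();
    }

    /** Block until all participants have left. */
    synchronized void leave(String node) throws InterruptedException {
        members.remove(node);
        if (members.isEmpty()) { done = true; notifyAll(); }
        while (!done) wait();
    }

    /** Runs one job on `nodes` threads; true if every node saw all others entered before processing. */
    static boolean runJob(final int nodes) {
        final DoubleBarrier barrier = new DoubleBarrier(nodes);
        final AtomicInteger entered = new AtomicInteger();
        final AtomicInteger sawEveryone = new AtomicInteger();
        Thread[] threads = new Thread[nodes];
        for (int i = 0; i < nodes; i++) {
            final String name = "node-" + i;
            threads[i] = new Thread(new Runnable() {
                public void run() {
                    try {
                        entered.incrementAndGet();
                        barrier.enter(name);
                        // Processing phase: every participant must already have entered.
                        if (entered.get() == nodes) sawEveryone.incrementAndGet();
                        barrier.leave(name);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
            threads[i].start();
        }
        try {
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        // All nodes have left the barrier; the leader would mark the job complete here.
        return sawEveryone.get() == nodes;
    }

    public static void main(String[] args) {
        System.out.println("job complete: " + runJob(3));
    }
}
```

The separate `ready` and `done` flags keep a slow participant from missing
the moment the barrier filled or emptied.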


UC2: Multiple Jobs per Node
1. A Job is scheduled for a future time on a specific node (usually the same
node that's creating the trigger)
2. A Trigger can be overwritten and cancelled without the job firing
3. In the event of a node failure, the Leader will take all pending jobs
from the failed node, and partition them across the remaining nodes.
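
[Editor's sketch] Step 3 of UC2 does not say how the pending jobs should be
partitioned. As one possible reading, the sketch below assumes a plain
round-robin over the surviving nodes. How the leader detects the failure (for
example, by watching for a vanished ephemeral znode) is outside the sketch,
and the names (`JobRepartitioner`, `reassign`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class JobRepartitioner {
    /** Spread the failed node's pending jobs round-robin across the surviving nodes. */
    static Map<String, List<String>> reassign(List<String> pendingJobs, List<String> survivors) {
        if (survivors.isEmpty()) {
            throw new IllegalArgumentException("no surviving nodes to take over the jobs");
        }
        // LinkedHashMap keeps the survivors in the order given.
        Map<String, List<String>> plan = new LinkedHashMap<String, List<String>>();
        for (String node : survivors) {
            plan.put(node, new ArrayList<String>());
        }
        for (int i = 0; i < pendingJobs.size(); i++) {
            String target = survivors.get(i % survivors.size());
            plan.get(target).add(pendingJobs.get(i));
        }
        return plan;
    }

    public static void main(String[] args) {
        // prints {n2=[j1, j3], n3=[j2]}
        System.out.println(reassign(Arrays.asList("j1", "j2", "j3"), Arrays.asList("n2", "n3")));
    }
}
```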


Any input would be greatly appreciated.

Thanks,
Todd


Re: Non Hadoop scheduling frameworks

2010-08-23 Thread Mahadev Konar
Hi Todd,
  Just to be clear, are you looking at solving UC1 and UC2 via ZooKeeper? Or is
this a broader question about scheduling on Cassandra nodes? For the latter,
this probably isn't the right mailing list.

Thanks
mahadev


Re: Non Hadoop scheduling frameworks

2010-08-23 Thread Ted Dunning
These are pretty easy to solve with ZK.  Ephemerality, exclusive create,
atomic update and file versions allow you to implement most of the semantics
you need.

I don't know of any recipes available for this, but they would be worthy
additions to ZK.
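
[Editor's sketch] To illustrate how exclusive create and versions could cover
the "overwrite or cancel a trigger without the job firing" requirement, here
is an in-memory model. In ZooKeeper itself, `setData` and `delete` take an
expected version and fail with a `BadVersionException` on a mismatch; the
sketch mimics that with compare-and-set on a map. It does not use the
ZooKeeper client, and the names (`TriggerStore`, `reschedule`, `remove`) are
hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;

final class TriggerStore {
    /** A trigger's fire time together with the version it was written at. */
    static final class Versioned {
        final long fireAtMillis;
        final int version;
        Versioned(long fireAtMillis, int version) {
            this.fireAtMillis = fireAtMillis;
            this.version = version;
        }
    }

    private final ConcurrentHashMap<String, Versioned> triggers =
            new ConcurrentHashMap<String, Versioned>();

    /** Exclusive create: succeeds only if no trigger with this id exists. */
    boolean create(String id, long fireAtMillis) {
        return triggers.putIfAbsent(id, new Versioned(fireAtMillis, 0)) == null;
    }

    Versioned read(String id) {
        return triggers.get(id);
    }

    /** Overwrite the fire time only if the caller saw the current version. */
    boolean reschedule(String id, long newFireAtMillis, int expectedVersion) {
        Versioned cur = triggers.get(id);
        return cur != null && cur.version == expectedVersion
                && triggers.replace(id, cur, new Versioned(newFireAtMillis, cur.version + 1));
    }

    /** Cancel a trigger, or claim it for firing, only if the version still matches. */
    boolean remove(String id, int expectedVersion) {
        Versioned cur = triggers.get(id);
        return cur != null && cur.version == expectedVersion && triggers.remove(id, cur);
    }

    public static void main(String[] args) {
        TriggerStore store = new TriggerStore();
        store.create("job-1", 1000L);
        int seen = store.read("job-1").version;
        store.reschedule("job-1", 5000L, seen);
        // A worker that read the old version can no longer claim the trigger.
        System.out.println("stale claim succeeded: " + store.remove("job-1", seen));
    }
}
```

A worker fires a job only after `remove` succeeds with the version it read,
so a trigger that was overwritten or cancelled in the meantime is never fired
from stale data.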

On Mon, Aug 23, 2010 at 11:33 PM, Todd Nine t...@spidertracks.co.nz wrote:

 Solving UC1 and UC2 via ZooKeeper, or some other framework if one is
 recommended.  We don't run Hadoop, just ZK and Cassandra, as we don't have a
 need for map/reduce.  I'm searching for any existing framework that can
 perform standard time-based scheduling in a distributed environment.  As I
 said earlier, Quartz is the closest model to what we're looking for, but it
 can't be used in a distributed parallel environment.  Any suggestions for a
 system that could accomplish this would be helpful.

 Thanks,
 Todd
