Re: Delayed Workflow and Job Scheduling Design

2017-01-04 Thread Xue Junkai
For the questions:

>>- From user perspective, user have to understand the difference
>>between delay time and start time.
Yes. This will be documented in the Task Framework User Guide, which will
explain the difference between StartTime and DelayTime.

>>- The WorkflowRebalancer will be called multiple times, which might
>>be considered for performance.

The only added cost is one pass over all the jobs (O(n)) to find the next
schedule time. Although the WorkflowRebalancer will be called multiple
times, it won't affect performance much.
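
To make the O(n) claim concrete, here is a minimal sketch of such a scan. It is only an illustration: the class name NextScheduleTime, the method earliestPending, and the map of job names to start times are all hypothetical and are not taken from the pull request.

```java
import java.util.Map;
import java.util.OptionalLong;

/** Hypothetical helper, not part of Helix: finds when the rebalancer should run next. */
final class NextScheduleTime {
    private NextScheduleTime() {}

    /**
     * Makes a single pass over the per-job start times (epoch millis).
     * Returns the earliest start time that is still in the future,
     * or an empty result if no job is waiting on a timer.
     */
    static OptionalLong earliestPending(Map<String, Long> jobStartTimes, long now) {
        long best = 0L;
        boolean found = false;
        for (long startTime : jobStartTimes.values()) {
            // Only jobs that have not reached their start time need a timer.
            if (startTime > now && (!found || startTime < best)) {
                best = startTime;
                found = true;
            }
        }
        return found ? OptionalLong.of(best) : OptionalLong.empty();
    }
}
```

The scan touches each job once per rebalancer run, so the extra work grows linearly with the number of jobs in the workflow.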

Best,

Junkai





-- 
Junkai Xue


Re: Delayed Workflow and Job Scheduling Design

2017-01-04 Thread Xue Junkai
Thanks for the help!




-- 
Junkai Xue


Re: Delayed Workflow and Job Scheduling Design

2017-01-04 Thread kishore g
will review it today



Re: Delayed Workflow and Job Scheduling Design

2017-01-03 Thread Xue Junkai
Hi All,

Here's the pull request for this design:
https://github.com/apache/helix/pull/64
Could anyone help me review it?

Best,

Junkai



-- 
Junkai Xue


Delayed Workflow and Job Scheduling Design

2016-12-08 Thread Xue Junkai
Hi All,

I have a short design for Delayed Workflow and Job Scheduling. Since I
cannot access the wiki, I have attached it to this email. Any feedback and
comments are highly appreciated!

Best,

Junkai

Overview

Currently, Workflows and Jobs run by Helix need more flexible scheduling.
For example, some jobs need to start a certain amount of time after other
jobs have finished. The same applies to Workflows: one may need to run at a
specific time, after some operations have completed. To better support
Workflow and Job scheduling, Helix should provide a new feature that lets
users set a delay time or a start time for specific Workflows and Jobs.
Workflows and Jobs should have an option that lets the user set either a
start time or a delay time that applies once they are ready to start.
Workflows and Jobs can then be scheduled at the correct time.

Proposed Design

The design is split into two parts: generic rebalancer scheduling and delay
time calculation. Since Job scheduling can be done by rerunning the
WorkflowRebalancer, Workflow and Job delay scheduling can rely on the same
generic scheduling mechanism. Generic task scheduling takes the
responsibility of setting the run time for a specific Workflow object. Each
object then has its own start time calculation algorithm.

Generic Task Scheduling

For generic task scheduling, it is better to have a centralized scheduler,
RebalanceScheduler. It provides four public APIs:
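public class RebalanceScheduler {
  // Schedule a rebalance of the resource at the given start time.
  public void scheduleRebalance(HelixManager manager, String resource, long startTime);

  // Get the scheduled rebalance time of the resource.
  public long getRebalanceTime(String resource);

  // Remove the scheduled rebalance of the resource.
  public long removeScheduledRebalance(String resource);

  // Invoke a rebalance of the resource immediately.
  public static void invokeRebalance(HelixDataAccessor accessor, String resource);
}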



It can schedule a rebalance, get the scheduled time of a rebalance, and
remove a scheduled rebalance. It also has an API that invokes the
rebalancer immediately. With this RebalanceScheduler, each resource can be
scheduled at a certain start time.
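
As a rough illustration of how a centralized scheduler of this shape could behave, the sketch below keeps one pending timer per resource on top of java.util.concurrent. It is not the Helix implementation: the class name SimpleRebalanceScheduler is hypothetical, a plain Runnable stands in for the HelixManager and HelixDataAccessor arguments, the -1 return value for "nothing scheduled" is an assumption, and the immediate-invocation API is left out because it depends on Helix internals.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Illustrative stand-in for the proposed RebalanceScheduler; not the Helix class. */
final class SimpleRebalanceScheduler {
    /** One pending timer per resource. */
    private static final class Entry {
        final long startTime;
        final ScheduledFuture<?> future;

        Entry(long startTime, ScheduledFuture<?> future) {
            this.startTime = startTime;
            this.future = future;
        }
    }

    private final Map<String, Entry> pending = new HashMap<>();

    // Daemon thread so an idle scheduler never keeps the JVM alive.
    private final ScheduledExecutorService executor =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "rebalance-scheduler");
            t.setDaemon(true);
            return t;
        });

    /** Schedules (or reschedules) a rebalance of the resource at startTime (epoch millis). */
    synchronized void scheduleRebalance(String resource, long startTime, Runnable rebalance) {
        removeScheduledRebalance(resource); // keep at most one timer per resource
        long delayMs = Math.max(0L, startTime - System.currentTimeMillis());
        ScheduledFuture<?> future = executor.schedule(() -> {
            synchronized (this) {
                // Drop the bookkeeping entry before running the rebalance.
                Entry e = pending.get(resource);
                if (e != null && e.startTime == startTime) {
                    pending.remove(resource);
                }
            }
            rebalance.run();
        }, delayMs, TimeUnit.MILLISECONDS);
        pending.put(resource, new Entry(startTime, future));
    }

    /** Returns the scheduled time for the resource, or -1 if nothing is scheduled. */
    synchronized long getRebalanceTime(String resource) {
        Entry e = pending.get(resource);
        return e == null ? -1L : e.startTime;
    }

    /** Cancels the pending rebalance and returns its scheduled time, or -1 if none. */
    synchronized long removeScheduledRebalance(String resource) {
        Entry e = pending.remove(resource);
        if (e == null) {
            return -1L;
        }
        e.future.cancel(false);
        return e.startTime;
    }

    /** Stops the timer thread; pending rebalances are discarded. */
    void shutdown() {
        executor.shutdownNow();
    }
}
```

Keeping a single entry per resource means a later call to scheduleRebalance for the same resource replaces the earlier timer, which matches the idea of one next schedule time per Workflow.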

Delay Time Calculation

Workflows have a property, expiryTime, which serves as the delay time for
the Workflow. Users can set it by calling the setExpiry method in
WorkflowConfig. For Jobs, two methods will be provided in JobConfig:
setExecutionStart and setExecutionDelay. Through these APIs, users can set
the delay time and start time for Workflows and Jobs. Internally, Helix
will use whichever is later: the time derived from the delay or the start
time.

To compute when Workflows and Jobs should start, Helix chooses to do the
computation in real time. Users can set a delay time or a start time in
JobConfig. When the job is ready to run, Helix calculates the delay-based
"start time" as the current time plus the delay time, then compares it with
the start time, if the user set one in JobConfig.
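
The rule described above can be sketched as a small function: take the time the job became ready, add the delay if one is set, and use the later of that value and the explicit start time. The class name DelayedStart, the method effectiveStartTime, and the -1 sentinel for "not set" are assumptions made for this illustration and are not taken from JobConfig.

```java
/** Hypothetical helper illustrating the rule above; not the Helix implementation. */
final class DelayedStart {
    /** Assumed sentinel meaning "the user did not set this value". */
    static final long UNSET = -1L;

    private DelayedStart() {}

    /**
     * @param readyTimeMs      time (epoch millis) at which the job became ready to run
     * @param executionDelayMs delay to apply once the job is ready, or UNSET
     * @param executionStartMs absolute start time (epoch millis), or UNSET
     * @return the time at which the job should actually start
     */
    static long effectiveStartTime(long readyTimeMs, long executionDelayMs, long executionStartMs) {
        // Delay-based start: ready time plus the delay, when a delay is configured.
        long delayed = executionDelayMs > 0 ? readyTimeMs + executionDelayMs : readyTimeMs;
        if (executionStartMs == UNSET) {
            return delayed;
        }
        // Both may be set: the later of the two wins.
        return Math.max(delayed, executionStartMs);
    }
}
```

For example, a job that becomes ready at time 10000 with a delay of 5000 and a start time of 12000 would start at 15000, because the delay-based time is later; with a start time of 20000 it would start at 20000.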

[image: Inline image 1]

Impact

   - From the user's perspective, users have to understand the difference
   between delay time and start time.
   - The WorkflowRebalancer will be called multiple times, which may need
   to be considered for performance.