RE: TaskRebalancer

Kanak Biscuitwala Sun, 19 Jan 2014 20:50:50 -0800
This sounds a lot like what we did in AutoRebalanceStrategy. There's an 
interface called ReplicaPlacementScheme that the algorithm calls into, and a 
DefaultPlacementScheme that just does evenly balanced assignment.
The simplest thing we could do is have a task rebalancer config and set a 
switch for which placement scheme to use. The current task rebalancer already 
has to specify things like the DAG, so this could just be another field to add 
on.
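
A minimal sketch of what such a switch could look like, purely as an
illustration. Every name below (TaskPlacementScheme, EvenPlacementScheme,
TargetResourcePlacementScheme, PlacementType, TaskPlacementSchemes) is
hypothetical and is not an existing Helix class; the real
ReplicaPlacementScheme has its own signature, which this does not try to
reproduce. It also assumes each task id equals the name of the partition it
targets.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interface for illustration; not part of the Helix API.
interface TaskPlacementScheme {
  // Returns a mapping of task id -> node name.
  Map<String, String> place(List<String> tasks, List<String> liveNodes);
}

// Spreads tasks round-robin over the live nodes; needs no target resource.
final class EvenPlacementScheme implements TaskPlacementScheme {
  @Override
  public Map<String, String> place(List<String> tasks, List<String> liveNodes) {
    Map<String, String> result = new LinkedHashMap<>();
    if (liveNodes.isEmpty()) {
      return result;
    }
    for (int i = 0; i < tasks.size(); i++) {
      result.put(tasks.get(i), liveNodes.get(i % liveNodes.size()));
    }
    return result;
  }
}

// Places each task on a live node that holds the matching partition of the
// target resource in the requested state (e.g. SLAVE).
final class TargetResourcePlacementScheme implements TaskPlacementScheme {
  // partition name -> (node name -> state), like the mapFields of a resource
  private final Map<String, Map<String, String>> partitionStates;
  private final String targetState;

  TargetResourcePlacementScheme(
      Map<String, Map<String, String>> partitionStates, String targetState) {
    this.partitionStates = partitionStates;
    this.targetState = targetState;
  }

  @Override
  public Map<String, String> place(List<String> tasks, List<String> liveNodes) {
    Map<String, String> result = new LinkedHashMap<>();
    for (String task : tasks) {
      Map<String, String> states = partitionStates.get(task);
      if (states == null) {
        states = Collections.emptyMap();
      }
      for (Map.Entry<String, String> entry : states.entrySet()) {
        if (targetState.equals(entry.getValue())
            && liveNodes.contains(entry.getKey())) {
          result.put(task, entry.getKey());
          break; // first matching node wins in this sketch
        }
      }
    }
    return result;
  }
}

// The "switch" that a task rebalancer config field could carry.
enum PlacementType { EVEN, TARGET_RESOURCE }

final class TaskPlacementSchemes {
  private TaskPlacementSchemes() {}

  static TaskPlacementScheme forType(
      PlacementType type,
      Map<String, Map<String, String>> partitionStates,
      String targetState) {
    switch (type) {
      case TARGET_RESOURCE:
        return new TargetResourcePlacementScheme(partitionStates, targetState);
      case EVEN:
      default:
        return new EvenPlacementScheme();
    }
  }
}
```

With the MyDB layout quoted below and a target state of SLAVE, the
TARGET_RESOURCE scheme in this sketch puts the tasks for partitions 0, 1, 2
on N2, N3, and N1, while EVEN ignores the resource and just cycles through
the nodes.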
> Date: Sun, 19 Jan 2014 13:14:33 -0800
> Subject: Re: TaskRebalancer
> From: [email protected]
> To: [email protected]
> CC: [email protected]; [email protected]
> 
> Thanks Jason, I was looking at the rebalancer. It looks like the target
> resource is mandatory. What do you suggest is the right way to make the
> target resource optional?
> 
> This is my understanding of what task rebalancer is doing today.
> 
> It assumes that the system is already hosting a resource such as a
> database or an index. One can then use the task framework to launch
> arbitrary tasks on the nodes hosting these resources. For example, let's
> say there is a database MyDB with 3 partitions and 2 replicas, using the
> MasterSlave state model, on 3 nodes N1, N2, and N3. In a happy state the
> cluster might look like this:
> 
> {
>   "id":"MyDB",
>   "mapFields":{
>     "MyDB_0":{
>       "N1":"MASTER",
>       "N2":"SLAVE"
>     },
>     "MyDB_1":{
>       "N2":"MASTER",
>       "N3":"SLAVE"
>     },
>     "MyDB_2":{
>       "N1":"SLAVE",
>       "N3":"MASTER"
>     }
>   }
> }
> 
> Let's say one wants to take a backup of these databases but run it only on
> the SLAVEs. One can define the backup task and launch 3 backup tasks (one
> for each partition) only on SLAVEs.
> 
> What we have currently works perfectly for this scenario. One simply has to
> define the target resource and state for the backup tasks and they will be
> launched in the appropriate place. So in this scenario, the backup tasks
> for partitions 0, 1, and 2 will be launched on N2, N3, and N1 respectively.
> 
> But what if the tasks don't have any target resource and can run on any
> node (N1, N2, or N3), and the only requirement is to distribute the tasks
> evenly?
> 
> We should decouple the logic of where a task is placed from the logic of
> distributing the tasks. For example, we can abstract out the placement
> constraint from the rebalancer logic. So we can have a placement provider
> that computes placement randomly and one that computes placement based on
> another resource. Probably another one that computes placement based on
> data locality.
> 
> What is the right way to approach this?
> 
> thanks,
> Kishore G
> 
> 
> On Sun, Jan 19, 2014 at 10:12 AM, Zhen Zhang <[email protected]> wrote:
> 
> > TestTaskRebalancer and TestTaskRebalancerStopResume are examples.
> >
> > Thanks,
> > Jason
> >
> >
> > On Sun, Jan 19, 2014 at 9:20 AM, kishore g <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I am trying to use TaskRebalancer but not able to understand how it
> > > works, is there any example I can try?
> > >
> > > thanks,
> > > Kishore G
> > >
> >