This sounds a lot like what we did in AutoRebalanceStrategy. There's an
interface called ReplicaPlacementScheme that the algorithm calls into, and a
DefaultPlacementScheme that just does evenly balanced assignment.
The simplest thing we could do is have a task rebalancer config and set a
switch for which placement scheme to use. The current task rebalancer already
has to specify things like the DAG, so this could just be another field to add
on.
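
A rough sketch of how that switch could look. This is illustrative only:
TaskPlacementScheme, EvenPlacementScheme, TargetResourcePlacementScheme and
TaskRebalancerConfig.Placement are made-up names, not the existing
ReplicaPlacementScheme signature or anything in the task framework today. It
also assumes one task per partition, named after the partition.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; none of this is existing Helix API.
interface TaskPlacementScheme {
  /** Returns a mapping of task id to node name. */
  Map<String, String> place(List<String> tasks, List<String> liveNodes);
}

/** No target resource: spread tasks round-robin over the live nodes. */
class EvenPlacementScheme implements TaskPlacementScheme {
  @Override
  public Map<String, String> place(List<String> tasks, List<String> liveNodes) {
    Map<String, String> assignment = new LinkedHashMap<>();
    if (liveNodes.isEmpty()) {
      return assignment; // nothing to place on
    }
    for (int i = 0; i < tasks.size(); i++) {
      assignment.put(tasks.get(i), liveNodes.get(i % liveNodes.size()));
    }
    return assignment;
  }
}

/** Target resource: each task runs on a node holding its partition in the target state. */
class TargetResourcePlacementScheme implements TaskPlacementScheme {
  // partition name -> (node name -> state), e.g. the "mapFields" of MyDB
  private final Map<String, Map<String, String>> partitionStates;
  private final String targetState;

  TargetResourcePlacementScheme(Map<String, Map<String, String>> partitionStates,
      String targetState) {
    this.partitionStates = partitionStates;
    this.targetState = targetState;
  }

  @Override
  public Map<String, String> place(List<String> tasks, List<String> liveNodes) {
    Map<String, String> assignment = new LinkedHashMap<>();
    for (String task : tasks) {
      Map<String, String> states = partitionStates.containsKey(task)
          ? partitionStates.get(task)
          : Collections.<String, String>emptyMap();
      // Pick the first live node that holds this partition in the target state.
      for (String node : liveNodes) {
        if (targetState.equals(states.get(node))) {
          assignment.put(task, node);
          break;
        }
      }
    }
    return assignment;
  }
}

/** The extra field on the task rebalancer config that selects a scheme. */
class TaskRebalancerConfig {
  enum Placement { EVEN, TARGET_RESOURCE }

  static TaskPlacementScheme schemeFor(Placement placement,
      Map<String, Map<String, String>> targetStates, String targetState) {
    switch (placement) {
      case TARGET_RESOURCE:
        return new TargetResourcePlacementScheme(targetStates, targetState);
      case EVEN:
      default:
        return new EvenPlacementScheme();
    }
  }
}
```

With the MyDB layout quoted below and a target state of SLAVE, the
TARGET_RESOURCE scheme would put the tasks for MyDB_0, MyDB_1 and MyDB_2 on N2,
N3 and N1. The EVEN scheme ignores the target resource entirely.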
> Date: Sun, 19 Jan 2014 13:14:33 -0800
> Subject: Re: TaskRebalancer
> From: [email protected]
> To: [email protected]
> CC: [email protected]; [email protected]
>
> Thanks Jason, I was looking at the rebalancer. It looks like the target
> resource is mandatory. What do you suggest is the right way to make the
> target resource optional?
>
> This is my understanding of what task rebalancer is doing today.
>
> It assumes that the system is already hosting a resource such as a
> database or an index. One can then use the task framework to launch
> arbitrary tasks on the nodes hosting these resources. For example, let's say
> there is a database MyDB with 3 partitions and 2 replicas, using the Master
> Slave state model, and 3 nodes N1, N2, and N3. In a happy state the cluster
> might look like this:
>
> {
>   "id":"MyDB",
>   "mapFields":{
>     "MyDB_0":{
>       "N1":"MASTER",
>       "N2":"SLAVE"
>     },
>     "MyDB_1":{
>       "N2":"MASTER",
>       "N3":"SLAVE"
>     },
>     "MyDB_2":{
>       "N1":"SLAVE",
>       "N3":"MASTER"
>     }
>   }
> }
>
> Let's say one wants to take a backup of these databases but run the backup
> only on the SLAVEs. One can define the backup task and launch 3 backup
> tasks (one for each partition) only on SLAVEs.
>
> What we have currently works perfectly for this scenario. One simply has to
> define the target resource and state for the backup tasks and they will be
> launched in the appropriate place. So in this scenario, the backup tasks for
> partitions 0, 1, and 2 will be launched on N2, N3, and N1 respectively.
>
> But what if the tasks don't have any target resource and can run on any
> node (N1, N2, or N3), and the only requirement is to distribute the tasks
> evenly?
>
> We should decouple the logic of where a task is placed from the logic of
> distributing the tasks. For example, we can abstract out the placement
> constraint from the rebalancer logic. So we can have a placement provider
> that computes placement randomly and one that computes placement based on
> another resource. Probably another one that computes placement based on
> data locality.
>
> What is the right way to approach this?
>
> thanks,
> Kishore G
>
>
> On Sun, Jan 19, 2014 at 10:12 AM, Zhen Zhang <[email protected]> wrote:
>
> > TestTaskRebalancer and TestTaskRebalancerStopResume are examples.
> >
> > Thanks,
> > Jason
> >
> >
> > On Sun, Jan 19, 2014 at 9:20 AM, kishore g <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I am trying to use TaskRebalancer but am not able to understand how it
> > > works. Is there any example I can try?
> > >
> > > thanks,
> > > Kishore G
> > >
> >