Hi, Santosh

One question: what exactly do you need to do to bring a job from OFFLINE to STARTUP? Can we simply use an OFFLINE->UP->OFFLINE model? On OFFLINE->UP the job is started and becomes ready to serve requests. On UP->OFFLINE you block until the job is drained.

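To make the blocking UP->OFFLINE idea concrete, here is a minimal sketch in plain Java. It deliberately does not use the Helix API: the class DrainingPartition, its method names, and the internal "DRAINING" flag are all hypothetical, chosen only for illustration. In a real Helix participant the same logic would presumably live in the state model's transition callbacks.

```java
/**
 * Hypothetical stand-in for a per-partition state model (not the Helix API).
 * OFFLINE -> UP starts accepting jobs; UP -> OFFLINE blocks until drained.
 */
class DrainingPartition {
    private final Object lock = new Object();
    private int runningJobs = 0;
    private String state = "OFFLINE";

    /** OFFLINE -> UP: the partition starts accepting new jobs. */
    public void onBecomeUpFromOffline() {
        synchronized (lock) {
            state = "UP";
        }
    }

    /** Accepts a new job only while the partition is UP. */
    public boolean startJob() {
        synchronized (lock) {
            if (!state.equals("UP")) {
                return false;
            }
            runningJobs++;
            return true;
        }
    }

    /** Marks one job as finished and wakes up a waiting drain, if any. */
    public void finishJob() {
        synchronized (lock) {
            runningJobs--;
            lock.notifyAll();
        }
    }

    /** UP -> OFFLINE: refuse new jobs, then block until running jobs finish. */
    public void onBecomeOfflineFromUp() {
        synchronized (lock) {
            state = "DRAINING"; // internal flag only, an assumption of this sketch
            while (runningJobs > 0) {
                try {
                    lock.wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while draining", e);
                }
            }
            state = "OFFLINE";
        }
    }

    public String state() {
        synchronized (lock) {
            return state;
        }
    }
}
```

The point of the sketch is only that the UP->OFFLINE handler does not return until the last running job has called finishJob(), so the partition is not reported OFFLINE while old jobs are still running.
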
With this state model, you can start draining a node by disabling it. Once a node is disabled, Helix will send the UP->OFFLINE transition to all partitions on that node; in your implementation of the UP->OFFLINE transition, you block until the job completes. Once the job (partition) on node-1 goes OFFLINE, Helix will bring up the job on node-2 (OFFLINE->UP). Does this work for you? How long do you expect OFFLINE->UP to take here? If it is fast, the switch should be fast.

Lei

On Mon, May 11, 2020 at 9:02 PM santosh gujar <[email protected]> wrote:

> Yes, there would be a database.
> So far I have the following state model for the partition:
> OFFLINE->STARTUP->UP->DRAIN->OFFLINE. But I don't know how to express the
> following:
> 1. How to trigger a drain (for example, when we decide to take a node out for
> maintenance).
> 2. Once a drain has started, I expect the Helix rebalancer to kick in and
> simultaneously move the partition onto another node in start_up mode.
> 3. Once all jobs on node1 are done, I need a manual way to trigger it to go
> offline and move the other partition to the UP state.
>
> It might be possible that my thinking is entirely wrong about how to fit this
> into the Helix model, but essentially the above is the sequence I want to achieve.
> Any pointers will be of great help. The constraint is that these are long-running
> jobs that cannot be moved immediately to another node.
>
> Regards,
> Santosh
>
> On Tue, May 12, 2020 at 1:25 AM kishore g <[email protected]> wrote:
>
>> I was thinking exactly in that direction - having two states is the right
>> thing to do. Before we get there, one more question -
>>
>> - when you get a request for a job, how do you know if that job is old or
>> new?
>> Is there a database that provides the mapping between job and node?
>>
>> On Mon, May 11, 2020 at 12:44 PM santosh gujar <[email protected]>
>> wrote:
>>
>>> Thank you, Kishore.
>>>
>>> During the drain process N2 will start new jobs; requests related to old
>>> jobs need to go to N1 and requests for new jobs need to go to N2. Thus,
>>> during the drain on N1, the partition could be present on both nodes.
>>>
>>> My current thinking is that in Helix I somehow need to model this
>>> as partition P with two different states on these two nodes, e.g. N1
>>> could have partition P in the Drain state and N2 could have partition P in
>>> the START_UP state.
>>> I don't know if my thinking about states is correct, but I am looking for any
>>> pointers.
>>>
>>> Regards
>>> Santosh
>>>
>>> On Tue, May 12, 2020 at 1:01 AM kishore g <[email protected]> wrote:
>>>
>>>> What happens to requests during the drain process, i.e. when you put N1
>>>> out of service and while N2 is waiting for N1 to finish the jobs, where
>>>> will the requests for P go - N1 or N2?
>>>>
>>>> On Mon, May 11, 2020 at 12:19 PM santosh gujar <
>>>> [email protected]> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I am looking for some clues or inputs on how to achieve the following.
>>>>>
>>>>> I am working on a service that involves running stateful long-running
>>>>> jobs on a node. These long-running jobs cannot be preempted and
>>>>> continued on other nodes.
>>>>>
>>>>> Problem requirements:
>>>>> 1. In Helix nomenclature, let's say a Helix partition P
>>>>> involves J such jobs running on a node (N1).
>>>>> 2. When I put the node in a drain, I want Helix to assign a new node
>>>>> to this partition, so that P is also started on the new node (N2).
>>>>>
>>>>> 3. N1 can be put out of service only when all running jobs (J) on it
>>>>> are over; at that point only N2 will serve P's requests.
>>>>>
>>>>> Questions:
>>>>> 1. Can the drain process be modeled using Helix?
>>>>> 2. If yes, is there any recipe / pointers for a Helix state model?
>>>>> 3. Is there any custom way to trigger state transitions? From the
>>>>> documentation, I gather that the Helix controller in full-auto mode triggers
>>>>> state transitions only when the number of partitions changes or the cluster
>>>>> changes (node addition or deletion).
>>>>> 4. I guess a spectator will be needed for custom routing logic in such
>>>>> cases; any pointers for the same?
>>>>>
>>>>> Thank you
>>>>> Santosh
>>>>>
>>>>

--
Lei Xia
