There is a way for the participant to invoke setRequestedState in Helix, and the controller can then trigger that transition if it does not violate the constraints.
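To make the idea concrete, the requested-state handshake can be illustrated with a toy stand-in: the participant records the state it wants for a partition (in Helix this lives in the participant's CurrentState), and the controller honors the request only if a constraint check passes. Everything below (the class, its maps, the constraint) is a hypothetical sketch for illustration, not Helix code.

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for the requested-state handshake (not Helix code):
// a participant requests a transition; the "controller" grants it
// only if a constraint check passes.
class RequestedStateDemo {
    final Map<String, String> currentState = new HashMap<>();   // partition -> state
    final Map<String, String> requestedState = new HashMap<>(); // partition -> state

    // Participant side: record the state it wants for a partition.
    void setRequestedState(String partition, String state) {
        requestedState.put(partition, state);
    }

    // "Controller" side: honor requests that pass the constraint check.
    void controllerPass() {
        for (Map.Entry<String, String> e : requestedState.entrySet()) {
            if (allowed(currentState.get(e.getKey()), e.getValue())) {
                currentState.put(e.getKey(), e.getValue());
            }
        }
        requestedState.clear();
    }

    // Example constraint: only DRAIN -> OFFLINE may be requested in this model.
    private boolean allowed(String from, String to) {
        return "DRAIN".equals(from) && "OFFLINE".equals(to);
    }
}
```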
On Thu, May 21, 2020 at 4:53 AM santosh gujar <[email protected]> wrote:

> Hello All,
>
> Any inputs on the below?
>
> Thank you, and I appreciate your help.
>
> Regards,
> Santosh
>
> On Thu, May 14, 2020 at 2:47 PM santosh gujar <[email protected]> wrote:
>
>> Thanks a lot Lei,
>>
>> One last question on this topic.
>>
>> I gather from the documentation that the Helix controller is the one that directs state transitions in a greedy fashion. But this is a synchronous call; e.g. in the example we have been discussing, the moment the call returns from UpToDrain(), the controller will call DrainToOffline() immediately and also update the states in ZooKeeper accordingly. Is my understanding correct?
>>
>> If yes, is there any way the transition can be made asynchronous? I.e. I get notified of the UP->DRAIN transition, but DRAIN->OFFLINE happens only when I call some API on the Helix controller. In my case, I would have to wait via some kind of Thread.wait()/sleep() until all other jobs are over. That could introduce brittleness, in that the process handling the state transition cannot crash until all other jobs (which could be running as separate processes) are finished. My preference would be to call back an API on the Helix controller to drive the further state transition (DRAIN->OFFLINE) for the partition.
>>
>> Thanks,
>> Santosh
>>
>> On Thu, May 14, 2020 at 1:28 AM Lei Xia <[email protected]> wrote:
>>
>>> Hi, Santosh
>>>
>>> I meant the DRAIN->OFFLINE transition should be blocked. You cannot block at UP->DRAIN; otherwise, from Helix's perspective the partition will still be in the UP state, and it won't bring the new partition online. The code logic could be something like below.
>>>
>>> class MyModel extends StateModel {
>>>   @Transition(from = "UP", to = "DRAIN")
>>>   public void upToDrain(Message message, NotificationContext context) {
>>>     // you may set a flag here so the node takes no new jobs
>>>   }
>>>
>>>   @Transition(from = "DRAIN", to = "OFFLINE")
>>>   public void drainToOffline(Message message, NotificationContext context) {
>>>     // wait until all jobs have completed
>>>     // additional cleanup work
>>>   }
>>>
>>>   @Transition(from = "OFFLINE", to = "UP")
>>>   public void offlineToUp(Message message, NotificationContext context) {
>>>     // get ready to take new jobs
>>>   }
>>> }
>>>
>>> On Wed, May 13, 2020 at 11:24 AM santosh gujar <[email protected]> wrote:
>>>
>>>> Thanks a lot Lei. I assume by blocking you mean blocking in the transition method that is called, e.g. the following pseudo code:
>>>>
>>>> class MyModel extends StateModel {
>>>>   @Transition(from = "UP", to = "DRAIN")
>>>>   public void upToDrain(Message message, NotificationContext context) {
>>>>     // don't return until the long-running job has finished
>>>>   }
>>>> }
>>>>
>>>> On Wed, May 13, 2020 at 10:40 PM Lei Xia <[email protected]> wrote:
>>>>
>>>>> Hi, Santosh
>>>>>
>>>>> Thanks for explaining your case in detail. In this case, I would recommend the "OFFLINE->UP->DRAIN->OFFLINE" model. You can set a constraint on your state model to limit the number of replicas in the UP state to 1, i.e., Helix will make sure there is only one replica in UP at the same time. When you are ready to drain an instance, disable the instance first; Helix will then transition all partitions (jobs) on that instance to DRAIN and then OFFLINE, and you can block in the DRAIN->OFFLINE transition until all jobs are completed. On the other hand, once the old partition is in the DRAIN state, Helix should bring a new partition up (OFFLINE->UP) on a new node.
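The blocking wait that the DRAIN->OFFLINE handler needs can be sketched outside Helix as plain Java. The JobTracker class below and all of its method names are hypothetical; it only illustrates the wait-for-completion logic a drainToOffline implementation would delegate to.

```java
// Hypothetical job tracker (not a Helix API): the UP->DRAIN handler calls
// startDrain() so no new jobs are accepted, and the DRAIN->OFFLINE handler
// calls awaitDrain() to block until every in-flight job has finished.
class JobTracker {
    private int running = 0;
    private boolean draining = false;

    // Called when a new job is routed to this partition; rejected while draining.
    synchronized boolean tryStartJob() {
        if (draining) return false;
        running++;
        return true;
    }

    // Called by each job when it completes.
    synchronized void finishJob() {
        running--;
        notifyAll();
    }

    // UP->DRAIN handler: stop accepting new jobs.
    synchronized void startDrain() {
        draining = true;
    }

    // DRAIN->OFFLINE handler: block until all in-flight jobs are done.
    synchronized void awaitDrain() throws InterruptedException {
        while (running > 0) {
            wait();
        }
    }
}
```

Because awaitDrain() blocks inside the transition method, Helix will not consider the partition OFFLINE (and hence will not finish the handoff) until the drain is genuinely complete, which matches Lei's description above.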
>>>>>
>>>>> Lei
>>>>>
>>>>> On Tue, May 12, 2020 at 10:58 AM santosh gujar <[email protected]> wrote:
>>>>>
>>>>>> Hi Hunter,
>>>>>>
>>>>>> Due to various limitations and constraints at this moment, I cannot go down the path of the Task Framework.
>>>>>>
>>>>>> Thanks,
>>>>>> Santosh
>>>>>>
>>>>>> On Tue, May 12, 2020 at 7:23 PM Hunter Lee <[email protected]> wrote:
>>>>>>
>>>>>>> Alternative idea:
>>>>>>>
>>>>>>> Have you considered using the Task Framework's targeted jobs for this use case? You could make the jobs long-running, and this way you save yourself the trouble of having to implement the routing layer (simply specifying which partition to target in your JobConfig would do it).
>>>>>>>
>>>>>>> The Task Framework doesn't actively terminate running threads on the worker (Participant) nodes, so you could achieve the effect of "draining" a node by letting previously assigned tasks finish, i.e. by not actively canceling them in your cancel() logic.
>>>>>>>
>>>>>>> Hunter
>>>>>>>
>>>>>>> On Tue, May 12, 2020 at 1:02 AM santosh gujar <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Lei,
>>>>>>>>
>>>>>>>> Thanks a lot for your time and response.
>>>>>>>>
>>>>>>>> Some more context about the Helix partition that I mentioned in my earlier email: my thinking is to map multiple long jobs to a Helix partition by running some hash function (the simplest is taking a mod of a job id).
>>>>>>>>
>>>>>>>> "What exactly do you need to do to bring a job from OFFLINE to STARTUP?"
>>>>>>>> I added STARTUP to track the fact that a partition could be hosted on two nodes simultaneously; I doubt the OFFLINE->UP->OFFLINE model can give me such information.
>>>>>>>>
>>>>>>>> "Once the job (partition) on node-1 goes OFFLINE, Helix will bring up the job on node-2 (OFFLINE->UP)"
>>>>>>>> I think it may not work in my case.
>>>>>>>> Here is how I see the implications:
>>>>>>>> 1. While node-1 is draining, old jobs continue to run, but I want new jobs (for the same partition) to be hosted by the new node. Think of it as a partition moving from one node to the other, but over a long time (hours), determined by when all existing jobs running on node-1 finish.
>>>>>>>> 2. As per your suggestion, node-2 serves the partition only when node-1 is offline. But that cannot satisfy point 1 above. One workaround is to handle the UP->OFFLINE transition event in the application, save the information about node-1 somewhere, and later use it to distinguish old jobs from new jobs. But this information would be stored outside Helix, which I wanted to avoid. What attracted me to Helix is its auto-rebalancing capability and that it is a central store of cluster state, which I can use for my routing logic.
>>>>>>>> 3. A job could run for hours, and thus a drain can last a long time.
>>>>>>>>
>>>>>>>> "How long would you expect OFFLINE->UP to take here? If it is fast, the switch should be fast."
>>>>>>>> OFFLINE->UP is fast. As I describe above, it is the drain on the previously serving node that is slow; the existing jobs cannot be preempted and moved to a new node.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Santosh
>>>>>>>>
>>>>>>>> On Tue, May 12, 2020 at 10:40 AM Lei Xia <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi, Santosh
>>>>>>>>>
>>>>>>>>> One question: what exactly do you need to do to bring a job from OFFLINE to STARTUP? Can we simply use the OFFLINE->UP->OFFLINE model? From OFFLINE->UP you get the job started and ready to serve requests. From UP->OFFLINE you block until the job gets drained.
>>>>>>>>>
>>>>>>>>> With this state model, you can start to drain a node by disabling it. Once a node is disabled, Helix will send the UP->OFFLINE transition to all partitions on that node; in your implementation of the UP->OFFLINE transition, you block until the job completes. Once the job (partition) on node-1 goes OFFLINE, Helix will bring up the job on node-2 (OFFLINE->UP). Does this work for you? How long would you expect OFFLINE->UP to take here? If it is fast, the switch should be fast.
>>>>>>>>>
>>>>>>>>> Lei
>>>>>>>>>
>>>>>>>>> On Mon, May 11, 2020 at 9:02 PM santosh gujar <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Yes, there would be a database.
>>>>>>>>>> So far I have the following state model for a partition: OFFLINE->STARTUP->UP->DRAIN->OFFLINE. But I don't know how to express the following:
>>>>>>>>>> 1. How to trigger DRAIN (for example, when we decide to take a node out for maintenance).
>>>>>>>>>> 2. Once a drain has started, I expect the Helix rebalancer to kick in and simultaneously place the partition on another node in STARTUP mode.
>>>>>>>>>> 3. Once all jobs on node-1 are done, I need a manual way to trigger its transition to OFFLINE and move the other replica to the UP state.
>>>>>>>>>>
>>>>>>>>>> It might be that my thinking about how to fit this into the Helix model is entirely wrong, but essentially the above is the sequence I want to achieve. Any pointers will be of great help. The constraint is that these are long-running jobs that cannot be moved immediately to another node.
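The behavior Lei and Santosh are converging on, legal transitions OFFLINE->UP->DRAIN->OFFLINE plus an upper bound of one UP replica per partition, can be sketched as a small plain-Java simulation. The DrainModel class below is purely illustrative (it is not Helix's StateModelDefinition or rebalancer); it only shows that under the bound, a second node can go UP as soon as the first moves to DRAIN.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch (not a Helix API): the OFFLINE->UP->DRAIN->OFFLINE
// model for one partition, with at most one replica in UP at a time.
class DrainModel {
    enum State { OFFLINE, UP, DRAIN }

    // node name -> current state of this partition's replica on that node
    private final Map<String, State> replicas = new HashMap<>();

    void addReplica(String node) {
        replicas.put(node, State.OFFLINE);
    }

    private long upCount() {
        return replicas.values().stream().filter(s -> s == State.UP).count();
    }

    // Apply a transition if it is legal and does not break the UP <= 1 bound.
    boolean transition(String node, State to) {
        State from = replicas.get(node);
        boolean legal =
            (from == State.OFFLINE && to == State.UP) ||
            (from == State.UP && to == State.DRAIN) ||
            (from == State.DRAIN && to == State.OFFLINE);
        if (!legal) return false;
        if (to == State.UP && upCount() >= 1) return false; // UP upper bound = 1
        replicas.put(node, to);
        return true;
    }

    State stateOf(String node) { return replicas.get(node); }
}
```

With this bound, N2 is refused UP while N1 still holds it, but is admitted the moment N1 moves to DRAIN, which is exactly the handoff described above.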
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Santosh
>>>>>>>>>>
>>>>>>>>>> On Tue, May 12, 2020 at 1:25 AM kishore g <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> I was thinking exactly in that direction - having two states is the right thing to do. Before we get there, one more question:
>>>>>>>>>>>
>>>>>>>>>>> - When you get a request for a job, how do you know whether that job is old or new? Is there a database that provides the mapping between job and node?
>>>>>>>>>>>
>>>>>>>>>>> On Mon, May 11, 2020 at 12:44 PM santosh gujar <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thank you Kishore,
>>>>>>>>>>>>
>>>>>>>>>>>> During the drain process, N2 will start new jobs; requests related to old jobs need to go to N1, and requests for new jobs need to go to N2. Thus, during a drain on N1, the partition could be present on both nodes.
>>>>>>>>>>>>
>>>>>>>>>>>> My current thinking is that in Helix I somehow need to model this as partition P having two different states on these two nodes, e.g. N1 could have partition P in the DRAIN state while N2 has partition P in the START_UP state.
>>>>>>>>>>>> I don't know if my thinking about states is correct, but I am looking for any pointers.
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Santosh
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, May 12, 2020 at 1:01 AM kishore g <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> What happens to requests during the drain process? I.e. when you put N1 out of service, and while N2 is waiting for N1 to finish the jobs, where will the requests for P go - N1 or N2?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, May 11, 2020 at 12:19 PM santosh gujar <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am looking for some clues or inputs on how to achieve the following.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am working on a service that runs stateful, long-running jobs on a node. These long-running jobs cannot be preempted and continued on other nodes.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Problem requirements:
>>>>>>>>>>>>>> 1. In Helix nomenclature, let's say a Helix partition P involves J such jobs running on a node (N1).
>>>>>>>>>>>>>> 2. When I put the node in a drain, I want Helix to assign a new node to this partition, so that P is also started on the new node (N2).
>>>>>>>>>>>>>> 3. N1 can be put out of service only when all running jobs (J) on it are over; at that point, only N2 will serve requests for P.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Questions:
>>>>>>>>>>>>>> 1. Can the drain process be modeled using Helix?
>>>>>>>>>>>>>> 2. If yes, is there any recipe or are there pointers for a Helix state model?
>>>>>>>>>>>>>> 3. Is there any custom way to trigger state transitions?
>>>>>>>>>>>>>> From the documentation, I gather that the Helix controller in FULL_AUTO mode triggers state transitions only when the number of partitions changes or the cluster changes (node addition or deletion).
>>>>>>>>>>>>>> 4. I guess a spectator will be needed for the custom routing logic in such cases; any pointers for the same?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thank you,
>>>>>>>>>>>>>> Santosh
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Lei Xia
>>>>>
>>>>> --
>>>>> Lei Xia
>>>
>>> --
>>> Lei Xia
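The routing question raised in the thread, sending old jobs to the draining replica and new jobs to the UP replica, comes down to a small decision rule once the spectator can observe each node's state for the partition. The sketch below is plain Java with hypothetical names (DrainAwareRouter, the state strings); in a real deployment the nodeStates map would be populated from the spectator's view of the cluster rather than built by hand.

```java
import java.util.Map;

// Hypothetical routing sketch: given each node's state for a partition
// (as a spectator would observe it), pick the node for a request.
class DrainAwareRouter {
    // isNewJob: true for jobs created after the drain began.
    static String route(Map<String, String> nodeStates, boolean isNewJob) {
        String wanted = isNewJob ? "UP" : "DRAIN";
        for (Map.Entry<String, String> e : nodeStates.entrySet()) {
            if (wanted.equals(e.getValue())) return e.getKey();
        }
        // No DRAIN replica means no drain is in progress, so old jobs
        // also go to the UP replica.
        for (Map.Entry<String, String> e : nodeStates.entrySet()) {
            if ("UP".equals(e.getValue())) return e.getKey();
        }
        return null; // no live replica for this partition
    }
}
```

This keeps the old-vs-new decision inside the routing layer, driven entirely by the replica states Helix already stores, rather than by bookkeeping held outside Helix.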
