Thanks a lot Lei,

One last question on this topic:

I gather from the documentation that the Helix controller is the one that
directs state transitions in a greedy fashion. But this is a synchronous
call; e.g., in the example that we have been discussing, the moment the
call returns from UpToDrain(), the controller will call DrainToOffline()
immediately and also update the states in ZooKeeper accordingly. Is my
understanding correct?

If yes, is there any way the transition can be asynchronous? I.e., I get
notified for the UP->DRAIN transition, but DRAIN->OFFLINE happens only when
I call some API on the Helix controller. E.g., in my case, I would have to
wait via some kind of wait()/sleep() until all other jobs are over. But
that could introduce brittleness, in that the process handling the state
transition cannot crash until all other jobs (which could be running as
separate processes) are finished. My preference would be to call back an
API on the Helix controller to trigger the further state transition
(DRAIN->OFFLINE) for the partition.
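
To make the question concrete, here is a rough sketch of the blocking
approach and the kind of external callback I have in mind (the
onAllJobsFinished() method is hypothetical, just to show the shape of the
signal):

import java.util.concurrent.CountDownLatch;

import org.apache.helix.NotificationContext;
import org.apache.helix.model.Message;
import org.apache.helix.participant.statemachine.StateModel;
import org.apache.helix.participant.statemachine.Transition;

class MyModel extends StateModel {
  // Hypothetical: released by an external signal (e.g. an RPC handler)
  // once all jobs for this partition have finished.
  private final CountDownLatch allJobsDone = new CountDownLatch(1);

  @Transition(from = "DRAIN", to = "OFFLINE")
  public void DrainToOffline(Message message, NotificationContext context) {
    try {
      // Today I would have to block here; what I would prefer is to return
      // immediately and later ask the controller to move DRAIN->OFFLINE via
      // an API call.
      allJobsDone.await();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  // Called from outside (e.g. an RPC endpoint) when the last job finishes.
  public void onAllJobsFinished() {
    allJobsDone.countDown();
  }
}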

Thanks,
Santosh



On Thu, May 14, 2020 at 1:28 AM Lei Xia <[email protected]> wrote:

> Hi, Santosh
>
>   I meant the DRAIN->OFFLINE transition should be blocked. You cannot
> block at UP->DRAIN; otherwise, from Helix's perspective the partition will
> still be in the UP state, and Helix won't bring a new partition online.
> The code logic could be something like below.
>
> class MyModel extends StateModel {
>   @Transition(from = "UP", to = "DRAIN")
>   public void UpToDrain(Message message, NotificationContext context) {
>     // you may flip some flags here so this replica stops taking new jobs
>   }
>
>   @Transition(from = "DRAIN", to = "OFFLINE")
>   public void DrainToOffline(Message message, NotificationContext context) {
>     // block here until all jobs have completed (e.g. wait on a latch)
>     // additional cleanup work afterwards
>   }
>
>   @Transition(from = "OFFLINE", to = "UP")
>   public void OfflineToUP(Message message, NotificationContext context) {
>     // get ready to take new jobs
>   }
> }
>
> On Wed, May 13, 2020 at 11:24 AM santosh gujar <[email protected]>
> wrote:
>
>>
>> Thanks a lot Lei. I assume by blocking you mean blocking inside the
>> transition method that gets called,
>>
>> e.g. the following pseudocode:
>>
>> class MyModel extends StateModel {
>>   @Transition(from = "UP", to = "DRAIN")
>>   public void upToDrain(Message message, NotificationContext context) {
>>     // don't return while the long-running jobs are still running
>>   }
>> }
>>
>> On Wed, May 13, 2020 at 10:40 PM Lei Xia <[email protected]> wrote:
>>
>>> Hi, Santosh
>>>
>>>   Thanks for explaining your case in detail. In this case, I would
>>> recommend using an "OFFLINE->UP->DRAIN->OFFLINE" model. You can set a
>>> constraint on the model to limit the number of replicas in the UP state
>>> to 1, i.e., Helix will make sure there is only one replica in UP at any
>>> given time. When you are ready to drain an instance, disable the
>>> instance first; Helix will then transition all partitions (jobs) on that
>>> instance to DRAIN and then OFFLINE, and you can block at the
>>> DRAIN->OFFLINE transition until all jobs are completed. Meanwhile, once
>>> the old partition is in the DRAIN state, Helix should bring a new
>>> replica up to UP (OFFLINE->UP) on a new node.
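>>>
>>>   For example, a rough sketch of how such a state model definition could
>>> be registered (state priorities, cluster name, and ZK address are
>>> placeholders; treat this as a sketch, not the exact setup for your
>>> cluster):
>>>
>>> import org.apache.helix.HelixAdmin;
>>> import org.apache.helix.manager.zk.ZKHelixAdmin;
>>> import org.apache.helix.model.StateModelDefinition;
>>>
>>> public class StateModelSketch {
>>>   public static void main(String[] args) {
>>>     StateModelDefinition.Builder builder =
>>>         new StateModelDefinition.Builder("OfflineUpDrain");
>>>     builder.addState("UP", 1);       // priority 1 = highest
>>>     builder.addState("DRAIN", 2);
>>>     builder.addState("OFFLINE", 3);
>>>     builder.initialState("OFFLINE");
>>>     builder.addTransition("OFFLINE", "UP");
>>>     builder.addTransition("UP", "DRAIN");
>>>     builder.addTransition("DRAIN", "OFFLINE");
>>>     builder.upperBound("UP", 1);     // at most 1 replica in UP per partition
>>>
>>>     HelixAdmin admin = new ZKHelixAdmin("localhost:2181");
>>>     admin.addStateModelDef("MyCluster", "OfflineUpDrain", builder.build());
>>>   }
>>> }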
>>>
>>>
>>>
>>> Lei
>>>
>>> On Tue, May 12, 2020 at 10:58 AM santosh gujar <[email protected]>
>>> wrote:
>>>
>>>> Hi Hunter,
>>>>
>>>> For various limitations and constraints at this moment, I cannot go
>>>> down the path of Task Framework.
>>>>
>>>> Thanks,
>>>> Santosh
>>>>
>>>> On Tue, May 12, 2020 at 7:23 PM Hunter Lee <[email protected]> wrote:
>>>>
>>>>> Alternative idea:
>>>>>
>>>>> Have you considered using Task Framework's targeted jobs for this use
>>>>> case? You could make the jobs long-running, and this way, you save 
>>>>> yourself
>>>>> the trouble of having to implement the routing layer (simply specifying
>>>>> which partition to target in your JobConfig would do it).
>>>>>
>>>>> Task Framework doesn't actively terminate running threads on the
>>>>> worker (Participant) nodes, so you could achieve the effect of
>>>>> "draining" a node by letting previously assigned tasks finish, i.e., by
>>>>> not actively canceling them in your cancel() logic.
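>>>>>
>>>>> A rough sketch of what a targeted job could look like (the command,
>>>>> resource, and partition names are placeholders, and manager is assumed
>>>>> to be your connected HelixManager):
>>>>>
>>>>> import java.util.Arrays;
>>>>>
>>>>> import org.apache.helix.HelixManager;
>>>>> import org.apache.helix.task.JobConfig;
>>>>> import org.apache.helix.task.TaskDriver;
>>>>> import org.apache.helix.task.Workflow;
>>>>>
>>>>> public class TargetedJobSketch {
>>>>>   public static void submit(HelixManager manager) {
>>>>>     // Targeted job: its tasks run on whichever instance currently
>>>>>     // hosts the target partition of the target resource.
>>>>>     JobConfig.Builder job = new JobConfig.Builder()
>>>>>         .setCommand("RunLongJob")
>>>>>         .setTargetResource("MyResource")
>>>>>         .setTargetPartitions(Arrays.asList("MyResource_0"));
>>>>>
>>>>>     Workflow.Builder workflow = new Workflow.Builder("long-running-jobs")
>>>>>         .addJob("job-1", job);
>>>>>
>>>>>     new TaskDriver(manager).start(workflow.build());
>>>>>   }
>>>>> }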
>>>>>
>>>>> Hunter
>>>>>
>>>>> On Tue, May 12, 2020 at 1:02 AM santosh gujar <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi Lei,
>>>>>>
>>>>>> Thanks a lot for your time and response.
>>>>>>
>>>>>> Some more context about the Helix partition that I mentioned in my
>>>>>> earlier email: my thinking is to map multiple long jobs to a Helix
>>>>>> partition via some hash function (simplest is taking a mod of a job ID).
>>>>>>
>>>>>> "what exactly you need to do to bring a job from OFFLINE to STARTUP?"
>>>>>> I added STARTUP to track the fact that a partition could be hosted on
>>>>>> two nodes simultaneously; I doubt the OFFLINE->UP->OFFLINE model can
>>>>>> give me that information.
>>>>>>
>>>>>> "Once the job (partition) on node-1 goes OFFLINE, Helix will bring
>>>>>> up the job in node-2 (OFFLINE->UP)"
>>>>>> I think it may not work in my case. Here is how I see the implications:
>>>>>> 1. While node1 is draining, old jobs continue to run, but I want new
>>>>>> jobs (for the same partition) to be hosted on the new node. Think of it
>>>>>> as a partition moving from one node to another, but over a long time
>>>>>> (hours), determined by when all existing jobs running on node1 finish.
>>>>>> 2. As per your suggestion, node-2 serves the partition only when
>>>>>> node-1 is offline. But that cannot satisfy 1 above.
>>>>>> One workaround I could use is to handle the UP->OFFLINE transition
>>>>>> event in the application and save the information about node1
>>>>>> somewhere, then use this information later to distinguish old jobs from
>>>>>> new jobs. But this information would be stored outside Helix, and I
>>>>>> wanted to avoid that. What attracted me to Helix is its auto-rebalancing
>>>>>> capability and that it is a central storage for cluster state which I
>>>>>> can use for my routing logic.
>>>>>> 3. A job could run for hours, and thus a drain can take a long time.
>>>>>>
>>>>>>
>>>>>> "How long you would expect OFFLINE->UP take here, if it is fast,
>>>>>> the switch should be fast."
>>>>>> OFFLINE->UP is fast. As I describe above, it's the drain on the
>>>>>> previously running node that is slow; the existing jobs cannot be
>>>>>> preempted to move to the new node.
>>>>>>
>>>>>> Regards,
>>>>>> Santosh
>>>>>>
>>>>>> On Tue, May 12, 2020 at 10:40 AM Lei Xia <[email protected]> wrote:
>>>>>>
>>>>>>> Hi, Santosh
>>>>>>>
>>>>>>>   One question: what exactly do you need to do to bring a job from
>>>>>>> OFFLINE to STARTUP? Can we simply use an OFFLINE->UP->OFFLINE model?
>>>>>>> On OFFLINE->UP you get the job started and ready to serve requests.
>>>>>>> On UP->OFFLINE you block there until the job gets drained.
>>>>>>>
>>>>>>>  With this state model, you can start to drain a node by disabling
>>>>>>> it. Once a node is disabled, Helix will send the UP->OFFLINE
>>>>>>> transition to all partitions on that node; in your implementation of
>>>>>>> the UP->OFFLINE transition, you block there until the job completes.
>>>>>>> Once the job (partition) on node-1 goes OFFLINE, Helix will bring up
>>>>>>> the job on node-2 (OFFLINE->UP). Does this work for you? How long
>>>>>>> would you expect OFFLINE->UP to take here? If it is fast, the switch
>>>>>>> should be fast.
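>>>>>>>
>>>>>>>  For reference, disabling an instance is just an admin call, roughly
>>>>>>> like the following sketch (cluster name, instance name, and ZK
>>>>>>> address are placeholders):
>>>>>>>
>>>>>>> import org.apache.helix.HelixAdmin;
>>>>>>> import org.apache.helix.manager.zk.ZKHelixAdmin;
>>>>>>>
>>>>>>> public class DrainSketch {
>>>>>>>   public static void main(String[] args) {
>>>>>>>     // Disabling the instance makes Helix move its partitions through
>>>>>>>     // the UP->OFFLINE (or UP->DRAIN->OFFLINE) transitions.
>>>>>>>     HelixAdmin admin = new ZKHelixAdmin("localhost:2181");
>>>>>>>     admin.enableInstance("MyCluster", "node-1", false);
>>>>>>>   }
>>>>>>> }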
>>>>>>>
>>>>>>>
>>>>>>> Lei
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, May 11, 2020 at 9:02 PM santosh gujar <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Yes, there would be a database.
>>>>>>>> So far I have the following state model for a partition:
>>>>>>>> OFFLINE->STARTUP->UP->DRAIN->OFFLINE. But I don't know how to express
>>>>>>>> the following:
>>>>>>>> 1. How to trigger a drain (e.g., when we decide to take a node out
>>>>>>>> for maintenance).
>>>>>>>> 2. Once a drain has started, I expect the Helix rebalancer to kick in
>>>>>>>> and simultaneously bring the partition up on another node in STARTUP
>>>>>>>> state.
>>>>>>>> 3. Once all jobs on node1 are done, I need a manual way to transition
>>>>>>>> it to OFFLINE and move the other replica to the UP state.
>>>>>>>>
>>>>>>>> It might be that my thinking about how to fit this into the Helix
>>>>>>>> model is entirely wrong, but essentially the above is the sequence I
>>>>>>>> want to achieve. Any pointers will be of great help. The constraint
>>>>>>>> is that these are long-running jobs that cannot be moved immediately
>>>>>>>> to another node.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Santosh
>>>>>>>>
>>>>>>>> On Tue, May 12, 2020 at 1:25 AM kishore g <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I was thinking exactly in that direction - having two states is
>>>>>>>>> the right thing to do. Before we get there, one more question:
>>>>>>>>>
>>>>>>>>> - when you get a request for a job, how do you know if that job is
>>>>>>>>> old or new? Is there a database that provides the mapping between
>>>>>>>>> job and node?
>>>>>>>>>
>>>>>>>>> On Mon, May 11, 2020 at 12:44 PM santosh gujar <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Thank You Kishore,
>>>>>>>>>>
>>>>>>>>>> During the drain process N2 will start new jobs; the requests
>>>>>>>>>> related to old jobs need to go to N1 and requests for new jobs need
>>>>>>>>>> to go to N2. Thus, during the drain on N1, the partition could be
>>>>>>>>>> present on both nodes.
>>>>>>>>>>
>>>>>>>>>> My current thinking is that in Helix I somehow need to model this
>>>>>>>>>> as partition P having two different states on these two nodes,
>>>>>>>>>> e.g., N1 could have partition P in the DRAIN state and N2 could
>>>>>>>>>> have partition P in the STARTUP state.
>>>>>>>>>> I don't know if my thinking about states is correct, but I am
>>>>>>>>>> looking for any pointers.
>>>>>>>>>>
>>>>>>>>>> Regards
>>>>>>>>>> Santosh
>>>>>>>>>>
>>>>>>>>>> On Tue, May 12, 2020 at 1:01 AM kishore g <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> What happens to requests during the drain process? I.e., when you
>>>>>>>>>>> put N1 out of service and N2 is waiting for N1 to finish the jobs,
>>>>>>>>>>> where will the requests for P go - N1 or N2?
>>>>>>>>>>>
>>>>>>>>>>> On Mon, May 11, 2020 at 12:19 PM santosh gujar <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hello,
>>>>>>>>>>>>
>>>>>>>>>>>> I am looking for some clues or inputs on how to achieve the
>>>>>>>>>>>> following.
>>>>>>>>>>>>
>>>>>>>>>>>> I am working on a service that involves running stateful,
>>>>>>>>>>>> long-running jobs on a node. These long-running jobs cannot be
>>>>>>>>>>>> preempted and continued on other nodes.
>>>>>>>>>>>>
>>>>>>>>>>>> Problem requirements:
>>>>>>>>>>>> 1. In Helix nomenclature, let's say a Helix partition P involves
>>>>>>>>>>>> J such jobs running on a node (N1).
>>>>>>>>>>>> 2. When I put the node in a drain, I want Helix to assign a new
>>>>>>>>>>>> node to this partition, so that P is also started on the new node
>>>>>>>>>>>> (N2).
>>>>>>>>>>>> 3. N1 can be put out of service only when all running jobs (J) on
>>>>>>>>>>>> it are over; at that point only N2 will serve requests for P.
>>>>>>>>>>>>
>>>>>>>>>>>> Questions:
>>>>>>>>>>>> 1. Can the drain process be modeled using Helix?
>>>>>>>>>>>> 2. If yes, is there any recipe or are there pointers for a Helix
>>>>>>>>>>>> state model?
>>>>>>>>>>>> 3. Is there any custom way to trigger state transitions? From the
>>>>>>>>>>>> documentation, I gather that the Helix controller in full-auto
>>>>>>>>>>>> mode triggers state transitions only when the number of partitions
>>>>>>>>>>>> changes or the cluster changes (node addition or deletion).
>>>>>>>>>>>> 4. I guess a spectator will be needed to implement custom routing
>>>>>>>>>>>> logic in such cases; any pointers for the same? (A rough sketch of
>>>>>>>>>>>> what I have in mind follows below.)
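>>>>>>>>>>>>
>>>>>>>>>>>> For question 4, here is a minimal sketch of spectator-side
>>>>>>>>>>>> routing with RoutingTableProvider (cluster, resource, partition,
>>>>>>>>>>>> and state names are placeholders for illustration):
>>>>>>>>>>>>
>>>>>>>>>>>> import java.util.List;
>>>>>>>>>>>>
>>>>>>>>>>>> import org.apache.helix.HelixManager;
>>>>>>>>>>>> import org.apache.helix.HelixManagerFactory;
>>>>>>>>>>>> import org.apache.helix.InstanceType;
>>>>>>>>>>>> import org.apache.helix.model.InstanceConfig;
>>>>>>>>>>>> import org.apache.helix.spectator.RoutingTableProvider;
>>>>>>>>>>>>
>>>>>>>>>>>> public class RouterSketch {
>>>>>>>>>>>>   public static void main(String[] args) throws Exception {
>>>>>>>>>>>>     HelixManager manager = HelixManagerFactory.getZKHelixManager(
>>>>>>>>>>>>         "MyCluster", "router-1", InstanceType.SPECTATOR,
>>>>>>>>>>>>         "localhost:2181");
>>>>>>>>>>>>     manager.connect();
>>>>>>>>>>>>
>>>>>>>>>>>>     // Keep a routing table in sync with the external view.
>>>>>>>>>>>>     RoutingTableProvider routingTable = new RoutingTableProvider();
>>>>>>>>>>>>     manager.addExternalViewChangeListener(routingTable);
>>>>>>>>>>>>
>>>>>>>>>>>>     // Which instances currently host this partition in UP state?
>>>>>>>>>>>>     List<InstanceConfig> upInstances = routingTable.getInstances(
>>>>>>>>>>>>         "MyResource", "MyResource_0", "UP");
>>>>>>>>>>>>   }
>>>>>>>>>>>> }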
>>>>>>>>>>>>
>>>>>>>>>>>> Thank You
>>>>>>>>>>>> Santosh
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Lei Xia
>>>>>>>
>>>>>>
>>>
>>> --
>>> Lei Xia
>>>
>>
>
> --
> Lei Xia
>
