Hi, Santosh
I meant the DRAIN->OFFLINE transition should be blocked. You cannot block
at UP->DRAIN; otherwise, from Helix's perspective the partition will still
be in the UP state, and Helix won't bring the new partition online. The
code logic could look something like this:
class MyModel extends StateModel {
  @Transition(from = "UP", to = "DRAIN")
  public void upToDrain(Message message, NotificationContext context) {
    // you may set a flag here so this node stops taking new jobs
  }

  @Transition(from = "DRAIN", to = "OFFLINE")
  public void drainToOffline(Message message, NotificationContext context) {
    // block here until all running jobs have completed
    // additional cleanup work
  }

  @Transition(from = "OFFLINE", to = "UP")
  public void offlineToUp(Message message, NotificationContext context) {
    // get ready to take new jobs
  }
}
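For the DRAIN->OFFLINE body, the "wait until all jobs complete" part can be
done with plain Java synchronization. Here is a minimal sketch; JobTracker is
a hypothetical helper class (not part of the Helix API) that counts in-flight
jobs, under the assumption that your job-dispatch path can call it on job
start and finish:

```java
// Hypothetical helper (not part of Helix): counts in-flight jobs so the
// DRAIN->OFFLINE transition can block until everything running has finished.
class JobTracker {
    private int running = 0;
    private boolean draining = false;

    // Called when a new job request arrives; returns false once draining
    // has started, so the node stops accepting new work.
    synchronized boolean tryStartJob() {
        if (draining) {
            return false;
        }
        running++;
        return true;
    }

    // Called when a job finishes; wakes the draining thread once the
    // last in-flight job completes.
    synchronized void finishJob() {
        running--;
        if (running == 0) {
            notifyAll();
        }
    }

    // Called from the DRAIN->OFFLINE transition body: marks the node as
    // draining and blocks until the in-flight job count drops to zero.
    synchronized void awaitDrain() throws InterruptedException {
        draining = true;
        while (running > 0) {
            wait();
        }
    }
}
```

With this, drainToOffline() would simply call awaitDrain(), while upToDrain()
needs no body at all, since tryStartJob() starts rejecting new work the
moment draining begins.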
On Wed, May 13, 2020 at 11:24 AM santosh gujar <[email protected]>
wrote:
>
> Thanks a lot Lei. I assume by blocking you mean blocking inside the
> transition method that is called,
>
> e.g. the following pseudo code:
>
> class MyModel extends StateModel {
>   @Transition(from = "UP", to = "DRAIN")
>   public void upToDrain(Message message, NotificationContext context) {
>     // don't return while the long-running job is still running
>   }
> }
>
> On Wed, May 13, 2020 at 10:40 PM Lei Xia <[email protected]> wrote:
>
>> Hi, Santosh
>>
>> Thanks for explaining your case in detail. In this case, I would
>> recommend using an "OFFLINE->UP->DRAIN->OFFLINE" model. You can set a
>> constraint on your model to limit the number of replicas in the UP state
>> to 1, i.e., Helix will make sure there is only 1 replica in UP at any
>> given time. When you are ready to drain an instance, disable the
>> instance first; Helix will then transition all partitions (jobs) on that
>> instance to DRAIN and then OFFLINE, and you can block in the
>> DRAIN->OFFLINE transition until all jobs are completed. Meanwhile, once
>> the old partition is in the DRAIN state, Helix should bring up a new
>> partition to UP (OFFLINE->UP) on a new node.
>>
>>
>>
>> Lei
>>
>> On Tue, May 12, 2020 at 10:58 AM santosh gujar <[email protected]>
>> wrote:
>>
>>> Hi Hunter,
>>>
>>> Due to various limitations and constraints at the moment, I cannot go
>>> down the path of the Task Framework.
>>>
>>> Thanks,
>>> Santosh
>>>
>>> On Tue, May 12, 2020 at 7:23 PM Hunter Lee <[email protected]> wrote:
>>>
>>>> Alternative idea:
>>>>
>>>> Have you considered using Task Framework's targeted jobs for this use
>>>> case? You could make the jobs long-running, and this way, you save yourself
>>>> the trouble of having to implement the routing layer (simply specifying
>>>> which partition to target in your JobConfig would do it).
>>>>
>>>> Task Framework doesn't actively terminate running threads on the worker
>>>> (Participant) nodes, so you could achieve the effect of "draining" the
>>>> node by letting previously assigned tasks finish, i.e., by not actively
>>>> canceling them in your cancel() logic.
>>>>
>>>> Hunter
>>>>
>>>> On Tue, May 12, 2020 at 1:02 AM santosh gujar <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Lei,
>>>>>
>>>>> Thanks a lot for your time and response.
>>>>>
>>>>> Some more context about the Helix partition that I mentioned in my
>>>>> earlier email: my thinking is to map multiple long jobs to a Helix
>>>>> partition via some hash function (the simplest being a mod of the
>>>>> job ID).
>>>>>
>>>>> "what exactly do you need to do to bring a job from OFFLINE to STARTUP?"
>>>>> I added STARTUP to track the fact that a partition could be hosted on
>>>>> two nodes simultaneously; I doubt the OFFLINE->UP->OFFLINE model can
>>>>> give me that information.
>>>>>
>>>>> "Once the job (partition) on node-1 goes OFFLINE, Helix will bring up
>>>>> the job in node-2 (OFFLINE->UP)"
>>>>> I think it may not work in my case. Here are the implications as I
>>>>> see them:
>>>>> 1. While node1 is draining, old jobs continue to run, but I want new
>>>>> jobs (for the same partition) to be hosted on the new node. Think of
>>>>> it as a partition moving from one node to another, but over a long
>>>>> time (hours), as determined by when all existing jobs running on
>>>>> node1 finish.
>>>>> 2. As per your suggestion, node-2 serves the partition only when
>>>>> node-1 is offline, so it cannot satisfy point 1 above.
>>>>> One workaround is to handle the UP->OFFLINE transition event in the
>>>>> application and save the information about node1 somewhere, then use
>>>>> it later to distinguish old jobs from new jobs. But this information
>>>>> would be stored outside Helix, and I wanted to avoid that. What
>>>>> attracted me to Helix is its auto-rebalancing capability and that it
>>>>> is a central store for cluster state, which I can use for my routing
>>>>> logic.
>>>>> 3. A job could be running for hours, so a drain can take a long time.
>>>>>
>>>>>
>>>>> "How long would you expect OFFLINE->UP to take here? If it is fast,
>>>>> the switch should be fast."
>>>>> OFFLINE->UP is fast. As I described above, it's the drain on the
>>>>> previously running node that is slow; the existing jobs cannot be
>>>>> preempted and moved to a new node.
>>>>>
>>>>> Regards,
>>>>> Santosh
>>>>>
>>>>> On Tue, May 12, 2020 at 10:40 AM Lei Xia <[email protected]> wrote:
>>>>>
>>>>>> Hi, Santosh
>>>>>>
>>>>>> One question: what exactly do you need to do to bring a job from
>>>>>> OFFLINE to STARTUP? Can we simply use an OFFLINE->UP->OFFLINE model?
>>>>>> From OFFLINE->UP you get the job started and ready to serve
>>>>>> requests. From UP->OFFLINE you block there until the job gets
>>>>>> drained.
>>>>>>
>>>>>> With this state model, you can start to drain a node by disabling
>>>>>> it. Once a node is disabled, Helix will send an UP->OFFLINE
>>>>>> transition to all partitions on that node; in your implementation of
>>>>>> the UP->OFFLINE transition, you block until the job completes. Once
>>>>>> the job (partition) on node-1 goes OFFLINE, Helix will bring up the
>>>>>> job on node-2 (OFFLINE->UP). Does this work for you? How long would
>>>>>> you expect OFFLINE->UP to take here? If it is fast, the switch
>>>>>> should be fast.
>>>>>>
>>>>>>
>>>>>> Lei
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, May 11, 2020 at 9:02 PM santosh gujar <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Yes, there would be a database.
>>>>>>> So far I have the following state model for a partition:
>>>>>>> OFFLINE->STARTUP->UP->DRAIN->OFFLINE. But I don't know how to
>>>>>>> express the following:
>>>>>>> 1. How to trigger DRAIN (for example, when we decide to take a node
>>>>>>> out for maintenance).
>>>>>>> 2. Once a drain has started, I expect the Helix rebalancer to kick
>>>>>>> in and simultaneously place the partition on another node in the
>>>>>>> STARTUP state.
>>>>>>> 3. Once all jobs on node1 are done, I need a manual way to trigger
>>>>>>> its transition to OFFLINE and move the other replica to the UP
>>>>>>> state.
>>>>>>>
>>>>>>> It might be that my thinking about how to fit this into the Helix
>>>>>>> model is entirely wrong, but the above is essentially the sequence
>>>>>>> I want to achieve. Any pointers would be of great help. The
>>>>>>> constraint is that these are long-running jobs that cannot be moved
>>>>>>> immediately to another node.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Santosh
>>>>>>>
>>>>>>> On Tue, May 12, 2020 at 1:25 AM kishore g <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I was thinking exactly in that direction - having two states is the
>>>>>>>> right thing to do. Before we get there, one more question -
>>>>>>>>
>>>>>>>> - When you get a request for a job, how do you know if that job is
>>>>>>>> old or new? Is there a database that provides the mapping between
>>>>>>>> job and node?
>>>>>>>>
>>>>>>>> On Mon, May 11, 2020 at 12:44 PM santosh gujar <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Thank You Kishore,
>>>>>>>>>
>>>>>>>>> During the drain process N2 will start new jobs; the requests
>>>>>>>>> related to old jobs need to go to N1 and requests for new jobs
>>>>>>>>> need to go to N2. Thus, during the drain on N1, the partition
>>>>>>>>> could be present on both nodes.
>>>>>>>>>
>>>>>>>>> My current thinking is that in Helix I somehow need to model this
>>>>>>>>> as partition P having two different states on these two nodes,
>>>>>>>>> e.g., N1 could have partition P in the DRAIN state and N2 could
>>>>>>>>> have partition P in the START_UP state.
>>>>>>>>> I don't know if my thinking about the states is correct, but I'm
>>>>>>>>> looking for any pointers.
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> Santosh
>>>>>>>>>
>>>>>>>>> On Tue, May 12, 2020 at 1:01 AM kishore g <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> What happens to requests during the drain process? I.e., when
>>>>>>>>>> you put N1 out of service and N2 is waiting for N1 to finish the
>>>>>>>>>> jobs, where will the requests for P go: N1 or N2?
>>>>>>>>>>
>>>>>>>>>> On Mon, May 11, 2020 at 12:19 PM santosh gujar <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> I am looking for some clues or input on how to achieve the
>>>>>>>>>>> following.
>>>>>>>>>>>
>>>>>>>>>>> I am working on a service that runs stateful, long-running jobs
>>>>>>>>>>> on a node. These long-running jobs cannot be preempted and
>>>>>>>>>>> continued on other nodes.
>>>>>>>>>>>
>>>>>>>>>>> Problem requirements:
>>>>>>>>>>> 1. In Helix nomenclature, let's say there is a Helix partition
>>>>>>>>>>> P that involves J such jobs running on a node (N1).
>>>>>>>>>>> 2. When I put the node in a drain, I want Helix to assign a new
>>>>>>>>>>> node, so that partition P is also started on the new node (N2).
>>>>>>>>>>> 3. N1 can be put out of service only when all running jobs (J)
>>>>>>>>>>> on it are over; at that point, only N2 will serve requests for
>>>>>>>>>>> P.
>>>>>>>>>>>
>>>>>>>>>>> Questions:
>>>>>>>>>>> 1. Can the drain process be modeled using Helix?
>>>>>>>>>>> 2. If yes, are there any recipes / pointers for a Helix state
>>>>>>>>>>> model?
>>>>>>>>>>> 3. Is there a custom way to trigger state transitions? From the
>>>>>>>>>>> documentation, I gather that the Helix controller in full-auto
>>>>>>>>>>> mode triggers state transitions only when the number of
>>>>>>>>>>> partitions changes or the cluster changes (node addition or
>>>>>>>>>>> deletion).
>>>>>>>>>>> 4. I guess a Spectator will be needed for custom routing logic
>>>>>>>>>>> in such cases; any pointers for the same?
>>>>>>>>>>>
>>>>>>>>>>> Thank You
>>>>>>>>>>> Santosh
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Lei Xia
>>>>>>
>>>>>
>>
>> --
>> Lei Xia
>>
>
--
Lei Xia