Hello All,

Any inputs on the below?
Thank you, and I appreciate your help.

Regards,
Santosh

On Thu, May 14, 2020 at 2:47 PM santosh gujar <[email protected]> wrote:
> Thanks a lot Lei,
>
> One last question on this topic.
>
> I gather from the documentation that the Helix controller is the one that
> directs state transitions, in a greedy fashion. But this is a synchronous
> call, e.g. in the example we have been discussing, the moment the call
> returns from upToDrain(), the controller will call drainToOffline()
> immediately and also update the states in ZooKeeper accordingly. Is my
> understanding correct?
>
> If yes, is there any way the transition can be asynchronous? I.e., I get
> notified for the UP->DRAIN transition, but DRAIN->OFFLINE happens only
> when I call some API on the Helix controller. In my case, I would have to
> wait via some kind of Thread.wait() / sleep() until all other jobs are
> over. But that could introduce some brittleness, in that the process
> handling the state transition cannot crash until all other jobs (which
> could be running as separate processes) are finished. My preference would
> be to call back an API on the Helix controller to drive the further state
> transition (DRAIN->OFFLINE) for the partition.
>
> Thanks,
> Santosh
>
>
>
> On Thu, May 14, 2020 at 1:28 AM Lei Xia <[email protected]> wrote:
>
>> Hi, Santosh
>>
>> I meant the DRAIN->OFFLINE transition should be blocked. You cannot
>> block at UP->DRAIN; otherwise, from the Helix perspective the partition
>> will still be in the UP state, and Helix won't bring the new partition
>> online. The code logic could be something like below.
>>
>> class MyModel extends StateModel {
>>   @Transition(from = "UP", to = "DRAIN")
>>   public void upToDrain(Message message, NotificationContext context) {
>>     // e.g. flip a flag here so this replica stops taking new jobs
>>   }
>>
>>   @Transition(from = "DRAIN", to = "OFFLINE")
>>   public void drainToOffline(Message message, NotificationContext context) {
>>     // block here until all in-flight jobs have completed,
>>     // e.g. await a latch that job completions count down;
>>     // then do any additional cleanup work
>>   }
>>
>>   @Transition(from = "OFFLINE", to = "UP")
>>   public void offlineToUp(Message message, NotificationContext context) {
>>     // get ready to take new jobs
>>   }
>> }
>>
>> On Wed, May 13, 2020 at 11:24 AM santosh gujar <[email protected]>
>> wrote:
>>
>>> Thanks a lot Lei. I assume that by blocking you mean blocking in the
>>> method that is called for the transition, e.g. the following pseudocode:
>>>
>>> class MyModel extends StateModel {
>>>   @Transition(from = "UP", to = "DRAIN")
>>>   public void upToDrain(Message message, NotificationContext context) {
>>>     // don't return until the long-running jobs have finished
>>>   }
>>> }
>>>
>>> On Wed, May 13, 2020 at 10:40 PM Lei Xia <[email protected]> wrote:
>>>
>>>> Hi, Santosh
>>>>
>>>> Thanks for explaining your case in detail. In this case, I would
>>>> recommend you use an "OFFLINE->UP->DRAIN->OFFLINE" model. You can set a
>>>> constraint on your model to limit the number of replicas in the UP
>>>> state to 1, i.e., Helix will make sure there is only one replica in UP
>>>> at any time. When you are ready to drain an instance, disable the
>>>> instance first; Helix will then transition all partitions (jobs) on
>>>> that instance to DRAIN and then OFFLINE, and you can block at the
>>>> DRAIN->OFFLINE transition until all jobs are completed. On the other
>>>> hand, once the old partition is in the DRAIN state, Helix should bring
>>>> up a new partition to UP (OFFLINE->UP) on a new node.
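>>>>
>>>> To illustrate, defining such a state model with the UP constraint could
>>>> look roughly like the sketch below (untested; "UpDrainOffline" and the
>>>> cluster/instance/ZK names are placeholders, so adapt them to your
>>>> setup):
>>>>
>>>> StateModelDefinition.Builder builder =
>>>>     new StateModelDefinition.Builder("UpDrainOffline");
>>>> // States with their priorities (lower number = higher priority).
>>>> builder.addState("UP", 1);
>>>> builder.addState("DRAIN", 2);
>>>> builder.addState("OFFLINE", 3);
>>>> builder.initialState("OFFLINE");
>>>> // Legal transitions.
>>>> builder.addTransition("OFFLINE", "UP");
>>>> builder.addTransition("UP", "DRAIN");
>>>> builder.addTransition("DRAIN", "OFFLINE");
>>>> // At most one replica of a partition in UP at any time.
>>>> builder.upperBound("UP", 1);
>>>>
>>>> HelixAdmin admin = new ZKHelixAdmin("zk1:2181");
>>>> admin.addStateModelDef("MY_CLUSTER", "UpDrainOffline", builder.build());
>>>> // Later, to start draining a node, disable the instance:
>>>> admin.enableInstance("MY_CLUSTER", "node1_12918", false);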
>>>>
>>>>
>>>> Lei
>>>>
>>>> On Tue, May 12, 2020 at 10:58 AM santosh gujar <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi Hunter,
>>>>>
>>>>> Due to various limitations and constraints at this moment, I cannot go
>>>>> down the path of the Task Framework.
>>>>>
>>>>> Thanks,
>>>>> Santosh
>>>>>
>>>>> On Tue, May 12, 2020 at 7:23 PM Hunter Lee <[email protected]> wrote:
>>>>>
>>>>>> Alternative idea:
>>>>>>
>>>>>> Have you considered using the Task Framework's targeted jobs for this
>>>>>> use case? You could make the jobs long-running, and this way you save
>>>>>> yourself the trouble of having to implement the routing layer (simply
>>>>>> specifying which partition to target in your JobConfig would do it).
>>>>>>
>>>>>> The Task Framework doesn't actively terminate running threads on the
>>>>>> worker (Participant) nodes, so you could achieve the effect of
>>>>>> "draining" a node by letting previously assigned tasks finish, i.e.,
>>>>>> by not actively canceling them in your cancel() logic.
>>>>>>
>>>>>> Hunter
>>>>>>
>>>>>> On Tue, May 12, 2020 at 1:02 AM santosh gujar <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Hi Lei,
>>>>>>>
>>>>>>> Thanks a lot for your time and response.
>>>>>>>
>>>>>>> Some more context about the Helix partition that I mentioned in my
>>>>>>> earlier email: my thinking is to map multiple long jobs to one Helix
>>>>>>> partition by running some hash function (the simplest being a mod of
>>>>>>> the job id).
>>>>>>>
>>>>>>> "what exactly you need to do to bring a job from OFFLINE to STARTUP?"
>>>>>>> I added STARTUP to track the fact that a partition could be hosted
>>>>>>> on two nodes simultaneously; I doubt the OFFLINE->UP->OFFLINE model
>>>>>>> can give me that information.
>>>>>>>
>>>>>>> "Once the job (partition) on node-1 goes OFFLINE, Helix will bring
>>>>>>> up the job in node-2 (OFFLINE->UP)"
>>>>>>> I think that may not work in my case. Here is how I see the
>>>>>>> implications:
>>>>>>> 1. While node1 is draining, old jobs continue to run, but I want new
>>>>>>> jobs (for the same partition) to be hosted by the new node. Think of
>>>>>>> it as a partition moving from one node to another, but over a long
>>>>>>> time (hours), determined by when all existing jobs running on node1
>>>>>>> finish.
>>>>>>> 2. As per your suggestion, node-2 serves the partition only when
>>>>>>> node-1 is offline. But that cannot satisfy (1) above. One workaround
>>>>>>> is to handle the UP->OFFLINE transition event in the application and
>>>>>>> save the information about node1 somewhere, then use it later to
>>>>>>> distinguish old jobs from new jobs. But this information would be
>>>>>>> stored outside Helix, and I wanted to avoid that. What attracted me
>>>>>>> to Helix is its auto-rebalancing capability, and that it is a
>>>>>>> central store for the state of the cluster which I can use for my
>>>>>>> routing logic.
>>>>>>> 3. A job could be running for hours, and thus the drain can take a
>>>>>>> long time.
>>>>>>>
>>>>>>> "How long you would expect OFFLINE->UP take here, if it is fast, the
>>>>>>> switch should be fast."
>>>>>>> OFFLINE->UP is fast. As I describe above, it is the drain on the
>>>>>>> previously running node that is slow; the existing jobs cannot be
>>>>>>> preempted to move to the new node.
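>>>>>>>
>>>>>>> For the routing side, I am picturing something like the sketch below
>>>>>>> (untested; it assumes a spectator-side HelixManager in
>>>>>>> spectatorManager and an UP/DRAIN-style model, with the resource and
>>>>>>> partition names as placeholders):
>>>>>>>
>>>>>>> // Spectator-side sketch: route by partition state from the external
>>>>>>> // view. New jobs go to the single UP replica; requests for jobs
>>>>>>> // started before the drain keep going to the DRAIN replica.
>>>>>>> RoutingTableProvider routingTable =
>>>>>>>     new RoutingTableProvider(spectatorManager);
>>>>>>> List<InstanceConfig> upInstances =
>>>>>>>     routingTable.getInstances("myResource", "myResource_0", "UP");
>>>>>>> List<InstanceConfig> drainingInstances =
>>>>>>>     routingTable.getInstances("myResource", "myResource_0", "DRAIN");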
>>>>>>>
>>>>>>> Regards,
>>>>>>> Santosh
>>>>>>>
>>>>>>> On Tue, May 12, 2020 at 10:40 AM Lei Xia <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi, Santosh
>>>>>>>>
>>>>>>>> One question: what exactly do you need to do to bring a job from
>>>>>>>> OFFLINE to STARTUP? Can we simply use an OFFLINE->UP->OFFLINE
>>>>>>>> model? On OFFLINE->UP you get the job started and ready to serve
>>>>>>>> requests; on UP->OFFLINE you block until the job gets drained.
>>>>>>>>
>>>>>>>> With this state model, you can start to drain a node by disabling
>>>>>>>> it. Once a node is disabled, Helix will send the UP->OFFLINE
>>>>>>>> transition to all partitions on that node; in your implementation
>>>>>>>> of the UP->OFFLINE transition, you block until the job completes.
>>>>>>>> Once the job (partition) on node-1 goes OFFLINE, Helix will bring
>>>>>>>> up the job on node-2 (OFFLINE->UP). Does this work for you? How
>>>>>>>> long would you expect OFFLINE->UP to take here? If it is fast, the
>>>>>>>> switch should be fast.
>>>>>>>>
>>>>>>>>
>>>>>>>> Lei
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, May 11, 2020 at 9:02 PM santosh gujar <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Yes, there would be a database.
>>>>>>>>> So far I have the following state model for a partition:
>>>>>>>>> OFFLINE->STARTUP->UP->DRAIN->OFFLINE. But I don't know how to
>>>>>>>>> express the following:
>>>>>>>>> 1. How to trigger DRAIN (for example, when we decide to take a
>>>>>>>>> node out for maintenance).
>>>>>>>>> 2. Once a drain has started, I expect the Helix rebalancer to kick
>>>>>>>>> in and simultaneously start the partition on another node in
>>>>>>>>> STARTUP mode.
>>>>>>>>> 3. Once all jobs on node1 are done, I need a manual way to trigger
>>>>>>>>> it to OFFLINE and move the other replica to the UP state.
>>>>>>>>>
>>>>>>>>> It might be that my thinking about how to fit this into the Helix
>>>>>>>>> model is entirely wrong, but essentially the above is the sequence
>>>>>>>>> I want to achieve. Any pointers will be of great help. The
>>>>>>>>> constraint is that these are long-running jobs that cannot be
>>>>>>>>> moved immediately to another node.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Santosh
>>>>>>>>>
>>>>>>>>> On Tue, May 12, 2020 at 1:25 AM kishore g <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I was thinking exactly in that direction - having two states is
>>>>>>>>>> the right thing to do. Before we get there, one more question -
>>>>>>>>>>
>>>>>>>>>> - when you get a request for a job, how do you know if that job
>>>>>>>>>> is old or new? Is there a database that provides the mapping
>>>>>>>>>> between job and node?
>>>>>>>>>>
>>>>>>>>>> On Mon, May 11, 2020 at 12:44 PM santosh gujar <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thank you, Kishore.
>>>>>>>>>>>
>>>>>>>>>>> During the drain process N2 will start new jobs; the requests
>>>>>>>>>>> related to old jobs need to go to N1, and requests for new jobs
>>>>>>>>>>> need to go to N2. Thus, during the drain on N1, the partition
>>>>>>>>>>> could be present on both nodes.
>>>>>>>>>>>
>>>>>>>>>>> My current thinking is that in Helix I somehow need to model
>>>>>>>>>>> this as partition P having two different states on these two
>>>>>>>>>>> nodes, e.g. N1 could have partition P in the DRAIN state while
>>>>>>>>>>> N2 has partition P in the STARTUP state.
>>>>>>>>>>> I don't know if my thinking about the states is correct, but I
>>>>>>>>>>> am looking for any pointers.
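>>>>>>>>>>>
>>>>>>>>>>> For what it is worth, this is roughly how I plan to wire up the
>>>>>>>>>>> participant side for such a model (sketch only, assuming a
>>>>>>>>>>> recent Helix API; "StartupUpDrainOffline" is a placeholder name
>>>>>>>>>>> for whatever custom state model definition we settle on):
>>>>>>>>>>>
>>>>>>>>>>> public class MyModelFactory extends StateModelFactory<MyModel> {
>>>>>>>>>>>   @Override
>>>>>>>>>>>   public MyModel createNewStateModel(String resourceName,
>>>>>>>>>>>                                      String partitionKey) {
>>>>>>>>>>>     // one state model instance per partition replica
>>>>>>>>>>>     return new MyModel();
>>>>>>>>>>>   }
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> HelixManager manager = HelixManagerFactory.getZKHelixManager(
>>>>>>>>>>>     "MY_CLUSTER", "node1_12918", InstanceType.PARTICIPANT,
>>>>>>>>>>>     "zk1:2181");
>>>>>>>>>>> manager.getStateMachineEngine()
>>>>>>>>>>>     .registerStateModelFactory("StartupUpDrainOffline",
>>>>>>>>>>>         new MyModelFactory());
>>>>>>>>>>> manager.connect();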
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Santosh
>>>>>>>>>>>
>>>>>>>>>>> On Tue, May 12, 2020 at 1:01 AM kishore g <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> What happens to requests during the drain process? I.e., when
>>>>>>>>>>>> you put N1 out of service, and while N2 is waiting for N1 to
>>>>>>>>>>>> finish the jobs, where will the requests for P go - N1 or N2?
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, May 11, 2020 at 12:19 PM santosh gujar <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am looking for some clues or inputs on how to achieve the
>>>>>>>>>>>>> following.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am working on a service that involves running stateful,
>>>>>>>>>>>>> long-running jobs on a node. These long-running jobs cannot be
>>>>>>>>>>>>> preempted and continued on other nodes.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Problem requirements:
>>>>>>>>>>>>> 1. In Helix nomenclature, let's say a Helix partition P
>>>>>>>>>>>>> involves J such jobs running on a node (N1).
>>>>>>>>>>>>> 2. When I put the node into a drain, I want Helix to assign a
>>>>>>>>>>>>> new node to this partition, so that P is also started on the
>>>>>>>>>>>>> new node (N2).
>>>>>>>>>>>>> 3. N1 can be put out of service only when all running jobs (J)
>>>>>>>>>>>>> on it are over; from that point on, only N2 will serve
>>>>>>>>>>>>> requests for P.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Questions:
>>>>>>>>>>>>> 1. Can the drain process be modeled using Helix?
>>>>>>>>>>>>> 2. If yes, is there any recipe for, or pointers to, a suitable
>>>>>>>>>>>>> Helix state model?
>>>>>>>>>>>>> 3. Is there any custom way to trigger state transitions? From
>>>>>>>>>>>>> the documentation, I gather that the Helix controller in
>>>>>>>>>>>>> full-auto mode triggers state transitions only when the number
>>>>>>>>>>>>> of partitions changes or the cluster changes (node addition or
>>>>>>>>>>>>> deletion).
>>>>>>>>>>>>> 4. I guess a spectator will be needed for custom routing logic
>>>>>>>>>>>>> in such cases; any pointers for the same?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank You
>>>>>>>>>>>>> Santosh
>>>>>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Lei Xia
>>>>
>>>> --
>>>> Lei Xia
>>
>> --
>> Lei Xia
