Hi Åsmund

Thanks a lot for sharing your use-case with me. I think what you're saying
is correct: with the priorities set, I do see my ONLINE->OFFLINE transitions
firing before OFFLINE->ONLINE, but only by a few milliseconds. I want the
second transition to start only once the first one has completed.

Also, the tutorial_state page
<http://helix.apache.org/1.0.0-docs/tutorial_state.html> mentions:

> By default, Helix simply sorts the transitions alphabetically and fires as
> many as it can *without violating the constraints*.

The last part of that statement is what makes me believe that I should not
need to worry about the transition priorities to maintain my state
constraints.
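For future readers: the transition priorities are declared when building the
state model definition. Below is a minimal sketch of what that looks like,
assuming the org.apache.helix.model.StateModelDefinition.Builder API; the
model name and exact priority values are illustrative, not my actual code.

    import org.apache.helix.model.StateModelDefinition;

    public class OnlineOfflineModelDef {
      public static StateModelDefinition create() {
        StateModelDefinition.Builder builder =
            new StateModelDefinition.Builder("MyOnlineOffline");

        // State priorities (a lower number means a higher priority).
        builder.addState("ONLINE", 1);
        builder.addState("OFFLINE", 2);
        builder.addState("DROPPED", 3);
        builder.initialState("OFFLINE");

        // Transition priorities: ONLINE->OFFLINE sorts ahead of
        // OFFLINE->ONLINE, so the draining replica is signalled first.
        builder.addTransition("ONLINE", "OFFLINE", 1);
        builder.addTransition("OFFLINE", "ONLINE", 2);
        builder.addTransition("OFFLINE", "DROPPED", 3);

        // The constraint itself: at most one ONLINE replica per partition.
        builder.upperBound("ONLINE", 1);
        builder.dynamicUpperBound("OFFLINE", "R");

        return builder.build();
      }
    }

Note that the priorities only order the messages; it is the upper bound of 1
that should stop Helix from firing the second transition before the first
one has completed.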
Hi Kishore

I tried to simulate my use-case completely
<https://github.com/apache/helix/pull/1050> with the Quickstart app (along
with the hard upper bound of 1), and that does seem to behave as expected.
So I believe I need to dig deeper into my application's code to see if there
is anything else going on that is related to Helix.

Thank you very much, everyone, for all your help - I think I can build upon
this quickstart code. If I find anything interesting, I'll post it to this
mailing list for future readers.

Regards
Akshesh Doshi

On Tue, 2 Jun 2020 at 16:48, Åsmund Tokheim <[email protected]> wrote:

> Hi Akshesh
>
> If I understand you correctly, we had the same requirements in our Helix
> setup. Someone more actively using the Helix library would have to confirm
> this, but I think we needed to use state transition priorities to ensure
> that master-offline was done before offline-master. See
> http://helix.apache.org/1.0.0-docs/tutorial_state.html. Also, as I believe
> you have covered in your solution, make sure the state transition function
> doesn't return before all the transition work has been performed.
>
> Hope this is of some help.
>
> Regards
> Åsmund
>
> On Tue, Jun 2, 2020 at 10:55 AM Akshesh Doshi <[email protected]>
> wrote:
>
>> Hi Wang
>>
>> Thanks for your quick reply and suggestion.
>>
>> Since my partitions are very skewed, I want to decide their placement on
>> the nodes myself, so I cannot use the FULL_AUTO mode. As for SEMI_AUTO
>> mode, since I care only about the ONLINE partitions (OFFLINE is as good
>> as DROPPED for me), I believe it is not different from CUSTOMIZED in my
>> case? Although I think this problem is not related to the rebalancing
>> mode, I did switch my application to SEMI_AUTO so that I do not run into
>> unexpected problems - thanks for the advice. And I can still reproduce
>> my problem
>> <https://gist.github.com/akki/1d80c97463198275b3abe39350688bda#file-zookeeper-output-semi-auto-txt>.
>>
>> Furthermore, I read this under the *CUSTOMIZED mode* section on the page
>> you shared <https://helix.apache.org/0.9.7-docs/tutorial_rebalance.html>:
>>
>>> *Suppose the current state of the system is ‘MyResource_0’ ->
>>> {N1:MASTER, N2:SLAVE} and the application changes the ideal state to
>>> ‘MyResource_0’ -> {N1:SLAVE, N2:MASTER}. While the application decides
>>> which node is MASTER and which is SLAVE, Helix will not blindly issue
>>> MASTER-->SLAVE to N1 and SLAVE-->MASTER to N2 in parallel, since that
>>> might result in a transient state where both N1 and N2 are masters,
>>> which violates the MasterSlave constraint that there is exactly one
>>> MASTER at a time. Helix will first issue MASTER-->SLAVE to N1 and, after
>>> it is completed, it will issue SLAVE-->MASTER to N2.*
>>
>> ^This is exactly the responsibility I am trying to off-load to Helix, but
>> I am seeing that Helix issues both of those transitions (ONLINE-->OFFLINE
>> and OFFLINE-->ONLINE) in parallel. Is there any configuration that
>> decides this behaviour that I should be looking into?
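>>
>> For reference, a minimal sketch of how the two modes differ in code (the
>> cluster, resource, and instance names here are placeholders, not my
>> actual setup):
>>
>>     import java.util.Arrays;
>>     import org.apache.helix.HelixAdmin;
>>     import org.apache.helix.manager.zk.ZKHelixAdmin;
>>     import org.apache.helix.model.IdealState;
>>
>>     public class RebalanceModeSketch {
>>       public static void main(String[] args) {
>>         HelixAdmin admin = new ZKHelixAdmin("localhost:2181");
>>         IdealState is = admin.getResourceIdealState("MyCluster", "MyDB");
>>
>>         // SEMI_AUTO: the application pins replica *placement* via a
>>         // preference list; Helix computes the states and sequences
>>         // the transitions.
>>         is.setRebalanceMode(IdealState.RebalanceMode.SEMI_AUTO);
>>         is.setPreferenceList("MyDB_0", Arrays.asList("node2_12000"));
>>
>>         // CUSTOMIZED: the application dictates placement *and* the
>>         // target state, e.g.:
>>         // is.setRebalanceMode(IdealState.RebalanceMode.CUSTOMIZED);
>>         // is.getRecord().setMapField("MyDB_0",
>>         //     java.util.Collections.singletonMap("node2_12000", "ONLINE"));
>>
>>         admin.setResourceIdealState("MyCluster", "MyDB", is);
>>       }
>>     }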
>>
>> Thank you again for your response - I truly appreciate your efforts to
>> help me out.
>>
>> Regards
>> Akshesh Doshi
>>
>> On Tue, 2 Jun 2020 at 13:27, Wang Jiajun <[email protected]> wrote:
>>
>>> Hi Akshesh,
>>>
>>> How did you set up your resource? I notice it is in the CUSTOMIZED mode.
>>> If you refer to this page
>>> <https://helix.apache.org/0.9.7-docs/tutorial_rebalance.html>, in that
>>> mode both the replica location and the state are defined by the
>>> application instead of Helix. I think you should use FULL_AUTO, or at
>>> least SEMI_AUTO, and then try again.
>>>
>>> Best Regards,
>>> Jiajun
>>>
>>> On Mon, Jun 1, 2020 at 10:34 PM Akshesh Doshi <[email protected]>
>>> wrote:
>>>
>>>> Hi Helix community
>>>>
>>>> Nice to e-meet you guys. I am pretty new to this project and this is
>>>> my first time writing to this mailing list - I apologize in advance
>>>> for any mistakes.
>>>>
>>>> I am trying to implement a state model requirement for my system but
>>>> have not been able to achieve it. I am hoping someone here can point
>>>> me in the right direction.
>>>>
>>>> GOAL
>>>> My system is a typical multi-node + multi-resource system with the
>>>> following properties:
>>>> 1. Any partition should have one and only one *online* replica at any
>>>> given point in time.
>>>> 2. The ONLINE -> OFFLINE transition is not instantaneous (it typically
>>>> takes minutes).
>>>> 3. Offline partitions have no special role - they can be dropped as
>>>> soon as they become offline.
>>>>
>>>> If it helps in understanding this better: my application is a tool
>>>> that copies data from Kafka to Hadoop, and having two ONLINE
>>>> partitions at the same time means I am duplicating this data in
>>>> Hadoop.
>>>>
>>>> WHAT I HAVE TRIED
>>>> I was able to successfully modify the Quickstart
>>>> <https://github.com/apache/helix/blob/master/helix-core/src/main/java/org/apache/helix/examples/Quickstart.java>
>>>> script to imitate my use-case, so I believe Helix can handle this
>>>> scenario. But in my application I see that Helix fires the ONLINE ->
>>>> OFFLINE and OFFLINE -> ONLINE transitions (to the corresponding two
>>>> nodes) almost simultaneously. I want Helix to signal "ONLINE ->
>>>> OFFLINE", then wait until the partition goes offline, and only then
>>>> fire the "OFFLINE -> ONLINE" transition to the new upcoming node.
>>>> I have implemented my *@Transition(from = "ONLINE", to = "OFFLINE")*
>>>> function in such a way that it waits for the partition to go offline
>>>> (using *latch.await()*
>>>> <https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/CountDownLatch.html#await-->)
>>>> and only then returns (I have confirmed this from the application
>>>> logs); a simplified sketch of this handler follows below.
>>>>
>>>> My application is different from my Quickstart app in the following
>>>> ways (or at least, these are the ones known to me; I am building upon
>>>> someone else's project, so there might be code that I am not aware
>>>> of):
>>>> 1. The rebalancing algo is *not* AUTO - I am using my own custom logic
>>>> to distribute partitions among nodes.
>>>> 2. I have enabled nodes to auto-join, i.e.
>>>> *props.put(ZKHelixManager.ALLOW_PARTICIPANT_AUTO_JOIN,
>>>> String.valueOf(true));*
>>>> Is it possible for me to achieve this system with these settings
>>>> enabled?
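>>>>
>>>> Here is roughly the shape of that handler - a simplified sketch, where
>>>> stopConsuming()/startConsuming() and the latch wiring stand in for my
>>>> application-specific draining logic:
>>>>
>>>>     import java.util.concurrent.CountDownLatch;
>>>>     import org.apache.helix.NotificationContext;
>>>>     import org.apache.helix.model.Message;
>>>>     import org.apache.helix.participant.statemachine.StateModel;
>>>>     import org.apache.helix.participant.statemachine.StateModelInfo;
>>>>     import org.apache.helix.participant.statemachine.Transition;
>>>>
>>>>     @StateModelInfo(initialState = "OFFLINE",
>>>>         states = {"ONLINE", "OFFLINE", "DROPPED"})
>>>>     public class OnlineOfflineStateModel extends StateModel {
>>>>       // Counted down by the copier once the partition has fully
>>>>       // drained (application-specific).
>>>>       private final CountDownLatch latch = new CountDownLatch(1);
>>>>
>>>>       @Transition(from = "ONLINE", to = "OFFLINE")
>>>>       public void onBecomeOfflineFromOnline(Message message,
>>>>           NotificationContext context) throws InterruptedException {
>>>>         stopConsuming(message.getPartitionName());
>>>>         latch.await(); // do not return until the partition is offline
>>>>       }
>>>>
>>>>       @Transition(from = "OFFLINE", to = "ONLINE")
>>>>       public void onBecomeOnlineFromOffline(Message message,
>>>>           NotificationContext context) {
>>>>         startConsuming(message.getPartitionName());
>>>>       }
>>>>
>>>>       private void stopConsuming(String partition) { /* app-specific */ }
>>>>       private void startConsuming(String partition) { /* app-specific */ }
>>>>     }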
>>>>
>>>> DEBUG LOGS / CODE
>>>> If it helps, this is what I see in Zookeeper after adding a 2nd node
>>>> to my cluster, which had 1 node with 1 resource with 6 partitions -
>>>> https://gist.github.com/akki/1d80c97463198275b3abe39350688bda#file-zookeeper-output-txt
>>>> As you can see
>>>> <https://gist.github.com/akki/1d80c97463198275b3abe39350688bda#file-zookeeper-output-txt-L15>,
>>>> there are a few partitions which have 2 ONLINE replicas at the same
>>>> time (after a while the draining replica goes away, but during that
>>>> window my data gets duplicated, which is the problem I want to
>>>> overcome). I cannot understand how this is possible when I have set up
>>>> these bounds
>>>> <https://gist.github.com/akki/1d80c97463198275b3abe39350688bda#file-onlineofflinestatemodel-java-L36>
>>>> in my model definition
>>>> <https://gist.github.com/akki/1d80c97463198275b3abe39350688bda#file-onlineofflinestatemodel-java>.
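>>>>
>>>> For anyone trying to reproduce this, one way to spot such overlapping
>>>> replicas programmatically is to scan the external view - a small
>>>> sketch, with the cluster and resource names as placeholders:
>>>>
>>>>     import org.apache.helix.HelixAdmin;
>>>>     import org.apache.helix.manager.zk.ZKHelixAdmin;
>>>>     import org.apache.helix.model.ExternalView;
>>>>
>>>>     public class OverlapCheck {
>>>>       public static void main(String[] args) {
>>>>         HelixAdmin admin = new ZKHelixAdmin("localhost:2181");
>>>>         ExternalView ev =
>>>>             admin.getResourceExternalView("MyCluster", "MyResource");
>>>>         for (String partition : ev.getPartitionSet()) {
>>>>           long online = ev.getStateMap(partition).values().stream()
>>>>               .filter("ONLINE"::equals).count();
>>>>           if (online > 1) {
>>>>             System.out.println(partition + " has " + online
>>>>                 + " ONLINE replicas");
>>>>           }
>>>>         }
>>>>       }
>>>>     }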
>>>>
>>>> I would really appreciate it if anyone here could give me any clues as
>>>> to what I might be doing wrong (or whether what I am trying to achieve
>>>> is even possible with Helix).
>>>>
>>>> Thank you so much for building such a wonderful tool and having this
>>>> mailing list to help us out.
>>>>
>>>> Regards
>>>> Akshesh Doshi
>>>