Hi Akshesh,

I think it's not working because of the dynamic upper bound of R. You can
use either the master/slave or the leader/standby model, both of which put
a hard upper bound of 1 on the top state.
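For illustration, here is a minimal sketch of what I mean, built with
StateModelDefinition.Builder. The model and state names are just examples;
the point is the hard upperBound of 1 on the top state instead of a
dynamicUpperBound of "R":

import org.apache.helix.model.StateModelDefinition;

public class SingleOnlineModel {
  public static StateModelDefinition build() {
    StateModelDefinition.Builder builder =
        new StateModelDefinition.Builder("OnlineOfflineSingleOnline");
    builder.addState("ONLINE", 1);   // lower number = higher state priority
    builder.addState("OFFLINE", 2);
    builder.addState("DROPPED", 3);
    builder.initialState("OFFLINE");
    builder.addTransition("OFFLINE", "ONLINE");
    builder.addTransition("ONLINE", "OFFLINE");
    builder.addTransition("OFFLINE", "DROPPED");
    // Hard bound: at most one ONLINE replica per partition, so the
    // controller never targets two ONLINE replicas at once.
    builder.upperBound("ONLINE", 1);
    return builder.build();
  }
}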
Thanks,
Kishore G

On Tue, Jun 2, 2020 at 2:48 AM Åsmund Tokheim <[email protected]> wrote:

> Hi Akshesh
>
> If I understand you correctly, we had the same requirements in our helix
> setup. Someone more actively using the helix library would have to
> confirm this, but I think we needed to use state transition priorities to
> ensure that the MASTER->OFFLINE transition completed before
> OFFLINE->MASTER was issued. See
> http://helix.apache.org/1.0.0-docs/tutorial_state.html. Also, as I
> believe you have covered in your solution, make sure the state transition
> function doesn't return before all the transition work has been
> performed.
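>
> As a rough sketch (assuming, as with state priorities, that a lower
> number means a higher transition priority - the helper and the state
> names are illustrative), the transitions in your model definition would
> get explicit priorities:
>
> import org.apache.helix.model.StateModelDefinition;
>
> public class TransitionPriorities {
>   public static void add(StateModelDefinition.Builder builder) {
>     builder.addTransition("ONLINE", "OFFLINE", 1); // drain the old replica first
>     builder.addTransition("OFFLINE", "ONLINE", 2); // then bring up the new one
>     builder.addTransition("OFFLINE", "DROPPED", 3);
>   }
> }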
>
> Hope this helps.
>
> Regards
> Åsmund
>
> On Tue, Jun 2, 2020 at 10:55 AM Akshesh Doshi <[email protected]> wrote:
>
>> Hi Wang
>>
>> Thanks for your quick reply and suggestion.
>>
>> Since my partitions are very skewed, I want to decide their placement on
>> the nodes myself, hence I cannot use the FULL_AUTO mode. As for SEMI_AUTO
>> mode, since I care only about the ONLINE partitions (OFFLINE is as good
>> as DROPPED for me), I believe it is no different from CUSTOMIZED in my
>> case? Although I think this problem is not related to the rebalancing
>> mode, I did switch my application to SEMI_AUTO so that I do not run into
>> unexpected problems - thanks for the advice. And I can still reproduce my
>> problem
>> <https://gist.github.com/akki/1d80c97463198275b3abe39350688bda#file-zookeeper-output-semi-auto-txt>.
>>
>> Furthermore, I read this under the *CUSTOMIZED mode* section on the page
>> you shared <https://helix.apache.org/0.9.7-docs/tutorial_rebalance.html>:
>>
>>> Suppose the current state of the system is 'MyResource_0' ->
>>> {N1:MASTER, N2:SLAVE} and the application changes the ideal state to
>>> 'MyResource_0' -> {N1:SLAVE, N2:MASTER}. While the application decides
>>> which node is MASTER and which is SLAVE, Helix will not blindly issue
>>> MASTER-->SLAVE to N1 and SLAVE-->MASTER to N2 in parallel, since that
>>> might result in a transient state where both N1 and N2 are masters,
>>> which violates the MasterSlave constraint that there is exactly one
>>> MASTER at a time. Helix will first issue MASTER-->SLAVE to N1 and after
>>> it is completed, it will issue SLAVE-->MASTER to N2.
>>
>> ^This is exactly the responsibility I am trying to off-load to Helix, but
>> I am seeing that Helix issues both those transitions (ONLINE-->OFFLINE
>> and OFFLINE-->ONLINE) in parallel. Is there any configuration deciding
>> this behaviour that I should be looking into?
>>
>> Thank you again for your response - I truly appreciate your efforts to
>> help me out.
>>
>> Regards
>> Akshesh Doshi
>>
>> On Tue, 2 Jun 2020 at 13:27, Wang Jiajun <[email protected]> wrote:
>>
>>> Hi Akshesh,
>>>
>>> How did you set up your resource? I notice it is in CUSTOMIZED mode. If
>>> you refer to this page
>>> <https://helix.apache.org/0.9.7-docs/tutorial_rebalance.html>, both the
>>> replica location and the state are defined by the application instead of
>>> by Helix. I think you should use FULL_AUTO or at least SEMI_AUTO and
>>> then try again.
>>>
>>> Best Regards,
>>> Jiajun
>>>
>>> On Mon, Jun 1, 2020 at 10:34 PM Akshesh Doshi <[email protected]> wrote:
>>>
>>>> Hi Helix community
>>>>
>>>> Nice to e-meet you guys. I am pretty new to this project and it is my
>>>> first time writing to this mailing list - I apologize in advance for
>>>> any mistakes.
>>>>
>>>> I am trying to implement my system's state model requirement here but
>>>> have not been able to achieve it. I am hoping someone here can point me
>>>> in the right direction.
>>>>
>>>> GOAL
>>>> My system is a typical multi-node + multi-resource system with the
>>>> following properties:
>>>> 1. Any partition should have one & only one *ONLINE* replica at any
>>>> given point in time.
>>>> 2. The ONLINE -> OFFLINE transition is not instantaneous (it typically
>>>> takes minutes).
>>>> 3. Offline replicas have no special role - they can be dropped as soon
>>>> as they become offline.
>>>>
>>>> If it helps in understanding better, my application is a tool which
>>>> copies data from Kafka to Hadoop, and having two ONLINE replicas of a
>>>> partition at the same time means I am duplicating this data in Hadoop.
>>>>
>>>> WHAT I HAVE TRIED
>>>> I was able to successfully modify the Quickstart script
>>>> <https://github.com/apache/helix/blob/master/helix-core/src/main/java/org/apache/helix/examples/Quickstart.java>
>>>> to imitate my use-case, so I believe Helix can handle this scenario.
>>>> But when I do it in my application, I see that Helix fires the
>>>> ONLINE -> OFFLINE & OFFLINE -> ONLINE transitions (to the corresponding
>>>> 2 nodes) almost simultaneously. I want Helix to signal
>>>> "ONLINE -> OFFLINE", wait until the partition goes offline, and only
>>>> then fire the "OFFLINE -> ONLINE" transition to the new upcoming node.
>>>> I have implemented my @Transition(from = "ONLINE", to = "OFFLINE")
>>>> function in such a way that it waits for the partition to go offline
>>>> (using latch.await()
>>>> <https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/CountDownLatch.html#await-->)
>>>> and only then returns (I have confirmed this from application logs).
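>>>>
>>>> In outline, the handler is along these lines (a simplified sketch, not
>>>> my exact code - the markDrained() hook and the timeout are
>>>> illustrative):
>>>>
>>>> import java.util.concurrent.CountDownLatch;
>>>> import java.util.concurrent.TimeUnit;
>>>>
>>>> import org.apache.helix.NotificationContext;
>>>> import org.apache.helix.model.Message;
>>>> import org.apache.helix.participant.statemachine.StateModel;
>>>> import org.apache.helix.participant.statemachine.StateModelInfo;
>>>> import org.apache.helix.participant.statemachine.Transition;
>>>>
>>>> @StateModelInfo(initialState = "OFFLINE", states = {"ONLINE", "OFFLINE"})
>>>> public class OnlineOfflineStateModel extends StateModel {
>>>>   // Counted down by the application once the partition has drained.
>>>>   private final CountDownLatch drained = new CountDownLatch(1);
>>>>
>>>>   @Transition(from = "OFFLINE", to = "ONLINE")
>>>>   public void onBecomeOnlineFromOffline(Message msg, NotificationContext ctx) {
>>>>     // Start serving msg.getPartitionName() ...
>>>>   }
>>>>
>>>>   @Transition(from = "ONLINE", to = "OFFLINE")
>>>>   public void onBecomeOfflineFromOnline(Message msg, NotificationContext ctx)
>>>>       throws InterruptedException {
>>>>     // Return only after the drain completes, so the controller sees
>>>>     // the transition as finished once the replica is truly offline.
>>>>     if (!drained.await(10, TimeUnit.MINUTES)) {
>>>>       throw new IllegalStateException("Timed out draining " + msg.getPartitionName());
>>>>     }
>>>>   }
>>>>
>>>>   // Called by the application when the drain has finished.
>>>>   public void markDrained() {
>>>>     drained.countDown();
>>>>   }
>>>> }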
>>>>
>>>> My application is different from my Quickstart app in the following
>>>> ways (or at least, these are the ones known to me - I am building upon
>>>> someone else's project, so there might be code that I am not aware of):
>>>> 1. The rebalancing algo is *not* AUTO - I am using my own custom logic
>>>> to distribute partitions among nodes.
>>>> 2. I have enabled nodes to auto-join, i.e.
>>>> props.put(ZKHelixManager.ALLOW_PARTICIPANT_AUTO_JOIN, String.valueOf(true));
>>>> Is it possible for me to achieve this system with these settings
>>>> enabled?
>>>>
>>>> DEBUG LOGS / CODE
>>>> If it helps, this is what I see in Zookeeper after adding a 2nd node to
>>>> my cluster, which previously had 1 node with 1 resource of 6 partitions -
>>>> https://gist.github.com/akki/1d80c97463198275b3abe39350688bda#file-zookeeper-output-txt
>>>> As you can see
>>>> <https://gist.github.com/akki/1d80c97463198275b3abe39350688bda#file-zookeeper-output-txt-L15>,
>>>> there are a few partitions which have 2 ONLINE replicas at the same
>>>> time. (After a while the draining replica goes away, but in that window
>>>> my data gets duplicated, which is the problem I want to overcome.) I
>>>> cannot understand how this is possible when I have set up these bounds
>>>> <https://gist.github.com/akki/1d80c97463198275b3abe39350688bda#file-onlineofflinestatemodel-java-L36>
>>>> in my model definition
>>>> <https://gist.github.com/akki/1d80c97463198275b3abe39350688bda#file-onlineofflinestatemodel-java>.
>>>>
>>>> I would really appreciate it if anyone here could give me any clues
>>>> about what I might be doing wrong (or whether what I am trying to
>>>> achieve is even possible with Helix).
>>>>
>>>> Thank you so much for building such a wonderful tool and having this
>>>> mailing list to help us out.
>>>>
>>>> Regards
>>>> Akshesh Doshi
>>>
