Hi Wang,

Thanks for your quick reply and suggestion.
Since my partitions are very skewed, I want to decide their placement on the
nodes myself, so I cannot use the FULL_AUTO mode. As for SEMI_AUTO: since I
care only about the ONLINE partitions (OFFLINE is as good as DROPPED for me),
I believe it is no different from CUSTOMIZED in my case. Although I think this
problem is not related to the rebalancing mode, I did switch my application to
SEMI_AUTO so that I do not run into unexpected problems - thanks for the
advice. I can still reproduce my problem:
<https://gist.github.com/akki/1d80c97463198275b3abe39350688bda#file-zookeeper-output-semi-auto-txt>

Furthermore, I read this under the *CUSTOMIZED mode* section of the page you
shared <https://helix.apache.org/0.9.7-docs/tutorial_rebalance.html>:

> *Suppose the current state of the system is ‘MyResource_0’ -> {N1:MASTER,
> N2:SLAVE} and the application changes the ideal state to ‘MyResource_0’ ->
> {N1:SLAVE, N2:MASTER}. While the application decides which node is MASTER
> and which is SLAVE, Helix will not blindly issue MASTER-->SLAVE to N1 and
> SLAVE-->MASTER to N2 in parallel, since that might result in a transient
> state where both N1 and N2 are masters, which violates the MasterSlave
> constraint that there is exactly one MASTER at a time. Helix will first
> issue MASTER-->SLAVE to N1 and after it is completed, it will issue
> SLAVE-->MASTER to N2.*

This is exactly the responsibility I am trying to off-load to Helix, but I am
seeing Helix issue both of those transitions (ONLINE-->OFFLINE and
OFFLINE-->ONLINE) in parallel. Is there any configuration controlling this
behaviour that I should be looking into?

Thank you again for your response - I truly appreciate your efforts to help me
out.

Regards,
Akshesh Doshi

On Tue, 2 Jun 2020 at 13:27, Wang Jiajun <[email protected]> wrote:

> Hi Akshesh,
>
> How did you set up your resource? I notice it is in the CUSTOMIZED mode.
> If you refer to this page
> <https://helix.apache.org/0.9.7-docs/tutorial_rebalance.html>, both the
> replica location and the state will be defined by the application instead
> of Helix. I think you should use FULL_AUTO, or at least SEMI_AUTO, and
> then try again.
>
> Best Regards,
> Jiajun
>
>
> On Mon, Jun 1, 2020 at 10:34 PM Akshesh Doshi <[email protected]> wrote:
>
>> Hi Helix community
>>
>> Nice to e-meet you guys. I am pretty new to this project and it is my
>> first time writing to this mailing list - I apologize in advance for any
>> mistakes.
>>
>> I am trying to implement a system's state model requirement here but am
>> not able to achieve it. I am hoping someone here could point me in the
>> right direction.
>>
>> GOAL
>> My system is a typical multi-node + multi-resource system with the
>> following properties:
>> 1. Any partition should have one & only one *online* replica at any
>> given point of time.
>> 2. The ONLINE -> OFFLINE transition is not instantaneous (it typically
>> takes minutes).
>> 3. Offline partitions have no special role - they can be dropped as soon
>> as they become offline.
>>
>> If it helps in understanding, my application is a tool which copies data
>> from Kafka to Hadoop, and having two ONLINE replicas at the same time
>> means I am duplicating this data in Hadoop.
>>
>> WHAT I HAVE TRIED
>> I was able to successfully modify the Quickstart script
>> <https://github.com/apache/helix/blob/master/helix-core/src/main/java/org/apache/helix/examples/Quickstart.java>
>> to imitate my use case, so I believe Helix can handle this scenario.
>> But when I do it in my application, I see that Helix fires the ONLINE ->
>> OFFLINE & OFFLINE -> ONLINE transitions (to the corresponding 2 nodes)
>> almost simultaneously. I want Helix to signal "ONLINE -> OFFLINE", wait
>> until the partition goes offline, and only then fire the "OFFLINE ->
>> ONLINE" transition to the new upcoming node.
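
[Inline note, in case it helps: the "one & only one ONLINE replica" property
above is what I am expressing through the state model bounds. A rough sketch
using the Helix 0.9.x *StateModelDefinition.Builder* - illustrative model
name and priorities, not my exact code - looks like this:]

```java
import org.apache.helix.model.StateModelDefinition;

// Sketch: an OnlineOffline-style model where at most one replica of any
// partition may be ONLINE at a time. Names/priorities are illustrative.
StateModelDefinition.Builder builder =
    new StateModelDefinition.Builder("OnlineOfflineSingleActive");
builder.addState("ONLINE", 1);   // highest priority
builder.addState("OFFLINE", 2);
builder.addState("DROPPED", 3);
builder.initialState("OFFLINE");
builder.addTransition("OFFLINE", "ONLINE", 1);
builder.addTransition("ONLINE", "OFFLINE", 2);
builder.addTransition("OFFLINE", "DROPPED", 3);
builder.upperBound("ONLINE", 1);           // <= 1 ONLINE replica per partition
builder.dynamicUpperBound("OFFLINE", "R"); // any number may be OFFLINE
StateModelDefinition def = builder.build();
```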
>> I have implemented my *@Transition(from = "ONLINE", to = "OFFLINE")*
>> function in such a way that it waits for the partition to go offline
>> (using *latch.await()*
>> <https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/CountDownLatch.html#await-->)
>> and only then returns (I have confirmed this from the application logs).
>>
>> My application differs from my Quickstart app in the following ways (or
>> at least, these are the ones known to me - I am building upon someone
>> else's project, so there might be code that I am not aware of):
>> 1. The rebalancing algo is *not* AUTO - I am using my own custom logic
>> to distribute partitions among nodes.
>> 2. I have enabled nodes to auto-join, i.e.
>> *props.put(ZKHelixManager.ALLOW_PARTICIPANT_AUTO_JOIN,
>> String.valueOf(true));*
>>
>> Is it possible for me to achieve this system with these settings enabled?
>>
>> DEBUG LOGS / CODE
>> If it helps, this is what I see in Zookeeper after adding a 2nd node to
>> my cluster, which had 1 node with 1 resource with 6 partitions:
>> https://gist.github.com/akki/1d80c97463198275b3abe39350688bda#file-zookeeper-output-txt
>> As you can see
>> <https://gist.github.com/akki/1d80c97463198275b3abe39350688bda#file-zookeeper-output-txt-L15>,
>> there are a few partitions which have 2 ONLINE replicas at the same time
>> (after a while the draining replica goes away, but in that duration my
>> data gets duplicated, which is the problem I want to overcome). I cannot
>> understand how this is possible when I have set up these bounds
>> <https://gist.github.com/akki/1d80c97463198275b3abe39350688bda#file-onlineofflinestatemodel-java-L36>
>> in my model definition
>> <https://gist.github.com/akki/1d80c97463198275b3abe39350688bda#file-onlineofflinestatemodel-java>.
>>
>> I would really appreciate it if anyone here could give me clues about
>> what I might be doing wrong (or whether what I am trying to achieve is
>> even possible with Helix).
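
[Inline note: the latch pattern mentioned above boils down to something like
this - a minimal, self-contained sketch with hypothetical names (*awaitDrain*,
*markDrained*), not my actual handler:]

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Sketch of the blocking ONLINE -> OFFLINE handler pattern: the transition
// method does not return until the partition has actually finished draining.
public class DrainExample {
    private final CountDownLatch drained = new CountDownLatch(1);

    // Called from the @Transition(from = "ONLINE", to = "OFFLINE") method;
    // blocks until markDrained() fires or the timeout elapses.
    public boolean awaitDrain(long timeout, TimeUnit unit) throws InterruptedException {
        return drained.await(timeout, unit);
    }

    // Called by the worker once the last in-flight batch is flushed.
    public void markDrained() {
        drained.countDown();
    }

    public static void main(String[] args) throws InterruptedException {
        DrainExample d = new DrainExample();
        Thread worker = new Thread(() -> {
            try { Thread.sleep(100); } catch (InterruptedException ignored) {}
            d.markDrained(); // partition finished draining
        });
        worker.start();
        // The handler returns only after the drain completes (or times out).
        boolean ok = d.awaitDrain(5, TimeUnit.SECONDS);
        System.out.println(ok ? "drained" : "timeout");
        worker.join();
    }
}
```

[The key point is that Helix only learns the replica is OFFLINE once the
handler returns, i.e. after the drain truly finishes.]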
>>
>> Thank you so much for building such a wonderful tool and having this
>> mailing list to help us out.
>>
>>
>> Regards
>> Akshesh Doshi
>>
>
