Hi Wang,

Thanks for your quick reply and suggestion.
Since my partitions are very skewed, I want to decide their placement on the
nodes myself, so I cannot use the FULL_AUTO mode. As for SEMI_AUTO: since I
care only about the ONLINE partitions (OFFLINE is as good as DROPPED for me),
I believe it is no different from CUSTOMIZED in my case. Although I think this
problem is not related to the rebalancing mode, I did switch my application to
SEMI_AUTO so that I do not run into unexpected problems - thanks for the
advice. I can still reproduce my problem:
<https://gist.github.com/akki/1d80c97463198275b3abe39350688bda#file-zookeeper-output-semi-auto-txt>

Furthermore, I read this under the *CUSTOMIZED mode* section of the page you
shared <https://helix.apache.org/0.9.7-docs/tutorial_rebalance.html>:

> *Suppose the current state of the system is ‘MyResource_0’ -> {N1:MASTER,
> N2:SLAVE} and the application changes the ideal state to ‘MyResource_0’ ->
> {N1:SLAVE, N2:MASTER}. While the application decides which node is MASTER
> and which is SLAVE, Helix will not blindly issue MASTER-->SLAVE to N1 and
> SLAVE-->MASTER to N2 in parallel, since that might result in a transient
> state where both N1 and N2 are masters, which violates the MasterSlave
> constraint that there is exactly one MASTER at a time. Helix will first
> issue MASTER-->SLAVE to N1 and after it is completed, it will issue
> SLAVE-->MASTER to N2.*

This is exactly the responsibility I am trying to off-load to Helix, but I am
seeing Helix issue both of those transitions (ONLINE-->OFFLINE and
OFFLINE-->ONLINE) in parallel. Is there any configuration controlling this
behaviour that I should be looking into?

Thank you again for your response - I truly appreciate your efforts to help me
out.

Regards,
Akshesh Doshi

On Tue, 2 Jun 2020 at 13:27, Wang Jiajun <[email protected]> wrote:

> Hi Akshesh,
>
> How did you set up your resource? I notice it is in the CUSTOMIZED mode.
> If you refer to this page
> <https://helix.apache.org/0.9.7-docs/tutorial_rebalance.html>, both the
> replica location and the state will be defined by the application instead
> of Helix. I think you should use FULL_AUTO, or at least SEMI_AUTO, and
> then try again.
>
> Best Regards,
> Jiajun
>
>
> On Mon, Jun 1, 2020 at 10:34 PM Akshesh Doshi <[email protected]> wrote:
>
>> Hi Helix community
>>
>> Nice to e-meet you guys. I am pretty new to this project and it is my
>> first time writing to this mailing list - I apologize in advance for any
>> mistakes.
>>
>> I am trying to implement a system's state model requirement here but am
>> not able to achieve it. I am hoping someone here could point me in the
>> right direction.
>>
>> GOAL
>> My system is a typical multi-node + multi-resource system with the
>> following properties:
>> 1. Any partition should have one & only one *online* replica at any
>> given point of time.
>> 2. The ONLINE -> OFFLINE transition is not instantaneous (it typically
>> takes minutes).
>> 3. Offline partitions have no special role - they can be dropped as soon
>> as they become offline.
>>
>> If it helps in understanding, my application is a tool which copies data
>> from Kafka to Hadoop, and having two ONLINE replicas at the same time
>> means I am duplicating this data in Hadoop.
>>
>> WHAT I HAVE TRIED
>> I was able to successfully modify the Quickstart script
>> <https://github.com/apache/helix/blob/master/helix-core/src/main/java/org/apache/helix/examples/Quickstart.java>
>> to imitate my use case, so I believe Helix can handle this scenario.
>> But when I do it in my application, I see that Helix fires the ONLINE ->
>> OFFLINE & OFFLINE -> ONLINE transitions (to the corresponding 2 nodes)
>> almost simultaneously. I want Helix to signal "ONLINE -> OFFLINE", wait
>> until the partition goes offline, and only then fire the "OFFLINE ->
>> ONLINE" transition to the new upcoming node.
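
[Inline note, in case it helps: the "one & only one ONLINE replica" property
above is what I am expressing through the state model bounds. A rough sketch
using the Helix 0.9.x *StateModelDefinition.Builder* - illustrative model
name and priorities, not my exact code - looks like this:]

```java
import org.apache.helix.model.StateModelDefinition;

// Sketch: an OnlineOffline-style model where at most one replica of any
// partition may be ONLINE at a time. Names/priorities are illustrative.
StateModelDefinition.Builder builder =
    new StateModelDefinition.Builder("OnlineOfflineSingleActive");
builder.addState("ONLINE", 1);   // highest priority
builder.addState("OFFLINE", 2);
builder.addState("DROPPED", 3);
builder.initialState("OFFLINE");
builder.addTransition("OFFLINE", "ONLINE", 1);
builder.addTransition("ONLINE", "OFFLINE", 2);
builder.addTransition("OFFLINE", "DROPPED", 3);
builder.upperBound("ONLINE", 1);           // <= 1 ONLINE replica per partition
builder.dynamicUpperBound("OFFLINE", "R"); // any number may be OFFLINE
StateModelDefinition def = builder.build();
```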
>> I have implemented my *@Transition(from = "ONLINE", to = "OFFLINE")*
>> function in such a way that it waits for the partition to go offline
>> (using *latch.await()*
>> <https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/CountDownLatch.html#await-->)
>> and only then returns (I have confirmed this from the application logs).
>>
>> My application differs from my Quickstart app in the following ways (or
>> at least, these are the ones known to me - I am building upon someone
>> else's project, so there might be code that I am not aware of):
>> 1. The rebalancing algo is *not* AUTO - I am using my own custom logic
>> to distribute partitions among nodes.
>> 2. I have enabled nodes to auto-join, i.e.
>> *props.put(ZKHelixManager.ALLOW_PARTICIPANT_AUTO_JOIN,
>> String.valueOf(true));*
>>
>> Is it possible for me to achieve this system with these settings enabled?
>>
>> DEBUG LOGS / CODE
>> If it helps, this is what I see in Zookeeper after adding a 2nd node to
>> my cluster, which had 1 node with 1 resource with 6 partitions:
>> https://gist.github.com/akki/1d80c97463198275b3abe39350688bda#file-zookeeper-output-txt
>> As you can see
>> <https://gist.github.com/akki/1d80c97463198275b3abe39350688bda#file-zookeeper-output-txt-L15>,
>> there are a few partitions which have 2 ONLINE replicas at the same time
>> (after a while the draining replica goes away, but in that duration my
>> data gets duplicated, which is the problem I want to overcome). I cannot
>> understand how this is possible when I have set up these bounds
>> <https://gist.github.com/akki/1d80c97463198275b3abe39350688bda#file-onlineofflinestatemodel-java-L36>
>> in my model definition
>> <https://gist.github.com/akki/1d80c97463198275b3abe39350688bda#file-onlineofflinestatemodel-java>.
>>
>> I would really appreciate it if anyone here could give me clues about
>> what I might be doing wrong (or whether what I am trying to achieve is
>> even possible with Helix).
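
[Inline note: the latch pattern mentioned above boils down to something like
this - a minimal, self-contained sketch with hypothetical names (*awaitDrain*,
*markDrained*), not my actual handler:]

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Sketch of the blocking ONLINE -> OFFLINE handler pattern: the transition
// method does not return until the partition has actually finished draining.
public class DrainExample {
    private final CountDownLatch drained = new CountDownLatch(1);

    // Called from the @Transition(from = "ONLINE", to = "OFFLINE") method;
    // blocks until markDrained() fires or the timeout elapses.
    public boolean awaitDrain(long timeout, TimeUnit unit) throws InterruptedException {
        return drained.await(timeout, unit);
    }

    // Called by the worker once the last in-flight batch is flushed.
    public void markDrained() {
        drained.countDown();
    }

    public static void main(String[] args) throws InterruptedException {
        DrainExample d = new DrainExample();
        Thread worker = new Thread(() -> {
            try { Thread.sleep(100); } catch (InterruptedException ignored) {}
            d.markDrained(); // partition finished draining
        });
        worker.start();
        // The handler returns only after the drain completes (or times out).
        boolean ok = d.awaitDrain(5, TimeUnit.SECONDS);
        System.out.println(ok ? "drained" : "timeout");
        worker.join();
    }
}
```

[The key point is that Helix only learns the replica is OFFLINE once the
handler returns, i.e. after the drain truly finishes.]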
>>
>> Thank you so much for building such a wonderful tool and having this
>> mailing list to help us out.
>>
>>
>> Regards
>> Akshesh Doshi
>>
>
