At-most-one-online-partition system with substantial draining time (to go offline)

Akshesh Doshi Mon, 01 Jun 2020 22:35:08 -0700

Hi Helix community

Nice to e-meet you guys. I am pretty new to this project and it is my first
time writing to this mailing list - I apologize in advance for any mistakes.

I am trying to implement a state model requirement for my system but have
not been able to achieve it. I am hoping someone here can point me in the
right direction.

GOAL
My system is a typical multi-node + multi-resource system with the
following properties:
1. Each partition should have one & only one *online* replica at any given
point in time.
2. The ONLINE -> OFFLINE transition is not instantaneous (typically takes
minutes).
3. Offline partitions have no special role - they can be dropped as soon as
they become offline.

If it helps in understanding better, my application is a tool which copies
data from Kafka to Hadoop.
Having two ONLINE replicas of a partition at the same time means the data
gets duplicated in Hadoop.

WHAT I HAVE TRIED
I was able to modify the Quickstart example
<https://github.com/apache/helix/blob/master/helix-core/src/main/java/org/apache/helix/examples/Quickstart.java>
to imitate my use case, so I believe Helix can handle this scenario.
But when I do it in my application I see that Helix fires the ONLINE ->
OFFLINE & OFFLINE -> ONLINE transitions (to the corresponding 2 nodes)
almost simultaneously. I want Helix to signal "ONLINE -> OFFLINE", then
wait until the partition goes offline and only then fire the "OFFLINE ->
ONLINE" transition to the new upcoming node.
I have implemented my *@Transition(from = "ONLINE", to = "OFFLINE")* function
in such a way that it waits for the partition to go offline (using
*latch.await()*
<https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/CountDownLatch.html#await-->)
and only then returns (I have confirmed this from application logs).
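
For reference, the blocking pattern described above looks roughly like the
following plain-Java sketch. It deliberately leaves out the Helix classes and
the @Transition annotation. The class name and the methods markDrained and
awaitDrained are hypothetical names used only for illustration, and the
timeout is an assumption added so the sketch cannot hang (the handler
described above uses the untimed latch.await()).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the body of an ONLINE -> OFFLINE handler.
final class DrainLatchSketch {
    // Released once the partition has finished draining.
    private final CountDownLatch drained = new CountDownLatch(1);

    // Called by the (hypothetical) copy worker after it has flushed its data.
    void markDrained() {
        drained.countDown();
    }

    // Blocks until the partition is drained or the timeout elapses.
    // Returns true if draining completed, false on timeout or interruption.
    boolean awaitDrained(long timeoutMillis) {
        try {
            return drained.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // Preserve the interrupt status for the caller.
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The transition handler would call awaitDrained and return only after it
reports true, which is the behaviour confirmed from the application logs.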

My application differs from my Quickstart app in the following ways (at
least, these are the ones known to me; I am building upon someone else's
project, so there might be code that I am not aware of):
1. The rebalancing algo is *not* AUTO - I am using my own custom logic to
distribute partitions among nodes
2. I have enabled nodes to auto-join i.e.
*props.put(ZKHelixManager.ALLOW_PARTICIPANT_AUTO_JOIN,
String.valueOf(true));*
Is it possible for me to achieve this system with these settings enabled?

DEBUG LOGS / CODE
If it helps, this is what I see in ZooKeeper after adding a 2nd node to a
cluster that had 1 node with 1 resource with 6 partitions -
https://gist.github.com/akki/1d80c97463198275b3abe39350688bda#file-zookeeper-output-txt
As you can see
<https://gist.github.com/akki/1d80c97463198275b3abe39350688bda#file-zookeeper-output-txt-L15>,
there are a few partitions which have 2 ONLINE replicas at the same time
(after a while the draining replica goes away but in that duration, my data
gets duplicated, which is the problem I want to overcome). I cannot
understand how this is possible when I have set up these bounds
<https://gist.github.com/akki/1d80c97463198275b3abe39350688bda#file-onlineofflinestatemodel-java-L36>
in my model definition
<https://gist.github.com/akki/1d80c97463198275b3abe39350688bda#file-onlineofflinestatemodel-java>.
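
To make the violated invariant concrete, here is a small plain-Java sketch
that scans a partition -> (instance -> state) mapping and reports the
partitions with more than one ONLINE replica. The nested-map shape is an
assumption modelled on the ZooKeeper output linked above, and the class and
method names are hypothetical. It does not call any Helix API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for spotting partitions that break the
// "only one ONLINE replica" requirement.
final class OnlineReplicaCheck {
    // view: partition name -> (instance name -> state string).
    // Returns the offending partition names in sorted order.
    static List<String> partitionsWithMultipleOnline(
            Map<String, Map<String, String>> view) {
        List<String> offenders = new ArrayList<>();
        // TreeMap gives a deterministic (sorted) iteration order.
        for (Map.Entry<String, Map<String, String>> e
                : new TreeMap<>(view).entrySet()) {
            long online = e.getValue().values().stream()
                    .filter("ONLINE"::equals)
                    .count();
            if (online > 1) {
                offenders.add(e.getKey());
            }
        }
        return offenders;
    }
}
```

Running such a check periodically while a node joins would show how long
each partition stays in the doubly-ONLINE state.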

I would really appreciate it if anyone here could give me any clues as to
what I might be doing wrong (or whether what I am trying to achieve is even
possible with Helix).

Thank you so much for building such a wonderful tool and having this
mailing list to help us out.

Regards
Akshesh Doshi
