Re: Temporarily preventing a state transition

Brent Thu, 05 Aug 2021 09:42:53 -0700

Thank you for the response Jiajun!

On the inclusivity thing, I'm glad to hear we're moving to different
terminology.  Our code actually wraps the MS state machine and renames the
terminology to "Leader" and "Follower" everywhere visible to our users and
operators for similar reasons.  :-)   I thought the Leader/Standby SMD was
a bit different which was why I wasn't using it, but looking at the
definition, I guess the only difference is it doesn't seem to define an
ERROR state like the MS SMD does.  So for the rest of this thread, let's
use the LEADER/STANDBY terminology instead.

For context, I have 1000-2000 shards of a database where each shard can be
100GB+ in size, so bootstrapping nodes is expensive.  Your logic on
splitting up the STANDBY state into two states like SYNCING and STANDBY
makes sense (OFFLINE -> SYNCING -> STANDBY -> LEADER), though I'm still not
sure how I can prevent the state from transitioning from SYNCING to STANDBY
until the node is ready (i.e. has an up-to-date copy of the leader's
data).  Based on what you were saying, is it possible to have the Helix
controller tell a node it's in SYNCING state, but then have the node decide
when it's safe to transition itself to STANDBY?  Or can state transition
cancellation be used if the node isn't ready?  Or can I just let the
transition time out if the node isn't ready?
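
To make the question concrete, here is a minimal plain-Java sketch of the
four-state idea (OFFLINE -> SYNCING -> STANDBY -> LEADER).  It deliberately
does not use any Helix classes: the class name SyncingStateSketch, the
handler onBecomeStandbyFromSyncing, and the polling parameters are all
hypothetical names made up for illustration.  It assumes that the node is
allowed to hold the SYNCING -> STANDBY transition open until its data has
caught up, which is exactly the behavior being asked about here and is not
confirmed as something Helix supports.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;
import java.util.function.BooleanSupplier;

/** Plain-Java sketch (no Helix dependency) of OFFLINE -> SYNCING -> STANDBY -> LEADER. */
public class SyncingStateSketch {
    enum State { OFFLINE, SYNCING, STANDBY, LEADER }

    // Legal next states for each state. LEADER is reachable only from STANDBY,
    // so a replica that is still SYNCING can never be promoted directly.
    static final Map<State, Set<State>> LEGAL = new EnumMap<>(State.class);
    static {
        LEGAL.put(State.OFFLINE, EnumSet.of(State.SYNCING));
        LEGAL.put(State.SYNCING, EnumSet.of(State.STANDBY, State.OFFLINE));
        LEGAL.put(State.STANDBY, EnumSet.of(State.LEADER, State.OFFLINE));
        LEGAL.put(State.LEADER, EnumSet.of(State.STANDBY));
    }

    static boolean isLegal(State from, State to) {
        return LEGAL.get(from).contains(to);
    }

    /**
     * Hypothetical handler for SYNCING -> STANDBY. It returns only once the
     * replica reports that it has caught up, polling up to maxPolls times.
     * What to do when the sync does not finish is an open question; throwing
     * here is only a placeholder.
     */
    static State onBecomeStandbyFromSyncing(BooleanSupplier caughtUp,
                                            int maxPolls, long pollMillis) {
        for (int i = 0; i < maxPolls; i++) {
            if (caughtUp.getAsBoolean()) {
                return State.STANDBY;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while syncing", e);
            }
        }
        throw new IllegalStateException("sync did not finish in time");
    }

    public static void main(String[] args) {
        int[] polls = {0};
        // Simulated replica that reports "caught up" on the third poll.
        State s = onBecomeStandbyFromSyncing(() -> ++polls[0] >= 3, 10, 1);
        System.out.println(s + " after " + polls[0] + " polls");
        System.out.println("SYNCING -> LEADER legal? "
                + isLegal(State.SYNCING, State.LEADER));
    }
}
```

The point of the transition table is that promotion eligibility becomes a
property of the state itself rather than something the node has to reject
at promotion time.  The timeout branch is the part I'm unsure about, since
throwing from a transition is what puts a partition into ERROR state.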

This seems like it would be a pretty common problem with large,
expensive-to-move data (e.g. a shard of a large database), especially when
adding a new node to an existing system and needing to bootstrap it from
nothing.  I suspect people do this and I'm just thinking about it the wrong
way or there's a Helix strategy that I'm just not grasping correctly.

For the LinkedIn folks on the list, what does Espresso do for bootstrapping
new nodes and avoiding this problem of them getting promoted to LEADER
before they're ready?  It seems like a similar problem to mine (stateful
node with large data that needs a leader/standby setup).

Thanks again!

~Brent

On Wed, Aug 4, 2021 at 6:32 PM Wang Jiajun <[email protected]> wrote:

> Hi Brent,
>
> AFAIK, there is no way to tell the controller to suspend a certain state
> transition. Even if you reject the transition (although rejection is not
> officially supported either), the controller will probably retry in the
> next rebalance pipeline repeatedly.
>
> Alternatively, from your description, I think "Slave" means 2 states in
> your system: 1. a new Slave that is out of sync, and 2. a sync-ed Slave. Is it
> possible to define a customized model that differentiates these 2 states?
> Offline -> Syncing -> Slave, etc.
> Even simpler, is it OK to restrict the definition of Slave to the 2nd
> case? Meaning before a partition syncs with the Master, it shall not mark
> itself as the Slave. It implies offline -> Slave transition would take a
> longer time, but once it is done, the Slave partition would be fully ready.
>
> BTW, we encourage the users to use inclusive language. Maybe you can
> consider changing to use the LeaderStandby SMD? We might deprecate
> MasterSlave SMD in the near future.
>
> Best Regards,
> Jiajun
>
>
> On Wed, Aug 4, 2021 at 3:41 PM Brent <[email protected]> wrote:
>
>> I had asked a question a while back about how to deal with a failed state
>> transition (
>> http://mail-archives.apache.org/mod_mbox/helix-user/202009.mbox/%[email protected]%3E)
>> and the correct answer there was to throw an exception to cause an ERROR
>> state in the state machine.
>>
>> I have a slightly different but related question now.  I'm using
>> the org.apache.helix.model.MasterSlaveSMD.  In our system, for a Slave
>> partition to become fully in-sync with a Master partition can take a long
>> time (maybe 30 minutes).  Under normal circumstances, until a Slave has
>> finished syncing data from a Master, it should not be eligible for
>> promotion to Master.
>>
>> So let's say a node (maybe newly added to the cluster) is the Slave for
>> partition 22 and has been online for 10 minutes (not long enough to have
>> sync-ed everything from the existing partition 22 Master) and receives a
>> state transition from Helix saying it should go from Slave->Master.  Is it
>> possible to temporarily reject that transition without going into ERROR
>> state for that partition?  ERROR state seems like slightly the wrong thing
>> because while it's not a valid transition right now, it will be a valid
>> transition 20 minutes from now when the initial sync completes.
>>
>> Is there a way to get this functionality to "fail" a transition, but not
>> fully go into ERROR state?  Or is there a different way I should be
>> thinking about solving this problem?  I was thinking this could potentially
>> be a frequent occurrence when new nodes are added to the cluster.
>>
>> Thank you for your time and help as always!
>>
>> ~Brent
>>
>
