I asked a question a while back about how to deal with a failed state
transition (
http://mail-archives.apache.org/mod_mbox/helix-user/202009.mbox/%[email protected]%3E)
and the answer there was to throw an exception, which puts the partition
into the ERROR state in the state machine.

I have a slightly different but related question now.  I'm using
org.apache.helix.model.MasterSlaveSMD.  In our system, it can take a long
time (maybe 30 minutes) for a Slave partition to become fully in sync with
its Master partition.  Under normal circumstances, a Slave should not be
eligible for promotion to Master until it has finished syncing data from
the Master.

So let's say a node (maybe newly added to the cluster) is the Slave for
partition 22 and has been online for 10 minutes (not long enough to have
synced everything from the existing partition 22 Master) and receives a
state transition from Helix telling it to go from Slave->Master.  Is it
possible to temporarily reject that transition without going into ERROR
state for that partition?  ERROR state seems like slightly the wrong thing
because while it's not a valid transition right now, it will be a valid
transition 20 minutes from now when the initial sync completes.
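
To make the scenario concrete, here is a rough sketch of the check I have
in mind.  It is plain Java with no Helix dependency: ReplicaSyncTracker and
PartitionStateModelSketch are hypothetical names made up for this example,
and onBecomeMasterFromSlave only mimics the Slave->Master callback in a
participant's state model.  The throw is the part that, per the earlier
answer, would land the partition in ERROR.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper: remembers which partitions have finished their initial sync. */
class ReplicaSyncTracker {
    private final Set<String> synced = ConcurrentHashMap.newKeySet();

    /** Called by our (assumed) replication code once the Slave has caught up. */
    void markSynced(String partition) {
        synced.add(partition);
    }

    boolean isSynced(String partition) {
        return synced.contains(partition);
    }
}

/** Stand-in for a participant's state model; not a real Helix class. */
class PartitionStateModelSketch {
    private final ReplicaSyncTracker tracker;
    private final String partition;
    private String state = "SLAVE";

    PartitionStateModelSketch(ReplicaSyncTracker tracker, String partition) {
        this.tracker = tracker;
        this.partition = partition;
    }

    /** Modelled on the Slave->Master transition callback. */
    void onBecomeMasterFromSlave() {
        if (!tracker.isSynced(partition)) {
            // Throwing here is what currently results in ERROR for the partition.
            throw new IllegalStateException(
                "Partition " + partition + " has not finished syncing; cannot become Master yet");
        }
        state = "MASTER";
    }

    String state() {
        return state;
    }
}
```

With this sketch, a promotion attempted before markSynced("partition_22")
throws and leaves the state as SLAVE, while the same call after the sync
completes succeeds.  What I am looking for is a way to get the first
outcome without the partition being marked ERROR.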

Is there a way to "fail" a transition like this without fully going into
ERROR state?  Or is there a different way I should be thinking about
solving this problem?  I expect this could be a frequent occurrence when
new nodes are added to the cluster.

Thank you for your time and help as always!

~Brent
