I asked a question a while back about how to deal with a failed state transition (http://mail-archives.apache.org/mod_mbox/helix-user/202009.mbox/%[email protected]%3E), and the correct answer there was to throw an exception, which puts the partition into the ERROR state in the state machine.
I have a slightly different but related question now. I'm using org.apache.helix.model.MasterSlaveSMD. In our system, it can take a long time (maybe 30 minutes) for a Slave partition to become fully in sync with its Master partition. Under normal circumstances, a Slave should not be eligible for promotion to Master until it has finished syncing data from the Master.
So let's say a node (maybe newly added to the cluster) is the Slave for partition 22 and has been online for 10 minutes, which is not long enough to have synced everything from the existing partition 22 Master, and it receives a state transition from Helix saying it should go from Slave->Master. Is it possible to temporarily reject that transition without going into the ERROR state for that partition? ERROR seems like slightly the wrong thing, because while the transition is not valid right now, it will be valid 20 minutes from now when the initial sync completes.
Is there a way to "fail" a transition without fully going into the ERROR state? Or is there a different way I should be thinking about solving this problem? I expect this could be a frequent occurrence when new nodes are added to the cluster.
Thank you for your time and help as always!
~Brent
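P.S. In case it helps make the scenario concrete, here is a minimal sketch of the behaviour I am describing. It is plain Java with no Helix dependency: SlaveReplica, markInSync, becomeMasterFromSlave and applyTransition are hypothetical names made up for illustration, and applyTransition only imitates (as an assumption) the "exception leads to ERROR" behaviour from the earlier thread. It is not how Helix itself is implemented.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for one Slave replica of a partition; not a Helix class. */
final class SlaveReplica {
    private final int partition;
    // Flipped by our own replication code once the initial sync from the Master is done.
    private final AtomicBoolean inSync = new AtomicBoolean(false);
    private String state = "SLAVE";

    SlaveReplica(int partition) {
        this.partition = partition;
    }

    void markInSync() {
        inSync.set(true);
    }

    String state() {
        return state;
    }

    /** What the body of a Slave->Master handler would do in this sketch. */
    void becomeMasterFromSlave() {
        if (!inSync.get()) {
            // Today the only way I know to refuse the promotion is to throw.
            throw new IllegalStateException("partition " + partition + " is still syncing");
        }
        state = "MASTER";
    }

    /** Assumption: imitates the framework marking the partition ERROR when the handler throws. */
    void applyTransition() {
        try {
            becomeMasterFromSlave();
        } catch (RuntimeException e) {
            state = "ERROR";
        }
    }
}
```

A replica that has not called markInSync() ends up in ERROR after applyTransition(), while one that has finished syncing ends up in MASTER. What I am looking for is a third outcome for the first case: stay in SLAVE and be eligible again later.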
