Hi,

We are looking into the possibility of implementing a Master-Master state machine using Helix. The idea is that the client writes to n replicas of a partition, and these replicas are located on different machines. There is no communication between the replicas themselves; data is replicated by having the client issue multiple writes.
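A minimal Java sketch of the replication model, assuming a hypothetical `Replica` interface with a single `write` method. This interface is not part of Helix or of the replica API described in this post. The sketch shows the client fanning the same write out to every replica of a partition, with no replica-to-replica traffic. How many acknowledgements a write needs is not specified here, so `writeToAll` only reports the count.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FanOutWriteSketch {

    // Hypothetical client-side handle to one remote replica of a partition.
    interface Replica {
        void write(String key, String value);
    }

    // In-memory stand-in for a replica on another machine, for illustration.
    static class InMemoryReplica implements Replica {
        final Map<String, String> data = new HashMap<>();
        public void write(String key, String value) { data.put(key, value); }
    }

    // The client replicates by sending the same write to every replica itself;
    // replicas never forward writes to each other. Returns how many accepted it.
    static int writeToAll(List<? extends Replica> replicas, String key, String value) {
        int acked = 0;
        for (Replica r : replicas) {
            try {
                r.write(key, value);
                acked++;
            } catch (RuntimeException e) {
                // This replica misses the write; nothing else will deliver it later.
            }
        }
        return acked;
    }

    public static void main(String[] args) {
        List<InMemoryReplica> replicas = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            replicas.add(new InMemoryReplica());
        }
        int acked = writeToAll(replicas, "k1", "v1");
        System.out.println(acked + " of " + replicas.size() + " replicas acknowledged");
    }
}
```

Because replicas never talk to each other, a replica that misses writes has no way to recover them on its own. A new replica placed after a node failure is in the same position. Both cases are what the shard copy procedure below is meant to handle.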
Each replica exposes the following APIs:
a) backup() - backs up at a sequence number
b) getUpdatesSince(seq_no) - gets the updates since a given sequence number
c) setReadOnly(int partition) - makes a partition read-only (RO)

We are trying to see whether we can use Helix to automate the shard copy operation:
1) A node goes down and the controller computes new shard placements. Then, for a particular shard:
2) The target replica goes into a "BACKUP" state, in which it uses RoutingTableProvider to find another replica that is serving the same shard and copies the shard over from it.
3) The target replica goes into a "SYNC" state, in which it keeps calling getUpdatesSince on the source replica to stay in sync with it; this goes on indefinitely.
4) The controller sets the source replica to RO (not a Helix state).
5) The target replica catches up and moves to the MASTER state (online).
6) The controller sees that no replica is syncing from the source replica anymore and marks the shard as no longer RO.

I am wondering whether this could work with Helix. The major issue is that some of these transitions need to be sequenced in a particular order across the target replica and the source replica. Does Helix have the ability to let participants initiate state transitions?

Thanks!
Varun
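The per-shard sequence in steps 2 to 6 can be sketched as a small state machine. This is plain Java, not the Helix API. In Helix, states like BACKUP and SYNC would instead be declared in a state model definition and handled by participant transition callbacks. The following are all assumptions made for illustration: `Backup`, `InMemorySource`, `TargetReplica`, the hypothetical `clearReadOnly`, and the exact signatures given to `backup()`, `getUpdatesSince()` and `setReadOnly()`. The single driver in `main` plays the controller's role for steps 4 and 6. That cross-replica ordering is exactly what the question is about.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ShardCopySketch {
    // Hypothetical snapshot type: the data plus the seq no it is consistent at.
    static final class Backup {
        final long seq;
        final List<String> data;
        Backup(long seq, List<String> data) { this.seq = seq; this.data = data; }
    }

    // The replica API from the post; the exact signatures are assumptions.
    interface ReplicaApi {
        Backup backup();
        List<String> getUpdatesSince(long seq);
        void setReadOnly(int partition);
    }

    enum State { OFFLINE, BACKUP, SYNC, MASTER }

    // Legal moves for the target replica, following steps 2, 3 and 5.
    static final Map<State, Set<State>> NEXT = new EnumMap<>(State.class);
    static {
        NEXT.put(State.OFFLINE, EnumSet.of(State.BACKUP));
        NEXT.put(State.BACKUP, EnumSet.of(State.SYNC));
        NEXT.put(State.SYNC, EnumSet.of(State.MASTER));
        NEXT.put(State.MASTER, EnumSet.noneOf(State.class));
    }

    // In-memory stand-in for the source replica; update i has seq no i + 1.
    static class InMemorySource implements ReplicaApi {
        final List<String> log = new ArrayList<>();
        boolean readOnly = false;
        void write(String update) {
            if (readOnly) throw new IllegalStateException("partition is read-only");
            log.add(update);
        }
        public Backup backup() { return new Backup(log.size(), new ArrayList<>(log)); }
        public List<String> getUpdatesSince(long seq) {
            return new ArrayList<>(log.subList((int) seq, log.size()));
        }
        public void setReadOnly(int partition) { readOnly = true; }
        // Hypothetical: the post does not say how RO is cleared in step 6.
        void clearReadOnly(int partition) { readOnly = false; }
    }

    static class TargetReplica {
        State state = State.OFFLINE;
        final List<String> data = new ArrayList<>();
        long appliedSeq = 0;

        void transition(State to) {
            if (!NEXT.get(state).contains(to)) {
                throw new IllegalStateException(state + " -> " + to + " is not allowed");
            }
            state = to;
        }
        // Step 2: copy a backup from a replica serving the same shard
        // (found via RoutingTableProvider in the real setup; passed in here).
        void backupFrom(ReplicaApi source) {
            transition(State.BACKUP);
            Backup b = source.backup();
            data.addAll(b.data);
            appliedSeq = b.seq;
        }
        // Step 3: one round of pulling updates newer than appliedSeq.
        void syncOnce(ReplicaApi source) {
            if (state != State.SYNC) throw new IllegalStateException("not in SYNC");
            List<String> updates = source.getUpdatesSince(appliedSeq);
            data.addAll(updates);
            appliedSeq += updates.size();
        }
    }

    public static void main(String[] args) {
        int partition = 0;
        InMemorySource source = new InMemorySource();
        source.write("u1");
        source.write("u2");
        TargetReplica target = new TargetReplica();
        target.backupFrom(source);          // step 2
        source.write("u3");                 // client writes keep arriving at the source
        target.transition(State.SYNC);      // step 3
        target.syncOnce(source);
        source.setReadOnly(partition);      // step 4: controller freezes the source
        target.syncOnce(source);            // final catch-up; no new writes can land
        if (target.appliedSeq == source.log.size()) {
            target.transition(State.MASTER); // step 5
            source.clearReadOnly(partition); // step 6
        }
        System.out.println(target.state + " " + target.data + " sourceReadOnly=" + source.readOnly);
    }
}
```

The key constraint this encodes is that the last SYNC round runs only after the source is RO. Only then can the target's sequence number actually reach the source's before the target moves to MASTER.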
