This is a good question. We had the same problem in Pinot and solved it with a shutdownInProgress flag in the InstanceConfig znode: the spectator checks this flag and stops routing queries to that node. We avoided the disable-instance solution.
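To make the flag idea concrete, here is a minimal sketch in plain Java. It does not use the Helix API: the class ShutdownFlagSketch, its methods, and the in-memory map standing in for the InstanceConfig znodes are all hypothetical names chosen for illustration. In a real setup the flag would live in ZooKeeper and the filtering would sit in an extended routing table provider.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, Helix-free model of the shutdownInProgress flag.
class ShutdownFlagSketch {
    static final String SHUTDOWN_IN_PROGRESS = "shutdownInProgress";

    // Stand-in for the InstanceConfig znodes: instance name -> simple fields.
    private final Map<String, Map<String, String>> instanceConfigs = new HashMap<>();

    // Participant side, on startup: register and clear the flag.
    synchronized void register(String instance) {
        instanceConfigs.computeIfAbsent(instance, k -> new HashMap<>())
                .put(SHUTDOWN_IN_PROGRESS, "false");
    }

    // Participant side, from its shutdown hook: raise the flag.
    synchronized void beginShutdown(String instance) {
        instanceConfigs.computeIfAbsent(instance, k -> new HashMap<>())
                .put(SHUTDOWN_IN_PROGRESS, "true");
    }

    // Spectator side: skip instances whose flag is set, even if the
    // (possibly stale) view still lists them as online.
    synchronized List<String> routableInstances(List<String> onlineInstances) {
        List<String> routable = new ArrayList<>();
        for (String instance : onlineInstances) {
            Map<String, String> config = instanceConfigs.get(instance);
            boolean shuttingDown = config != null
                    && Boolean.parseBoolean(config.get(SHUTDOWN_IN_PROGRESS));
            if (!shuttingDown) {
                routable.add(instance);
            }
        }
        return routable;
    }

    public static void main(String[] args) {
        ShutdownFlagSketch cluster = new ShutdownFlagSketch();
        cluster.register("host1");
        cluster.register("host2");

        List<String> online = List.of("host1", "host2");
        cluster.beginShutdown("host1");
        System.out.println(cluster.routableInstances(online));

        cluster.register("host1"); // restart clears the flag
        System.out.println(cluster.routableInstances(online));
    }
}
```

The point of the sketch is that routing depends on the flag rather than on the Online->Offline transition, so queries stop going to the node before it actually goes offline.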
The solution is as follows:

- The participant sets shutdownInProgress=true in its InstanceConfig from its shutdown hook.
- The broker's routing table gets updated, because the broker listens to changes in InstanceConfig.
- The RoutingTableProvider treats this node as disabled if it sees this flag (you need to extend RoutingTableProvider).
- When the participant restarts, it sets shutdownInProgress=false as part of registering itself in Helix.

This is a valid feature and could potentially be added to Helix.

thanks,
Kishore G

On Wed, Apr 18, 2018 at 10:44 AM, Bo Liu <[email protected]> wrote:

> Hi folks,
>
> We are running a service managed by Helix. When we rolling restart the
> service, we first disable instances through Helix before restarting the
> service processes in the hope that the read errors are minimized.
>
> However, the instances being restarted may get Online->Offline messages
> before clients get the latest version of the external view. I am wondering
> if there is any way to delay the Online->Offline messages generated by
> instance "disable" command?
>
> --
> Best regards,
> Bo
>
>
