Hi Baptiste,
There is ongoing work to let impalad shut down gracefully: https://gerrit.cloudera.org/c/10744/
Thanks to Tim for his efforts on this. I hope it can be merged soon, so that the patch is easier to merge into the 2.x branch.

We have faced a similar scenario before. One way to mitigate the service interruption is to set up another Impala cluster as a temporary backup, switch the load balancing to the backup cluster, and perform the lengthy maintenance on the original cluster. Once everything is done, switch the load balancing back to the original cluster.

Hope this helps.

Regards,
Quanlong

--
Quanlong Huang
Software Developer, Hulu

At 2018-07-26 20:21:56, "Baptiste Mille-Mathias" <[email protected]> wrote:

Hello,

In operations, I sometimes have to stop a node, or even perform a rolling restart of a whole cluster, to apply a system patch or a configuration update. The cluster runs Impala 2.10 behind a load balancer (haproxy).

The problem is that when an Impala server is stopped (whether it is a coordinator or an executor), all the queries it is handling are killed and the clients receive an error, which is quite bad. A rolling restart therefore causes as many interruptions as there are nodes.

I looked for a way to remove both roles dynamically, in order to move the nodes cleanly out of the cluster before actually stopping the service, so that no service interruption is experienced. However, I did not find such an API (I only saw that this is possible in the configuration file).

Is it possible? If not, how do you handle this scenario?

Thanks for your advice.

--
Happy people are not in a hurry
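
For reference, the load-balancer switch suggested in the reply could be scripted roughly as below. This is only a sketch under assumptions not stated in the thread: it assumes haproxy exposes its runtime API on an admin-level UNIX stats socket, and the backend name `impala_primary`, the server names, and the socket path are hypothetical placeholders. It relies on haproxy's `set server <backend>/<server> state drain|ready` runtime command, which stops routing new connections to a drained server while leaving established ones alone.

```python
import socket


def drain_commands(backend, servers):
    """Build runtime commands that stop new connections to each server."""
    return [f"set server {backend}/{name} state drain" for name in servers]


def ready_commands(backend, servers):
    """Build runtime commands that put each server back into rotation."""
    return [f"set server {backend}/{name} state ready" for name in servers]


def send_command(socket_path, command):
    """Send one command to the haproxy stats socket and return the reply.

    haproxy closes the connection after answering a single command, so the
    reply is read until end of stream.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(command.encode("ascii") + b"\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")


# Hypothetical usage (names and path are placeholders, not from the thread):
# for cmd in drain_commands("impala_primary", ["impalad1", "impalad2"]):
#     send_command("/var/run/haproxy.sock", cmd)
```

Draining only affects new connections, so queries already running on the original cluster still need to finish (or be waited for) before its nodes are stopped; once maintenance is done, the `ready_commands` list reverses the switch.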
