Hi,

In production we see temporary networking failures (we are not 100% sure
what they are). Now and then a region server's ZooKeeper session expires
and, in addition, some IPC channels throw 'channel closed'.

This causes the region server to exit, which is not a very big deal: our
monitoring system sends a text message so that somebody can restart the
region server.

However, this happens a little more often than we would like to handle
manually.

Why does the server not recover/reconnect automatically? Is there a facility
to enable automatic server restarts, so that region server nodes rejoin the
cluster on their own?
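
To make the question concrete, the kind of automatic restart we have in mind
could be sketched roughly as below. This is only an illustration under
assumptions: supervise is a hypothetical helper of our own, not part of HBase,
and the start command shown in the trailing comment is an assumption about the
install layout.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, delay_s=1.0, sleep=time.sleep):
    """Run cmd in the foreground and restart it whenever it exits non-zero.

    Stops after a clean exit (code 0) or once max_restarts restarts have
    been used. Returns the list of exit codes observed, one per run.
    """
    exit_codes = []
    while True:
        code = subprocess.call(cmd)  # blocks until the process exits
        exit_codes.append(code)
        restarts_used = len(exit_codes) - 1
        if code == 0 or restarts_used >= max_restarts:
            return exit_codes
        sleep(delay_s)  # pause so we do not spin on a persistent failure


# Hypothetical usage (assumed foreground start command; adjust to the install):
# supervise(["bin/hbase", "regionserver", "start"], max_restarts=5, delay_s=30.0)
```

The cap on restarts is deliberate: if the network problem persists, we would
still want the monitoring alert to reach a person instead of looping forever.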

Thanks.
-Dmitriy
