Hi,
I'm trying to roll out an upgrade from 0.20.0 to 0.21.0 with slaves
configured with checkpointing and with "reconnect" recovery.
I was investigating why the slaves would successfully re-register with the
master and recover, but would subsequently be asked to shut down ("health
check timeout").
It turns out that our slaves had been unintentionally configured to use
port 5050 in the previous configuration. We decided to fix that during the
upgrade and have them use the default 5051 port.
This change seems to make the health checks fail, and the slave is
eventually killed for inactivity.
I've confirmed that leaving the port as it was in the previous
configuration lets the slave re-register successfully, and it is not asked
to shut down later on.
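
To make the two cases concrete, the slave invocations look roughly like the
sketch below. The master address is a placeholder, and this assumes the
standard --port, --checkpoint and --recover flags of mesos-slave; any other
flags are left out.

```shell
# Placeholder master address (hypothetical, not our real one).
MASTER="zk://zk.example.com:2181/mesos"

# Case 1: port kept at the old (unintended) value of 5050.
# The slave re-registers, recovers, and stays up.
mesos-slave --master="$MASTER" --checkpoint --recover=reconnect --port=5050

# Case 2: port changed to the default 5051 as part of the upgrade.
# The slave re-registers and recovers, but is later asked to shut down
# ("health check timeout").
mesos-slave --master="$MASTER" --checkpoint --recover=reconnect --port=5051
```
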
Is this a known issue? I haven't been able to find a JIRA ticket for this.
Maybe it's the expected behaviour? Should I create a ticket?
Thanks,
Philippe