klsince commented on issue #2951: URL: https://github.com/apache/helix/issues/2951#issuecomment-2430619817
Hi @dang-stripe, did you see this ERROR message in the server logs? It gets printed every minute.

` … ERROR [ZKHelixManager] [ZkClient-EventThread-172-pinot-zookeeper-headless:2181] fail to connect zkserver: pinot-zookeeper-headless:2181 in 60000ms. expiredSessionId: 3002f8e84690016, clusterName: pinot `

We saw a similar issue and are still root-causing it. After long GC pauses, the server was removed from the LIVE Instance list because its ZK session had expired. Per the server logs, the server did start a session, but it could not update that session in Helix. Judging from the error log above, the server kept trying to reconnect every minute, but every attempt failed for a reason we have not identified yet.

cc @Jackie fyi
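To check whether a server log contains the message quoted above, a small script along these lines may help. It is only a sketch: the regular expression is built from the single log line in this comment, the prefix before `ERROR` is elided in the source and so is not matched, and the names `RECONNECT_FAILURE` and `find_reconnect_failures` are hypothetical helpers, not part of Helix or Pinot.

```python
import re

# Pattern derived from the one log line quoted in the comment. The text
# before "ERROR" (timestamp etc.) is elided there, so it is not matched.
RECONNECT_FAILURE = re.compile(
    r"ERROR \[ZKHelixManager\].*?fail to connect zkserver: (?P<zk>\S+) "
    r"in (?P<timeout_ms>\d+)ms\. "
    r"expiredSessionId: (?P<session>[0-9a-fA-F]+), "
    r"clusterName: (?P<cluster>\S+)"
)


def find_reconnect_failures(lines):
    """Return one dict of captured fields per line that matches the message."""
    failures = []
    for line in lines:
        match = RECONNECT_FAILURE.search(line)
        if match:
            failures.append(match.groupdict())
    return failures


if __name__ == "__main__":
    sample = [
        "ERROR [ZKHelixManager] "
        "[ZkClient-EventThread-172-pinot-zookeeper-headless:2181] "
        "fail to connect zkserver: pinot-zookeeper-headless:2181 in 60000ms. "
        "expiredSessionId: 3002f8e84690016, clusterName: pinot",
        "INFO some unrelated line",
    ]
    for failure in find_reconnect_failures(sample):
        print(failure)
```

Passing an open log file to `find_reconnect_failures` works the same way, since it only iterates over lines. Grouping the results by `session` would show whether the same expired session ID keeps reappearing across the one-minute retries.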
