dang-stripe commented on issue #2951: URL: https://github.com/apache/helix/issues/2951#issuecomment-2436162245
@klsince We do not see that error in the logs. After looking deeper, I do see that the Helix ZK client is struggling to maintain a persistent connection with ZK.

```
[2024-10-23 21:28:43.769971] INFO [ZKHelixManager] [ZkClient-EventThread-111-server1:2181,server2:2181,server3:2181:111] KeeperState: SyncConnected, instance: Server_st-noir-test-uswest2b-1_8098, type: PARTICIPANT
[2024-10-23 21:29:32.011416] WARN [ZKHelixManager] [ZkClient-EventThread-111-server1:2181,server2:2181,server3:2181:111] KeeperState:Disconnected, SessionId: 10071459e8408e9, instance: Server_st-noir-test-uswest2b-1_8098, type: PARTICIPANT
[2024-10-23 21:29:32.157984] INFO [ZKHelixManager] [ZkClient-EventThread-111-server1:2181,server2:2181,server3:2181:111] KeeperState: SyncConnected, instance: Server_st-noir-test-uswest2b-1_8098, type: PARTICIPANT
[2024-10-23 21:30:11.646452] WARN [ZKHelixManager] [ZkClient-EventThread-111-server1:2181,server2:2181,server3:2181:111] KeeperState:Disconnected, SessionId: 10071459e8408e9, instance: Server_st-noir-test-uswest2b-1_8098, type: PARTICIPANT
[2024-10-23 21:30:12.001107] WARN [ZKHelixManager] [ZkClient-EventThread-111-server1:2181,server2:2181,server3:2181:111] KeeperState:Expired, SessionId: 10071459e8408e9, instance: Server_st-noir-test-uswest2b-1_8098, type: PARTICIPANT
[2024-10-23 21:30:12.022364] INFO [ZKHelixManager] [ZkClient-EventThread-111-server1:2181,server2:2181,server3:2181:111] KeeperState: SyncConnected, instance: Server_st-noir-test-uswest2b-1_8098, type: PARTICIPANT
[2024-10-23 21:30:12.024330] INFO [ZKHelixManager] [ZkClient-EventThread-111-server1:2181,server2:2181,server3:2181:111] Handling new session, session id: 10071459e840915, instance: Server_st-noir-test-uswest2b-1_8098, instanceTye: PARTICIPANT, cluster: rad-noir
[2024-10-23 21:30:12.024355] INFO [ZKHelixManager] [ZkClient-EventThread-111-server1:2181,server2:2181,server3:2181:111] Handle new session, instance: Server_st-noir-test-uswest2b-1_8098, type: PARTICIPANT, session id: 10071459e840915.
[2024-10-23 21:30:12.252857] WARN [ZKHelixManager] [HelixTaskExecutor-message_handle_STATE_TRANSITION:1688] zkClient to server1:2181,server2:2181,server3:2181 is not connected, wait for 180000ms.
```

When I look at the ZK logs for one of the session IDs, it seems like ZK is trying to re-validate the session, but the client terminates its connection before the revalidation can complete.

```
[2024-10-23 21:30:12.021291] INFO LeaderSessionTracker:104 - Committing global session 0x10071459e840915
[2024-10-23 21:30:12.021328] INFO LearnerSessionTracker:116 - Committing global session 0x10071459e840915
[2024-10-23 21:30:12.021495] INFO LearnerSessionTracker:116 - Committing global session 0x10071459e840915
[2024-10-23 21:30:12.021550] INFO LearnerSessionTracker:116 - Committing global session 0x10071459e840915
[2024-10-23 21:30:12.021704] INFO LearnerSessionTracker:116 - Committing global session 0x10071459e840915
[2024-10-23 21:30:36.875070] WARN NIOServerCnxn:364 - Unexpected exception
[2024-10-23 21:30:36.875101] EndOfStreamException: Unable to read additional data from client, it probably closed the socket: address = /10.76.213.167:33230, session = 0x10071459e840915
[2024-10-23 21:30:36.875113] at org.apache.zookeeper.server.NIOServerCnxn.handleFailedRead(NIOServerCnxn.java:163)
[2024-10-23 21:30:36.875126] at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:326)
[2024-10-23 21:30:36.875136] at org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:522)
[2024-10-23 21:30:37.097688] INFO Learner:137 - Revalidating client: 0x10071459e840915
[2024-10-23 21:31:23.560963] WARN NIOServerCnxn:364 - Unexpected exception
[2024-10-23 21:31:23.561024] EndOfStreamException: Unable to read additional data from client, it probably closed the socket: address = /10.76.213.167:56312, session = 0x10071459e840915
[2024-10-23 21:31:23.561041] at org.apache.zookeeper.server.NIOServerCnxn.handleFailedRead(NIOServerCnxn.java:163)
[2024-10-23 21:31:23.561049] at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:326)
[2024-10-23 21:31:23.561059] at org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:522)
[2024-10-23 21:32:12.194924] WARN NIOServerCnxn:364 - Unexpected exception
[2024-10-23 21:32:12.194955] EndOfStreamException: Unable to read additional data from client, it probably closed the socket: address = /10.76.213.167:35592, session = 0x10071459e840915
[2024-10-23 21:32:12.194967] at org.apache.zookeeper.server.NIOServerCnxn.handleFailedRead(NIOServerCnxn.java:163)
[2024-10-23 21:32:12.194975] at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:326)
[2024-10-23 21:32:12.194986] at org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:522)
[2024-10-23 21:32:12.444665] INFO Learner:137 - Revalidating client: 0x10071459e840915
```

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
