dang-stripe commented on issue #2951:
URL: https://github.com/apache/helix/issues/2951#issuecomment-2436162245

   @klsince We do not see that error in the logs. Looking deeper, I do see 
that the Helix ZK client is struggling to keep a persistent connection to 
ZK.
   
   ```
   [2024-10-23 21:28:43.769971] INFO [ZKHelixManager] 
[ZkClient-EventThread-111-server1:2181,server2:2181,server3:2181:111] 
KeeperState: SyncConnected, instance: Server_st-noir-test-uswest2b-1_8098, 
type: PARTICIPANT
   [2024-10-23 21:29:32.011416] WARN [ZKHelixManager] 
[ZkClient-EventThread-111-server1:2181,server2:2181,server3:2181:111] 
KeeperState:Disconnected, SessionId: 10071459e8408e9, instance: 
Server_st-noir-test-uswest2b-1_8098, type: PARTICIPANT
   [2024-10-23 21:29:32.157984] INFO [ZKHelixManager] 
[ZkClient-EventThread-111-server1:2181,server2:2181,server3:2181:111] 
KeeperState: SyncConnected, instance: Server_st-noir-test-uswest2b-1_8098, 
type: PARTICIPANT
   [2024-10-23 21:30:11.646452] WARN [ZKHelixManager] 
[ZkClient-EventThread-111-server1:2181,server2:2181,server3:2181:111] 
KeeperState:Disconnected, SessionId: 10071459e8408e9, instance: 
Server_st-noir-test-uswest2b-1_8098, type: PARTICIPANT
   [2024-10-23 21:30:12.001107] WARN [ZKHelixManager] 
[ZkClient-EventThread-111-server1:2181,server2:2181,server3:2181:111] 
KeeperState:Expired, SessionId: 10071459e8408e9, instance: 
Server_st-noir-test-uswest2b-1_8098, type: PARTICIPANT
   [2024-10-23 21:30:12.022364] INFO [ZKHelixManager] 
[ZkClient-EventThread-111-server1:2181,server2:2181,server3:2181:111] 
KeeperState: SyncConnected, instance: Server_st-noir-test-uswest2b-1_8098, 
type: PARTICIPANT
   [2024-10-23 21:30:12.024330] INFO [ZKHelixManager] 
[ZkClient-EventThread-111-server1:2181,server2:2181,server3:2181:111] Handling 
new session, session id: 10071459e840915, instance: 
Server_st-noir-test-uswest2b-1_8098, instanceTye: PARTICIPANT, cluster: rad-noir
   [2024-10-23 21:30:12.024355] INFO [ZKHelixManager] 
[ZkClient-EventThread-111-server1:2181,server2:2181,server3:2181:111] Handle 
new session, instance: Server_st-noir-test-uswest2b-1_8098, type: PARTICIPANT, 
session id: 10071459e840915.
   [2024-10-23 21:30:12.252857] WARN [ZKHelixManager] 
[HelixTaskExecutor-message_handle_STATE_TRANSITION:1688] zkClient to 
server1:2181,server2:2181,server3:2181 is not connected, wait for 180000ms.
   ```
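
To quantify the flapping, the `KeeperState` transitions above can be pulled out of the participant log and turned into disconnect durations. This is only a rough sketch: `keeper_state_events` and `disconnect_gaps` are hypothetical helper names, not part of Helix, and the regexes assume the timestamp and `KeeperState:` formats seen in the lines pasted above.

```python
import re
from datetime import datetime

# Assumed formats, taken from the log lines above; not a Helix API.
ENTRY_START = re.compile(r"(?=\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+\])")
STATE = re.compile(
    r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\].*?KeeperState:\s*(\w+)"
)


def keeper_state_events(log_text):
    """Return (timestamp, state) pairs for entries that carry a KeeperState."""
    # The mail client wrapped each entry over several lines, so flatten first
    # and then split again at every leading timestamp.
    flat = " ".join(line.strip() for line in log_text.splitlines())
    events = []
    for entry in ENTRY_START.split(flat):
        match = STATE.match(entry)
        if match:
            ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
            events.append((ts, match.group(2)))
    return events


def disconnect_gaps(events):
    """Return (disconnect_time, seconds_until_next_SyncConnected) pairs."""
    gaps = []
    down_since = None
    for ts, state in events:
        if state == "Disconnected" and down_since is None:
            down_since = ts
        elif state == "SyncConnected" and down_since is not None:
            gaps.append((down_since, (ts - down_since).total_seconds()))
            down_since = None
    return gaps
```

Feeding the participant log through `disconnect_gaps(keeper_state_events(text))` gives one row per disconnect, which makes it easier to compare the gaps against the configured session timeout.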
   
   When I look at the ZK logs for one of the session IDs 
(0x10071459e840915), it seems like ZK is trying to revalidate the session, 
but the client closes its connection before that can complete.
   
   ```
   [2024-10-23 21:30:12.021291] INFO  LeaderSessionTracker:104 - Committing 
global session 0x10071459e840915
   [2024-10-23 21:30:12.021328] INFO  LearnerSessionTracker:116 - Committing 
global session 0x10071459e840915
   [2024-10-23 21:30:12.021495] INFO  LearnerSessionTracker:116 - Committing 
global session 0x10071459e840915
   [2024-10-23 21:30:12.021550] INFO  LearnerSessionTracker:116 - Committing 
global session 0x10071459e840915
   [2024-10-23 21:30:12.021704] INFO  LearnerSessionTracker:116 - Committing 
global session 0x10071459e840915
   [2024-10-23 21:30:36.875070] WARN  NIOServerCnxn:364 - Unexpected exception
   [2024-10-23 21:30:36.875101] EndOfStreamException: Unable to read additional 
data from client, it probably closed the socket: address = 
/10.76.213.167:33230, session = 0x10071459e840915
   [2024-10-23 21:30:36.875113]         at 
org.apache.zookeeper.server.NIOServerCnxn.handleFailedRead(NIOServerCnxn.java:163)
   [2024-10-23 21:30:36.875126]         at 
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:326)
   [2024-10-23 21:30:36.875136]         at 
org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:522)
   [2024-10-23 21:30:37.097688] INFO  Learner:137 - Revalidating client: 
0x10071459e840915
   [2024-10-23 21:31:23.560963] WARN  NIOServerCnxn:364 - Unexpected exception
   [2024-10-23 21:31:23.561024] EndOfStreamException: Unable to read additional 
data from client, it probably closed the socket: address = 
/10.76.213.167:56312, session = 0x10071459e840915
   [2024-10-23 21:31:23.561041]         at 
org.apache.zookeeper.server.NIOServerCnxn.handleFailedRead(NIOServerCnxn.java:163)
   [2024-10-23 21:31:23.561049]         at 
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:326)
   [2024-10-23 21:31:23.561059]         at 
org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:522)
   [2024-10-23 21:32:12.194924] WARN  NIOServerCnxn:364 - Unexpected exception
   [2024-10-23 21:32:12.194955] EndOfStreamException: Unable to read additional 
data from client, it probably closed the socket: address = 
/10.76.213.167:35592, session = 0x10071459e840915
   [2024-10-23 21:32:12.194967]         at 
org.apache.zookeeper.server.NIOServerCnxn.handleFailedRead(NIOServerCnxn.java:163)
   [2024-10-23 21:32:12.194975]         at 
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:326)
   [2024-10-23 21:32:12.194986]         at 
org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:522)
   [2024-10-23 21:32:12.444665] INFO  Learner:137 - Revalidating client: 
0x10071459e840915
   ```
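
One way to check the "client closes the socket" reading is to group the `EndOfStreamException` entries by session and list the client address and port on each one. A minimal sketch, assuming the message format shown in the ZK log above; `closed_sockets_by_session` is a hypothetical helper, not a ZooKeeper API.

```python
import re
from collections import defaultdict

# Assumed message format, copied from the ZK server log lines above.
CLOSED_RE = re.compile(
    r"EndOfStreamException:[^\[]*?address\s*=\s*/([\d.]+):(\d+),"
    r"\s*session\s*=\s*(0x[0-9a-fA-F]+)"
)


def closed_sockets_by_session(log_text):
    """Map session id -> list of (client_ip, client_port) seen on closed sockets."""
    # Entries are wrapped across lines in the paste, so flatten before matching.
    flat = " ".join(line.strip() for line in log_text.splitlines())
    by_session = defaultdict(list)
    for host, port, session in CLOSED_RE.findall(flat):
        by_session[session].append((host, int(port)))
    return dict(by_session)
```

If the same session shows up repeatedly with a different client port each time, that is consistent with the client opening a fresh connection and dropping it again, rather than one long-lived connection failing once.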


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

