vishalsuvagia opened a new issue, #3071:
URL: https://github.com/apache/helix/issues/3071

   ### Describe the bug
   Apache Ambari Metrics uses Helix for cluster management tasks. We recently 
tried to upgrade the Helix dependency from 0.6.6 to 1.3.2 / 1.4.3; however, with 
the newer Helix versions the Metrics Collector fails to start when the Hadoop 
cluster is deployed in Kerberos-enabled mode.
   
   Based on my investigation, the problem appears to stem from a change in how 
Helix core initialises its ZooKeeper connection: the ZooKeeper client fails to 
be created, and a service shutdown is triggered with the error below in the 
trace.
   
   > 2025-09-17 10:54:29,633 WARN org.apache.helix.manager.zk.ZKHelixManager: 
**zkClient to testnode01.mycluster.org:2181 is not connected**, wait for 
10000ms.
   > 2025-09-17 10:54:39,635 ERROR org.apache.helix.manager.zk.ZKHelixManager: 
**zkClient is not connected after waiting 10000ms**., clusterName: 
ambari-metrics-cluster, zkAddress: testnode01.mycluster.org:2181
   > **ERROR org.apache.helix.manager.zk.ZKHelixManager: fail to createClient. 
retry 1**
   > org.apache.helix.HelixException: HelixManager is not connected within 
retry timeout for cluster ambari-metrics-cluster
   >         at 
org.apache.helix.manager.zk.ZKHelixManager.checkConnected(ZKHelixManager.java:417)
   >         at 
org.apache.helix.manager.zk.ZKHelixManager.getConfigAccessor(ZKHelixManager.java:687)
   >         at 
org.apache.helix.manager.zk.ParticipantManager.<init>(ParticipantManager.java:118)
   >         at 
org.apache.helix.manager.zk.ZKHelixManager.handleNewSessionAsParticipant(ZKHelixManager.java:1440)
   >         at 
org.apache.helix.manager.zk.ZKHelixManager.handleNewSession(ZKHelixManager.java:1390)
   >         at 
org.apache.helix.manager.zk.ZKHelixManager.createClient(ZKHelixManager.java:782)
   >         at 
org.apache.helix.manager.zk.ZKHelixManager.connect(ZKHelixManager.java:817)
   >         at 
org.apache.ambari.metrics.core.timeline.availability.AggregationTaskRunner.initialize(AggregationTaskRunner.java:135)
   >         at 
org.apache.ambari.metrics.core.timeline.availability.MetricCollectorHAController.startAggregators(MetricCollectorHAController.java:205)
   >         at 
org.apache.ambari.metrics.core.timeline.availability.MetricCollectorHAController.initializeHAController(MetricCollectorHAController.java:184)
   >         at 
org.apache.ambari.metrics.core.timeline.HBaseTimelineMetricsService.initializeSubsystem(HBaseTimelineMetricsService.java:133)
   >         at 
org.apache.ambari.metrics.core.timeline.HBaseTimelineMetricsService.serviceInit(HBaseTimelineMetricsService.java:102)
   
   I am trying to understand the change in behaviour on the library side and 
find an appropriate fix. So far I have tried setting the ZooKeeper timeouts via 
system properties and -D arguments, setting the helix.zk [session and 
connection](https://github.com/apache/helix/blob/master/helix-common/src/main/java/org/apache/helix/SystemPropertyKeys.java#L51-L53)
 timeouts, and rewriting the ZKHelixManager initialisation to pass in 
RealmAwareZkClient, RealmAwareZkClientConfig, CloudConfig and 
HelixManagerProperty instances built with the required parameters, but none of 
these has worked. Any help or guidance towards an appropriate fix would be 
appreciated.
   For reference, the Apache Ambari Metrics Helix upgrade PR is 
https://github.com/apache/ambari-metrics/pull/173
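
   When experimenting with -D arguments as described above, it can help to 
confirm that the values actually reach the collector JVM before 
ZKHelixManager.connect() is called. The sketch below is only a diagnostic aid, 
not a fix, and uses nothing beyond the JDK. The class name ZkClientSettingsDump 
is hypothetical. The first three property names are the standard JAAS and 
ZooKeeper client SASL settings; the last two are assumed to match the keys in 
the linked SystemPropertyKeys file and should be checked against the Helix 
version on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: dumps the system properties that are likely to matter
// when a ZooKeeper client starts in a Kerberos-enabled cluster.
final class ZkClientSettingsDump {

    static final String[] KEYS = {
        // Standard JDK / ZooKeeper client properties.
        "java.security.auth.login.config",
        "zookeeper.sasl.client",
        "zookeeper.sasl.clientconfig",
        // ASSUMPTION: names taken to match Helix's SystemPropertyKeys;
        // verify them against the Helix release actually in use.
        "zk.session.timeout",
        "zk.connection.timeout"
    };

    // Returns each key with its current value, or "<unset>" if it is absent.
    static Map<String, String> snapshot() {
        Map<String, String> out = new LinkedHashMap<>();
        for (String key : KEYS) {
            out.put(key, System.getProperty(key, "<unset>"));
        }
        return out;
    }

    public static void main(String[] args) {
        snapshot().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

   Calling ZkClientSettingsDump.snapshot() just before the Helix manager is 
created, and logging the result, shows whether a setting is missing from the 
JVM or is present but has no effect.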
   
   cc: @jackjlli / @Jackie-Jiang
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

