So I was able to get this to work by doing 2 things:
- Upgraded Helix from 0.6.5 to 0.7.1, along with the resulting client-side changes
- Changed the znode when security is enabled (separate znodes for secure and unsecure)

I do not see any ACLs set by Helix on the znode, so I am still verifying whether change #2 is necessary.

BR,
Sid

________________________________
From: Siddharth Wagle
Sent: Wednesday, July 13, 2016 11:24 AM
To: [email protected]
Subject: Re: MIT-Kerberos support for ZkHelixAdmin

Hi Kishore,

Quick summary of what I am doing (Controller and Participant are in the same JVM):

- Instantiate ZKHelixAdmin and create a cluster
- Add the host as an instance to the cluster
- Add the state model def
- Add resources and rebalance
- Start the participant
- Start the controller

https://github.com/apache/ambari/blob/trunk/ambari-metrics/ambari-metrics-timelineservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/metrics/timeline/availability/MetricCollectorHAController.java

So I have everything up to the Controller part working. When security is enabled, the ZKHelixManager for the Controller stops working and throws the following error:

2016-07-13 17:56:41,263 INFO org.apache.helix.manager.zk.ZKHelixManager: KeeperState: SyncConnected, zookeeper:State:CONNECTED Timeout:30000 sessionid:0x355e190123e001d local:/10.240.0.32:54434 remoteserver:ambari-sid-3.c.pramod-thangali.internal/10.240.0.30:2181 lastZxid:8589935298 xid:9 sent:9 recv:11 queuedpkts:0 pendingresp:0 queuedevents:0
2016-07-13 17:57:41,264 ERROR org.apache.helix.manager.zk.ZKHelixManager: fail to connect zkserver: ambari-sid-1.c.pramod-thangali.internal:2181,ambari-sid-2.c.pramod-thangali.internal:2181,ambari-sid-3.c.pramod-thangali.internal:2181 in 60000ms. expiredSessionId: null, clusterName: ambari-metrics-cluster

(It re-throws the same exception and never gets out of this state.) The ZooKeeper server logs do not indicate an incoming client connection.

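The second change listed at the top of this thread (a separate znode for secure and unsecure deployments) could look roughly like the sketch below. It is only an illustration: the cluster name `ambari-metrics-cluster` comes from the logs in this thread, while the class name `HelixClusterNames` and the `-secure` suffix are hypothetical and are not taken from the Ambari code. The sketch relies on Helix keeping each cluster under a root znode named after the cluster, so switching the cluster name also switches the znode.

```java
final class HelixClusterNames {

    // Cluster name that appears in the logs in this thread.
    static final String BASE_CLUSTER_NAME = "ambari-metrics-cluster";

    // Hypothetical suffix, used here only for illustration.
    static final String SECURE_SUFFIX = "-secure";

    // Picks a different cluster name depending on whether security is enabled,
    // so that secure and unsecure deployments never share cluster state.
    static String clusterName(boolean securityEnabled) {
        return securityEnabled ? BASE_CLUSTER_NAME + SECURE_SUFFIX : BASE_CLUSTER_NAME;
    }

    // Root znode of the cluster, relative to the ZooKeeper root (or chroot).
    static String clusterZnode(boolean securityEnabled) {
        return "/" + clusterName(securityEnabled);
    }
}
```

The resulting name would then be passed wherever the cluster name is used today, for both the admin calls and the participant and controller managers, so that all of them agree on the same znode.
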
If I set a breakpoint that forces a new client session to ZooKeeper for the CONTROLLER, this is what I see:

2016-07-13 18:04:39,994 INFO org.apache.helix.manager.zk.ZKHelixManager: KeeperState: SyncConnected, zookeeper:State:CONNECTED Timeout:30000 sessionid:0x255e190122a0025 local:null remoteserver:null lastZxid:0 xid:2 sent:1 recv:1 queuedpkts:0 pendingresp:0 queuedevents:0
2016-07-13 18:04:39,994 INFO org.apache.helix.manager.zk.ZKHelixManager: KeeperState:Disconnected, disconnectedSessionId: 255e190122a0025, instance: ambari-sid-5.c.pramod-thangali.internal, type: CONTROLLER
2016-07-13 18:04:39,995 ERROR org.apache.helix.manager.zk.ZKHelixManager: fail to createClient.
org.apache.helix.HelixException: Cluster structure is not set up for cluster: ambari-metrics-cluster

Any idea what is going on? I see an instance type called CONTROLLER_PARTICIPANT. Is this something that would avoid having to create a separate session for the controller during initialization?

Best Regards,
Sid

________________________________
From: kishore g <[email protected]>
Sent: Wednesday, July 13, 2016 12:52 AM
To: [email protected]
Subject: Re: MIT-Kerberos support for ZkHelixAdmin

This does not look like the standard system variables used by ZooKeeper. Take a look at this wiki:
https://cwiki.apache.org/confluence/display/ZOOKEEPER/ZooKeeper+SSL+User+Guide

export CLIENT_JVMFLAGS="
-Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty
-Dzookeeper.client.secure=true
-Dzookeeper.ssl.keyStore.location=/root/zookeeper/ssl/testKeyStore.jks
-Dzookeeper.ssl.keyStore.password=testpass
-Dzookeeper.ssl.trustStore.location=/root/zookeeper/ssl/testTrustStore.jks
-Dzookeeper.ssl.trustStore.password=testpass"

On Tue, Jul 12, 2016 at 8:12 PM, Siddharth Wagle <[email protected]> wrote:

Thanks Kishore, appreciate the help.

I do have a jaas.conf on the classpath which works for the Phoenix client connecting to ZK (in the same JVM) but does not work for Helix:

-Djava.security.auth.login.config=/etc/ams-hbase/conf/ams_collector_jaas.conf

[root@ambari-sid-4 ~]# cat /etc/ams-hbase/conf/ams_collector_jaas.conf
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  storeKey=true
  useTicketCache=false
  keyTab="/etc/security/keytabs/ams.collector.keytab"
  principal="amshbase/[email protected]";
};

________________________________
From: kishore g <[email protected]>
Sent: Tuesday, July 12, 2016 6:36 PM
To: [email protected]
Subject: Re: MIT-Kerberos support for ZkHelixAdmin

We haven't tried ZK with authentication. I think ZK authentication can be enabled by setting system properties. I will take a look at it and get back to you.

On Tue, Jul 12, 2016 at 5:12 PM, Siddharth Wagle <[email protected]> wrote:

Hi,

I am working on Ambari Metrics System HA (https://issues.apache.org/jira/browse/AMBARI-15901) and using Helix for task partitioning as well as service discovery. The issue I am facing is that as soon as I enable Kerberos, Helix stops working because it cannot connect to the secure ZooKeeper.

Are there any examples or recommendations on how to get ZKHelixAdmin to work with a secure ZooKeeper? I was unable to find any mention of this in the codebase.

Thanks,
Sid
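
For anyone debugging a similar setup, one way to narrow down the JAAS side of the question is to check, from inside the same JVM, whether the login configuration named by `-Djava.security.auth.login.config` is visible and contains the section the ZooKeeper client looks for. The sketch below uses only the JDK's `javax.security.auth.login` API. As far as I know, the ZooKeeper client takes the section name from the `zookeeper.sasl.clientconfig` system property and falls back to `Client`; the class name `JaasClientSectionCheck` is made up for this example. It only checks that the section can be read and does not attempt a Kerberos login.

```java
import java.util.ArrayList;
import java.util.List;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.Configuration;

final class JaasClientSectionCheck {

    // Section name the ZooKeeper client is expected to look up:
    // the value of zookeeper.sasl.clientconfig, or "Client" when unset.
    static String clientSectionName() {
        return System.getProperty("zookeeper.sasl.clientconfig", "Client");
    }

    // Returns the login module class names configured for a section,
    // or an empty list when the section does not exist.
    static List<String> loginModules(Configuration config, String section) {
        List<String> names = new ArrayList<>();
        AppConfigurationEntry[] entries = config.getAppConfigurationEntry(section);
        if (entries != null) {
            for (AppConfigurationEntry entry : entries) {
                names.add(entry.getLoginModuleName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String section = clientSectionName();
        try {
            // Typically loads the file named by -Djava.security.auth.login.config.
            Configuration config = Configuration.getConfiguration();
            System.out.println("JAAS section '" + section + "' modules: "
                    + loginModules(config, section));
        } catch (SecurityException e) {
            // Thrown when no login configuration can be located or parsed.
            System.out.println("No usable JAAS configuration: " + e.getMessage());
        }
    }
}
```

Run with the same `-Djava.security.auth.login.config=...` flag as the collector. An empty module list or the "No usable JAAS configuration" message would point at the JVM configuration rather than at Helix itself.
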
