Hi guys, for a few days now I have had a problem with Ambari Metrics: whenever I try to start the Metrics Collector, it seems to start and then stops after a minute or two. There is nothing in the logs except something pointing to a connection problem with ZooKeeper, which causes the Collector to shut down after a while.
I switched the logs to DEBUG level; here they are:

2017-06-29 11:17:33,144 DEBUG org.apache.hadoop.hbase.ipc.AbstractRpcClient: Use KERBEROS authentication for service MasterService, sasl=true
2017-06-29 11:17:33,156 DEBUG org.apache.hadoop.hbase.ipc.AbstractRpcClient: Connecting to server03.net:61300
2017-06-29 11:17:33,159 DEBUG org.apache.hadoop.security.UserGroupInformation: PrivilegedAction as:amshbase/[email protected] (auth:KERBEROS) from:org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:734)
2017-06-29 11:17:33,161 DEBUG org.apache.hadoop.hbase.security.HBaseSaslRpcClient: Creating SASL GSSAPI client. Server's Kerberos principal name is amshbase/[email protected]
2017-06-29 11:17:33,167 DEBUG org.apache.hadoop.hbase.security.HBaseSaslRpcClient: Have sent token of size 685 from initSASLContext.
2017-06-29 11:17:33,183 DEBUG org.apache.hadoop.hbase.security.HBaseSaslRpcClient: Will read input token of size 108 for processing by initSASLContext
2017-06-29 11:17:33,186 DEBUG org.apache.hadoop.hbase.security.HBaseSaslRpcClient: Will send token of size 0 from initSASLContext.
2017-06-29 11:17:33,186 DEBUG org.apache.hadoop.hbase.security.HBaseSaslRpcClient: Will read input token of size 32 for processing by initSASLContext
2017-06-29 11:17:33,187 DEBUG org.apache.hadoop.hbase.security.HBaseSaslRpcClient: Will send token of size 32 from initSASLContext.
2017-06-29 11:17:33,187 DEBUG org.apache.hadoop.hbase.security.HBaseSaslRpcClient: SASL client context established. Negotiated QoP: auth
2017-06-29 11:17:33,535 DEBUG org.apache.zookeeper.ClientCnxn: Reading reply sessionid:0x15cf2f282460011, packet:: clientPath:null serverPath:null finished:false header:: 6,3 replyHeader:: 6,73014444188,0 request:: '/ams-hbase-unsecure,F response:: s{68719476743,68719476743,1498721274392,1498721274392,0,36,0,0,0,10,73014444180}
2017-06-29 11:17:33,536 DEBUG org.apache.zookeeper.ClientCnxn: Reading reply sessionid:0x15cf2f282460011, packet:: clientPath:null serverPath:null finished:false header:: 7,4 replyHeader:: 7,73014444188,0 request:: '/ams-hbase-unsecure/master,F response:: #ffffffff000146d61737465723a363133303052ffffffb4ffffffc97fffffffdfffffff99ffffffc9ffffffec50425546a29a1c6e7077626930303033732e646174612e6d657368636f72652e6e657410fffffff4ffffffde318ffffffccffffffbbffffff90ffffff99ffffffcf2b10018fffffffeffffffde3,s{73014444180,73014444180,1498727850646,1498727850646,0,0,0,170278009368018956,78,0,73014444180}
2017-06-29 11:17:33,840 DEBUG org.apache.zookeeper.ClientCnxn: Reading reply sessionid:0x15cf2f282460011, packet:: clientPath:null serverPath:null finished:false header:: 8,3 replyHeader:: 8,73014444188,0 request:: '/ams-hbase-unsecure,F response:: s{68719476743,68719476743,1498721274392,1498721274392,0,36,0,0,0,10,73014444180}
2017-06-29 11:17:33,842 DEBUG org.apache.zookeeper.ClientCnxn: Reading reply sessionid:0x15cf2f282460011, packet:: clientPath:null serverPath:null finished:false header:: 9,4 replyHeader:: 9,73014444188,0 request:: '/ams-hbase-unsecure/master,F response:: #ffffffff000146d61737465723a363133303052ffffffb4ffffffc97fffffffdfffffff99ffffffc9ffffffec50425546a29a1c6e7077626930303033732e646174612e6d657368636f72652e6e657410fffffff4ffffffde318ffffffccffffffbbffffff90ffffff99ffffffcf2b10018fffffffeffffffde3,s{73014444180,73014444180,1498727850646,1498727850646,0,0,0,170278009368018956,78,0,73014444180}
2017-06-29 11:17:34,349 DEBUG org.apache.zookeeper.ClientCnxn: Reading reply sessionid:0x15cf2f282460011, packet:: clientPath:null serverPath:null finished:false header:: 10,3 replyHeader:: 10,73014444192,0 request:: '/ams-hbase-unsecure,F response:: s{68719476743,68719476743,1498721274392,1498721274392,0,37,0,0,0,9,73014444190}
2017-06-29 11:17:34,350 DEBUG org.apache.zookeeper.ClientCnxn: Reading reply sessionid:0x15cf2f282460011, packet:: clientPath:null serverPath:null finished:false header:: 11,4 replyHeader:: 11,73014444192,-101 request:: '/ams-hbase-unsecure/master,F response::
2017-06-29 11:17:34,355 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: hconnection-0x6d0b5baf-0x15cf2f282460011, quorum=server01:2181,server02:2181,server03:2181, baseZNode=/ams-hbase-unsecure Unable to get data of znode /ams-hbase-unsecure/master because node does not exist (not an error)
2017-06-29 11:17:35,360 DEBUG org.apache.zookeeper.ClientCnxn: Reading reply sessionid:0x15cf2f282460011, packet:: clientPath:null serverPath:null finished:false header:: 12,3 replyHeader:: 12,73014444192,0 request:: '/ams-hbase-unsecure,F response:: s{68719476743,68719476743,1498721274392,1498721274392,0,37,0,0,0,9,73014444190}
2017-06-29 11:17:35,361 DEBUG org.apache.zookeeper.ClientCnxn: Reading reply sessionid:0x15cf2f282460011, packet:: clientPath:null serverPath:null finished:false header:: 13,4 replyHeader:: 13,73014444192,-101 request:: '/ams-hbase-unsecure/master,F response::
2017-06-29 11:17:35,361 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: hconnection-0x6d0b5baf-0x15cf2f282460011, quorum=server01:2181,server02:2181,server03:2181, baseZNode=/ams-hbase-unsecure Unable to get data of znode /ams-hbase-unsecure/master because node does not exist (not an error)
2017-06-29 11:17:37,369 DEBUG org.apache.zookeeper.ClientCnxn: Reading reply sessionid:0x15cf2f282460011, packet:: clientPath:null serverPath:null finished:false header:: 14,3 replyHeader:: 14,73014444192,0 request:: '/ams-hbase-unsecure,F response:: s{68719476743,68719476743,1498721274392,1498721274392,0,37,0,0,0,9,73014444190}
2017-06-29 11:17:37,370 DEBUG org.apache.zookeeper.ClientCnxn: Reading reply sessionid:0x15cf2f282460011, packet:: clientPath:null serverPath:null finished:false header:: 15,4 replyHeader:: 15,73014444192,-101 request:: '/ams-hbase-unsecure/master,F response::

I dug into ZooKeeper, and the znode /ams-hbase-unsecure/master does indeed not exist. But since the log says this is not an error, I don't know what the actual problem is. I checked the Metrics Collector and AMS HBase configurations: everything seems correct, and the ZooKeeper part of the AMS HBase configuration matches the native HBase ZooKeeper configuration.

I saw a "solution" that consists of deleting the Ambari Metrics service and then reinstalling it, but this is a production cluster, and that seems very painful and risky for a bug like this, so I'd like to avoid that kind of "solution".

Has anyone encountered this issue? Was it solved? Does anyone have an idea where it might come from?

Thanks for your help,
Loïc

Loïc CHANEL
System Big Data engineer
MS&T - Worldline Analytics Platform - Worldline (Villeurbanne, France)
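In case it helps whoever reads the DEBUG excerpt: each ZooKeeper `ClientCnxn` "Reading reply" entry carries `header:: xid,opType` and `replyHeader:: xid,zxid,err`, where op type 3 is `exists`, 4 is `getData`, and err -101 is `NONODE`. Below is a minimal Python sketch (a hypothetical helper of my own, not part of Ambari or ZooKeeper; it assumes one log entry per line and only maps a handful of codes) that tabulates those replies. Applied to the excerpt, it shows the `getData` calls on /ams-hbase-unsecure/master returning data at 11:17:33 and `NONODE` from 11:17:34 onward; over the same interval, the stat of the parent znode /ams-hbase-unsecure shows its child count (10th field) dropping from 10 to 9.

```python
import re
from typing import List, NamedTuple

# Small subset of ZooKeeper's KeeperException.Code values (the "err" field of replyHeader).
ZK_ERRORS = {0: "OK", -4: "CONNECTIONLOSS", -101: "NONODE", -102: "NOAUTH", -112: "SESSIONEXPIRED"}
# Small subset of ZooKeeper's OpCode values (the second field of "header::").
ZK_OPS = {3: "exists", 4: "getData"}

# Matches one ClientCnxn "Reading reply" DEBUG entry; assumes one entry per line.
REPLY_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) DEBUG org\.apache\.zookeeper\.ClientCnxn: "
    r"Reading reply .*?header:: -?\d+,(?P<op>-?\d+) "
    r"replyHeader:: -?\d+,-?\d+,(?P<err>-?\d+) "
    r"request:: '(?P<path>[^,]+),"
)


class Reply(NamedTuple):
    timestamp: str
    op: str
    path: str
    err: str


def parse_replies(log_text: str) -> List[Reply]:
    """Return one Reply per ZooKeeper ClientCnxn reply found in log_text."""
    replies = []
    for m in REPLY_RE.finditer(log_text):
        op, err = int(m.group("op")), int(m.group("err"))
        replies.append(Reply(
            timestamp=m.group("ts"),
            op=ZK_OPS.get(op, f"op{op}"),       # unknown op types are kept as "opN"
            path=m.group("path"),
            err=ZK_ERRORS.get(err, f"err{err}"),  # unknown codes are kept as "errN"
        ))
    return replies


# Two entries copied from the DEBUG log in this post.
SAMPLE = "\n".join([
    "2017-06-29 11:17:34,349 DEBUG org.apache.zookeeper.ClientCnxn: Reading reply "
    "sessionid:0x15cf2f282460011, packet:: clientPath:null serverPath:null finished:false "
    "header:: 10,3 replyHeader:: 10,73014444192,0 request:: '/ams-hbase-unsecure,F response:: "
    "s{68719476743,68719476743,1498721274392,1498721274392,0,37,0,0,0,9,73014444190}",
    "2017-06-29 11:17:34,350 DEBUG org.apache.zookeeper.ClientCnxn: Reading reply "
    "sessionid:0x15cf2f282460011, packet:: clientPath:null serverPath:null finished:false "
    "header:: 11,4 replyHeader:: 11,73014444192,-101 request:: '/ams-hbase-unsecure/master,F response:: ",
])

if __name__ == "__main__":
    for r in parse_replies(SAMPLE):
        print(r.timestamp, r.op, r.path, r.err)
```

Running it prints one line per reply (timestamp, operation, znode path, decoded error); lines from other loggers, such as the `ZKUtil` entries, are simply skipped.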
