Andrew Onischuk created AMBARI-23894:
----------------------------------------
Summary: ZooKeepers Show As Down After EU to HDP 3.0 But They Are Not
Key: AMBARI-23894
URL: https://issues.apache.org/jira/browse/AMBARI-23894
Project: Ambari
Issue Type: Bug
Reporter: Andrew Onischuk
Assignee: Andrew Onischuk
Fix For: 2.7.0
Attachments: AMBARI-23894.patch

STR:
* Perform an Express Upgrade (EU) from HDP 2.6 to HDP 3.0

Afterward, 2 of my 3 ZKs are shown as down. However, they are actually alive on my boxes:

[root@c7402 ~]$ ps aux | grep [z]oo.cfg
zookeep+ 22463 0.2 2.8 3064236 53728 ? Sl 20:41 0:01 /usr/jdk64/jdk1.8.0_144/bin/java -Dzookeeper.log.dir=/var/log/zookeeper -Dzookeeper.log.file=zookeeper-zookeeper-server-c7402.ambari.apache.org.log -Dzookeeper.root.logger=INFO,ROLLINGFILE -cp /usr/hdp/current/zookeeper-server/bin/../build/classes:/usr/hdp/current/zookeeper-server/bin/../build/lib/*.jar:/usr/hdp/current/zookeeper-server/bin/../lib/xercesMinimal-1.9.6.2.jar:/usr/hdp/current/zookeeper-server/bin/../lib/wagon-provider-api-2.4.jar:/usr/hdp/current/zookeeper-server/bin/../lib/wagon-http-shared4-2.4.jar:/usr/hdp/current/zookeeper-server/bin/../lib/wagon-http-shared-1.0-beta-6.jar:/usr/hdp/current/zookeeper-server/bin/../lib/wagon-http-lightweight-1.0-beta-6.jar:/usr/hdp/current/zookeeper-server/bin/../lib/wagon-http-2.4.jar:/usr/hdp/current/zookeeper-server/bin/../lib/wagon-file-1.0-beta-6.jar:/usr/hdp/current/zookeeper-server/bin/../lib/slf4j-log4j12-1.6.1.jar:/usr/hdp/current/zookeeper-server/bin/../lib/slf4j-api-1.6.1.jar:/usr/hdp/current/zookeeper-server/bin/../lib/plexus-utils-3.0.8.jar:/usr/hdp/current/zookeeper-server/bin/../lib/plexus-interpolation-1.11.jar:/usr/hdp/current/zookeeper-server/bin/../lib/plexus-container-default-1.0-alpha-9-stable-1.jar:/usr/hdp/current/zookeeper-server/bin/../lib/netty-3.10.5.Final.jar:/usr/hdp/current/zookeeper-server/bin/../lib/nekohtml-1.9.6.2.jar:/usr/hdp/current/zookeeper-server/bin/../lib/maven-settings-2.2.1.jar:/usr/hdp/current/zookeeper-server/bin/../lib/maven-repository-metadata-2.2.1.jar:/usr/hdp/current/zookeeper-server/bin/../lib/maven-project-2.2.1.jar:/usr/hdp/current/zookeeper-server/bin/../lib/maven-profile-2.2.1.jar:/usr/hdp/current/zookeeper-server/bin/../lib/maven-plugin-registry-2.2.1.jar:/usr/hdp/current/zookeeper-server/bin/../lib/maven-model-2.2.1.jar:/usr/hdp/current/zookeeper-server/bin/../lib/maven-error-diagnostics-2.2.1.jar:/usr/hdp/current/zookeeper-server/bin/../lib/maven-artifact-manager-2.2.1.jar:/usr/hdp/current/zookeeper-server/bin/../lib/maven-artifact-2.2.1.jar:/usr/hdp/current/zookeeper-server/bin/../lib/maven-ant-tasks-2.1.3.jar:/usr/hdp/current/zookeeper-server/bin/../lib/log4j-1.2.16.jar:/usr/hdp/current/zookeeper-server/bin/../lib/jsoup-1.7.1.jar:/usr/hdp/current/zookeeper-server/bin/../lib/jline-0.9.94.jar:/usr/hdp/current/zookeeper-server/bin/../lib/commons-logging-1.1.1.jar:/usr/hdp/current/zookeeper-server/bin/../lib/commons-io-2.2.jar:/usr/hdp/current/zookeeper-server/bin/../lib/commons-codec-1.6.jar:/usr/hdp/current/zookeeper-server/bin/../lib/classworlds-1.1-alpha-2.jar:/usr/hdp/current/zookeeper-server/bin/../lib/backport-util-concurrent-3.1.jar:/usr/hdp/current/zookeeper-server/bin/../lib/ant-launcher-1.8.0.jar:/usr/hdp/current/zookeeper-server/bin/../lib/ant-1.8.0.jar:/usr/hdp/current/zookeeper-server/bin/../zookeeper-3.4.6.3.0.0.0-1250.jar:/usr/hdp/current/zookeeper-server/bin/../src/java/lib/*.jar:/usr/hdp/current/zookeeper-server/conf::/usr/share/zookeeper/*:/usr/share/zookeeper/* -Xmx1024m -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=false org.apache.zookeeper.server.quorum.QuorumPeerMain /usr/hdp/current/zookeeper-server/conf/zoo.cfg

[root@c7402 ~]$ telnet localhost 2181
Trying ::1...
Connected to localhost.
Escape character is '^]'.
^CConnection closed by foreign host.
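The ps and telnet checks above show that the JVM is running and that port 2181 accepts connections. A slightly stronger liveness probe is ZooKeeper's "ruok" four-letter command, which a healthy server answers with "imok". The following is a minimal Python sketch, not part of Ambari; the helper name zk_ruok is hypothetical, and it assumes four-letter-word commands are enabled on the server (newer ZooKeeper releases can restrict them via 4lw.commands.whitelist).

```python
import socket


def zk_ruok(host: str = "localhost", port: int = 2181, timeout: float = 5.0) -> bool:
    """Return True if the ZooKeeper server at host:port answers 'ruok' with 'imok'.

    Hypothetical helper for illustration; assumes the 'ruok' four-letter
    command is allowed by the server configuration.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            # The server writes its short reply and then closes the
            # connection, so read until EOF.
            chunks = []
            while True:
                data = sock.recv(64)
                if not data:
                    break
                chunks.append(data)
    except OSError:
        # Connection refused, timeout, unreachable host, and so on.
        return False
    return b"".join(chunks).strip() == b"imok"
```

Run on c7402, zk_ruok("localhost", 2181) returning True would confirm the server is serving requests, independent of what Ambari reports.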
But Ambari clearly reports it as down on c7402 ("state" is "INSTALLED" while "desired_state" is "STARTED"):

{
  "href" : "http://localhost:8080/api/v1/clusters/c1/hosts/c7402.ambari.apache.org/host_components/ZOOKEEPER_SERVER",
  "HostRoles" : {
    "cluster_name" : "c1",
    "component_name" : "ZOOKEEPER_SERVER",
    "desired_repository_version" : "3.0.0.0-1250",
    "desired_stack_id" : "HDP-3.0",
    "desired_state" : "STARTED",
    "display_name" : "ZooKeeper Server",
    "host_name" : "c7402.ambari.apache.org",
    "maintenance_state" : "OFF",
    "public_host_name" : "c7402.ambari.apache.org",
    "reload_configs" : false,
    "service_name" : "ZOOKEEPER",
    "stale_configs" : false,
    "state" : "INSTALLED",
    "upgrade_state" : "NONE",
    "version" : "3.0.0.0-1250",
    "actual_configs" : { }
  },
  "host" : {
    "href" : "http://localhost:8080/api/v1/clusters/c1/hosts/c7402.ambari.apache.org"
  },
  "component" : [
    {
      "href" : "http://localhost:8080/api/v1/clusters/c1/services/ZOOKEEPER/components/ZOOKEEPER_SERVER",
      "ServiceComponentInfo" : {
        "cluster_name" : "c1",
        "component_name" : "ZOOKEEPER_SERVER",
        "service_name" : "ZOOKEEPER"
      }
    }
  ],
  "processes" : [ ]
}

The PID file looks correct; it matches the running process (22463):

[root@c7402 zookeeper]$ cat /var/run/zookeeper/zookeeper_server.pid
22463

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
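The report comes down to two signals that disagree: the "state" Ambari returns for the host component ("INSTALLED", i.e. not running) and the PID file, which names a live process. The following minimal Python sketch compares the two. It assumes a POSIX host and that the JSON has been saved from the host_components REST endpoint shown above; the function names pid_is_alive and report_mismatch are hypothetical, and this is not Ambari's own status-check code.

```python
import json
import os


def pid_is_alive(pid_file: str) -> bool:
    """Return True if pid_file names a running process (hypothetical helper, POSIX only)."""
    try:
        with open(pid_file) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        # Missing or unreadable file, or non-numeric contents.
        return False
    if pid <= 0:
        # kill(0, ...) or a negative PID would target a process group, not one process.
        return False
    try:
        # Signal 0 performs existence and permission checks only; nothing is delivered.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user (e.g. "zookeeper").
        return True
    return True


def report_mismatch(host_component_json: str, pid_file: str) -> str:
    """Compare Ambari's reported host-component state with PID-file liveness."""
    roles = json.loads(host_component_json)["HostRoles"]
    reported_running = roles["state"] == "STARTED"
    actually_running = pid_is_alive(pid_file)
    if reported_running == actually_running:
        return "consistent"
    return (f"MISMATCH on {roles['host_name']}: Ambari state={roles['state']}, "
            f"desired={roles['desired_state']}, PID file alive={actually_running}")
```

With the JSON above and /var/run/zookeeper/zookeeper_server.pid on c7402, this sketch would be expected to flag a mismatch, since process 22463 is alive while Ambari reports "INSTALLED".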