Hi all, I'm running HBase 1.2.4 in distributed mode on 12 virtual machines on the same local network. The "main" node (node26.example.com) runs the HMaster, and the other 11 machines run RegionServers. There is no backup HMaster. The cluster also runs Hadoop 2.7.2 without problems.
Both the HBase shell and the HBase client API work properly, and even importing a TSV file into a table with org.apache.hadoop.hbase.mapreduce.ImportTsv succeeds and is registered in the YARN history. The problem appears when a MapReduce job (on Hadoop 2.7.2) tries to access an HBase table. Here is minimal code that reproduces the issue by connecting to and scanning an HBase table: http://pastebin.com/pm8tbbTq. The map tasks hang until they time out, new attempts are then launched until the job fails, and each map shows the following messages in its syslog:
----------
2017-01-24 20:04:07,904 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2017-01-24 20:04:08,186 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2017-01-24 20:04:08,186 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
2017-01-24 20:04:08,193 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
2017-01-24 20:04:08,193 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1485182272940_0336, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@6f03482)
2017-01-24 20:04:08,450 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
2017-01-24 20:04:08,994 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /tmp/hadoop-hdadmin/nm-local-dir/usercache/idstest/appcache/application_1485182272940_0336
2017-01-24 20:04:09,617 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated.
Instead, use dfs.metrics.session-id
2017-01-24 20:04:10,968 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output Committer Algorithm version is 1
2017-01-24 20:04:11,044 INFO [main] org.apache.hadoop.mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2017-01-24 20:04:11,455 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: hdfs://node26.example.com/user/idstest/data/articles-50/99322.txt:0+3757
2017-01-24 20:04:11,596 INFO [main] org.apache.hadoop.mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
2017-01-24 20:04:11,596 INFO [main] org.apache.hadoop.mapred.MapTask: mapreduce.task.io.sort.mb: 100
2017-01-24 20:04:11,596 INFO [main] org.apache.hadoop.mapred.MapTask: soft limit at 83886080
2017-01-24 20:04:11,596 INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 0; bufvoid = 104857600
2017-01-24 20:04:11,596 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 26214396; length = 6553600
2017-01-24 20:04:11,612 INFO [main] org.apache.hadoop.mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2017-01-24 20:04:12,211 INFO [main] org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x650eab8 connecting to ZooKeeper ensemble=localhost:2181
2017-01-24 20:04:12,234 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
2017-01-24 20:04:12,234 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:host.name=node27.example.com
2017-01-24 20:04:12,234 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:java.version=1.8.0_77-Debian
2017-01-24 20:04:12,234 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
2017-01-24 20:04:12,234 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-8-openjdk-amd64/jre
2017-01-24 20:04:12,235 INFO [main] org.apache.zookeeper.ZooKeeper: Client
environment:java.class.path= /* ... */
2017-01-24 20:04:12,235 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:java.library.path=/tmp/hadoop-hdadmin/nm-local-dir/usercache/idstest/appcache/application_1485182272940_0336/container_1485182272940_0336_01_000048:/opt/hadoop-2.7.2/lib/native:/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib
2017-01-24 20:04:12,235 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp/hadoop-hdadmin/nm-local-dir/usercache/idstest/appcache/application_1485182272940_0336/container_1485182272940_0336_01_000048/tmp
2017-01-24 20:04:12,235 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
2017-01-24 20:04:12,235 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:os.name=Linux
2017-01-24 20:04:12,235 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:os.arch=amd64
2017-01-24 20:04:12,235 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:os.version=3.16.0-4-amd64
2017-01-24 20:04:12,235 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:user.name=hdadmin
2017-01-24 20:04:12,235 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:user.home=/home/hdadmin
2017-01-24 20:04:12,235 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:user.dir=/tmp/hadoop-hdadmin/nm-local-dir/usercache/idstest/appcache/application_1485182272940_0336/container_1485182272940_0336_01_000048
2017-01-24 20:04:12,235 INFO [main] org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=hconnection-0x650eab80x0, quorum=localhost:2181, baseZNode=/hbase
2017-01-24 20:04:12,322 INFO [main-SendThread(node27.example.com:2181)] org.apache.zookeeper.ClientCnxn: Opening socket connection to server node27.example.com/192.168.0.17:2181.
Will not attempt to authenticate using SASL (unknown error)
2017-01-24 20:04:12,323 INFO [main-SendThread(node27.example.com:2181)] org.apache.zookeeper.ClientCnxn: Socket connection established to node27.example.com/192.168.0.17:2181, initiating session
2017-01-24 20:04:12,331 INFO [main-SendThread(node27.example.com:2181)] org.apache.zookeeper.ClientCnxn: Session establishment complete on server node27.example.com/192.168.0.17:2181, sessionid = 0x159d07980b50189, negotiated timeout = 90000
2017-01-24 20:04:51,657 INFO [hconnection-0x650eab8-metaLookup-shared--pool2-t1] org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=10, retries=35, started=38852 ms ago, cancelled=false, msg=row 'idstest1,,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=node149.example.com,16020,1485261342943, seqNum=0
2017-01-24 20:05:01,664 INFO [hconnection-0x650eab8-metaLookup-shared--pool2-t1] org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=11, retries=35, started=48860 ms ago, cancelled=false, msg=row 'idstest1,,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=node149.example.com,16020,1485261342943, seqNum=0
2017-01-24 20:05:40,044 INFO [hconnection-0x650eab8-metaLookup-shared--pool2-t2] org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=10, retries=35, started=38273 ms ago, cancelled=false, msg=row 'idstest1,,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=node149.example.com,16020,1485261342943, seqNum=0
2017-01-24 20:05:50,059 INFO [hconnection-0x650eab8-metaLookup-shared--pool2-t2] org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=11, retries=35, started=48288 ms ago, cancelled=false, msg=row 'idstest1,,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=node149.example.com,16020,1485261342943, seqNum=0
2017-01-24 20:06:28,543 INFO
[hconnection-0x650eab8-metaLookup-shared--pool2-t3] org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=10, retries=35, started=38273 ms ago, cancelled=false, msg=row 'idstest1,,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=node149.example.com,16020,1485261342943, seqNum=0
2017-01-24 20:06:38,632 INFO [hconnection-0x650eab8-metaLookup-shared--pool2-t3] org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=11, retries=35, started=48362 ms ago, cancelled=false, msg=row 'idstest1,,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=node149.example.com,16020,1485261342943, seqNum=0
...
----------

My hbase-site.xml on the master node node26.example.com (almost the same on the other nodes, except that each node refers to itself as 0.0.0.0):
----------
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://node26.example.com:8020/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/hdadmin/zookeeper</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>node26.example.com,node27.example.com,node144.example.com,node145.example.com,node146.example.com,node147.example.com,node148.example.com,node149.example.com,node150.example.com,node151.example.com,node152.example.com,node153.example.com</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.server.0</name>
    <value>0.0.0.0:2888:3888</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.server.1</name>
    <value>node27.example.com:2888:3888</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.server.2</name>
    <value>node144.example.com:2888:3888</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.server.3</name>
    <value>node145.example.com:2888:3888</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.server.4</name>
    <value>node146.example.com:2888:3888</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.server.5</name>
    <value>node147.example.com:2888:3888</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.server.6</name>
    <value>node148.example.com:2888:3888</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.server.7</name>
    <value>node149.example.com:2888:3888</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.server.8</name>
    <value>node150.example.com:2888:3888</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.server.9</name>
    <value>node151.example.com:2888:3888</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.server.10</name>
    <value>node152.example.com:2888:3888</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.server.11</name>
    <value>node153.example.com:2888:3888</value>
  </property>
</configuration>
----------

After starting HBase with the start-hbase.sh script, the RegionServer logs show no warnings or errors, but the ZooKeeper instances produce differing output.
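A side note on the hbase.zookeeper.quorum value above: it lists 12 servers, and the ZooKeeper startup logs further down report "Defaulting to majority quorums" together with a hint to use an odd number of servers. Under a majority rule, an even-sized ensemble tolerates no more failures than the next smaller odd one. The following is a minimal sketch of that arithmetic only; the class and method names are made up for illustration and are not ZooKeeper or HBase APIs.

```java
// Illustrative sketch only: QuorumMath and its methods are hypothetical names,
// not part of ZooKeeper or HBase.
public class QuorumMath {

    // Smallest number of servers that forms a strict majority of the ensemble.
    static int majority(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    // Number of servers that can be down while a majority is still available.
    static int toleratedFailures(int ensembleSize) {
        return ensembleSize - majority(ensembleSize);
    }

    public static void main(String[] args) {
        // Compare the 12-server list from the config with its odd neighbours.
        for (int n : new int[] {11, 12, 13}) {
            System.out.println(n + " servers: majority=" + majority(n)
                    + ", tolerated failures=" + toleratedFailures(n));
        }
    }
}
```

For the 12 servers listed, this gives a majority of 7 and 5 tolerated failures, the same tolerance as 11 servers. This only concerns fault tolerance; it is not claimed to be the cause of the MapReduce hang.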
Three examples:

- The master node has an empty log (startup header only):
----------
Tue Jan 24 20:52:38 CET 2017 Starting zookeeper on node26.example.com
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 128914
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 65536
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 128914
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
----------

- node153.example.com has the following short ZooKeeper log:
----------
Tue Jan 24 20:52:38 CET 2017 Starting zookeeper on node153.example.com
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 128914
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 65536
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 128914
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
2017-01-24 20:52:39,816 WARN [main] quorum.QuorumPeerConfig: Non-optimial configuration, consider an odd number of servers.
2017-01-24 20:52:39,817 INFO [main] quorum.QuorumPeerConfig: Defaulting to majority quorums
2017-01-24 20:52:39,984 INFO [main] quorum.QuorumPeerMain: Starting quorum peer
2017-01-24 20:52:39,994 INFO [main] server.NIOServerCnxnFactory: binding to port 0.0.0.0/0.0.0.0:2181
----------
As an extra hint, this node does not create /tmp/hbase-hdadmin-zookeeper.pid, so the stop-hbase.sh script reports an error for it.

- node144.example.com (like several other nodes) has the following ZooKeeper log, with "Connection refused" errors:
----------
Tue Jan 24 20:52:38 CET 2017 Starting zookeeper on node144.example.com
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 128914
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 65536
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 128914
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
2017-01-24 20:52:39,473 WARN [main] quorum.QuorumPeerConfig: Non-optimial configuration, consider an odd number of servers.
2017-01-24 20:52:39,473 INFO [main] quorum.QuorumPeerConfig: Defaulting to majority quorums
2017-01-24 20:52:39,567 INFO [main] quorum.QuorumPeerMain: Starting quorum peer
2017-01-24 20:52:39,581 INFO [main] server.NIOServerCnxnFactory: binding to port 0.0.0.0/0.0.0.0:2181
2017-01-24 20:52:39,601 INFO [main] quorum.QuorumPeer: tickTime set to 3000
2017-01-24 20:52:39,601 INFO [main] quorum.QuorumPeer: minSessionTimeout set to -1
2017-01-24 20:52:39,601 INFO [main] quorum.QuorumPeer: maxSessionTimeout set to 90000
2017-01-24 20:52:39,601 INFO [main] quorum.QuorumPeer: initLimit set to 10
2017-01-24 20:52:39,613 INFO [main] persistence.FileSnap: Reading snapshot /home/hdadmin/zookeeper2/version-2/snapshot.1d0000023f
2017-01-24 20:52:39,714 INFO [Thread-2] quorum.QuorumCnxManager: My election bind port: node144.example.com/192.168.0.19:3888
2017-01-24 20:52:39,729 INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181] quorum.QuorumPeer: LOOKING
2017-01-24 20:52:39,731 INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181] quorum.FastLeaderElection: New election.
My id = 2, proposed zxid=0x1e0000258d 2017-01-24 20:52:39,740 INFO [WorkerReceiver[myid=2]] quorum.FastLeaderElection: Notification: 1 (message format version), 2 (n.leader), 0x1e0000258d (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x1e (n.peerEpoch) LOOKING (my state) 2017-01-24 20:52:39,740 INFO [WorkerReceiver[myid=2]] quorum.FastLeaderElection: Notification: 1 (message format version), 0 (n.leader), 0x1e0000258d (n.zxid), 0x1 (n.round), LOOKING (n.state), 0 (n.sid), 0x1e (n.peerEpoch) LOOKING (my state) 2017-01-24 20:52:39,742 WARN [WorkerSender[myid=2]] quorum.QuorumCnxManager: Cannot open channel to 3 at election address node145.example.com/192.168.0.18:3888 java.net.ConnectException: Connection refused at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:589) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368) at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341) at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449) at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430) at java.lang.Thread.run(Thread.java:745) 2017-01-24 20:52:39,744 INFO [WorkerReceiver[myid=2]] quorum.FastLeaderElection: Notification: 1 (message format version), 11 (n.leader), 0x1e0000258d (n.zxid), 0x9 (n.round), LOOKING (n.state), 1 (n.sid), 0x1e (n.peerEpoch) LOOKING (my state) 2017-01-24 20:52:39,745 INFO [WorkerReceiver[myid=2]] quorum.FastLeaderElection: Notification: 1 (message format version), 11 (n.leader), 0x1e0000258d (n.zxid), 0x9 
(n.round), LOOKING (n.state), 1 (n.sid), 0x1e (n.peerEpoch) LOOKING (my state) 2017-01-24 20:52:39,745 INFO [WorkerReceiver[myid=2]] quorum.FastLeaderElection: Notification: 1 (message format version), 2 (n.leader), 0x1e0000258d (n.zxid), 0x1 (n.round), LOOKING (n.state), 0 (n.sid), 0x1e (n.peerEpoch) LOOKING (my state) 2017-01-24 20:52:39,746 INFO [WorkerSender[myid=2]] quorum.QuorumCnxManager: Have smaller server identifier, so dropping the connection: (4, 2) 2017-01-24 20:52:39,747 INFO [WorkerSender[myid=2]] quorum.QuorumCnxManager: Have smaller server identifier, so dropping the connection: (5, 2) 2017-01-24 20:52:39,747 INFO [node144.example.com/192.168.0.19:3888] quorum.QuorumCnxManager: Received connection request /192.168.0.21:40064 2017-01-24 20:52:39,747 INFO [node144.example.com/192.168.0.19:3888] quorum.QuorumCnxManager: Received connection request /192.168.0.22:54362 2017-01-24 20:52:39,747 INFO [WorkerSender[myid=2]] quorum.QuorumCnxManager: Have smaller server identifier, so dropping the connection: (6, 2) 2017-01-24 20:52:39,748 INFO [WorkerReceiver[myid=2]] quorum.FastLeaderElection: Notification: 1 (message format version), 11 (n.leader), 0x1e0000258d (n.zxid), 0x9 (n.round), LOOKING (n.state), 4 (n.sid), 0x1e (n.peerEpoch) LOOKING (my state) 2017-01-24 20:52:39,748 INFO [WorkerReceiver[myid=2]] quorum.FastLeaderElection: Notification: 1 (message format version), 11 (n.leader), 0x1e0000258d (n.zxid), 0x9 (n.round), LOOKING (n.state), 5 (n.sid), 0x1e (n.peerEpoch) LOOKING (my state) 2017-01-24 20:52:39,748 INFO [node144.example.com/192.168.0.19:3888] quorum.QuorumCnxManager: Received connection request /192.168.0.23:53410 2017-01-24 20:52:39,749 INFO [WorkerReceiver[myid=2]] quorum.FastLeaderElection: Notification: 1 (message format version), 11 (n.leader), 0x1e0000258d (n.zxid), 0x9 (n.round), LOOKING (n.state), 5 (n.sid), 0x1e (n.peerEpoch) LOOKING (my state) 2017-01-24 20:52:39,749 INFO [WorkerReceiver[myid=2]] quorum.FastLeaderElection: 
Notification: 1 (message format version), 11 (n.leader), 0x1e0000258d (n.zxid), 0x9 (n.round), LOOKING (n.state), 0 (n.sid), 0x1e (n.peerEpoch) LOOKING (my state) 2017-01-24 20:52:39,749 INFO [WorkerSender[myid=2]] quorum.QuorumCnxManager: Have smaller server identifier, so dropping the connection: (7, 2) 2017-01-24 20:52:39,749 INFO [WorkerReceiver[myid=2]] quorum.FastLeaderElection: Notification: 1 (message format version), 11 (n.leader), 0x1e0000258d (n.zxid), 0x9 (n.round), LOOKING (n.state), 4 (n.sid), 0x1e (n.peerEpoch) LOOKING (my state) 2017-01-24 20:52:39,749 INFO [WorkerReceiver[myid=2]] quorum.FastLeaderElection: Notification: 1 (message format version), 11 (n.leader), 0x1e0000258d (n.zxid), 0x9 (n.round), LOOKING (n.state), 6 (n.sid), 0x1e (n.peerEpoch) LOOKING (my state) 2017-01-24 20:52:39,750 INFO [node144.example.com/192.168.0.19:3888] quorum.QuorumCnxManager: Received connection request /192.168.0.25:33085 2017-01-24 20:52:39,750 INFO [WorkerReceiver[myid=2]] quorum.FastLeaderElection: Notification: 1 (message format version), 11 (n.leader), 0x1e0000258d (n.zxid), 0x9 (n.round), LOOKING (n.state), 6 (n.sid), 0x1e (n.peerEpoch) LOOKING (my state) 2017-01-24 20:52:39,751 WARN [WorkerSender[myid=2]] quorum.QuorumCnxManager: Cannot open channel to 8 at election address node150.example.com/192.168.0.24:3888 java.net.ConnectException: Connection refused at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:589) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368) at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341) at 
org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449) at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430) at java.lang.Thread.run(Thread.java:745) 2017-01-24 20:52:39,751 INFO [WorkerReceiver[myid=2]] quorum.FastLeaderElection: Notification: 1 (message format version), 11 (n.leader), 0x1e0000258d (n.zxid), 0x9 (n.round), LOOKING (n.state), 7 (n.sid), 0x1e (n.peerEpoch) LOOKING (my state) 2017-01-24 20:52:39,752 INFO [WorkerReceiver[myid=2]] quorum.FastLeaderElection: Notification: 1 (message format version), 11 (n.leader), 0x1e0000258d (n.zxid), 0x9 (n.round), LOOKING (n.state), 7 (n.sid), 0x1e (n.peerEpoch) LOOKING (my state) 2017-01-24 20:52:39,753 INFO [WorkerSender[myid=2]] quorum.QuorumCnxManager: Have smaller server identifier, so dropping the connection: (9, 2) 2017-01-24 20:52:39,753 INFO [WorkerSender[myid=2]] quorum.QuorumCnxManager: Have smaller server identifier, so dropping the connection: (10, 2) 2017-01-24 20:52:39,753 INFO [node144.example.com/192.168.0.19:3888] quorum.QuorumCnxManager: Received connection request /192.168.0.26:35997 2017-01-24 20:52:39,754 INFO [WorkerSender[myid=2]] quorum.QuorumCnxManager: Have smaller server identifier, so dropping the connection: (11, 2) 2017-01-24 20:52:39,755 INFO [WorkerReceiver[myid=2]] quorum.FastLeaderElection: Notification: 1 (message format version), 11 (n.leader), 0x1e0000258d (n.zxid), 0x9 (n.round), LOOKING (n.state), 2 (n.sid), 0x1e (n.peerEpoch) LOOKING (my state) 2017-01-24 20:52:39,755 INFO [WorkerSender[myid=2]] quorum.QuorumCnxManager: Have smaller server identifier, so dropping the connection: (3, 2) 2017-01-24 20:52:39,756 WARN [WorkerSender[myid=2]] quorum.QuorumCnxManager: Cannot open channel to 8 at election address node150.example.com/192.168.0.24:3888 java.net.ConnectException: Connection refused at java.net.PlainSocketImpl.socketConnect(Native Method) at 
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:589) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368) at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341) at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449) at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430) at java.lang.Thread.run(Thread.java:745) 2017-01-24 20:52:39,757 INFO [WorkerReceiver[myid=2]] quorum.FastLeaderElection: Notification: 1 (message format version), 11 (n.leader), 0x1e0000258d (n.zxid), 0x9 (n.round), LOOKING (n.state), 9 (n.sid), 0x1e (n.peerEpoch) LOOKING (my state) 2017-01-24 20:52:39,757 INFO [WorkerSender[myid=2]] quorum.QuorumCnxManager: Have smaller server identifier, so dropping the connection: (10, 2) 2017-01-24 20:52:39,757 INFO [WorkerSender[myid=2]] quorum.QuorumCnxManager: Have smaller server identifier, so dropping the connection: (11, 2) 2017-01-24 20:52:39,758 INFO [node144.example.com/192.168.0.19:3888] quorum.QuorumCnxManager: Received connection request /192.168.0.27:34404 2017-01-24 20:52:39,759 INFO [WorkerReceiver[myid=2]] quorum.FastLeaderElection: Notification: 1 (message format version), 11 (n.leader), 0x1e0000258d (n.zxid), 0x9 (n.round), LOOKING (n.state), 9 (n.sid), 0x1e (n.peerEpoch) LOOKING (my state) 2017-01-24 20:52:39,768 INFO [node144.example.com/192.168.0.19:3888] quorum.QuorumCnxManager: Received connection request /192.168.0.18:59032 2017-01-24 20:52:39,769 INFO [WorkerReceiver[myid=2]] quorum.FastLeaderElection: Notification: 1 (message format version), 11 
(n.leader), 0x1e0000258d (n.zxid), 0x9 (n.round), LOOKING (n.state), 11 (n.sid), 0x1e (n.peerEpoch) LOOKING (my state) 2017-01-24 20:52:39,769 INFO [WorkerReceiver[myid=2]] quorum.FastLeaderElection: Notification: 1 (message format version), 3 (n.leader), 0x1e0000258d (n.zxid), 0x1 (n.round), LOOKING (n.state), 3 (n.sid), 0x1e (n.peerEpoch) LOOKING (my state) 2017-01-24 20:52:39,776 INFO [node144.example.com/192.168.0.19:3888] quorum.QuorumCnxManager: Received connection request /192.168.0.27:34405 2017-01-24 20:52:39,776 WARN [RecvWorker:11] quorum.QuorumCnxManager: Connection broken for id 11, my id = 2, error = java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:392) at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:765) 2017-01-24 20:52:39,777 WARN [RecvWorker:11] quorum.QuorumCnxManager: Interrupting SendWorker 2017-01-24 20:52:39,777 WARN [SendWorker:11] quorum.QuorumCnxManager: Interrupted while waiting for message on queue java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088) at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418) at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:849) at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:64) at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:685) 2017-01-24 20:52:39,777 INFO [node144.example.com/192.168.0.19:3888] quorum.QuorumCnxManager: Received connection request /192.168.0.28:57974 2017-01-24 20:52:39,778 WARN [SendWorker:11] quorum.QuorumCnxManager: Send worker leaving thread 2017-01-24 20:52:39,778 INFO [node144.example.com/192.168.0.19:3888] quorum.QuorumCnxManager: 
Received connection request /192.168.0.28:57975 2017-01-24 20:52:39,778 WARN [RecvWorker:10] quorum.QuorumCnxManager: Connection broken for id 10, my id = 2, error = java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:392) at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:765) 2017-01-24 20:52:39,778 WARN [RecvWorker:10] quorum.QuorumCnxManager: Interrupting SendWorker 2017-01-24 20:52:39,778 WARN [SendWorker:10] quorum.QuorumCnxManager: Exception when using channel: for id 10 my id = 2 error = java.net.SocketException: Broken pipe 2017-01-24 20:52:39,778 WARN [SendWorker:10] quorum.QuorumCnxManager: Send worker leaving thread 2017-01-24 20:52:39,779 INFO [WorkerReceiver[myid=2]] quorum.FastLeaderElection: Notification: 1 (message format version), 11 (n.leader), 0x1e0000258d (n.zxid), 0x9 (n.round), LOOKING (n.state), 3 (n.sid), 0x1e (n.peerEpoch) LOOKING (my state) 2017-01-24 20:52:39,782 INFO [WorkerReceiver[myid=2]] quorum.FastLeaderElection: Notification: 1 (message format version), 10 (n.leader), 0x1e0000258d (n.zxid), 0x1 (n.round), LOOKING (n.state), 10 (n.sid), 0x1e (n.peerEpoch) LOOKING (my state) 2017-01-24 20:52:39,788 INFO [WorkerReceiver[myid=2]] quorum.FastLeaderElection: Notification: 1 (message format version), 11 (n.leader), 0x1e0000258d (n.zxid), 0x9 (n.round), LOOKING (n.state), 11 (n.sid), 0x1e (n.peerEpoch) LOOKING (my state) 2017-01-24 20:52:39,789 INFO [WorkerReceiver[myid=2]] quorum.FastLeaderElection: Notification: 1 (message format version), 11 (n.leader), 0x1e0000258d (n.zxid), 0x9 (n.round), LOOKING (n.state), 10 (n.sid), 0x1e (n.peerEpoch) LOOKING (my state) 2017-01-24 20:52:39,815 INFO [node144.example.com/192.168.0.19:3888] quorum.QuorumCnxManager: Received connection request /192.168.0.24:56503 2017-01-24 20:52:39,829 INFO [WorkerReceiver[myid=2]] quorum.FastLeaderElection: Notification: 1 (message format version), 8 (n.leader), 0x1e0000258d (n.zxid), 0x1 (n.round), 
LOOKING (n.state), 8 (n.sid), 0x1e (n.peerEpoch) LOOKING (my state) 2017-01-24 20:52:39,829 INFO [WorkerReceiver[myid=2]] quorum.FastLeaderElection: Notification: 1 (message format version), 11 (n.leader), 0x1e0000258d (n.zxid), 0x9 (n.round), LOOKING (n.state), 8 (n.sid), 0x1e (n.peerEpoch) LOOKING (my state) 2017-01-24 20:52:40,030 INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181] quorum.QuorumPeer: FOLLOWING 2017-01-24 20:52:40,035 INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181] quorum.Learner: TCP NoDelay set to: true 2017-01-24 20:52:40,040 INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181] server.ZooKeeperServer: Server environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT 2017-01-24 20:52:40,040 INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181] server.ZooKeeperServer: Server environment:host.name=node144.example.com 2017-01-24 20:52:40,040 INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181] server.ZooKeeperServer: Server environment:java.version=1.8.0_77-Debian 2017-01-24 20:52:40,040 INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181] server.ZooKeeperServer: Server environment:java.vendor=Oracle Corporation 2017-01-24 20:52:40,040 INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181] server.ZooKeeperServer: Server environment:java.home=/usr/lib/jvm/java-8-openjdk-amd64/jre 2017-01-24 20:52:40,040 INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181] server.ZooKeeperServer: Server environment:java.class.path= /* ... 
*/ 2017-01-24 20:52:40,040 INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181] server.ZooKeeperServer: Server environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib 2017-01-24 20:52:40,040 INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181] server.ZooKeeperServer: Server environment:java.io.tmpdir=/tmp 2017-01-24 20:52:40,040 INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181] server.ZooKeeperServer: Server environment:java.compiler=<NA> 2017-01-24 20:52:40,040 INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181] server.ZooKeeperServer: Server environment:os.name=Linux 2017-01-24 20:52:40,040 INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181] server.ZooKeeperServer: Server environment:os.arch=amd64 2017-01-24 20:52:40,040 INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181] server.ZooKeeperServer: Server environment:os.version=3.16.0-4-amd64 2017-01-24 20:52:40,040 INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181] server.ZooKeeperServer: Server environment:user.name=hdadmin 2017-01-24 20:52:40,040 INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181] server.ZooKeeperServer: Server environment:user.home=/home/hdadmin 2017-01-24 20:52:40,040 INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181] server.ZooKeeperServer: Server environment:user.dir=/opt/hbase-1.2.4 2017-01-24 20:52:40,042 INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181] server.ZooKeeperServer: Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 90000 datadir /home/hdadmin/zookeeper2/version-2 snapdir /home/hdadmin/zookeeper2/version-2 2017-01-24 20:52:40,043 INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181] quorum.Learner: FOLLOWING - LEADER ELECTION TOOK - 311 2017-01-24 20:52:40,046 WARN [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181] quorum.Learner: Unexpected exception, tries=0, connecting to node153.example.com/192.168.0.27:2888 java.net.ConnectException: Connection refused at java.net.PlainSocketImpl.socketConnect(Native 
Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:589) at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:225) at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:71) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786) 2017-01-24 20:52:41,073 INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181] quorum.Learner: Getting a diff from the leader 0x1e0000258d 2017-01-24 20:52:41,116 INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181] persistence.FileTxnSnapLog: Snapshotting: 0x1e0000258d to /home/hdadmin/zookeeper2/version-2/snapshot.1e0000258d 2017-01-24 20:52:42,408 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxnFactory: Accepted socket connection from /192.168.0.29:44336 2017-01-24 20:52:42,418 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.ZooKeeperServer: Client attempting to establish new session at /192.168.0.29:44336 2017-01-24 20:52:42,421 WARN [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181] quorum.Learner: Got zxid 0x1f00000001 expected 0x1 2017-01-24 20:52:42,421 INFO [SyncThread:2] persistence.FileTxnLog: Creating new log file: log.1f00000001 2017-01-24 20:52:42,440 INFO [CommitProcessor:2] server.ZooKeeperServer: Established session 0x259d20997750000 with negotiated timeout 90000 for client /192.168.0.29:44336 2017-01-24 20:52:42,734 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxnFactory: Accepted socket connection from /192.168.0.22:43377 2017-01-24 20:52:42,737 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.ZooKeeperServer: Client attempting to establish new session at /192.168.0.22:43377 2017-01-24 20:52:42,742 INFO [CommitProcessor:2] 
server.ZooKeeperServer: Established session 0x259d20997750001 with negotiated timeout 90000 for client /192.168.0.22:43377 2017-01-24 20:52:43,257 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxnFactory: Accepted socket connection from /192.168.0.23:59005 2017-01-24 20:52:43,258 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.ZooKeeperServer: Client attempting to establish new session at /192.168.0.23:59005 2017-01-24 20:52:43,262 INFO [CommitProcessor:2] server.ZooKeeperServer: Established session 0x259d20997750002 with negotiated timeout 90000 for client /192.168.0.23:59005
----------
I've manually checked connectivity to the most relevant HBase and ZooKeeper ports (using nc or telnet), and the connections always succeeded. During a job execution, these are the kinds of errors I'm getting:
----------
2017-01-24 21:10:37,187 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxnFactory: Accepted socket connection from /192.168.0.29:44750 2017-01-24 21:10:37,193 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.ZooKeeperServer: Client attempting to establish new session at /192.168.0.29:44750 2017-01-24 21:10:37,198 INFO [CommitProcessor:2] server.ZooKeeperServer: Established session 0x259d20997750003 with negotiated timeout 90000 for client /192.168.0.29:44750 2017-01-24 21:10:38,136 WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: caught end of stream exception EndOfStreamException: Unable to read additional data from client sessionid 0x259d20997750003, likely client has closed socket at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) at java.lang.Thread.run(Thread.java:745) 2017-01-24 21:10:38,137 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: Closed socket connection for client /192.168.0.29:44750 which had sessionid 0x259d20997750003 2017-01-24 21:18:45,469 INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxnFactory: Accepted socket connection from /192.168.0.19:49706 2017-01-24 21:18:45,479 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.ZooKeeperServer: Client attempting to establish new session at /192.168.0.19:49706 2017-01-24 21:18:45,543 INFO [CommitProcessor:2] server.ZooKeeperServer: Established session 0x259d20997750004 with negotiated timeout 90000 for client /192.168.0.19:49706 2017-01-24 21:18:45,684 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxnFactory: Accepted socket connection from /192.168.0.19:49708 2017-01-24 21:18:45,689 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.ZooKeeperServer: Client attempting to establish new session at /192.168.0.19:49708 2017-01-24 21:18:45,820 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxnFactory: Accepted socket connection from /192.168.0.19:49709 2017-01-24 21:18:45,824 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.ZooKeeperServer: Client attempting to establish new session at /192.168.0.19:49709 2017-01-24 21:18:45,825 INFO [CommitProcessor:2] server.ZooKeeperServer: Established session 0x259d20997750005 with negotiated timeout 90000 for client /192.168.0.19:49708 2017-01-24 21:18:45,840 INFO [CommitProcessor:2] server.ZooKeeperServer: Established session 0x259d20997750006 with negotiated timeout 90000 for client /192.168.0.19:49709 2017-01-24 21:18:45,892 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxnFactory: Accepted socket connection from /192.168.0.19:49710 2017-01-24 21:18:45,896 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.ZooKeeperServer: Client attempting to establish new session at /192.168.0.19:49710 2017-01-24 21:18:45,965 INFO [CommitProcessor:2] server.ZooKeeperServer: Established session 0x259d20997750007 with negotiated timeout 90000 for client /192.168.0.19:49710 2017-01-24 21:18:46,010 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] 
server.NIOServerCnxnFactory: Accepted socket connection from /192.168.0.19:49711 2017-01-24 21:18:46,012 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.ZooKeeperServer: Client attempting to establish new session at /192.168.0.19:49711 2017-01-24 21:18:46,065 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxnFactory: Accepted socket connection from /192.168.0.19:49712 2017-01-24 21:18:46,067 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.ZooKeeperServer: Client attempting to establish new session at /192.168.0.19:49712 2017-01-24 21:18:46,133 INFO [CommitProcessor:2] server.ZooKeeperServer: Established session 0x259d20997750008 with negotiated timeout 90000 for client /192.168.0.19:49711 2017-01-24 21:18:46,169 INFO [CommitProcessor:2] server.ZooKeeperServer: Established session 0x259d20997750009 with negotiated timeout 90000 for client /192.168.0.19:49712 2017-01-24 21:18:46,571 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxnFactory: Accepted socket connection from /192.168.0.19:49713 2017-01-24 21:18:46,578 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.ZooKeeperServer: Client attempting to establish new session at /192.168.0.19:49713 2017-01-24 21:18:46,655 INFO [CommitProcessor:2] server.ZooKeeperServer: Established session 0x259d2099775000a with negotiated timeout 90000 for client /192.168.0.19:49713 2017-01-24 21:18:46,875 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxnFactory: Accepted socket connection from /192.168.0.19:49714 2017-01-24 21:18:46,877 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.ZooKeeperServer: Client attempting to establish new session at /192.168.0.19:49714 2017-01-24 21:18:46,894 INFO [CommitProcessor:2] server.ZooKeeperServer: Established session 0x259d2099775000b with negotiated timeout 90000 for client /192.168.0.19:49714 2017-01-24 21:29:04,319 WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: caught end of stream 
exception EndOfStreamException: Unable to read additional data from client sessionid 0x259d20997750006, likely client has closed socket at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) at java.lang.Thread.run(Thread.java:745) 2017-01-24 21:29:04,320 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: Closed socket connection for client /192.168.0.19:49709 which had sessionid 0x259d20997750006 2017-01-24 21:29:04,344 WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: caught end of stream exception EndOfStreamException: Unable to read additional data from client sessionid 0x259d2099775000a, likely client has closed socket at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) at java.lang.Thread.run(Thread.java:745) 2017-01-24 21:29:04,345 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: Closed socket connection for client /192.168.0.19:49713 which had sessionid 0x259d2099775000a 2017-01-24 21:29:04,361 WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: caught end of stream exception EndOfStreamException: Unable to read additional data from client sessionid 0x259d20997750004, likely client has closed socket at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) at java.lang.Thread.run(Thread.java:745) 2017-01-24 21:29:04,361 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: Closed socket connection for client /192.168.0.19:49706 which had sessionid 0x259d20997750004 2017-01-24 21:29:04,381 WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: caught end of stream exception EndOfStreamException: Unable to read additional data from 
client sessionid 0x259d20997750008, likely client has closed socket at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) at java.lang.Thread.run(Thread.java:745) 2017-01-24 21:29:04,381 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: Closed socket connection for client /192.168.0.19:49711 which had sessionid 0x259d20997750008 2017-01-24 21:29:04,392 WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: caught end of stream exception EndOfStreamException: Unable to read additional data from client sessionid 0x259d2099775000b, likely client has closed socket at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) at java.lang.Thread.run(Thread.java:745) 2017-01-24 21:29:04,393 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: Closed socket connection for client /192.168.0.19:49714 which had sessionid 0x259d2099775000b 2017-01-24 21:29:04,412 WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: caught end of stream exception EndOfStreamException: Unable to read additional data from client sessionid 0x259d20997750005, likely client has closed socket at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) at java.lang.Thread.run(Thread.java:745) 2017-01-24 21:29:04,413 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: Closed socket connection for client /192.168.0.19:49708 which had sessionid 0x259d20997750005 2017-01-24 21:29:04,425 WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: caught end of stream exception EndOfStreamException: Unable to read additional data from client sessionid 0x259d20997750007, likely client has closed socket at 
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) at java.lang.Thread.run(Thread.java:745) 2017-01-24 21:29:04,426 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: Closed socket connection for client /192.168.0.19:49710 which had sessionid 0x259d20997750007 2017-01-24 21:29:04,445 WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: caught end of stream exception EndOfStreamException: Unable to read additional data from client sessionid 0x259d20997750009, likely client has closed socket at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) at java.lang.Thread.run(Thread.java:745) 2017-01-24 21:29:04,446 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: Closed socket connection for client /192.168.0.19:49712 which had sessionid 0x259d20997750009 2017-01-24 21:29:10,042 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxnFactory: Accepted socket connection from /192.168.0.19:49761 2017-01-24 21:29:10,044 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.ZooKeeperServer: Client attempting to establish new session at /192.168.0.19:49761 2017-01-24 21:29:10,051 INFO [CommitProcessor:2] server.ZooKeeperServer: Established session 0x259d2099775000c with negotiated timeout 90000 for client /192.168.0.19:49761 2017-01-24 21:29:12,150 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxnFactory: Accepted socket connection from /192.168.0.19:49765 2017-01-24 21:29:12,152 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.ZooKeeperServer: Client attempting to establish new session at /192.168.0.19:49765 2017-01-24 21:29:12,165 INFO [CommitProcessor:2] server.ZooKeeperServer: Established session 0x259d2099775000d with negotiated timeout 90000 for client 
/192.168.0.19:49765 2017-01-24 21:29:13,332 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxnFactory: Accepted socket connection from /192.168.0.19:49769 2017-01-24 21:29:13,334 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.ZooKeeperServer: Client attempting to establish new session at /192.168.0.19:49769 2017-01-24 21:29:13,343 INFO [CommitProcessor:2] server.ZooKeeperServer: Established session 0x259d2099775000e with negotiated timeout 90000 for client /192.168.0.19:49769 2017-01-24 21:29:13,933 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxnFactory: Accepted socket connection from /192.168.0.19:49772 2017-01-24 21:29:13,935 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.ZooKeeperServer: Client attempting to establish new session at /192.168.0.19:49772 2017-01-24 21:29:13,939 INFO [CommitProcessor:2] server.ZooKeeperServer: Established session 0x259d2099775000f with negotiated timeout 90000 for client /192.168.0.19:49772 2017-01-24 21:39:34,294 WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: caught end of stream exception EndOfStreamException: Unable to read additional data from client sessionid 0x259d2099775000e, likely client has closed socket at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) at java.lang.Thread.run(Thread.java:745) 2017-01-24 21:39:34,295 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: Closed socket connection for client /192.168.0.19:49769 which had sessionid 0x259d2099775000e 2017-01-24 21:39:34,314 WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: caught end of stream exception EndOfStreamException: Unable to read additional data from client sessionid 0x259d2099775000d, likely client has closed socket at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228) at 
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) at java.lang.Thread.run(Thread.java:745) 2017-01-24 21:39:34,315 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: Closed socket connection for client /192.168.0.19:49765 which had sessionid 0x259d2099775000d 2017-01-24 21:39:34,343 WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: caught end of stream exception EndOfStreamException: Unable to read additional data from client sessionid 0x259d2099775000f, likely client has closed socket at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) at java.lang.Thread.run(Thread.java:745) 2017-01-24 21:39:34,343 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: Closed socket connection for client /192.168.0.19:49772 which had sessionid 0x259d2099775000f 2017-01-24 21:39:34,366 WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: caught end of stream exception EndOfStreamException: Unable to read additional data from client sessionid 0x259d2099775000c, likely client has closed socket at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) at java.lang.Thread.run(Thread.java:745) 2017-01-24 21:39:34,367 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: Closed socket connection for client /192.168.0.19:49761 which had sessionid 0x259d2099775000c 2017-01-24 21:39:38,640 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxnFactory: Accepted socket connection from /192.168.0.19:49817 2017-01-24 21:39:38,642 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.ZooKeeperServer: Client attempting to establish new session at /192.168.0.19:49817 2017-01-24 21:39:38,647 INFO [CommitProcessor:2] server.ZooKeeperServer: 
Established session 0x259d20997750010 with negotiated timeout 90000 for client /192.168.0.19:49817 2017-01-24 21:39:40,386 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxnFactory: Accepted socket connection from /192.168.0.19:49821 2017-01-24 21:39:40,392 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.ZooKeeperServer: Client attempting to establish new session at /192.168.0.19:49821 2017-01-24 21:39:40,399 INFO [CommitProcessor:2] server.ZooKeeperServer: Established session 0x259d20997750011 with negotiated timeout 90000 for client /192.168.0.19:49821 2017-01-24 21:39:41,708 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxnFactory: Accepted socket connection from /192.168.0.19:49825 2017-01-24 21:39:41,712 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.ZooKeeperServer: Client attempting to establish new session at /192.168.0.19:49825 2017-01-24 21:39:41,716 INFO [CommitProcessor:2] server.ZooKeeperServer: Established session 0x259d20997750012 with negotiated timeout 90000 for client /192.168.0.19:49825 2017-01-24 21:39:42,420 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxnFactory: Accepted socket connection from /192.168.0.19:49828 2017-01-24 21:39:42,421 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.ZooKeeperServer: Client attempting to establish new session at /192.168.0.19:49828 2017-01-24 21:39:42,426 INFO [CommitProcessor:2] server.ZooKeeperServer: Established session 0x259d20997750013 with negotiated timeout 90000 for client /192.168.0.19:49828 2017-01-24 21:50:04,309 WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: caught end of stream exception EndOfStreamException: Unable to read additional data from client sessionid 0x259d20997750013, likely client has closed socket at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) at 
java.lang.Thread.run(Thread.java:745) 2017-01-24 21:50:04,309 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: Closed socket connection for client /192.168.0.19:49828 which had sessionid 0x259d20997750013 2017-01-24 21:50:04,328 WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: caught end of stream exception EndOfStreamException: Unable to read additional data from client sessionid 0x259d20997750012, likely client has closed socket at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) at java.lang.Thread.run(Thread.java:745) 2017-01-24 21:50:04,329 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: Closed socket connection for client /192.168.0.19:49825 which had sessionid 0x259d20997750012 2017-01-24 21:50:04,384 WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: caught end of stream exception EndOfStreamException: Unable to read additional data from client sessionid 0x259d20997750010, likely client has closed socket at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) at java.lang.Thread.run(Thread.java:745) 2017-01-24 21:50:04,387 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: Closed socket connection for client /192.168.0.19:49817 which had sessionid 0x259d20997750010 2017-01-24 21:50:04,396 WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: caught end of stream exception EndOfStreamException: Unable to read additional data from client sessionid 0x259d20997750011, likely client has closed socket at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) at java.lang.Thread.run(Thread.java:745) 2017-01-24 21:50:04,397 INFO 
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: Closed socket connection for client /192.168.0.19:49821 which had sessionid 0x259d20997750011 2017-01-24 21:50:08,380 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxnFactory: Accepted socket connection from /192.168.0.19:49857 2017-01-24 21:50:08,382 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.ZooKeeperServer: Client attempting to establish new session at /192.168.0.19:49857 2017-01-24 21:50:08,387 INFO [CommitProcessor:2] server.ZooKeeperServer: Established session 0x259d20997750014 with negotiated timeout 90000 for client /192.168.0.19:49857 2017-01-24 21:50:09,791 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxnFactory: Accepted socket connection from /192.168.0.19:49862 2017-01-24 21:50:09,792 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.ZooKeeperServer: Client attempting to establish new session at /192.168.0.19:49862 2017-01-24 21:50:09,799 INFO [CommitProcessor:2] server.ZooKeeperServer: Established session 0x259d20997750015 with negotiated timeout 90000 for client /192.168.0.19:49862 2017-01-24 21:50:11,381 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxnFactory: Accepted socket connection from /192.168.0.19:49866 2017-01-24 21:50:11,383 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.ZooKeeperServer: Client attempting to establish new session at /192.168.0.19:49866 2017-01-24 21:50:11,390 INFO [CommitProcessor:2] server.ZooKeeperServer: Established session 0x259d20997750016 with negotiated timeout 90000 for client /192.168.0.19:49866 2017-01-24 21:50:12,189 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxnFactory: Accepted socket connection from /192.168.0.19:49869 2017-01-24 21:50:12,191 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.ZooKeeperServer: Client attempting to establish new session at /192.168.0.19:49869 2017-01-24 21:50:12,196 INFO [CommitProcessor:2] 
server.ZooKeeperServer: Established session 0x259d20997750017 with negotiated timeout 90000 for client /192.168.0.19:49869 2017-01-24 22:00:34,306 WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: caught end of stream exception EndOfStreamException: Unable to read additional data from client sessionid 0x259d20997750015, likely client has closed socket at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) at java.lang.Thread.run(Thread.java:745) 2017-01-24 22:00:34,307 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: Closed socket connection for client /192.168.0.19:49862 which had sessionid 0x259d20997750015 2017-01-24 22:00:34,330 WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: caught end of stream exception EndOfStreamException: Unable to read additional data from client sessionid 0x259d20997750017, likely client has closed socket at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) at java.lang.Thread.run(Thread.java:745) 2017-01-24 22:00:34,337 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: Closed socket connection for client /192.168.0.19:49869 which had sessionid 0x259d20997750017 2017-01-24 22:00:34,355 WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: caught end of stream exception EndOfStreamException: Unable to read additional data from client sessionid 0x259d20997750016, likely client has closed socket at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) at java.lang.Thread.run(Thread.java:745) 2017-01-24 22:00:34,356 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: Closed socket connection for client 
/192.168.0.19:49866 which had sessionid 0x259d20997750016 2017-01-24 22:00:34,369 WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: caught end of stream exception EndOfStreamException: Unable to read additional data from client sessionid 0x259d20997750014, likely client has closed socket at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) at java.lang.Thread.run(Thread.java:745) 2017-01-24 22:00:34,369 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: Closed socket connection for client /192.168.0.19:49857 which had sessionid 0x259d20997750014
----------
The RegionServer logs are clean, with no warnings. I assume this is a ZooKeeper problem rather than an HBase one, but I have tried many different configurations and nothing has worked. Does anyone have an idea what could be happening?

Best,
Hernán.
