I have observed some very odd behavior that crashes our ZooKeeper servers.

I have one job that uses 100 reducers to perform puts. The same code works fine 
for all other users and data, but for one particular user and that user's data 
it crashes ZooKeeper and brings down the entire cluster.

The log shows 140 “Opening socket connection” operations in just one failed 
attempt, as shown below. I can understand that 140 * 100 = 14,000 connections 
could crash ZooKeeper, but my question is: what causes it to open so many 
connections? Other successful runs with the same logic open only one or two 
connections.


My code just calls “connection.getTable(TableName)” five times to get handles 
to five tables for each attempt.



Here is part of the log, showing many “Opening socket connection” messages.

==========================================================

2016-09-07 15:43:56,745 INFO 
[main-SendThread(hqhd02nm01.pclc0.merkle.local:2181)] 
org.apache.zookeeper.ClientCnxn: Opening socket connection to server 
hqhd02nm01.pclc0.merkle.local/10.129.8.13:2181. Will not attempt to 
authenticate using SASL (unknown error)
2016-09-07 15:43:56,746 INFO 
[main-SendThread(hqhd02nm01.pclc0.merkle.local:2181)] 
org.apache.zookeeper.ClientCnxn: Socket connection established to 
hqhd02nm01.pclc0.merkle.local/10.129.8.13:2181, initiating session
2016-09-07 15:43:56,747 INFO 
[main-SendThread(hqhd02nm01.pclc0.merkle.local:2181)] 
org.apache.zookeeper.ClientCnxn: Session establishment complete on server 
hqhd02nm01.pclc0.merkle.local/10.129.8.13:2181, sessionid = 0x257027220206f90, 
negotiated timeout = 120000
2016-09-07 15:44:22,530 INFO [main] 
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process 
identifier=hconnection-0x2d90f90b connecting to ZooKeeper 
ensemble=hqhd02nm01.pclc0.merkle.local:2181,hqhd02nm02.pclc0.merkle.local:2181,hqhd02ed01.pclc0.merkle.local:2181
2016-09-07 15:44:22,530 INFO [main] org.apache.zookeeper.ZooKeeper: Initiating 
client connection, 
connectString=hqhd02nm01.pclc0.merkle.local:2181,hqhd02nm02.pclc0.merkle.local:2181,hqhd02ed01.pclc0.merkle.local:2181
 sessionTimeout=120000 watcher=hconnection-0x2d90f90b0x0, 
quorum=hqhd02nm01.pclc0.merkle.local:2181,hqhd02nm02.pclc0.merkle.local:2181,hqhd02ed01.pclc0.merkle.local:2181,
 baseZNode=/hbase-secure
2016-09-07 15:44:22,530 INFO 
[main-SendThread(hqhd02nm01.pclc0.merkle.local:2181)] 
org.apache.zookeeper.ClientCnxn: Opening socket connection to server 
hqhd02nm01.pclc0.merkle.local/10.129.8.13:2181. Will not attempt to 
authenticate using SASL (unknown error)
2016-09-07 15:44:22,531 INFO 
[main-SendThread(hqhd02nm01.pclc0.merkle.local:2181)] 
org.apache.zookeeper.ClientCnxn: Socket connection established to 
hqhd02nm01.pclc0.merkle.local/10.129.8.13:2181, initiating session
2016-09-07 15:44:22,532 INFO 
[main-SendThread(hqhd02nm01.pclc0.merkle.local:2181)] 
org.apache.zookeeper.ClientCnxn: Session establishment complete on server 
hqhd02nm01.pclc0.merkle.local/10.129.8.13:2181, sessionid = 0x257027220206fe3, 
negotiated timeout = 120000
2016-09-07 15:45:09,081 INFO [main] 
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process 
identifier=hconnection-0x43e65f2a connecting to ZooKeeper 
ensemble=hqhd02nm01.pclc0.merkle.local:2181,hqhd02nm02.pclc0.merkle.local:2181,hqhd02ed01.pclc0.merkle.local:2181
2016-09-07 15:45:09,081 INFO [main] org.apache.zookeeper.ZooKeeper: Initiating 
client connection, 
connectString=hqhd02nm01.pclc0.merkle.local:2181,hqhd02nm02.pclc0.merkle.local:2181,hqhd02ed01.pclc0.merkle.local:2181
 sessionTimeout=120000 watcher=hconnection-0x43e65f2a0x0, 
quorum=hqhd02nm01.pclc0.merkle.local:2181,hqhd02nm02.pclc0.merkle.local:2181,hqhd02ed01.pclc0.merkle.local:2181,
 baseZNode=/hbase-secure
2016-09-07 15:45:09,082 INFO 
[main-SendThread(hqhd02ed01.pclc0.merkle.local:2181)] 
org.apache.zookeeper.ClientCnxn: Opening socket connection to server 
hqhd02ed01.pclc0.merkle.local/10.129.8.11:2181. Will not attempt to 
authenticate using SASL (unknown error)
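
To double-check the per-attempt count quoted above, the messages in a task log can be tallied with a small standalone Java program. This is only a minimal sketch and not part of the job itself: the class name ZkLogStats and its two methods are made up for illustration, and it assumes the attempt's log has been saved to a local file passed as the first argument. It also lists the distinct "hconnection-0x..." identifiers, because in the excerpt each new identifier is followed by a fresh "Initiating client connection" line.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not from the original job): summarizes ZooKeeper
// client activity in one task-attempt log.
public class ZkLogStats {

    // Matches the identifier printed by RecoverableZooKeeper, e.g.
    // "identifier=hconnection-0x2d90f90b". The "watcher=" field is ignored
    // on purpose because it carries a "0x0" suffix.
    private static final Pattern HCONNECTION =
            Pattern.compile("identifier=(hconnection-0x[0-9a-f]+)");

    // Counts lines containing the "Opening socket connection" message.
    public static long countOpenings(List<String> lines) {
        return lines.stream()
                .filter(line -> line.contains("Opening socket connection"))
                .count();
    }

    // Collects the distinct hconnection identifiers seen in the log.
    public static Set<String> distinctHConnections(List<String> lines) {
        Set<String> ids = new TreeSet<>();
        for (String line : lines) {
            Matcher m = HCONNECTION.matcher(line);
            while (m.find()) {
                ids.add(m.group(1));
            }
        }
        return ids;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ZkLogStats <task-attempt-log>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println("socket openings:      " + countOpenings(lines));
        System.out.println("distinct hconnections: " + distinctHConnections(lines));
    }
}
```

If the number of distinct identifiers is close to the number of socket openings, each opening belongs to a separately created connection object rather than to reconnects on a single one.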


