Hi JD,

Thanks for your response. I was planning to use replication for my 
production/development servers, but it seems that work on this issue is 
still in progress. Which release is the fix for this bug planned for? 
I am currently using HBase 0.90.3.

One more query:
1.       Will running 3-4 ZooKeeper nodes help if 1-2 of them fail? 
Will the cluster keep running, or will it go down?
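
For context on this question: a ZooKeeper ensemble stays available only while 
a strict majority of its servers is up. A minimal sketch of that arithmetic 
follows; the function names are illustrative only and are not part of any 
ZooKeeper or HBase API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be lost while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Compare the ensemble sizes discussed in this thread.
    for n in (1, 3, 4, 5):
        print(f"ensemble={n} quorum={quorum_size(n)} "
              f"tolerated_failures={tolerated_failures(n)}")
```

By this rule, 3 nodes and 4 nodes both survive the loss of only one node, and 
surviving two failures needs at least 5, which is why odd ensemble sizes are 
usually chosen.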

Thanks
-Stuti

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Jean-Daniel 
Cryans
Sent: Monday, September 19, 2011 11:04 PM
To: [email protected]
Subject: Re: Unexpected shutdown of Zookeeper

I think this is just:

https://issues.apache.org/jira/browse/HBASE-3130

J-D

On Sun, Sep 18, 2011 at 10:15 PM, Stuti Awasthi <[email protected]> wrote:
> Hi All,
>
> I was running a 2-node cluster with 1 ZooKeeper node and 2 region server 
> nodes. I had also set up cluster replication with another single-node 
> HBase-Hadoop cluster. Replication was successful, and I left the cluster 
> running over the weekend with no data to replicate.
>
> Today I can see that ZooKeeper is dead in the master cluster. The region 
> server that was running on the slave machine is also dead. The cluster I 
> was replicating to is running fine.
>
> My queries are:
>
> 1.       Can ZooKeeper die because there was no replication traffic over 
> the network for a long time?
>
> 2.       How should I handle such situations? Will running 3-4 ZooKeeper 
> nodes help?
>
> 3.       If I run multiple ZooKeeper nodes, will the cluster keep running 
> normally even if 2-3 of them are dead?
>
> 4.       In my case, 1 of the 2 region servers is dead but the other is 
> still working. If my ZooKeeper node had been running, would I have been 
> able to access HBase properly?
>
> Logs :
> hbase-root-zookeeper-master.log :
>
> 2011-09-19 10:07:55,753 INFO
> org.apache.zookeeper.server.NIOServerCnxn: Accepted socket connection
> from /10.33.64.235:44706
> 2011-09-19 10:07:55,758 INFO
> org.apache.zookeeper.server.NIOServerCnxn: Client attempting to
> establish new session at /10.33.64.235:44706
> 2011-09-19 10:07:55,761 INFO
> org.apache.zookeeper.server.NIOServerCnxn: Established session
> 0x13271b6c4f1000c with negotiated timeout 180000 for client
> /10.33.64.235:44706
> 2011-09-19 10:10:48,318 WARN
> org.apache.zookeeper.server.NIOServerCnxn: EndOfStreamException:
> Unable to read additional data from client sessionid
> 0x13271b6c4f1000c, likely client has closed socket
> 2011-09-19 10:10:48,319 INFO
> org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection
> for client /10.33.64.235:44706 which had sessionid 0x13271b6c4f1000c
> 2011-09-19 10:12:57,002 INFO
> org.apache.zookeeper.server.ZooKeeperServer: Expiring session
> 0x13271b6c4f1000c, timeout of 180000ms exceeded
> 2011-09-19 10:12:57,002 INFO
> org.apache.zookeeper.server.PrepRequestProcessor: Processed session
> termination for sessionid: 0x13271b6c4f1000c
>
> hbase-root-regionserver-slave.log:
>
> 2011-09-16 16:00:50,354 WARN org.apache.hadoop.ipc.HBaseServer: IPC
> Server listener on 60020: readAndProcess threw exception
> java.io.IOException: Connection reset by peer. Count of bytes read: 0
> java.io.IOException: Connection reset by peer
>       at sun.nio.ch.FileDispatcher.read0(Native Method)
>       at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
>       at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)
>       at sun.nio.ch.IOUtil.read(IOUtil.java:175)
>       at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
> 2011-09-16 16:00:51,058 DEBUG
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
> Opening log for replication slave%3A60020.1316168146136 at 663246
> 2011-09-16 16:00:51,064 DEBUG
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
> currentNbOperations:5003 and seenEntries:0 and size: 0
> 2011-09-16 16:00:51,064 INFO
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceMana
> ger: Going to report log #slave%3A60020.1316168146136 for position
> 663246 in
> hdfs://master:54310/hbase/.logs/slave,60020,1316168145427/slave%3A6002
> 0.1316168146136
> 2011-09-16 16:00:51,066 INFO
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceMana
> ger: Removing 0 logs in the list: []
> 2011-09-16 16:00:51,066 DEBUG
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
> Nothing to replicate, sleeping 1000 times 2
> 2011-09-16 16:00:53,068 DEBUG
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Opening 
> log for replication slave%3A60020.1316168146136 at 663246 
> ..................................
> 2011-09-16 17:14:49,440 WARN org.apache.zookeeper.ClientCnxn: Session
> 0x13271b5395c0007 for server null, unexpected error, closing socket
> connection and attempting reconnect
> java.net.ConnectException: Connection timed out
>       at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>       at
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
>       at
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
> 2011-09-16 17:14:51,039 INFO
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceMana
> ger: /hbase/rs/master,60020,1316167798366 znode expired, trying to
> lock it
> 2011-09-16 17:14:51,088 INFO org.apache.zookeeper.ClientCnxn: Opening
> socket connection to server slave1/172.28.96.239:2181
> 2011-09-16 17:14:51,089 INFO org.apache.zookeeper.ClientCnxn: Socket
> connection established to slave1/172.28.96.239:2181, initiating
> session
> 2011-09-16 17:14:51,093 INFO org.apache.zookeeper.ClientCnxn: Unable
> to reconnect to ZooKeeper service, session 0x13271b5395c0007 has
> expired, closing socket connection
> 2011-09-16 17:14:51,094 FATAL
> org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region
> server serverName=slave,60020,1316168145427, load=(requests=0,
> regions=6, usedHeap=29, maxHeap=996): connection to cluster:
> 1-0x13271b5395c0007 connection to cluster: 1-0x13271b5395c0007
> received expired from ZooKeeper, aborting
> org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired
>       at
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(Zoo
> KeeperWatcher.java:343)
>       at
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWa
> tcher.java:261)
>       at
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.ja
> va:530)
>       at
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
> 2011-09-16 17:14:51,094 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
> requests=0, regions=6, stores=6, storefiles=5, storefileIndexSize=0,
> memstoreSize=0, compactionQueueSize=0, flushQueueSize=0, usedHeap=29,
> maxHeap=996, blockCacheSize=982352, blockCacheFree=208064384,
> blockCacheCount=2, blockCacheHitCount=31, blockCacheMissCount=2,
> blockCacheEvictedCount=0, blockCacheHitRatio=93,
> blockCacheHitCachingRatio=93
> 2011-09-16 17:14:51,094 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED:
> connection to cluster: 1-0x13271b5395c0007 connection to cluster:
> 1-0x13271b5395c0007 received expired from ZooKeeper, aborting
> 2011-09-16 17:14:51,094 INFO org.apache.zookeeper.ClientCnxn:
> EventThread shut down
> 2011-09-16 17:14:51,114 DEBUG
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
> Source exiting 1
> 2011-09-16 17:14:52,476 INFO org.apache.hadoop.ipc.HBaseServer:
> Stopping server on 60020
> 2011-09-16 17:14:52,477 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server handler 0 on 60020: exiting
> 2011-09-16 17:14:52,477 INFO org.apache.hadoop.ipc.HBaseServer: PRI
> IPC Server handler 2 on 60020: exiting
> 2011-09-16 17:14:52,477 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server handler 1 on 60020: exiting
> 2011-09-16 17:14:52,477 INFO org.apache.hadoop.ipc.HBaseServer: PRI
> IPC Server handler 0 on 60020: exiting
> 2011-09-16 17:14:52,477 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server handler 2 on 60020: exiting
> 2011-09-16 17:14:52,477 INFO org.apache.hadoop.ipc.HBaseServer: PRI
> IPC Server handler 9 on 60020: exiting
> 2011-09-16 17:14:52,477 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server handler 3 on 60020: exiting
> 2011-09-16 17:14:52,477 INFO org.apache.hadoop.ipc.HBaseServer: PRI
> IPC Server handler 8 on 60020: exiting
> 2011-09-16 17:14:52,477 INFO org.apache.hadoop.ipc.HBaseServer: PRI
> IPC Server handler 6 on 60020: exiting
> 2011-09-16 17:14:52,477 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server handler 4 on 60020: exiting
> 2011-09-16 17:14:52,477 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server handler 5 on 60020: exiting
> 2011-09-16 17:14:52,477 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server handler 7 on 60020: exiting
> 2011-09-16 17:14:52,477 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server handler 6 on 60020: exiting
> 2011-09-16 17:14:52,478 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server handler 8 on 60020: exiting
> 2011-09-16 17:14:52,478 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server handler 9 on 60020: exiting
> 2011-09-16 17:14:52,478 INFO org.apache.hadoop.ipc.HBaseServer: PRI
> IPC Server handler 1 on 60020: exiting
> 2011-09-16 17:14:52,478 INFO org.apache.hadoop.ipc.HBaseServer: PRI
> IPC Server handler 3 on 60020: exiting
> 2011-09-16 17:14:52,478 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Stopping
> infoServer
> 2011-09-16 17:14:52,478 INFO org.apache.hadoop.ipc.HBaseServer:
> Stopping IPC Server listener on 60020
> 2011-09-16 17:14:52,479 INFO org.apache.hadoop.ipc.HBaseServer: PRI
> IPC Server handler 4 on 60020: exiting
> 2011-09-16 17:14:52,479 INFO org.apache.hadoop.ipc.HBaseServer: PRI
> IPC Server handler 5 on 60020: exiting
> 2011-09-16 17:14:52,479 INFO org.apache.hadoop.ipc.HBaseServer:
> Stopping IPC Server Responder
> 2011-09-16 17:14:52,479 INFO org.apache.hadoop.ipc.HBaseServer: PRI
> IPC Server handler 7 on 60020: exiting
> 2011-09-16 17:14:52,481 INFO org.mortbay.log: Stopped
> [email protected]:60030
> 2011-09-16 17:14:52,585 INFO
> org.apache.hadoop.hbase.regionserver.CompactSplitThread:
> regionserver60020.compactor exiting
> 2011-09-16 17:14:52,585 INFO
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher:
> regionserver60020.cacheFlusher exiting
> 2011-09-16 17:14:52,586 INFO org.apache.hadoop.hbase.regionserver.LogRoller: 
> LogRoller exiting.
> 2011-09-16 17:14:52,586 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer$MajorCompactionChec
> ker: regionserver60020.majorCompactionChecker exiting
> 2011-09-16 17:14:52,587 DEBUG 
> org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing 
> close of backup,,1315992791196.e5ff1d9eb66e1157d0ca8bfaaf493480.
> 2011-09-16 17:14:52,588 DEBUG
> org.apache.hadoop.hbase.regionserver.wal.HLog:
> regionserver60020.logSyncer interrupted while waiting for sync
> requests
> 2011-09-16 17:14:52,588 DEBUG
> org.apache.hadoop.hbase.regionserver.HRegion: Closing
> backup,,1315992791196.e5ff1d9eb66e1157d0ca8bfaaf493480.: disabling
> compactions & flushes
> 2011-09-16 17:14:52,588 DEBUG 
> org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing 
> close of testArchiveBackup,,1315915407547.e05ec3159a022f28aa92e1a01ca50fec.
> 2011-09-16 17:14:52,588 DEBUG 
> org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing 
> close of replication,,1316166014290.5937efd76493915556d3641aa9c0b6df.
> 2011-09-16 17:14:52,589 INFO
> org.apache.hadoop.hbase.regionserver.wal.HLog:
> regionserver60020.logSyncer exiting
> 2011-09-16 17:14:52,588 DEBUG
> org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler:
> Processing close of -ROOT-,,0.70236052
> 2011-09-16 17:14:52,589 DEBUG
> org.apache.hadoop.hbase.regionserver.wal.HLog: closing hlog writer in
> hdfs://master:54310/hbase/.logs/slave,60020,1316168145427
> 2011-09-16 17:14:52,589 DEBUG
> org.apache.hadoop.hbase.regionserver.HRegion: Closing 
> replication,,1316166014290.5937efd76493915556d3641aa9c0b6df.: disabling 
> compactions & flushes ............................
> 2011-09-16 17:14:52,602 INFO org.apache.zookeeper.ClientCnxn:
> EventThread shut down
> 2011-09-16 17:14:52,602 INFO org.apache.zookeeper.ZooKeeper: Session:
> 0x13271b6c4f10003 closed
> 2011-09-16 17:14:52,605 INFO org.apache.zookeeper.ClientCnxn:
> EventThread shut down
> 2011-09-16 17:14:52,605 INFO org.apache.zookeeper.ZooKeeper: Session:
> 0x13271b6c4f10005 closed
> 2011-09-16 17:14:52,605 INFO
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
> Closing source 1 because: Region server is closing
> 2011-09-16 17:14:52,605 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver60020
> exiting
> 2011-09-16 17:14:53,040 INFO
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceMana
> ger: Not transferring queue since we are shutting down
> 2011-09-16 17:14:53,042 INFO
> org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook
> starting; hbase.shutdown.hook=true;
> fsShutdownHook=Thread[Thread-14,5,main]
> 2011-09-16 17:14:53,042 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Shutdown
> hook
> 2011-09-16 17:14:53,042 INFO 
> org.apache.hadoop.hbase.regionserver.ShutdownHook: Starting fs shutdown hook 
> thread.
> 2011-09-16 17:14:53,042 INFO 
> org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook finished.
>
> Please suggest.
>
> Thanks
