Thanks Samir, I checked that blog; it helped me understand what is going on when a DataNode dies. But I am still not sure why it took 15 minutes. I will check the server logs again, carefully.
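One thing I will verify first (an assumption on my part, based on stock HDFS defaults rather than our actual hdfs-site.xml) is how long the NameNode takes to declare the stopped DataNode dead:

    2 * dfs.namenode.heartbeat.recheck-interval + 10 * dfs.heartbeat.interval
      = 2 * 300 s + 10 * 3 s
      = 630 s (10.5 minutes)

Stale-node detection (dfs.namenode.stale.datanode.interval, 30 s by default) fires much earlier, which would match the "Skipped stale nodes for recovery : 1" line in our NameNode log, but re-replication and some recovery paths only proceed after the full interval, so part of the delay may simply be HDFS waiting out this timer.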
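I will also look at the master-side retry settings. The HMaster log shows the hbase:meta scan failing with attempts=351 and callTimeout=60000; if I read the client code correctly, the master multiplies the client retry count by a server-side factor, so with defaults that would be

    hbase.client.retries.number (35) * hbase.client.serverside.retries.multiplier (10) + 1 = 351 attempts

each bounded by hbase.rpc.timeout (60000 ms by default, matching callTimeout=60000 in the log). If that is right, the ServerShutdownHandler could spend most of the 15 minutes retrying against the dead server, and carefully lowering these values should shorten that window. This is my assumption, not something I have tested yet.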
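Since the NameNode log also shows lease recovery on the WAL files, I may try triggering it by hand to see how long that step takes in isolation. A rough, untested sketch against the public HDFS API (the class name is mine; pass a WAL path like the ones in the logs, and fs.defaultFS must point at the cluster, e.g. hdfs://hdpts):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.DistributedFileSystem;

    public class LeaseRecoveryCheck {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration(); // picks up core-site.xml/hdfs-site.xml
            Path wal = new Path(args[0]); // e.g. a file under /apps/hbase/data/WALs/...-splitting/
            try (DistributedFileSystem dfs =
                    (DistributedFileSystem) wal.getFileSystem(conf)) {
                long start = System.currentTimeMillis();
                // Same call FSHDFSUtils makes; true means the file is already closed.
                boolean closed = dfs.recoverLease(wal);
                while (!closed) {
                    Thread.sleep(1000); // poll until the NameNode closes the file
                    closed = dfs.isFileClosed(wal);
                }
                System.out.println("lease recovered in "
                    + (System.currentTimeMillis() - start) + " ms");
            }
        }
    }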
Regards,
Hironori Ogibayashi

2016-02-18 22:56 GMT+09:00 Samir Ahmic <[email protected]>: > Thanks for explaining Hironori, > By shutting down the DataNode process on host7516 we also have to add Hadoop > recovery time to the total recovery time. > Here is a great blog post explaining how the Hadoop recovery process works. > > http://blog.cloudera.com/blog/2015/02/understanding-hdfs-recovery-processes-part-1/ > > Regards > Samir > > On Thu, Feb 18, 2016 at 11:04 AM, おぎばやしひろのり <[email protected]> wrote: > >> Samir, >> >> Thank you for your advice. >> The actual operation I used to shut down host7516 was just clicking the "stop" >> button on our VM console. I don't know the internals, but I can see >> some process termination messages in /var/log/messages, so it looks >> like a kind of graceful shutdown. >> About reproducing: yes, I can, but not every time, maybe once in 3 or 4 >> times or so. I am not sure what makes the difference. >> >> I also checked the regionserver logs. There were warnings related to >> DataNode access; this is because there was a DataNode on the shut-down >> host7516, too. >> Regarding log splitting, I couldn't find any errors. Both hosts finished >> the task within a few seconds. >> >> --- host7517 >> 2016-02-17 15:39:59,002 INFO [LruBlockCacheStatsExecutor] >> hfile.LruBlockCache: totalSize=420.92 KB, freeSize=399.19 MB, >> max=399.60 MB, blockCount=0, accesses=24227, hits=0, hitRatio=0, >> cachingAccesses=0, cachingHits=0, cachingHitsRatio=0 >> ,evictions=7289, evicted=0, evictedPerRun=0.0 >> 2016-02-17 15:39:59,869 INFO [HOST7517:16020Replication Statistics >> #0] regionserver.Replication: Normal source for cluster 1: Total >> replicated edits: 0, currently replicating from: >> >> hdfs://hdpts/apps/hbase/data/WALs/host7517.mydomain,16020,1455618296942/host7517.mydomain%2C16020%2C1455618296942.default.1455690301451 >> at position: 96609590 >> >> 2016-02-17 15:40:50,953 WARN [ResponseProcessor for block >> BP-1843495860-192.168.189.219-1453778090403:blk_1073753847_13125] >> hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block >> BP-1843495860-192.168.189.219-1453778090403:blk_1073753847_13125 >> java.io.IOException: Bad response ERROR for block >> BP-1843495860-192.168.189.219-1453778090403:blk_1073753847_13125 from >> datanode DatanodeInfoWithStorage[192.168.184.73:50010 >> ,DS-9787b201-fc64-450e-a20f-dcc79fb94b6f,DISK] >> at >> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:786) >> 2016-02-17 15:40:50,954 WARN [DataStreamer for file >> >> /apps/hbase/data/WALs/host7517.mydomain,16020,1455618296942/host7517.mydomain%2C16020%2C1455618296942.default.1455690301451 >> block BP-1843495860-192.168.189.219-1453778090403:blk_1073753847_13125] >> hdfs.DFSClient: Error Recovery for block >> BP-1843495860-192.168.189.219-1453778090403:blk_1073753847_13125 in >> pipeline DatanodeInfoWithStorage[192.168.185.249:50010 >> ,DS-52f00167-e863-48d8-8f98-3b613f774b0c,DISK], >> DatanodeInfoWithStorage[192.168.184.85:50010 >> ,DS-f8c4c3b2-a7cc-40e0-9ebb-2b9c352c0d28,DISK], >> DatanodeInfoWithStorage[192.168.184.73:50010 >> ,DS-9787b201-fc64-450e-a20f-dcc79fb94b6f,DISK]: >> bad datanode DatanodeInfoWithStorage[192.168.184.73:50010 >> ,DS-9787b201-fc64-450e-a20f-dcc79fb94b6f,DISK] >> ... 
>> 2016-02-17 15:56:55,633 INFO [SplitLogWorker-HOST7517:16020] >> coordination.ZkSplitLogWorkerCoordination: worker >> host7517.mydomain,16020,1455618296942 acquired task >> >> /hbase-unsecure/splitWAL/WALs%2Fhost7516.mydomain%2C16020%2C1455618299902-splitting%2Fhost7516.mydomain%252C16020%252C1455618299902.default.1455690304230 >> 2016-02-17 15:56:55,677 INFO [RS_LOG_REPLAY_OPS-HOST7517:16020-0] >> wal.WALSplitter: Splitting wal: >> >> hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902.default.1455690304230, >> length=83 >> 2016-02-17 15:56:55,677 INFO [RS_LOG_REPLAY_OPS-HOST7517:16020-0] >> wal.WALSplitter: DistributedLogReplay = false >> 2016-02-17 15:56:55,685 INFO [RS_LOG_REPLAY_OPS-HOST7517:16020-0] >> util.FSHDFSUtils: Recovering lease on dfs file >> >> hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902.default.1455690304230 >> 2016-02-17 15:56:55,696 INFO [RS_LOG_REPLAY_OPS-HOST7517:16020-0] >> util.FSHDFSUtils: recoverLease=false, attempt=0 on >> >> file=hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902.default.1455690304230 >> after 11ms >> 2016-02-17 15:56:59,698 INFO [RS_LOG_REPLAY_OPS-HOST7517:16020-0] >> util.FSHDFSUtils: recoverLease=true, attempt=1 on >> >> file=hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020 >> 2016-02-17 15:56:59,771 INFO >> [RS_LOG_REPLAY_OPS-HOST7517:16020-0-Writer-1] wal.WALSplitter: >> Creating writer >> >> path=hdfs://hdpts/apps/hbase/data/data/default/usertable/64199c31957c01b5bd9ee50b02e1f7fd/recovered.edits/0000000000000542072.temp >> region=64199c31957c01b5bd9ee50b02e1f7fd >> ...(similar lines) >> 2016-02-17 15:57:01,001 INFO [RS_LOG_REPLAY_OPS-HOST7517:16020-0] >> wal.WALSplitter: Split writers finished >> 2016-02-17 15:57:01,034 INFO [split-log-closeStream-3] >> wal.WALSplitter: Rename >> >> hdfs://hdpts/apps/hbase/data/data/default/usertable/06ed7277b7b9539a3ba597e0041acb12/recovered.edits/0000000000000054100.temp >> to >> hdfs://hdpts/apps/hbase/data/data/default/usertable/06ed7277b7b9539a3ba597e0041acb12/recovered.edits/0000000000000055498 >> ...(similar lines) >> 2016-02-17 15:57:01,332 INFO [RS_LOG_REPLAY_OPS-HOST7517:16020-0] >> wal.WALSplitter: Processed 73326 edits across 53 regions; edits >> skipped=0; log >> file=hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902.default.1455690304230, >> length=83, corrupted=false, progress failed=false >> 2016-02-17 15:57:01,336 INFO [RS_LOG_REPLAY_OPS-HOST7517:16020-0] >> coordination.ZkSplitLogWorkerCoordination: successfully transitioned >> task >> /hbase-unsecure/splitWAL/WALs%2Fhost7516.mydomain%2C16020%2C1455618299902-splitting%2Fhost7516.mydomain%252C16020%252C1455618299902.default.1455690304230 >> to final state DONE host7517.mydomain,16020,1455618296942 >> 2016-02-17 15:57:01,336 INFO [RS_LOG_REPLAY_OPS-HOST7517:16020-0] >> handler.WALSplitterHandler: worker >> host7517.mydomain,16020,1455618296942 done with task >> >> org.apache.hadoop.hbase.coordination.ZkSplitLogWorkerCoordination$ZkSplitTaskDetails@7b0d8dc7 >> in 5691ms >> >> --- host7518 >> >> 2016-02-17 15:40:55,957 WARN [ResponseProcessor for block >> BP-1843495860-192.168.189.219-1453778090403:blk_1073753896_13177] >> hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for 
block >> BP-1843495860-192.168.189.219-1453778090403:blk_1073753896_13177 >> java.io.IOException: Bad response ERROR for block >> BP-1843495860-192.168.189.219-1453778090403:blk_1073753896_13177 from >> datanode DatanodeInfoWithStorage[192.168.184.73:50010 >> ,DS-9787b201-fc64-450e-a20f-dcc79fb94b6f,DISK] >> at >> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:786) >> 2016-02-17 15:40:55,959 WARN [DataStreamer for file >> >> /apps/hbase/data/WALs/host7518.mydomain,16020,1455618298475/host7518.mydomain%2C16020%2C1455618298475.default.1455691227152 >> block BP-1843495860-192.168.189.219-1453778090403:blk_1073753896_13177] >> hdfs.DFSClient: Error Recovery for block >> BP-1843495860-192.168.189.219-1453778090403:blk_1073753896_13177 in >> pipeline DatanodeInfoWithStorage[192.168.184.85:50010 >> ,DS-f8c4c3b2-a7cc-40e0-9ebb-2b9c352c0d28,DISK], >> DatanodeInfoWithStorage[192.168.184.73:50010 >> ,DS-9787b201-fc64-450e-a20f-dcc79fb94b6f,DISK], >> DatanodeInfoWithStorage[192.168.185.249:50010 >> ,DS-52f00167-e863-48d8-8f98-3b613f774b0c,DISK]: >> bad datanode DatanodeInfoWithStorage[192.168.184.73:50010 >> ,DS-9787b201-fc64-450e-a20f-dcc79fb94b6f,DISK] >> 2016-02-17 15:40:56,043 INFO [sync.4] wal.FSHLog: Slow sync cost: >> 3774 ms, current pipeline: >> [DatanodeInfoWithStorage[192.168.184.85:50010 >> ,DS-f8c4c3b2-a7cc-40e0-9ebb-2b9c352c0d28,DISK], >> DatanodeInfoWithStorage[192.168.185.249:50010 >> ,DS-52f00167-e863-48d8-8f98-3b613f774b0c,DISK], >> DatanodeInfoWithStorage[192.168.189.208:50010 >> ,DS-882dca14-8651-4ffe-a102-8beebc774575,DISK]] >> 2016-02-17 15:40:56,044 INFO [sync.3] wal.FSHLog: Slow sync cost: >> 20103 ms, current pipeline: >> [DatanodeInfoWithStorage[192.168.184.85:50010 >> ,DS-f8c4c3b2-a7cc-40e0-9ebb-2b9c352c0d28,DISK], >> DatanodeInfoWithStorage[192.168.185.249:50010 >> ,DS-52f00167-e863-48d8-8f98-3b613f774b0c,DISK], >> DatanodeInfoWithStorage[192.168.189.208:50010 >> ,DS-882dca14-8651-4ffe-a102-8beebc774575,DISK]] >> ... 
>> 2016-02-17 15:41:16,016 INFO [SplitLogWorker-HOST7518:16020] >> coordination.ZkSplitLogWorkerCoordination: worker >> host7518.mydomain,16020,1455618298475 acquired task >> >> /hbase-unsecure/splitWAL/WALs%2Fhost7516.mydomain%2C16020%2C1455618299902-splitting%2Fhost7516.mydomain%252C16020%252C1455618299902..meta.1455690873222.meta >> 2016-02-17 15:41:16,194 INFO [RS_LOG_REPLAY_OPS-HOST7518:16020-0] >> wal.WALSplitter: Splitting wal: >> >> hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902..meta.1455690873222.meta, >> length=83 >> 2016-02-17 15:41:16,194 INFO [RS_LOG_REPLAY_OPS-HOST7518:16020-0] >> wal.WALSplitter: DistributedLogReplay = false >> 2016-02-17 15:41:16,252 INFO [RS_LOG_REPLAY_OPS-HOST7518:16020-0] >> util.FSHDFSUtils: Recovering lease on dfs file >> >> hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902..meta.1455690873222.meta >> 2016-02-17 15:41:16,396 INFO [RS_LOG_REPLAY_OPS-HOST7518:16020-0] >> util.FSHDFSUtils: recoverLease=false, attempt=0 on >> >> file=hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902..meta.1455690873222.meta >> after 144ms >> 2016-02-17 15:41:20,398 INFO [RS_LOG_REPLAY_OPS-HOST7518:16020-0] >> util.FSHDFSUtils: recoverLease=true, attempt=1 on >> >> file=hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902..meta.1455690873222.meta >> after 4146ms >> 2016-02-17 15:41:20,456 INFO >> [RS_LOG_REPLAY_OPS-HOST7518:16020-0-Writer-2] wal.WALSplitter: >> Creating writer >> >> path=hdfs://hdpts/apps/hbase/data/data/hbase/meta/1588230740/recovered.edits/0000000000000009659.temp >> region=1588230740 >> 2016-02-17 15:41:20,457 INFO [RS_LOG_REPLAY_OPS-HOST7518:16020-0] >> wal.WALSplitter: Split writers finished >> 2016-02-17 15:41:20,479 INFO [split-log-closeStream-1] >> wal.WALSplitter: Rename >> >> hdfs://hdpts/apps/hbase/data/data/hbase/meta/1588230740/recovered.edits/0000000000000009659.temp >> to >> hdfs://hdpts/apps/hbase/data/data/hbase/meta/1588230740/recovered.edits/0000000000000009661 >> 2016-02-17 15:41:20,479 INFO [RS_LOG_REPLAY_OPS-HOST7518:16020-0] >> wal.WALSplitter: Processed 3 edits across 1 regions; edits >> skipped=148; log >> >> >> file=hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902..meta.1455690873222.meta, >> length=83, corrupted=false, progress failed=false >> 2016-02-17 15:41:20,482 INFO [RS_LOG_REPLAY_OPS-HOST7518:16020-0] >> coordination.ZkSplitLogWorkerCoordination: successfully transitioned >> task >> /hbase-unsecure/splitWAL/WALs%2Fhost7516.mydomain%2C16020%2C1455618299902-splitting%2Fhost7516.mydomain%252C16020%252C1455618299902..meta.1455690873222.meta >> to final state DONE host7518.mydomain,16020,1455618298475 >> 2016-02-17 15:41:20,482 INFO [RS_LOG_REPLAY_OPS-HOST7518:16020-0] >> handler.WALSplitterHandler: worker >> host7518.mydomain,16020,1455618298475 done with task >> >> org.apache.hadoop.hbase.coordination.ZkSplitLogWorkerCoordination$ZkSplitTaskDetails@42ed43fe >> in 4359ms >> >> Regards, >> Hironori Ogibayashi >> >> 2016-02-18 5:12 GMT+09:00 Samir Ahmic <[email protected]>: >> > Hi Hironori, >> > >> > Based on your logs, you have shut down the regionserver that was hosting the system >> > "hbase:meta" table. 
This is a special case, since user regions cannot be >> > reassigned to other servers until the "hbase:meta" table region is >> reassigned. >> > This can partially explain the delay in recovery time you experienced. >> > Also, there is the WAL splitting process, which must be completed before >> regions >> > are reassigned. Still, 15 minutes is too long a period for recovery. >> > How did you shut down host7516? Also, you should check the logs for errors >> > from host7517 and host7518 around 15:41, because they have >> > acquired WAL splitting tasks. >> > Also, can you reproduce this every time in your tests? >> > >> > Regards >> > Samir >> > >> > On Wed, Feb 17, 2016 at 8:49 AM, おぎばやしひろのり <[email protected]> wrote: >> > >> >> Hello, >> >> >> >> I am testing how HBase works in case of failure and I experienced >> >> unexpected behavior. If someone could tell me how to shorten the recovery >> >> time, that would be appreciated. >> >> >> >> What I tried was: first, a client starts writing to HBase, and then I shut >> >> down one of the slave node machines. I expected the writing operation >> >> to stop for a while (1 minute or so) and that, after HBase automatically moved the >> >> failed regions to another slave node, the client would be able to write >> >> again. >> >> But actually, it took about 15 minutes for the cluster to recover. >> >> What was going on? I have already applied the configurations presented at >> >> https://hbase.apache.org/book.html#mttr >> >> >> >> My environment is: >> >> - 3 master (NN, HMaster, ZK, etc.) nodes and 4 slave (DN, RS) nodes, all >> >> of them virtual machines. >> >> - HDP 2.4.3 (HBase 1.1.1) >> >> - CentOS 6.7 >> >> >> >> Here are the HMaster and NameNode logs. The HMaster detected the failure of a >> >> regionserver at 15:41:16 and tried to start recovery, but threw >> >> RetriesExhaustedException again and again, then finally finished >> >> recovering at 15:57:11. I noticed some messages about >> >> lease recovery in the NameNode log, but I am not sure whether this is related. >> >> >> >> ---- HMaster log --- >> >> 2016-02-17 15:41:16,003 INFO [main-EventThread] >> >> zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, >> >> processing expiration [host7516.mydomain,16020,1455618299902] >> >> 2016-02-17 15:41:16,005 INFO >> >> [MASTER_META_SERVER_OPERATIONS-HOST7512:16000-1] >> >> handler.MetaServerShutdownHandler: Splitting hbase:meta logs for >> >> host7516.mydomain,16020,1455618299902 >> >> 2016-02-17 15:41:16,010 INFO >> >> [MASTER_META_SERVER_OPERATIONS-HOST7512:16000-1] >> >> master.SplitLogManager: dead splitlog workers >> >> [host7516.mydomain,16020,1455618299902] >> >> 2016-02-17 15:41:16,011 INFO >> >> [MASTER_META_SERVER_OPERATIONS-HOST7512:16000-1] >> >> master.SplitLogManager: started splitting 1 logs in >> >> >> >> >> [hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting] >> >> for [host7516. 
>> >> mydomain,16020,1455618299902] >> >> 2016-02-17 15:41:16,016 INFO [main-EventThread] >> >> coordination.SplitLogManagerCoordination: task >> >> >> >> >> /hbase-unsecure/splitWAL/WALs%2Fhost7516.mydomain%2C16020%2C1455618299902-splitting%2Fhost7516.mydomain%252C16020%252C1455618299902..met >> >> a.1455690873222.meta acquired by host7518.mydomain,16020,1455618298475 >> >> 2016-02-17 15:41:16,266 INFO >> >> [host7512.mydomain,16000,1455618257882_splitLogManager__ChoreService_1] >> >> master.SplitLogManager: total tasks = 1 unassigned = 0 >> >> >> tasks={/hbase-unsecure/splitWAL/WALs%2Fhost7516.mydomain%2C16020%2C14556182 >> >> >> >> >> 99902-splitting%2Fhost7516.mydomain%252C16020%252C1455618299902..meta.1455690873222.meta=last_update >> >> = 1455691276196 last_version = 2 cur_worker_name = >> >> host7518.mydomain,16020,1455618298475 status = in_progress incarnation >> >> = 0 resub >> >> mits = 0 batch = installed = 1 done = 0 error = 0} >> >> 2016-02-17 15:41:20,481 INFO [main-EventThread] >> >> coordination.SplitLogManagerCoordination: task >> >> >> >> >> /hbase-unsecure/splitWAL/WALs%2Fhost7516.mydomain%2C16020%2C1455618299902-splitting%2Fhost7516.mydomain%252C16020%252C1455618299902..met >> >> a.1455690873222.meta entered state: DONE >> >> host7518.mydomain,16020,1455618298475 >> >> 2016-02-17 15:41:20,487 INFO [main-EventThread] wal.WALSplitter: >> >> Archived processed log >> >> >> >> >> hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902..meta.1455690873222.meta >> >> to >> >> >> hdfs://hdpts/apps/hbase/data/oldWALs/host7516.mydomain%2C16020%2C1455618299902..meta.1455690873222.meta >> >> 2016-02-17 15:41:20,487 INFO [main-EventThread] >> >> coordination.SplitLogManagerCoordination: Done splitting >> >> >> >> >> /hbase-unsecure/splitWAL/WALs%2Fhost7516.mydomain%2C16020%2C1455618299902-splitting%2Fhost7516.mydomain%252C16020%252C1455618299902..meta.1455690873222.meta >> >> 2016-02-17 15:41:20,492 WARN >> >> [MASTER_META_SERVER_OPERATIONS-HOST7512:16000-1] >> >> master.SplitLogManager: returning success without actually splitting >> >> and deleting all the log files in path >> >> >> >> >> hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting >> >> 2016-02-17 15:41:20,492 INFO >> >> [MASTER_META_SERVER_OPERATIONS-HOST7512:16000-1] >> >> master.SplitLogManager: finished splitting (more than or equal to) 83 >> >> bytes in 1 log files in >> >> >> >> >> [hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting] >> >> in 4481ms >> >> 2016-02-17 15:41:20,492 INFO >> >> [MASTER_META_SERVER_OPERATIONS-HOST7512:16000-1] >> >> handler.MetaServerShutdownHandler: Server >> >> host7516.mydomain,16020,1455618299902 was carrying META. Trying to >> >> assign. 
>> >> 2016-02-17 15:41:20,492 INFO >> >> [MASTER_META_SERVER_OPERATIONS-HOST7512:16000-1] master.RegionStates: >> >> Transition {1588230740 state=OPEN, ts=1455690873330, >> >> server=host7516.mydomain,16020,1455618299902} to {1588230740 >> >> state=OFFLINE, ts=1455691280492, >> >> server=host7516.mydomain,16020,1455618299902} >> >> 2016-02-17 15:41:20,493 INFO >> >> [MASTER_META_SERVER_OPERATIONS-HOST7512:16000-1] master.RegionStates: >> >> Offlined 1588230740 from host7516.mydomain,16020,1455618299902 >> >> 2016-02-17 15:41:27,495 INFO >> >> [MASTER_META_SERVER_OPERATIONS-HOST7512:16000-1] >> >> zookeeper.MetaTableLocator: Failed verification of hbase:meta,,1 at >> >> address=host7516.mydomain,16020,1455618299902, exception=Connection >> >> timed out >> >> 2016-02-17 15:41:27,495 INFO >> >> [MASTER_META_SERVER_OPERATIONS-HOST7512:16000-1] >> >> zookeeper.MetaTableLocator: Deleting hbase:meta region location in >> >> ZooKeeper >> >> 2016-02-17 15:41:27,500 INFO >> >> [MASTER_META_SERVER_OPERATIONS-HOST7512:16000-1] >> >> master.AssignmentManager: Setting node as OFFLINED in ZooKeeper for >> >> region {ENCODED => 1588230740, NAME => 'hbase:meta,,1', STARTKEY => >> >> '', ENDKEY => ''} >> >> 2016-02-17 15:41:27,502 INFO >> >> [MASTER_META_SERVER_OPERATIONS-HOST7512:16000-1] >> >> master.AssignmentManager: Assigning hbase:meta,,1.1588230740 to >> >> host7517.mydomain,16020,1455618296942 >> >> 2016-02-17 15:41:27,502 INFO >> >> [MASTER_META_SERVER_OPERATIONS-HOST7512:16000-1] master.RegionStates: >> >> Transition {1588230740 state=OFFLINE, ts=1455691287500, >> >> server=host7516.mydomain,16020,1455618299902} to {1588230740 >> >> state=PENDING_OPEN, ts=1455691287502, >> >> server=host7517.mydomain,16020,1455618296942} >> >> 2016-02-17 15:41:27,512 INFO [AM.ZK.Worker-pool2-t62] >> >> master.RegionStates: Transition {1588230740 state=PENDING_OPEN, >> >> ts=1455691287502, server=host7517.mydomain,16020,1455618296942} to >> >> {1588230740 state=OPENING, ts=1455691287512, >> >> server=host7517.mydomain,16020,1455618296942} >> >> 2016-02-17 15:41:27,598 INFO [AM.ZK.Worker-pool2-t63] >> >> master.RegionStates: Transition {1588230740 state=OPENING, >> >> ts=1455691287512, server=host7517.mydomain,16020,1455618296942} to >> >> {1588230740 state=OPEN, ts=1455691287598, >> >> server=host7517.mydomain,16020,1455618296942} >> >> 2016-02-17 15:41:27,598 INFO [AM.ZK.Worker-pool2-t63] >> >> coordination.ZkOpenRegionCoordination: Handling OPENED of 1588230740 >> >> from host7512.mydomain,16000,1455618257882; deleting unassigned node >> >> 2016-02-17 15:42:57,601 INFO >> >> [MASTER_SERVER_OPERATIONS-HOST7512:16000-4] >> >> handler.ServerShutdownHandler: Received exception accessing hbase:meta >> >> during server shutdown of host7516.mydomain,16020,1455618299902, >> >> retrying hbase:meta read >> >> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after >> >> attempts=351, exceptions: >> >> Wed Feb 17 15:42:57 KST 2016, null, java.net.SocketTimeoutException: >> >> callTimeout=60000, callDuration=90304: row '' on table 'hbase:meta' at >> >> region=hbase:meta,,1.1588230740, >> >> hostname=host7516.mydomain,16020,1455618299902, seqNum=0 >> >> >> >> at >> >> >> org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:271) >> >> at >> >> >> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:195) >> >> at >> >> >> 
org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:59) >> >> at >> >> >> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200) >> >> at >> >> >> org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320) >> >> at >> >> >> org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:295) >> >> at >> >> >> org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:160) >> >> at >> >> >> org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:155) >> >> at >> >> org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:821) >> >> at >> >> >> org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:602) >> >> at >> >> >> org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:156) >> >> at >> >> >> org.apache.hadoop.hbase.MetaTableAccessor.getServerUserRegions(MetaTableAccessor.java:555) >> >> at >> >> >> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:177) >> >> at >> >> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128) >> >> at >> >> >> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) >> >> at >> >> >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) >> >> at java.lang.Thread.run(Thread.java:745) >> >> Caused by: java.net.SocketTimeoutException: callTimeout=60000, >> >> callDuration=90304: row '' on table 'hbase:meta' at >> >> region=hbase:meta,,1.1588230740, >> >> hostname=host7516.mydomain,16020,1455618299902, seqNum=0 >> >> at >> >> >> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:159) >> >> at >> >> >> org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:64) >> >> ... 3 more >> >> Caused by: java.io.IOException: Call to >> >> HOST7516.mydomain/192.168.184.73:16020 failed on local exception: >> >> org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=1388, >> >> waitTime=90001, operationTimeout=90000 expired. 
>> >> at >> >> >> org.apache.hadoop.hbase.ipc.RpcClientImpl.wrapException(RpcClientImpl.java:1262) >> >> at >> >> org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1230) >> >> at >> >> >> org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213) >> >> at >> >> >> org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287) >> >> at >> >> >> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:32651) >> >> at >> >> >> org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:372) >> >> at >> >> >> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:199) >> >> at >> >> >> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:62) >> >> at >> >> >> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200) >> >> at >> >> >> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:346) >> >> at >> >> >> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:320) >> >> at >> >> >> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126) >> >> ... 4 more >> >> Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call >> >> id=1388, waitTime=90001, operationTimeout=90000 expired. >> >> at >> >> org.apache.hadoop.hbase.ipc.Call.checkAndSetTimeout(Call.java:70) >> >> at >> >> org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1204) >> >> ... 14 more >> >> ...(Same error happens again and again) >> >> 2016-02-17 15:56:55,625 INFO >> >> [MASTER_SERVER_OPERATIONS-HOST7512:16000-4] >> >> handler.ServerShutdownHandler: Splitting logs for >> >> host7516.mydomain,16020,1455618299902 before assignment; region >> >> count=73 >> >> 2016-02-17 15:56:55,627 INFO >> >> [MASTER_SERVER_OPERATIONS-HOST7512:16000-4] master.SplitLogManager: >> >> dead splitlog workers [host7516.mydomain,16020,1455618299902] >> >> 2016-02-17 15:56:55,628 INFO >> >> [MASTER_SERVER_OPERATIONS-HOST7512:16000-4] master.SplitLogManager: >> >> started splitting 1 logs in >> >> >> >> >> [hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting] >> >> for [host7516. 
>> >> mydomain,16020,1455618299902] >> >> 2016-02-17 15:56:55,634 INFO [main-EventThread] >> >> coordination.SplitLogManagerCoordination: task >> >> >> >> >> /hbase-unsecure/splitWAL/WALs%2Fhost7516.mydomain%2C16020%2C1455618299902-splitting%2Fhost7516.mydomain%252C16020%252C1455618299902.defa >> >> ult.1455690304230 acquired by host7517.mydomain,16020,1455618296942 >> >> 2016-02-17 15:56:56,266 INFO >> >> [host7512.mydomain,16000,1455618257882_splitLogManager__ChoreService_1] >> >> master.SplitLogManager: total tasks = 1 unassigned = 0 >> >> >> tasks={/hbase-unsecure/splitWAL/WALs%2Fhost7516.mydomain%2C16020%2C14556182 >> >> >> >> >> 99902-splitting%2Fhost7516.mydomain%252C16020%252C1455618299902.default.1455690304230=last_update >> >> = 1455692215679 last_version = 2 cur_worker_name = >> >> host7517.mydomain,16020,1455618296942 status = in_progress incarnation >> >> = 0 resubmit >> >> s = 0 batch = installed = 1 done = 0 error = 0} >> >> 2016-02-17 15:57:01,336 INFO [main-EventThread] >> >> coordination.SplitLogManagerCoordination: task >> >> >> >> >> /hbase-unsecure/splitWAL/WALs%2Fhost7516.mydomain%2C16020%2C1455618299902-splitting%2Fhost7516.mydomain%252C16020%252C1455618299902.defa >> >> ult.1455690304230 entered state: DONE >> host7517.mydomain,16020,1455618296942 >> >> 2016-02-17 15:57:01,343 INFO [main-EventThread] wal.WALSplitter: >> >> Archived processed log >> >> >> >> >> hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902.default.14556903042 >> >> 30 to >> >> >> hdfs://hdpts/apps/hbase/data/oldWALs/host7516.mydomain%2C16020%2C1455618299902.default.1455690304230 >> >> 2016-02-17 15:57:01,343 INFO [main-EventThread] >> >> coordination.SplitLogManagerCoordination: Done splitting >> >> >> >> >> /hbase-unsecure/splitWAL/WALs%2Fhost7516.mydomain%2C16020%2C1455618299902-splitting%2Fhost7516.mydomain%252C16020%252C14556182 >> >> 99902.default.1455690304230 >> >> 2016-02-17 15:57:01,346 INFO >> >> [MASTER_SERVER_OPERATIONS-HOST7512:16000-4] master.SplitLogManager: >> >> finished splitting (more than or equal to) 83 bytes in 1 log files in >> >> [hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,14556 >> >> 18299902-splitting] in 5718ms >> >> ... >> >> 2016-02-17 15:57:01,353 INFO >> >> [MASTER_SERVER_OPERATIONS-HOST7512:16000-4] master.AssignmentManager: >> >> Bulk assigning 73 region(s) across 3 server(s), round-robin=true >> >> 2016-02-17 15:57:01,353 INFO >> >> [host7512.mydomain,16000,1455618257882-GeneralBulkAssigner-0] >> >> master.AssignmentManager: Assigning 24 region(s) to >> >> host7515.mydomain,16020,1455691124033 >> >> 2016-02-17 15:57:01,353 INFO >> >> [host7512.mydomain,16000,1455618257882-GeneralBulkAssigner-1] >> >> master.AssignmentManager: Assigning 24 region(s) to >> >> host7517.mydomain,16020,1455618296942 >> >> 2016-02-17 15:57:01,353 INFO >> >> [host7512.mydomain,16000,1455618257882-GeneralBulkAssigner-2] >> >> master.AssignmentManager: Assigning 25 region(s) to >> >> host7518.mydomain,16020,1455618298475 >> >> ... 
>> >> 2016-02-17 15:57:11,005 INFO [AM.ZK.Worker-pool2-t71] >> >> master.RegionStates: Offlined 50cc594cd9e74287ddd5b960bfa2d8e9 from >> >> host7516.mydomain,16020,1455618299902 >> >> 2016-02-17 15:57:11,005 INFO >> >> [MASTER_SERVER_OPERATIONS-HOST7512:16000-4] master.AssignmentManager: >> >> Bulk assigning done >> >> 2016-02-17 15:57:11,005 INFO >> >> [MASTER_SERVER_OPERATIONS-HOST7512:16000-4] >> >> handler.ServerShutdownHandler: Finished processing of shutdown of >> >> host7516.mydomain,16020,1455618299902 >> >> ---- >> >> >> >> ---- NameNode log --- >> >> 2016-02-17 15:41:16,008 INFO namenode.FSEditLog >> >> (FSEditLog.java:printStatistics(699)) - Number of transactions: 45 >> >> Total time for transactions(ms): 4 Number of transactions batched in >> >> Syncs: 0 Number of syncs: 42 SyncTimes(ms): 35 11 >> >> 2016-02-17 15:41:16,383 INFO namenode.FSNamesystem >> >> (FSNamesystem.java:recoverLeaseInternal(2907)) - recoverLease: [Lease. >> >> Holder: DFSClient_NONMAPREDUCE_1038296970_1, pendingcreates: 2], >> >> >> >> >> src=/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902..meta.1455690873222.meta >> >> from client DFSClient_NONMAPREDUCE_1038296970_1 >> >> 2016-02-17 15:41:16,383 INFO namenode.FSNamesystem >> >> (FSNamesystem.java:internalReleaseLease(4036)) - Recovering [Lease. >> >> Holder: DFSClient_NONMAPREDUCE_1038296970_1, pendingcreates: 2], >> >> >> >> >> src=/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902..meta.1455690873222.meta >> >> 2016-02-17 15:41:16,383 WARN hdfs.StateChange >> >> (FSNamesystem.java:internalReleaseLease(4157)) - DIR* >> >> NameSystem.internalReleaseLease: File >> >> >> >> >> /apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902..meta.1455690873222.meta >> >> has not been closed. Lease recovery is in progress. RecoveryId = 13180 >> >> for block blk_1073753851_13131{UCState=UNDER_RECOVERY, >> >> truncateBlock=null, primaryNodeIndex=2, >> >> >> >> >> replicas=[ReplicaUC[[DISK]DS-9787b201-fc64-450e-a20f-dcc79fb94b6f:NORMAL: >> 192.168.184.73:50010 >> >> |RBW], >> >> >> >> ReplicaUC[[DISK]DS-52f00167-e863-48d8-8f98-3b613f774b0c:NORMAL: >> 192.168.185.249:50010 >> >> |RBW], >> >> >> >> ReplicaUC[[DISK]DS-f8c4c3b2-a7cc-40e0-9ebb-2b9c352c0d28:NORMAL: >> 192.168.184.85:50010 >> >> |RBW]]} >> >> 2016-02-17 15:41:18,588 INFO BlockStateChange >> >> (UnderReplicatedBlocks.java:chooseUnderReplicatedBlocks(394)) - >> >> chooseUnderReplicatedBlocks selected 8 blocks at priority level 2; >> >> Total=8 Reset bookmarks? false >> >> 2016-02-17 15:41:18,588 INFO BlockStateChange >> >> (BlockManager.java:computeReplicationWorkForBlocks(1527)) - BLOCK* >> >> neededReplications = 75, pendingReplications = 0. >> >> 2016-02-17 15:41:18,588 INFO blockmanagement.BlockManager >> >> (BlockManager.java:computeReplicationWorkForBlocks(1534)) - Blocks >> >> chosen but could not be replicated = 8; of which 8 have no target, 0 >> >> have no source, 0 are UC, 0 are abandoned, 0 already have enough >> >> replicas. 
>> >> 2016-02-17 15:41:18,960 WARN security.UserGroupInformation >> >> (UserGroupInformation.java:getGroupNames(1521)) - No groups available >> >> for user anonymous >> >> 2016-02-17 15:41:18,961 WARN security.UserGroupInformation >> >> (UserGroupInformation.java:getGroupNames(1521)) - No groups available >> >> for user anonymous >> >> 2016-02-17 15:41:18,962 WARN security.UserGroupInformation >> >> (UserGroupInformation.java:getGroupNames(1521)) - No groups available >> >> for user anonymous >> >> 2016-02-17 15:41:18,963 WARN security.UserGroupInformation >> >> (UserGroupInformation.java:getGroupNames(1521)) - No groups available >> >> for user anonymous >> >> 2016-02-17 15:41:18,964 WARN security.UserGroupInformation >> >> (UserGroupInformation.java:getGroupNames(1521)) - No groups available >> >> for user anonymous >> >> 2016-02-17 15:41:18,966 WARN security.UserGroupInformation >> >> (UserGroupInformation.java:getGroupNames(1521)) - No groups available >> >> for user anonymous >> >> 2016-02-17 15:41:18,967 WARN security.UserGroupInformation >> >> (UserGroupInformation.java:getGroupNames(1521)) - No groups available >> >> for user anonymous >> >> 2016-02-17 15:41:18,967 WARN security.UserGroupInformation >> >> (UserGroupInformation.java:getGroupNames(1521)) - No groups available >> >> for user anonymous >> >> 2016-02-17 15:41:18,969 WARN security.UserGroupInformation >> >> (UserGroupInformation.java:getGroupNames(1521)) - No groups available >> >> for user anonymous >> >> 2016-02-17 15:41:19,096 WARN security.UserGroupInformation >> >> (UserGroupInformation.java:getGroupNames(1521)) - No groups available >> >> for user anonymous >> >> 2016-02-17 15:41:19,098 WARN security.UserGroupInformation >> >> (UserGroupInformation.java:getGroupNames(1521)) - No groups available >> >> for user anonymous >> >> 2016-02-17 15:41:19,099 WARN security.UserGroupInformation >> >> (UserGroupInformation.java:getGroupNames(1521)) - No groups available >> >> for user anonymous >> >> 2016-02-17 15:41:19,307 INFO blockmanagement.DatanodeManager >> >> (DatanodeManager.java:handleHeartbeat(1426)) - Skipped stale nodes for >> >> recovery : 1 >> >> 2016-02-17 15:41:19,694 INFO namenode.FSNamesystem >> >> (FSNamesystem.java:commitBlockSynchronization(4269)) - >> >> >> >> >> commitBlockSynchronization(oldBlock=BP-1843495860-192.168.189.219-1453778090403:blk_1073753851_13131, >> >> newgenerationstamp=13180, newl >> >> ength=60238, newtargets=[192.168.185.249:50010, 192.168.184.85:50010], >> >> closeFile=true, deleteBlock=false) >> >> 2016-02-17 15:41:19,696 INFO namenode.FSNamesystem >> >> (FSNamesystem.java:commitBlockSynchronization(4434)) - >> >> >> >> >> commitBlockSynchronization(oldBlock=BP-1843495860-192.168.189.219-1453778090403:blk_1073753851_13131, >> >> file=/apps/hbase/data/WALs/ >> >> >> >> >> host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902..meta.1455690873222.meta, >> >> newgenerationstamp=13180, newlength=60238, >> >> newtargets=[192.168.185.249:50010, 192.168.184.85:50010]) successful >> >> 2016-02-17 15:41:20,459 INFO hdfs.StateChange >> >> (FSNamesystem.java:saveAllocatedBlock(3583)) - BLOCK* allocate >> >> blk_1073753897_13181{UCState=UNDER_CONSTRUCTION, truncateBlock=null, >> >> primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-f8c4c3b2- >> >> a7cc-40e0-9ebb-2b9c352c0d28:NORMAL:192.168.184.85:50010|RBW], >> >> >> >> ReplicaUC[[DISK]DS-882dca14-8651-4ffe-a102-8beebc774575:NORMAL: >> 192.168.189.208:50010 >> >> |RBW], >> >> >> >> 
ReplicaUC[[DISK]DS-52f00167-e863-48d8-8f98-3b613f774b0c:NORMAL: >> 192.168.185.249:50010 >> >> |RBW >> >> ]]} for >> >> >> /apps/hbase/data/data/hbase/meta/1588230740/recovered.edits/0000000000000009659.temp >> >> 2016-02-17 15:41:20,474 INFO hdfs.StateChange >> >> (FSNamesystem.java:completeFile(3503)) - DIR* completeFile: >> >> >> >> >> /apps/hbase/data/data/hbase/meta/1588230740/recovered.edits/0000000000000009659.temp >> >> is closed by DFSClient_NONMAPREDUCE_443617401_ >> >> 1 >> >> 2016-02-17 15:41:20,488 INFO ipc.Server (Server.java:run(2172)) - IPC >> >> Server handler 8 on 8020, call >> >> org.apache.hadoop.hdfs.protocol.ClientProtocol.delete from >> >> 192.168.189.219:51393 Call#8040 Retry#0 >> >> org.apache.hadoop.fs.PathIsNotEmptyDirectoryException: >> >> `/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting >> >> is non empty': Directory is not empty >> >> at >> >> >> org.apache.hadoop.hdfs.server.namenode.FSDirDeleteOp.delete(FSDirDeleteOp.java:89) >> >> at >> >> >> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3723) >> >> at >> >> >> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:1017) >> >> at >> >> >> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:611) >> >> at >> >> >> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) >> >> at >> >> >> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) >> >> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969) >> >> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151) >> >> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147) >> >> at java.security.AccessController.doPrivileged(Native Method) >> >> at javax.security.auth.Subject.doAs(Subject.java:422) >> >> at >> >> >> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) >> >> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145) >> >> 2016-02-17 15:41:21,588 INFO BlockStateChange >> >> (UnderReplicatedBlocks.java:chooseUnderReplicatedBlocks(394)) - >> >> chooseUnderReplicatedBlocks selected 8 blocks at priority level 2; >> >> Total=8 Reset bookmarks? false >> >> 2016-02-17 15:41:21,588 INFO BlockStateChange >> >> (BlockManager.java:computeReplicationWorkForBlocks(1527)) - BLOCK* >> >> neededReplications = 76, pendingReplications = 0. >> >> ... 
>> >> 2016-02-17 15:56:15,384 INFO namenode.FSNamesystem >> >> (FSNamesystem.java:rollEditLog(5937)) - Roll Edit Log from >> >> 192.168.184.72 >> >> 2016-02-17 15:56:15,384 INFO namenode.FSEditLog >> >> (FSEditLog.java:rollEditLog(1202)) - Rolling edit logs >> >> 2016-02-17 15:56:15,384 INFO namenode.FSEditLog >> >> (FSEditLog.java:endCurrentLogSegment(1258)) - Ending log segment >> >> 369158 >> >> 2016-02-17 15:56:15,395 INFO namenode.FSEditLog >> >> (FSEditLog.java:printStatistics(699)) - Number of transactions: 14 >> >> Total time for transactions(ms): 5 Number of transactions batched in >> >> Syncs: 0 Number of syncs: 14 SyncTimes(ms): 26 6 >> >> 2016-02-17 15:56:15,421 INFO namenode.FileJournalManager >> >> (FileJournalManager.java:finalizeLogSegment(133)) - Finalizing edits >> >> file /hadoop/hdfs/namenode/current/edits_inprogress_0000000000000369158 >> >> -> /hadoop/hdfs/namenode/current/edits_ >> >> 0000000000000369158-0000000000000369171 >> >> 2016-02-17 15:56:15,421 INFO namenode.FSEditLog >> >> (FSEditLog.java:startLogSegment(1218)) - Starting log segment at >> >> 369172 >> >> 2016-02-17 15:56:15,764 INFO BlockStateChange >> >> (UnderReplicatedBlocks.java:chooseUnderReplicatedBlocks(394)) - >> >> chooseUnderReplicatedBlocks selected 3 blocks at priority level 1; 3 >> >> blocks at priority level 2; Total=6 Reset bookmarks? fals >> >> e >> >> 2016-02-17 15:56:15,764 INFO BlockStateChange >> >> (BlockManager.java:computeReplicationWorkForBlocks(1527)) - BLOCK* >> >> neededReplications = 91, pendingReplications = 4. >> >> 2016-02-17 15:56:15,764 INFO blockmanagement.BlockManager >> >> (BlockManager.java:computeReplicationWorkForBlocks(1534)) - Blocks >> >> chosen but could not be replicated = 3; of which 3 have no target, 0 >> >> have no source, 0 are UC, 0 are abandoned, >> >> 0 already have enough replicas. >> >> 2016-02-17 15:56:18,764 INFO BlockStateChange >> >> (UnderReplicatedBlocks.java:chooseUnderReplicatedBlocks(394)) - >> >> chooseUnderReplicatedBlocks selected 6 blocks at priority level 2; >> >> Total=6 Reset bookmarks? false >> >> 2016-02-17 15:56:18,765 INFO BlockStateChange >> >> (BlockManager.java:computeReplicationWorkForBlocks(1527)) - BLOCK* >> >> neededReplications = 87, pendingReplications = 5. >> >> 2016-02-17 15:56:18,765 INFO blockmanagement.BlockManager >> >> (BlockManager.java:computeReplicationWorkForBlocks(1534)) - Blocks >> >> chosen but could not be replicated = 2; of which 0 have no target, 2 >> >> have no source, 0 are UC, 0 are abandoned, >> >> 0 already have enough replicas. >> >> ... >> >> 2016-02-17 15:56:55,688 INFO namenode.FSNamesystem >> >> (FSNamesystem.java:recoverLeaseInternal(2907)) - recoverLease: [Lease. >> >> Holder: DFSClient_NONMAPREDUCE_1038296970_1, pendingcreates: 1], >> >> src=/apps/hbase/data/WALs/host7516.mydomain,16 >> >> >> >> >> 020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902.default.1455690304230 >> >> from client DFSClient_NONMAPREDUCE_1038296970_1 >> >> 2016-02-17 15:56:55,688 INFO namenode.FSNamesystem >> >> (FSNamesystem.java:internalReleaseLease(4036)) - Recovering [Lease. 
>> >> Holder: DFSClient_NONMAPREDUCE_1038296970_1, pendingcreates: 1], >> >> src=/apps/hbase/data/WALs/host7516.mydomain,16020 >> >> >> >> >> ,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902.default.1455690304230 >> >> 2016-02-17 15:56:55,688 WARN hdfs.StateChange >> >> (FSNamesystem.java:internalReleaseLease(4157)) - DIR* >> >> NameSystem.internalReleaseLease: File >> >> >> >> >> /apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C160 >> >> 20%2C1455618299902.default.1455690304230 has not been closed. Lease >> >> recovery is in progress. RecoveryId = 13183 for block >> >> blk_1073753849_13127{UCState=UNDER_RECOVERY, truncateBlock=null, >> >> primaryNodeIndex=2, replicas=[ReplicaUC[[DISK]DS-97 >> >> 87b201-fc64-450e-a20f-dcc79fb94b6f:NORMAL:192.168.184.73:50010|RBW], >> >> >> >> ReplicaUC[[DISK]DS-f8c4c3b2-a7cc-40e0-9ebb-2b9c352c0d28:NORMAL: >> 192.168.184.85:50010 >> >> |RBW], >> >> ReplicaUC[[DISK]DS-52f00167-e863-48d8-8f98-3b613f774b0c:NORMAL: >> >> 192.168.185.249:500 >> >> 10|RBW]]} >> >> 2016-02-17 15:56:58,687 INFO namenode.FSNamesystem >> >> (FSNamesystem.java:commitBlockSynchronization(4434)) - >> >> >> >> >> commitBlockSynchronization(oldBlock=BP-1843495860-192.168.189.219-1453778090403:blk_1073753849_13127, >> >> >> >> >> file=/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902.default.1455690304230, >> >> newgenerationstamp=13183, newlength=118884436, >> >> newtargets=[192.168.184.85:50010, 192.168.185.249:50010]) successful >> >> >> >> ---- >> >> >> >> Thanks, >> >> Hironori Ogibayashi >> >> >> >>
