Does 10.240.187.182:50010 correspond with w-0 or m?
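One quick way to check (assuming shell access to a node with the HDFS client configured; both commands are standard, and the IP is just the one from the logs):

  # list the datanodes known to the namenode and find that address
  hdfs dfsadmin -report | grep -B1 '10.240.187.182'

  # or resolve it through the cluster's name service
  getent hosts 10.240.187.182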
Looks like hdfs was intermittently unstable. Have you run fsck? (A few command sketches follow below the quoted logs.)

Cheers

On Fri, Aug 7, 2015 at 12:59 AM, Adrià Vilà <[email protected]> wrote:

> Hello,
>
> HBase RegionServers fail once in a while:
> - it can be any regionserver, not always the same
> - it can happen when the whole cluster is idle (at least not executing any human-launched task)
> - it can happen at any time, not always the same
>
> The cluster versions:
> - Phoenix 4.4 (or 4.5)
> - HBase 1.1.1
> - Hadoop/HDFS 2.7.1
> - Zookeeper 3.4.6
>
> Some configs:
> - ulimit -a:
>   core file size (blocks, -c) 0
>   data seg size (kbytes, -d) unlimited
>   scheduling priority (-e) 0
>   file size (blocks, -f) unlimited
>   pending signals (-i) 103227
>   max locked memory (kbytes, -l) 64
>   max memory size (kbytes, -m) unlimited
>   open files (-n) 1024
>   pipe size (512 bytes, -p) 8
>   POSIX message queues (bytes, -q) 819200
>   real-time priority (-r) 0
>   stack size (kbytes, -s) 10240
>   cpu time (seconds, -t) unlimited
>   max user processes (-u) 103227
>   virtual memory (kbytes, -v) unlimited
>   file locks (-x) unlimited
> - have increased the default timeouts for: hbase rpc, zookeeper session, dfs socket, regionserver lease and client scanner.
>
> Below you can find the logs for the master, the regionserver that failed first, another one that failed, and the datanode logs for the master and a worker.
>
> The timing was approximately:
> 14:05 start hbase
> 14:11 w-0 down
> 14:14 w-1 down
> 14:15 stop hbase
>
> -------------
> hbase master log (m)
> -------------
> 2015-08-06 14:11:13,640 ERROR [PriorityRpcServer.handler=19,queue=1,port=16000] master.MasterRpcServices: Region server hdp-w-0.c.dks-hadoop.internal,16020,1438869946905 reported a fatal error:
> ABORTING region server hdp-w-0.c.dks-hadoop.internal,16020,1438869946905: Unrecoverable exception while closing region SYSTEM.SEQUENCE,]\x00\x00\x00,1438013446516.888f017eb1c0557fbe7079b50626c891., still finishing close
> Cause:
> java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>
> -------------
> hbase regionserver log (w-0)
> -------------
> 2015-08-06 14:11:13,611 INFO [PriorityRpcServer.handler=0,queue=0,port=16020] regionserver.RSRpcServices: Close 888f017eb1c0557fbe7079b50626c891, moving to hdp-m.c.dks-hadoop.internal,16020,1438869954062
> 2015-08-06 14:11:13,615 INFO [StoreCloserThread-SYSTEM.SEQUENCE,]\x00\x00\x00,1438013446516.888f017eb1c0557fbe7079b50626c891.-1] regionserver.HStore: Closed 0
> 2015-08-06 14:11:13,616 FATAL [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.append-pool1-t1] wal.FSHLog: Could not append. Requesting close of wal
> java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
> 2015-08-06 14:11:13,617 ERROR [sync.4] wal.FSHLog: Error syncing, request close of wal
> java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
> 2015-08-06 14:11:13,617 FATAL [RS_CLOSE_REGION-hdp-w-0:16020-0] regionserver.HRegionServer: ABORTING region server hdp-w-0.c.dks-hadoop.internal,16020,1438869946905: Unrecoverable exception while closing region SYSTEM.SEQUENCE,]\x00\x00\x00,1438013446516.888f017eb1c0557fbe7079b50626c891., still finishing close
> java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
> 2015-08-06 14:11:13,617 FATAL [RS_CLOSE_REGION-hdp-w-0:16020-0] regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: [org.apache.phoenix.coprocessor.ServerCachingEndpointImpl, org.apache.hadoop.hbase.regionserver.LocalIndexSplitter, org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver, org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver, org.apache.phoenix.coprocessor.ScanRegionObserver, org.apache.phoenix.hbase.index.Indexer, org.apache.phoenix.coprocessor.SequenceRegionObserver, org.apache.phoenix.coprocessor.MetaDataEndpointImpl]
> 2015-08-06 14:11:13,627 INFO [RS_CLOSE_REGION-hdp-w-0:16020-0] regionserver.HRegionServer: Dump of metrics as JSON on abort: {
>   "beans" : [ {
>     "name" : "java.lang:type=Memory",
>     "modelerType" : "sun.management.MemoryImpl",
>     "Verbose" : true,
>     "HeapMemoryUsage" : {
>       "committed" : 2104754176,
>       "init" : 2147483648,
>       "max" : 2104754176,
>       "used" : 262288688
>     },
>     "ObjectPendingFinalizationCount" : 0,
>     "NonHeapMemoryUsage" : {
>       "committed" : 137035776,
>       "init" : 136773632,
>       "max" : 184549376,
>       "used" : 49168288
>     },
>     "ObjectName" : "java.lang:type=Memory"
>   } ],
>   "beans" : [ {
>     "name" : "Hadoop:service=HBase,name=RegionServer,sub=IPC",
>     "modelerType" : "RegionServer,sub=IPC",
>     "tag.Context" : "regionserver",
>     "tag.Hostname" : "hdp-w-0"
>   } ],
>   "beans" : [ {
>     "name" : "Hadoop:service=HBase,name=RegionServer,sub=Replication",
>     "modelerType" : "RegionServer,sub=Replication",
>     "tag.Context" : "regionserver",
>     "tag.Hostname" : "hdp-w-0"
>   } ],
>   "beans" : [ {
>     "name" : "Hadoop:service=HBase,name=RegionServer,sub=Server",
>     "modelerType" : "RegionServer,sub=Server",
>     "tag.Context" : "regionserver",
>     "tag.Hostname" : "hdp-w-0"
>   } ]
> }
> 2015-08-06 14:11:13,640 ERROR [sync.0] wal.FSHLog: Error syncing, request close of wal
> java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
> 2015-08-06 14:11:13,640 WARN [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller] wal.FSHLog: Failed last sync but no outstanding unsync edits so falling through to close; java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
> 2015-08-06 14:11:13,641 ERROR [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller] wal.ProtobufLogWriter: Got IOException while writing trailer
> java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
> 2015-08-06 14:11:13,641 WARN [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller] wal.FSHLog: Riding over failed WAL close of hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438869949576, cause="All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...", errors=1; THIS FILE WAS NOT CLOSED BUT ALL EDITS SYNCED SO SHOULD BE OK
> 2015-08-06 14:11:13,642 INFO [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller] wal.FSHLog: Rolled WAL /apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438869949576 with entries=101, filesize=30.38 KB; new WAL /apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438870273617
> 2015-08-06 14:11:13,643 INFO [RS_CLOSE_REGION-hdp-w-0:16020-0] regionserver.HRegionServer: STOPPED: Unrecoverable exception while closing region SYSTEM.SEQUENCE,]\x00\x00\x00,1438013446516.888f017eb1c0557fbe7079b50626c891., still finishing close
> 2015-08-06 14:11:13,643 INFO [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller] wal.FSHLog: Archiving hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438869949576 to hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/oldWALs/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438869949576
> 2015-08-06 14:11:13,643 ERROR [RS_CLOSE_REGION-hdp-w-0:16020-0] executor.EventHandler: Caught throwable while processing event M_RS_CLOSE_REGION
> java.lang.RuntimeException: java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
>     at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:152)
>     at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>
> -------------
> hbase regionserver log (w-1)
> -------------
> 2015-08-06 14:11:14,267 INFO [main-EventThread] replication.ReplicationTrackerZKImpl: /hbase-unsecure/rs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905 znode expired, triggering replicatorRemoved event
> 2015-08-06 14:12:08,203 INFO [ReplicationExecutor-0] replication.ReplicationQueuesZKImpl: Atomically moving hdp-w-0.c.dks-hadoop.internal,16020,1438869946905's wals to my queue
> 2015-08-06 14:12:56,252 INFO [PriorityRpcServer.handler=5,queue=1,port=16020] regionserver.RSRpcServices: Close 918ed7c6568e7500fb434f4268c5bbc5, moving to hdp-m.c.dks-hadoop.internal,16020,1438869954062
> 2015-08-06 14:12:56,260 INFO [StoreCloserThread-SYSTEM.SEQUENCE,\x7F\x00\x00\x00,1438013446516.918ed7c6568e7500fb434f4268c5bbc5.-1] regionserver.HStore: Closed 0
> 2015-08-06 14:12:56,261 FATAL [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.append-pool1-t1] wal.FSHLog: Could not append. Requesting close of wal
> java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
> 2015-08-06 14:12:56,261 ERROR [sync.3] wal.FSHLog: Error syncing, request close of wal
> java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
> 2015-08-06 14:12:56,262 FATAL [RS_CLOSE_REGION-hdp-w-1:16020-0] regionserver.HRegionServer: ABORTING region server hdp-w-1.c.dks-hadoop.internal,16020,1438869946909: Unrecoverable exception while closing region SYSTEM.SEQUENCE,\x7F\x00\x00\x00,1438013446516.918ed7c6568e7500fb434f4268c5bbc5., still finishing close
> java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
> 2015-08-06 14:12:56,262 FATAL [RS_CLOSE_REGION-hdp-w-1:16020-0] regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: [org.apache.phoenix.coprocessor.ServerCachingEndpointImpl, org.apache.hadoop.hbase.regionserver.LocalIndexSplitter, org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver, org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver, org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint, org.apache.phoenix.coprocessor.ScanRegionObserver, org.apache.phoenix.hbase.index.Indexer, org.apache.phoenix.coprocessor.SequenceRegionObserver]
> 2015-08-06 14:12:56,281 INFO [RS_CLOSE_REGION-hdp-w-1:16020-0] regionserver.HRegionServer: Dump of metrics as JSON on abort: {
>   "beans" : [ {
>     "name" : "java.lang:type=Memory",
>     "modelerType" : "sun.management.MemoryImpl",
>     "ObjectPendingFinalizationCount" : 0,
>     "NonHeapMemoryUsage" : {
>       "committed" : 137166848,
>       "init" : 136773632,
>       "max" : 184549376,
>       "used" : 48667528
>     },
>     "HeapMemoryUsage" : {
>       "committed" : 2104754176,
>       "init" : 2147483648,
>       "max" : 2104754176,
>       "used" : 270075472
>     },
>     "Verbose" : true,
>     "ObjectName" : "java.lang:type=Memory"
>   } ],
>   "beans" : [ {
>     "name" : "Hadoop:service=HBase,name=RegionServer,sub=IPC",
>     "modelerType" : "RegionServer,sub=IPC",
>     "tag.Context" : "regionserver",
>     "tag.Hostname" : "hdp-w-1"
>   } ],
>   "beans" : [ {
>     "name" : "Hadoop:service=HBase,name=RegionServer,sub=Replication",
>     "modelerType" : "RegionServer,sub=Replication",
>     "tag.Context" : "regionserver",
>     "tag.Hostname" : "hdp-w-1"
>   } ],
>   "beans" : [ {
>     "name" : "Hadoop:service=HBase,name=RegionServer,sub=Server",
>     "modelerType" : "RegionServer,sub=Server",
>     "tag.Context" : "regionserver",
>     "tag.Hostname" : "hdp-w-1"
>   } ]
> }
> 2015-08-06 14:12:56,284 ERROR [sync.4] wal.FSHLog: Error syncing, request close of wal
> java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
> 2015-08-06 14:12:56,285 WARN [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller] wal.FSHLog: Failed last sync but no outstanding unsync edits so falling through to close; java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
> 2015-08-06 14:12:56,285 ERROR [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller] wal.ProtobufLogWriter: Got IOException while writing trailer
> java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
> 2015-08-06 14:12:56,285 WARN [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller] wal.FSHLog: Riding over failed WAL close of hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/WALs/hdp-w-1.c.dks-hadoop.internal,16020,1438869946909/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438869950359, cause="All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...", errors=1; THIS FILE WAS NOT CLOSED BUT ALL EDITS SYNCED SO SHOULD BE OK
> 2015-08-06 14:12:56,287 INFO [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller] wal.FSHLog: Rolled WAL /apps/hbase/data/WALs/hdp-w-1.c.dks-hadoop.internal,16020,1438869946909/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438869950359 with entries=100, filesize=30.73 KB; new WAL /apps/hbase/data/WALs/hdp-w-1.c.dks-hadoop.internal,16020,1438869946909/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438870376262
> 2015-08-06 14:12:56,288 INFO [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller] wal.FSHLog: Archiving hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/WALs/hdp-w-1.c.dks-hadoop.internal,16020,1438869946909/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438869950359 to hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/oldWALs/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438869950359
> 2015-08-06 14:12:56,315 INFO [RS_CLOSE_REGION-hdp-w-1:16020-0] regionserver.HRegionServer: STOPPED: Unrecoverable exception while closing region SYSTEM.SEQUENCE,\x7F\x00\x00\x00,1438013446516.918ed7c6568e7500fb434f4268c5bbc5., still finishing close
> 2015-08-06 14:12:56,315 INFO [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020] regionserver.SplitLogWorker: Sending interrupt to stop the worker thread
> 2015-08-06 14:12:56,315 ERROR [RS_CLOSE_REGION-hdp-w-1:16020-0] executor.EventHandler: Caught throwable while processing event M_RS_CLOSE_REGION
> java.lang.RuntimeException: java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
>     at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:152)
>     at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>
> -------------
> m datanode log
> -------------
> 2015-07-27 14:11:16,082 INFO datanode.DataNode (BlockReceiver.java:run(1348)) - PacketResponder: BP-369072949-10.240.200.196-1437998325049:blk_1073742677_1857, type=HAS_DOWNSTREAM_IN_PIPELINE terminating
> 2015-07-27 14:11:16,132 INFO datanode.DataNode (DataXceiver.java:writeBlock(655)) - Receiving BP-369072949-10.240.200.196-1437998325049:blk_1073742678_1858 src: /10.240.200.196:56767 dest: /10.240.200.196:50010
> 2015-07-27 14:11:16,155 INFO DataNode.clienttrace (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.200.196:56767, dest: /10.240.200.196:50010, bytes: 117761, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_177514816_1, offset: 0, srvID: 329bbe62-bcea-4a6d-8c97-e800631deb81, blockid: BP-369072949-10.240.200.196-1437998325049:blk_1073742678_1858, duration: 6385289
> 2015-07-27 14:11:16,155 INFO datanode.DataNode (BlockReceiver.java:run(1348)) - PacketResponder: BP-369072949-10.240.200.196-1437998325049:blk_1073742678_1858, type=HAS_DOWNSTREAM_IN_PIPELINE terminating
> 2015-07-27 14:11:16,267 ERROR datanode.DataNode (DataXceiver.java:run(278)) - hdp-m.c.dks-hadoop.internal:50010:DataXceiver error processing unknown operation src: /127.0.0.1:60513 dst: /127.0.0.1:50010
> java.io.EOFException
>     at java.io.DataInputStream.readShort(DataInputStream.java:315)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
>     at java.lang.Thread.run(Thread.java:745)
> 2015-07-27 14:11:16,405 INFO datanode.DataNode (DataNode.java:transferBlock(1943)) - DatanodeRegistration(10.240.200.196:50010, datanodeUuid=329bbe62-bcea-4a6d-8c97-e800631deb81, infoPort=50075, infoSecurePort=0, ipcPort=8010, storageInfo=lv=-56;cid=CID-1247f294-77a9-4605-b6d3-4c1398bb5db0;nsid=2032226938;c=0) Starting thread to transfer BP-369072949-10.240.200.196-1437998325049:blk_1073742649_1829 to 10.240.2.235:50010 10.240.164.0:50010
>
> -------------
> w-0 datanode log
> -------------
> 2015-07-27 14:11:25,019 ERROR datanode.DataNode (DataXceiver.java:run(278)) - hdp-w-0.c.dks-hadoop.internal:50010:DataXceiver error processing unknown operation src: /127.0.0.1:47993 dst: /127.0.0.1:50010
> java.io.EOFException
>     at java.io.DataInputStream.readShort(DataInputStream.java:315)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
>     at java.lang.Thread.run(Thread.java:745)
> 2015-07-27 14:11:25,077 INFO DataNode.clienttrace (DataXceiver.java:requestShortCircuitFds(369)) - src: 127.0.0.1, dest: 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_FDS, blockid: 1073742631, srvID: a5eea5a8-5112-46da-9f18-64274486c472, success: true
>
> -----------------------------
> Thank you in advance,
>
> Adrià
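Re the fsck suggestion above, a minimal sketch (run it as the HDFS superuser; the /apps/hbase/data path is taken from the WAL paths in the quoted logs):

  # overall filesystem health: corrupt, missing or under-replicated blocks
  hdfs fsck /

  # narrower check of the HBase tree, printing block locations, to see
  # whether the suspect datanode holds the affected WAL blocks
  hdfs fsck /apps/hbase/data -files -blocks -locations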
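On the raised timeouts mentioned in the quoted message, a sketch for double-checking which values actually took effect (the property names are the stock ones for HBase 1.1 / HDFS 2.7; the config path is an assumption, adjust it to your install):

  # illustrative; point this at wherever your hbase-site.xml lives
  grep -A1 -E 'hbase.rpc.timeout|zookeeper.session.timeout|dfs.client.socket-timeout|hbase.client.scanner.timeout.period' \
      /etc/hbase/conf/hbase-site.xml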

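One more thing worth flagging from the configs in the quoted message: "open files (-n) 1024" is the Linux default and well below what the HBase reference guide recommends for the regionserver and datanode users; running out of file descriptors is a classic cause of flaky write pipelines. A sketch of the usual fix (the user names and the 32768 value are illustrative):

  # append nofile limits for the service users, then re-login and verify
  echo 'hbase - nofile 32768' | sudo tee -a /etc/security/limits.conf
  echo 'hdfs  - nofile 32768' | sudo tee -a /etc/security/limits.conf
  sudo -u hbase bash -c 'ulimit -n'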