About the logs attached in this conversation: only the w-0 and w-1 nodes had failed, first w-0 and then w-1.

Node IP addresses:
- m:   10.240.200.196
- w-0: 10.240.164.0
- w-1: 10.240.2.235
- w-2: 10.240.187.182
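As a quick illustration of what the fsck filter below keeps and drops: `egrep -v '^\.+$'` removes the lines of progress dots fsck prints for healthy files, and `grep -v eplica` removes the replica-detail lines (matching "eplica" catches both "replica" and "Replica" without needing a case-insensitive flag). A small sketch with made-up sample lines:

```shell
# Sample lines imitating raw 'hadoop fsck /' output (invented for this demo):
printf '%s\n' \
  '....................' \
  ' Default replication factor: 3' \
  '/apps/hbase/data/x: MISSING 1 blocks of total size 90 B' \
  | egrep -v '^\.+$' \
  | grep -v eplica
# Prints only: /apps/hbase/data/x: MISSING 1 blocks of total size 90 B
```

Only the MISSING entries survive the filter, which is why the fsck output below consists almost entirely of corrupt-file lines and the summary.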
FSCK (hadoop fsck / | egrep -v '^\.+$' | grep -v eplica) output:

Connecting to namenode via http://hdp-m.c.dks-hadoop.internal:50070/fsck?ugi=root&path=%2F
FSCK started by root (auth:SIMPLE) from /10.240.200.196 for path / at Fri Aug 07 14:51:22 UTC 2015
/apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438946915810-splitting/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438946915810..meta.1438950914376.meta: MISSING 1 blocks of total size 90 B
/apps/hbase/data/WALs/hdp-w-1.c.dks-hadoop.internal,16020,1438959061234/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438959061234.default.1438959069800: MISSING 1 blocks of total size 90 B
/apps/hbase/data/WALs/hdp-w-2.c.dks-hadoop.internal,16020,1438959056208/hdp-w-2.c.dks-hadoop.internal%2C16020%2C1438959056208..meta.1438959068352.meta: MISSING 1 blocks of total size 90 B
/apps/hbase/data/WALs/hdp-w-2.c.dks-hadoop.internal,16020,1438959056208/hdp-w-2.c.dks-hadoop.internal%2C16020%2C1438959056208.default.1438959061922: MISSING 1 blocks of total size 90 B
Status: CORRUPT
 Total size:    54919712019 B (Total open files size: 360 B)
 Total dirs:    1709
 Total files:   2628
 Total symlinks:        0 (Files currently being written: 6)
 Total blocks (validated):      2692 (avg. block size 20401081 B) (Total open file blocks (not validated): 4)
  ********************************
  UNDER MIN REPL'D BLOCKS:      4 (0.1485884 %)
  CORRUPT FILES:        4
  MISSING BLOCKS:       4
  MISSING SIZE:         360 B
  ********************************
 Corrupt blocks:        0
 Number of data-nodes:  4
 Number of racks:       1
FSCK ended at Fri Aug 07 14:51:26 UTC 2015 in 4511 milliseconds

The filesystem under path '/' is CORRUPT

Thank you for your time.

From: "Ted Yu" <[email protected]>
Sent: Friday, August 07, 2015 16:07
To: "[email protected]" <[email protected]>, [email protected]
Subject: Re: RegionServers shutdown randomly

Does 10.240.187.182 correspond with w-0 or m ?

Looks like hdfs was intermittently unstable.

Have you run fsck ?
Cheers

On Fri, Aug 7, 2015 at 12:59 AM, Adrià Vilà <[email protected]> wrote:

Hello,

HBase RegionServers fail once in a while:
- it can be any regionserver, not always the same one
- it can happen when the whole cluster is idle (at least not executing any human-launched task)
- it can happen at any time, not always the same time

The cluster versions:
- Phoenix 4.4 (or 4.5)
- HBase 1.1.1
- Hadoop/HDFS 2.7.1
- Zookeeper 3.4.6

Some configs:

- ulimit -a:
  core file size          (blocks, -c) 0
  data seg size           (kbytes, -d) unlimited
  scheduling priority             (-e) 0
  file size               (blocks, -f) unlimited
  pending signals                 (-i) 103227
  max locked memory       (kbytes, -l) 64
  max memory size         (kbytes, -m) unlimited
  open files                      (-n) 1024
  pipe size            (512 bytes, -p) 8
  POSIX message queues     (bytes, -q) 819200
  real-time priority              (-r) 0
  stack size              (kbytes, -s) 10240
  cpu time               (seconds, -t) unlimited
  max user processes              (-u) 103227
  virtual memory          (kbytes, -v) unlimited
  file locks                      (-x) unlimited

- have increased the default timeouts for: hbase rpc, zookeeper session, dfs socket, regionserver lease and client scanner.

Next you can find the logs for the master, the regionserver that failed first, another one that failed, and the datanode logs for master and worker. The timing was approximately:
14:05 start hbase
14:11 w-0 down
14:14 w-1 down
14:15 stop hbase

------------- hbase master log (m) -------------

2015-08-06 14:11:13,640 ERROR [PriorityRpcServer.handler=19,queue=1,port=16000] master.MasterRpcServices: Region server hdp-w-0.c.dks-hadoop.internal,16020,1438869946905 reported a fatal error:
ABORTING region server hdp-w-0.c.dks-hadoop.internal,16020,1438869946905: Unrecoverable exception while closing region SYSTEM.SEQUENCE,]\x00\x00\x00,1438013446516.888f017eb1c0557fbe7079b50626c891., still finishing close
Cause:
java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)

-------------- hbase regionserver log (w-0) --------------

2015-08-06 14:11:13,611 INFO [PriorityRpcServer.handler=0,queue=0,port=16020] regionserver.RSRpcServices: Close 888f017eb1c0557fbe7079b50626c891, moving to hdp-m.c.dks-hadoop.internal,16020,1438869954062
2015-08-06 14:11:13,615 INFO [StoreCloserThread-SYSTEM.SEQUENCE,]\x00\x00\x00,1438013446516.888f017eb1c0557fbe7079b50626c891.-1] regionserver.HStore: Closed 0
2015-08-06 14:11:13,616 FATAL [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.append-pool1-t1] wal.FSHLog: Could not append. Requesting close of wal
java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
2015-08-06 14:11:13,617 ERROR [sync.4] wal.FSHLog: Error syncing, request close of wal
java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
2015-08-06 14:11:13,617 FATAL [RS_CLOSE_REGION-hdp-w-0:16020-0] regionserver.HRegionServer: ABORTING region server hdp-w-0.c.dks-hadoop.internal,16020,1438869946905: Unrecoverable exception while closing region SYSTEM.SEQUENCE,]\x00\x00\x00,1438013446516.888f017eb1c0557fbe7079b50626c891., still finishing close
java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
2015-08-06 14:11:13,617 FATAL [RS_CLOSE_REGION-hdp-w-0:16020-0] regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: [org.apache.phoenix.coprocessor.ServerCachingEndpointImpl, org.apache.hadoop.hbase.regionserver.LocalIndexSplitter, org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver, org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver, org.apache.phoenix.coprocessor.ScanRegionObserver, org.apache.phoenix.hbase.index.Indexer, org.apache.phoenix.coprocessor.SequenceRegionObserver, org.apache.phoenix.coprocessor.MetaDataEndpointImpl]
2015-08-06 14:11:13,627 INFO [RS_CLOSE_REGION-hdp-w-0:16020-0] regionserver.HRegionServer: Dump of metrics as JSON on abort: {
  "beans" : [ { "name" : "java.lang:type=Memory", "modelerType" : "sun.management.MemoryImpl", "Verbose" : true, "HeapMemoryUsage" : { "committed" : 2104754176, "init" : 2147483648, "max" : 2104754176, "used" : 262288688 },
  "ObjectPendingFinalizationCount" : 0, "NonHeapMemoryUsage" : { "committed" : 137035776, "init" : 136773632, "max" : 184549376, "used" : 49168288 }, "ObjectName" : "java.lang:type=Memory" } ],
  "beans" : [ { "name" : "Hadoop:service=HBase,name=RegionServer,sub=IPC", "modelerType" : "RegionServer,sub=IPC", "tag.Context" : "regionserver", "tag.Hostname" : "hdp-w-0" } ],
  "beans" : [ { "name" : "Hadoop:service=HBase,name=RegionServer,sub=Replication", "modelerType" : "RegionServer,sub=Replication", "tag.Context" : "regionserver", "tag.Hostname" : "hdp-w-0" } ],
  "beans" : [ { "name" : "Hadoop:service=HBase,name=RegionServer,sub=Server", "modelerType" : "RegionServer,sub=Server", "tag.Context" : "regionserver", "tag.Hostname" : "hdp-w-0" } ]
}
2015-08-06 14:11:13,640 ERROR [sync.0] wal.FSHLog: Error syncing, request close of wal
java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
2015-08-06 14:11:13,640 WARN [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller] wal.FSHLog: Failed last sync but no outstanding unsync edits so falling through to close; java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
2015-08-06 14:11:13,641 ERROR [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller] wal.ProtobufLogWriter: Got IOException while writing trailer
java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
2015-08-06 14:11:13,641 WARN [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller] wal.FSHLog: Riding over failed WAL close of hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438869949576, cause="All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...", errors=1; THIS FILE WAS NOT CLOSED BUT ALL EDITS SYNCED SO SHOULD BE OK
2015-08-06 14:11:13,642 INFO [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller] wal.FSHLog: Rolled WAL /apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438869949576 with entries=101, filesize=30.38 KB; new WAL /apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438870273617
2015-08-06 14:11:13,643 INFO [RS_CLOSE_REGION-hdp-w-0:16020-0] regionserver.HRegionServer: STOPPED: Unrecoverable exception while closing region SYSTEM.SEQUENCE,]\x00\x00\x00,1438013446516.888f017eb1c0557fbe7079b50626c891., still finishing close
2015-08-06 14:11:13,643 INFO [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller] wal.FSHLog: Archiving hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438869949576 to
hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/oldWALs/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438869949576
2015-08-06 14:11:13,643 ERROR [RS_CLOSE_REGION-hdp-w-0:16020-0] executor.EventHandler: Caught throwable while processing event M_RS_CLOSE_REGION
java.lang.RuntimeException: java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
    at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:152)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)

------------ hbase regionserver log (w-1) ------------

2015-08-06 14:11:14,267 INFO [main-EventThread] replication.ReplicationTrackerZKImpl: /hbase-unsecure/rs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905 znode expired, triggering replicatorRemoved event
2015-08-06 14:12:08,203 INFO [ReplicationExecutor-0] replication.ReplicationQueuesZKImpl: Atomically moving hdp-w-0.c.dks-hadoop.internal,16020,1438869946905's wals to my queue
2015-08-06 14:12:56,252 INFO [PriorityRpcServer.handler=5,queue=1,port=16020] regionserver.RSRpcServices: Close 918ed7c6568e7500fb434f4268c5bbc5, moving to hdp-m.c.dks-hadoop.internal,16020,1438869954062
2015-08-06 14:12:56,260 INFO [StoreCloserThread-SYSTEM.SEQUENCE,\x7F\x00\x00\x00,1438013446516.918ed7c6568e7500fb434f4268c5bbc5.-1] regionserver.HStore: Closed 0
2015-08-06 14:12:56,261 FATAL [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.append-pool1-t1] wal.FSHLog: Could not append. Requesting close of wal
java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
2015-08-06 14:12:56,261 ERROR [sync.3] wal.FSHLog: Error syncing, request close of wal
java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
2015-08-06 14:12:56,262 FATAL [RS_CLOSE_REGION-hdp-w-1:16020-0] regionserver.HRegionServer: ABORTING region server hdp-w-1.c.dks-hadoop.internal,16020,1438869946909: Unrecoverable exception while closing region SYSTEM.SEQUENCE,\x7F\x00\x00\x00,1438013446516.918ed7c6568e7500fb434f4268c5bbc5., still finishing close
java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
2015-08-06 14:12:56,262 FATAL [RS_CLOSE_REGION-hdp-w-1:16020-0] regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: [org.apache.phoenix.coprocessor.ServerCachingEndpointImpl, org.apache.hadoop.hbase.regionserver.LocalIndexSplitter, org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver, org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver, org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint, org.apache.phoenix.coprocessor.ScanRegionObserver, org.apache.phoenix.hbase.index.Indexer, org.apache.phoenix.coprocessor.SequenceRegionObserver]
2015-08-06 14:12:56,281 INFO [RS_CLOSE_REGION-hdp-w-1:16020-0] regionserver.HRegionServer: Dump of metrics as JSON on abort: {
  "beans" : [ { "name" : "java.lang:type=Memory", "modelerType" : "sun.management.MemoryImpl", "ObjectPendingFinalizationCount" : 0, "NonHeapMemoryUsage" : { "committed" : 137166848, "init" : 136773632, "max" : 184549376, "used" : 48667528 }, "HeapMemoryUsage" : { "committed" : 2104754176, "init" : 2147483648, "max" : 2104754176, "used" : 270075472 }, "Verbose" : true, "ObjectName" : "java.lang:type=Memory" } ],
  "beans" : [ { "name" : "Hadoop:service=HBase,name=RegionServer,sub=IPC", "modelerType" : "RegionServer,sub=IPC", "tag.Context" : "regionserver", "tag.Hostname" : "hdp-w-1" } ],
  "beans" : [ { "name" : "Hadoop:service=HBase,name=RegionServer,sub=Replication", "modelerType" : "RegionServer,sub=Replication", "tag.Context" : "regionserver", "tag.Hostname" : "hdp-w-1" } ],
  "beans" : [ { "name" : "Hadoop:service=HBase,name=RegionServer,sub=Server", "modelerType" : "RegionServer,sub=Server", "tag.Context" : "regionserver", "tag.Hostname" : "hdp-w-1" } ]
}
2015-08-06 14:12:56,284 ERROR [sync.4] wal.FSHLog: Error syncing, request close of wal
java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
2015-08-06 14:12:56,285 WARN [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller] wal.FSHLog: Failed last sync but no outstanding unsync edits so falling through to close; java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
2015-08-06 14:12:56,285 ERROR [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller] wal.ProtobufLogWriter: Got IOException while writing trailer
java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
2015-08-06 14:12:56,285 WARN [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller] wal.FSHLog: Riding over failed WAL close of hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/WALs/hdp-w-1.c.dks-hadoop.internal,16020,1438869946909/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438869950359, cause="All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...", errors=1; THIS FILE WAS NOT CLOSED BUT ALL EDITS SYNCED SO SHOULD BE OK
2015-08-06 14:12:56,287 INFO [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller] wal.FSHLog: Rolled WAL /apps/hbase/data/WALs/hdp-w-1.c.dks-hadoop.internal,16020,1438869946909/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438869950359 with entries=100, filesize=30.73 KB; new WAL /apps/hbase/data/WALs/hdp-w-1.c.dks-hadoop.internal,16020,1438869946909/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438870376262
2015-08-06 14:12:56,288 INFO [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller] wal.FSHLog: Archiving hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/WALs/hdp-w-1.c.dks-hadoop.internal,16020,1438869946909/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438869950359 to hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/oldWALs/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438869950359
2015-08-06 14:12:56,315 INFO [RS_CLOSE_REGION-hdp-w-1:16020-0] regionserver.HRegionServer: STOPPED: Unrecoverable exception while closing region SYSTEM.SEQUENCE,\x7F\x00\x00\x00,1438013446516.918ed7c6568e7500fb434f4268c5bbc5., still
finishing close
2015-08-06 14:12:56,315 INFO [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020] regionserver.SplitLogWorker: Sending interrupt to stop the worker thread
2015-08-06 14:12:56,315 ERROR [RS_CLOSE_REGION-hdp-w-1:16020-0] executor.EventHandler: Caught throwable while processing event M_RS_CLOSE_REGION
java.lang.RuntimeException: java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
    at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:152)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)

------------- m datanode log -------------

2015-07-27 14:11:16,082 INFO datanode.DataNode (BlockReceiver.java:run(1348)) - PacketResponder: BP-369072949-10.240.200.196-1437998325049:blk_1073742677_1857, type=HAS_DOWNSTREAM_IN_PIPELINE terminating
2015-07-27 14:11:16,132 INFO datanode.DataNode (DataXceiver.java:writeBlock(655)) - Receiving BP-369072949-10.240.200.196-1437998325049:blk_1073742678_1858 src: /10.240.200.196:56767 dest: /10.240.200.196:50010
2015-07-27 14:11:16,155 INFO DataNode.clienttrace (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.200.196:56767, dest: /10.240.200.196:50010, bytes: 117761, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_177514816_1, offset: 0, srvID: 329bbe62-bcea-4a6d-8c97-e800631deb81, blockid: BP-369072949-10.240.200.196-1437998325049:blk_1073742678_1858, duration: 6385289
2015-07-27 14:11:16,155 INFO datanode.DataNode (BlockReceiver.java:run(1348)) - PacketResponder: BP-369072949-10.240.200.196-1437998325049:blk_1073742678_1858, type=HAS_DOWNSTREAM_IN_PIPELINE terminating
2015-07-27 14:11:16,267 ERROR datanode.DataNode (DataXceiver.java:run(278)) - hdp-m.c.dks-hadoop.internal:50010:DataXceiver error processing unknown operation src: /127.0.0.1:60513 dst: /127.0.0.1:50010
java.io.EOFException
    at java.io.DataInputStream.readShort(DataInputStream.java:315)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
    at java.lang.Thread.run(Thread.java:745)
2015-07-27 14:11:16,405 INFO datanode.DataNode (DataNode.java:transferBlock(1943)) - DatanodeRegistration(10.240.200.196:50010,
datanodeUuid=329bbe62-bcea-4a6d-8c97-e800631deb81, infoPort=50075, infoSecurePort=0, ipcPort=8010, storageInfo=lv=-56;cid=CID-1247f294-77a9-4605-b6d3-4c1398bb5db0;nsid=2032226938;c=0) Starting thread to transfer BP-369072949-10.240.200.196-1437998325049:blk_1073742649_1829 to 10.240.2.235:50010 10.240.164.0:50010

------------- w-0 datanode log -------------

2015-07-27 14:11:25,019 ERROR datanode.DataNode (DataXceiver.java:run(278)) - hdp-w-0.c.dks-hadoop.internal:50010:DataXceiver error processing unknown operation src: /127.0.0.1:47993 dst: /127.0.0.1:50010
java.io.EOFException
    at java.io.DataInputStream.readShort(DataInputStream.java:315)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
    at java.lang.Thread.run(Thread.java:745)
2015-07-27 14:11:25,077 INFO DataNode.clienttrace (DataXceiver.java:requestShortCircuitFds(369)) - src: 127.0.0.1, dest: 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_FDS, blockid: 1073742631, srvID: a5eea5a8-5112-46da-9f18-64274486c472, success: true

-----------------------------

Thank you in advance,
Adrià
