Hello,
HBase RegionServers fail once in a while:
- it can be any regionserver, not always the same one
- it can happen while the whole cluster is idle (at least not executing any human-launched task)
- it can happen at any time, not always the same time
The cluster versions:
- Phoenix 4.4 (or 4.5)
- HBase 1.1.1
- Hadoop/HDFS 2.7.1
- ZooKeeper 3.4.6
Some configs:
- ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 103227
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 103227
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
- we have increased the default timeouts for: HBase RPC, ZooKeeper session, DFS
socket, regionserver lease and client scanner (a sketch of the properties I
mean is just below).
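Roughly, these are the properties involved; the names and values below are only
an illustrative sketch of what was raised, not a paste of the actual files (I
can send the real hbase-site.xml / hdfs-site.xml if that helps):

  <!-- hbase-site.xml (illustrative values, not the real ones) -->
  <property>
    <name>hbase.rpc.timeout</name>
    <value>120000</value> <!-- HBase RPC timeout, in ms -->
  </property>
  <property>
    <name>zookeeper.session.timeout</name>
    <value>120000</value> <!-- ZooKeeper session timeout, in ms -->
  </property>
  <property>
    <name>hbase.client.scanner.timeout.period</name>
    <value>120000</value> <!-- client scanner / regionserver lease, in ms -->
  </property>

  <!-- hdfs-site.xml (illustrative values) -->
  <property>
    <name>dfs.client.socket-timeout</name>
    <value>120000</value> <!-- DFS client socket read timeout, in ms -->
  </property>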
Below you can find the logs for the master (m), the regionserver that failed
first (w-0), another regionserver that failed afterwards (w-1), and the
datanode logs for the master and one worker.
The timing was approximately:
14:05 start hbase
14:11 w-0 down
14:14 w-1 down
14.15 stop hbase
-------------
hbase master log (m)
-------------
2015-08-06 14:11:13,640 ERROR
[PriorityRpcServer.handler=19,queue=1,port=16000] master.MasterRpcServices:
Region server hdp-w-0.c.dks-hadoop.internal,16020,1438869946905 reported a
fatal error:
ABORTING region server hdp-w-0.c.dks-hadoop.internal,16020,1438869946905:
Unrecoverable exception while closing region
SYSTEM.SEQUENCE,]\x00\x00\x00,1438013446516.888f017eb1c0557fbe7079b50626c891.,
still finishing close
Cause:
java.io.IOException: All datanodes
DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK]
are bad. Aborting...
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
--------------
hbase regionserver log (w-0)
--------------
2015-08-06 14:11:13,611 INFO [PriorityRpcServer.handler=0,queue=0,port=16020]
regionserver.RSRpcServices: Close 888f017eb1c0557fbe7079b50626c891, moving to
hdp-m.c.dks-hadoop.internal,16020,1438869954062
2015-08-06 14:11:13,615 INFO
[StoreCloserThread-SYSTEM.SEQUENCE,]\x00\x00\x00,1438013446516.888f017eb1c0557fbe7079b50626c891.-1]
regionserver.HStore: Closed 0
2015-08-06 14:11:13,616 FATAL
[regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.append-pool1-t1]
wal.FSHLog: Could not append. Requesting close of wal
java.io.IOException: All datanodes
DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK]
are bad. Aborting...
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
2015-08-06 14:11:13,617 ERROR [sync.4] wal.FSHLog: Error syncing, request
close of wal
java.io.IOException: All datanodes
DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK]
are bad. Aborting...
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
2015-08-06 14:11:13,617 FATAL [RS_CLOSE_REGION-hdp-w-0:16020-0]
regionserver.HRegionServer: ABORTING region server
hdp-w-0.c.dks-hadoop.internal,16020,1438869946905: Unrecoverable exception
while closing region
SYSTEM.SEQUENCE,]\x00\x00\x00,1438013446516.888f017eb1c0557fbe7079b50626c891.,
still finishing close
java.io.IOException: All datanodes
DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK]
are bad. Aborting...
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
2015-08-06 14:11:13,617 FATAL [RS_CLOSE_REGION-hdp-w-0:16020-0]
regionserver.HRegionServer: RegionServer abort: loaded coprocessors are:
[org.apache.phoenix.coprocessor.ServerCachingEndpointImpl,
org.apache.hadoop.hbase.regionserver.LocalIndexSplitter,
org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver,
org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver,
org.apache.phoenix.coprocessor.ScanRegionObserver,
org.apache.phoenix.hbase.index.Indexer,
org.apache.phoenix.coprocessor.SequenceRegionObserver,
org.apache.phoenix.coprocessor.MetaDataEndpointImpl]
2015-08-06 14:11:13,627 INFO [RS_CLOSE_REGION-hdp-w-0:16020-0]
regionserver.HRegionServer: Dump of metrics as JSON on abort: {
"beans" : [ {
"name" : "java.lang:type=Memory",
"modelerType" : "sun.management.MemoryImpl",
"Verbose" : true,
"HeapMemoryUsage" : {
"committed" : 2104754176,
"init" : 2147483648,
"max" : 2104754176,
"used" : 262288688
},
"ObjectPendingFinalizationCount" : 0,
"NonHeapMemoryUsage" : {
"committed" : 137035776,
"init" : 136773632,
"max" : 184549376,
"used" : 49168288
},
"ObjectName" : "java.lang:type=Memory"
} ],
"beans" : [ {
"name" : "Hadoop:service=HBase,name=RegionServer,sub=IPC",
"modelerType" : "RegionServer,sub=IPC",
"tag.Context" : "regionserver",
"tag.Hostname" : "hdp-w-0"
} ],
"beans" : [ {
"name" : "Hadoop:service=HBase,name=RegionServer,sub=Replication",
"modelerType" : "RegionServer,sub=Replication",
"tag.Context" : "regionserver",
"tag.Hostname" : "hdp-w-0"
} ],
"beans" : [ {
"name" : "Hadoop:service=HBase,name=RegionServer,sub=Server",
"modelerType" : "RegionServer,sub=Server",
"tag.Context" : "regionserver",
"tag.Hostname" : "hdp-w-0"
} ]
}
2015-08-06 14:11:13,640 ERROR [sync.0] wal.FSHLog: Error syncing, request
close of wal
java.io.IOException: All datanodes
DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK]
are bad. Aborting...
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
2015-08-06 14:11:13,640 WARN
[regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller]
wal.FSHLog: Failed last sync but no outstanding unsync edits so falling through
to close; java.io.IOException: All datanodes
DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK]
are bad. Aborting...
2015-08-06 14:11:13,641 ERROR
[regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller]
wal.ProtobufLogWriter: Got IOException while writing trailer
java.io.IOException: All datanodes
DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK]
are bad. Aborting...
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
2015-08-06 14:11:13,641 WARN
[regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller]
wal.FSHLog: Riding over failed WAL close of
hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438869949576,
cause="All datanodes
DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK]
are bad. Aborting...", errors=1; THIS FILE WAS NOT CLOSED BUT ALL EDITS SYNCED
SO SHOULD BE OK
2015-08-06 14:11:13,642 INFO
[regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller]
wal.FSHLog: Rolled WAL
/apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438869949576
with entries=101, filesize=30.38 KB; new WAL
/apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438870273617
2015-08-06 14:11:13,643 INFO [RS_CLOSE_REGION-hdp-w-0:16020-0]
regionserver.HRegionServer: STOPPED: Unrecoverable exception while closing
region
SYSTEM.SEQUENCE,]\x00\x00\x00,1438013446516.888f017eb1c0557fbe7079b50626c891.,
still finishing close
2015-08-06 14:11:13,643 INFO
[regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller]
wal.FSHLog: Archiving
hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438869949576
to
hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/oldWALs/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438869949576
2015-08-06 14:11:13,643 ERROR [RS_CLOSE_REGION-hdp-w-0:16020-0]
executor.EventHandler: Caught throwable while processing event M_RS_CLOSE_REGION
java.lang.RuntimeException: java.io.IOException: All datanodes
DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK]
are bad. Aborting...
at
org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:152)
at
org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: All datanodes
DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK]
are bad. Aborting...
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
------------
hbase regionserver log (w-1)
------------
2015-08-06 14:11:14,267 INFO [main-EventThread]
replication.ReplicationTrackerZKImpl:
/hbase-unsecure/rs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905 znode
expired, triggering replicatorRemoved event
2015-08-06 14:12:08,203 INFO [ReplicationExecutor-0]
replication.ReplicationQueuesZKImpl: Atomically moving
hdp-w-0.c.dks-hadoop.internal,16020,1438869946905's wals to my queue
2015-08-06 14:12:56,252 INFO [PriorityRpcServer.handler=5,queue=1,port=16020]
regionserver.RSRpcServices: Close 918ed7c6568e7500fb434f4268c5bbc5, moving to
hdp-m.c.dks-hadoop.internal,16020,1438869954062
2015-08-06 14:12:56,260 INFO
[StoreCloserThread-SYSTEM.SEQUENCE,\x7F\x00\x00\x00,1438013446516.918ed7c6568e7500fb434f4268c5bbc5.-1]
regionserver.HStore: Closed 0
2015-08-06 14:12:56,261 FATAL
[regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.append-pool1-t1]
wal.FSHLog: Could not append. Requesting close of wal
java.io.IOException: All datanodes
DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK]
are bad. Aborting...
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
2015-08-06 14:12:56,261 ERROR [sync.3] wal.FSHLog: Error syncing, request
close of wal
java.io.IOException: All datanodes
DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK]
are bad. Aborting...
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
2015-08-06 14:12:56,262 FATAL [RS_CLOSE_REGION-hdp-w-1:16020-0]
regionserver.HRegionServer: ABORTING region server
hdp-w-1.c.dks-hadoop.internal,16020,1438869946909: Unrecoverable exception
while closing region
SYSTEM.SEQUENCE,\x7F\x00\x00\x00,1438013446516.918ed7c6568e7500fb434f4268c5bbc5.,
still finishing close
java.io.IOException: All datanodes
DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK]
are bad. Aborting...
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
2015-08-06 14:12:56,262 FATAL [RS_CLOSE_REGION-hdp-w-1:16020-0]
regionserver.HRegionServer: RegionServer abort: loaded coprocessors are:
[org.apache.phoenix.coprocessor.ServerCachingEndpointImpl,
org.apache.hadoop.hbase.regionserver.LocalIndexSplitter,
org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver,
org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver,
org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint,
org.apache.phoenix.coprocessor.ScanRegionObserver,
org.apache.phoenix.hbase.index.Indexer,
org.apache.phoenix.coprocessor.SequenceRegionObserver]
2015-08-06 14:12:56,281 INFO [RS_CLOSE_REGION-hdp-w-1:16020-0]
regionserver.HRegionServer: Dump of metrics as JSON on abort: {
"beans" : [ {
"name" : "java.lang:type=Memory",
"modelerType" : "sun.management.MemoryImpl",
"ObjectPendingFinalizationCount" : 0,
"NonHeapMemoryUsage" : {
"committed" : 137166848,
"init" : 136773632,
"max" : 184549376,
"used" : 48667528
},
"HeapMemoryUsage" : {
"committed" : 2104754176,
"init" : 2147483648,
"max" : 2104754176,
"used" : 270075472
},
"Verbose" : true,
"ObjectName" : "java.lang:type=Memory"
} ],
"beans" : [ {
"name" : "Hadoop:service=HBase,name=RegionServer,sub=IPC",
"modelerType" : "RegionServer,sub=IPC",
"tag.Context" : "regionserver",
"tag.Hostname" : "hdp-w-1"
} ],
"beans" : [ {
"name" : "Hadoop:service=HBase,name=RegionServer,sub=Replication",
"modelerType" : "RegionServer,sub=Replication",
"tag.Context" : "regionserver",
"tag.Hostname" : "hdp-w-1"
} ],
"beans" : [ {
"name" : "Hadoop:service=HBase,name=RegionServer,sub=Server",
"modelerType" : "RegionServer,sub=Server",
"tag.Context" : "regionserver",
"tag.Hostname" : "hdp-w-1"
} ]
}
2015-08-06 14:12:56,284 ERROR [sync.4] wal.FSHLog: Error syncing, request
close of wal
java.io.IOException: All datanodes
DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK]
are bad. Aborting...
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
2015-08-06 14:12:56,285 WARN
[regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller]
wal.FSHLog: Failed last sync but no outstanding unsync edits so falling through
to close; java.io.IOException: All datanodes
DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK]
are bad. Aborting...
2015-08-06 14:12:56,285 ERROR
[regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller]
wal.ProtobufLogWriter: Got IOException while writing trailer
java.io.IOException: All datanodes
DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK]
are bad. Aborting...
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
2015-08-06 14:12:56,285 WARN
[regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller]
wal.FSHLog: Riding over failed WAL close of
hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/WALs/hdp-w-1.c.dks-hadoop.internal,16020,1438869946909/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438869950359,
cause="All datanodes
DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK]
are bad. Aborting...", errors=1; THIS FILE WAS NOT CLOSED BUT ALL EDITS SYNCED
SO SHOULD BE OK
2015-08-06 14:12:56,287 INFO
[regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller]
wal.FSHLog: Rolled WAL
/apps/hbase/data/WALs/hdp-w-1.c.dks-hadoop.internal,16020,1438869946909/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438869950359
with entries=100, filesize=30.73 KB; new WAL
/apps/hbase/data/WALs/hdp-w-1.c.dks-hadoop.internal,16020,1438869946909/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438870376262
2015-08-06 14:12:56,288 INFO
[regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller]
wal.FSHLog: Archiving
hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/WALs/hdp-w-1.c.dks-hadoop.internal,16020,1438869946909/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438869950359
to
hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/oldWALs/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438869950359
2015-08-06 14:12:56,315 INFO [RS_CLOSE_REGION-hdp-w-1:16020-0]
regionserver.HRegionServer: STOPPED: Unrecoverable exception while closing
region
SYSTEM.SEQUENCE,\x7F\x00\x00\x00,1438013446516.918ed7c6568e7500fb434f4268c5bbc5.,
still finishing close
2015-08-06 14:12:56,315 INFO
[regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020]
regionserver.SplitLogWorker: Sending interrupt to stop the worker thread
2015-08-06 14:12:56,315 ERROR [RS_CLOSE_REGION-hdp-w-1:16020-0]
executor.EventHandler: Caught throwable while processing event M_RS_CLOSE_REGION
java.lang.RuntimeException: java.io.IOException: All datanodes
DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK]
are bad. Aborting...
at
org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:152)
at
org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: All datanodes
DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK]
are bad. Aborting...
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
-------------
m datanode log
-------------
2015-07-27 14:11:16,082 INFO datanode.DataNode (BlockReceiver.java:run(1348))
- PacketResponder:
BP-369072949-10.240.200.196-1437998325049:blk_1073742677_1857,
type=HAS_DOWNSTREAM_IN_PIPELINE terminating
2015-07-27 14:11:16,132 INFO datanode.DataNode
(DataXceiver.java:writeBlock(655)) - Receiving
BP-369072949-10.240.200.196-1437998325049:blk_1073742678_1858 src:
/10.240.200.196:56767 dest: /10.240.200.196:50010
2015-07-27 14:11:16,155 INFO DataNode.clienttrace
(BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.200.196:56767, dest:
/10.240.200.196:50010, bytes: 117761, op: HDFS_WRITE, cliID:
DFSClient_NONMAPREDUCE_177514816_1, offset: 0, srvID:
329bbe62-bcea-4a6d-8c97-e800631deb81, blockid:
BP-369072949-10.240.200.196-1437998325049:blk_1073742678_1858, duration: 6385289
2015-07-27 14:11:16,155 INFO datanode.DataNode (BlockReceiver.java:run(1348))
- PacketResponder:
BP-369072949-10.240.200.196-1437998325049:blk_1073742678_1858,
type=HAS_DOWNSTREAM_IN_PIPELINE terminating
2015-07-27 14:11:16,267 ERROR datanode.DataNode (DataXceiver.java:run(278)) -
hdp-m.c.dks-hadoop.internal:50010:DataXceiver error processing unknown
operation src: /127.0.0.1:60513 dst: /127.0.0.1:50010
java.io.EOFException
at java.io.DataInputStream.readShort(DataInputStream.java:315)
at
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
at java.lang.Thread.run(Thread.java:745)
2015-07-27 14:11:16,405 INFO datanode.DataNode
(DataNode.java:transferBlock(1943)) -
DatanodeRegistration(10.240.200.196:50010,
datanodeUuid=329bbe62-bcea-4a6d-8c97-e800631deb81, infoPort=50075,
infoSecurePort=0, ipcPort=8010,
storageInfo=lv=-56;cid=CID-1247f294-77a9-4605-b6d3-4c1398bb5db0;nsid=2032226938;c=0)
Starting thread to transfer
BP-369072949-10.240.200.196-1437998325049:blk_1073742649_1829 to
10.240.2.235:50010 10.240.164.0:50010
-------------
w-0 datanode log
-------------
2015-07-27 14:11:25,019 ERROR datanode.DataNode (DataXceiver.java:run(278)) -
hdp-w-0.c.dks-hadoop.internal:50010:DataXceiver error processing unknown
operation src: /127.0.0.1:47993 dst: /127.0.0.1:50010
java.io.EOFException
at java.io.DataInputStream.readShort(DataInputStream.java:315)
at
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
at java.lang.Thread.run(Thread.java:745)
2015-07-27 14:11:25,077 INFO DataNode.clienttrace
(DataXceiver.java:requestShortCircuitFds(369)) - src: 127.0.0.1, dest:
127.0.0.1, op: REQUEST_SHORT_CIRCUIT_FDS, blockid: 1073742631, srvID:
a5eea5a8-5112-46da-9f18-64274486c472, success: true
-----------------------------
Thank you in advance,
Adrià