Some WAL-related files were marked corrupt. Can you try repairing them?
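For example, something along these lines might help locate and then sideline the affected files (the sideline directory below is only a sketch; please verify the flags and paths against your HDFS version before running anything destructive):

  # list the files with missing/corrupt blocks
  hdfs fsck / -list-corruptfileblocks

  # inspect the affected WAL directories in more detail
  hdfs fsck /apps/hbase/data/WALs -files -blocks -locations

  # move the corrupt WAL files out of the way, e.g. into a sideline directory,
  # so the region servers and log splitting stop tripping over them
  hdfs dfs -mkdir -p /apps/hbase/corrupt-wals
  hdfs dfs -mv /apps/hbase/data/WALs/<affected-file> /apps/hbase/corrupt-wals/

hdfs fsck also has -move / -delete options for corrupt files, but since your logs say all edits in those WALs were synced, sidelining them first is the more conservative route.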
Please check the namenode log. Also search HDFS JIRA for any pending fix - I haven't tracked HDFS development closely recently.

Thanks

On Fri, Aug 7, 2015 at 7:54 AM, Adrià Vilà <[email protected]> wrote:
> About the logs attached in this conversation: only the w-0 and w-1 nodes had failed, first w-0 and then w-1.
> 10.240.187.182 = w-2
> w-0 internal IP address is 10.240.164.0
> w-1 IP is 10.240.2.235
> m IP is 10.240.200.196
>
> FSCK (hadoop fsck / | egrep -v '^\.+$' | grep -v eplica) output:
> -
> Connecting to namenode via http://hdp-m.c.dks-hadoop.internal:50070/fsck?ugi=root&path=%2F
> FSCK started by root (auth:SIMPLE) from /10.240.200.196 for path / at Fri Aug 07 14:51:22 UTC 2015
> /apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438946915810-splitting/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438946915810..meta.1438950914376.meta: MISSING 1 blocks of total size 90 B......
> /apps/hbase/data/WALs/hdp-w-1.c.dks-hadoop.internal,16020,1438959061234/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438959061234.default.1438959069800: MISSING 1 blocks of total size 90 B...
> /apps/hbase/data/WALs/hdp-w-2.c.dks-hadoop.internal,16020,1438959056208/hdp-w-2.c.dks-hadoop.internal%2C16020%2C1438959056208..meta.1438959068352.meta: MISSING 1 blocks of total size 90 B.
> /apps/hbase/data/WALs/hdp-w-2.c.dks-hadoop.internal,16020,1438959056208/hdp-w-2.c.dks-hadoop.internal%2C16020%2C1438959056208.default.1438959061922: MISSING 1 blocks of total size 90 B...........................
>
> .........Status: CORRUPT
> Total size: 54919712019 B (Total open files size: 360 B)
> Total dirs: 1709
> Total files: 2628
> Total symlinks: 0 (Files currently being written: 6)
> Total blocks (validated): 2692 (avg. block size 20401081 B) (Total open file blocks (not validated): 4)
> ********************************
> UNDER MIN REPL'D BLOCKS: 4 (0.1485884 %)
> CORRUPT FILES: 4
> MISSING BLOCKS: 4
> MISSING SIZE: 360 B
> ********************************
> Corrupt blocks: 0
> Number of data-nodes: 4
> Number of racks: 1
> FSCK ended at Fri Aug 07 14:51:26 UTC 2015 in 4511 milliseconds
>
> The filesystem under path '/' is CORRUPT
> -
>
> Thank you for your time.
>
> From: "Ted Yu" <[email protected]>
> Sent: Friday, August 7, 2015 16:07
> To: "[email protected]" <[email protected]>, [email protected]
> Subject: Re: RegionServers shutdown randomly
>
> Does 10.240.187.182 <http://10.240.187.182:50010/> correspond with w-0 or m?
>
> Looks like hdfs was intermittently unstable.
> Have you run fsck?
>
> Cheers
>
> On Fri, Aug 7, 2015 at 12:59 AM, Adrià Vilà <[email protected]> wrote:
>>
>> Hello,
>>
>> HBase RegionServers fail once in a while:
>> - it can be any regionserver, not always the same one
>> - it can happen when the whole cluster is idle (at least not executing any human-launched task)
>> - it can happen at any time, not always at the same time
>>
>> The cluster versions:
>> - Phoenix 4.4 (or 4.5)
>> - HBase 1.1.1
>> - Hadoop/HDFS 2.7.1
>> - Zookeeper 3.4.6
>>
>> Some configs:
>> - ulimit -a
>> core file size (blocks, -c) 0
>> data seg size (kbytes, -d) unlimited
>> scheduling priority (-e) 0
>> file size (blocks, -f) unlimited
>> pending signals (-i) 103227
>> max locked memory (kbytes, -l) 64
>> max memory size (kbytes, -m) unlimited
>> open files (-n) 1024
>> pipe size (512 bytes, -p) 8
>> POSIX message queues (bytes, -q) 819200
>> real-time priority (-r) 0
>> stack size (kbytes, -s) 10240
>> cpu time (seconds, -t) unlimited
>> max user processes (-u) 103227
>> virtual memory (kbytes, -v) unlimited
>> file locks (-x) unlimited
>> - have increased the default timeouts for: hbase rpc, zookeeper session, dfs socket, regionserver lease and client scanner.
>>
>> Below you can find the logs for the master, the regionserver that failed first, another one that failed, and the datanode logs for the master and a worker.
>>
>> The timing was approximately:
>> 14:05 start hbase
>> 14:11 w-0 down
>> 14:14 w-1 down
>> 14:15 stop hbase
>>
>> -------------
>> hbase master log (m)
>> -------------
>> 2015-08-06 14:11:13,640 ERROR [PriorityRpcServer.handler=19,queue=1,port=16000] master.MasterRpcServices: Region server hdp-w-0.c.dks-hadoop.internal,16020,1438869946905 reported a fatal error:
>> ABORTING region server hdp-w-0.c.dks-hadoop.internal,16020,1438869946905: Unrecoverable exception while closing region SYSTEM.SEQUENCE,]\x00\x00\x00,1438013446516.888f017eb1c0557fbe7079b50626c891., still finishing close
>> Cause:
>> java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
>> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>>
>> --------------
>> hbase regionserver log (w-0)
>> --------------
>> 2015-08-06 14:11:13,611 INFO [PriorityRpcServer.handler=0,queue=0,port=16020] regionserver.RSRpcServices: Close 888f017eb1c0557fbe7079b50626c891, moving to hdp-m.c.dks-hadoop.internal,16020,1438869954062
>> 2015-08-06 14:11:13,615 INFO [StoreCloserThread-SYSTEM.SEQUENCE,]\x00\x00\x00,1438013446516.888f017eb1c0557fbe7079b50626c891.-1] regionserver.HStore: Closed 0
>> 2015-08-06 14:11:13,616 FATAL [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.append-pool1-t1] wal.FSHLog: Could not append. Requesting close of wal
>> java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
>> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>> 2015-08-06 14:11:13,617 ERROR [sync.4] wal.FSHLog: Error syncing, request close of wal
>> java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
>> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>> 2015-08-06 14:11:13,617 FATAL [RS_CLOSE_REGION-hdp-w-0:16020-0] regionserver.HRegionServer: ABORTING region server hdp-w-0.c.dks-hadoop.internal,16020,1438869946905: Unrecoverable exception while closing region SYSTEM.SEQUENCE,]\x00\x00\x00,1438013446516.888f017eb1c0557fbe7079b50626c891., still finishing close
>> java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
>> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>> 2015-08-06 14:11:13,617 FATAL [RS_CLOSE_REGION-hdp-w-0:16020-0] regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: [org.apache.phoenix.coprocessor.ServerCachingEndpointImpl, org.apache.hadoop.hbase.regionserver.LocalIndexSplitter, org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver, org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver, org.apache.phoenix.coprocessor.ScanRegionObserver, org.apache.phoenix.hbase.index.Indexer, org.apache.phoenix.coprocessor.SequenceRegionObserver, org.apache.phoenix.coprocessor.MetaDataEndpointImpl]
>> 2015-08-06 14:11:13,627 INFO [RS_CLOSE_REGION-hdp-w-0:16020-0] regionserver.HRegionServer: Dump of metrics as JSON on abort: {
>> "beans" : [ {
>> "name" : "java.lang:type=Memory",
>> "modelerType" : "sun.management.MemoryImpl",
>> "Verbose" : true,
>> "HeapMemoryUsage" : {
>> "committed" : 2104754176,
>> "init" : 2147483648,
>> "max" : 2104754176,
>> "used" : 262288688
>> },
>> "ObjectPendingFinalizationCount" : 0,
>> "NonHeapMemoryUsage" : {
>> "committed" : 137035776,
>> "init" : 136773632,
>> "max" : 184549376,
>> "used" : 49168288
>> },
>> "ObjectName" : "java.lang:type=Memory"
>> } ],
>> "beans" : [ {
>> "name" : "Hadoop:service=HBase,name=RegionServer,sub=IPC",
>> "modelerType" : "RegionServer,sub=IPC",
>> "tag.Context" : "regionserver",
>> "tag.Hostname" : "hdp-w-0"
>> } ],
>> "beans" : [ {
>> "name" : "Hadoop:service=HBase,name=RegionServer,sub=Replication",
>> "modelerType" : "RegionServer,sub=Replication",
>> "tag.Context" : "regionserver",
>> "tag.Hostname" : "hdp-w-0"
>> } ],
>> "beans" : [ {
>> "name" : "Hadoop:service=HBase,name=RegionServer,sub=Server",
>> "modelerType" : "RegionServer,sub=Server",
>> "tag.Context" : "regionserver",
>> "tag.Hostname" : "hdp-w-0"
>> } ]
>> }
>> 2015-08-06 14:11:13,640 ERROR [sync.0] wal.FSHLog: Error syncing, request close of wal
>> java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
>> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>> 2015-08-06 14:11:13,640 WARN [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller] wal.FSHLog: Failed last sync but no outstanding unsync edits so falling through to close; java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
>> 2015-08-06 14:11:13,641 ERROR [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller] wal.ProtobufLogWriter: Got IOException while writing trailer
>> java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
>> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>> 2015-08-06 14:11:13,641 WARN [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller] wal.FSHLog: Riding over failed WAL close of hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438869949576, cause="All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...", errors=1; THIS FILE WAS NOT CLOSED BUT ALL EDITS SYNCED SO SHOULD BE OK
>> 2015-08-06 14:11:13,642 INFO [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller] wal.FSHLog: Rolled WAL /apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438869949576 with entries=101, filesize=30.38 KB; new WAL /apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438870273617
>> 2015-08-06 14:11:13,643 INFO [RS_CLOSE_REGION-hdp-w-0:16020-0] regionserver.HRegionServer: STOPPED: Unrecoverable exception while closing region SYSTEM.SEQUENCE,]\x00\x00\x00,1438013446516.888f017eb1c0557fbe7079b50626c891., still finishing close
>> 2015-08-06 14:11:13,643 INFO [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller] wal.FSHLog: Archiving hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438869949576 to hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/oldWALs/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438869949576
>> 2015-08-06 14:11:13,643 ERROR [RS_CLOSE_REGION-hdp-w-0:16020-0] executor.EventHandler: Caught throwable while processing event M_RS_CLOSE_REGION
>> java.lang.RuntimeException: java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
>> at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:152)
>> at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
>> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>>
>> ------------
>> hbase regionserver log (w-1)
>> ------------
>> 2015-08-06 14:11:14,267 INFO [main-EventThread] replication.ReplicationTrackerZKImpl: /hbase-unsecure/rs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905 znode expired, triggering replicatorRemoved event
>> 2015-08-06 14:12:08,203 INFO [ReplicationExecutor-0] replication.ReplicationQueuesZKImpl: Atomically moving hdp-w-0.c.dks-hadoop.internal,16020,1438869946905's wals to my queue
>> 2015-08-06 14:12:56,252 INFO [PriorityRpcServer.handler=5,queue=1,port=16020] regionserver.RSRpcServices: Close 918ed7c6568e7500fb434f4268c5bbc5, moving to hdp-m.c.dks-hadoop.internal,16020,1438869954062
>> 2015-08-06 14:12:56,260 INFO [StoreCloserThread-SYSTEM.SEQUENCE,\x7F\x00\x00\x00,1438013446516.918ed7c6568e7500fb434f4268c5bbc5.-1] regionserver.HStore: Closed 0
>> 2015-08-06 14:12:56,261 FATAL [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.append-pool1-t1] wal.FSHLog: Could not append. Requesting close of wal
>> java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
>> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>> 2015-08-06 14:12:56,261 ERROR [sync.3] wal.FSHLog: Error syncing, request close of wal
>> java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
>> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>> 2015-08-06 14:12:56,262 FATAL [RS_CLOSE_REGION-hdp-w-1:16020-0] regionserver.HRegionServer: ABORTING region server hdp-w-1.c.dks-hadoop.internal,16020,1438869946909: Unrecoverable exception while closing region SYSTEM.SEQUENCE,\x7F\x00\x00\x00,1438013446516.918ed7c6568e7500fb434f4268c5bbc5., still finishing close
>> java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
>> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>> 2015-08-06 14:12:56,262 FATAL [RS_CLOSE_REGION-hdp-w-1:16020-0] regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: [org.apache.phoenix.coprocessor.ServerCachingEndpointImpl, org.apache.hadoop.hbase.regionserver.LocalIndexSplitter, org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver, org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver, org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint, org.apache.phoenix.coprocessor.ScanRegionObserver, org.apache.phoenix.hbase.index.Indexer, org.apache.phoenix.coprocessor.SequenceRegionObserver]
>> 2015-08-06 14:12:56,281 INFO [RS_CLOSE_REGION-hdp-w-1:16020-0] regionserver.HRegionServer: Dump of metrics as JSON on abort: {
>> "beans" : [ {
>> "name" : "java.lang:type=Memory",
>> "modelerType" : "sun.management.MemoryImpl",
>> "ObjectPendingFinalizationCount" : 0,
>> "NonHeapMemoryUsage" : {
>> "committed" : 137166848,
>> "init" : 136773632,
>> "max" : 184549376,
>> "used" : 48667528
>> },
>> "HeapMemoryUsage" : {
>> "committed" : 2104754176,
>> "init" : 2147483648,
>> "max" : 2104754176,
>> "used" : 270075472
>> },
>> "Verbose" : true,
>> "ObjectName" : "java.lang:type=Memory"
>> } ],
>> "beans" : [ {
>> "name" : "Hadoop:service=HBase,name=RegionServer,sub=IPC",
>> "modelerType" : "RegionServer,sub=IPC",
>> "tag.Context" : "regionserver",
>> "tag.Hostname" : "hdp-w-1"
>> } ],
>> "beans" : [ {
>> "name" : "Hadoop:service=HBase,name=RegionServer,sub=Replication",
>> "modelerType" : "RegionServer,sub=Replication",
>> "tag.Context" : "regionserver",
>> "tag.Hostname" : "hdp-w-1"
>> } ],
>> "beans" : [ {
>> "name" : "Hadoop:service=HBase,name=RegionServer,sub=Server",
>> "modelerType" : "RegionServer,sub=Server",
>> "tag.Context" : "regionserver",
>> "tag.Hostname" : "hdp-w-1"
>> } ]
>> }
>> 2015-08-06 14:12:56,284 ERROR [sync.4] wal.FSHLog: Error syncing, request close of wal
>> java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
>> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>> 2015-08-06 14:12:56,285 WARN [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller] wal.FSHLog: Failed last sync but no outstanding unsync edits so falling through to close; java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
>> 2015-08-06 14:12:56,285 ERROR [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller] wal.ProtobufLogWriter: Got IOException while writing trailer
>> java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
>> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>> 2015-08-06 14:12:56,285 WARN [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller] wal.FSHLog: Riding over failed WAL close of hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/WALs/hdp-w-1.c.dks-hadoop.internal,16020,1438869946909/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438869950359, cause="All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...", errors=1; THIS FILE WAS NOT CLOSED BUT ALL EDITS SYNCED SO SHOULD BE OK
>> 2015-08-06 14:12:56,287 INFO [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller] wal.FSHLog: Rolled WAL /apps/hbase/data/WALs/hdp-w-1.c.dks-hadoop.internal,16020,1438869946909/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438869950359 with entries=100, filesize=30.73 KB; new WAL /apps/hbase/data/WALs/hdp-w-1.c.dks-hadoop.internal,16020,1438869946909/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438870376262
>> 2015-08-06 14:12:56,288 INFO [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller] wal.FSHLog: Archiving hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/WALs/hdp-w-1.c.dks-hadoop.internal,16020,1438869946909/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438869950359 to hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/oldWALs/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438869950359
>> 2015-08-06 14:12:56,315 INFO [RS_CLOSE_REGION-hdp-w-1:16020-0] regionserver.HRegionServer: STOPPED: Unrecoverable exception while closing region SYSTEM.SEQUENCE,\x7F\x00\x00\x00,1438013446516.918ed7c6568e7500fb434f4268c5bbc5., still finishing close
>> 2015-08-06 14:12:56,315 INFO [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020] regionserver.SplitLogWorker: Sending interrupt to stop the worker thread
>> 2015-08-06 14:12:56,315 ERROR [RS_CLOSE_REGION-hdp-w-1:16020-0] executor.EventHandler: Caught throwable while processing event M_RS_CLOSE_REGION
>> java.lang.RuntimeException: java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
>> at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:152)
>> at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
>> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>>
>> -------------
>> m datanode log
>> -------------
>> 2015-07-27 14:11:16,082 INFO datanode.DataNode (BlockReceiver.java:run(1348)) - PacketResponder: BP-369072949-10.240.200.196-1437998325049:blk_1073742677_1857, type=HAS_DOWNSTREAM_IN_PIPELINE terminating
>> 2015-07-27 14:11:16,132 INFO datanode.DataNode (DataXceiver.java:writeBlock(655)) - Receiving BP-369072949-10.240.200.196-1437998325049:blk_1073742678_1858 src: /10.240.200.196:56767 dest: /10.240.200.196:50010
>> 2015-07-27 14:11:16,155 INFO DataNode.clienttrace (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.200.196:56767, dest: /10.240.200.196:50010, bytes: 117761, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_177514816_1, offset: 0, srvID: 329bbe62-bcea-4a6d-8c97-e800631deb81, blockid: BP-369072949-10.240.200.196-1437998325049:blk_1073742678_1858, duration: 6385289
>> 2015-07-27 14:11:16,155 INFO datanode.DataNode (BlockReceiver.java:run(1348)) - PacketResponder: BP-369072949-10.240.200.196-1437998325049:blk_1073742678_1858, type=HAS_DOWNSTREAM_IN_PIPELINE terminating
>> 2015-07-27 14:11:16,267 ERROR datanode.DataNode (DataXceiver.java:run(278)) - hdp-m.c.dks-hadoop.internal:50010:DataXceiver error processing unknown operation src: /127.0.0.1:60513 dst: /127.0.0.1:50010
>> java.io.EOFException
>> at java.io.DataInputStream.readShort(DataInputStream.java:315)
>> at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
>> at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
>> at java.lang.Thread.run(Thread.java:745)
>> 2015-07-27 14:11:16,405 INFO datanode.DataNode (DataNode.java:transferBlock(1943)) - DatanodeRegistration(10.240.200.196:50010, datanodeUuid=329bbe62-bcea-4a6d-8c97-e800631deb81, infoPort=50075, infoSecurePort=0, ipcPort=8010, storageInfo=lv=-56;cid=CID-1247f294-77a9-4605-b6d3-4c1398bb5db0;nsid=2032226938;c=0) Starting thread to transfer BP-369072949-10.240.200.196-1437998325049:blk_1073742649_1829 to 10.240.2.235:50010 10.240.164.0:50010
>>
>> -------------
>> w-0 datanode log
>> -------------
>> 2015-07-27 14:11:25,019 ERROR datanode.DataNode (DataXceiver.java:run(278)) - hdp-w-0.c.dks-hadoop.internal:50010:DataXceiver error processing unknown operation src: /127.0.0.1:47993 dst: /127.0.0.1:50010
>> java.io.EOFException
>> at java.io.DataInputStream.readShort(DataInputStream.java:315)
>> at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
>> at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
>> at java.lang.Thread.run(Thread.java:745)
>> 2015-07-27 14:11:25,077 INFO DataNode.clienttrace (DataXceiver.java:requestShortCircuitFds(369)) - src: 127.0.0.1, dest: 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_FDS, blockid: 1073742631, srvID: a5eea5a8-5112-46da-9f18-64274486c472, success: true
>>
>> -----------------------------
>> Thank you in advance,
>>
>> Adrià
