From what I heard, the reporting of CORRUPT for WAL-related files was a false alarm.
There is no evidence that HBase 1.1 produces corrupt WAL files.

Cheers

On Fri, Aug 7, 2015 at 7:59 PM, James Estes <[email protected]> wrote:

> There is this
>
> http://mail-archives.apache.org/mod_mbox/hbase-user/201507.mbox/%3CCAE8tVdmyUfG%2BajK0gvMG_tLjoStZ0HjrQxJuuJzQ3Z%2B4vbzSuQ%40mail.gmail.com%3E
>
> Which points to
> https://issues.apache.org/jira/browse/HDFS-8809
>
> But (at least for us) this hasn't led to region servers crashing...
> though I'm definitely interested in what issues it may be able to cause.
>
> James
>
> On Fri, Aug 7, 2015 at 11:05 AM, Ted Yu <[email protected]> wrote:
> > Some WAL related files were marked corrupt.
> >
> > Can you try repairing them?
> >
> > Please check the namenode log.
> > Search HDFS JIRA for any pending fix - I haven't tracked HDFS movement
> > closely recently.
> >
> > Thanks
> >
> > On Fri, Aug 7, 2015 at 7:54 AM, Adrià Vilà <[email protected]> wrote:
> >
> >> About the logs attached in this conversation: only the w-0 and w-1 nodes
> >> had failed, first w-0 and then w-1.
> >> 10.240.187.182 = w-2
> >> w-0 internal IP address is 10.240.164.0
> >> w-1 IP is 10.240.2.235
> >> m IP is 10.240.200.196
> >>
> >> FSCK (hadoop fsck / | egrep -v '^\.+$' | grep -v eplica) output:
> >> -
> >> Connecting to namenode via http://hdp-m.c.dks-hadoop.internal:50070/fsck?ugi=root&path=%2F
> >> FSCK started by root (auth:SIMPLE) from /10.240.200.196 for path / at Fri Aug 07 14:51:22 UTC 2015
> >> /apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438946915810-splitting/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438946915810..meta.1438950914376.meta: MISSING 1 blocks of total size 90 B......
> >> /apps/hbase/data/WALs/hdp-w-1.c.dks-hadoop.internal,16020,1438959061234/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438959061234.default.1438959069800: MISSING 1 blocks of total size 90 B...
> >> /apps/hbase/data/WALs/hdp-w-2.c.dks-hadoop.internal,16020,1438959056208/hdp-w-2.c.dks-hadoop.internal%2C16020%2C1438959056208..meta.1438959068352.meta: MISSING 1 blocks of total size 90 B.
> >> /apps/hbase/data/WALs/hdp-w-2.c.dks-hadoop.internal,16020,1438959056208/hdp-w-2.c.dks-hadoop.internal%2C16020%2C1438959056208.default.1438959061922: MISSING 1 blocks of total size 90 B...........................
> >>
> >> .........Status: CORRUPT
> >> Total size: 54919712019 B (Total open files size: 360 B)
> >> Total dirs: 1709
> >> Total files: 2628
> >> Total symlinks: 0 (Files currently being written: 6)
> >> Total blocks (validated): 2692 (avg. block size 20401081 B) (Total open file blocks (not validated): 4)
> >> ********************************
> >> UNDER MIN REPL'D BLOCKS: 4 (0.1485884 %)
> >> CORRUPT FILES: 4
> >> MISSING BLOCKS: 4
> >> MISSING SIZE: 360 B
> >> ********************************
> >> Corrupt blocks: 0
> >> Number of data-nodes: 4
> >> Number of racks: 1
> >> FSCK ended at Fri Aug 07 14:51:26 UTC 2015 in 4511 milliseconds
> >>
> >> The filesystem under path '/' is CORRUPT
> >> -
> >>
> >> Thank you for your time.
> >>
> >> From: "Ted Yu" <[email protected]>
> >> Sent: Friday, August 7, 2015 16:07
> >> To: "[email protected]" <[email protected]>, [email protected]
> >> Subject: Re: RegionServers shutdown randomly
> >>
> >> Does 10.240.187.182 correspond with w-0 or m?
> >>
> >> Looks like hdfs was intermittently unstable.
> >> Have you run fsck?
> >>
> >> Cheers
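For anyone hitting the same report: the four files flagged by fsck are all WAL files with a single 90 B "missing" block and the report also counts 6 files currently being written, which is consistent with the "false alarm" reading at the top of the thread (open-for-write WALs, not real data loss). A minimal sketch of how one might verify that and, if an abandoned WAL stays stuck open, recover its lease; WAL_FILE is a placeholder for one of the paths printed by fsck, and the retry count is an assumption:

  # Confirm the "corrupt" files are simply still open for write
  hdfs fsck / -openforwrite -files -blocks | grep -i wal

  # If a WAL left behind (e.g. under a *-splitting directory) never gets closed,
  # ask the NameNode to recover its lease (WAL_FILE = one of the paths above)
  hdfs debug recoverLease -path "$WAL_FILE" -retries 5

  # Last resort only: sideline the file into /lost+found so fsck stops flagging it
  # hdfs fsck "$WAL_FILE" -move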
> >> On Fri, Aug 7, 2015 at 12:59 AM, Adrià Vilà <[email protected]> wrote:
> >>>
> >>> Hello,
> >>>
> >>> HBase RegionServers fail once in a while:
> >>> - it can be any regionserver, not always the same one
> >>> - it can happen when all the cluster is idle (at least not executing any human-launched task)
> >>> - it can happen at any time, not always the same time
> >>>
> >>> The cluster versions:
> >>> - Phoenix 4.4 (or 4.5)
> >>> - HBase 1.1.1
> >>> - Hadoop/HDFS 2.7.1
> >>> - Zookeeper 3.4.6
> >>>
> >>> Some configs:
> >>> - ulimit -a
> >>>   core file size (blocks, -c) 0
> >>>   data seg size (kbytes, -d) unlimited
> >>>   scheduling priority (-e) 0
> >>>   file size (blocks, -f) unlimited
> >>>   pending signals (-i) 103227
> >>>   max locked memory (kbytes, -l) 64
> >>>   max memory size (kbytes, -m) unlimited
> >>>   open files (-n) 1024
> >>>   pipe size (512 bytes, -p) 8
> >>>   POSIX message queues (bytes, -q) 819200
> >>>   real-time priority (-r) 0
> >>>   stack size (kbytes, -s) 10240
> >>>   cpu time (seconds, -t) unlimited
> >>>   max user processes (-u) 103227
> >>>   virtual memory (kbytes, -v) unlimited
> >>>   file locks (-x) unlimited
> >>> - have increased the default timeouts for: hbase rpc, zookeeper session, dfs socket, regionserver lease and client scanner.
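One detail in the ulimit output above that is worth ruling out: open files is still at the default 1024, which the HBase reference guide flags as too low for RegionServer/DataNode hosts and recommends raising substantially (along with nproc). It is not necessarily what is aborting these regionservers, but it is cheap to check. A minimal sketch; the hbase/hdfs usernames and the 32768 value are assumptions, not settings from this thread:

  # Limit actually applied to the running RegionServer (a login shell's ulimit -a
  # does not necessarily reflect what the daemon got)
  cat /proc/$(pgrep -f HRegionServer)/limits | grep -Ei 'open files|processes'

  # Example /etc/security/limits.conf entries; restart the daemons from a fresh
  # login so the new limits take effect
  #   hbase  -  nofile  32768
  #   hdfs   -  nofile  32768
  #   hbase  -  nproc   32768
  #   hdfs   -  nproc   32768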
> >>>
> >>> Next you can find the logs for the master, the regionserver that failed first, another one that failed, and the datanode logs for the master and a worker.
> >>>
> >>> The timing was approximately:
> >>> 14:05 start hbase
> >>> 14:11 w-0 down
> >>> 14:14 w-1 down
> >>> 14:15 stop hbase
> >>>
> >>> -------------
> >>> hbase master log (m)
> >>> -------------
> >>> 2015-08-06 14:11:13,640 ERROR [PriorityRpcServer.handler=19,queue=1,port=16000] master.MasterRpcServices: Region server hdp-w-0.c.dks-hadoop.internal,16020,1438869946905 reported a fatal error:
> >>> ABORTING region server hdp-w-0.c.dks-hadoop.internal,16020,1438869946905: Unrecoverable exception while closing region SYSTEM.SEQUENCE,]\x00\x00\x00,1438013446516.888f017eb1c0557fbe7079b50626c891., still finishing close
> >>> Cause:
> >>> java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
> >>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
> >>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
> >>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
> >>>
> >>> --------------
> >>> hbase regionserver log (w-0)
> >>> --------------
> >>> 2015-08-06 14:11:13,611 INFO [PriorityRpcServer.handler=0,queue=0,port=16020] regionserver.RSRpcServices: Close 888f017eb1c0557fbe7079b50626c891, moving to hdp-m.c.dks-hadoop.internal,16020,1438869954062
> >>> 2015-08-06 14:11:13,615 INFO [StoreCloserThread-SYSTEM.SEQUENCE,]\x00\x00\x00,1438013446516.888f017eb1c0557fbe7079b50626c891.-1] regionserver.HStore: Closed 0
> >>> 2015-08-06 14:11:13,616 FATAL [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.append-pool1-t1] wal.FSHLog: Could not append. Requesting close of wal
> >>> java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
> >>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
> >>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
> >>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
> >>> 2015-08-06 14:11:13,617 ERROR [sync.4] wal.FSHLog: Error syncing, request close of wal
> >>> java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
> >>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
> >>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
> >>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
> >>> 2015-08-06 14:11:13,617 FATAL [RS_CLOSE_REGION-hdp-w-0:16020-0] regionserver.HRegionServer: ABORTING region server hdp-w-0.c.dks-hadoop.internal,16020,1438869946905: Unrecoverable exception while closing region SYSTEM.SEQUENCE,]\x00\x00\x00,1438013446516.888f017eb1c0557fbe7079b50626c891., still finishing close
> >>> java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
> >>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
> >>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
> >>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
> >>> 2015-08-06 14:11:13,617 FATAL [RS_CLOSE_REGION-hdp-w-0:16020-0] regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: [org.apache.phoenix.coprocessor.ServerCachingEndpointImpl, org.apache.hadoop.hbase.regionserver.LocalIndexSplitter, org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver, org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver, org.apache.phoenix.coprocessor.ScanRegionObserver, org.apache.phoenix.hbase.index.Indexer, org.apache.phoenix.coprocessor.SequenceRegionObserver, org.apache.phoenix.coprocessor.MetaDataEndpointImpl]
> >>> 2015-08-06 14:11:13,627 INFO [RS_CLOSE_REGION-hdp-w-0:16020-0] regionserver.HRegionServer: Dump of metrics as JSON on abort: {
> >>>   "beans" : [ { "name" : "java.lang:type=Memory", "modelerType" : "sun.management.MemoryImpl", "Verbose" : true,
> >>>     "HeapMemoryUsage" : { "committed" : 2104754176, "init" : 2147483648, "max" : 2104754176, "used" : 262288688 },
> >>>     "ObjectPendingFinalizationCount" : 0,
> >>>     "NonHeapMemoryUsage" : { "committed" : 137035776, "init" : 136773632, "max" : 184549376, "used" : 49168288 },
> >>>     "ObjectName" : "java.lang:type=Memory" } ],
> >>>   "beans" : [ { "name" : "Hadoop:service=HBase,name=RegionServer,sub=IPC", "modelerType" : "RegionServer,sub=IPC", "tag.Context" : "regionserver", "tag.Hostname" : "hdp-w-0" } ],
> >>>   "beans" : [ { "name" : "Hadoop:service=HBase,name=RegionServer,sub=Replication", "modelerType" : "RegionServer,sub=Replication", "tag.Context" : "regionserver", "tag.Hostname" : "hdp-w-0" } ],
> >>>   "beans" : [ { "name" : "Hadoop:service=HBase,name=RegionServer,sub=Server", "modelerType" : "RegionServer,sub=Server", "tag.Context" : "regionserver", "tag.Hostname" : "hdp-w-0" } ]
> >>> }
> >>> 2015-08-06 14:11:13,640 ERROR [sync.0] wal.FSHLog: Error syncing, request close of wal
> >>> java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
> >>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
> >>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
> >>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
> >>> 2015-08-06 14:11:13,640 WARN [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller] wal.FSHLog: Failed last sync but no outstanding unsync edits so falling through to close; java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
> >>> 2015-08-06 14:11:13,641 ERROR [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller] wal.ProtobufLogWriter: Got IOException while writing trailer
> >>> java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
> >>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
> >>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
> >>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
> >>> 2015-08-06 14:11:13,641 WARN [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller] wal.FSHLog: Riding over failed WAL close of hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438869949576, cause="All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...", errors=1; THIS FILE WAS NOT CLOSED BUT ALL EDITS SYNCED SO SHOULD BE OK
> >>> 2015-08-06 14:11:13,642 INFO [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller] wal.FSHLog: Rolled WAL /apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438869949576 with entries=101, filesize=30.38 KB; new WAL /apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438870273617
> >>> 2015-08-06 14:11:13,643 INFO [RS_CLOSE_REGION-hdp-w-0:16020-0] regionserver.HRegionServer: STOPPED: Unrecoverable exception while closing region SYSTEM.SEQUENCE,]\x00\x00\x00,1438013446516.888f017eb1c0557fbe7079b50626c891., still finishing close
> >>> 2015-08-06 14:11:13,643 INFO [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller] wal.FSHLog: Archiving hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438869949576 to hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/oldWALs/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438869949576
> >>> 2015-08-06 14:11:13,643 ERROR [RS_CLOSE_REGION-hdp-w-0:16020-0] executor.EventHandler: Caught throwable while processing event M_RS_CLOSE_REGION
> >>> java.lang.RuntimeException: java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
> >>>   at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:152)
> >>>   at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
> >>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >>>   at java.lang.Thread.run(Thread.java:745)
> >>> Caused by: java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
> >>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
> >>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
> >>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
> >>>
> >>> ------------
> >>> hbase regionserver log (w-1)
> >>> ------------
> >>> 2015-08-06 14:11:14,267 INFO [main-EventThread] replication.ReplicationTrackerZKImpl: /hbase-unsecure/rs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905 znode expired, triggering replicatorRemoved event
> >>> 2015-08-06 14:12:08,203 INFO [ReplicationExecutor-0] replication.ReplicationQueuesZKImpl: Atomically moving hdp-w-0.c.dks-hadoop.internal,16020,1438869946905's wals to my queue
> >>> 2015-08-06 14:12:56,252 INFO [PriorityRpcServer.handler=5,queue=1,port=16020] regionserver.RSRpcServices: Close 918ed7c6568e7500fb434f4268c5bbc5, moving to hdp-m.c.dks-hadoop.internal,16020,1438869954062
> >>> 2015-08-06 14:12:56,260 INFO [StoreCloserThread-SYSTEM.SEQUENCE,\x7F\x00\x00\x00,1438013446516.918ed7c6568e7500fb434f4268c5bbc5.-1] regionserver.HStore: Closed 0
> >>> 2015-08-06 14:12:56,261 FATAL [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.append-pool1-t1] wal.FSHLog: Could not append. Requesting close of wal
> >>> java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
> >>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
> >>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
> >>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
> >>> 2015-08-06 14:12:56,261 ERROR [sync.3] wal.FSHLog: Error syncing, request close of wal
> >>> java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
> >>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
> >>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
> >>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
> >>> 2015-08-06 14:12:56,262 FATAL [RS_CLOSE_REGION-hdp-w-1:16020-0] regionserver.HRegionServer: ABORTING region server hdp-w-1.c.dks-hadoop.internal,16020,1438869946909: Unrecoverable exception while closing region SYSTEM.SEQUENCE,\x7F\x00\x00\x00,1438013446516.918ed7c6568e7500fb434f4268c5bbc5., still finishing close
> >>> java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
> >>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
> >>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
> >>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
> >>> 2015-08-06 14:12:56,262 FATAL [RS_CLOSE_REGION-hdp-w-1:16020-0] regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: [org.apache.phoenix.coprocessor.ServerCachingEndpointImpl, org.apache.hadoop.hbase.regionserver.LocalIndexSplitter, org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver, org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver, org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint, org.apache.phoenix.coprocessor.ScanRegionObserver, org.apache.phoenix.hbase.index.Indexer, org.apache.phoenix.coprocessor.SequenceRegionObserver]
> >>> 2015-08-06 14:12:56,281 INFO [RS_CLOSE_REGION-hdp-w-1:16020-0] regionserver.HRegionServer: Dump of metrics as JSON on abort: {
> >>>   "beans" : [ { "name" : "java.lang:type=Memory", "modelerType" : "sun.management.MemoryImpl",
> >>>     "ObjectPendingFinalizationCount" : 0,
> >>>     "NonHeapMemoryUsage" : { "committed" : 137166848, "init" : 136773632, "max" : 184549376, "used" : 48667528 },
> >>>     "HeapMemoryUsage" : { "committed" : 2104754176, "init" : 2147483648, "max" : 2104754176, "used" : 270075472 },
> >>>     "Verbose" : true, "ObjectName" : "java.lang:type=Memory" } ],
> >>>   "beans" : [ { "name" : "Hadoop:service=HBase,name=RegionServer,sub=IPC", "modelerType" : "RegionServer,sub=IPC", "tag.Context" : "regionserver", "tag.Hostname" : "hdp-w-1" } ],
> >>>   "beans" : [ { "name" : "Hadoop:service=HBase,name=RegionServer,sub=Replication", "modelerType" : "RegionServer,sub=Replication", "tag.Context" : "regionserver", "tag.Hostname" : "hdp-w-1" } ],
> >>>   "beans" : [ { "name" : "Hadoop:service=HBase,name=RegionServer,sub=Server", "modelerType" : "RegionServer,sub=Server", "tag.Context" : "regionserver", "tag.Hostname" : "hdp-w-1" } ]
> >>> }
> >>> 2015-08-06 14:12:56,284 ERROR [sync.4] wal.FSHLog: Error syncing, request close of wal
> >>> java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
> >>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
> >>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
> >>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
> >>> 2015-08-06 14:12:56,285 WARN [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller] wal.FSHLog: Failed last sync but no outstanding unsync edits so falling through to close; java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
> >>> 2015-08-06 14:12:56,285 ERROR [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller] wal.ProtobufLogWriter: Got IOException while writing trailer
> >>> java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
> >>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
> >>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
> >>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
> >>> 2015-08-06 14:12:56,285 WARN [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller] wal.FSHLog: Riding over failed WAL close of hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/WALs/hdp-w-1.c.dks-hadoop.internal,16020,1438869946909/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438869950359, cause="All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...", errors=1; THIS FILE WAS NOT CLOSED BUT ALL EDITS SYNCED SO SHOULD BE OK
> >>> 2015-08-06 14:12:56,287 INFO [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller] wal.FSHLog: Rolled WAL /apps/hbase/data/WALs/hdp-w-1.c.dks-hadoop.internal,16020,1438869946909/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438869950359 with entries=100, filesize=30.73 KB; new WAL /apps/hbase/data/WALs/hdp-w-1.c.dks-hadoop.internal,16020,1438869946909/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438870376262
> >>> 2015-08-06 14:12:56,288 INFO [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller] wal.FSHLog: Archiving hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/WALs/hdp-w-1.c.dks-hadoop.internal,16020,1438869946909/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438869950359 to hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/oldWALs/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438869950359
> >>> 2015-08-06 14:12:56,315 INFO [RS_CLOSE_REGION-hdp-w-1:16020-0] regionserver.HRegionServer: STOPPED: Unrecoverable exception while closing region SYSTEM.SEQUENCE,\x7F\x00\x00\x00,1438013446516.918ed7c6568e7500fb434f4268c5bbc5., still finishing close
> >>> 2015-08-06 14:12:56,315 INFO [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020] regionserver.SplitLogWorker: Sending interrupt to stop the worker thread
> >>> 2015-08-06 14:12:56,315 ERROR [RS_CLOSE_REGION-hdp-w-1:16020-0] executor.EventHandler: Caught throwable while processing event M_RS_CLOSE_REGION
> >>> java.lang.RuntimeException: java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
> >>>   at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:152)
> >>>   at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
> >>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >>>   at java.lang.Thread.run(Thread.java:745)
> >>> Caused by: java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
> >>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
> >>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
> >>>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
> >>>
> >>> -------------
> >>> m datanode log
> >>> -------------
> >>> 2015-07-27 14:11:16,082 INFO datanode.DataNode (BlockReceiver.java:run(1348)) - PacketResponder: BP-369072949-10.240.200.196-1437998325049:blk_1073742677_1857, type=HAS_DOWNSTREAM_IN_PIPELINE terminating
> >>> 2015-07-27 14:11:16,132 INFO datanode.DataNode (DataXceiver.java:writeBlock(655)) - Receiving BP-369072949-10.240.200.196-1437998325049:blk_1073742678_1858 src: /10.240.200.196:56767 dest: /10.240.200.196:50010
> >>> 2015-07-27 14:11:16,155 INFO DataNode.clienttrace (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.200.196:56767, dest: /10.240.200.196:50010, bytes: 117761, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_177514816_1, offset: 0, srvID: 329bbe62-bcea-4a6d-8c97-e800631deb81, blockid: BP-369072949-10.240.200.196-1437998325049:blk_1073742678_1858, duration: 6385289
> >>> 2015-07-27 14:11:16,155 INFO datanode.DataNode (BlockReceiver.java:run(1348)) - PacketResponder: BP-369072949-10.240.200.196-1437998325049:blk_1073742678_1858, type=HAS_DOWNSTREAM_IN_PIPELINE terminating
> >>> 2015-07-27 14:11:16,267 ERROR datanode.DataNode (DataXceiver.java:run(278)) - hdp-m.c.dks-hadoop.internal:50010:DataXceiver error processing unknown operation src: /127.0.0.1:60513 dst: /127.0.0.1:50010
> >>> java.io.EOFException
> >>>   at java.io.DataInputStream.readShort(DataInputStream.java:315)
> >>>   at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
> >>>   at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
> >>>   at java.lang.Thread.run(Thread.java:745)
> >>> 2015-07-27 14:11:16,405 INFO datanode.DataNode (DataNode.java:transferBlock(1943)) - DatanodeRegistration(10.240.200.196:50010, datanodeUuid=329bbe62-bcea-4a6d-8c97-e800631deb81, infoPort=50075, infoSecurePort=0, ipcPort=8010, storageInfo=lv=-56;cid=CID-1247f294-77a9-4605-b6d3-4c1398bb5db0;nsid=2032226938;c=0) Starting thread to transfer BP-369072949-10.240.200.196-1437998325049:blk_1073742649_1829 to 10.240.2.235:50010 10.240.164.0:50010
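A side note on the DataXceiver "error processing unknown operation ... java.io.EOFException" entries (one in the m datanode log above, another in the w-0 datanode log below): the source is 127.0.0.1, and this pattern commonly comes from something on the host (a monitoring agent or port check) opening the DataNode transfer port and closing it without sending a request, so it is likely unrelated to the aborts. A quick way to test that theory; the nc usage and the log path are assumptions about what is installed and where the distribution writes logs:

  # Opening and immediately closing port 50010 should reproduce the same
  # EOFException line in the datanode log if the theory is right
  nc -z 127.0.0.1 50010
  tail -n 10 /var/log/hadoop/hdfs/hadoop-hdfs-datanode-*.log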
> >>>
> >>> -------------
> >>> w-0 datanode log
> >>> -------------
> >>> 2015-07-27 14:11:25,019 ERROR datanode.DataNode (DataXceiver.java:run(278)) - hdp-w-0.c.dks-hadoop.internal:50010:DataXceiver error processing unknown operation src: /127.0.0.1:47993 dst: /127.0.0.1:50010
> >>> java.io.EOFException
> >>>   at java.io.DataInputStream.readShort(DataInputStream.java:315)
> >>>   at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
> >>>   at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
> >>>   at java.lang.Thread.run(Thread.java:745)
> >>> 2015-07-27 14:11:25,077 INFO DataNode.clienttrace (DataXceiver.java:requestShortCircuitFds(369)) - src: 127.0.0.1, dest: 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_FDS, blockid: 1073742631, srvID: a5eea5a8-5112-46da-9f18-64274486c472, success: true
> >>>
> >>> -----------------------------
> >>> Thank you in advance,
> >>>
> >>> Adrià
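Since every abort above traces back to the same WAL write pipeline failing against 10.240.187.182 (w-2) with "All datanodes ... are bad. Aborting...", a few checks that are commonly run for that error are sketched below. The log path and the 4096 value are assumptions; the HBase reference guide does recommend raising dfs.datanode.max.transfer.threads (the old dfs.datanode.max.xcievers) for HBase workloads, since too low a value is a documented cause of write-pipeline failures:

  # Datanode liveness / last-contact times as seen by the namenode
  hdfs dfsadmin -report

  # On w-2 (10.240.187.182), the datanode in the failing pipeline: look for
  # pipeline, xceiver or disk errors around 14:11-14:13 (log path depends on
  # the distribution's layout)
  grep -iE 'error|exception|xceiver' /var/log/hadoop/hdfs/hadoop-hdfs-datanode-*.log

  # Verify the datanode transfer-thread limit; 4096 is a commonly used value
  grep -A1 dfs.datanode.max.transfer.threads /etc/hadoop/conf/hdfs-site.xml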
