Hi, I have an HBase cluster running HBase 1.0.0-cdh5.4.4. I periodically get NotServingRegionException and I can't find the reason for it. It happens randomly on different tables.
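In case it's useful context, this is roughly how I watch for the problem: run hbck against the table and flag the summary. The sketch below hard-codes a sample summary for illustration instead of calling the cluster; on the live cluster I pipe the real `hbase hbck my_weird_table` output through the same awk filters.

```shell
# Sketch: parse an `hbase hbck` summary to flag inconsistencies.
# The sample text below stands in for a live run; on a real cluster
# you would capture: hbck_output=$(hbase hbck my_weird_table 2>&1)
hbck_output='6 inconsistencies detected.
Status: INCONSISTENT'

# Pull out the inconsistency count and the final status line.
count=$(printf '%s\n' "$hbck_output" | awk '/inconsistencies detected/ {print $1}')
status=$(printf '%s\n' "$hbck_output" | awk '/^Status:/ {print $2}')

if [ "$status" != "OK" ]; then
    echo "hbck found $count inconsistencies (status: $status)"
fi
# -> hbck found 6 inconsistencies (status: INCONSISTENT)
```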
Running `hbase hbck my_weird_table` reports:

ERROR: Region { meta => my_weird_table,70a3d6f6,1448462346185.19a02bdebe1cca4eae5509a62fdd217d., hdfs => hdfs://nameservice1/hbase/data/default/my_weird_table/19a02bdebe1cca4eae5509a62fdd217d, deployed => , replicaId => 0 } not deployed on any region server.
ERROR: Region { meta => my_weird_table,4f5c0e14,1447343972523.69fa4ad7a33868e938f25e5cbdb8cd08., hdfs => hdfs://nameservice1/hbase/data/default/my_weird_table/69fa4ad7a33868e938f25e5cbdb8cd08, deployed => , replicaId => 0 } not deployed on any region server.
ERROR: Region { meta => my_weird_table,b0a3cf6:,1448475527400.d4c2bda6f776be97e369371fed1ea674., hdfs => hdfs://nameservice1/hbase/data/default/my_weird_table/d4c2bda6f776be97e369371fed1ea674, deployed => , replicaId => 0 } not deployed on any region server.
16/05/06 23:03:13 INFO util.HBaseFsck: Handling overlap merges in parallel. set hbasefsck.overlap.merge.parallel to false to run serially.
ERROR: There is a hole in the region chain between 4f5c0e14 and 51eb8510. You need to create a new .regioninfo and region dir in hdfs to plug the hole.
ERROR: There is a hole in the region chain between 70a3d6f6 and 73332fbb. You need to create a new .regioninfo and region dir in hdfs to plug the hole.
ERROR: There is a hole in the region chain between b0a3cf6: and b3333313. You need to create a new .regioninfo and region dir in hdfs to plug the hole.
16/05/06 23:03:13 INFO util.HBaseFsck: Handling overlap merges in parallel. set hbasefsck.overlap.merge.parallel to false to run serially.
ERROR: Found inconsistency in table my_weird_table

Summary:
  hbase:meta is okay.
    Number of regions: 1
    Deployed on: node05.cluster.pro,60020,1451388046169
  my_weird_table is okay.
    Number of regions: 98
    Deployed on: node01.cluster.pro,60020,1453774572201 node02.cluster.pro,60020,1458087229508 node04.cluster.pro,60020,1447338864601 node05.cluster.pro,60020,1451388046169
6 inconsistencies detected.
Status: INCONSISTENT

Then I run `hbase hbck -repair my_weird_table`:

[Output omitted for brevity]

16/05/06 23:09:43 INFO util.HBaseFsck: No integrity errors. We are done with this phase. Glorious.
Number of live region servers: 5
Number of dead region servers: 0
Master: node04.cluster.pro,60000,1450130273717
Number of backup masters: 1
Average load: 167.8
Number of requests: 4884
Number of regions: 839
Number of regions in transition: 23
ERROR: Region { meta => my_weird_table,70a3d6f6,1448462346185.19a02bdebe1cca4eae5509a62fdd217d., hdfs => hdfs://nameservice1/hbase/data/default/my_weird_table/19a02bdebe1cca4eae5509a62fdd217d, deployed => , replicaId => 0 } not deployed on any region server.
Trying to fix unassigned region...
16/05/06 23:09:45 INFO util.HBaseFsckRepair: Region still in transition, waiting for it to become assigned: {ENCODED => 19a02bdebe1cca4eae5509a62fdd217d, NAME => 'my_weird_table,70a3d6f6,1448462346185.19a02bdebe1cca4eae5509a62fdd217d.', STARTKEY => '70a3d6f6', ENDKEY => '73332fbb'}
16/05/06 23:09:46 INFO util.HBaseFsckRepair: Region still in transition, waiting for it to become assigned: {ENCODED => 19a02bdebe1cca4eae5509a62fdd217d, NAME => 'my_weird_table,70a3d6f6,1448462346185.19a02bdebe1cca4eae5509a62fdd217d.', STARTKEY => '70a3d6f6', ENDKEY => '73332fbb'}
16/05/06 23:09:47 INFO util.HBaseFsckRepair: Region still in transition, waiting for it to become assigned: {ENCODED => 19a02bdebe1cca4eae5509a62fdd217d, NAME => 'my_weird_table,70a3d6f6,1448462346185.19a02bdebe1cca4eae5509a62fdd217d.', STARTKEY => '70a3d6f6', ENDKEY => '73332fbb'}
ERROR: Region { meta => my_weird_table,4f5c0e14,1447343972523.69fa4ad7a33868e938f25e5cbdb8cd08., hdfs => hdfs://nameservice1/hbase/data/default/my_weird_table/69fa4ad7a33868e938f25e5cbdb8cd08, deployed => , replicaId => 0 } not deployed on any region server.
Trying to fix unassigned region...
16/05/06 23:09:48 INFO util.HBaseFsckRepair: Region still in transition, waiting for it to become assigned: {ENCODED => 69fa4ad7a33868e938f25e5cbdb8cd08, NAME => 'my_weird_table,4f5c0e14,1447343972523.69fa4ad7a33868e938f25e5cbdb8cd08.', STARTKEY => '4f5c0e14', ENDKEY => '51eb8510'}
16/05/06 23:09:49 INFO util.HBaseFsckRepair: Region still in transition, waiting for it to become assigned: {ENCODED => 69fa4ad7a33868e938f25e5cbdb8cd08, NAME => 'my_weird_table,4f5c0e14,1447343972523.69fa4ad7a33868e938f25e5cbdb8cd08.', STARTKEY => '4f5c0e14', ENDKEY => '51eb8510'}
ERROR: Region { meta => my_weird_table,b0a3cf6:,1448475527400.d4c2bda6f776be97e369371fed1ea674., hdfs => hdfs://nameservice1/hbase/data/default/my_weird_table/d4c2bda6f776be97e369371fed1ea674, deployed => , replicaId => 0 } not deployed on any region server.
Trying to fix unassigned region...
16/05/06 23:09:50 INFO util.HBaseFsckRepair: Region still in transition, waiting for it to become assigned: {ENCODED => d4c2bda6f776be97e369371fed1ea674, NAME => 'my_weird_table,b0a3cf6:,1448475527400.d4c2bda6f776be97e369371fed1ea674.', STARTKEY => 'b0a3cf6:', ENDKEY => 'b3333313'}
16/05/06 23:09:51 INFO util.HBaseFsckRepair: Region still in transition, waiting for it to become assigned: {ENCODED => d4c2bda6f776be97e369371fed1ea674, NAME => 'my_weird_table,b0a3cf6:,1448475527400.d4c2bda6f776be97e369371fed1ea674.', STARTKEY => 'b0a3cf6:', ENDKEY => 'b3333313'}
16/05/06 23:09:52 INFO util.HBaseFsck: Handling overlap merges in parallel. set hbasefsck.overlap.merge.parallel to false to run serially.
ERROR: There is a hole in the region chain between 4f5c0e14 and 51eb8510. You need to create a new .regioninfo and region dir in hdfs to plug the hole.
ERROR: There is a hole in the region chain between 70a3d6f6 and 73332fbb. You need to create a new .regioninfo and region dir in hdfs to plug the hole.
ERROR: There is a hole in the region chain between b0a3cf6: and b3333313. You need to create a new .regioninfo and region dir in hdfs to plug the hole.
16/05/06 23:09:52 INFO util.HBaseFsck: Handling overlap merges in parallel. set hbasefsck.overlap.merge.parallel to false to run serially.
ERROR: Found inconsistency in table my_weird_table
16/05/06 23:09:59 INFO zookeeper.RecoverableZooKeeper: Process identifier=hbaseFsck connecting to ZooKeeper ensemble=node04.cluster.pro:2181,node01.cluster.pro:2181,node05.cluster.pro:2181
16/05/06 23:09:59 INFO zookeeper.ClientCnxn: EventThread shut down

Summary:
  hbase:meta is okay.
    Number of regions: 1
    Deployed on: node05.cluster.pro,60020,1451388046169
  my_weird_table is okay.
    Number of regions: 98
    Deployed on: node01.cluster.pro,60020,1453774572201 node02.cluster.pro,60020,1458087229508 node04.cluster.pro,60020,1447338864601 node05.cluster.pro,60020,1451388046169
6 inconsistencies detected.
Status: INCONSISTENT

16/05/06 23:10:00 INFO util.HBaseFsck: Sleeping 10000ms before re-checking after fix...
Version: 1.0.0-cdh5.4.4
16/05/06 23:10:10 INFO util.HBaseFsck: Loading regioninfos HDFS
16/05/06 23:10:10 INFO util.HBaseFsck: Loading HBase regioninfo from HDFS...
16/05/06 23:10:10 INFO util.HBaseFsck: Checking HBase region split map from HDFS data...
16/05/06 23:10:10 INFO util.HBaseFsck: Handling overlap merges in parallel. set hbasefsck.overlap.merge.parallel to false to run serially.
16/05/06 23:10:10 INFO util.HBaseFsck: Handling overlap merges in parallel. set hbasefsck.overlap.merge.parallel to false to run serially.
16/05/06 23:10:10 INFO util.HBaseFsck: No integrity errors. We are done with this phase. Glorious.
Number of live region servers: 5
Number of dead region servers: 0
Master: node04.cluster.pro,60000,1450130273717
Number of backup masters: 1
Average load: 167.8
Number of requests: 4884
Number of regions: 839
Number of regions in transition: 23
16/05/06 23:10:10 INFO util.HBaseFsck: Loading regionsinfo from the hbase:meta table
Number of empty REGIONINFO_QUALIFIER rows in hbase:meta: 0
16/05/06 23:10:11 INFO util.HBaseFsck: getHTableDescriptors == tableNames => [my_weird_table]
Number of Tables: 1
16/05/06 23:10:18 INFO zookeeper.ClientCnxn: EventThread shut down

Summary:
  hbase:meta is okay.
    Number of regions: 1
    Deployed on: node05.cluster.pro,60020,1451388046169
  my_weird_table is okay.
    Number of regions: 101
    Deployed on: node01.cluster.pro,60020,1453774572201 node02.cluster.pro,60020,1458087229508 node03.cluster.pro,60020,1461244112276 node04.cluster.pro,60020,1447338864601 node05.cluster.pro,60020,1451388046169
0 inconsistencies detected.
Status: OK

So: why are some regions not served? Why does -repair help? What makes my table broken and partially unavailable?
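PS, for anyone searching this later: as far as I can tell, the "hole in the region chain" error means hbck sorted the table's regions by start key and found a place where one region's end key does not match the next region's start key, i.e. a key range that no region covers. A rough sketch of that check (the key pairs below are made up for illustration, not my real region boundaries):

```shell
# Sketch of hbck's region-chain check: each region's ENDKEY should equal
# the next region's STARTKEY; a mismatch means a "hole" (uncovered range).
# Columns: STARTKEY ENDKEY, sorted by STARTKEY; keys are illustrative.
regions='4d000000 4f5c0e14
4f5c0e14 51eb8510
51eb8510 54000000
70a3d6f6 73332fbb'   # previous region ends at 54000000 -> hole before 70a3d6f6

printf '%s\n' "$regions" | awk '
    NR > 1 && $1 != prev_end {
        print "hole in the region chain between " prev_end " and " $1
    }
    { prev_end = $2 }'
# -> hole in the region chain between 54000000 and 70a3d6f6
```

If that reading is right, any key falling into such a gap has no region to serve it, which would fit the errors I'm seeing on reads against this table.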