Region stuck in transition - Cannot repair

Henning Blohm Tue, 10 May 2016 09:23:18 -0700

While running with dfs.client.read.shortcircuit set to true, I ran into an OOM on a region server, which subsequently died.


This was probably due to too little direct memory being configured.

However, after bringing the cluster up again, one region of a table got stuck in transition. More specifically, the master says:

---

6400e1626085724ae20b2a6fa1914db8 tt_locks,,1461919149434.6400e1626085724ae20b2a6fa1914db8. state=FAILED_CLOSE, ts=Tue May 10 17:58:29 CEST 2016 (0s ago), server=hb-desktop,16201,1462895637261

---
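For anyone who wants to pull the fields out of that status line programmatically, here is a minimal Python sketch. It assumes the fields are separated by spaces in the order encoded name, region name, state, timestamp, server; the master UI may lay them out differently. `parse_rit` and the pattern are hypothetical names of my own, not part of HBase.

```python
import re

# Region-in-transition line as quoted from the master above.
# Assumption: fields are separated by single spaces.
RIT_LINE = (
    "6400e1626085724ae20b2a6fa1914db8 "
    "tt_locks,,1461919149434.6400e1626085724ae20b2a6fa1914db8. "
    "state=FAILED_CLOSE, ts=Tue May 10 17:58:29 CEST 2016 (0s ago), "
    "server=hb-desktop,16201,1462895637261"
)

# Assumed layout:
#   <encoded name> <region name> state=<STATE>, ts=<time>, server=<host,port,startcode>
RIT_PATTERN = re.compile(
    r"^(?P<encoded>[0-9a-f]{32})\s+"
    r"(?P<region>\S+)\s+"
    r"state=(?P<state>[A-Z_]+),\s*"
    r"ts=(?P<ts>.*?),\s*"
    r"server=(?P<server>\S+)$"
)


def parse_rit(line):
    """Return the fields of a region-in-transition line as a dict, or None."""
    match = RIT_PATTERN.match(line.strip())
    return match.groupdict() if match else None


info = parse_rit(RIT_LINE)
if info is not None:
    print(info["region"], "is in state", info["state"], "on", info["server"])
```

This only reads the text; it does not talk to the master or change any cluster state.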

Running hbase hbck, I get:

---

ERROR: Region { meta => tt_locks,,1461919149434.6400e1626085724ae20b2a6fa1914db8., hdfs => hdfs://localhost:9000/hbase/data/default/tt_locks/6400e1626085724ae20b2a6fa1914db8, deployed => , replicaId => 0 } not deployed on any region server.
ERROR: There is a hole in the region chain between and . You need to create a new .regioninfo and region dir in hdfs to plug the hole.

---
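Since the hbck messages tend to run together when pasted, here is a small Python sketch that splits such text into one entry per ERROR message. It assumes every message starts with the literal marker "ERROR:"; `split_hbck_errors` is a hypothetical helper of my own, not something hbck provides.

```python
import re

# hbck output as quoted above, with both messages run together on one line.
HBCK_OUTPUT = (
    "ERROR: Region { meta => tt_locks,,1461919149434.6400e1626085724ae20b2a6fa1914db8., "
    "hdfs => hdfs://localhost:9000/hbase/data/default/tt_locks/6400e1626085724ae20b2a6fa1914db8, "
    "deployed => , replicaId => 0 } not deployed on any region server."
    "ERROR: There is a hole in the region chain between and . "
    "You need to create a new .regioninfo and region dir in hdfs to plug the hole."
)


def split_hbck_errors(text):
    """Return each 'ERROR:' message as its own string."""
    # Zero-width split just before every 'ERROR:' marker (needs Python 3.7+).
    chunks = re.split(r"(?=ERROR:)", text)
    return [c.strip() for c in chunks if c.strip().startswith("ERROR:")]


errors = split_hbck_errors(HBCK_OUTPUT)
for message in errors:
    print(message)
```

The split only separates the messages for reading; it does not interpret or repair anything.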

But all the tables are listed as "ok".

Any attempt to repair seems to have no effect. Worse, the region server is trying like crazy to get that region opened and runs into an OOM after a few minutes.

(It keeps saying "Started memstore flush for..." but never seems to get anywhere.)

There is very little load really: 76 regions, 212 store files, and I allowed for 1.5G heap and 1.5G direct memory.

After disabling dfs.client.read.shortcircuit, at least there is no OOM anymore.

I have the vague suspicion that that stupid region should simply be dropped, but I have no idea how to fix this.

As we will go into production with this system shortly, any help would be great!!


Thanks,
Henning
