I pored over a few JIRAs, and this looks like an issue many of you may have
already seen, though I am not sure. Please let me know if you have.
We are currently having problems with our cluster, which I mentioned briefly
in an earlier mail titled "Unassigned holes in tables".
We use a patched version of HBase 0.90.0.
We have started to observe a few unassigned regions in our tables (5 out of
30K regions, to be exact). Here is what we have observed so far:
1. All of these regions show up in META. Each is a daughter region produced
by a split. The parent region appears in META as offlined, yet it still has a
server info entry. One daughter region is assigned correctly; the other
daughter is the one in trouble. I have provided an example below.
2. Note that these are not the only splits that happened during this period.
(We effectively disable splits by setting a large max file size, but some of
our regions still occasionally reach that size.)
3. Call the parent region P, the assigned daughter SPLIT1, and the unassigned
daughter SPLIT2. The master logs show that the master assigned SPLIT1, but I
see no trace of SPLIT2. I also do not see the master assigning P.
4. What bothers me is that these problems persist even after restarting HBase
(once together with ZK). Doesn't the master do region assignment by
periodically scanning the META table? The master logs show no trace of these
regions, although I can see the other daughter region.
META scan of a sample problem region (purely as an illustration):
PARENT:
column=info:regioninfo, timestamp=1315958212365, value=REGION => {NAME =>
'WCC,BLAH1.X.Y', STARTKEY => 'BLAH1', ENDKEY => 'BLAH3', ENCODED =>
bedc64dd9c56f8e072e745a3cbedc2d1, OFFLINE => true, SPLIT => true, TABLE =>
{BLAH}}
column=info:server, timestamp=1314142909658, value=<node1>:<port>
column=info:serverstartcode, timestamp=1314142909658, value=1314141232997
DAUGHTER 1 info: (Note that there is no region server info or similar.)
column=info:splitA, timestamp=1314196650274, value=REGION => {NAME =>
'WCC,BLAH1.X1.Y1', STARTKEY => 'BLAH1', ENDKEY => 'BLAH2', ENCODED =>
689e3ae01fe8f6ffe0591a0078fbe362, TABLE => {BLAH}}
DAUGHTER 2 info: (There is region server info here.)
column=info:regioninfo, timestamp=1315958212367, value=REGION => {NAME =>
'WCC,BLAH2.X2.Y2', STARTKEY => 'BLAH2', ENDKEY => 'BLAH3', ENCODED
=> 277007da50d04a9d71dad94de32ad876, TABLE => {BLAH}}
column=info:server, timestamp=1316205878425, value=<node2>:<port>
column=info:serverstartcode, timestamp=1316205878425, value=1316205773600
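For anyone who wants to check their own cluster for the same pattern, here is
a minimal sketch of the check, assuming the META rows have already been dumped
(for example, from a shell scan) into a plain Python dict mapping each row key
to {column: value}. The function name find_unassigned_daughters and this data
layout are hypothetical. The sketch also simplifies info:splitA/info:splitB to
the daughter's row name, whereas real META stores a serialized HRegionInfo
there, and the info:splitB entry in the sample is an assumption because it does
not appear in the scan above.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified model of META rows: row key -> {column: value}.
# In real META, info:splitA/info:splitB hold serialized HRegionInfo objects;
# here they are reduced to the daughter's row name for illustration.
MetaRows = Dict[str, Dict[str, str]]


def find_unassigned_daughters(meta: MetaRows) -> List[Tuple[str, str]]:
    """Return (parent, daughter) pairs where a split daughter has no info:server.

    A daughter counts as unassigned if its row is missing entirely or if the
    row carries no info:server column.
    """
    problems = []
    for parent, cols in sorted(meta.items()):
        for split_col in ("info:splitA", "info:splitB"):
            daughter = cols.get(split_col)
            if daughter is None:
                continue  # not a split parent, or this daughter is not recorded
            daughter_cols = meta.get(daughter, {})
            if "info:server" not in daughter_cols:
                problems.append((parent, daughter))
    return problems


if __name__ == "__main__":
    # Shaped like the sample scan, with placeholder values; the info:splitB
    # entry is assumed, since the scan above does not show it.
    meta = {
        "WCC,BLAH1.X.Y": {
            "info:regioninfo": "OFFLINE => true, SPLIT => true",
            "info:server": "<node1>:<port>",
            "info:splitA": "WCC,BLAH1.X1.Y1",
            "info:splitB": "WCC,BLAH2.X2.Y2",
        },
        "WCC,BLAH1.X1.Y1": {"info:regioninfo": "STARTKEY => 'BLAH1', ENDKEY => 'BLAH2'"},
        "WCC,BLAH2.X2.Y2": {
            "info:regioninfo": "STARTKEY => 'BLAH2', ENDKEY => 'BLAH3'",
            "info:server": "<node2>:<port>",
        },
    }
    print(find_unassigned_daughters(meta))
```

On the sample above this flags only the BLAH1..BLAH2 daughter, which matches
what we see: one daughter assigned, the other with no server entry.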
Cheers
V