Hey V,

What does 'hbase hbck' say?

Also, if you do a 'hadoop dfs -lsr
/hbase/<tablename>/689e3ae01fe8f6ffe0591a0078fbe362' (the encoded name of
the unassigned daughter), are there large files present, or just a
regioninfo file and no other files?

Could you also share the lsr output for the other encoded dirs (the parent
and the assigned daughter)?
/hbase/<tablename>/bedc64dd9c56f8e072e745a3cbedc2d1
/hbase/<tablename>/277007da50d04a9d71dad94de32ad876
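
If it helps to read the listings side by side, here is a minimal Python sketch. It assumes the files under each region dir have been copied by hand into hypothetical (path, size) pairs; it is not part of any HBase or Hadoop tool and only separates the regioninfo file from everything else:

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical input format: (path, size_in_bytes) for each *file* (not
# directory) under one region dir, copied by hand from `hadoop dfs -lsr`.
Listing = List[Tuple[str, int]]


@dataclass
class RegionDirSummary:
    has_regioninfo: bool      # a file named ".regioninfo" was listed
    other_files: int          # every other file (store files, etc.)
    other_bytes: int          # total size of those other files
    largest: Tuple[str, int]  # biggest other file, or ("", 0) if none


def summarize_region_dir(listing: Listing) -> RegionDirSummary:
    """Split the listing into the .regioninfo file and everything else."""
    has_regioninfo = False
    others: Listing = []
    for path, size in listing:
        if path.rsplit("/", 1)[-1] == ".regioninfo":
            has_regioninfo = True
        else:
            others.append((path, size))
    largest = max(others, key=lambda entry: entry[1]) if others else ("", 0)
    return RegionDirSummary(
        has_regioninfo=has_regioninfo,
        other_files=len(others),
        other_bytes=sum(size for _, size in others),
        largest=largest,
    )


def describe(summary: RegionDirSummary) -> str:
    """One-line answer to "large files, or a regioninfo and no files?"."""
    if summary.other_files == 0:
        return ("regioninfo only, no other files" if summary.has_regioninfo
                else "no regioninfo and no other files")
    path, size = summary.largest
    return (f"{summary.other_files} other file(s), {summary.other_bytes} bytes "
            f"total, largest {path} ({size} bytes)")


if __name__ == "__main__":
    # Made-up listing for illustration only; the size is invented.
    sample = [
        ("/hbase/<tablename>/689e3ae01fe8f6ffe0591a0078fbe362/.regioninfo", 700),
    ]
    print(describe(summarize_region_dir(sample)))  # regioninfo only, no other files
```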

Thanks,
Jon.

On Wed, Sep 21, 2011 at 10:07 AM, Vidhyashankar Venkataraman <
[email protected]> wrote:

> I pored over a few JIRAs, and this looks like an issue many of you might
> have seen already, though I am not sure. Do let me know if you have.
>
> We are currently having some problems with our cluster. I pointed this
> out briefly in an earlier mail titled "Unassigned holes in tables".
>
> We use a patched version of HBase 0.90.0.
> We have started to observe a few unassigned regions in our tables (5 out
> of 30K regions, to be exact). Some observations we have made so far:
>
>  1.  All these regions show up in META. They are all daughter regions
> created by a split. The parent region shows up in META as offlined but still
> has a serverinfo entry. One daughter region is assigned correctly; the other
> daughter region is the one in trouble. An example is provided below.
>  2.  Please note that these are not the only splits that happened during
> this period. (Splits are disabled by setting a large max file size, but some
> of our regions do occasionally reach that size.)
>  3.  Call the parent region P, the assigned daughter SPLIT1, and the
> unassigned daughter SPLIT2. The master logs show SPLIT1 being assigned, but
> I see no trace of SPLIT2. I also do not see the master assigning P.
>  4.  These problems do not go away even after restarting HBase (once
> along with ZK), which bothers me. Doesn't the master assign regions by
> scanning the META table periodically? The master logs show no trace of
> these regions (I can see the other daughter region, though).
>
> META scan of sample problem regions (purely as an illustration):
>
> PARENT:
> column=info:regioninfo, timestamp=1315958212365, value=REGION => {NAME =>
> 'WCC,BLAH1.X.Y', STARTKEY => 'BLAH1', ENDKEY => 'BLAH3', ENCODED =>
> bedc64dd9c56f8e072e745a3cbedc2d1, OFFLINE => true, SPLIT => true, TABLE =>
> {BLAH}}
>
>  column=info:server, timestamp=1314142909658, value=<node1>:<port>
>
>  column=info:serverstartcode, timestamp=1314142909658, value=1314141232997
>
>
> DAUGHTER 1 info: (Note that there isn't any region server info.)
> column=info:splitA, timestamp=1314196650274, value=REGION => {NAME =>
> 'WCC,BLAH1.X1.Y1', STARTKEY => 'BLAH1', ENDKEY => 'BLAH2', ENCODED =>
> 689e3ae01fe8f6ffe0591a0078fbe362, TABLE => {BLAH}}
>
> DAUGHTER 2 info: (There is region server info here.)
> column=info:regioninfo, timestamp=1315958212367, value=REGION => {NAME =>
> 'WCC,BLAH2.X2.Y2', STARTKEY => 'BLAH2', ENDKEY => 'BLAH3', ENCODED
> => 277007da50d04a9d71dad94de32ad876, TABLE => {BLAH}}
>
> column=info:server, timestamp=1316205878425, value=<node2>:<port>
>
> column=info:serverstartcode, timestamp=1316205878425, value=1316205773600
>
>
> Cheers
> V
>
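
The symptom in observation 1 (a split parent whose daughter has no server entry) can be checked mechanically. Below is a minimal Python sketch. It assumes a hypothetical, simplified view of META: a dict of row key to column values, where a parent's info:splitA/info:splitB hold plain daughter row keys rather than serialized HRegionInfo. It is not an HBase API and does not describe how hbck or the master work; it only flags daughters referenced by a split parent whose own row lacks info:server:

```python
from typing import Dict, List

# Hypothetical, simplified model of META: row key -> {column: value}.
# Real META values are serialized HRegionInfo/server bytes; here a parent's
# info:splitA / info:splitB values are assumed to be plain daughter row keys.
Meta = Dict[str, Dict[str, str]]


def unassigned_daughters(meta: Meta) -> List[str]:
    """Return daughters named by a parent's split columns that lack info:server."""
    missing: List[str] = []
    for cols in meta.values():
        for split_col in ("info:splitA", "info:splitB"):
            daughter = cols.get(split_col)
            if daughter is None:
                continue
            # A daughter with no row at all, or a row without info:server,
            # is not currently recorded in META as assigned to a server.
            if "info:server" not in meta.get(daughter, {}):
                missing.append(daughter)
    return missing


if __name__ == "__main__":
    # Shaped after the sample scan; the parent's splitB column and the
    # daughter rows' exact contents are assumptions made for illustration.
    meta: Meta = {
        "WCC,BLAH1.X.Y": {
            "info:server": "<node1>:<port>",
            "info:splitA": "WCC,BLAH1.X1.Y1",
            "info:splitB": "WCC,BLAH2.X2.Y2",
        },
        "WCC,BLAH1.X1.Y1": {"info:regioninfo": "..."},
        "WCC,BLAH2.X2.Y2": {
            "info:regioninfo": "...",
            "info:server": "<node2>:<port>",
        },
    }
    print(unassigned_daughters(meta))  # ['WCC,BLAH1.X1.Y1']
```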



-- 
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// [email protected]
