Hey V,

What does 'hbase hbck' say?
Also, if you do a
'hadoop dfs -lsr /hbase/<tablename>/689e3ae01fe8f6ffe0591a0078fbe362'
(the encoded name of the unassigned daughter), are there large files
present, or only a regioninfo and no files?

Could you also share the lsr of the other encoded dirs?

  /hbase/<tablename>/bedc64dd9c56f8e072e745a3cbedc2d1
  /hbase/<tablename>/277007da50d04a9d71dad94de32ad876

Thanks,
Jon.

On Wed, Sep 21, 2011 at 10:07 AM, Vidhyashankar Venkataraman
<[email protected]> wrote:

> I pored over a few JIRAs and this looks like an issue many of you might
> have seen already. I am not sure. Do let me know if you guys have.
>
> We are currently having some problems with our cluster. I had pointed it
> out briefly in a mail titled "Unassigned holes in tables".
>
> We use a patched version of HBase 0.90.0.
> Anyways, we have started to observe some (5 out of 30K regions to be
> exact) unassigned regions in our tables. Certain observations we have made
> so far:
>
> 1. All these regions show up in META. They are all daughter regions after
> a split had happened. The parent region shows up in META as offlined but
> having a serverinfo entry. And one daughter region is assigned correctly.
> The other daughter region is the one in trouble. I have provided an example
> below.
> 2. Please note that these are not the only splits that happened in this
> while. (Splits are disabled by setting a large max file size but sometimes
> some of our regions do hit these sizes).
> 3. Call the parent region P. Assigned daughter region SPLIT1 and the
> unassigned region SPLIT2. I can see that the master has assigned SPLIT1 from
> the logs but I see no trace of SPLIT2. I do not see P also being assigned by
> the master.
> 4. These problems do not go away even after restarting HBase (and once
> along with ZK) which seems to bother me. Doesn't the master do the region
> assignment by scanning the META table periodically? The master logs show no
> semblance of these regions. (I can see the other daughter region though).
>
> META scan of sample problem regions: (purely as an illustration).
>
> PARENT:
> column=info:regioninfo, timestamp=1315958212365, value=REGION => {NAME =>
> 'WCC,BLAH1.X.Y', STARTKEY => 'BLAH1', ENDKEY => 'BLAH3', ENCODED =>
> bedc64dd9c56f8e072e745a3cbedc2d1, OFFLINE => true, SPLIT => true, TABLE =>
> {BLAH}}
>
> column=info:server, timestamp=1314142909658, value=<node1>:<port>
>
> column=info:serverstartcode, timestamp=1314142909658, value=1314141232997
>
>
> DAUGHTER 1 info: (Note that there isn't any regionserver info and such)
> column=info:splitA, timestamp=1314196650274, value=REGION => {NAME =>
> 'WCC,BLAH1.X1.Y1', STARTKEY => 'BLAH1', ENDKEY => 'BLAH2', ENCODED =>
> 689e3ae01fe8f6ffe0591a0078fbe362, TABLE => {BLAH}}
>
> DAUGHTER 2 info: (There is region server info here.)
> column=info:regioninfo, timestamp=1315958212367, value=REGION => {NAME =>
> 'WCC,BLAH2.X2.Y2', STARTKEY => 'BLAH2', ENDKEY => 'BLAH3', ENCODED
> => 277007da50d04a9d71dad94de32ad876, TABLE => {BLAH}}
>
> column=info:server, timestamp=1316205878425, value=<node2>:<port>
>
> column=info:serverstartcode, timestamp=1316205878425, value=1316205773600
>
>
> Cheers
> V
>

--
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// [email protected]
