Thanks for your response, Wellington. Unfortunately, the hbck2 assigns method does not work here because the table descriptor is missing, both in the meta table and in memory. The actual table and most of its regions had already been dropped successfully; when you try to assign the remaining ghost regions, they get stuck exactly as described in HBASE-22780.
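
For reference, the failing attempt looks roughly like this (the hbck2 jar path and the encoded region names below are placeholders):

  # ask the master to assign the leftover ghost regions; the procedures get
  # scheduled but hang, since the region servers reject regions that have no
  # table descriptor (same symptom as HBASE-22780)
  hbase hbck -j /path/to/hbase-hbck2.jar assigns <encoded_region_1> <encoded_region_2>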
One way to get rid of those regions is to create a new table with the old table's name. Suppose you have 4 ghost regions: if you create a one-region table, those 4 ghosts attach to it, forming a 5-region table, and after that you can disable and drop the table successfully. However, since we have many tables and regions in this state, it is very hard to track them all down one by one. To cut a long story short, the hmaster assumes its in-memory representation of the meta table is intact, but in fact it is not. I need a way to force all masters to rebuild their in-memory representations from a clean hbase:meta table. Does a rolling restart of all masters do that, or do I have to shut all masters down to force them to go through initialization on startup?
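
For reference, the per-table cleanup described above looks roughly like this in the hbase shell ('old_table' and 'cf' are placeholders):

  # recreate the dropped table under its old name; the ghost regions attach to it
  create 'old_table', 'cf'
  # then disable and drop it, which takes the ghost regions with it
  disable 'old_table'
  drop 'old_table'
  # check that no region rows are left for that table in meta
  scan 'hbase:meta', {ROWPREFIXFILTER => 'old_table,'}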
Wellington Chevreuil <[email protected]> wrote on Tue, Aug 4, 2020 at 16:42:

> > if you use hbck2 to bypass those locks but leave them as they are, it is
> > only a cosmetic move; the regions do not really come online
>
> You can use the hbck2 *assigns* method to bring those regions online (it
> accepts multiple regions as input).
>
> > I've read that master processes keep an in-memory representation of the
> > hbase:meta table
>
> Yes, masters read the meta table only during initialisation. From there
> onwards, since every change to meta is orchestrated by the active master,
> it assumes its in-memory representation of the meta table is the truth.
> What exact steps did you follow when you say you dropped those ghost
> regions? If that involved any manual deletion of region dirs/files in hdfs,
> or direct manipulation of the meta table via the client API, then that
> explains the master inconsistency.
>
> On Tue, Aug 4, 2020 at 12:51, jackie macmillian <[email protected]>
> wrote:
>
> > Hi all,
> >
> > We have a cluster running hbase 2.2.0 on hadoop 2.9.2.
> > A few weeks ago we had some issues with our active/standby namenode
> > selection, caused by network problems and the competition between the
> > zkfc services to elect the active namenode. As a result, both namenodes
> > became active for a short time and all region server services restarted
> > themselves. We managed to solve that issue by adjusting some timeout
> > parameters, but the real story began afterwards.
> > After the region servers finished restarting, we saw that all our hbase
> > tables had become unstable. For example, take a table with 200 regions:
> > 196 of them came online, but 4 got stuck in an intermediate state like
> > closing/opening, and in the end the tables got stuck in
> > disabling/enabling states. On top of that, hbase had lots of procedure
> > locks and the masterprocwals directory kept growing.
> > To overcome that, I used hbck2 to release the stuck regions, and once I
> > managed to enable a table, I created an empty copy of it from its
> > descriptor and bulk loaded all the hfiles of the corrupt table into the
> > new one. At this point you might ask why I did not simply keep using the
> > re-enabled table. I couldn't, because although I was able to bypass the
> > locked procedures, there were far too many of them to resolve one by
> > one. If you use hbck2 to bypass those locks but leave them as they are,
> > it is only a cosmetic move; the regions do not really come online. So I
> > thought it would be much faster to create a brand new table and load all
> > the data into it. The bulk load was successful and the new table became
> > online and scannable.
> > The next step was to disable the old table and drop it. But since the
> > hmaster was dealing with lots of locks and procedures, I wasn't able to
> > disable it; some regions got stuck in disabling state again. So I set
> > that table's state to disabled with hbck2 and then succeeded in dropping
> > it.
> > After I had brought all my tables online and dropped all the old tables,
> > masterprocwals was the last stop on the way to a clean hbase, I thought
> > :) I moved the masterprocwals directory aside and restarted the active
> > master. The new master took control and voila: master procedures and
> > locks were cleared, and all my tables were online as needed! I scanned
> > the hbase:meta table and saw no regions other than the online ones.
> > Until now.. Remember those regions that were stuck and had to be forced
> > closed so the tables could be disabled and dropped? Now, when a region
> > server crashes and restarts for some reason, the master tries to assign
> > those regions to region servers, but the region servers decline the
> > assignment because there is no table descriptor for those regions. Take
> > a look at HBASE-22780 <https://issues.apache.org/jira/browse/HBASE-22780>;
> > exactly the same problem is reported there.
> > I tried to create a one-region table with the same name as the old
> > table. It succeeded, and the ghost region attached itself to that table.
> > Then I disabled and dropped it again successfully, and confirmed again
> > that hbase:meta no longer contains that region. But after a region
> > server crash it comes back out of nowhere. So I figured out that when a
> > region server goes down, the hmaster does not read the hbase:meta table
> > to assign that server's regions to other servers. I've read that master
> > processes keep an in-memory representation of the hbase:meta table in
> > order to perform assignments as fast as possible. I can clean hbase:meta
> > of those ghost regions as explained, but I also have to force the
> > masters to load that clean copy of hbase:meta into their in-memory
> > representations. How can I achieve that? Assume I have cleaned the meta
> > table; now what? A rolling restart of the hmasters? Do standby masters
> > share the same in-memory meta table with the active one? If so, I think
> > a rolling restart wouldn't solve the problem.. Or should I shut all
> > masters down and then start them again, to force them to rebuild their
> > in-memory state from the meta table?
> > Any help would be appreciated.
> > Thank you for your patience :)
> >
> > jackie
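
P.S. Since you asked about the exact steps: the cleanup in my first mail was done with commands roughly like the ones below (the hbck2 jar path, pids, table names and HDFS paths are placeholders, not the exact ones I used):

  # bypass the stuck procedures (with override) so the tables could move on
  hbase hbck -j /path/to/hbase-hbck2.jar bypass -o 1234 5678

  # force the old table's state to DISABLED so it could be dropped
  hbase hbck -j /path/to/hbase-hbck2.jar setTableState old_table DISABLED

  # bulk load the old table's hfiles into the freshly created empty copy
  # (the input dir contains the column-family subdirectories)
  hbase org.apache.hadoop.hbase.tool.LoadIncrementalHFiles /tmp/old_table_hfiles new_table

  # move the master procedure WALs aside before restarting the active master
  hdfs dfs -mv /hbase/MasterProcWALs /hbase/MasterProcWALs.bak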
