Thanks for your response, Wellington. Unfortunately, the hbck2 assigns method does not work here because the table descriptor is missing, both in the meta table and in memory. The actual table and most of its regions had already been dropped successfully; when you try to assign the remaining ghost regions, they get stuck exactly as described in HBASE-22780.
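
For reference, the failing attempt looks roughly like this (the hbck2 jar path and the encoded region names below are placeholders):

  # ask the master to assign the leftover ghost regions; the procedures get
  # scheduled but hang, since the region servers reject regions that have no
  # table descriptor (same symptom as HBASE-22780)
  hbase hbck -j /path/to/hbase-hbck2.jar assigns <encoded_region_1> <encoded_region_2>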
One way to get rid of those regions is to create a new table with the old table's name. Suppose you have 4 ghost regions: if you create a one-region table, those 4 ghosts attach to it, forming a 5-region table, and after that you can disable and drop the table successfully. However, since we have many tables and regions in this state, it is very hard to track them all down one by one. To cut a long story short, the hmaster assumes its in-memory representation of the meta table is intact, but in fact it is not. I need a way to force all masters to rebuild their in-memory representations from a clean hbase:meta table. Does a rolling restart of all masters do that, or do I have to shut all masters down to force them to go through initialization on startup?
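
For reference, the per-table cleanup described above looks roughly like this in the hbase shell ('old_table' and 'cf' are placeholders):

  # recreate the dropped table under its old name; the ghost regions attach to it
  create 'old_table', 'cf'
  # then disable and drop it, which takes the ghost regions with it
  disable 'old_table'
  drop 'old_table'
  # check that no region rows are left for that table in meta
  scan 'hbase:meta', {ROWPREFIXFILTER => 'old_table,'}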
Wellington Chevreuil <[email protected]> wrote on Tue, Aug 4, 2020 at 16:42:

> > if you use hbck2 to bypass those locks but leave them as they are, it is
> > only a cosmetic move; the regions do not really come online
>
> You can use the hbck2 *assigns* method to bring those regions online (it
> accepts multiple regions as input).
>
> > I've read that master processes keep an in-memory representation of the
> > hbase:meta table
>
> Yes, masters read the meta table only during initialisation. From there
> onwards, since every change to meta is orchestrated by the active master,
> it assumes its in-memory representation of the meta table is the truth.
> What exact steps did you follow when you say you dropped those ghost
> regions? If that involved any manual deletion of region dirs/files in hdfs,
> or direct manipulation of the meta table via the client API, then that
> explains the master inconsistency.
>
> On Tue, Aug 4, 2020 at 12:51, jackie macmillian <[email protected]>
> wrote:
>
> > Hi all,
> >
> > We have a cluster running hbase 2.2.0 on hadoop 2.9.2.
> > A few weeks ago we had some issues with our active/standby namenode
> > selection, caused by network problems and the competition between the
> > zkfc services to elect the active namenode. As a result, both namenodes
> > became active for a short time and all region server services restarted
> > themselves. We managed to solve that issue by adjusting some timeout
> > parameters, but the real story began afterwards.
> > After the region servers finished restarting, we saw that all our hbase
> > tables had become unstable. For example, take a table with 200 regions:
> > 196 of them came online, but 4 got stuck in an intermediate state like
> > closing/opening, and in the end the tables got stuck in
> > disabling/enabling states. On top of that, hbase had lots of procedure
> > locks and the masterprocwals directory kept growing.
> > To overcome that, I used hbck2 to release the stuck regions, and once I
> > managed to enable a table, I created an empty copy of it from its
> > descriptor and bulk loaded all the hfiles of the corrupt table into the
> > new one. At this point you might ask why I did not simply keep using the
> > re-enabled table. I couldn't, because although I was able to bypass the
> > locked procedures, there were far too many of them to resolve one by
> > one. If you use hbck2 to bypass those locks but leave them as they are,
> > it is only a cosmetic move; the regions do not really come online. So I
> > thought it would be much faster to create a brand new table and load all
> > the data into it. The bulk load was successful and the new table became
> > online and scannable.
> > The next step was to disable the old table and drop it. But since the
> > hmaster was dealing with lots of locks and procedures, I wasn't able to
> > disable it; some regions got stuck in disabling state again. So I set
> > that table's state to disabled with hbck2 and then succeeded in dropping
> > it.
> > After I had brought all my tables online and dropped all the old tables,
> > masterprocwals was the last stop on the way to a clean hbase, I thought
> > :) I moved the masterprocwals directory aside and restarted the active
> > master. The new master took control and voila: master procedures and
> > locks were cleared, and all my tables were online as needed! I scanned
> > the hbase:meta table and saw no regions other than the online ones.
> > Until now.. Remember those regions that were stuck and had to be forced
> > closed so the tables could be disabled and dropped? Now, when a region
> > server crashes and restarts for some reason, the master tries to assign
> > those regions to region servers, but the region servers decline the
> > assignment because there is no table descriptor for those regions. Take
> > a look at HBASE-22780 <https://issues.apache.org/jira/browse/HBASE-22780>;
> > exactly the same problem is reported there.
> > I tried to create a one-region table with the same name as the old
> > table. It succeeded, and the ghost region attached itself to that table.
> > Then I disabled and dropped it again successfully, and confirmed again
> > that hbase:meta no longer contains that region. But after a region
> > server crash it comes back out of nowhere. So I figured out that when a
> > region server goes down, the hmaster does not read the hbase:meta table
> > to assign that server's regions to other servers. I've read that master
> > processes keep an in-memory representation of the hbase:meta table in
> > order to perform assignments as fast as possible. I can clean hbase:meta
> > of those ghost regions as explained, but I also have to force the
> > masters to load that clean copy of hbase:meta into their in-memory
> > representations. How can I achieve that? Assume I have cleaned the meta
> > table; now what? A rolling restart of the hmasters? Do standby masters
> > share the same in-memory meta table with the active one? If so, I think
> > a rolling restart wouldn't solve the problem.. Or should I shut all
> > masters down and then start them again, to force them to rebuild their
> > in-memory state from the meta table?
> > Any help would be appreciated.
> > Thank you for your patience :)
> >
> > jackie
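
P.S. Since you asked about the exact steps: the cleanup in my first mail was done with commands roughly like the ones below (the hbck2 jar path, pids, table names and HDFS paths are placeholders, not the exact ones I used):

  # bypass the stuck procedures (with override) so the tables could move on
  hbase hbck -j /path/to/hbase-hbck2.jar bypass -o 1234 5678

  # force the old table's state to DISABLED so it could be dropped
  hbase hbck -j /path/to/hbase-hbck2.jar setTableState old_table DISABLED

  # bulk load the old table's hfiles into the freshly created empty copy
  # (the input dir contains the column-family subdirectories)
  hbase org.apache.hadoop.hbase.tool.LoadIncrementalHFiles /tmp/old_table_hfiles new_table

  # move the master procedure WALs aside before restarting the active master
  hdfs dfs -mv /hbase/MasterProcWALs /hbase/MasterProcWALs.bak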
