Thank you, Viraj and Andrew, the blog posts are outstanding! I think we should also have a part 3, about the ServerCrashProcedure (SCP) :)

In 2.0 and 2.1, we used MoveRegionProcedure, AssignRegionProcedure and UnassignRegionProcedure. One of the reasons we removed them all and introduced a single TRSP (TransitRegionStateProcedure) to do assign/unassign/move/reopen is SCP.

If a region server crashes, we obviously cannot assign regions to it any more, so we need a way to stop the procedures that are still trying to assign regions to the dead server. And even to unassign a region, we still need to bring it online first and then unassign it. For example, when disabling a table, we must make sure that all the data in the memstore has been flushed to storage, so we need to bring the region online and then do a clean close.

In 2.0 and 2.1, we had three procedures for region assignment, and there were lots of corner cases when we wanted to interrupt them from SCP, which made the code buggy and really hard to understand. So finally, we introduced TRSP to replace them all, and now SCP only needs to interrupt one type of procedure.

This is the story :)

I could help if you guys want to write the part 3 about SCP :)

Thanks.

On Wed, Sep 8, 2021 at 2:27 AM, Viraj Jasani <vjas...@apache.org> wrote:

> As some of the HBase users are still running HBase 1.x versions in their
> production environment, and branch-1 is trending toward EOL, now is really
> the right time to evaluate as well as understand the features and core
> design changes provided by HBase 2.x versions.
>
> As the majority of us are already aware, one of the key features with
> significant architectural changes provided by HBase 2 is
> AssignmentManagerV2 (AMv2).
> However, we don't seem to have one place explaining 1) *the evolution
> of AM* and 2) how it manages region assignments with better scalability,
> reliability and fault-tolerance.
> Keeping this in mind, Andrew and I have published a series of two-part
> blog posts explaining this evolution.
> Part 1 provides a) some basic introduction to HBase concepts, and b) AM
> and its shortcomings from previous versions that AMv2 is trying to
> resolve. Part 2 provides detailed info about Pv2 and how AMv2 leverages
> it, and also state diagrams explaining some of the complex region
> assignment workflows. The intention of the state diagrams is for dev/users
> to be able to a) understand region assignment workflows in-depth, b) walk
> through the code more easily and c) debug and root cause issues with
> better knowledge.
>
> Part 1:
> https://engineering.salesforce.com/evolution-of-region-assignment-in-the-apache-hbase-architecture-part-1-c43b1becc522
> Part 2:
> https://engineering.salesforce.com/evolution-of-region-assignment-in-the-apache-hbase-architecture-part-2-9568fb3790b