Re: Blog post series on "Evolution of Region assignment in HBase architecture"

Viraj Jasani Fri, 08 Oct 2021 00:14:58 -0700

Part 3 of the blog series is now published.
Thanks to the co-writers: Duo Zhang and Andrew Purtell.


Part 3:
https://engineering.salesforce.com/evolution-of-region-assignment-in-the-apache-hbase-architecture-part-3-e03b814ae92

On Mon, Sep 13, 2021 at 10:53 PM Viraj Jasani <vjas...@apache.org> wrote:

> Thanks Duo for your offer to coordinate on writing "Part 3" of this
> series, sounds great!
> Although I see TRSP#assign being used by SCP directly while assigning the
> regions, I have yet to take a detailed look at HBASE-20881
> <https://issues.apache.org/jira/browse/HBASE-20881> and the relevant
> work. Let me reach out to you over Slack and we can take it from there.
>
> On Sun, Sep 12, 2021 at 7:02 PM 张铎(Duo Zhang) <palomino...@gmail.com>
> wrote:
>
>> Thank you Viraj and Andrew, the blog posts are outstanding!
>>
>> And I think we should have a part 3 about the ServerCrashProcedure (SCP) :)
>>
>> In 2.0 and 2.1, we used MoveRegionProcedure, AssignRegionProcedure and
>> UnassignRegionProcedure. One of the reasons we removed them all and
>> introduced a single TRSP to do assign/unassign/move/reopen is SCP.
>>
>> If a region server crashes, we obviously cannot assign regions to it any
>> more, so we need a way to stop the procedures that are still trying
>> to assign regions to the dead server. And even for unassigning a region,
>> we still need to bring it online first and then unassign it. For example,
>> when disabling a table, we must make sure that all the data in the memstore
>> has been flushed to storage, so we need to bring the region online and then
>> do a clean close.
>> In 2.0 and 2.1, we had 3 procedures for region assignment, and there were
>> lots of corner cases when we wanted to interrupt them from SCP, which made
>> the code really hard to understand and buggy. So finally, we introduced
>> TRSP to replace them all, and SCP now only needs to interrupt one type of
>> procedure.
>>
>> This is the story :)
>>
>> I could help if you guys want to write the part 3 about SCP :)
>>
>> Thanks.
>>
>> On Wed, Sep 8, 2021 at 2:27 AM Viraj Jasani <vjas...@apache.org> wrote:
>>
>> > As some HBase users are still running HBase 1.x versions in their
>> > production environments, and branch-1 is trending toward EOL, now is
>> > really the right time to evaluate and understand the features and core
>> > design changes provided by HBase 2.x versions.
>> >
>> > As the majority of us are already aware, one of the key features with
>> > significant architectural changes provided by HBase 2 is
>> > AssignmentManagerV2 (AMv2).
>> > However, we don't seem to have one place explaining 1) *the evolution
>> > of AM* and 2) how it manages region assignments with better scalability,
>> > reliability and fault-tolerance.
>> > Keeping this in mind, Andrew and I have published a two-part series of
>> > blog posts explaining this evolution. Part 1 provides a) a basic
>> > introduction to HBase concepts, and b) AM and its shortcomings in previous
>> > versions that AMv2 is trying to resolve. Part 2 provides detailed info
>> > about Pv2 and how AMv2 leverages it, and also state diagrams explaining
>> > some of the complex region assignment workflows. The intention of the state
>> > diagrams is for devs/users to be able to a) understand region assignment
>> > workflows in depth, b) walk through the code more easily and c) debug and
>> > root-cause issues with better knowledge.
>> >
>> > Part 1:
>> > https://engineering.salesforce.com/evolution-of-region-assignment-in-the-apache-hbase-architecture-part-1-c43b1becc522
>> > Part 2:
>> > https://engineering.salesforce.com/evolution-of-region-assignment-in-the-apache-hbase-architecture-part-2-9568fb3790b
>> >
>>
>
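
Duo's point in the quoted thread, that crash handling gets simpler when there is only one kind of region procedure to interrupt, can be illustrated with a small toy model. This is only a sketch under loud assumptions: `ScpSketch`, `RegionTransition`, `TransitionType`, `onServerCrash` and `handleCrash` are hypothetical names made up for illustration and are not the HBase API, and the real TRSP and SCP do far more (persistence, locking, retries).

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: every class and method here is hypothetical, NOT the HBase API.
public class ScpSketch {

    /** The operations that a single region-transition procedure type covers. */
    enum TransitionType { ASSIGN, UNASSIGN, MOVE, REOPEN }

    /** Stand-in for a TRSP-like procedure acting on one region via one server. */
    static final class RegionTransition {
        final String region;
        final TransitionType type;
        String targetServer;   // null means "no server chosen, pick a new one"
        boolean interrupted;

        RegionTransition(String region, TransitionType type, String targetServer) {
            this.region = region;
            this.type = type;
            this.targetServer = targetServer;
        }

        /**
         * Crash notification. If this procedure was working against the dead
         * server, forget that server so a later retry must choose another one.
         * Returns true when the procedure was affected.
         */
        boolean onServerCrash(String deadServer) {
            if (!deadServer.equals(targetServer)) {
                return false;
            }
            targetServer = null;
            interrupted = true;
            return true;
        }
    }

    /**
     * Stand-in for SCP: a single loop over a single procedure type, with no
     * per-type special cases for assign, unassign or move.
     * Returns how many in-flight procedures were interrupted.
     */
    static int handleCrash(String deadServer, List<RegionTransition> inFlight) {
        int affected = 0;
        for (RegionTransition t : inFlight) {
            if (t.onServerCrash(deadServer)) {
                affected++;
            }
        }
        return affected;
    }

    public static void main(String[] args) {
        List<RegionTransition> inFlight = new ArrayList<>();
        inFlight.add(new RegionTransition("region-a", TransitionType.ASSIGN, "rs1"));
        inFlight.add(new RegionTransition("region-b", TransitionType.UNASSIGN, "rs2"));
        inFlight.add(new RegionTransition("region-c", TransitionType.MOVE, "rs1"));

        int affected = handleCrash("rs1", inFlight);
        System.out.println("interrupted procedures: " + affected);
    }
}
```

With three separate procedure classes, `handleCrash` would need a branch (and its own corner cases) per class; with one type, the crash handler only has to know one interruption contract.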
