[ https://issues.apache.org/jira/browse/HBASE-28271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Work on HBASE-28271 started by Viraj Jasani.
--------------------------------------------

> Infinite waiting on lock acquisition by snapshot can result in unresponsive master
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-28271
>                 URL: https://issues.apache.org/jira/browse/HBASE-28271
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 3.0.0-alpha-4, 2.4.17, 2.5.7
>            Reporter: Viraj Jasani
>            Assignee: Viraj Jasani
>            Priority: Major
>         Attachments: image.png
>
>
> When a region is stuck in transition for a significant time, any attempt to
> take a snapshot of the table keeps a master handler thread waiting forever.
> As part of creating a snapshot on an enabled or disabled table, a
> LockProcedure is executed to acquire the table-level lock. If any region of
> the table is in transition, the LockProcedure cannot proceed, so the snapshot
> handler waits indefinitely until the region transition completes and the
> table-level lock can finally be acquired.
> In cases where a region stays in transition (RIT) for a considerable time,
> enough client attempts to create snapshots of the table can easily exhaust
> all handler threads, leaving the master potentially unresponsive. A sample
> thread dump is attached.
> Proposal: the snapshot handler should not stay stuck forever if it cannot
> take the table-level lock; it should fail fast.
> !image.png!

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
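The fail-fast proposal in the issue can be illustrated with a minimal sketch. It assumes a plain `java.util.concurrent.locks.ReentrantLock` as a stand-in for the table-level lock, and it does not use HBase's actual LockProcedure or snapshot handler code. The class `FailFastSnapshotLock`, the method `takeSnapshot`, the exception `LockTimeoutException`, and the 100 ms timeout are all hypothetical. A thread that holds the lock models a region stuck in transition. The handler waits only a bounded time via `tryLock(timeout, unit)` and then throws, instead of parking the handler thread indefinitely.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class FailFastSnapshotLock {

  /** Hypothetical error raised when the table lock is not obtained in time. */
  static class LockTimeoutException extends Exception {
    LockTimeoutException(String message) {
      super(message);
    }
  }

  // Hypothetical stand-in for the table-level lock. While another thread
  // holds it, that thread models a region stuck in transition (RIT).
  final ReentrantLock tableLock = new ReentrantLock();

  /**
   * Fail-fast variant: wait at most timeoutMs for the table lock instead of
   * blocking the calling handler thread indefinitely.
   */
  String takeSnapshot(String table, long timeoutMs)
      throws LockTimeoutException, InterruptedException {
    if (!tableLock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
      throw new LockTimeoutException("Could not acquire table lock for " + table
          + " within " + timeoutMs + " ms; failing fast");
    }
    try {
      // The real snapshot work would run here while the lock is held.
      return "snapshot of " + table + " taken";
    } finally {
      tableLock.unlock();
    }
  }

  public static void main(String[] args) throws Exception {
    FailFastSnapshotLock master = new FailFastSnapshotLock();

    // 1. Lock is free: the snapshot succeeds immediately.
    System.out.println(master.takeSnapshot("t1", 100));

    // 2. Another thread holds the lock until told to release it (simulated RIT).
    CountDownLatch held = new CountDownLatch(1);
    CountDownLatch release = new CountDownLatch(1);
    Thread rit = new Thread(() -> {
      master.tableLock.lock();
      try {
        held.countDown();
        release.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        master.tableLock.unlock();
      }
    });
    rit.start();
    held.await();

    try {
      master.takeSnapshot("t1", 100);
    } catch (LockTimeoutException e) {
      // The handler thread is freed after ~100 ms instead of waiting forever.
      System.out.println("failed fast: " + e.getMessage());
    }

    // 3. "Region transition" completes; the lock becomes available again.
    release.countDown();
    rit.join();
  }
}
```

The bounded wait converts an unbounded resource leak (one blocked handler thread per snapshot attempt) into a client-visible error that the caller can retry later. How long to wait before failing is a policy choice that the issue does not specify.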