[ 
https://issues.apache.org/jira/browse/HBASE-28271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-28271 started by Viraj Jasani.
--------------------------------------------
> Infinite waiting on lock acquisition by snapshot can result in unresponsive 
> master
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-28271
>                 URL: https://issues.apache.org/jira/browse/HBASE-28271
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 3.0.0-alpha-4, 2.4.17, 2.5.7
>            Reporter: Viraj Jasani
>            Assignee: Viraj Jasani
>            Priority: Major
>         Attachments: image.png
>
>
> When a region is stuck in transition (RIT) for a significant time, any 
> attempt to take a snapshot of the table keeps a master handler thread 
> waiting indefinitely. Creating a snapshot of an enabled or disabled table 
> executes a LockProcedure to acquire the table-level lock. If any region of 
> the table is in transition, the snapshot handler cannot execute the 
> LockProcedure, so it waits until the region transition completes and the 
> table-level lock can be acquired.
> If a region stays in transition for a considerable time and the client 
> makes enough attempts to create snapshots of the table, these requests can 
> easily exhaust all handler threads, leaving the master potentially 
> unresponsive. A sample thread dump is attached.
> Proposal: the snapshot handler should not wait forever if it cannot take 
> the table-level lock; it should fail fast.
> !image.png!
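
The fail-fast behaviour proposed above can be illustrated with a minimal
sketch. It does not use HBase's LockProcedure or any HBase API; the class
FailFastLockSketch, the field TABLE_LOCK, and the method trySnapshot are
hypothetical names, and a plain java.util.concurrent ReentrantLock stands in
for the table-level lock. The point is only the contrast between an unbounded
lock() call and a bounded tryLock() that returns to the caller on timeout.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class FailFastLockSketch {
    // Hypothetical stand-in for the table-level lock (not HBase's LockProcedure).
    static final ReentrantLock TABLE_LOCK = new ReentrantLock();

    /**
     * Tries to take the lock for at most timeoutMs milliseconds.
     * Returns true if the (placeholder) snapshot work ran, false if the
     * lock could not be acquired in time or the thread was interrupted.
     */
    static boolean trySnapshot(long timeoutMs) {
        boolean acquired;
        try {
            acquired = TABLE_LOCK.tryLock(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
        if (!acquired) {
            return false; // fail fast: the handler thread is freed
        }
        try {
            // Placeholder for the actual snapshot work.
            return true;
        } finally {
            TABLE_LOCK.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        // Simulates a long region transition holding the table lock.
        Thread transition = new Thread(() -> {
            TABLE_LOCK.lock();
            try {
                held.countDown();
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                TABLE_LOCK.unlock();
            }
        });
        transition.start();
        held.await();

        System.out.println("while lock is held: " + trySnapshot(100));
        release.countDown();
        transition.join();
        System.out.println("after release: " + trySnapshot(100));
    }
}
```

With a bounded wait, a handler thread that cannot get the lock returns an
error to the client instead of staying parked, so repeated snapshot requests
during a long transition no longer accumulate blocked handlers. The timeout
value here is arbitrary; a real fix would need a configurable limit.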



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
