Viraj Jasani created HBASE-28271:
------------------------------------

             Summary: Infinite waiting on lock acquisition by snapshot can result in unresponsive master
                 Key: HBASE-28271
                 URL: https://issues.apache.org/jira/browse/HBASE-28271
             Project: HBase
          Issue Type: Improvement
    Affects Versions: 2.5.7, 2.4.17, 3.0.0-alpha-4
            Reporter: Viraj Jasani
            Assignee: Viraj Jasani
         Attachments: image.png

When a region is stuck in transition for a significant time, any attempt to take
a snapshot of the table keeps a master handler thread waiting indefinitely. As
part of creating a snapshot of an enabled or disabled table, the snapshot
handler executes a LockProcedure to acquire the table-level lock. If any region
of the table is in transition, the LockProcedure cannot run, so the handler
waits until the region transition completes and the table-level lock can
finally be acquired.

If a region stays in RIT for a considerable time and the client makes enough
snapshot attempts on the table, these requests can easily exhaust all handler
threads, potentially leaving the master unresponsive. A sample thread dump is
attached.
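A minimal, hypothetical Java sketch of the failure mode (not HBase code): a `java.util.concurrent.locks.ReentrantLock` stands in for the table-level lock taken via LockProcedure, a thread that holds it stands in for the region in transition, and a fixed-size thread pool stands in for the master's handler threads. The class and method names are illustrative assumptions. Once every handler is blocked on an untimed `lock()`, even an unrelated request cannot be served.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantLock;

public class HandlerExhaustionDemo {

    // Returns true if an unrelated request completes within waitMs while
    // `handlers` snapshot requests are blocked on a table lock held elsewhere.
    static boolean unrelatedRequestServed(int handlers, long waitMs) throws Exception {
        ReentrantLock tableLock = new ReentrantLock();   // stand-in for the table-level lock
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch ritDone = new CountDownLatch(1);  // counted down when the simulated RIT ends

        // Simulated region in transition: holds the table lock until ritDone fires.
        Thread rit = new Thread(() -> {
            tableLock.lock();
            try {
                lockHeld.countDown();
                ritDone.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                tableLock.unlock();
            }
        });
        rit.start();
        lockHeld.await();

        ExecutorService handlerPool = Executors.newFixedThreadPool(handlers);
        try {
            // Each snapshot request waits with no timeout, pinning one handler thread.
            for (int i = 0; i < handlers; i++) {
                handlerPool.submit(() -> {
                    tableLock.lock();
                    tableLock.unlock();
                });
            }
            // An unrelated request is queued, but no handler is free to run it.
            Future<Boolean> other = handlerPool.submit(() -> true);
            try {
                return other.get(waitMs, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                return false;
            }
        } finally {
            ritDone.countDown();   // end the simulated RIT so the blocked handlers drain
            handlerPool.shutdown();
            handlerPool.awaitTermination(5, TimeUnit.SECONDS);
            rit.join();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("served while handlers blocked: " + unrelatedRequestServed(4, 200));
    }
}
```

Running `main` prints `served while handlers blocked: false`, because all four handler threads are parked on the held lock and the fifth task stays queued.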

Proposal: The snapshot handler should not stay stuck forever if it cannot
acquire the table-level lock; it should fail fast.
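One way the fail-fast behaviour could look, sketched with a plain `ReentrantLock` rather than HBase's LockProcedure: bound the wait with `tryLock(timeout, unit)` and return an error to the client when the timeout expires. `takeSnapshot`, `LockTimeoutException`, and the timeout values are hypothetical; the actual change may use a different mechanism.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class FailFastSnapshotDemo {

    // Hypothetical error type; not an HBase class.
    static class LockTimeoutException extends Exception {
        LockTimeoutException(String msg) {
            super(msg);
        }
    }

    // Hypothetical snapshot handler: gives up after timeoutMs instead of waiting forever.
    static String takeSnapshot(ReentrantLock tableLock, long timeoutMs)
            throws InterruptedException, LockTimeoutException {
        if (!tableLock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
            throw new LockTimeoutException(
                "table lock not acquired within " + timeoutMs + " ms; failing fast");
        }
        try {
            return "snapshot taken";   // real snapshot work would go here
        } finally {
            tableLock.unlock();
        }
    }

    public static void main(String[] args) throws Exception {
        ReentrantLock tableLock = new ReentrantLock();
        System.out.println(takeSnapshot(tableLock, 100));   // lock is free: succeeds

        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread rit = new Thread(() -> {
            tableLock.lock();   // simulated region in transition holding the lock
            try {
                held.countDown();
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                tableLock.unlock();
            }
        });
        rit.start();
        held.await();
        try {
            takeSnapshot(tableLock, 100);
        } catch (LockTimeoutException e) {
            System.out.println("failed fast: " + e.getMessage());
        } finally {
            release.countDown();
            rit.join();
        }
    }
}
```

With a bounded wait, each handler is released after at most the timeout, so repeated snapshot attempts during a long RIT cannot pin every handler thread indefinitely.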

!image.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)