Abhishek Singh Chouhan created HBASE-14889:
----------------------------------------------

             Summary: Region stuck in transition in OPEN state indefinitely in 
corner scenario
                 Key: HBASE-14889
                 URL: https://issues.apache.org/jira/browse/HBASE-14889
             Project: HBase
          Issue Type: Bug
    Affects Versions: 0.98.14
            Reporter: Abhishek Singh Chouhan


During a failure scenario, when an RS dies and the bulk assigner (BA) is 
assigning its regions to other RSs, if a second RS dies (one to which some 
regions are being moved) while a region on it is in PENDING_OPEN state, we end 
up in a situation where two bulk assigners try to assign the same region on the 
same RS.

The following happened:
1. While one BA is opening the region, the second one sees it in PENDING_OPEN 
state, retries, and calls unassign(...), thereby sending a CLOSE RPC to the RS.
2. The RS has meanwhile already opened the region, changing the znode state to 
RS_ZK_REGION_OPENED, which triggers an event on the master.
3. On the master, after the unassign succeeds, we go on to delete the znode, 
change the region state to PENDING_OPEN and send an open RPC to the RS.
4. The event triggered earlier now sees the state as PENDING_OPEN and happily 
changes it to OPEN, but is unable to delete the znode, which by this time is 
no longer in RS_ZK_REGION_OPENED state but in M_ZK_REGION_OFFLINE state. Hence 
the region remains in transition in the OPEN state.
5. The RS goes on to change the znode states and successfully opens the region 
(changing the znode state to RS_ZK_REGION_OPENED).
6. This again triggers an event on the master, but this time, since the state 
is OPEN, the following code path is taken:
{noformat}
case RS_ZK_REGION_OPENED:
  // Should see OPENED after OPENING but possible after PENDING_OPEN.
  if (regionState == null
      || !regionState.isPendingOpenOrOpeningOnServer(sn)) {
    LOG.warn("Received OPENED for " + prettyPrintedRegionName
      + " from " + sn + " but the region isn't PENDING_OPEN/OPENING here: "
      + regionStates.getRegionState(encodedName));

    if (regionState != null) {
      // Close it without updating the internal region states,
      // so as not to create double assignments in unlucky scenarios
      // mentioned in OpenRegionHandler#process
      unassign(regionState.getRegion(), null, -1, null, false, sn);
    }
    return;
  }
{noformat}
We call unassign here with transitionInZK=false and state=null.
7. The RS closes the region but doesn't update ZK, and the state is not changed 
in the master either. The region remains in transition in the OPEN state when 
it's actually closed. We have to restart the RS, after which the region opens 
correctly on some other RS.
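
The sequence above can be replayed with a small toy model. This is only a 
sketch to make the interleaving easier to follow, not HBase code: the class 
StuckRegionReplay, its enums, fields and methods are all hypothetical, and the 
master, the RS and ZooKeeper are each reduced to a single field. It assumes 
that the re-assign in step 3 recreates the znode in M_ZK_REGION_OFFLINE state, 
as step 4 implies.

```java
public class StuckRegionReplay {
    // Hypothetical, heavily simplified stand-ins for the real states.
    enum MasterState { PENDING_OPEN, OPENING, OPEN }
    enum Znode { NONE, M_ZK_REGION_OFFLINE, RS_ZK_REGION_OPENED }

    MasterState masterState;   // region state as tracked by the master
    Znode znode = Znode.NONE;  // state of the region's znode
    boolean inTransition;      // master still lists the region as in transition
    boolean openOnRs;          // region is actually open on the RS

    // Master side of an assign: (re)create the znode as OFFLINE, mark
    // the region PENDING_OPEN, and send the open RPC (not modelled).
    void masterStartsAssign() {
        znode = Znode.M_ZK_REGION_OFFLINE;
        masterState = MasterState.PENDING_OPEN;
        inTransition = true;
    }

    // RS finishes opening and moves the znode to OPENED.
    void rsFinishesOpen() {
        openOnRs = true;
        znode = Znode.RS_ZK_REGION_OPENED;
    }

    // RS handles a CLOSE RPC sent with transitionInZK=false: no ZK update.
    void rsCloses() {
        openOnRs = false;
    }

    // Master handling of an RS_ZK_REGION_OPENED event, modelled on the
    // code path quoted in step 6.
    void masterHandlesOpenedEvent() {
        if (masterState != MasterState.PENDING_OPEN
                && masterState != MasterState.OPENING) {
            rsCloses(); // close on the RS only; master state and znode untouched
            return;
        }
        masterState = MasterState.OPEN;
        if (znode == Znode.RS_ZK_REGION_OPENED) {
            znode = Znode.NONE;      // znode deleted, transition finished
            inTransition = false;
        }
        // Otherwise the delete fails and the region stays in transition.
    }

    static StuckRegionReplay replay() {
        StuckRegionReplay r = new StuckRegionReplay();
        r.masterStartsAssign();       // first BA assigns the region
        r.rsFinishesOpen();           // step 2: RS opened, OPENED event queued
        r.rsCloses();                 // step 1: second BA's CLOSE RPC lands
        r.masterStartsAssign();       // step 3: znode reset, PENDING_OPEN again
        r.masterHandlesOpenedEvent(); // step 4: stale event sets OPEN, delete fails
        r.rsFinishesOpen();           // step 5: RS opens again
        r.masterHandlesOpenedEvent(); // step 6: state is OPEN, so region is closed
        return r;
    }

    public static void main(String[] args) {
        StuckRegionReplay r = replay();
        // prints: masterState=OPEN inTransition=true openOnRs=false
        System.out.println("masterState=" + r.masterState
            + " inTransition=" + r.inTransition
            + " openOnRs=" + r.openOnRs);
    }
}
```

The final line matches step 7: the master believes the region is OPEN and 
still in transition, while the RS has actually closed it.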



