Nick Dimiduk created HBASE-24293:
------------------------------------

             Summary: Assignment manager should never give up assigning meta
                 Key: HBASE-24293
                 URL: https://issues.apache.org/jira/browse/HBASE-24293
             Project: HBase
          Issue Type: Bug
          Components: master, Region Assignment
    Affects Versions: 2.3.0
            Reporter: Nick Dimiduk


Not yet sure how we got here, but,

{noformat}
2020-04-29 22:39:16,140 INFO org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: pid=308, state=RUNNABLE:SERVER_CRASH_ASSIGN_META, locked=true; ServerCrashProcedure server= host-a.example.com,16020,1588033841562, splitWal=true, meta=true found a region state=OFFLINE, location=null, table=hbase:meta, region=1588230740 which is no longer on us host-a.example.com,16020,1588033841562, give up assigning...
{noformat}

The assignment manager gives up assigning meta in this procedure, so nothing can
progress. Manual intervention is necessary.

From this [conditional block|https://github.com/apache/hbase/blob/1415a82d41a1e125440014a4b23364371b30d065/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java#L475],
it seems the {{regionNode}} location is {{null}}.

{noformat}
        // This is possible, as when a server is dead, TRSP will fail to schedule a RemoteProcedure
        // to us and then try to assign the region to a new RS. And before it has updated the region
        // location to the new RS, we may have already called the am.getRegionsOnServer so we will
        // consider the region is still on us. And then before we arrive here, the TRSP could have
        // updated the region location, or even finished itself, so the region is no longer on us
        // any more, we should not try to assign it again. Please see HBASE-23594 for more details.
        if (!serverName.equals(regionNode.getRegionLocation())) {
          LOG.info("{} found a region {} which is no longer on us {}, give up assigning...", this,
            regionNode, serverName);
          continue;
        }
{noformat}
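
To make the failure mode concrete, here is a minimal, self-contained sketch of why a {{null}} location takes the "give up" branch: {{equals(null)}} is always false, so the negated check is true. It is not HBase code. The class and method names are hypothetical, plain strings stand in for {{ServerName}}, and the "guarded" variant is only one possible shape for a fix, not a proposed patch.

```java
// Hypothetical sketch: strings stand in for org.apache.hadoop.hbase.ServerName.
final class MetaAssignCheckSketch {

  /** Mirrors the quoted condition: skip when the region's location is not the crashed server. */
  static boolean currentCheckSkips(String crashedServer, String regionLocation) {
    // equals(null) is false, so a null location always lands in the skip branch.
    return !crashedServer.equals(regionLocation);
  }

  /** Assumed alternative: never skip meta when no location is recorded at all. */
  static boolean guardedCheckSkips(String crashedServer, String regionLocation, boolean isMeta) {
    if (isMeta && regionLocation == null) {
      // Nobody is recorded as hosting meta, so keep trying to assign it.
      return false;
    }
    return !crashedServer.equals(regionLocation);
  }

  public static void main(String[] args) {
    String crashed = "host-a.example.com,16020,1588033841562";
    System.out.println(currentCheckSkips(crashed, null));       // prints "true": meta is skipped
    System.out.println(guardedCheckSkips(crashed, null, true)); // prints "false": meta is assigned
  }
}
```

With the guard, a region that another server has already picked up (non-null location different from the crashed server) is still skipped, which is the case HBASE-23594 was meant to cover.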



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
