stack created HBASE-14012:
-----------------------------

             Summary: Double Assignment and Dataloss when ServerCrashProcedure runs during Master failover
                 Key: HBASE-14012
                 URL: https://issues.apache.org/jira/browse/HBASE-14012
             Project: HBase
          Issue Type: Bug
          Components: master, Region Assignment
    Affects Versions: 2.0.0, 1.2.0
            Reporter: stack
            Assignee: stack
            Priority: Critical

ITBLL. Master comes up. It is joining a running cluster (all servers up except Master, with most regions assigned out on the cluster). ProcedureStore has two unfinished ServerCrashProcedures (RUNNABLE state). In SCP, we only check whether failover cleanup is done in the first step, not in every step, which means a ServerCrashProcedure will run if, on reload, it is already beyond the first step.

{code}
// Is master fully online? If not, yield. No processing of servers unless master is up
if (!services.getAssignmentManager().isFailoverCleanupDone()) {
  throwProcedureYieldException("Waiting on master failover to complete");
}
{code}

There is no definitive logging, but it looks like we start running at the assign step. The regions to assign were persisted before the master crash. The regions to assign may not make sense post-crash: i.e. here we double-assign. Checking.

We shouldn't run until master is fully up regardless.
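
A minimal, self-contained sketch of the behaviour suggested above: gate every step on the failover check, so a procedure reloaded past its first step still yields until the master is fully up. This is not the HBase implementation. The class names, the simplified step enum, and the `BooleanSupplier` standing in for `isFailoverCleanupDone()` are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Hypothetical sketch; none of these types exist in HBase.
final class StepGateSketch {
    // Simplified stand-in for the ServerCrashProcedure states.
    enum Step { START, SPLIT_LOGS, ASSIGN, FINISH }

    // Unchecked stand-in for a "yield and retry later" signal.
    static class YieldException extends RuntimeException {
        YieldException(String msg) { super(msg); }
    }

    static class CrashProcedureSketch {
        private Step current;
        private final BooleanSupplier failoverCleanupDone;
        final List<Step> executed = new ArrayList<>();

        // resumeAt models the step a reloaded procedure was persisted at.
        CrashProcedureSketch(Step resumeAt, BooleanSupplier failoverCleanupDone) {
            this.current = resumeAt;
            this.failoverCleanupDone = failoverCleanupDone;
        }

        // Runs one step; returns true while more steps remain.
        boolean executeOneStep() {
            // The gate is checked on every step, not only on START.
            if (!failoverCleanupDone.getAsBoolean()) {
                throw new YieldException("Waiting on master failover to complete");
            }
            executed.add(current); // placeholder for the real work of the step
            if (current == Step.FINISH) {
                return false;
            }
            current = Step.values()[current.ordinal() + 1];
            return true;
        }
    }

    public static void main(String[] args) {
        AtomicBoolean masterUp = new AtomicBoolean(false);
        // Procedure reloaded from the store at ASSIGN, as in the report.
        CrashProcedureSketch proc = new CrashProcedureSketch(Step.ASSIGN, masterUp::get);
        try {
            proc.executeOneStep();
        } catch (YieldException e) {
            System.out.println("yielded: " + e.getMessage());
        }
        masterUp.set(true); // failover cleanup completes
        while (proc.executeOneStep()) {
            // keep stepping until FINISH has run
        }
        System.out.println("executed: " + proc.executed);
    }
}
```

In this sketch, a procedure resumed at ASSIGN does no work while the gate is closed and only runs ASSIGN and FINISH once the gate opens. With the check confined to START, the same reload would go straight to ASSIGN.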