Stephen Yuan Jiang created HBASE-14016:
------------------------------------------

             Summary: Procedure V2: NPE in a delete table follow by create table closely
                 Key: HBASE-14016
                 URL: https://issues.apache.org/jira/browse/HBASE-14016
             Project: HBase
          Issue Type: Bug
          Components: proc-v2
    Affects Versions: 1.1.1, 2.0.0, 1.2.0, 1.3.0
            Reporter: Stephen Yuan Jiang
            Assignee: Stephen Yuan Jiang


In our internal testing of HBase 1.1, we found a race condition: a delete table followed closely by a create table can leak the ZK lock because of an NPE in ProcedureFairRunQueues.

{noformat}
Exception in thread "ProcedureExecutorThread-0" java.lang.NullPointerException
	at org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.releaseTableWrite(MasterProcedureQueue.java:279)
	at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.releaseLock(CreateTableProcedure.java:280)
	at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.releaseLock(CreateTableProcedure.java:58)
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:674)
{noformat}

Here is the code that causes the race condition:

{code}
  protected boolean markTableAsDeleted(final TableName table) {
    TableRunQueue queue = getRunQueue(table);
    if (queue != null) {
      ...
      if (queue.isEmpty() && !queue.isLocked()) {
        fairq.remove(table);
      ...
  }

  public boolean tryWrite(final TableLockManager lockManager,
      final TableName tableName, final String purpose) {
    ...
    tableLock = lockManager.writeLock(tableName, purpose);
    try {
      tableLock.acquire();
      ...
      wlock = true;
      ...
  }
{code}

The root cause is that wlock is set too late, so it does not protect the queue from being deleted:

- Thread 1: create table is running and the queue is empty; tryWrite() acquires the lock (wlock is still false at this point)
- Thread 2: markTableAsDeleted() sees that the queue is empty and wlock is false
- Thread 1: sets wlock=true, which is too late
- Thread 2: deletes the queue
- Thread 1: is never able to release the lock, because it hits an NPE when trying to get the queue
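
The interleaving described above can be modelled in a few lines of plain Java. The sketch below is not HBase code and is not the patch for this issue. The class TableQueueRaceSketch, its String table names, and the BooleanSupplier/Runnable callbacks standing in for the ZK table lock are all hypothetical names chosen for illustration. It shows one possible way to close the window, assuming the locked flag can be reserved under the same monitor that guards queue removal, before the slow external acquire starts.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

/** Simplified, hypothetical model of per-table run queues (not HBase code). */
class TableQueueRaceSketch {

  /** Stand-in for TableRunQueue: a pending count plus the write-lock flag. */
  static final class TableQueue {
    int pending;     // number of queued procedures (always 0 in this sketch)
    boolean wlock;   // true while a writer owns, or is acquiring, the table lock

    boolean isEmpty() { return pending == 0; }
    boolean isLocked() { return wlock; }
  }

  // Guarded by the monitor of this object.
  private final Map<String, TableQueue> queues = new HashMap<>();

  /** Sets wlock BEFORE the external lock is taken, so the queue cannot be removed. */
  private synchronized boolean reserveWrite(String table) {
    TableQueue q = queues.computeIfAbsent(table, t -> new TableQueue());
    if (q.wlock) {
      return false;
    }
    q.wlock = true;
    return true;
  }

  private synchronized void clearWriteFlag(String table) {
    TableQueue q = queues.get(table);
    if (q != null) {
      q.wlock = false;
    }
  }

  /** acquireExternalLock stands in for the (slow) ZK table lock acquisition. */
  boolean tryWrite(String table, BooleanSupplier acquireExternalLock) {
    if (!reserveWrite(table)) {
      return false;
    }
    boolean acquired = false;
    try {
      acquired = acquireExternalLock.getAsBoolean();
    } finally {
      if (!acquired) {
        clearWriteFlag(table); // roll back the reservation on failure
      }
    }
    return acquired;
  }

  /** Releases the external lock first, then makes the queue removable again. */
  void releaseWrite(String table, Runnable releaseExternalLock) {
    releaseExternalLock.run();
    clearWriteFlag(table);
  }

  /** In this sketch: returns true if the queue is absent or was removed. */
  synchronized boolean markTableAsDeleted(String table) {
    TableQueue q = queues.get(table);
    if (q == null) {
      return true;
    }
    if (q.isEmpty() && !q.isLocked()) {
      queues.remove(table);
      return true;
    }
    return false;
  }

  public static void main(String[] args) {
    TableQueueRaceSketch s = new TableQueueRaceSketch();
    boolean[] deletedDuringAcquire = new boolean[1];
    // Simulate "Thread 2" running exactly inside the acquire window of "Thread 1".
    boolean locked = s.tryWrite("t1", () -> {
      deletedDuringAcquire[0] = s.markTableAsDeleted("t1");
      return true;
    });
    System.out.println("locked=" + locked
        + " deletedDuringAcquire=" + deletedDuringAcquire[0]);
    s.releaseWrite("t1", () -> { });
    System.out.println("deletedAfterRelease=" + s.markTableAsDeleted("t1"));
  }
}
```

Running the delete attempt from inside the acquire callback makes the problematic interleaving deterministic. Because the flag is already set when the callback runs, the queue survives until releaseWrite() has finished, and only then can markTableAsDeleted() remove it. If the flag were set after the callback instead, as in the snippet quoted in this issue, the same callback would remove the queue and the later release would find nothing.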