[ https://issues.apache.org/jira/browse/HBASE-14016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matteo Bertozzi resolved HBASE-14016.
-------------------------------------
    Resolution: Duplicate

Sorry, closing as a duplicate of HBASE-14017 (we don't need a full lock).

> Procedure V2: NPE in a delete table follow by create table closely
> ------------------------------------------------------------------
>
>                 Key: HBASE-14016
>                 URL: https://issues.apache.org/jira/browse/HBASE-14016
>             Project: HBase
>          Issue Type: Bug
>          Components: proc-v2
>    Affects Versions: 2.0.0, 1.2.0, 1.1.1, 1.3.0
>            Reporter: Stephen Yuan Jiang
>            Assignee: Stephen Yuan Jiang
>
> In our internal testing of HBase 1.1, we found a race condition: a delete table
> closely followed by a create table would leak the ZK lock because of an NPE in
> ProcedureFairRunQueues:
> {noformat}
> Exception in thread "ProcedureExecutorThread-0" java.lang.NullPointerException
>         at org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.releaseTableWrite(MasterProcedureQueue.java:279)
>         at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.releaseLock(CreateTableProcedure.java:280)
>         at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.releaseLock(CreateTableProcedure.java:58)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:674)
> {noformat}
> Here is the code that causes the race condition:
> {code}
> protected boolean markTableAsDeleted(final TableName table) {
>   TableRunQueue queue = getRunQueue(table);
>   if (queue != null) {
>     ...
>     // The queue is removed only if it is empty and not locked.
>     if (queue.isEmpty() && !queue.isLocked()) {
>       fairq.remove(table);
>       ...
> }
>
> public boolean tryWrite(final TableLockManager lockManager,
>     final TableName tableName, final String purpose) {
>   ...
>   tableLock = lockManager.writeLock(tableName, purpose);
>   try {
>     tableLock.acquire();
>     ...
>     // Only now is the queue marked as locked; until this point,
>     // markTableAsDeleted() can still remove it.
>     wlock = true;
>     ...
> }
> {code}
> The root cause: wlock is set too late, so it does not protect the queue from
> being deleted.
> - Thread 1: create table is running and the queue is empty; tryWrite() acquires
>   the lock (wlock is still false).
> - Thread 2: markTableAsDeleted() sees that the queue is empty and wlock is false.
> - Thread 1: sets wlock = true, but too late.
> - Thread 2: deletes the queue.
> - Thread 1: can never release the lock; it hits an NPE trying to get the queue.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
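A minimal Java sketch of one way to close the window in the timeline above, assuming a simplified single-process model. The class `RunQueueSketch`, the `ExternalLock` interface, and the methods `addTable`, `tryWrite`, `markTableAsDeleted`, and `releaseWrite` are hypothetical stand-ins, not HBase's API, and this is not necessarily how HBASE-14017 resolved the issue. The idea is to mark the queue as write-locked under the same monitor that `markTableAsDeleted()` uses, *before* the slow external (ZooKeeper-style) acquire, and to roll the flag back if acquisition fails. The interleaving is simulated deterministically in one thread by calling `markTableAsDeleted()` from inside `acquire()`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of a per-table run queue; not HBase's actual
// MasterProcedureQueue and not the HBASE-14017 fix.
public class RunQueueSketch {

    // Hypothetical stand-in for a slow external lock such as a ZooKeeper table lock.
    @FunctionalInterface
    interface ExternalLock {
        boolean acquire();
    }

    static final class TableQueue {
        int pending = 0;              // queued procedures for this table
        boolean writeLocked = false;  // set before the external lock is acquired
    }

    private final Map<String, TableQueue> queues = new HashMap<>();

    synchronized void addTable(String table) {
        queues.putIfAbsent(table, new TableQueue());
    }

    // Claims the queue before the slow acquire, so a concurrent
    // markTableAsDeleted() sees it as locked and leaves it alone.
    boolean tryWrite(String table, ExternalLock lock) {
        synchronized (this) {
            TableQueue q = queues.get(table);
            if (q == null || q.writeLocked) {
                return false;
            }
            q.writeLocked = true;
        }
        // The external lock is acquired without holding the in-memory monitor.
        boolean acquired = lock.acquire();
        if (!acquired) {
            synchronized (this) {
                // Safe: the queue cannot be removed while writeLocked is set.
                queues.get(table).writeLocked = false;
            }
        }
        return acquired;
    }

    // Removes the queue only if it is empty and not write-locked.
    synchronized boolean markTableAsDeleted(String table) {
        TableQueue q = queues.get(table);
        if (q == null) {
            return true;
        }
        if (q.pending == 0 && !q.writeLocked) {
            queues.remove(table);
            return true;
        }
        return false;
    }

    synchronized void releaseWrite(String table) {
        TableQueue q = queues.get(table);
        if (q == null) {
            throw new IllegalStateException("queue for " + table + " was removed while locked");
        }
        q.writeLocked = false;
    }

    public static void main(String[] args) {
        RunQueueSketch sched = new RunQueueSketch();
        sched.addTable("t1");
        boolean[] deletedDuringAcquire = new boolean[1];
        // Simulates "Thread 2" calling markTableAsDeleted() while "Thread 1"
        // is still inside acquire(), i.e. the race window described in the issue.
        boolean locked = sched.tryWrite("t1", () -> {
            deletedDuringAcquire[0] = sched.markTableAsDeleted("t1");
            return true;
        });
        System.out.println("locked=" + locked + " deletedDuringAcquire=" + deletedDuringAcquire[0]);
        sched.releaseWrite("t1"); // no NPE: the queue survived
        System.out.println("deletedAfterRelease=" + sched.markTableAsDeleted("t1"));
    }
}
```

Running `main` prints `locked=true deletedDuringAcquire=false` followed by `deletedAfterRelease=true`: the delete is refused while the write lock is claimed and succeeds once it is released. The sketch holds the in-memory monitor only while flipping the flag, not across the external acquire, so a slow lock round trip does not block every other queue operation.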