I rechecked this, and maybe it's not so bad.
I read the notes of the lock, like this:
Acquires the lock if it is not held by another thread and returns immediately,
setting the lock hold count to one.
If the current thread already holds the lock then the hold count is incremented
by one and the method returns immediately.
If the lock is held by another thread then the current thread becomes disabled
for thread scheduling purposes and lies dormant until the lock has been
acquired, at which time the lock hold count is set to one.
Specified by: lock() in Lock
Put op calling lock() like this:
this.cacheFlusher.reclaimMemStoreMemory();
So, it's still locked by cacheflusher. If so, it's locked by the same thread
(cacheFlusher), and can be locked at the same time with flushRegion.
I'm not so familiar with ReentrantLock, please check if I'm write. If not, this
is a critical priority issue.
Zhou Shuaifeng(Frank)
-----邮件原件-----
发件人: [email protected] [mailto:[email protected]] 代表 Stack
发送时间: 2011年4月29日 11:48
收件人: [email protected]
抄送: Yanlijun; Chenjian
主题: Re: found one deadlock on hbase?
Yes. The below looks viable (though strange we have not seen it up to
this). The profiler may have slowed things to bring on the deadlock
-- or the run up to the high water mark -- but its still a deadlock.
Please file a critical priority issue.
If you have a patch, that'd be excellent.
Thanks for digging in on this,
St.Ack
2011/4/28 Zhoushuaifeng <[email protected]>:
> Thanks, I will do more test.
> Maybe the deadlock hapened like this? Please point it out if it's wrong.
>
> 1,One handler is handling put op, and reclaimMemStoreMemory, but the memory
> is isAboveHighWaterMark, so this handler locked the memstoreflusher until
> global mem is lower:
>
> public synchronized void reclaimMemStoreMemory() {
> if (isAboveHighWaterMark()) {
> lock.lock();
> try {
> while (isAboveHighWaterMark() && !server.isStopped()) {
> wakeupFlushThread();
> try {
> // we should be able to wait forever, but we've seen a bug where
> // we miss a notify, so put a 5 second bound on it at least.
> flushOccurred.await(5, TimeUnit.SECONDS);
> } catch (InterruptedException ie) {
> Thread.currentThread().interrupt();
> }
> }
> } finally {
> lock.unlock();
>
> 2, flushforGlobalPressure is trigered, but to flush the memstore, it needed
> to lock the memstoreflusher:
>
> private boolean flushRegion(final HRegion region, final boolean
> emergencyFlush) {
> synchronized (this.regionsInQueue) {
> FlushRegionEntry fqe = this.regionsInQueue.remove(region);
> if (fqe != null && emergencyFlush) {
> // Need to remove from region from delay queue. When NOT an
> // emergencyFlush, then item was removed via a flushQueue.poll.
> flushQueue.remove(fqe);
> }
> lock.lock();
> }
>
> 3, because lock is locked by the ipchandler of put op, the flushRegion will
> never get the lock and flush will never happen.
> 4, no flush, memory stay in AboveHighWaterMark state, and never unlock, so,
> deadlock happend.
>
> Is it right?
>
> Zhou Shuaifeng(Frank)
>