Hmm.

The profiler is enabled when you see this?

Something is way off with the last of the threads showing in your thread dump:


"regionserver60020.cacheFlusher":
               at java.util.ResourceBundle.endLoading(ResourceBundle.java:1506)
               - waiting to lock <0x00007fe7cbacbd48> (a
org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
               at java.util.ResourceBundle.findBundle(ResourceBundle.java:1379)
               at java.util.ResourceBundle.findBundle(ResourceBundle.java:1292)
               at
java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1234)
               at java.util.ResourceBundle.getBundle(ResourceBundle.java:832)
....


How is it that we are trying to get into a synchronized hbase method,
MemStoreFlusher, in the depths of an i18n call; we're trying to append
a locale-appropriate date to a String.

Something is way off?

St.Ack

2011/4/28 Zhoushuaifeng <[email protected]>:
> I rechecked this, and maybe it's not so bad.
> I read the notes of the lock, like this:
> Acquires the lock if it is not held by another thread and returns 
> immediately, setting the lock hold count to one.
>
> If the current thread already holds the lock then the hold count is 
> incremented by one and the method returns immediately.
>
> If the lock is held by another thread then the current thread becomes 
> disabled for thread scheduling purposes and lies dormant until the lock has 
> been acquired, at which time the lock hold count is set to one.
>
> Specified by: lock() in Lock
>
> Put op calling lock() like this:
>        this.cacheFlusher.reclaimMemStoreMemory();
> So, it's still locked by cacheflusher. If so, it's locked by the same thread 
> (cacheFlusher), and can be locked at the same time with flushRegion.
> I'm not so familiar with ReentrantLock, please check if I'm write. If not, 
> this is a critical priority issue.
>
>
> Zhou Shuaifeng(Frank)
>
>
> -----邮件原件-----
> 发件人: [email protected] [mailto:[email protected]] 代表 Stack
> 发送时间: 2011年4月29日 11:48
> 收件人: [email protected]
> 抄送: Yanlijun; Chenjian
> 主题: Re: found one deadlock on hbase?
>
> Yes.  The below looks viable (though strange we have not seen it up to
> this).  The profiler may have slowed things to bring on the deadlock
> -- or the run up to the high water mark -- but its still a deadlock.
> Please file a critical priority issue.
>
> If you have a patch, that'd be excellent.
>
> Thanks for digging in on this,
> St.Ack
>
>
> 2011/4/28 Zhoushuaifeng <[email protected]>:
>> Thanks, I will do more test.
>> Maybe the deadlock hapened like this? Please point it out if it's wrong.
>>
>> 1,One handler is handling put op, and reclaimMemStoreMemory, but the memory 
>> is isAboveHighWaterMark, so this handler locked the memstoreflusher until 
>> global mem is lower:
>>
>> public synchronized void reclaimMemStoreMemory() {
>>    if (isAboveHighWaterMark()) {
>>      lock.lock();
>>      try {
>>        while (isAboveHighWaterMark() && !server.isStopped()) {
>>          wakeupFlushThread();
>>          try {
>>            // we should be able to wait forever, but we've seen a bug where
>>            // we miss a notify, so put a 5 second bound on it at least.
>>            flushOccurred.await(5, TimeUnit.SECONDS);
>>          } catch (InterruptedException ie) {
>>            Thread.currentThread().interrupt();
>>          }
>>        }
>>      } finally {
>>        lock.unlock();
>>
>> 2, flushforGlobalPressure is trigered, but to flush the memstore, it needed 
>> to lock the memstoreflusher:
>>
>>  private boolean flushRegion(final HRegion region, final boolean 
>> emergencyFlush) {
>>    synchronized (this.regionsInQueue) {
>>      FlushRegionEntry fqe = this.regionsInQueue.remove(region);
>>      if (fqe != null && emergencyFlush) {
>>        // Need to remove from region from delay queue.  When NOT an
>>        // emergencyFlush, then item was removed via a flushQueue.poll.
>>        flushQueue.remove(fqe);
>>     }
>>     lock.lock();
>>    }
>>
>> 3, because lock is locked by the ipchandler of put op, the flushRegion will 
>> never get the lock and flush will never happen.
>> 4, no flush, memory stay in AboveHighWaterMark state, and never unlock, so, 
>> deadlock happend.
>>
>> Is it right?
>>
>> Zhou Shuaifeng(Frank)
>>
>

Reply via email to