Sorry, it's not the Data class; the problem is in the use of the Date class. JD had once replied to the mailing list under the heading "Re: Possible dead lock" :)
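[Illustrative sketch, not part of the original thread. It shows the pattern JD recommends below: keep a plain long from System.currentTimeMillis() instead of a java.util.Date, so that toString() never reaches TimeZone.getDisplayName() and the ResourceBundle loading locks visible in the jstack further down. The class and field names are invented for the example; this is not the actual HBase CompactionRequest code.]

public class CompactionRequestSketch {   // hypothetical name, for illustration only
    private final String regionName;

    // Risky pattern (what the trace below points at): formatting a Date in
    // toString() walks TimeZone.getDisplayName() -> ResourceBundle, taking
    // bundle-loading locks while other monitors may already be held.
    // private final java.util.Date date = new java.util.Date();

    // Pattern per JD's advice: a plain long timestamp.
    private final long timestamp = System.currentTimeMillis();

    public CompactionRequestSketch(String regionName) {
        this.regionName = regionName;
    }

    @Override
    public String toString() {
        // Appending a long is plain string work; no locale or bundle lookup.
        return "regionName=" + regionName + ", queuedAt=" + timestamp;
    }
}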
Regards
Ram

-----Original Message-----
From: Ramkrishna S Vasudevan [mailto:[email protected]]
Sent: Friday, July 15, 2011 9:26 AM
To: [email protected]
Subject: RE: Deadlocked Regionserver process

Hi

I think this, as Stack mentioned in HBASE-3830, could be due to the
profiler.  But the problem is in the use of Data class.  JD had once
replied to the mailing list under the heading "Re: Possible dead lock".

JD's reply
=============================================================
I see what you are saying, and I understand the deadlock, but what escapes
me is why ResourceBundle has to go touch all the classes every time to find
the locale, as I see 2 threads doing the same.  Maybe my understanding of
what it does is just poor, but I also see that you are using the yourkit
profiler, so it's one more variable in the equation.

In any case, using a Date strikes me as odd.  Using a long representing
System.currentTimeMillis is usually what we do.
=======================================================================

So here, as per HBASE-4101, even though no profiler was running, the
problem is the Date object used in the toString() of the
PriorityCompactionQueue.

Regards
Ram

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Stack
Sent: Friday, July 15, 2011 3:56 AM
To: [email protected]
Subject: Re: Deadlocked Regionserver process

Thank you.  I've added the below to the issue.  Will take a looksee.  If it
is the issue, will include the fix in 0.90.4.

St.Ack

On Thu, Jul 14, 2011 at 3:07 PM, Matt Davies <[email protected]> wrote:
> We aren't profiling right now. Here's what is in the hbase-env.sh
>
> export TZ="US/Mountain"
> export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC
> -XX:+CMSIncrementalMode -verbose:gc -XX:+PrintGCDetails
> -XX:+PrintGCTimeStamps -Xloggc:/home/hadoop/gc-hbase.log "
> export HBASE_MANAGES_ZK=false
> export HBASE_PID_DIR=/home/hadoop
> export HBASE_HEAPSIZE=10240
>
> Java is
> java version "1.6.0_17"
> Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
> Java HotSpot(TM) 64-Bit Server VM (build 14.3-b01, mixed mode)
>
> We were planning an upgrade to 1.6.0_25 before we ran into this issue.
>
>
>
> On Thu, Jul 14, 2011 at 3:59 PM, Stack <[email protected]> wrote:
>
>> What Lohit says, but also: what jvm are you running and what options
>> are you feeding it?  The stack trace is a little crazy (especially the
>> mix-in of resource bundle loading).  We saw something similar over in
>> HBASE-3830 when someone was running a profiler.  Is that what is going
>> on here?
>>
>> Thanks,
>> St.Ack
>>
>> On Thu, Jul 14, 2011 at 11:36 AM, Matt Davies <[email protected]>
>> wrote:
>> > Hey everyone,
>> >
>> > We periodically see a situation where the regionserver process exists
>> > in the process list, and the zookeeper thread sends the keepalive so
>> > the master won't remove it from the active list, yet the regionserver
>> > will not serve data.
>> >
>> > Hadoop (cdh3u0), HBase 0.90.3 (Apache version), under load from an
>> > internal testing tool.
>> >
>> > I've taken a jstack of the process and found this:
>> >
>> > Found one Java-level deadlock:
>> > =============================
>> > "IPC Server handler 99 on 60020":
>> >   waiting to lock monitor 0x0000000047f97000 (object 0x00002aaab8ef07e8, a
>> > org.apache.hadoop.hbase.regionserver.MemStoreFlusher),
>> >   which is held by "IPC Server handler 64 on 60020"
>> > "IPC Server handler 64 on 60020":
>> >   waiting for ownable synchronizer 0x00002aaab8eea130, (a
>> > java.util.concurrent.locks.ReentrantLock$NonfairSync),
>> >   which is held by "regionserver60020.cacheFlusher"
>> > "regionserver60020.cacheFlusher":
>> >   waiting to lock monitor 0x0000000047f97000 (object 0x00002aaab8ef07e8, a
>> > org.apache.hadoop.hbase.regionserver.MemStoreFlusher),
>> >   which is held by "IPC Server handler 64 on 60020"
>> >
>> > Java stack information for the threads listed above:
>> > ===================================================
>> > "IPC Server handler 99 on 60020":
>> >     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.reclaimMemStoreMemory(MemStoreFlusher.java:434)
>> >     - waiting to lock <0x00002aaab8ef07e8> (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
>> >     at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2529)
>> >     at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
>> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> >     at java.lang.reflect.Method.invoke(Method.java:597)
>> >     at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
>> >     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
>> > "IPC Server handler 64 on 60020":
>> >     at sun.misc.Unsafe.park(Native Method)
>> >     - parking to wait for <0x00002aaab8eea130> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
>> >     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>> >     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
>> >     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778)
>> >     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1114)
>> >     at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
>> >     at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
>> >     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.reclaimMemStoreMemory(MemStoreFlusher.java:435)
>> >     - locked <0x00002aaab8ef07e8> (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
>> >     at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2529)
>> >     at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
>> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> >     at java.lang.reflect.Method.invoke(Method.java:597)
>> >     at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
>> >     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
>> > "regionserver60020.cacheFlusher":
>> >     at java.util.ResourceBundle.endLoading(ResourceBundle.java:1506)
>> >     - waiting to lock <0x00002aaab8ef07e8> (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
>> >     at java.util.ResourceBundle.findBundle(ResourceBundle.java:1379)
>> >     at java.util.ResourceBundle.findBundle(ResourceBundle.java:1292)
>> >     at java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1234)
>> >     at java.util.ResourceBundle.getBundle(ResourceBundle.java:832)
>> >     at sun.util.resources.LocaleData$1.run(LocaleData.java:127)
>> >     at java.security.AccessController.doPrivileged(Native Method)
>> >     at sun.util.resources.LocaleData.getBundle(LocaleData.java:125)
>> >     at sun.util.resources.LocaleData.getTimeZoneNames(LocaleData.java:97)
>> >     at sun.util.TimeZoneNameUtility.getBundle(TimeZoneNameUtility.java:115)
>> >     at sun.util.TimeZoneNameUtility.retrieveDisplayNames(TimeZoneNameUtility.java:80)
>> >     at java.util.TimeZone.getDisplayNames(TimeZone.java:399)
>> >     at java.util.TimeZone.getDisplayName(TimeZone.java:350)
>> >     at java.util.Date.toString(Date.java:1025)
>> >     at java.lang.String.valueOf(String.java:2826)
>> >     at java.lang.StringBuilder.append(StringBuilder.java:115)
>> >     at org.apache.hadoop.hbase.regionserver.PriorityCompactionQueue$CompactionRequest.toString(PriorityCompactionQueue.java:114)
>> >     at java.lang.String.valueOf(String.java:2826)
>> >     at java.lang.StringBuilder.append(StringBuilder.java:115)
>> >     at org.apache.hadoop.hbase.regionserver.PriorityCompactionQueue.addToRegionsInQueue(PriorityCompactionQueue.java:145)
>> >     - locked <0x00002aaab8f2dc58> (a java.util.HashMap)
>> >     at org.apache.hadoop.hbase.regionserver.PriorityCompactionQueue.add(PriorityCompactionQueue.java:188)
>> >     at org.apache.hadoop.hbase.regionserver.CompactSplitThread.requestCompaction(CompactSplitThread.java:140)
>> >     - locked <0x00002aaab8894048> (a org.apache.hadoop.hbase.regionserver.CompactSplitThread)
>> >     at org.apache.hadoop.hbase.regionserver.CompactSplitThread.requestCompaction(CompactSplitThread.java:118)
>> >     - locked <0x00002aaab8894048> (a org.apache.hadoop.hbase.regionserver.CompactSplitThread)
>> >     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:393)
>> >     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:366)
>> >     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:240)
>> >
>> >
>> > Any ideas on how I could prevent this or let the master know about it?
>> > I've written an app that will check all regionservers periodically for
>> > such a lockup, but I can't run it constantly.
>> >
>> > I can provide more of the jstack if that is helpful.
>> >
>> > -Matt
>> >
>>
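[Illustrative sketch, not part of the original thread. Matt asks above how to detect such a lockup. The standard java.lang.management API exposes the same Java-level deadlock detection that jstack prints, so a checker can ask the JVM directly. The class name is invented for the example, and a real checker would likely poll regionservers over a remote JMX connection rather than in-process.]

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical checker; not from the thread and not part of HBase.
public class DeadlockChecker {
    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();

        // findDeadlockedThreads() covers both object monitors and ownable
        // synchronizers such as ReentrantLock; both kinds appear in the
        // deadlock cycle in the jstack above. It returns null when no
        // deadlock exists.
        long[] ids = threads.findDeadlockedThreads();
        if (ids == null) {
            System.out.println("No Java-level deadlock found.");
            return;
        }
        for (ThreadInfo info : threads.getThreadInfo(ids, Integer.MAX_VALUE)) {
            System.out.print(info);
        }
        // A monitoring app could alert here, or exit non-zero so a wrapper
        // script can flag the stuck regionserver to the master.
        System.exit(1);
    }
}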
