Add a comment to the issue, Ram. Use of the heavy-weight Date seems odd for sure.
St.Ack
On Thu, Jul 14, 2011 at 9:33 PM, Ramkrishna S Vasudevan
<[email protected]> wrote:
> Sorry, it's not the Data class.
>
> But the problem is in the use of the Date class. JD had once replied to the
> mailing list with the heading Re: Possible dead lock
> :)
>
> Regards
> Ram
>
> -----Original Message-----
> From: Ramkrishna S Vasudevan [mailto:[email protected]]
> Sent: Friday, July 15, 2011 9:26 AM
> To: [email protected]
> Subject: RE: Deadlocked Regionserver process
>
> Hi
>
> I think this, as Stack mentioned in HBASE-3830, could be due to the profiler.
>
> But the problem is in the use of the Data class. JD had once replied to the
> mailing list with the heading Re: Possible dead lock
>
> JD's reply
> =============================================================
> I see what you are saying, and I understand the deadlock, but what escapes
> me is why ResourceBundle has to go touch all the classes every time to find
> the locale, as I see 2 threads doing the same. Maybe my understanding of
> what it does is just poor, but I also see that you are using the YourKit
> profiler, so it's one more variable in the equation.
>
> In any case, using a Date strikes me as odd. Using a long representing
> System.currentTimeMillis is usually what we do.
> =======================================================================
> So here, as per HBASE-4101, even though the profiler has not run, the
> problem is the Date object called from the toString of the
> PriorityCompactionQueue.
>
> Regards
> Ram
>
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Stack
> Sent: Friday, July 15, 2011 3:56 AM
> To: [email protected]
> Subject: Re: Deadlocked Regionserver process
>
> Thank you.
>
> I've added the below to the issue. Will take a looksee. If it's an issue,
> will include a fix in 0.90.4.
>
> St.Ack
>
> On Thu, Jul 14, 2011 at 3:07 PM, Matt Davies <[email protected]> wrote:
>> We aren't profiling right now.
>> Here's what is in the hbase-env.sh:
>>
>> export TZ="US/Mountain"
>> export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC
>> -XX:+CMSIncrementalMode -verbose:gc -XX:+PrintGCDetails
>> -XX:+PrintGCTimeStamps -Xloggc:/home/hadoop/gc-hbase.log "
>> export HBASE_MANAGES_ZK=false
>> export HBASE_PID_DIR=/home/hadoop
>> export HBASE_HEAPSIZE=10240
>>
>> Java is
>> java version "1.6.0_17"
>> Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
>> Java HotSpot(TM) 64-Bit Server VM (build 14.3-b01, mixed mode)
>>
>> We were planning an upgrade to 1.6.0_25 before we ran into this issue.
>>
>>
>>
>> On Thu, Jul 14, 2011 at 3:59 PM, Stack <[email protected]> wrote:
>>
>>> What Lohit says, but also, what JVM are you running and what options
>>> are you feeding it? The stack trace is a little crazy (especially the
>>> mix-in of resource bundle loading). We saw something similar over in
>>> HBASE-3830 when someone was running a profiler. Is that what is going
>>> on here?
>>>
>>> Thanks,
>>> St.Ack
>>>
>>> On Thu, Jul 14, 2011 at 11:36 AM, Matt Davies <[email protected]>
>>> wrote:
>>> > Hey everyone,
>>> >
>>> > We periodically see a situation where the regionserver process exists
>>> > in the process list, the zookeeper thread sends the keepalive so the
>>> > master won't remove it from the active list, yet the regionserver will
>>> > not serve data.
>>> >
>>> > Hadoop (cdh3u0), HBase 0.90.3 (Apache version), under load from an
>>> > internal testing tool.
>>> >
>>> > I've taken a jstack of the process and found this:
>>> >
>>> > Found one Java-level deadlock:
>>> > =============================
>>> > "IPC Server handler 99 on 60020":
>>> >   waiting to lock monitor 0x0000000047f97000 (object 0x00002aaab8ef07e8,
>>> >   a org.apache.hadoop.hbase.regionserver.MemStoreFlusher),
>>> >   which is held by "IPC Server handler 64 on 60020"
>>> > "IPC Server handler 64 on 60020":
>>> >   waiting for ownable synchronizer 0x00002aaab8eea130, (a
>>> >   java.util.concurrent.locks.ReentrantLock$NonfairSync),
>>> >   which is held by "regionserver60020.cacheFlusher"
>>> > "regionserver60020.cacheFlusher":
>>> >   waiting to lock monitor 0x0000000047f97000 (object 0x00002aaab8ef07e8,
>>> >   a org.apache.hadoop.hbase.regionserver.MemStoreFlusher),
>>> >   which is held by "IPC Server handler 64 on 60020"
>>> >
>>> > Java stack information for the threads listed above:
>>> > ===================================================
>>> > "IPC Server handler 99 on 60020":
>>> >   at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.reclaimMemStoreMemory(MemStoreFlusher.java:434)
>>> >   - waiting to lock <0x00002aaab8ef07e8> (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
>>> >   at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2529)
>>> >   at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
>>> >   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>> >   at java.lang.reflect.Method.invoke(Method.java:597)
>>> >   at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
>>> >   at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
>>> > "IPC Server handler 64 on 60020":
>>> >   at sun.misc.Unsafe.park(Native Method)
>>> >   - parking to wait for <0x00002aaab8eea130> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
>>> >   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>> >   at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
>>> >   at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778)
>>> >   at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1114)
>>> >   at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
>>> >   at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
>>> >   at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.reclaimMemStoreMemory(MemStoreFlusher.java:435)
>>> >   - locked <0x00002aaab8ef07e8> (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
>>> >   at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2529)
>>> >   at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
>>> >   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>> >   at java.lang.reflect.Method.invoke(Method.java:597)
>>> >   at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
>>> >   at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
>>> > "regionserver60020.cacheFlusher":
>>> >   at java.util.ResourceBundle.endLoading(ResourceBundle.java:1506)
>>> >   - waiting to lock <0x00002aaab8ef07e8> (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
>>> >   at java.util.ResourceBundle.findBundle(ResourceBundle.java:1379)
>>> >   at java.util.ResourceBundle.findBundle(ResourceBundle.java:1292)
>>> >   at java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1234)
>>> >   at java.util.ResourceBundle.getBundle(ResourceBundle.java:832)
>>> >   at sun.util.resources.LocaleData$1.run(LocaleData.java:127)
>>> >   at java.security.AccessController.doPrivileged(Native Method)
>>> >   at sun.util.resources.LocaleData.getBundle(LocaleData.java:125)
>>> >   at sun.util.resources.LocaleData.getTimeZoneNames(LocaleData.java:97)
>>> >   at sun.util.TimeZoneNameUtility.getBundle(TimeZoneNameUtility.java:115)
>>> >   at sun.util.TimeZoneNameUtility.retrieveDisplayNames(TimeZoneNameUtility.java:80)
>>> >   at java.util.TimeZone.getDisplayNames(TimeZone.java:399)
>>> >   at java.util.TimeZone.getDisplayName(TimeZone.java:350)
>>> >   at java.util.Date.toString(Date.java:1025)
>>> >   at java.lang.String.valueOf(String.java:2826)
>>> >   at java.lang.StringBuilder.append(StringBuilder.java:115)
>>> >   at org.apache.hadoop.hbase.regionserver.PriorityCompactionQueue$CompactionRequest.toString(PriorityCompactionQueue.java:114)
>>> >   at java.lang.String.valueOf(String.java:2826)
>>> >   at java.lang.StringBuilder.append(StringBuilder.java:115)
>>> >   at org.apache.hadoop.hbase.regionserver.PriorityCompactionQueue.addToRegionsInQueue(PriorityCompactionQueue.java:145)
>>> >   - locked <0x00002aaab8f2dc58> (a java.util.HashMap)
>>> >   at org.apache.hadoop.hbase.regionserver.PriorityCompactionQueue.add(PriorityCompactionQueue.java:188)
>>> >   at org.apache.hadoop.hbase.regionserver.CompactSplitThread.requestCompaction(CompactSplitThread.java:140)
>>> >   - locked <0x00002aaab8894048> (a org.apache.hadoop.hbase.regionserver.CompactSplitThread)
>>> >   at org.apache.hadoop.hbase.regionserver.CompactSplitThread.requestCompaction(CompactSplitThread.java:118)
>>> >   - locked <0x00002aaab8894048> (a org.apache.hadoop.hbase.regionserver.CompactSplitThread)
>>> >   at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:393)
>>> >   at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:366)
>>> >   at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:240)
>>> >
>>> >
>>> > Any ideas on how I could prevent this or let the master know about it?
>>> > I've written an app that will check all regionservers periodically for
>>> > such a lockup, but I can't run it constantly.
>>> >
>>> > I can provide more of the jstack if that is helpful.
>>> >
>>> > -Matt
>>> >
>>>
>>
>
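[Editor's note] A minimal sketch of the change JD suggests above: keep the request's enqueue time as a raw long rather than a java.util.Date, so toString() never touches the TimeZone/ResourceBundle machinery while compaction-queue locks are held. Class and field names here are illustrative assumptions, not HBase's actual CompactionRequest code.

```java
// Illustrative only: names do not match HBase's real CompactionRequest.
public class CompactionRequestSketch {
    private final String regionName;
    // A raw long instead of java.util.Date. Date.toString() formats the
    // time zone display name, which can trigger ResourceBundle loading
    // (the "regionserver60020.cacheFlusher" frames in the jstack above).
    private final long queuedAtMillis;

    public CompactionRequestSketch(String regionName) {
        this.regionName = regionName;
        this.queuedAtMillis = System.currentTimeMillis();
    }

    @Override
    public String toString() {
        // Printing the raw millis performs no locale or bundle lookups.
        return "regionName=" + regionName + ", queuedAt=" + queuedAtMillis;
    }

    public static void main(String[] args) {
        System.out.println(new CompactionRequestSketch("testtable,,1310680000000"));
    }
}
```

The point of the design choice: a long is immutable, cheap to print, and keeps toString() free of classloading side effects, which matters when toString() runs inside a synchronized block as it does in the trace above.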

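[Editor's note] On Matt's question about detecting such a lockup: one lightweight in-process option (a sketch, not something HBase 0.90 ships) is the JDK's ThreadMXBean, which reports exactly the monitor and ownable-synchronizer deadlocks shown in the jstack output. A watchdog thread could poll this and alert the master.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Sketch of periodic in-JVM deadlock detection; not part of HBase 0.90.
public class DeadlockProbe {
    /** Returns a human-readable report, or null when no deadlock exists. */
    public static String findDeadlocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // findDeadlockedThreads (Java 6+) covers both synchronized monitors
        // and ReentrantLock-style ownable synchronizers, as in the jstack.
        long[] ids = mx.findDeadlockedThreads();
        if (ids == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder("Found deadlocked threads:\n");
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info == null) {
                continue;  // thread died between the two calls
            }
            sb.append(info.getThreadName())
              .append(" blocked on ")
              .append(info.getLockName())
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String report = findDeadlocks();
        System.out.println(report == null ? "No deadlock detected" : report);
    }
}
```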