Sorry, it's not the Data class; the problem is in the use of the Date class. JD had once replied to the mailing list under the heading "Re: Possible dead lock" :)
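[Illustrative sketch, not part of the original thread. It shows the pattern JD recommends below: keep a plain long from System.currentTimeMillis() instead of a java.util.Date, so that toString() never reaches TimeZone.getDisplayName() and the ResourceBundle loading locks visible in the jstack further down. The class and field names are invented for the example; this is not the actual HBase CompactionRequest code.]

public class CompactionRequestSketch {   // hypothetical name, for illustration only
    private final String regionName;

    // Risky pattern (what the trace below points at): formatting a Date in
    // toString() walks TimeZone.getDisplayName() -> ResourceBundle, taking
    // bundle-loading locks while other monitors may already be held.
    // private final java.util.Date date = new java.util.Date();

    // Pattern per JD's advice: a plain long timestamp.
    private final long timestamp = System.currentTimeMillis();

    public CompactionRequestSketch(String regionName) {
        this.regionName = regionName;
    }

    @Override
    public String toString() {
        // Appending a long is plain string work; no locale or bundle lookup.
        return "regionName=" + regionName + ", queuedAt=" + timestamp;
    }
}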
Regards
Ram

-----Original Message-----
From: Ramkrishna S Vasudevan [mailto:[email protected]]
Sent: Friday, July 15, 2011 9:26 AM
To: [email protected]
Subject: RE: Deadlocked Regionserver process

Hi

I think this, as Stack mentioned in HBASE-3830, could be due to the
profiler.  But the problem is in the use of Data class.  JD had once
replied to the mailing list under the heading "Re: Possible dead lock".

JD's reply
=============================================================
I see what you are saying, and I understand the deadlock, but what escapes
me is why ResourceBundle has to go touch all the classes every time to find
the locale, as I see 2 threads doing the same.  Maybe my understanding of
what it does is just poor, but I also see that you are using the yourkit
profiler, so it's one more variable in the equation.

In any case, using a Date strikes me as odd.  Using a long representing
System.currentTimeMillis is usually what we do.
=======================================================================

So here, as per HBASE-4101, even though no profiler was running, the
problem is the Date object used in the toString() of the
PriorityCompactionQueue.

Regards
Ram

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Stack
Sent: Friday, July 15, 2011 3:56 AM
To: [email protected]
Subject: Re: Deadlocked Regionserver process

Thank you.  I've added the below to the issue.  Will take a looksee.  If it
is the issue, will include the fix in 0.90.4.

St.Ack

On Thu, Jul 14, 2011 at 3:07 PM, Matt Davies <[email protected]> wrote:
> We aren't profiling right now. Here's what is in the hbase-env.sh
>
> export TZ="US/Mountain"
> export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC
> -XX:+CMSIncrementalMode -verbose:gc -XX:+PrintGCDetails
> -XX:+PrintGCTimeStamps -Xloggc:/home/hadoop/gc-hbase.log "
> export HBASE_MANAGES_ZK=false
> export HBASE_PID_DIR=/home/hadoop
> export HBASE_HEAPSIZE=10240
>
> Java is
> java version "1.6.0_17"
> Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
> Java HotSpot(TM) 64-Bit Server VM (build 14.3-b01, mixed mode)
>
> We were planning an upgrade to 1.6.0_25 before we ran into this issue.
>
>
>
> On Thu, Jul 14, 2011 at 3:59 PM, Stack <[email protected]> wrote:
>
>> What Lohit says, but also: what jvm are you running and what options
>> are you feeding it?  The stack trace is a little crazy (especially the
>> mix-in of resource bundle loading).  We saw something similar over in
>> HBASE-3830 when someone was running a profiler.  Is that what is going
>> on here?
>>
>> Thanks,
>> St.Ack
>>
>> On Thu, Jul 14, 2011 at 11:36 AM, Matt Davies <[email protected]>
>> wrote:
>> > Hey everyone,
>> >
>> > We periodically see a situation where the regionserver process exists
>> > in the process list, and the zookeeper thread sends the keepalive so
>> > the master won't remove it from the active list, yet the regionserver
>> > will not serve data.
>> >
>> > Hadoop (cdh3u0), HBase 0.90.3 (Apache version), under load from an
>> > internal testing tool.
>> >
>> > I've taken a jstack of the process and found this:
>> >
>> > Found one Java-level deadlock:
>> > =============================
>> > "IPC Server handler 99 on 60020":
>> >   waiting to lock monitor 0x0000000047f97000 (object 0x00002aaab8ef07e8, a
>> > org.apache.hadoop.hbase.regionserver.MemStoreFlusher),
>> >   which is held by "IPC Server handler 64 on 60020"
>> > "IPC Server handler 64 on 60020":
>> >   waiting for ownable synchronizer 0x00002aaab8eea130, (a
>> > java.util.concurrent.locks.ReentrantLock$NonfairSync),
>> >   which is held by "regionserver60020.cacheFlusher"
>> > "regionserver60020.cacheFlusher":
>> >   waiting to lock monitor 0x0000000047f97000 (object 0x00002aaab8ef07e8, a
>> > org.apache.hadoop.hbase.regionserver.MemStoreFlusher),
>> >   which is held by "IPC Server handler 64 on 60020"
>> >
>> > Java stack information for the threads listed above:
>> > ===================================================
>> > "IPC Server handler 99 on 60020":
>> >     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.reclaimMemStoreMemory(MemStoreFlusher.java:434)
>> >     - waiting to lock <0x00002aaab8ef07e8> (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
>> >     at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2529)
>> >     at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
>> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> >     at java.lang.reflect.Method.invoke(Method.java:597)
>> >     at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
>> >     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
>> > "IPC Server handler 64 on 60020":
>> >     at sun.misc.Unsafe.park(Native Method)
>> >     - parking to wait for <0x00002aaab8eea130> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
>> >     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>> >     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
>> >     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778)
>> >     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1114)
>> >     at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
>> >     at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
>> >     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.reclaimMemStoreMemory(MemStoreFlusher.java:435)
>> >     - locked <0x00002aaab8ef07e8> (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
>> >     at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2529)
>> >     at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
>> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> >     at java.lang.reflect.Method.invoke(Method.java:597)
>> >     at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
>> >     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
>> > "regionserver60020.cacheFlusher":
>> >     at java.util.ResourceBundle.endLoading(ResourceBundle.java:1506)
>> >     - waiting to lock <0x00002aaab8ef07e8> (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
>> >     at java.util.ResourceBundle.findBundle(ResourceBundle.java:1379)
>> >     at java.util.ResourceBundle.findBundle(ResourceBundle.java:1292)
>> >     at java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1234)
>> >     at java.util.ResourceBundle.getBundle(ResourceBundle.java:832)
>> >     at sun.util.resources.LocaleData$1.run(LocaleData.java:127)
>> >     at java.security.AccessController.doPrivileged(Native Method)
>> >     at sun.util.resources.LocaleData.getBundle(LocaleData.java:125)
>> >     at sun.util.resources.LocaleData.getTimeZoneNames(LocaleData.java:97)
>> >     at sun.util.TimeZoneNameUtility.getBundle(TimeZoneNameUtility.java:115)
>> >     at sun.util.TimeZoneNameUtility.retrieveDisplayNames(TimeZoneNameUtility.java:80)
>> >     at java.util.TimeZone.getDisplayNames(TimeZone.java:399)
>> >     at java.util.TimeZone.getDisplayName(TimeZone.java:350)
>> >     at java.util.Date.toString(Date.java:1025)
>> >     at java.lang.String.valueOf(String.java:2826)
>> >     at java.lang.StringBuilder.append(StringBuilder.java:115)
>> >     at org.apache.hadoop.hbase.regionserver.PriorityCompactionQueue$CompactionRequest.toString(PriorityCompactionQueue.java:114)
>> >     at java.lang.String.valueOf(String.java:2826)
>> >     at java.lang.StringBuilder.append(StringBuilder.java:115)
>> >     at org.apache.hadoop.hbase.regionserver.PriorityCompactionQueue.addToRegionsInQueue(PriorityCompactionQueue.java:145)
>> >     - locked <0x00002aaab8f2dc58> (a java.util.HashMap)
>> >     at org.apache.hadoop.hbase.regionserver.PriorityCompactionQueue.add(PriorityCompactionQueue.java:188)
>> >     at org.apache.hadoop.hbase.regionserver.CompactSplitThread.requestCompaction(CompactSplitThread.java:140)
>> >     - locked <0x00002aaab8894048> (a org.apache.hadoop.hbase.regionserver.CompactSplitThread)
>> >     at org.apache.hadoop.hbase.regionserver.CompactSplitThread.requestCompaction(CompactSplitThread.java:118)
>> >     - locked <0x00002aaab8894048> (a org.apache.hadoop.hbase.regionserver.CompactSplitThread)
>> >     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:393)
>> >     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:366)
>> >     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:240)
>> >
>> >
>> > Any ideas on how I could prevent this or let the master know about it?
>> > I've written an app that will check all regionservers periodically for
>> > such a lockup, but I can't run it constantly.
>> >
>> > I can provide more of the jstack if that is helpful.
>> >
>> > -Matt
>> >
>>
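[Illustrative sketch, not part of the original thread. Matt asks above how to detect such a lockup. The standard java.lang.management API exposes the same Java-level deadlock detection that jstack prints, so a checker can ask the JVM directly. The class name is invented for the example, and a real checker would likely poll regionservers over a remote JMX connection rather than in-process.]

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical checker; not from the thread and not part of HBase.
public class DeadlockChecker {
    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();

        // findDeadlockedThreads() covers both object monitors and ownable
        // synchronizers such as ReentrantLock; both kinds appear in the
        // deadlock cycle in the jstack above. It returns null when no
        // deadlock exists.
        long[] ids = threads.findDeadlockedThreads();
        if (ids == null) {
            System.out.println("No Java-level deadlock found.");
            return;
        }
        for (ThreadInfo info : threads.getThreadInfo(ids, Integer.MAX_VALUE)) {
            System.out.print(info);
        }
        // A monitoring app could alert here, or exit non-zero so a wrapper
        // script can flag the stuck regionserver to the master.
        System.exit(1);
    }
}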
