What Lohit says, but also: what JVM are you running, and what options are you feeding it? The stack trace is a little crazy (especially the resource bundle loading mixed in). We saw something similar over in HBASE-3830 when someone was running a profiler. Is that what is going on here?
Thanks,
St.Ack

On Thu, Jul 14, 2011 at 11:36 AM, Matt Davies <[email protected]> wrote:
> Hey everyone,
>
> We periodically see a situation where the regionserver process exists in the
> process list, zookeeper thread sends the keepalive so the master won't
> remove it from the active list, yet the regionserver will not serve data.
>
> Hadoop(cdh3u0), HBase 0.90.3 (Apache version), under load from an internal
> testing tool.
>
>
> I've taken a jstack of the process and found this:
>
> Found one Java-level deadlock:
> =============================
> "IPC Server handler 99 on 60020":
>   waiting to lock monitor 0x0000000047f97000 (object 0x00002aaab8ef07e8, a org.apache.hadoop.hbase.regionserver.MemStoreFlusher),
>   which is held by "IPC Server handler 64 on 60020"
> "IPC Server handler 64 on 60020":
>   waiting for ownable synchronizer 0x00002aaab8eea130, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
>   which is held by "regionserver60020.cacheFlusher"
> "regionserver60020.cacheFlusher":
>   waiting to lock monitor 0x0000000047f97000 (object 0x00002aaab8ef07e8, a org.apache.hadoop.hbase.regionserver.MemStoreFlusher),
>   which is held by "IPC Server handler 64 on 60020"
>
> Java stack information for the threads listed above:
> ===================================================
> "IPC Server handler 99 on 60020":
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.reclaimMemStoreMemory(MemStoreFlusher.java:434)
>     - waiting to lock <0x00002aaab8ef07e8> (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2529)
>     at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
>     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
> "IPC Server handler 64 on 60020":
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x00002aaab8eea130> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1114)
>     at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
>     at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.reclaimMemStoreMemory(MemStoreFlusher.java:435)
>     - locked <0x00002aaab8ef07e8> (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2529)
>     at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
>     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
> "regionserver60020.cacheFlusher":
>     at java.util.ResourceBundle.endLoading(ResourceBundle.java:1506)
>     - waiting to lock <0x00002aaab8ef07e8> (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
>     at java.util.ResourceBundle.findBundle(ResourceBundle.java:1379)
>     at java.util.ResourceBundle.findBundle(ResourceBundle.java:1292)
>     at java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1234)
>     at java.util.ResourceBundle.getBundle(ResourceBundle.java:832)
>     at sun.util.resources.LocaleData$1.run(LocaleData.java:127)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at sun.util.resources.LocaleData.getBundle(LocaleData.java:125)
>     at sun.util.resources.LocaleData.getTimeZoneNames(LocaleData.java:97)
>     at sun.util.TimeZoneNameUtility.getBundle(TimeZoneNameUtility.java:115)
>     at sun.util.TimeZoneNameUtility.retrieveDisplayNames(TimeZoneNameUtility.java:80)
>     at java.util.TimeZone.getDisplayNames(TimeZone.java:399)
>     at java.util.TimeZone.getDisplayName(TimeZone.java:350)
>     at java.util.Date.toString(Date.java:1025)
>     at java.lang.String.valueOf(String.java:2826)
>     at java.lang.StringBuilder.append(StringBuilder.java:115)
>     at org.apache.hadoop.hbase.regionserver.PriorityCompactionQueue$CompactionRequest.toString(PriorityCompactionQueue.java:114)
>     at java.lang.String.valueOf(String.java:2826)
>     at java.lang.StringBuilder.append(StringBuilder.java:115)
>     at org.apache.hadoop.hbase.regionserver.PriorityCompactionQueue.addToRegionsInQueue(PriorityCompactionQueue.java:145)
>     - locked <0x00002aaab8f2dc58> (a java.util.HashMap)
>     at org.apache.hadoop.hbase.regionserver.PriorityCompactionQueue.add(PriorityCompactionQueue.java:188)
>     at org.apache.hadoop.hbase.regionserver.CompactSplitThread.requestCompaction(CompactSplitThread.java:140)
>     - locked <0x00002aaab8894048> (a org.apache.hadoop.hbase.regionserver.CompactSplitThread)
>     at org.apache.hadoop.hbase.regionserver.CompactSplitThread.requestCompaction(CompactSplitThread.java:118)
>     - locked <0x00002aaab8894048> (a org.apache.hadoop.hbase.regionserver.CompactSplitThread)
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:393)
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:366)
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:240)
>
>
> Any ideas on how I could prevent this or let the master know about it? I've
> written an app that will check all regionservers periodically for such a
> lockup, but I can't run it constantly.
>
> I can provide more of the jstack if that is helpful.
>
> -Matt
>
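
On the question of noticing such a lockup without running jstack by hand: the JVM exposes the same deadlock detection through the standard java.lang.management API. Below is a minimal sketch, not anything HBase ships. The class name DeadlockCheck and the method describeDeadlocks are made up for illustration, and the code only inspects the JVM it runs in; checking a regionserver from outside would mean reaching the same MXBean over a remote JMX connection, which is not shown here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase: reports deadlocked threads in the current JVM.
public class DeadlockCheck {

    /** Returns one description per deadlocked thread; the list is empty if none are found. */
    public static List<String> describeDeadlocks() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        // findDeadlockedThreads() covers object monitors as well as ownable
        // synchronizers such as ReentrantLock, the mix seen in the jstack output.
        // It returns null when no deadlock is detected.
        long[] ids = bean.findDeadlockedThreads();
        List<String> report = new ArrayList<>();
        if (ids == null) {
            return report;
        }
        for (ThreadInfo info : bean.getThreadInfo(ids)) {
            if (info == null) {
                continue; // the thread exited between the two calls
            }
            report.add(info.getThreadName() + " waits for " + info.getLockName()
                    + " held by " + info.getLockOwnerName());
        }
        return report;
    }

    public static void main(String[] args) {
        List<String> report = describeDeadlocks();
        if (report.isEmpty()) {
            System.out.println("no deadlock detected");
        }
        for (String line : report) {
            System.out.println(line);
        }
    }
}
```

Called periodically from a thread inside the process, a non-empty result could be logged or used to abort the server so the master sees it go away. Whether aborting is the right reaction is a judgment call this sketch does not try to make.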
