Is it possible to open a JIRA with the full stack trace? Or, if you point us to the full stack trace, one of us can open the JIRA for you. 0.90.4 will be out soon, and maybe we should see if there is a fix for the problem below.
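On the question of detecting such a lockup from a checker: a minimal sketch, assuming the check can run inside the JVM being inspected. It uses the standard java.lang.management API and reports the same kind of cycle that jstack prints as "Found one Java-level deadlock". The class name DeadlockCheck and the method deadlockedThreadNames are made up for illustration and are not part of HBase. Hooking this into a regionserver, or reaching it over remote JMX, is left out and would be an assumption about your setup.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase: lists threads that are
// deadlocked in the JVM this code runs in.
public class DeadlockCheck {

    /** Returns the names of currently deadlocked threads, or an empty list. */
    public static List<String> deadlockedThreadNames() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // findDeadlockedThreads() covers object monitors and ownable
        // synchronizers such as ReentrantLock; the quoted jstack mixes both.
        // It returns null when no deadlock is found.
        long[] ids = threads.findDeadlockedThreads();
        List<String> names = new ArrayList<String>();
        if (ids == null) {
            return names;
        }
        for (ThreadInfo info : threads.getThreadInfo(ids)) {
            // An entry is null if the thread exited between the two calls.
            if (info != null) {
                names.add(info.getThreadName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> names = deadlockedThreadNames();
        if (names.isEmpty()) {
            System.out.println("no deadlock detected");
        } else {
            System.out.println("deadlocked threads: " + names);
        }
    }
}
```

The older findMonitorDeadlockedThreads() only sees object monitors, so it would likely miss a cycle that goes through a ReentrantLock like the one in the trace. Note that findDeadlockedThreads() can throw UnsupportedOperationException on a JVM that does not support monitoring of ownable synchronizers.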
2011/7/14 Matt Davies <[email protected]>

> Hey everyone,
>
> We periodically see a situation where the regionserver process exists in the
> process list, zookeeper thread sends the keepalive so the master won't
> remove it from the active list, yet the regionserver will not serve data.
>
> Hadoop(cdh3u0), HBase 0.90.3 (Apache version), under load from an internal
> testing tool.
>
> I've taken a jstack of the process and found this:
>
> Found one Java-level deadlock:
> =============================
> "IPC Server handler 99 on 60020":
>   waiting to lock monitor 0x0000000047f97000 (object 0x00002aaab8ef07e8, a org.apache.hadoop.hbase.regionserver.MemStoreFlusher),
>   which is held by "IPC Server handler 64 on 60020"
> "IPC Server handler 64 on 60020":
>   waiting for ownable synchronizer 0x00002aaab8eea130, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
>   which is held by "regionserver60020.cacheFlusher"
> "regionserver60020.cacheFlusher":
>   waiting to lock monitor 0x0000000047f97000 (object 0x00002aaab8ef07e8, a org.apache.hadoop.hbase.regionserver.MemStoreFlusher),
>   which is held by "IPC Server handler 64 on 60020"
>
> Java stack information for the threads listed above:
> ===================================================
> "IPC Server handler 99 on 60020":
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.reclaimMemStoreMemory(MemStoreFlusher.java:434)
>         - waiting to lock <0x00002aaab8ef07e8> (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2529)
>         at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
> "IPC Server handler 64 on 60020":
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for <0x00002aaab8eea130> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1114)
>         at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
>         at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.reclaimMemStoreMemory(MemStoreFlusher.java:435)
>         - locked <0x00002aaab8ef07e8> (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2529)
>         at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
> "regionserver60020.cacheFlusher":
>         at java.util.ResourceBundle.endLoading(ResourceBundle.java:1506)
>         - waiting to lock <0x00002aaab8ef07e8> (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
>         at java.util.ResourceBundle.findBundle(ResourceBundle.java:1379)
>         at java.util.ResourceBundle.findBundle(ResourceBundle.java:1292)
>         at java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1234)
>         at java.util.ResourceBundle.getBundle(ResourceBundle.java:832)
>         at sun.util.resources.LocaleData$1.run(LocaleData.java:127)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at sun.util.resources.LocaleData.getBundle(LocaleData.java:125)
>         at sun.util.resources.LocaleData.getTimeZoneNames(LocaleData.java:97)
>         at sun.util.TimeZoneNameUtility.getBundle(TimeZoneNameUtility.java:115)
>         at sun.util.TimeZoneNameUtility.retrieveDisplayNames(TimeZoneNameUtility.java:80)
>         at java.util.TimeZone.getDisplayNames(TimeZone.java:399)
>         at java.util.TimeZone.getDisplayName(TimeZone.java:350)
>         at java.util.Date.toString(Date.java:1025)
>         at java.lang.String.valueOf(String.java:2826)
>         at java.lang.StringBuilder.append(StringBuilder.java:115)
>         at org.apache.hadoop.hbase.regionserver.PriorityCompactionQueue$CompactionRequest.toString(PriorityCompactionQueue.java:114)
>         at java.lang.String.valueOf(String.java:2826)
>         at java.lang.StringBuilder.append(StringBuilder.java:115)
>         at org.apache.hadoop.hbase.regionserver.PriorityCompactionQueue.addToRegionsInQueue(PriorityCompactionQueue.java:145)
>         - locked <0x00002aaab8f2dc58> (a java.util.HashMap)
>         at org.apache.hadoop.hbase.regionserver.PriorityCompactionQueue.add(PriorityCompactionQueue.java:188)
>         at org.apache.hadoop.hbase.regionserver.CompactSplitThread.requestCompaction(CompactSplitThread.java:140)
>         - locked <0x00002aaab8894048> (a org.apache.hadoop.hbase.regionserver.CompactSplitThread)
>         at org.apache.hadoop.hbase.regionserver.CompactSplitThread.requestCompaction(CompactSplitThread.java:118)
>         - locked <0x00002aaab8894048> (a org.apache.hadoop.hbase.regionserver.CompactSplitThread)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:393)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:366)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:240)
>
> Any ideas on how I could prevent this or let the master know about it? I've
> written an app that will check all regionservers periodically for such a
> lockup, but I can't run it constantly.
>
> I can provide more of the jstack if that is helpful.
>
> -Matt

--
Have a Nice Day!
Lohit
