Thanks. I've created HBASE-4101.
On Thu, Jul 14, 2011 at 3:44 PM, lohit <[email protected]> wrote:
> Is it possible to open a JIRA with the full stack trace?
> Or, if you point us to the full stack trace, one of us can open the JIRA for you.
> 0.90.4 will be out soon, so maybe we should see if there is a fix for the
> problem below?
>
> 2011/7/14 Matt Davies <[email protected]>
>
> > Hey everyone,
> >
> > We periodically see a situation where the regionserver process still exists
> > in the process list and its ZooKeeper thread keeps sending the keepalive, so
> > the master won't remove it from the active list, yet the regionserver will
> > not serve data.
> >
> > Hadoop (cdh3u0), HBase 0.90.3 (Apache version), under load from an
> > internal testing tool.
> >
> > I've taken a jstack of the process and found this:
> >
> > Found one Java-level deadlock:
> > =============================
> > "IPC Server handler 99 on 60020":
> >   waiting to lock monitor 0x0000000047f97000 (object 0x00002aaab8ef07e8, a org.apache.hadoop.hbase.regionserver.MemStoreFlusher),
> >   which is held by "IPC Server handler 64 on 60020"
> > "IPC Server handler 64 on 60020":
> >   waiting for ownable synchronizer 0x00002aaab8eea130, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
> >   which is held by "regionserver60020.cacheFlusher"
> > "regionserver60020.cacheFlusher":
> >   waiting to lock monitor 0x0000000047f97000 (object 0x00002aaab8ef07e8, a org.apache.hadoop.hbase.regionserver.MemStoreFlusher),
> >   which is held by "IPC Server handler 64 on 60020"
> >
> > Java stack information for the threads listed above:
> > ===================================================
> > "IPC Server handler 99 on 60020":
> >     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.reclaimMemStoreMemory(MemStoreFlusher.java:434)
> >     - waiting to lock <0x00002aaab8ef07e8> (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
> >     at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2529)
> >     at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >     at java.lang.reflect.Method.invoke(Method.java:597)
> >     at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
> >     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
> > "IPC Server handler 64 on 60020":
> >     at sun.misc.Unsafe.park(Native Method)
> >     - parking to wait for <0x00002aaab8eea130> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
> >     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
> >     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
> >     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778)
> >     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1114)
> >     at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
> >     at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
> >     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.reclaimMemStoreMemory(MemStoreFlusher.java:435)
> >     - locked <0x00002aaab8ef07e8> (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
> >     at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2529)
> >     at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >     at java.lang.reflect.Method.invoke(Method.java:597)
> >     at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
> >     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
> > "regionserver60020.cacheFlusher":
> >     at java.util.ResourceBundle.endLoading(ResourceBundle.java:1506)
> >     - waiting to lock <0x00002aaab8ef07e8> (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
> >     at java.util.ResourceBundle.findBundle(ResourceBundle.java:1379)
> >     at java.util.ResourceBundle.findBundle(ResourceBundle.java:1292)
> >     at java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1234)
> >     at java.util.ResourceBundle.getBundle(ResourceBundle.java:832)
> >     at sun.util.resources.LocaleData$1.run(LocaleData.java:127)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at sun.util.resources.LocaleData.getBundle(LocaleData.java:125)
> >     at sun.util.resources.LocaleData.getTimeZoneNames(LocaleData.java:97)
> >     at sun.util.TimeZoneNameUtility.getBundle(TimeZoneNameUtility.java:115)
> >     at sun.util.TimeZoneNameUtility.retrieveDisplayNames(TimeZoneNameUtility.java:80)
> >     at java.util.TimeZone.getDisplayNames(TimeZone.java:399)
> >     at java.util.TimeZone.getDisplayName(TimeZone.java:350)
> >     at java.util.Date.toString(Date.java:1025)
> >     at java.lang.String.valueOf(String.java:2826)
> >     at java.lang.StringBuilder.append(StringBuilder.java:115)
> >     at org.apache.hadoop.hbase.regionserver.PriorityCompactionQueue$CompactionRequest.toString(PriorityCompactionQueue.java:114)
> >     at java.lang.String.valueOf(String.java:2826)
> >     at java.lang.StringBuilder.append(StringBuilder.java:115)
> >     at org.apache.hadoop.hbase.regionserver.PriorityCompactionQueue.addToRegionsInQueue(PriorityCompactionQueue.java:145)
> >     - locked <0x00002aaab8f2dc58> (a java.util.HashMap)
> >     at org.apache.hadoop.hbase.regionserver.PriorityCompactionQueue.add(PriorityCompactionQueue.java:188)
> >     at org.apache.hadoop.hbase.regionserver.CompactSplitThread.requestCompaction(CompactSplitThread.java:140)
> >     - locked <0x00002aaab8894048> (a org.apache.hadoop.hbase.regionserver.CompactSplitThread)
> >     at org.apache.hadoop.hbase.regionserver.CompactSplitThread.requestCompaction(CompactSplitThread.java:118)
> >     - locked <0x00002aaab8894048> (a org.apache.hadoop.hbase.regionserver.CompactSplitThread)
> >     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:393)
> >     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:366)
> >     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:240)
> >
> > Any ideas on how I could prevent this, or let the master know about it?
> > I've written an app that checks all regionservers periodically for such a
> > lockup, but I can't run it constantly.
> >
> > I can provide more of the jstack if that is helpful.
> >
> > -Matt
>
> --
> Have a Nice Day!
> Lohit
>
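On Matt's question about spotting the lockup automatically: the "Found one Java-level deadlock" report comes from the JVM's own deadlock detector, and the same check is exposed through `java.lang.management.ThreadMXBean`. Below is a minimal, hypothetical sketch (the `DeadlockProbe` class and its method names are illustrative only, not HBase code). It reproduces the same shape of cycle, where one thread holds a monitor while waiting on a `ReentrantLock` and another thread does the reverse, and then detects it in-process. Note that the sketch uses `findDeadlockedThreads()`, which covers ownable synchronizers such as `ReentrantLock`. The older `findMonitorDeadlockedThreads()` only looks at object monitors and would not report a cycle like this one.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical in-process probe (not part of HBase): reproduces a
 * monitor-vs-ReentrantLock cycle and reports it via ThreadMXBean.
 */
public class DeadlockProbe {

    /** Names of threads the JVM currently reports as deadlocked (empty if none). */
    public static List<String> findDeadlockedThreadNames() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Covers object monitors AND ownable synchronizers such as ReentrantLock.
        long[] ids = mx.findDeadlockedThreads();
        List<String> names = new ArrayList<>();
        if (ids == null) {
            return names; // null means no deadlock was found
        }
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info != null) { // a thread may have exited in the meantime
                names.add(info.getThreadName());
            }
        }
        return names;
    }

    /** Polls until a deadlock is reported or the timeout expires. */
    public static List<String> awaitDeadlock(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        List<String> names = findDeadlockedThreadNames();
        while (names.isEmpty() && System.currentTimeMillis() < deadline) {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            names = findDeadlockedThreadNames();
        }
        return names;
    }

    /** Starts two daemon threads that take the same two locks in opposite order. */
    public static void reproduceDeadlock() {
        Object flusherMonitor = new Object();            // stand-in for the MemStoreFlusher monitor
        ReentrantLock flusherLock = new ReentrantLock(); // stand-in for the NonfairSync lock
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        Thread handler = new Thread(() -> {
            synchronized (flusherMonitor) {              // monitor first...
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                flusherLock.lock();                      // ...then the lock: blocks forever
            }
        }, "IPC Server handler (demo)");

        Thread flusher = new Thread(() -> {
            flusherLock.lock();                          // lock first...
            bothHoldFirstLock.countDown();
            awaitQuietly(bothHoldFirstLock);
            synchronized (flusherMonitor) {              // ...then the monitor: blocks forever
                flusherLock.unlock();                    // never reached
            }
        }, "cacheFlusher (demo)");

        handler.setDaemon(true);                         // let the JVM exit despite stuck threads
        flusher.setDaemon(true);
        handler.start();
        flusher.start();
        awaitQuietly(bothHoldFirstLock);                 // both threads now hold their first lock
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        reproduceDeadlock();
        System.out.println("Deadlocked threads: " + awaitDeadlock(5_000));
    }
}
```

As an untested assumption, a periodic external checker could make the same `findDeadlockedThreads()` call against a remote regionserver by obtaining its `ThreadMXBean` with `ManagementFactory.newPlatformMXBeanProxy` over a JMX connection, provided JMX is enabled on that regionserver. That wiring is not shown here. Detection only reports the cycle. The JVM cannot break it, so the affected regionserver would still need to be restarted.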
