Thanks. I've created HBASE-4101.


On Thu, Jul 14, 2011 at 3:44 PM, lohit <[email protected]> wrote:

> Is it possible to open a JIRA with the full stack trace?
> Or, if you point us to the full stack trace, one of us can open a JIRA
> for you. 0.90.4 will be out soon, and maybe we should see if there is a
> fix for the problem below.
>
> 2011/7/14 Matt Davies <[email protected]>
>
> > Hey everyone,
> >
> > We periodically see a situation where the regionserver process exists in
> > the process list and the zookeeper thread still sends the keepalive, so
> > the master won't remove it from the active list, yet the regionserver
> > will not serve data.
> >
> > Hadoop (cdh3u0), HBase 0.90.3 (Apache version), under load from an
> > internal testing tool.
> >
> >
> > I've taken a jstack of the process and found this:
> >
> > Found one Java-level deadlock:
> > =============================
> > "IPC Server handler 99 on 60020":
> >  waiting to lock monitor 0x0000000047f97000 (object 0x00002aaab8ef07e8, a org.apache.hadoop.hbase.regionserver.MemStoreFlusher),
> >  which is held by "IPC Server handler 64 on 60020"
> > "IPC Server handler 64 on 60020":
> >  waiting for ownable synchronizer 0x00002aaab8eea130, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
> >  which is held by "regionserver60020.cacheFlusher"
> > "regionserver60020.cacheFlusher":
> >  waiting to lock monitor 0x0000000047f97000 (object 0x00002aaab8ef07e8, a org.apache.hadoop.hbase.regionserver.MemStoreFlusher),
> >  which is held by "IPC Server handler 64 on 60020"
> >
> > Java stack information for the threads listed above:
> > ===================================================
> > "IPC Server handler 99 on 60020":
> >        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.reclaimMemStoreMemory(MemStoreFlusher.java:434)
> >        - waiting to lock <0x00002aaab8ef07e8> (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
> >        at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2529)
> >        at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
> >        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >        at java.lang.reflect.Method.invoke(Method.java:597)
> >        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
> >        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
> > "IPC Server handler 64 on 60020":
> >        at sun.misc.Unsafe.park(Native Method)
> >        - parking to wait for  <0x00002aaab8eea130> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
> >        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
> >        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
> >        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778)
> >        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1114)
> >        at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
> >        at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
> >        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.reclaimMemStoreMemory(MemStoreFlusher.java:435)
> >        - locked <0x00002aaab8ef07e8> (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
> >        at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2529)
> >        at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
> >        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >        at java.lang.reflect.Method.invoke(Method.java:597)
> >        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
> >        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
> > "regionserver60020.cacheFlusher":
> >        at java.util.ResourceBundle.endLoading(ResourceBundle.java:1506)
> >        - waiting to lock <0x00002aaab8ef07e8> (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
> >        at java.util.ResourceBundle.findBundle(ResourceBundle.java:1379)
> >        at java.util.ResourceBundle.findBundle(ResourceBundle.java:1292)
> >        at java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1234)
> >        at java.util.ResourceBundle.getBundle(ResourceBundle.java:832)
> >        at sun.util.resources.LocaleData$1.run(LocaleData.java:127)
> >        at java.security.AccessController.doPrivileged(Native Method)
> >        at sun.util.resources.LocaleData.getBundle(LocaleData.java:125)
> >        at sun.util.resources.LocaleData.getTimeZoneNames(LocaleData.java:97)
> >        at sun.util.TimeZoneNameUtility.getBundle(TimeZoneNameUtility.java:115)
> >        at sun.util.TimeZoneNameUtility.retrieveDisplayNames(TimeZoneNameUtility.java:80)
> >        at java.util.TimeZone.getDisplayNames(TimeZone.java:399)
> >        at java.util.TimeZone.getDisplayName(TimeZone.java:350)
> >        at java.util.Date.toString(Date.java:1025)
> >        at java.lang.String.valueOf(String.java:2826)
> >        at java.lang.StringBuilder.append(StringBuilder.java:115)
> >        at org.apache.hadoop.hbase.regionserver.PriorityCompactionQueue$CompactionRequest.toString(PriorityCompactionQueue.java:114)
> >        at java.lang.String.valueOf(String.java:2826)
> >        at java.lang.StringBuilder.append(StringBuilder.java:115)
> >        at org.apache.hadoop.hbase.regionserver.PriorityCompactionQueue.addToRegionsInQueue(PriorityCompactionQueue.java:145)
> >        - locked <0x00002aaab8f2dc58> (a java.util.HashMap)
> >        at org.apache.hadoop.hbase.regionserver.PriorityCompactionQueue.add(PriorityCompactionQueue.java:188)
> >        at org.apache.hadoop.hbase.regionserver.CompactSplitThread.requestCompaction(CompactSplitThread.java:140)
> >        - locked <0x00002aaab8894048> (a org.apache.hadoop.hbase.regionserver.CompactSplitThread)
> >        at org.apache.hadoop.hbase.regionserver.CompactSplitThread.requestCompaction(CompactSplitThread.java:118)
> >        - locked <0x00002aaab8894048> (a org.apache.hadoop.hbase.regionserver.CompactSplitThread)
> >        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:393)
> >        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:366)
> >        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:240)
> >
> >
> > Any ideas on how I could prevent this or let the master know about it?
> > I've written an app that will check all regionservers periodically for
> > such a lockup, but I can't run it constantly.
> >
> > I can provide more of the jstack if that is helpful.
> >
> > -Matt
> >
>
>
>
> --
> Have a Nice Day!
> Lohit
>
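
On the question in the quoted message about detecting this kind of lockup: the JVM exposes a deadlock check through java.lang.management.ThreadMXBean, which covers cycles that mix object monitors and ownable synchronizers such as ReentrantLock, as in the jstack output quoted above. Below is a minimal sketch, assuming the check runs inside the JVM being inspected; pointing it at a remote regionserver would need a JMX connection, which is not shown. DeadlockCheck and describeDeadlocks are hypothetical names for illustration, not part of HBase.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of HBase.
class DeadlockCheck {

    // Returns one line per deadlocked thread in this JVM; empty if none found.
    static List<String> describeDeadlocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // findDeadlockedThreads() also sees java.util.concurrent locks, but
        // only where the JVM supports synchronizer monitoring; otherwise
        // fall back to the monitor-only check.
        long[] ids = mx.isSynchronizerUsageSupported()
                ? mx.findDeadlockedThreads()
                : mx.findMonitorDeadlockedThreads();
        List<String> report = new ArrayList<String>();
        if (ids == null) {
            return report; // null means no deadlock was detected
        }
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info == null) {
                continue; // thread exited between the two calls
            }
            report.add("\"" + info.getThreadName() + "\" waiting on "
                    + info.getLockName() + " held by \""
                    + info.getLockOwnerName() + "\"");
        }
        return report;
    }

    public static void main(String[] args) {
        List<String> report = describeDeadlocks();
        if (report.isEmpty()) {
            System.out.println("no deadlock detected");
        } else {
            for (String line : report) {
                System.out.println(line);
            }
        }
    }
}
```

Calling describeDeadlocks() from a scheduled task would give a periodic check; what to do on a non-empty result (log it, alert, or abort the process) is a separate decision and is left out here.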
